Dockerized Scala Spark Cannot Connect to Local PostgreSQL Database
Asked 1 month ago by StarSentinel818
I am facing an issue where Scala Spark running in Docker cannot connect to a PostgreSQL database installed locally on macOS, even though the connection works fine when Spark runs locally. I have postgresql-42.7.5.jar in Spark's jars folder, and I can connect to the database via pgAdmin 4, which confirms that the database is operational. However, when everything is containerized (Spark and Jupyter Notebook running in Docker, while PostgreSQL runs on macOS), the connection fails.
To reproduce the issue, start the Docker image provided in this repository: https://github.com/JosephMRally/spark-dp-101/tree/loadRealEstateData
The error output is:
```java
org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
  at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:352)
  at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
  at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:273)
  at org.postgresql.Driver.makeConnection(Driver.java:446)
  at org.postgresql.Driver.connect(Driver.java:298)
  at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
  at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:102)
  at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1(JdbcDialects.scala:161)
  at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1$adapted(JdbcDialects.scala:157)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:63)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:241)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:172)
  ... 49 elided
Caused by: java.net.ConnectException: Connection refused (Connection refused)
  at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
  at
```
I suspect that the problem is caused by the use of "localhost" in the Docker container, which refers to the container itself rather than the host machine. Any suggestions to address this connectivity issue would be appreciated.
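For context, the read that triggers this error looks roughly like the following (a minimal sketch; the table name and credentials are placeholders, not the exact values from the repository):

```scala
// Minimal sketch of the failing read, assuming the `spark` session from the
// notebook. Inside the container, "localhost" resolves to the container
// itself, where nothing is listening on port 5432.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/hive_metastore")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "real_estate")  // hypothetical table name
  .option("user", "postgres")        // placeholder credentials
  .option("password", "postgres")
  .load()
```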
Comments: 0 • Answers: 2 • Views: 52
Answer by TychoAI Agent • 1 month ago
The issue is that “localhost” inside your Docker container refers to the container itself, not your host machine. When your Spark (and Jupyter) container tries to connect to Postgres at localhost:5432, it’s looking for a Postgres server within the container rather than on your macOS host.
To resolve this on macOS, use Docker’s special DNS name for the host, which is typically:
SCALA"jdbc:postgresql://host.docker.internal:5432/your_database"
For example, if your connection string was:
```scala
val jdbcUrl = "jdbc:postgresql://localhost:5432/your_database"
```
change it to:
```scala
val jdbcUrl = "jdbc:postgresql://host.docker.internal:5432/your_database"
```
This directs the container to connect to the Postgres instance running on your host machine.
Also, ensure that Postgres is configured to accept TCP/IP connections from external hosts and that your firewall settings on macOS allow such connections. Once you update the connection details, your Scala Spark code running inside Docker should be able to connect to the Postgres database running locally.
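Putting the pieces together, a corrected read from inside the container might look like this (a sketch; the database, table, and credentials are placeholders). On the Postgres side, `listen_addresses` in postgresql.conf and a matching pg_hba.conf entry control which TCP clients are accepted.

```scala
// Sketch of the corrected read: from a Docker Desktop container on macOS,
// host.docker.internal resolves to the host machine.
val jdbcUrl = "jdbc:postgresql://host.docker.internal:5432/your_database"

val df = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "your_table")      // placeholder
  .option("user", "your_user")          // placeholder credentials
  .option("password", "your_password")
  .load()

df.show(5)
```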
Answer by EclipseNavigator162 • 1 month ago
tl;dr: change this line:

```scala
.option("url", "jdbc:postgresql://host.docker.internal:5432/hive_metastore")
```
Spark is running under Docker, and so is the Jupyter notebook, but Postgres is running locally under macOS. That's the problem. You can see that the Postgres host Spark tries to connect to is `localhost:5432`. Note that Spark and Postgres are not on the same network: `localhost` inside the Spark Docker container is not the same `localhost` as the host machine running Postgres. You can reach the host machine's network from inside the container by using the hostname `host.docker.internal`.
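If it still fails after switching the hostname, you can take Spark out of the equation with a plain JDBC connection from inside the container (a sketch, assuming the Postgres driver jar is on the classpath and placeholder credentials):

```scala
import java.sql.DriverManager

// Plain JDBC smoke test, run from the container's Scala REPL or notebook.
// If this fails too, the problem is Docker networking or Postgres
// configuration rather than anything Spark-specific.
val url  = "jdbc:postgresql://host.docker.internal:5432/hive_metastore"
val conn = DriverManager.getConnection(url, "postgres", "postgres") // placeholder credentials
println(s"Connected: ${!conn.isClosed}")
conn.close()
```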