A PostgreSQL Interface for HDFS Data
Use the Remoting features of the HDFS JDBC Driver to create a PostgreSQL entry-point for data access.
Many PostgreSQL clients are available, from standard drivers to BI and analytics tools, which makes PostgreSQL a popular interface for data access. Using our JDBC Drivers, you can create PostgreSQL entry-points that any standard client can connect to.
To access HDFS data as a PostgreSQL database, use the CData JDBC Driver for HDFS and a JDBC foreign data wrapper (FDW). In this article, we compile the FDW, install it, and query HDFS data from PostgreSQL Server.
Connect to HDFS Data as a JDBC Data Source
To connect to HDFS as a JDBC data source, you will need the following:
- Driver JAR path: The JAR is located in the lib subfolder of the installation directory.
- Driver class: cdata.jdbc.hdfs.HDFSDriver
- JDBC URL: The URL must start with "jdbc:hdfs:" and can include any of the connection properties as name-value pairs separated by semicolons.
In order to authenticate, set the following connection properties:
- Host: Set this value to the host of your HDFS installation.
- Port: Set this value to the port of your HDFS installation. Default port: 50070
Built-in Connection String Designer
For assistance in constructing the JDBC URL, use the connection string designer built into the HDFS JDBC Driver. Either double-click the JAR file or run it from the command line:
java -jar cdata.jdbc.hdfs.jar
Fill in the connection properties and copy the connection string to the clipboard.
A typical JDBC URL is below:
jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;
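Before wiring the URL into PostgreSQL, it can help to check it from a small Java program. The sketch below assembles the URL from name-value pairs and then attempts a connection. The class name HdfsJdbcUrl and its build method are hypothetical helpers written for this illustration, not part of the driver, and the property values are the sample values from this article. The connection attempt assumes that cdata.jdbc.hdfs.jar is on the classpath and that the driver registers itself with DriverManager; if either assumption does not hold, the program prints the error instead of connecting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a "jdbc:hdfs:" URL from connection properties. */
final class HdfsJdbcUrl {

    /** Joins the properties as Name=Value pairs, each terminated by a semicolon. */
    static String build(Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:hdfs:");
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Sample values taken from the article; replace them with your own.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Host", "sandbox-hdp.hortonworks.com");
        props.put("Port", "50070");
        props.put("Path", "/user/root");
        props.put("User", "root");

        String url = build(props);
        System.out.println(url);

        // Assumption: the driver JAR is on the classpath and the host is reachable.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        } catch (SQLException e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

A LinkedHashMap is used so that the properties appear in the URL in the order they were added, which makes the output easy to compare with the connection string copied from the designer.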
Build the JDBC Foreign Data Wrapper
The Foreign Data Wrapper can be installed as an extension to PostgreSQL, without recompiling PostgreSQL. The jdbc2_fdw extension is used as an example (downloadable here).
- Add a symlink from the shared object for your version of the JRE to /usr/lib/libjvm.so. For example:
ln -s /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server/libjvm.so /usr/lib/libjvm.so
- Start the build:
make install USE_PGXS=1
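The location of libjvm.so differs between JRE versions and distributions, so the path in the symlink example may not match your system. As a rough aid, the following sketch searches the directory of the JRE that runs it and prints a matching ln command. The class name FindLibJvm is hypothetical, and the sketch assumes a Linux JRE; if a JRE ships more than one libjvm.so (for example server and client variants), it simply reports the first one it encounters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper: locates libjvm.so under a Java home directory. */
final class FindLibJvm {

    /** Walks the directory tree and returns the first file named libjvm.so, if any. */
    static Optional<Path> find(Path javaHome) throws IOException {
        try (Stream<Path> paths = Files.walk(javaHome)) {
            return paths
                .filter(p -> p.getFileName() != null
                        && p.getFileName().toString().equals("libjvm.so"))
                .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // java.home points at the JRE (or JDK) that is running this program.
        Path javaHome = Paths.get(System.getProperty("java.home"));
        Optional<Path> lib = find(javaHome);
        if (lib.isPresent()) {
            System.out.println("ln -s " + lib.get() + " /usr/lib/libjvm.so");
        } else {
            System.out.println("libjvm.so not found under " + javaHome);
        }
    }
}
```

Run it with the same JRE that PostgreSQL should load, since the symlink decides which JVM the foreign data wrapper uses.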
Query HDFS Data as a PostgreSQL Database
After you have installed the extension, follow the steps below to start querying HDFS data:
- Log into your database.
- Load the extension for the database:
CREATE EXTENSION jdbc2_fdw;
- Create a server object for HDFS:
CREATE SERVER HDFS FOREIGN DATA WRAPPER jdbc2_fdw OPTIONS ( drivername 'cdata.jdbc.hdfs.HDFSDriver', url 'jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;', querytimeout '15', jarfile '/home/MyUser/CData/CData\ JDBC\ Driver\ for\ HDFS MyDriverEdition/lib/cdata.jdbc.hdfs.jar');
- Create a user mapping that associates a PostgreSQL user with the username and password used for the foreign server:
CREATE USER MAPPING FOR postgres SERVER HDFS OPTIONS ( username 'admin', password 'test');
- Create a foreign table in your local database:
postgres=# CREATE FOREIGN TABLE files ( files_id text, files_FileId text, files_ChildrenNum numeric) SERVER HDFS OPTIONS ( table_name 'files');
You can now query the foreign table:
postgres=# SELECT * FROM files;
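Once the foreign table exists, any PostgreSQL client can read it like a local table. As an illustration, the sketch below queries the files table from Java through the standard java.sql API. It assumes that the PostgreSQL JDBC driver is on the classpath and that PostgreSQL listens on localhost:5432 with a database named postgres. The class name PgFdwClient, the URL, and the credentials are placeholders chosen for this example, not values from the driver documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client: reads the foreign table "files" through PostgreSQL. */
final class PgFdwClient {

    /** Returns the files_id column of every row in the foreign table. */
    static List<String> fetchFileIds(String pgUrl, String user, String password)
            throws SQLException {
        List<String> ids = new ArrayList<>();
        // try-with-resources closes the result set, statement, and connection.
        try (Connection conn = DriverManager.getConnection(pgUrl, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT files_id FROM files")) {
            while (rs.next()) {
                ids.add(rs.getString("files_id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Placeholder connection details; adjust them to your PostgreSQL server.
        String pgUrl = "jdbc:postgresql://localhost:5432/postgres";
        try {
            for (String id : fetchFileIds(pgUrl, "postgres", "postgres-password")) {
                System.out.println(id);
            }
        } catch (SQLException e) {
            System.out.println("Query failed: " + e.getMessage());
        }
    }
}
```

The client talks only to PostgreSQL. The foreign data wrapper forwards the query to the HDFS JDBC Driver on the server side, so the HDFS driver JAR is not needed on the client.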