Configuring HDFS
Configure Tamr Core to use HDFS as its filesystem.
Single-node Tamr deployments use the local filesystem by default. You can configure Tamr Core to use an external Hadoop Distributed File System (HDFS) cluster instead of writing to the local filesystem.
Important: When you use the Connect to Source option to add a data file stored in an HDFS cluster to Tamr Core, the file must already include a primary key column. This is different from adding a file with the Upload File option. The Upload File option allows you to select the column with the primary key or to specify No Primary Key, which tells Tamr Core to create a primary key on import. See Uploading a Dataset into a Project.
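Before you connect to a file in HDFS, you may want to confirm locally that it already has a usable primary key column. The following is a minimal Python sketch, not a Tamr Core tool. It assumes the file is CSV with a header row, and it uses a working definition of "primary key": every row has a non-empty value in the column and no value repeats. The function name is_primary_key_column and the column name id are hypothetical.

```python
import csv
import io
from typing import TextIO


def is_primary_key_column(source: TextIO, column: str) -> bool:
    """Return True if `column` is in the header and its values are non-empty and unique."""
    reader = csv.DictReader(source)
    if reader.fieldnames is None or column not in reader.fieldnames:
        return False  # empty file or no such column
    seen = set()
    for row in reader:
        value = row.get(column)
        # Rows with too few fields yield None; blank values do not count as keys.
        if value is None or not value.strip():
            return False
        if value in seen:
            return False  # duplicate key
        seen.add(value)
    return True


# Usage with an in-memory CSV; for a real file, pass open(path, newline="").
sample = io.StringIO("id,name\n1,Acme\n2,Globex\n")
print(is_primary_key_column(sample, "id"))  # True
```

If the check fails, one option is to add a key column before connecting to the file, or to use the Upload File option instead.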
Configuring Tamr Core to Use HDFS as Its Filesystem
Before you begin:
- Obtain HDFS configuration files.
  - (Optional) Obtain the core Hadoop configuration files, such as core-site.xml and hdfs-site.xml.
  - (Optional) Obtain any additional files referenced by the core Hadoop files, such as .xsl and .sh files.
- Verify that you have a readable/writable space in HDFS.
- If HDFS uses Kerberos for authentication, obtain the Kerberos keytab file and principal.
  - The principal user must have read/write access to the HDFS space.
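One way to check these prerequisites from the Tamr Core host is a write, check, and delete round trip in the target HDFS directory, obtaining a Kerberos ticket first when a keytab and principal are supplied. The sketch below is not part of Tamr Core. It assumes the standard Hadoop `hdfs` and Kerberos `kinit` command-line clients are installed and on the PATH; the directory, keytab path, principal, and the `.tamr-probe-` file prefix are hypothetical placeholders.

```python
import shlex
import subprocess
import uuid
from typing import Callable, List, Optional, Sequence

# A runner takes a command (argv list) and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Execute cmd without a shell and return its exit code (0 = success)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127  # conventional "command not found" exit code


def probe_commands(hdfs_dir: str, keytab: Optional[str] = None,
                   principal: Optional[str] = None) -> List[List[str]]:
    """Build the commands for a write, check, and delete round trip in hdfs_dir."""
    # Hypothetical probe file name; a random suffix avoids clobbering real data.
    probe = f"{hdfs_dir.rstrip('/')}/.tamr-probe-{uuid.uuid4().hex}"
    cmds: List[List[str]] = []
    if keytab and principal:
        # Obtain a Kerberos ticket from the keytab before talking to HDFS.
        cmds.append(["kinit", "-kt", keytab, principal])
    cmds.append(["hdfs", "dfs", "-touchz", probe])            # write an empty file
    cmds.append(["hdfs", "dfs", "-test", "-e", probe])        # confirm it is visible
    cmds.append(["hdfs", "dfs", "-rm", "-skipTrash", probe])  # clean up
    return cmds


def check_hdfs_space(hdfs_dir: str, keytab: Optional[str] = None,
                     principal: Optional[str] = None,
                     run: Runner = run_command) -> bool:
    """Run the probe commands in order; stop and report at the first failure."""
    for cmd in probe_commands(hdfs_dir, keytab, principal):
        if run(cmd) != 0:
            print("failed:", shlex.join(cmd))
            return False
    return True
```

For example, `check_hdfs_space("/tamr/data", "/etc/security/tamr.keytab", "tamr@EXAMPLE.COM")` (placeholder values) returns True only if every command succeeds. The `run` parameter lets you substitute a different runner, for instance one that prints the commands instead of executing them.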
To configure Tamr Core to use HDFS as its filesystem:
- Set each of the configuration variables listed below using the administrative utility. See Creating or Updating a Configuration Variable.
- Restart Tamr Core and its dependencies. See Restarting Tamr Core.
Configuration Variable | Example and Description |
---|---|
TAMR_UNIFY_DATA_DIR | A readable/writable path in HDFS where Tamr Core reads and writes data. |
TAMR_FS_URI | Primary filesystem URI. Set to the root of the filesystem. Examples: … You can set this variable to the value of … If … |
TAMR_FS_CONFIG_URIS | A semicolon-separated list of the URIs of the core Hadoop configuration files, such as … Supported URI schemes are … |
TAMR_FS_EXTRA_URIS | A semicolon-separated list of the URIs of the non-XML configuration files. Supported URI schemes are … |
TAMR_FS_CONFIG_DIR | A directory in which to store the configuration files specified by … This is typically not required, because by default Tamr Core uses the Hadoop home directory for its configuration. |
TAMR_FS_EXTRA_CONFIG | Dictionary of … |
TAMR_FS_KERBEROS_ENABLED | true or false. Enables Kerberos for authentication. |
TAMR_KERBEROS_KEYTAB | Required when the HDFS configuration uses Kerberos for authentication and … Path to a Kerberos keytab file. |
TAMR_KERBEROS_KRB5 | Required when the HDFS configuration uses Kerberos for authentication and … Path to a Kerberos krb5.conf file. |
TAMR_KERBEROS_PRINCIPAL | Required when the HDFS configuration uses Kerberos for authentication and … The principal to use in the keytab file. Use the … |
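Before setting these variables, it can help to sanity-check the values together. The following Python sketch is not part of Tamr Core, and its rules are assumptions drawn from the table: TAMR_UNIFY_DATA_DIR and TAMR_FS_URI are always set; TAMR_KERBEROS_KEYTAB, TAMR_KERBEROS_KRB5, and TAMR_KERBEROS_PRINCIPAL are required whenever TAMR_FS_KERBEROS_ENABLED is true; and each entry in the semicolon-separated TAMR_FS_CONFIG_URIS and TAMR_FS_EXTRA_URIS lists carries a URI scheme. It does not check which schemes Tamr Core supports. Function names and all example values are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Assumption: these two variables are always needed when HDFS is the filesystem.
ALWAYS_REQUIRED = ("TAMR_UNIFY_DATA_DIR", "TAMR_FS_URI")
# Assumption: these are required whenever TAMR_FS_KERBEROS_ENABLED is true.
KERBEROS_REQUIRED = ("TAMR_KERBEROS_KEYTAB", "TAMR_KERBEROS_KRB5", "TAMR_KERBEROS_PRINCIPAL")
URI_LISTS = ("TAMR_FS_CONFIG_URIS", "TAMR_FS_EXTRA_URIS")


def split_uris(value: str) -> List[str]:
    """Split a semicolon-separated list, dropping blank entries and whitespace."""
    return [part.strip() for part in value.split(";") if part.strip()]


def validate_hdfs_config(cfg: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in a proposed HDFS configuration."""
    problems: List[str] = []
    for name in ALWAYS_REQUIRED:
        if not cfg.get(name, "").strip():
            problems.append(f"{name} is not set")

    # Assumption: an unset TAMR_FS_KERBEROS_ENABLED is treated as false.
    kerberos = cfg.get("TAMR_FS_KERBEROS_ENABLED", "false").strip().lower()
    if kerberos not in ("true", "false"):
        problems.append("TAMR_FS_KERBEROS_ENABLED must be true or false")
    elif kerberos == "true":
        for name in KERBEROS_REQUIRED:
            if not cfg.get(name, "").strip():
                problems.append(f"{name} is required when Kerberos is enabled")

    # Only checks that a scheme is present; the supported schemes are not checked.
    for name in URI_LISTS:
        for uri in split_uris(cfg.get(name, "")):
            if not urlparse(uri).scheme:
                problems.append(f"{name}: {uri!r} has no URI scheme")
    return problems


# Hypothetical placeholder values for illustration only.
example = {
    "TAMR_UNIFY_DATA_DIR": "/tamr/data",
    "TAMR_FS_URI": "hdfs://namenode:8020",
    "TAMR_FS_KERBEROS_ENABLED": "true",
}
for problem in validate_hdfs_config(example):
    print(problem)
```

With these example values, the function reports the three Kerberos variables as missing, because Kerberos is enabled but no keytab, krb5.conf path, or principal is given.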