
HDFS

Configure Tamr to use HDFS as its primary storage.

Single-node Tamr deployments use the local filesystem as primary storage by default. Use the following procedure to configure a single-node Tamr deployment that uses an external HDFS cluster in place of writing to the local filesystem.

Checklist before proceeding:

  • HDFS configuration files.
    • Optional. The core Hadoop configuration files, such as core-site.xml and hdfs-site.xml.
    • Optional. Any additional files referenced by the core Hadoop files, such as .xsl and .sh files.
  • A readable/writable space in HDFS.
  • The Kerberos keytab file and principal.
    • Required only if HDFS uses Kerberos for authentication.
    • Principal user must have read/write access.

Configuring Tamr to use HDFS as Primary Storage

To configure Tamr to use HDFS as primary storage:

  1. Set each of the configuration variables using the administrative utility. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies. See Restarting.

TAMR_UNIFY_DATA_DIR

Configuration Variable      Example Value
TAMR_UNIFY_DATA_DIR         hdfs://nameservice/tamr/unify-data

A path in HDFS where Tamr reads and writes its data. Tamr must have both read and write access to this path.
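Before setting the variable, it can help to check that the value has the expected shape. The following is a minimal sketch, not part of Tamr: the function name check_hdfs_data_dir is hypothetical, and the rule that the value must include a directory path is an assumption of this sketch. It checks only the form of the URI, not whether the path exists or is writable.

```python
from urllib.parse import urlparse


def check_hdfs_data_dir(value: str) -> str:
    """Return value unchanged if it looks like an hdfs:// URI with a path.

    Hypothetical helper: it inspects the string only and never contacts HDFS.
    """
    parsed = urlparse(value)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got scheme {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("missing nameservice or namenode host")
    # Assumption of this sketch: the data directory is not the filesystem root.
    if parsed.path in ("", "/"):
        raise ValueError("missing directory path")
    return value


print(check_hdfs_data_dir("hdfs://nameservice/tamr/unify-data"))
```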

TAMR_FS_URI

Configuration Variable      Example Value
TAMR_FS_URI                 hdfs://nameservice

Setting a value for this variable is optional.
If you set it, use the value of fs.defaultFS from the Hadoop configuration files. If fs.defaultFS is not defined there, pick an appropriate nameservice from the configuration files and set fs.defaultFS with TAMR_FS_EXTRA_CONFIG.
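One way to find the fs.defaultFS value is to read it from core-site.xml. The sketch below assumes the standard Hadoop configuration layout (property elements with name and value children inside a configuration element). The function name read_default_fs and the inline sample are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_default_fs(core_site_xml: str) -> Optional[str]:
    """Return the fs.defaultFS value from core-site.xml text, or None if absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "fs.defaultFS":
            value = prop.findtext("value", default="").strip()
            return value or None
    return None


# Illustrative sample; a real core-site.xml usually holds many more properties.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice</value>
  </property>
</configuration>
"""

print(read_default_fs(SAMPLE))
```

If the function returns None, fs.defaultFS is not defined in that file, which is the case where TAMR_FS_EXTRA_CONFIG applies.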

TAMR_FS_CONFIG_URIS

Configuration Variable      Example Value
TAMR_FS_CONFIG_URIS         file:///path/to/core-site.xml;file:///path/to/hdfs-site.xml

Setting this variable is optional. If you set it, provide a semicolon-separated list of the URIs of the core Hadoop configuration files, such as core-site.xml and hdfs-site.xml.

Supported URI schemes are file, http, and zk (ZooKeeper).
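The semicolon-separated value can be assembled and checked programmatically. The following sketch uses a hypothetical helper, join_config_uris, which rejects any URI whose scheme is not one of the three listed above. It does not verify that the files exist.

```python
from urllib.parse import urlparse

# Schemes named in this guide as supported.
SUPPORTED_SCHEMES = {"file", "http", "zk"}


def join_config_uris(uris):
    """Join URIs with semicolons after checking that each scheme is supported."""
    uris = list(uris)
    for uri in uris:
        scheme = urlparse(uri).scheme
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported scheme {scheme!r} in {uri!r}")
    return ";".join(uris)


value = join_config_uris([
    "file:///path/to/core-site.xml",
    "file:///path/to/hdfs-site.xml",
])
print(value)
```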

TAMR_FS_EXTRA_URIS

Configuration Variable      Example Value
TAMR_FS_EXTRA_URIS          zk://localhost:21281/hdfs/config/hadoop-env.sh;zk://localhost:21281/hdfs/config/configuration.xsl

A semicolon-separated list of the URIs of the non-XML configuration files.

Supported URI schemes are file, http, and zk (ZooKeeper).

TAMR_FS_CONFIG_DIR

Configuration Variable      Example Value
TAMR_FS_CONFIG_DIR          /etc/hadoop/conf/

A directory to store the configuration files. If the configuration files already exist on the filesystem, you can set this to the path that already contains the files to avoid caching them elsewhere.

You typically do not need to set this variable, because by default Tamr uses the Hadoop home directory for its configuration.

TAMR_FS_EXTRA_CONFIG

Configuration Variable      Example Value
TAMR_FS_EXTRA_CONFIG        {'fs.defaultFS': hdfs://nameservice}

A dictionary of key:value pairs. If fs.defaultFS is not defined in the configuration files, you can set a nameservice here. You typically do not need to set this variable.

TAMR_FS_KERBEROS_ENABLED

Configuration Variable      Example Value
TAMR_FS_KERBEROS_ENABLED    true or false

TAMR_KERBEROS_KEYTAB

Configuration Variable      Example Value
TAMR_KERBEROS_KEYTAB        /path/to/user.keytab

Path to a Kerberos keytab file. Required when the HDFS configuration uses Kerberos for authentication and TAMR_FS_KERBEROS_ENABLED is set to true.

TAMR_KERBEROS_KRB5

Configuration Variable      Example Value
TAMR_KERBEROS_KRB5          /path/to/krb5.conf

Path to a Kerberos krb5.conf file. Required when the HDFS configuration uses Kerberos for authentication and TAMR_FS_KERBEROS_ENABLED is set to true.

TAMR_KERBEROS_PRINCIPAL

Configuration Variable      Example Value
TAMR_KERBEROS_PRINCIPAL     primary/instance@REALM

The principal to use from the keytab file. To confirm the principal, inspect the keytab file with the klist command:

klist -k <path-to-keytab>

You must set this variable if HDFS uses Kerberos for authentication and TAMR_FS_KERBEROS_ENABLED is set to true.
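The Kerberos variables depend on one another: when TAMR_FS_KERBEROS_ENABLED is true, the keytab, krb5.conf, and principal variables are all required. The sketch below illustrates that rule with a hypothetical helper, missing_kerberos_vars, applied to a plain dictionary of settings. It is not a Tamr API, and how the settings are collected is left as an assumption.

```python
# Variables this guide lists as required when Kerberos is enabled.
KERBEROS_VARS = (
    "TAMR_KERBEROS_KEYTAB",
    "TAMR_KERBEROS_KRB5",
    "TAMR_KERBEROS_PRINCIPAL",
)


def missing_kerberos_vars(settings):
    """List the Kerberos variables that are required but unset in settings."""
    enabled = str(settings.get("TAMR_FS_KERBEROS_ENABLED", "false")).lower() == "true"
    if not enabled:
        return []
    return [name for name in KERBEROS_VARS if not settings.get(name)]


# Example settings with the krb5.conf path left out on purpose.
settings = {
    "TAMR_FS_KERBEROS_ENABLED": "true",
    "TAMR_KERBEROS_KEYTAB": "/path/to/user.keytab",
    "TAMR_KERBEROS_PRINCIPAL": "primary/instance@REALM",
}
print(missing_kerberos_vars(settings))
```

With the example settings, the check reports TAMR_KERBEROS_KRB5 as missing. With Kerberos disabled, it returns an empty list.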