

Configure Tamr to use HDFS as its primary storage.

Single-node Tamr deployments use the local filesystem as primary storage by default. Use the following procedure to configure a single-node Tamr deployment to write to an external HDFS cluster instead of the local filesystem.

Checklist before proceeding:

  • HDFS configuration files.
    • Optional. The core Hadoop configuration files, such as core-site.xml and hdfs-site.xml.
    • Optional. Any additional files referenced by the core Hadoop files, such as .xsl or .sh files.
  • A readable and writable path in HDFS.
  • The Kerberos keytab file and principal.
    • Required only if HDFS uses Kerberos for authentication.
    • The principal must have read/write access to the HDFS path.

Configuring Tamr to use HDFS as Primary Storage

To configure Tamr to use HDFS as primary storage:

  1. Set each of the configuration variables using the administrative utility. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies. See Restarting.


Configuration Variable | Example Value

A readable and writable path in HDFS where Tamr reads and writes its data.


Configuration Variable | Example Value

Optional. If you set this variable, use the value of fs.defaultFS from the configuration files. If fs.defaultFS is not defined there, choose an appropriate nameservice from the configuration files and set fs.defaultFS with TAMR_FS_EXTRA_CONFIG.
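Hadoop site files store each setting as a <property> element with <name> and <value> children, so the value of fs.defaultFS can be looked up with the Python standard library. The sketch below is illustrative only; the function name read_hadoop_property and the sample XML are hypothetical, not part of Tamr.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hadoop_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop *-site.xml document, or None."""
    root = ET.fromstring(xml_text)
    # Site files have a <configuration> root holding <property> elements,
    # each with <name> and <value> children.
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical core-site.xml contents.
core_site = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice</value>
  </property>
</configuration>"""

print(read_hadoop_property(core_site, "fs.defaultFS"))  # hdfs://nameservice
```

To read a real file, pass its contents, for example Path("core-site.xml").read_text() from pathlib. A None result means fs.defaultFS is not defined in that file.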


Configuration Variable | Example Value

Optional. A semicolon-separated list of the URIs of the core Hadoop configuration files, such as core-site.xml and hdfs-site.xml.

Supported URI schemes are file, http, and zk (ZooKeeper).
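For configuration files that already sit on the local filesystem, the list is a set of file: URIs joined with semicolons. This is a minimal sketch; the helper name to_file_uri_list and the /etc/hadoop/conf paths are hypothetical examples, so substitute the locations used in your environment.

```python
from urllib.parse import quote


def to_file_uri_list(paths: list[str]) -> str:
    """Join absolute local paths into a semicolon-separated list of file: URIs."""
    uris = []
    for p in paths:
        if not p.startswith("/"):
            raise ValueError(f"expected an absolute path, got {p!r}")
        # quote() percent-encodes unsafe characters but keeps "/" as-is.
        uris.append("file://" + quote(p))
    return ";".join(uris)


print(to_file_uri_list(["/etc/hadoop/conf/core-site.xml",
                        "/etc/hadoop/conf/hdfs-site.xml"]))
# file:///etc/hadoop/conf/core-site.xml;file:///etc/hadoop/conf/hdfs-site.xml
```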


Configuration Variable | Example Value

A semicolon-separated list of the URIs of the non-XML configuration files.

Supported URI schemes are file, http, and zk (ZooKeeper).
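Because only the file, http, and zk schemes are supported, it can help to check a URI list before entering it. The sketch below splits the value on semicolons and rejects any other scheme (including https, which is not listed). The function name split_uri_list and the example hosts are hypothetical.

```python
from urllib.parse import urlparse

# Schemes listed as supported for these URI lists.
SUPPORTED_SCHEMES = {"file", "http", "zk"}


def split_uri_list(value: str) -> list[str]:
    """Split a semicolon-separated URI list and reject unsupported schemes."""
    uris = [u.strip() for u in value.split(";") if u.strip()]
    for uri in uris:
        scheme = urlparse(uri).scheme
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported URI scheme {scheme!r} in {uri!r}")
    return uris


print(split_uri_list("file:///etc/hadoop/conf/hadoop-env.sh;http://config-host/conf/configuration.xsl"))
# ['file:///etc/hadoop/conf/hadoop-env.sh', 'http://config-host/conf/configuration.xsl']
```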


Configuration Variable | Example Value

A directory in which to store the configuration files. If the configuration files already exist on the filesystem, you can set this to the directory that already contains them to avoid caching them elsewhere.

This variable typically does not need to be set, because by default Tamr uses the Hadoop home directory for its configuration.
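Before pointing this variable at an existing directory, you might confirm that the directory already holds every file you need. This is a minimal sketch; the helper missing_config_files and the demo directory are hypothetical, and it only checks that the files exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path


def missing_config_files(config_dir: str, names: list[str]) -> list[str]:
    """Return the entries of `names` that are not regular files inside config_dir."""
    base = Path(config_dir)
    return [name for name in names if not (base / name).is_file()]


# Demo against a throwaway directory that holds only core-site.xml.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "core-site.xml").write_text("<configuration/>")
    print(missing_config_files(tmp, ["core-site.xml", "hdfs-site.xml"]))  # ['hdfs-site.xml']
```

An empty result means the directory already contains all of the listed files.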


Configuration Variable | Example Value
TAMR_FS_EXTRA_CONFIG | {'fs.defaultFS': hdfs://nameservice}

A dictionary of key:value pairs. If fs.defaultFS is not defined in the configuration files, you can set it to a nameservice here. This variable typically does not need to be set.
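The decision described above (add fs.defaultFS here only when the configuration files do not define it) can be sketched as follows. Assumptions: the nameservice candidates come from the dfs.nameservices property in hdfs-site.xml, and taking the first entry stands in for choosing an appropriate one. The function name is hypothetical, and this does not show the exact text format the administrative utility expects.

```python
from typing import Optional


def extra_config_for_default_fs(default_fs: Optional[str],
                                nameservices: Optional[str]) -> dict[str, str]:
    """Build extra key:value settings only when fs.defaultFS is undefined.

    default_fs: fs.defaultFS from the configuration files, or None if absent.
    nameservices: dfs.nameservices from hdfs-site.xml (comma-separated), or None.
    """
    if default_fs:
        return {}  # already defined in the configuration files; nothing to add
    if not nameservices:
        raise ValueError("neither fs.defaultFS nor dfs.nameservices is defined")
    # Assumption: use the first listed nameservice; pick deliberately in practice.
    first = nameservices.split(",")[0].strip()
    return {"fs.defaultFS": f"hdfs://{first}"}


print(extra_config_for_default_fs(None, "nameservice,backup"))
# {'fs.defaultFS': 'hdfs://nameservice'}
```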


Configuration Variable | Example Value


Configuration Variable | Example Value

Path to a Kerberos keytab file. Required when the HDFS configuration uses Kerberos for authentication and TAMR_FS_KERBEROS_ENABLED is set to true.


Configuration Variable | Example Value

Path to a Kerberos krb5.conf file. Required when the HDFS configuration uses Kerberos for authentication and TAMR_FS_KERBEROS_ENABLED is set to true.
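When Kerberos is enabled, both the keytab and the krb5.conf paths must point at files Tamr can read. A pre-flight check along these lines can catch a typo before restarting. This is a hypothetical helper, not a Tamr tool; it only checks that each path is set, exists, and is readable by the current user.

```python
import os
from typing import Optional


def kerberos_file_problems(enabled: bool, keytab: Optional[str],
                           krb5_conf: Optional[str]) -> list[str]:
    """List problems with the keytab and krb5.conf paths; empty means the checks passed."""
    if not enabled:
        return []  # these paths are only required when Kerberos is enabled
    problems = []
    for label, path in (("keytab", keytab), ("krb5.conf", krb5_conf)):
        if not path:
            problems.append(f"{label} path is not set")
        elif not os.path.isfile(path):
            problems.append(f"{label} not found: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"{label} is not readable: {path}")
    return problems


print(kerberos_file_problems(True, None, "/nonexistent/krb5.conf"))
# ['keytab path is not set', 'krb5.conf not found: /nonexistent/krb5.conf']
```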


Configuration Variable | Example Value

The principal to use from the keytab file. To confirm the principal, inspect the keytab file with the klist command:

klist -k <path-to-keytab>

You must set this variable if HDFS uses Kerberos for authentication and TAMR_FS_KERBEROS_ENABLED is set to true.
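If you want to extract the principal from the klist output programmatically, a sketch like the following may help. It assumes the MIT Kerberos output layout of klist -k (a "KVNO Principal" header, then one line per entry with a numeric key version followed by the principal); other Kerberos implementations may format their output differently. The function names and the sample principal are hypothetical.

```python
import subprocess


def principals_from_klist(output: str) -> list[str]:
    """Extract unique principal names, in order of appearance, from `klist -k` output."""
    principals: list[str] = []
    for line in output.splitlines():
        parts = line.split()
        # Entry lines look like "   2 tamr@EXAMPLE.COM": numeric KVNO, then the principal.
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] not in principals:
            principals.append(parts[1])
    return principals


def keytab_principals(keytab_path: str) -> list[str]:
    """Run `klist -k <keytab>` and return the principals it lists."""
    result = subprocess.run(["klist", "-k", keytab_path],
                            capture_output=True, text=True, check=True)
    return principals_from_klist(result.stdout)


# Hypothetical klist -k output for a keytab holding two keys for one principal.
sample = """Keytab name: FILE:/etc/security/tamr.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 tamr@EXAMPLE.COM
   2 tamr@EXAMPLE.COM
"""
print(principals_from_klist(sample))  # ['tamr@EXAMPLE.COM']
```

A keytab often holds several keys (one per encryption type) for the same principal, which is why duplicates are collapsed.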