HDFS
Configure Tamr to use HDFS as its primary storage.
Single-node Tamr deployments use the local filesystem as primary storage by default. Use the following procedure to configure a single-node Tamr deployment that uses an external HDFS cluster in place of writing to the local filesystem.
Checklist before proceeding:
- HDFS configuration files.
  - The core Hadoop configuration files, such as `core-site.xml` and `hdfs-site.xml`.
  - Any additional files referenced by the core Hadoop files, such as `.xsl` and `.sh` files.
- A readable/writable space in HDFS.
- The Kerberos keytab file and principal.
  - Required only if HDFS uses Kerberos for authentication.
  - The principal user must have read/write access.
Configuring Tamr to use HDFS as Primary Storage
To configure Tamr to use HDFS as primary storage:
- Set each of the configuration variables described below using the admin tool; see Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies (see Restarting).
TAMR_UNIFY_DATA_DIR
Configuration Variable | Example Value |
---|---|
TAMR_UNIFY_DATA_DIR | hdfs://nameservice/tamr/unify-data |
A readable/writable path in HDFS where Tamr will read/write data.
TAMR_FS_URI
Configuration Variable | Example Value |
---|---|
TAMR_FS_URI | hdfs://nameservice |
Set this to the value of `fs.defaultFS` from the configuration files. If `fs.defaultFS` is not defined, pick an appropriate nameservice from the configuration files and set `fs.defaultFS` with `TAMR_FS_EXTRA_CONFIG`.
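To find the value to use, you can read `fs.defaultFS` out of `core-site.xml`. The following is a minimal sketch, assuming the standard Hadoop layout of `<property>` elements with `<name>` and `<value>` children. The function name `read_default_fs` and the sample XML are illustrative and are not part of Tamr.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_default_fs(core_site_xml: str) -> Optional[str]:
    """Return the fs.defaultFS value from core-site.xml text, or None if absent."""
    root = ET.fromstring(core_site_xml)
    # Hadoop configuration files hold <property> elements,
    # each with a <name> child and a <value> child.
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "fs.defaultFS":
            value = prop.findtext("value", default="").strip()
            return value or None
    return None


# Illustrative content only; a real core-site.xml has many more properties.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice</value>
  </property>
</configuration>
"""

print(read_default_fs(SAMPLE_CORE_SITE))  # prints hdfs://nameservice
```

If the function returns `None`, the property is not defined in that file, which is the case where `TAMR_FS_EXTRA_CONFIG` applies.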
TAMR_FS_CONFIG_URIS
Configuration Variable | Example Value |
---|---|
TAMR_FS_CONFIG_URIS | file:///path/to/core-site.xml;file:///path/to/hdfs-site.xml |
A semicolon-separated list of the URIs of the core Hadoop configuration files, such as `core-site.xml` and `hdfs-site.xml`.
Supported URI schemes are `file`, `http`, and `zk` (ZooKeeper).
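A small helper can assemble the semicolon-separated value and catch an unsupported scheme before the variable is set. This is a sketch only: `build_uri_list` is a hypothetical name, and the scheme check simply mirrors the three schemes listed above.

```python
from urllib.parse import urlparse

# Schemes listed in this section as supported.
SUPPORTED_SCHEMES = {"file", "http", "zk"}


def build_uri_list(uris):
    """Join URIs with ';' after checking that each uses a supported scheme."""
    for uri in uris:
        scheme = urlparse(uri).scheme
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported URI scheme {scheme!r} in {uri!r}")
    return ";".join(uris)


value = build_uri_list([
    "file:///path/to/core-site.xml",
    "file:///path/to/hdfs-site.xml",
])
print(value)  # prints file:///path/to/core-site.xml;file:///path/to/hdfs-site.xml
```

The same helper would apply to any other variable that takes a list in this format.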
TAMR_FS_EXTRA_URIS
Configuration Variable | Example Value |
---|---|
TAMR_FS_EXTRA_URIS | zk://localhost:21281/hdfs/config/hadoop-env.sh;zk://localhost:21281/hdfs/config/configuration.xsl |
A semicolon-separated list of the URIs for the non-XML configuration files.
Supported URI schemes are `file`, `http`, and `zk` (ZooKeeper).
TAMR_FS_CONFIG_DIR
Configuration Variable | Example Value |
---|---|
TAMR_FS_CONFIG_DIR | /etc/hadoop/conf/ |
A directory to store the configuration files. If the configuration files already exist on the filesystem, you can set this to the path that already contains the files to avoid caching them elsewhere.
This is typically not required to be set.
TAMR_FS_EXTRA_CONFIG
Configuration Variable | Example Value |
---|---|
TAMR_FS_EXTRA_CONFIG | {'fs.defaultFS': hdfs://nameservice} |
A dictionary of `key:value` pairs. If `fs.defaultFS` is not defined in the configuration files, you can set a nameservice here.
This is typically not required to be set.
TAMR_FS_KERBEROS_ENABLED
Configuration Variable | Example Value |
---|---|
TAMR_FS_KERBEROS_ENABLED | true or false |
Set this to true when HDFS uses Kerberos for authentication.
TAMR_KERBEROS_KEYTAB
Configuration Variable | Example Value |
---|---|
TAMR_KERBEROS_KEYTAB | /path/to/user.keytab |
Path to a Kerberos keytab file. Required when the HDFS configuration uses Kerberos for authentication and `TAMR_FS_KERBEROS_ENABLED` is set to true.
TAMR_KERBEROS_KRB5
Configuration Variable | Example Value |
---|---|
TAMR_KERBEROS_KRB5 | /path/to/krb5.conf |
Path to a Kerberos `krb5.conf` file. Required when the HDFS configuration uses Kerberos for authentication and `TAMR_FS_KERBEROS_ENABLED` is set to true.
TAMR_KERBEROS_PRINCIPAL
Configuration Variable | Example Value |
---|---|
TAMR_KERBEROS_PRINCIPAL | primary/instance@REALM |
The principal to use in the keytab file. Use the `klist` command to inspect the keytab file and confirm the principal.
klist -k <path-to-keytab>
Required when HDFS is authenticated with Kerberos and `TAMR_FS_KERBEROS_ENABLED` is set to true.
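The Kerberos variables depend on one another, so it can help to check them as a group before restarting. The sketch below is not part of Tamr: `check_kerberos_settings` is a hypothetical helper, it assumes an unset `TAMR_FS_KERBEROS_ENABLED` means false, and the principal pattern is a loose check for the `primary/instance@REALM` shape with the instance treated as optional.

```python
import re

# Variables this section marks as required when Kerberos is enabled.
KERBEROS_REQUIRED = (
    "TAMR_KERBEROS_KEYTAB",
    "TAMR_KERBEROS_KRB5",
    "TAMR_KERBEROS_PRINCIPAL",
)

# primary, an optional /instance, then @REALM (loose check, an assumption).
PRINCIPAL_RE = re.compile(r"^[^/@\s]+(/[^/@\s]+)?@[^/@\s]+$")


def check_kerberos_settings(config):
    """Return a list of problems found in the Kerberos-related variables."""
    problems = []
    # Assumption: an unset flag is treated as false.
    if str(config.get("TAMR_FS_KERBEROS_ENABLED", "false")).lower() != "true":
        return problems  # Kerberos disabled, so nothing else is required
    for name in KERBEROS_REQUIRED:
        if not config.get(name):
            problems.append(f"{name} must be set when TAMR_FS_KERBEROS_ENABLED is true")
    principal = config.get("TAMR_KERBEROS_PRINCIPAL")
    if principal and not PRINCIPAL_RE.match(principal):
        problems.append("TAMR_KERBEROS_PRINCIPAL should look like primary/instance@REALM")
    return problems


settings = {
    "TAMR_FS_KERBEROS_ENABLED": "true",
    "TAMR_KERBEROS_KEYTAB": "/path/to/user.keytab",
    "TAMR_KERBEROS_KRB5": "/path/to/krb5.conf",
    "TAMR_KERBEROS_PRINCIPAL": "primary/instance@REALM",
}
print(check_kerberos_settings(settings))  # prints []
```

An empty list means the four values are mutually consistent; it does not confirm that the keytab actually contains the principal, which is what `klist -k` is for.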