Backup

Backing up a Tamr instance involves configuring a backup location and then configuring how each part of the product (Postgres, Elasticsearch, and selected configuration variables) is backed up.

Backing up a Tamr Configuration

To back up a Tamr configuration:

  1. Configure a backup location (Configuring a Backup Location).
  2. Configure Postgres backup and restore binaries (Configuring Postgres Backup and Restore Binaries).
  3. Optional. Configure Elasticsearch backup (Configuring Elasticsearch Backup).
  4. Optional. Set additional configuration variables to be backed up (Additional Configuration Variables for Backup).
  5. Restart Tamr and its dependencies (Restarting).

Configuring a Backup Location

Note: The default backup location is the local filesystem directory ${TAMR_UNIFY_HOME}/tamr/backups.

Configuring a Filesystem Backup Location

To configure a filesystem backup location:

  1. Set the value of the configuration variable TAMR_UNIFY_BACKUP_URI to a local filesystem directory using the admin tool. See Creating or Updating a Configuration Variable.
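
The step above can be scripted. The following is a minimal sketch, not an official Tamr script: it assumes TAMR_UNIFY_HOME is set, uses only the admin tool's config:set command in the NAME=VALUE form shown in the documented example on this page, and introduces a hypothetical helper, set_fs_backup_uri, together with a UNIFY_ADMIN variable that you can override (for example with echo) for a dry run. The absolute-path check is a choice made by this sketch, not a documented Tamr rule.

```shell
# Sketch only: record a local filesystem directory as Tamr's backup location.
# Assumes TAMR_UNIFY_HOME is set; set UNIFY_ADMIN=echo beforehand for a dry run.
UNIFY_ADMIN="${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}"

# Usage: set_fs_backup_uri /absolute/path/to/backups  (hypothetical helper)
set_fs_backup_uri() {
  backup_dir="$1"
  # This sketch insists on an absolute path so the value does not depend on the
  # current working directory; that rule is an assumption, not a Tamr requirement.
  case "$backup_dir" in
    /*) ;;
    *)
      echo "error: expected an absolute path, got: $backup_dir" >&2
      return 1
      ;;
  esac
  # Create the directory if it is missing, failing early if it cannot be
  # created (for example, because of permissions).
  mkdir -p "$backup_dir" || return 1
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_URI=${backup_dir}"
}
```

For example, set_fs_backup_uri /data/tamr-backups (a hypothetical path) records the new location; restart Tamr and its dependencies afterward, as in the overall procedure.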

Warning: Usage of the Tamr Temporary Directory

By default, Tamr uses /tmp as its temporary directory during backup. If that directory does not have enough free disk space, the backup fails, and you must configure a different directory that does.

To configure Tamr to use an alternative backup temporary directory, set the value for the configuration variable TAMR_UNIFY_BACKUP_HADOOP_TMP_DIR to the full path of the new directory, e.g. /data/tamr-unify-backup-tmp-dir.
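
The two paragraphs above suggest a simple pre-flight check. This minimal sketch assumes you already know roughly how much temporary space a backup needs (the required size is your own estimate; this page gives no formula). It reads free space with POSIX df -Pk, keeps the default /tmp when it is large enough, and otherwise sets TAMR_UNIFY_BACKUP_HADOOP_TMP_DIR to an alternative directory with the admin tool's config:set command. The helper names and the UNIFY_ADMIN override are hypothetical.

```shell
# Sketch only: keep /tmp if it has enough free space, otherwise point
# TAMR_UNIFY_BACKUP_HADOOP_TMP_DIR at an alternative directory.
# Assumes TAMR_UNIFY_HOME is set; set UNIFY_ADMIN=echo beforehand for a dry run.
UNIFY_ADMIN="${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}"

# Prints the available space (in KiB) of the filesystem holding $1.
# With -P, df prints one header line, then one line per filesystem;
# column 4 is "Available" in 1024-byte blocks because of -k.
free_kib() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Usage: ensure_backup_tmp_dir REQUIRED_MIB ALTERNATIVE_DIR  (hypothetical helper)
ensure_backup_tmp_dir() {
  required_kib=$(( $1 * 1024 ))
  alt_dir="$2"
  if [ "$(free_kib /tmp)" -ge "$required_kib" ]; then
    echo "/tmp has enough free space; keeping the default"
    return 0
  fi
  mkdir -p "$alt_dir" || return 1
  if [ "$(free_kib "$alt_dir")" -lt "$required_kib" ]; then
    echo "error: $alt_dir does not have enough free space either" >&2
    return 1
  fi
  # The value must be the full path of the new directory.
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_HADOOP_TMP_DIR=${alt_dir}"
}
```

For example, ensure_backup_tmp_dir 51200 /data/tamr-unify-backup-tmp-dir requires about 50 GiB (an illustrative figure) before falling back to the alternative directory. If the setting changes, restart Tamr and its dependencies.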

Configuring an AWS S3 Backup Location

To configure an AWS S3 backup location:

  1. Set each of the following configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies (Restarting).

| Configuration Variable | Example Value |
| --- | --- |
| TAMR_UNIFY_BACKUP_URI | s3a://<bucket-name>/<path-to-backup> |
| TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID | <aws-access-key-id> |
| TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY | <aws-secret-access-key> |
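
Step 1 above sets three variables. This minimal sketch applies them in order with the admin tool's config:set command. It is not an official script: set_s3_backup_location and the UNIFY_ADMIN override are hypothetical, and rejecting any URI that does not start with s3a:// is this sketch's own choice, based on the example value in the table.

```shell
# Sketch only: configure an AWS S3 backup location.
# Assumes TAMR_UNIFY_HOME is set; set UNIFY_ADMIN=echo beforehand for a dry run.
UNIFY_ADMIN="${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}"

# Usage: set_s3_backup_location S3A_URI ACCESS_KEY_ID SECRET_ACCESS_KEY
set_s3_backup_location() {
  uri="$1"
  # Require the s3a:// scheme used by the example value (an assumption of this sketch).
  case "$uri" in
    s3a://?*) ;;
    *)
      echo "error: expected s3a://<bucket-name>/<path-to-backup>, got: $uri" >&2
      return 1
      ;;
  esac
  # Stop at the first failure so the three settings are not left half-applied silently.
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_URI=${uri}" || return 1
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID=$2" || return 1
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY=$3"
}
```

Passing the secret key as a literal argument leaves it in shell history and, briefly, in the process list; reading it from an environment variable such as "$AWS_SECRET_ACCESS_KEY" avoids the history part. Restart Tamr and its dependencies afterward (step 2).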

Configuring an HDFS Backup Location

To configure an HDFS backup location:

  1. Set each of the following configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies (Restarting).

| Configuration Variable | Description / Example Value |
| --- | --- |
| TAMR_UNIFY_BACKUP_URI | See TAMR_FS_URI. |
| TAMR_BACKUP_FS_CONFIG_URIS | See TAMR_FS_CONFIG_URIS. |
| TAMR_BACKUP_FS_EXTRA_URIS | See TAMR_FS_EXTRA_URIS. |
| TAMR_BACKUP_FS_CONFIG_DIR | See TAMR_FS_CONFIG_DIR. This directory must be unique. |
| TAMR_BACKUP_FS_EXTRA_CONFIG | See TAMR_FS_EXTRA_CONFIG. |
| TAMR_BACKUP_FS_KERBEROS_ENABLED | Whether Kerberos authentication is enabled for the backup filesystem: true or false. |
| TAMR_KERBEROS_KEYTAB | Required if TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true. See TAMR_KERBEROS_KEYTAB. |
| TAMR_KERBEROS_PRINCIPAL | Required if TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true. See TAMR_KERBEROS_PRINCIPAL. |
| TAMR_KERBEROS_KRB5 | Required if TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true. See TAMR_KERBEROS_KRB5. |
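
The "Required if" rows above amount to a dependency rule that can be checked before you apply anything. This is a hypothetical pre-flight sketch: it assumes you stage the intended values in shell variables named after the Tamr configuration variables (a convention of this sketch only; Tamr does not read these shell variables), and it checks just the backup URI and the three Kerberos settings.

```shell
# Sketch only: validate staged HDFS backup settings before applying them.
# Assumption: intended values live in shell variables with the same names
# as the Tamr configuration variables; Tamr itself does not read them.
check_hdfs_backup_settings() {
  status=0
  if [ -z "${TAMR_UNIFY_BACKUP_URI:-}" ]; then
    echo "missing: TAMR_UNIFY_BACKUP_URI" >&2
    status=1
  fi
  if [ "${TAMR_BACKUP_FS_KERBEROS_ENABLED:-false}" = "true" ]; then
    # These three are required only when Kerberos is enabled for the backup filesystem.
    for name in TAMR_KERBEROS_KEYTAB TAMR_KERBEROS_PRINCIPAL TAMR_KERBEROS_KRB5; do
      # Indirect lookup; eval is safe here because the names are fixed literals.
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
        echo "missing (Kerberos enabled): $name" >&2
        status=1
      fi
    done
  fi
  return "$status"
}
```

Once the check passes, each value can be applied with the admin tool as described in step 1, followed by a restart.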

Configuring Postgres Backup and Restore Binaries

To configure Postgres backup and restore binaries:

  1. Set each of the following configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies (Restarting).

| Configuration Variable | Example Value |
| --- | --- |
| TAMR_PG_DUMP_BINARY | /usr/pgsql-9.4/bin/pg_dump |
| TAMR_PG_RESTORE_BINARY | /usr/pgsql-9.4/bin/pg_restore |
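
A wrong path here would only surface when a backup or restore runs, so it can help to check the binaries first. This minimal sketch confirms that both paths are executable regular files and then sets the two variables with the admin tool's config:set command. The helper name set_pg_binaries and the UNIFY_ADMIN override are hypothetical.

```shell
# Sketch only: verify the Postgres dump/restore binaries, then record their paths.
# Assumes TAMR_UNIFY_HOME is set; set UNIFY_ADMIN=echo beforehand for a dry run.
UNIFY_ADMIN="${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}"

# Usage: set_pg_binaries PG_DUMP_PATH PG_RESTORE_PATH
set_pg_binaries() {
  for bin in "$1" "$2"; do
    # -f rules out directories, which would also pass a bare -x test.
    if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
      echo "error: not an executable file: $bin" >&2
      return 1
    fi
  done
  "$UNIFY_ADMIN" config:set "TAMR_PG_DUMP_BINARY=$1" || return 1
  "$UNIFY_ADMIN" config:set "TAMR_PG_RESTORE_BINARY=$2"
}
```

With the example values from the table: set_pg_binaries /usr/pgsql-9.4/bin/pg_dump /usr/pgsql-9.4/bin/pg_restore, followed by a restart.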

Configuring Elasticsearch Backup

Note: TAMR_UNIFY_BACKUP_ES

TAMR_UNIFY_BACKUP_ES sets whether the generated backup file includes a complete snapshot of all data in Tamr's Elasticsearch instance.

If set to true (default), the generated backup file includes a complete snapshot of all data in Tamr's Elasticsearch instance. On restore, the Elasticsearch instance is automatically restored from this snapshot.

If set to false, the generated backup file does not include a snapshot of Tamr's Elasticsearch data, and on restore the Elasticsearch instance is not automatically restored. Restoring Elasticsearch then requires a manual re-index, which may take several hours. Contact Tamr Support to re-index Elasticsearch.

To configure Elasticsearch backup:

  1. Set the following configuration variable using the admin tool. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies (Restarting).

| Configuration Variable | Example Values |
| --- | --- |
| TAMR_UNIFY_BACKUP_ES | true, false |
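
Because false changes what a restore can recover, a small wrapper can make that trade-off explicit. This minimal sketch accepts only the two values shown in the table (lowercase true or false; whether Tamr accepts other spellings is not covered here), warns on false, and applies the value with the admin tool's config:set command. The helper name and the UNIFY_ADMIN override are hypothetical.

```shell
# Sketch only: set TAMR_UNIFY_BACKUP_ES, warning when Elasticsearch is excluded.
# Assumes TAMR_UNIFY_HOME is set; set UNIFY_ADMIN=echo beforehand for a dry run.
UNIFY_ADMIN="${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}"

# Usage: set_es_backup true|false
set_es_backup() {
  case "$1" in
    true) ;;
    false)
      # With false, a restore does not bring back Elasticsearch data; it must be
      # re-indexed manually (contact Tamr Support), which may take several hours.
      echo "warning: Elasticsearch will not be restored automatically from this backup" >&2
      ;;
    *)
      echo "error: TAMR_UNIFY_BACKUP_ES must be true or false, got: $1" >&2
      return 1
      ;;
  esac
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_ES=$1"
}
```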

Additional Configuration Variables for Backup

Optionally, you can include additional configuration variables in the backup by listing them in TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS, and then apply them with the restore procedure.

| Configuration Variable | Example Value |
| --- | --- |
| TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS | ["TAMR_DEDUP_NUM_QUESTIONS", "TAMR_ES_MAX_CLAUSE_COUNT"] |

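Note: The configuration variables listed under Configuration Variables Always Restored are always restored from the backup, so you do not need to add them to TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS.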

To configure additional configuration variables to back up:

  1. Set the value of the configuration variable TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS to a comma-separated list of Tamr configuration variable names, enclosed in square brackets with each name in double quotes (as in the example value above), using the admin tool. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies (Restarting).

For example, the following command performs step 1 for TAMR_DEDUP_NUM_QUESTIONS:

${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='["TAMR_DEDUP_NUM_QUESTIONS"]'

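
Hand-quoting the bracketed list gets error-prone as it grows. This minimal sketch builds the list in the same format as the example values on this page from plain variable names and applies it with config:set. The helper names and the UNIFY_ADMIN override are hypothetical, and the restriction to upper-case letters, digits, and underscores is this sketch's own guard (it keeps the generated list free of characters that would need escaping), not a documented Tamr rule.

```shell
# Sketch only: build and apply TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS.
# Assumes TAMR_UNIFY_HOME is set; set UNIFY_ADMIN=echo beforehand for a dry run.
UNIFY_ADMIN="${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}"

# Prints ["A", "B"] for the arguments A B (same format as the example values).
to_config_list() {
  list=""
  for name in "$@"; do
    # Guard used by this sketch only: accept names made of upper-case letters,
    # digits and underscores, so no quoting or escaping is needed in the list.
    case "$name" in
      ""|*[![:upper:][:digit:]_]*)
        echo "error: unexpected variable name: $name" >&2
        return 1
        ;;
    esac
    if [ -z "$list" ]; then
      list="\"$name\""
    else
      list="$list, \"$name\""
    fi
  done
  printf '[%s]\n' "$list"
}

# Usage: set_extra_config_props NAME...
set_extra_config_props() {
  props="$(to_config_list "$@")" || return 1
  "$UNIFY_ADMIN" config:set "TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS=${props}"
}
```

For example, set_extra_config_props TAMR_DEDUP_NUM_QUESTIONS TAMR_ES_MAX_CLAUSE_COUNT sets the same value as the example in the table above; restart Tamr and its dependencies afterward.
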
Configuration Variables Always Restored
TAMR_CATEGORIZATION_FEATURE_SCALING
TAMR_CATEGORIZATION_GRADIENT_DESCENT_ITERATIONS
TAMR_CATEGORIZATION_REGULARIZATION_PARAMETER
TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_HIGH
TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_MEDIUM
TAMR_DELTA_CONSOLIDATION_THRESHOLD
TAMR_ES_ENABLED
TAMR_ES_MAX_RESULT_WINDOW
TAMR_JOB_SPARK_DRIVER_MEM
TAMR_JOB_SPARK_EXECUTOR_MEM
TAMR_JOB_SPARK_EXECUTOR_CORES
TAMR_JOB_SPARK_PROPS
TAMR_LLM_BATCH_SIZE
TAMR_LLM_REFRESH_INTERVAL_IN_MILLISECONDS
TAMR_LLM_TOPK
TAMR_PUBAPI_NAME
TAMR_SPARK_BROADCAST_ROW_LIMIT
TAMR_SPARK_BROADCAST_SIZE_LIMIT_BYTES