Backup
Backing up Tamr instances involves setting a backup directory, and then backing up various parts of the product.
Backing up a Tamr Configuration
To back up a Tamr configuration:
- Configure a backup location (Configuring a Backup Location).
- Configure Postgres backup and restore binaries (Configuring Postgres Backup and Restore Binaries).
- Optional. Configure Elasticsearch backup (Configuring Elasticsearch Backup).
- Optional. Set additional configuration variables to be backed up (Additional Configuration Variables for Backup).
- Restart Tamr and its dependencies (Restarting).
Configuring a Backup Location
The default backup location is the local filesystem directory ${TAMR_UNIFY_HOME}/tamr/backups.
Configuring a Filesystem Backup Location
To configure a filesystem backup location:
- Set the value of the configuration variable TAMR_UNIFY_BACKUP_URI to a local filesystem directory using the admin tool. See Creating or Updating a Configuration Variable.
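As a rough sketch, the variable could be set from a shell using the `unify-admin.sh config:set` form that appears later on this page. The directory `/data/tamr-backups` and the `/opt/tamr` fallback for `TAMR_UNIFY_HOME` are hypothetical, and the absolute-path check is an assumption of this sketch rather than a documented requirement. The helper only prints the command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch: print the admin-tool command that points backups at a local directory.
# Assumptions: /data/tamr-backups is a hypothetical path; TAMR_UNIFY_HOME falls
# back to the hypothetical /opt/tamr when it is not already set.
TAMR_UNIFY_HOME="${TAMR_UNIFY_HOME:-/opt/tamr}"
ADMIN_TOOL="${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh"

backup_uri_command() {
  local dir="$1"
  # Only absolute paths are accepted in this sketch.
  case "$dir" in
    /*) printf "%s config:set TAMR_UNIFY_BACKUP_URI='%s'\n" "$ADMIN_TOOL" "$dir" ;;
    *)  echo "backup directory must be an absolute path: $dir" >&2; return 1 ;;
  esac
}

backup_uri_command /data/tamr-backups
```

Printing the command first, rather than executing it, leaves the decision to apply the change (and the subsequent restart) with the operator.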
Usage of Tamr Temporary Directory
During backup, Tamr defaults to using a temporary directory at /tmp. If there is insufficient disk space available in the directory, backup fails, and a new directory with sufficient disk space must be used.
To configure Tamr to use an alternative backup temporary directory, set the value for the configuration variable TAMR_UNIFY_BACKUP_HADOOP_TMP_DIR to the full path of the new directory, e.g. /data/tamr-unify-backup-tmp-dir.
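Before switching directories, it can help to check how much space is actually free. The sketch below uses the standard `df` and `awk` utilities. The 1 GiB threshold is an arbitrary illustration, not a Tamr requirement; the space a backup needs depends on the instance.

```shell
#!/usr/bin/env bash
# Sketch: report free space in a candidate backup temporary directory.
# Assumption: the 1 GiB threshold below is illustrative only.

free_kb() {
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

has_space() {
  local dir="$1" needed_kb="$2"
  [ "$(free_kb "$dir")" -ge "$needed_kb" ]
}

if has_space /tmp 1048576; then
  echo "/tmp has at least 1 GiB free"
else
  echo "/tmp is low on space; consider setting TAMR_UNIFY_BACKUP_HADOOP_TMP_DIR"
fi
```

The same `has_space` check can be pointed at the alternative directory (for example /data/tamr-unify-backup-tmp-dir) before setting the variable.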
Configuring an AWS S3 Backup Location
To configure an AWS S3 backup location:
- Set each of the following configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies (Restarting).
Configuration Variable | Example Value |
---|---|
TAMR_UNIFY_BACKUP_URI | s3a://<bucket-name>/<path-to-backup> |
TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID | <aws-access-key-id> |
TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY | <aws-secret-access-key> |
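The table above could translate into three `config:set` calls, sketched here. The bucket name `example-bucket`, the `/opt/tamr` fallback, and the rule that the URI must include a path after the bucket are assumptions of this sketch. Credentials are printed as placeholders so that nothing secret is echoed to the terminal.

```shell
#!/usr/bin/env bash
# Sketch: validate an S3 backup URI, then print one config:set command per variable.
TAMR_UNIFY_HOME="${TAMR_UNIFY_HOME:-/opt/tamr}"   # hypothetical default
ADMIN_TOOL="${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh"

is_s3a_uri() {
  # Accept s3a://<bucket>/<path>; bucket and path must both be non-empty.
  [[ "$1" =~ ^s3a://[^/]+/.+$ ]]
}

s3_backup_commands() {
  local uri="$1"
  is_s3a_uri "$uri" || { echo "not an s3a:// URI: $uri" >&2; return 1; }
  printf "%s config:set TAMR_UNIFY_BACKUP_URI='%s'\n" "$ADMIN_TOOL" "$uri"
  # Placeholders stand in for the real credentials.
  printf "%s config:set TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID='<aws-access-key-id>'\n" "$ADMIN_TOOL"
  printf "%s config:set TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY='<aws-secret-access-key>'\n" "$ADMIN_TOOL"
}

s3_backup_commands "s3a://example-bucket/tamr/backups"
```

Note that the scheme is `s3a://`, as in the example value, so a plain `s3://` URI is rejected by this check.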
Configuring an HDFS Backup Location
To configure an HDFS backup location:
- Set each of the following configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies (Restarting).
Configuration Variable | Description / Example Value |
---|---|
TAMR_UNIFY_BACKUP_URI | See TAMR_FS_URI. |
TAMR_BACKUP_FS_CONFIG_URIS | See TAMR_FS_CONFIG_URIS. |
TAMR_BACKUP_FS_EXTRA_URIS | See TAMR_FS_EXTRA_URIS. |
TAMR_BACKUP_FS_CONFIG_DIR | See TAMR_FS_CONFIG_DIR. Additionally, note that this directory must be unique. |
TAMR_BACKUP_FS_EXTRA_CONFIG | See TAMR_FS_EXTRA_CONFIG. |
TAMR_BACKUP_FS_KERBEROS_ENABLED | Set to true if the backup filesystem requires Kerberos authentication. |
TAMR_KERBEROS_KEYTAB | Required if TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true. See TAMR_KERBEROS_KEYTAB. |
TAMR_KERBEROS_PRINCIPAL | Required if TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true. See TAMR_KERBEROS_PRINCIPAL. |
TAMR_KERBEROS_KRB5 | Required if TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true. See TAMR_KERBEROS_KRB5. |
Configuring Postgres Backup and Restore Binaries
To configure Postgres backup and restore binaries:
- Set each of the following configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies (Restarting).
Configuration Variable | Example Value |
---|---|
TAMR_PG_DUMP_BINARY | /usr/pgsql-9.4/bin/pg_dump |
TAMR_PG_RESTORE_BINARY | /usr/pgsql-9.4/bin/pg_restore |
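Because these variables hold filesystem paths, a quick pre-check that each path is an executable file can catch a wrong PostgreSQL version directory early. This is a minimal sketch; `check_binary` is a hypothetical helper, and the paths are the example values from the table, which may not match the local installation.

```shell
#!/usr/bin/env bash
# Sketch: confirm candidate pg_dump / pg_restore paths are executable files.
check_binary() {
  local path="$1"
  if [ -f "$path" ] && [ -x "$path" ]; then
    echo "ok: $path"
  else
    echo "missing or not executable: $path" >&2
    return 1
  fi
}

# Paths from the example table; adjust to the installed PostgreSQL version.
for bin in /usr/pgsql-9.4/bin/pg_dump /usr/pgsql-9.4/bin/pg_restore; do
  check_binary "$bin" || true
done
```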
Configuring Elasticsearch Backup
TAMR_UNIFY_BACKUP_ES
Sets whether the generated backup file includes a complete snapshot of all data in Tamr's Elasticsearch instance.
If set to true (default), the generated backup file includes a complete snapshot of all data in Tamr's Elasticsearch instance. Upon restore, the Elasticsearch instance is automatically restored from this snapshot.
If set to false, the generated backup file does not include a snapshot of the data in Tamr's Elasticsearch instance. Upon restore, the Elasticsearch instance is not automatically restored. Restoring Elasticsearch then requires manually re-indexing it, which may take several hours. Contact Tamr Support to re-index Elasticsearch.
To configure Elasticsearch backup:
- Set the following configuration variable using the admin tool. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies (Restarting).
Configuration Variable | Example Values |
---|---|
TAMR_UNIFY_BACKUP_ES | true , false |
Additional Configuration Variables for Backup
Optionally, you can back up the following configuration variables and then use the restore procedure to apply them.
Configuration Variable | Example Value |
---|---|
TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS | ["TAMR_DEDUP_NUM_QUESTIONS", "TAMR_ES_MAX_CLAUSE_COUNT"] |
Note: The configuration variables listed in the "Configuration Variables Always Restored" table at the end of this section are always restored from the backup.
To configure additional configuration variables to back up:
- Set the value of the configuration variable TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS to a comma-separated list of Tamr configuration variables using the admin tool. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies (Restarting).
${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='["TAMR_DEDUP_NUM_QUESTIONS"]'
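When several variables need to be listed, quoting the bracketed value by hand is error-prone. The sketch below builds a value in the same form as the example above from a list of names. `json_array` is a hypothetical helper, and it assumes variable names contain only uppercase letters, digits, and underscores, so no escaping is needed.

```shell
#!/usr/bin/env bash
# Sketch: build the bracketed list value for TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS.
json_array() {
  local out="" name
  for name in "$@"; do
    # Reject anything that does not look like a configuration variable name.
    [[ "$name" =~ ^[A-Z0-9_]+$ ]] || { echo "unexpected name: $name" >&2; return 1; }
    # Append the quoted name, preceded by ", " for every entry after the first.
    out="${out:+$out, }\"$name\""
  done
  printf '[%s]\n' "$out"
}

value="$(json_array TAMR_DEDUP_NUM_QUESTIONS TAMR_ES_MAX_CLAUSE_COUNT)"
# Print the argument that would be passed to unify-admin.sh.
echo "config:set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='${value}'"
```

The single quotes around the whole value keep the shell from interpreting the inner double quotes and brackets.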
Configuration Variables Always Restored |
---|
TAMR_CATEGORIZATION_FEATURE_SCALING |
TAMR_CATEGORIZATION_GRADIENT_DESCENT_ITERATIONS |
TAMR_CATEGORIZATION_REGULARIZATION_PARAMETER |
TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_HIGH |
TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_MEDIUM |
TAMR_DELTA_CONSOLIDATION_THRESHOLD |
TAMR_ES_ENABLED |
TAMR_ES_MAX_RESULT_WINDOW |
TAMR_JOB_SPARK_DRIVER_MEM |
TAMR_JOB_SPARK_EXECUTOR_MEM |
TAMR_JOB_SPARK_EXECUTOR_CORES |
TAMR_JOB_SPARK_PROPS |
TAMR_LLM_BATCH_SIZE |
TAMR_LLM_REFRESH_INTERVAL_IN_MILLISECONDS |
TAMR_LLM_TOPK |
TAMR_PUBAPI_NAME |
TAMR_SPARK_BROADCAST_ROW_LIMIT |
TAMR_SPARK_BROADCAST_SIZE_LIMIT_BYTES |