Backup Configuration

To back up a Tamr instance, create a backup directory and then back up the relevant components of the product.

Preparing for a Backup

Note: Server snapshots are not a replacement for Tamr application backups, so do not take server snapshots with the intention of using them as Tamr backups. Server snapshots do not correctly back up the Tamr configuration. Additionally, if Tamr is running when you take a server snapshot, restoring from that snapshot later can lead to a corrupt HBase configuration. Instead, take a Tamr application backup before introducing any changes.

By default, Tamr stores backups in the local filesystem directory ${TAMR_UNIFY_HOME}/tamr/backups.

Before you run a backup, decide where to store the backup files and which components to back up. Tamr recommends storing backup files on a distributed filesystem rather than on the local filesystem; that way, you do not need to manually copy the backup files to the server on which you restore from the backup.

You can store backups on the local filesystem, Google Cloud Storage (GCS), AWS S3, or HDFS. For information about configuring a backup location, see the following topics in this section:
  • Configuring a Filesystem Backup Location
  • Configuring a Google Cloud Storage (GCS) Backup Location
  • Configuring an AWS S3 Backup Location
  • Configuring an HDFS Backup Location

Additionally, you can configure backups for PostgreSQL and Elasticsearch, and back up some of the configuration variables used in Tamr. See these topics in this section:
  • Configuring Postgres Backup and Restore Binaries
  • Configuring Elasticsearch Backup
  • Configuring Additional Configuration Variables for Backup

Configuring a Filesystem Backup Location

To configure a local filesystem backup location:

Set the value of the configuration variable TAMR_UNIFY_BACKUP_URI to a local filesystem directory using the Tamr administration utility. See Creating or Updating a Configuration Variable.
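The step above can be scripted. The following bash sketch assumes that the administration utility is at ${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh and accepts config:set NAME=value, as in the TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS example on this page. The names tamr_admin, set_local_backup_uri, and UNIFY_ADMIN (an optional override, for example for a dry run) are hypothetical, and the checks that the path is absolute and writable are assumptions rather than documented Tamr requirements.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set (for example, for a dry run).
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: create a local backup directory and point
# TAMR_UNIFY_BACKUP_URI at it.
set_local_backup_uri() {
  local dir="$1"
  # Assumption: use an absolute path so the setting does not depend on the
  # directory the utility happens to run in.
  case "$dir" in
    /*) ;;
    *) echo "error: backup directory must be an absolute path: $dir" >&2; return 1 ;;
  esac
  # Create the directory if it does not exist yet.
  mkdir -p "$dir" || return 1
  # Check writability for the current user (run this as the user that runs Tamr).
  if [ ! -w "$dir" ]; then
    echo "error: backup directory is not writable: $dir" >&2
    return 1
  fi
  tamr_admin config:set "TAMR_UNIFY_BACKUP_URI=$dir"
}
```

For example, set_local_backup_uri /data/tamr-backups creates the directory if needed and sets TAMR_UNIFY_BACKUP_URI to it; /data/tamr-backups is only an illustrative path.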

Configuring a Google Cloud Storage (GCS) Backup Location

To configure a GCS backup location:

  1. Set TAMR_UNIFY_BACKUP_URI to the backup and restore directory, in the format gs://<bucket>/<path/to/backup>; for example, gs://backup-bucket/backup1.
  2. Set TAMR_GOOGLE_APPLICATION_CREDENTIALS to the absolute local path of the service account credentials JSON file; for example, /tmp/gcs/creds.json. For more information, see Creating or Updating a Configuration Variable.
  3. Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
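The first two GCS steps can be combined into one helper that checks its inputs before changing any setting. This is a bash sketch that assumes unify-admin.sh config:set NAME=value works as in the example elsewhere on this page; tamr_admin, set_gcs_backup, and the UNIFY_ADMIN override are hypothetical names. You still need to restart Tamr and its dependencies afterwards.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set.
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: validate the GCS URI and credentials path, then set
# both configuration variables.
set_gcs_backup() {
  local uri="$1" creds="$2"
  # Require the gs://<bucket>/<path/to/backup> form (a bucket and a path).
  case "$uri" in
    gs://?*/?*) ;;
    *) echo "error: expected gs://<bucket>/<path/to/backup>, got: $uri" >&2; return 1 ;;
  esac
  # The credentials path must be absolute and name an existing file.
  case "$creds" in
    /*) ;;
    *) echo "error: credentials path must be absolute: $creds" >&2; return 1 ;;
  esac
  if [ ! -f "$creds" ]; then
    echo "error: credentials file not found: $creds" >&2
    return 1
  fi
  tamr_admin config:set "TAMR_UNIFY_BACKUP_URI=$uri" || return 1
  tamr_admin config:set "TAMR_GOOGLE_APPLICATION_CREDENTIALS=$creds"
}
```

For example: set_gcs_backup gs://backup-bucket/backup1 /tmp/gcs/creds.json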

Configuring an AWS S3 Backup Location

To configure an AWS S3 backup location:

  1. Set TAMR_UNIFY_BACKUP_URI to s3a://<bucket-name>/<path-to-backup>, TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID to <aws-access-key-id>, and TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY to <aws-secret-access-key>. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
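Step 1 can be sketched in bash as follows. Note that the URI uses the s3a:// scheme. As a design choice, and not something Tamr requires, this sketch reads the keys from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables so that the secret is not typed into the shell. It assumes unify-admin.sh config:set NAME=value works as in the example elsewhere on this page; tamr_admin, set_s3_backup, and UNIFY_ADMIN are hypothetical names.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set.
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: set the S3 backup URI and credentials. The keys come
# from the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables so
# the secret does not have to be typed into the shell (and its history).
set_s3_backup() {
  local uri="$1"
  # Require s3a://<bucket-name>/<path-to-backup>.
  case "$uri" in
    s3a://?*/?*) ;;
    *) echo "error: expected s3a://<bucket-name>/<path-to-backup>, got: $uri" >&2; return 1 ;;
  esac
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "error: export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  tamr_admin config:set "TAMR_UNIFY_BACKUP_URI=$uri" || return 1
  tamr_admin config:set "TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID" || return 1
  tamr_admin config:set "TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY"
}
```

For example, after exporting both keys, run set_s3_backup s3a://<bucket-name>/<path-to-backup>, then restart Tamr and its dependencies.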

Configuring an HDFS Backup Location

To configure an HDFS backup location:

  1. Using the administration utility, set the following configuration variables: TAMR_UNIFY_BACKUP_URI, TAMR_BACKUP_FS_CONFIG_URIS, TAMR_BACKUP_FS_EXTRA_URIS, TAMR_BACKUP_FS_CONFIG_DIR, TAMR_BACKUP_FS_EXTRA_CONFIG, and TAMR_BACKUP_FS_KERBEROS_ENABLED.
    If TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true, then also configure TAMR_KERBEROS_KEYTAB, TAMR_KERBEROS_PRINCIPAL, and TAMR_KERBEROS_KRB5. For more information, see HDFS and Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
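The HDFS variables have a dependency: enabling Kerberos makes three more variables mandatory. The bash sketch below encodes only that rule. It takes NAME=value arguments as given (see HDFS for the expected value formats), validates all of them before changing any setting, and then sets each one. It assumes unify-admin.sh config:set NAME=value works as in the example elsewhere on this page; tamr_admin, set_hdfs_backup, and UNIFY_ADMIN are hypothetical names.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set.
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: takes NAME=value arguments, checks them, then sets each one.
set_hdfs_backup() {
  local pair kerberos=false keytab=false principal=false krb5=false
  # First pass: validate every argument before changing any setting.
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "error: expected NAME=value, got: $pair" >&2; return 1 ;;
    esac
    case "${pair%%=*}" in
      TAMR_UNIFY_BACKUP_URI|TAMR_BACKUP_FS_CONFIG_URIS|TAMR_BACKUP_FS_EXTRA_URIS) ;;
      TAMR_BACKUP_FS_CONFIG_DIR|TAMR_BACKUP_FS_EXTRA_CONFIG) ;;
      TAMR_BACKUP_FS_KERBEROS_ENABLED) [ "${pair#*=}" = true ] && kerberos=true ;;
      TAMR_KERBEROS_KEYTAB) keytab=true ;;
      TAMR_KERBEROS_PRINCIPAL) principal=true ;;
      TAMR_KERBEROS_KRB5) krb5=true ;;
      *) echo "error: not an HDFS backup variable: ${pair%%=*}" >&2; return 1 ;;
    esac
  done
  # With Kerberos enabled, the keytab, principal, and krb5 variables are required.
  if [ "$kerberos" = true ] && ! { $keytab && $principal && $krb5; }; then
    echo "error: Kerberos is enabled; also set TAMR_KERBEROS_KEYTAB, TAMR_KERBEROS_PRINCIPAL, and TAMR_KERBEROS_KRB5" >&2
    return 1
  fi
  # Second pass: apply the settings.
  for pair in "$@"; do
    tamr_admin config:set "$pair" || return 1
  done
}
```

For example: set_hdfs_backup "TAMR_UNIFY_BACKUP_URI=<hdfs-backup-uri>" "TAMR_BACKUP_FS_KERBEROS_ENABLED=false", adding the other variables your cluster needs.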

Configuring Postgres Backup and Restore Binaries

To configure Postgres backup and restore binaries:

  1. Set TAMR_PG_DUMP_BINARY to /usr/pgsql-12/bin/pg_dump and TAMR_PG_RESTORE_BINARY to /usr/pgsql-12/bin/pg_restore. See Creating or Updating a Configuration Variable.
  2. Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
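Before pointing Tamr at the PostgreSQL 12 binaries, it can help to confirm that they exist. The bash sketch below checks that pg_dump and pg_restore are present and executable, then sets both variables. The default directory /usr/pgsql-12/bin comes from step 1; accepting a different directory as an argument is an assumption for installations that use another location. It assumes unify-admin.sh config:set NAME=value works as in the example elsewhere on this page; tamr_admin, set_pg_binaries, and UNIFY_ADMIN are hypothetical names.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set.
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: verify pg_dump and pg_restore in a directory
# (default /usr/pgsql-12/bin), then set TAMR_PG_DUMP_BINARY and
# TAMR_PG_RESTORE_BINARY to them.
set_pg_binaries() {
  local bin_dir="${1:-/usr/pgsql-12/bin}" tool
  for tool in pg_dump pg_restore; do
    if [ ! -x "$bin_dir/$tool" ]; then
      echo "error: $bin_dir/$tool is missing or not executable" >&2
      return 1
    fi
  done
  tamr_admin config:set "TAMR_PG_DUMP_BINARY=$bin_dir/pg_dump" || return 1
  tamr_admin config:set "TAMR_PG_RESTORE_BINARY=$bin_dir/pg_restore"
}
```

Running set_pg_binaries with no argument applies the paths from step 1.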

Configuring Elasticsearch Backup

To configure Elasticsearch backup:

  1. Using the Tamr administration utility, set the TAMR_UNIFY_BACKUP_ES configuration variable, which specifies whether the generated backup file includes a complete snapshot of all data in the Tamr Elasticsearch instance. See Creating or Updating a Configuration Variable.
  • If set to true (default), the generated backup file includes a complete snapshot of all data in the Tamr Elasticsearch instance. On restore, the Elasticsearch instance is automatically restored from this snapshot.
  • If set to false, the generated backup file does not include a snapshot of the data in the Tamr Elasticsearch instance, and on restore the Elasticsearch instance is not automatically restored. Restoring Elasticsearch then requires running the re-indexing process, which may take several hours. Contact Tamr Support to re-index Elasticsearch.
  2. Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
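Because setting TAMR_UNIFY_BACKUP_ES to false means a later restore needs a lengthy re-index, a small guard can help. The bash sketch below accepts only true or false and prints a warning for false. It assumes unify-admin.sh config:set NAME=value works as in the example elsewhere on this page; tamr_admin, set_es_backup, and UNIFY_ADMIN are hypothetical names.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set.
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: set TAMR_UNIFY_BACKUP_ES, accepting only true or false.
set_es_backup() {
  case "$1" in
    true) ;;
    false)
      # Without a snapshot, restoring Elasticsearch requires a re-index.
      echo "warning: backups will not include Elasticsearch; a restore will require re-indexing" >&2 ;;
    *) echo "error: TAMR_UNIFY_BACKUP_ES must be true or false, got: $1" >&2; return 1 ;;
  esac
  tamr_admin config:set "TAMR_UNIFY_BACKUP_ES=$1"
}
```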

Configuring Additional Configuration Variables for Backup

Optionally, you can back up additional configuration variables and then apply them using the Tamr restore process.

Note: Contact Tamr Support if you are not sure whether you need to back up any additional configuration variables.

To configure additional configuration variables for backup:

  1. Using the administration utility, set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS to the list of Tamr configuration variables that you want to back up, written as comma-separated, double-quoted names inside square brackets, such as ["TAMR_DEDUP_NUM_QUESTIONS", "TAMR_ES_MAX_CLAUSE_COUNT"]. See Creating or Updating a Configuration Variable. For example, the following command adds only TAMR_DEDUP_NUM_QUESTIONS to the backup:

```shell
${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set \
  TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='["TAMR_DEDUP_NUM_QUESTIONS"]'
```

  2. Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
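Typing the bracketed, quoted list by hand is error-prone when it grows. The bash sketch below builds the value from plain variable names, in the same format as the ["TAMR_DEDUP_NUM_QUESTIONS", "TAMR_ES_MAX_CLAUSE_COUNT"] example, and then sets it. It assumes unify-admin.sh config:set NAME=value works as in the command on this page and that names consist of uppercase letters, digits, and underscores; format_props_list, set_extra_config_props, tamr_admin, and UNIFY_ADMIN are hypothetical names.

```shell
# Hypothetical wrapper: runs the Tamr administration utility, or the command
# named in UNIFY_ADMIN if that variable is set.
tamr_admin() {
  "${UNIFY_ADMIN:-${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh}" "$@"
}

# Hypothetical helper: print names as ["NAME_1", "NAME_2", ...].
format_props_list() {
  local out="" name
  for name in "$@"; do
    # Assumption: variable names use only uppercase letters, digits, and "_".
    case "$name" in
      ''|*[![:upper:][:digit:]_]*)
        echo "error: invalid configuration variable name: $name" >&2; return 1 ;;
    esac
    # Append a comma and space before every name except the first.
    out="${out:+$out, }\"$name\""
  done
  printf '[%s]\n' "$out"
}

# Hypothetical helper: set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS from names.
set_extra_config_props() {
  local list
  list="$(format_props_list "$@")" || return 1
  tamr_admin config:set "TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS=$list"
}
```

For example, set_extra_config_props TAMR_DEDUP_NUM_QUESTIONS TAMR_ES_MAX_CLAUSE_COUNT sets the two-variable list shown in step 1.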

Configuration Variables That Are Always Restored

The following configuration variables are always restored from the backup, so you do not need to configure their backups. This list may not be complete or up to date.

TAMR_CATEGORIZATION_FEATURE_SCALING, TAMR_CATEGORIZATION_GRADIENT_DESCENT_ITERATIONS, TAMR_CATEGORIZATION_REGULARIZATION_PARAMETER, TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_HIGH, TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_MEDIUM, TAMR_DELTA_CONSOLIDATION_THRESHOLD, TAMR_ES_ENABLED, TAMR_ES_MAX_RESULT_WINDOW, TAMR_JOB_SPARK_DRIVER_MEM, TAMR_JOB_SPARK_EXECUTOR_MEM, TAMR_JOB_SPARK_EXECUTOR_CORES, TAMR_JOB_SPARK_PROPS, TAMR_LLM_BATCH_SIZE, TAMR_LLM_REFRESH_INTERVAL_IN_MILLISECONDS, TAMR_LLM_TOPK, TAMR_PUBAPI_NAME, TAMR_SPARK_BROADCAST_ROW_LIMIT, TAMR_SPARK_BROADCAST_SIZE_LIMIT_BYTES.