Backup Configuration
To back up Tamr instances, create a backup directory, and back up various parts of the product.
Preparing for a Backup
Note: Server snapshots are not a replacement for Tamr application backups. Therefore, do not take server snapshots with the intention of using them as Tamr backups. Server snapshots do not provide the correct backups of Tamr configuration. Additionally, if Tamr is running, taking a server snapshot can lead to a corrupt HBase configuration if you later attempt to restore from the snapshot. Instead, take Tamr application backups before introducing any changes.
By default, Tamr stores backups in the local filesystem directory ${TAMR_UNIFY_HOME}/tamr/backups.
Before you run a backup, decide where to store the backup files and which components to back up. A backup location can be on the local filesystem, Google Cloud Storage (GCS), AWS S3, or HDFS. Tamr recommends using a distributed filesystem instead of the local filesystem so that you do not need to manually copy the backup files to the server on which you restore from a backup.
For information about configuring a backup location, see the following topics in this section:
- Configuring a Filesystem Backup Location
- Configuring a GCS Backup Location
- Configuring an AWS S3 Backup Location
- Configuring an HDFS Backup Location
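As a quick sanity check before configuring a location, the four storage types can be told apart by the form of the backup URI. The following is a minimal sketch, not part of Tamr: the function name `backup_location_type` is hypothetical, `gs://` and `s3a://` are the formats given in this section, `hdfs://` is the standard Hadoop scheme and is assumed here, and a plain absolute path is assumed to mean the local filesystem.

```shell
# Classify a backup URI by its scheme (hypothetical helper, not a Tamr command).
backup_location_type() {
  case "$1" in
    gs://?*)   echo "gcs" ;;      # Google Cloud Storage
    s3a://?*)  echo "s3" ;;       # AWS S3
    hdfs://?*) echo "hdfs" ;;     # assumed scheme for HDFS locations
    /*)        echo "local" ;;    # assumed: absolute path on the local filesystem
    *)         echo "unknown"; return 1 ;;
  esac
}

backup_location_type "gs://backup-bucket/backup1"
```

A value classified as `unknown` is worth checking against the format described in the matching topic before you set it.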
Additionally, you can configure backups for PostgreSQL, Elasticsearch, and some configuration variables used in Tamr. See these topics in this section:
- Configuring Postgres Backup and Restore Binaries
- Configuring Elasticsearch Backup (optional)
- Configuring Additional Configuration Variables for Backup (optional)
Configuring a Filesystem Backup Location
To configure a local filesystem backup location:
Set the value of the configuration variable TAMR_UNIFY_BACKUP_URI to a local filesystem directory using the Tamr administration utility. See Creating or Updating a Configuration Variable.
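The step above might look like the following in shell. This is a sketch under stated assumptions: the directory `/tmp/tamr-backups-example` and the fallback value for `TAMR_UNIFY_HOME` are placeholders (and `/tmp` is not a durable location), and the `config:set` form follows the example command shown later in this section. If the administration utility is not found, the script only prints the command it would run.

```shell
# Placeholder values: adjust both for your installation.
BACKUP_DIR="${BACKUP_DIR:-/tmp/tamr-backups-example}"
ADMIN="${TAMR_UNIFY_HOME:-/path/to/tamr-home}/tamr/utils/unify-admin.sh"

# Create the backup directory before pointing Tamr at it.
mkdir -p "$BACKUP_DIR"

if [ -x "$ADMIN" ]; then
  "$ADMIN" config:set TAMR_UNIFY_BACKUP_URI="$BACKUP_DIR"
else
  # Dry run: the utility is not available on this host.
  echo "$ADMIN config:set TAMR_UNIFY_BACKUP_URI=$BACKUP_DIR"
fi
```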
Configuring a Google Cloud Storage (GCS) Backup Location
To configure a GCS backup location:
- Set TAMR_UNIFY_BACKUP_URI to the path to the backup and restore directory in this format: gs://<bucket>/<path/to/backup>, such as gs://backup-bucket/backup1.
- Set TAMR_GOOGLE_APPLICATION_CREDENTIALS to an absolute local path to the service account credentials JSON file, such as /tmp/gcs/creds.json. For more information, see Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
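The two settings above can be checked for shape before they are applied. The sketch below uses the example values from the steps; the helper names `is_gcs_uri` and `is_absolute` and the fallback for `TAMR_UNIFY_HOME` are hypothetical, and the `config:set` form follows the example command shown later in this section. Without the administration utility on the host, the commands are printed rather than run.

```shell
ADMIN="${TAMR_UNIFY_HOME:-/path/to/tamr-home}/tamr/utils/unify-admin.sh"
BACKUP_URI="gs://backup-bucket/backup1"   # example value from the steps above
CREDS="/tmp/gcs/creds.json"               # example value from the steps above

# Succeed only for URIs of the form gs://<bucket>/<path>.
is_gcs_uri() {
  case "$1" in
    gs://?*/?*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Succeed only for absolute paths.
is_absolute() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

if is_gcs_uri "$BACKUP_URI" && is_absolute "$CREDS"; then
  for kv in "TAMR_UNIFY_BACKUP_URI=$BACKUP_URI" \
            "TAMR_GOOGLE_APPLICATION_CREDENTIALS=$CREDS"; do
    if [ -x "$ADMIN" ]; then
      "$ADMIN" config:set "$kv"
    else
      echo "$ADMIN config:set $kv"        # dry run
    fi
  done
else
  echo "check BACKUP_URI and CREDS before applying" >&2
fi
```

Restart Tamr and its dependencies afterwards, as described in the last step.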
Configuring an AWS S3 Backup Location
To configure an AWS S3 backup location:
- Set TAMR_UNIFY_BACKUP_URI to s3a://<bucket-name>/<path-to-backup>, TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID to <aws-access-key-id>, and TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY to <aws-secret-access-key>. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
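A sketch of the S3 settings follows. The bucket name and path are hypothetical, the helper `s3_backup_uri` is not a Tamr command, and reading the keys from the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables is an assumption made here so that the secret is not typed into the script. The `config:set` form follows the example command shown later in this section; in dry-run mode only the variable names are printed.

```shell
ADMIN="${TAMR_UNIFY_HOME:-/path/to/tamr-home}/tamr/utils/unify-admin.sh"

# Build s3a://<bucket-name>/<path-to-backup> from its two parts.
s3_backup_uri() {
  printf 's3a://%s/%s\n' "$1" "${2#/}"    # drop one leading slash from the path
}

URI=$(s3_backup_uri "my-backup-bucket" "tamr/backups")   # hypothetical names

for kv in \
  "TAMR_UNIFY_BACKUP_URI=$URI" \
  "TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-<aws-access-key-id>}" \
  "TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-<aws-secret-access-key>}"
do
  if [ -x "$ADMIN" ]; then
    "$ADMIN" config:set "$kv"
  else
    # Dry run: show the variable name only, never the value.
    echo "$ADMIN config:set ${kv%%=*}=<hidden>"
  fi
done
```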
Configuring an HDFS Backup Location
To configure an HDFS backup location:
- Configure the following configuration variables using the administration utility: TAMR_UNIFY_BACKUP_URI, TAMR_BACKUP_FS_CONFIG_URIS, TAMR_BACKUP_FS_EXTRA_URIS, TAMR_BACKUP_FS_CONFIG_DIR, TAMR_BACKUP_FS_EXTRA_CONFIG, and TAMR_BACKUP_FS_KERBEROS_ENABLED. If TAMR_BACKUP_FS_KERBEROS_ENABLED is set to true, then also configure TAMR_KERBEROS_KEYTAB, TAMR_KERBEROS_PRINCIPAL, and TAMR_KERBEROS_KRB5. For more information, see HDFS and Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
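The Kerberos rule above (if Kerberos is enabled, three more variables are required) can be expressed as a small pre-flight check. This is only a sketch: it assumes you have exported the intended values as shell variables with the same names, it treats an unset TAMR_BACKUP_FS_KERBEROS_ENABLED as false, and the function name `check_kerberos_vars` is hypothetical. It does not read or change Tamr's configuration.

```shell
# Return 0 when the Kerberos settings are consistent, 1 otherwise.
check_kerberos_vars() {
  # Nothing more to check unless Kerberos is enabled (unset is treated as false).
  [ "${TAMR_BACKUP_FS_KERBEROS_ENABLED:-false}" = "true" ] || return 0
  missing=0
  for name in TAMR_KERBEROS_KEYTAB TAMR_KERBEROS_PRINCIPAL TAMR_KERBEROS_KRB5; do
    eval "value=\${$name:-}"              # look up the variable called $name
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_kerberos_vars; then
  echo "Kerberos settings look consistent"
fi
```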
Configuring Postgres Backup and Restore Binaries
To configure Postgres backup and restore binaries:
- Set TAMR_PG_DUMP_BINARY to /usr/pgsql-12/bin/pg_dump and TAMR_PG_RESTORE_BINARY to /usr/pgsql-12/bin/pg_restore. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
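Before setting these two variables, it can help to confirm that the binaries exist at the stated paths on the host. The following sketch does only that; the helper name `check_binary` is hypothetical, and the paths are the ones given in the step above, which may differ on your system.

```shell
PG_DUMP="/usr/pgsql-12/bin/pg_dump"
PG_RESTORE="/usr/pgsql-12/bin/pg_restore"

# Report whether a binary exists and is executable on this host.
check_binary() {
  if [ -x "$1" ]; then
    echo "ok: $1"
  else
    echo "not found or not executable: $1"
    return 1
  fi
}

for bin in "$PG_DUMP" "$PG_RESTORE"; do
  if check_binary "$bin"; then
    "$bin" --version                      # print the installed version
  fi
done
```

If either path is reported as missing, locate the binaries for your PostgreSQL installation and use those paths instead.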
Configuring Elasticsearch Backup
To configure Elasticsearch backup:
- Configure the TAMR_UNIFY_BACKUP_ES configuration variable using the Tamr administration utility. TAMR_UNIFY_BACKUP_ES specifies whether the generated backup file includes a complete snapshot of all data in the Tamr Elasticsearch instance. See Creating or Updating a Configuration Variable.
  - If set to true (default), the generated backup file includes a complete snapshot of all data in the Tamr Elasticsearch instance. Upon restore, the Elasticsearch instance is automatically restored from this snapshot.
  - If set to false, the generated backup file does not include a snapshot of data in the Tamr Elasticsearch instance. Upon restore, the Elasticsearch instance is not automatically restored. Restoring Elasticsearch requires running the re-indexing process, which may take several hours. Contact Tamr Support to re-index Elasticsearch.
- Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
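The choice between the two values can be sketched as follows. The variable `INCLUDE_ES_SNAPSHOT` and the function `describe_es_backup` are hypothetical helpers for illustration, the fallback for `TAMR_UNIFY_HOME` is a placeholder, and the `config:set` form follows the example command shown later in this section. Without the administration utility on the host, the command is printed rather than run.

```shell
ADMIN="${TAMR_UNIFY_HOME:-/path/to/tamr-home}/tamr/utils/unify-admin.sh"
INCLUDE_ES_SNAPSHOT="${INCLUDE_ES_SNAPSHOT:-true}"   # hypothetical helper variable

# Summarize what a restore does for each allowed value.
describe_es_backup() {
  case "$1" in
    true)  echo "snapshot included; Elasticsearch is restored automatically" ;;
    false) echo "no snapshot; Elasticsearch must be re-indexed after restore" ;;
    *)     echo "invalid value: $1 (expected true or false)"; return 1 ;;
  esac
}

if describe_es_backup "$INCLUDE_ES_SNAPSHOT"; then
  if [ -x "$ADMIN" ]; then
    "$ADMIN" config:set TAMR_UNIFY_BACKUP_ES="$INCLUDE_ES_SNAPSHOT"
  else
    echo "$ADMIN config:set TAMR_UNIFY_BACKUP_ES=$INCLUDE_ES_SNAPSHOT"   # dry run
  fi
fi
```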
Configuring Additional Configuration Variables for Backup
Optionally, you can back up additional configuration variables and then apply them using the Tamr restore process.
Note: Contact Tamr Support if you are not sure whether you need to back up any additional configuration variables.
To configure additional configuration variables for backup:
- Using the administration utility, set the configuration variable TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS to a comma-separated list of the Tamr configuration variables that you want to back up. For example, TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS can be set to ["TAMR_DEDUP_NUM_QUESTIONS", "TAMR_ES_MAX_CLAUSE_COUNT"]. See Creating or Updating a Configuration Variable.
- Restart Tamr and its dependencies. See Restart Tamr and its dependencies.
For example, to set the variable with the administration utility:
${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='["TAMR_DEDUP_NUM_QUESTIONS"]'
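When several variables are involved, the bracketed, quoted list is easy to mistype. The sketch below builds it from plain variable names in the same form as the examples above; the function name `extra_props_list` and the fallback for `TAMR_UNIFY_HOME` are hypothetical, and the script only prints the resulting command rather than running it.

```shell
ADMIN="${TAMR_UNIFY_HOME:-/path/to/tamr-home}/tamr/utils/unify-admin.sh"

# Join the arguments into a list of the form ["A", "B"].
extra_props_list() {
  out=""
  for name in "$@"; do
    out="${out:+$out, }\"$name\""         # add a comma only between items
  done
  printf '[%s]\n' "$out"
}

PROPS=$(extra_props_list TAMR_DEDUP_NUM_QUESTIONS TAMR_ES_MAX_CLAUSE_COUNT)

# Print the command for review before running it yourself.
echo "$ADMIN config:set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='$PROPS'"
```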
Note: The following configuration variables are always restored from the backup. This list may not be complete or up-to-date.
Configuration Variables That Are Always Restored
The following variables are restored by default and you don't need to manually configure their backups.
- TAMR_CATEGORIZATION_FEATURE_SCALING
- TAMR_CATEGORIZATION_GRADIENT_DESCENT_ITERATIONS
- TAMR_CATEGORIZATION_REGULARIZATION_PARAMETER
- TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_HIGH
- TAMR_CATEGORIZATION_STRENGTH_THRESHOLD_MEDIUM
- TAMR_DELTA_CONSOLIDATION_THRESHOLD
- TAMR_ES_ENABLED
- TAMR_ES_MAX_RESULT_WINDOW
- TAMR_JOB_SPARK_DRIVER_MEM
- TAMR_JOB_SPARK_EXECUTOR_MEM
- TAMR_JOB_SPARK_EXECUTOR_CORES
- TAMR_JOB_SPARK_PROPS
- TAMR_LLM_BATCH_SIZE
- TAMR_LLM_REFRESH_INTERVAL_IN_MILLISECONDS
- TAMR_LLM_TOPK
- TAMR_PUBAPI_NAME
- TAMR_SPARK_BROADCAST_ROW_LIMIT
- TAMR_SPARK_BROADCAST_SIZE_LIMIT_BYTES