Configuring Tamr Core Backup

To back up Tamr Core instances, create a backup directory and then back up various parts of the product.

Important: Server snapshots are not a replacement for Tamr Core application backups, so do not take server snapshots with the intention of using them as Tamr Core backups. Server snapshots do not capture Tamr Core configuration correctly. Additionally, if a snapshot is taken while Tamr Core is running, a later restore from that snapshot can leave HBase with a corrupt configuration. Instead, take Tamr Core application backups before introducing any changes.

Selecting a Backup Location

By default, Tamr Core stores backup files in the local filesystem directory: ${TAMR_UNIFY_HOME}/tamr/backups. Depending on your deployment, you can choose to store the backup files on the local filesystem, Google Cloud Platform (GCP), or AWS S3.

Tamr recommends using a distributed filesystem instead of the local filesystem for storing the backup files. In this way, you will not need to manually copy the backup files to the destination server on which you restore from a backup.

See Configuring a Backup Location, below, for instructions.

Selecting Components to Back Up

In addition to the Tamr Core application, you can configure backups for:

Backup Options for Deployments on GCP

In GCP cloud environments, Tamr Core can use cloud-native APIs to make the backup process faster and more efficient. See Configuring GCP native backup.

Configuring a Backup Location

Depending on your deployment type, configure a backup location on one of the following:

  • Local filesystem
  • Google Cloud Storage (GCS)
  • AWS S3

Configuring a Filesystem Backup Location

To configure a local filesystem backup location:

Set the value of the configuration variable TAMR_UNIFY_BACKUP_URI to a local filesystem directory using the administration utility. See Creating or Updating a Configuration Variable.
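
As a rough sketch of this step, the commands below create a backup directory and then set TAMR_UNIFY_BACKUP_URI, reusing the config:set syntax from the example later on this page. The BACKUP_DIR value is a hypothetical placeholder, and passing a plain directory path as the value is an assumption; confirm the exact form in Creating or Updating a Configuration Variable. If the administration utility is not found, the sketch only prints the command.

```shell
#!/usr/bin/env bash
# Sketch only: BACKUP_DIR is a hypothetical placeholder; choose a durable location.
BACKUP_DIR="${BACKUP_DIR:-/tmp/tamr-backups}"
ADMIN="${TAMR_UNIFY_HOME:-}/tamr/utils/unify-admin.sh"

# Create the backup directory if it does not exist yet.
mkdir -p "$BACKUP_DIR"

# Apply the setting when the utility is present; otherwise only print the command.
if [ -x "$ADMIN" ]; then
  "$ADMIN" config:set TAMR_UNIFY_BACKUP_URI="$BACKUP_DIR"
else
  echo "dry run: $ADMIN config:set TAMR_UNIFY_BACKUP_URI=$BACKUP_DIR"
fi
```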

Configuring a Google Cloud Storage (GCS) Backup Location

To configure a GCS backup location:

  1. Set TAMR_UNIFY_BACKUP_URI to the path to the backup and restore directory in this format: gs://<bucket>/<path/to/backup>, such as: gs://backup-bucket/backup1.
  2. Set TAMR_GOOGLE_APPLICATION_CREDENTIALS to an absolute local path to the service account credentials JSON file, such as: /tmp/gcs/creds.json. For more information, see Creating or Updating a Configuration Variable.
  3. Restart Tamr Core and its dependencies. See Restarting Tamr Core.
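
Before applying steps 1 and 2, it can help to check that the values have the expected shape. The sketch below is illustrative only: is_gcs_uri and is_absolute_path are hypothetical helper functions, not Tamr tools, and the script prints the config:set commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: is_gcs_uri and is_absolute_path are hypothetical helpers.

# Succeeds when the argument looks like gs://<bucket>/<path/to/backup>.
is_gcs_uri() {
  case "$1" in
    gs://?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the argument is an absolute local path.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *) return 1 ;;
  esac
}

BACKUP_URI="gs://backup-bucket/backup1"   # example value from step 1
CREDS_FILE="/tmp/gcs/creds.json"          # example value from step 2
ADMIN='${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh'   # kept literal for printing

if is_gcs_uri "$BACKUP_URI" && is_absolute_path "$CREDS_FILE"; then
  echo "$ADMIN config:set TAMR_UNIFY_BACKUP_URI=$BACKUP_URI"
  echo "$ADMIN config:set TAMR_GOOGLE_APPLICATION_CREDENTIALS=$CREDS_FILE"
else
  echo "check the URI and credentials path before setting them" >&2
fi
```

The checks are deliberately loose: they catch a missing gs:// scheme, a missing path, or a relative credentials path, but they do not verify that the bucket or the file exists.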

Configuring an AWS S3 Backup Location

See AWS Backup and Restore.

Configuring Postgres Backup and Restore Binaries

To configure Postgres backup and restore binaries:

  1. Set TAMR_PG_DUMP_BINARY to /usr/pgsql-12/bin/pg_dump and TAMR_PG_RESTORE_BINARY to /usr/pgsql-12/bin/pg_restore. See Creating or Updating a Configuration Variable.
  2. Restart Tamr Core and its dependencies. See Restarting Tamr Core.
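
The following sketch checks that the two binaries named in step 1 exist on the host before you point Tamr Core at them. The check_binary function is a hypothetical helper, and the script prints the config:set commands instead of executing them; adjust the paths if Postgres is installed elsewhere.

```shell
#!/usr/bin/env bash
# Sketch only: check_binary is a hypothetical helper, not part of Tamr Core.

# Succeeds when the given path exists and is executable.
check_binary() {
  [ -x "$1" ]
}

PG_DUMP="/usr/pgsql-12/bin/pg_dump"         # path from step 1
PG_RESTORE="/usr/pgsql-12/bin/pg_restore"   # path from step 1
ADMIN='${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh'   # kept literal for printing

# Report the state of each binary on this host.
for bin in "$PG_DUMP" "$PG_RESTORE"; do
  if check_binary "$bin"; then
    echo "found: $bin"
  else
    echo "missing or not executable: $bin" >&2
  fi
done

# Commands for step 1, printed rather than executed.
echo "$ADMIN config:set TAMR_PG_DUMP_BINARY=$PG_DUMP"
echo "$ADMIN config:set TAMR_PG_RESTORE_BINARY=$PG_RESTORE"
```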

Configuring Elasticsearch Backup

To configure Elasticsearch backup:

  1. Configure the TAMR_UNIFY_BACKUP_ES configuration variable using the Tamr administration utility. See Creating or Updating a Configuration Variable.
     • If set to true (default), the generated backup file includes a complete copy of all data in the Tamr Elasticsearch instance. Upon restore, the Elasticsearch instance is automatically restored from this copy.
     • If set to false, the generated backup file does not include a copy of the data in the Tamr Elasticsearch instance. Upon restore, the Elasticsearch instance is not automatically restored. Restoring Elasticsearch then requires running the re-indexing process, which may take several hours. Consult the Help Center knowledge base for details on re-indexing Elasticsearch.
  2. Restart Tamr Core and its dependencies. See Restarting Tamr Core.

Configuring Additional Configuration Variables for Backup

When restoring from backup, Tamr Core always restores variables that have the Tamr-supplied setting of machineSpecific: false. For up-to-date information about which configuration variables have this setting, see the Configuration Variable Reference.

You can specify additional configuration variables to restore from backup, using the TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS configuration variable.

Note: Contact Tamr Support if you are not sure whether you need to back up any additional configuration variables.

To configure additional configuration variables for backup:

  1. Using the administration utility, set the configuration variable TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS to a comma-separated list of the configuration variables that you want to back up, as shown in the example below. See Creating or Updating a Configuration Variable.
  2. Restart Tamr Core and its dependencies. See Restarting Tamr Core.

Example:

${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh config:set TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS='["TAMR_DEDUP_NUM_QUESTIONS", "TAMR_ES_MAX_CLAUSE_COUNT"]'

GCP Native Backup

When running on GCP services, Tamr Core uses native features to power its backup/restore function. This applies specifically to data stored in Bigtable, Cloud SQL, and Google Cloud Storage. Details about configuration for each service are below.

Bigtable

When Tamr Core is configured to run on Cloud Bigtable, it can use Bigtable's native backup API. When Tamr Core manages a large amount of data, the native backup API performs significantly faster than the export-based alternative.

The native backup API has the following limitations:

  • Backups may only be restored into the same Bigtable instance.
  • Backups expire after a set period, maximum 30 days.
  • Backups must be restored into new tables.

If needed, disable native backup by setting TAMR_BIGTABLE_BACKUP_NATIVE_ENABLED to false (default is true).

You can configure the expiration time (in days) of each backup using the variable TAMR_BIGTABLE_BACKUP_NATIVE_TTL. The minimum allowed is 1 day, and the maximum is 30 days. The default is 14 days.
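
As an illustration of the allowed range, the sketch below validates a TTL value before printing the command that would set it. The valid_ttl function and the TTL_DAYS value are hypothetical, and the config:set syntax is borrowed from the example earlier on this page.

```shell
#!/usr/bin/env bash
# Sketch only: valid_ttl is a hypothetical helper, not part of Tamr Core.

# Succeeds when the argument is a whole number of days from 1 to 30.
valid_ttl() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 30 ]
}

TTL_DAYS=7   # hypothetical value; the documented default is 14
ADMIN='${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh'   # kept literal for printing

if valid_ttl "$TTL_DAYS"; then
  echo "$ADMIN config:set TAMR_BIGTABLE_BACKUP_NATIVE_TTL=$TTL_DAYS"
else
  echo "TTL must be between 1 and 30 days" >&2
fi
```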

Important: Because backups are restored into new tables, Tamr Core restores into a new "namespace" and automatically updates TAMR_HBASE_NAMESPACE accordingly. The old namespace is left untouched, so the previous state (and its backups) remains available as a fallback. To avoid the additional storage costs, clean up the old namespace manually when appropriate. In addition, if you use a YAML file to set Tamr Core configuration, be sure to update the value of TAMR_HBASE_NAMESPACE (if set) before re-applying configuration from the file.

Cloud SQL

When Tamr Core is configured to run on Cloud SQL PostgreSQL, it can use Cloud SQL's native Admin API to perform backup. Backup and restore operations are typically faster with this API than with pg_dump, and the API does not require the pg_dump binary to be available.

If needed, disable Cloud SQL native backups in favor of pg_dump by setting TAMR_BACKUP_CLOUD_SQL_ENABLED to false (default true).

Google Cloud Storage

When using GCS for the Tamr Core filesystem and/or backup filesystem, Tamr Core uses gsutil to copy files efficiently. gsutil provides parallelism and allows direct copying between GCS locations (without downloading/uploading data via an intermediary).

To use gsutil, it must be present on the Tamr Core VM and on the PATH of Tamr Core services.

By default, gsutil is disabled. Enable gsutil by setting TAMR_BACKUP_GSUTIL_ENABLED to true.

If necessary, you can pass command line options to gsutil by setting TAMR_BACKUP_GSUTIL_EXTRA_ARGS.
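
A minimal sketch of a pre-check, assuming you want to confirm that gsutil resolves on the PATH before enabling it. The on_path function is a hypothetical helper, and the script prints the config:set command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: on_path is a hypothetical helper, not part of Tamr Core.

# Succeeds when the named command resolves in the current shell.
on_path() {
  command -v "$1" >/dev/null 2>&1
}

ADMIN='${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh'   # kept literal for printing

if on_path gsutil; then
  echo "gsutil found at: $(command -v gsutil)"
  echo "$ADMIN config:set TAMR_BACKUP_GSUTIL_ENABLED=true"
else
  echo "gsutil is not on PATH; install it or fix PATH before enabling" >&2
fi
```

This checks only the PATH of the shell that runs the script. The PATH seen by Tamr Core services may differ, so run the check as the user, and in the environment, that starts those services.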

