
AWS Backup and Restore

Steps to back up and restore a Tamr Core deployment on AWS.

This topic explains how to back up and restore single-node and cloud-native AWS deployments.

Single-Node AWS Deployment Backup and Restore

Before You Begin:

Configuring a Single-Node AWS S3 Backup Location

To configure an AWS S3 backup location using keys:

  1. Set TAMR_UNIFY_BACKUP_URI to s3://<bucket-name>/<path-to-backup>, TAMR_UNIFY_BACKUP_AWS_ACCESS_KEY_ID to <aws-access-key-id>, and TAMR_UNIFY_BACKUP_AWS_SECRET_ACCESS_KEY to <aws-secret-access-key>. See Creating or Updating a Configuration Variable.
  2. Restart Tamr Core and its dependencies. See Restarting.
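Before setting TAMR_UNIFY_BACKUP_URI, it can help to confirm that the value has the s3://<bucket-name>/<path-to-backup> shape described above. The following is a minimal sketch of such a check; the function names backup_bucket and backup_path are hypothetical helpers, not part of Tamr Core.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Tamr Core): split an
# s3://<bucket-name>/<path-to-backup> URI into its two parts.

# Prints the bucket name; returns 1 if the value is not an s3:// URI
# or the bucket name is empty.
backup_bucket() {
  local uri=$1
  case "$uri" in
    s3://?*) ;;
    *) return 1 ;;
  esac
  local rest=${uri#s3://}
  local bucket=${rest%%/*}
  [ -n "$bucket" ] || return 1
  printf '%s\n' "$bucket"
}

# Prints the path inside the bucket (an empty line if the URI names only a bucket).
backup_path() {
  local uri=$1
  backup_bucket "$uri" > /dev/null || return 1
  local rest=${uri#s3://}
  case "$rest" in
    */*) printf '%s\n' "${rest#*/}" ;;
    *)   printf '\n' ;;
  esac
}
```

For example, backup_bucket "s3://my-bucket/tamr/backups" prints my-bucket, and backup_path with the same argument prints tamr/backups. A value without the s3:// prefix makes both functions return a non-zero status.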

To configure an AWS S3 backup location using IAM roles:

  1. Set TAMR_UNIFY_BACKUP_AWS_ROLE_BASED_ACCESS to true. See Creating or Updating a Configuration Variable.
  2. Restart Tamr Core and its dependencies. See Restarting.

Note: During a backup to an AWS S3 location, a role with the following permissions is required for read/write access:

  • s3:GetBucketLocation
  • s3:GetBucketCORS
  • s3:GetObjectVersionForReplication
  • s3:GetObject
  • s3:GetBucketTagging
  • s3:GetObjectVersion
  • s3:GetObjectTagging
  • s3:ListMultipartUploadParts
  • s3:ListBucketByTags
  • s3:ListBucket
  • s3:ListObjects
  • s3:ListObjectsV2
  • s3:ListBucketMultipartUploads
  • s3:PutObject
  • s3:PutObjectTagging
  • s3:HeadBucket
  • s3:DeleteObject

Backup Single-Node AWS Deployment

Follow the instructions for single-node Backup.

Restore Single-Node AWS Deployment

Follow the instructions for single-node Restore.

Cloud-Native AWS Deployment Backup and Restore

Before You Begin:

  • Terraform (>= 0.13) is required.
  • Verify that the Tamr Core VM instance profile has read/write S3 permissions to the S3 bucket specified in TAMR_UNIFY_BACKUP_URI.
  • Verify that Tamr Core is deployed following the instructions in Deploying Tamr on AWS.
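One way to verify the read/write permissions mentioned above is to write, list, read, and delete a small probe object under the backup location using the AWS CLI. The following is a sketch under stated assumptions: the AWS CLI is installed, it is run on the Tamr Core VM so that the instance profile credentials apply, and the probe object name .tamr-backup-access-check is a made-up name chosen for illustration.

```shell
#!/bin/bash
# Sketch: confirm that the current credentials can write, list, read, and
# delete under the backup location. The probe object name is hypothetical.

check_backup_access() {
  local uri=${1%/}                                # strip one trailing slash, if any
  local key="${uri}/.tamr-backup-access-check"    # hypothetical probe object
  local tmp
  tmp=$(mktemp) || return 1
  echo "probe" > "$tmp"

  aws s3 cp "$tmp" "$key" --only-show-errors &&   # upload needs s3:PutObject
    aws s3 ls "$key" > /dev/null &&               # listing needs s3:ListBucket
    aws s3 cp "$key" - > /dev/null &&             # download needs s3:GetObject
    aws s3 rm "$key" --only-show-errors           # delete needs s3:DeleteObject
  local status=$?

  rm -f "$tmp"
  return "$status"
}
```

A typical call would be check_backup_access "s3://<bucket-name>/<path-to-backup>" && echo "read/write access OK". The check covers only the four basic operations; it does not exercise every permission in the list given for single-node deployments.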

Configuring a Cloud-Native AWS Backup

  1. Set TAMR_FILE_BASED_HBASE_BACKUP_ENABLED to true.
    To back up HBase files using the AWS CLI on the Tamr Core VM, set TAMR_BACKUP_AWS_CLI_ENABLED to true.
    Note: This mode of EMR HBase backup requires that the AWS CLI is installed on the Tamr Core VM and on the PATH.
    To back up by running an s3-dist-cp step on EMR instead, set TAMR_BACKUP_S3DISTCP_ENABLED to true, TAMR_BACKUP_AWS_CLI_ENABLED to false, and TAMR_BACKUP_EMR_CLUSTER_ID to the ID of a static EMR cluster (preferably the static HBase cluster that already exists in an AWS cloud-native deployment).
  2. If you are using the terraform-aws-tamr-config module, set TAMR_UNIFY_BACKUP_URI to an s3:// path, as shown in the example that follows.
  3. Set TAMR_UNIFY_BACKUP_ES to false. (Elasticsearch backup for Tamr Core on AWS cloud-native is not currently supported.)
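The two HBase backup modes in step 1 use different combinations of values, so a small consistency check can catch a mixed-up configuration before a backup runs. The sketch below is illustrative only: hbase_backup_mode is a hypothetical function, and treating "both modes enabled" as invalid is an assumption rather than documented Tamr Core behavior.

```shell
#!/bin/bash
# Sketch: report which HBase backup mode a set of values selects.
# Arguments: <TAMR_BACKUP_AWS_CLI_ENABLED> <TAMR_BACKUP_S3DISTCP_ENABLED> <TAMR_BACKUP_EMR_CLUSTER_ID>
hbase_backup_mode() {
  local aws_cli=$1 s3distcp=$2 cluster_id=$3

  if [ "$aws_cli" = "true" ] && [ "$s3distcp" != "true" ]; then
    echo "aws-cli"
  elif [ "$s3distcp" = "true" ] && [ "$aws_cli" = "false" ] && [ -n "$cluster_id" ]; then
    echo "s3-dist-cp"
  else
    # Assumption: any other combination (both enabled, neither enabled,
    # or s3-dist-cp without a cluster ID) is treated as a mistake.
    echo "invalid"
    return 1
  fi
}

# AWS CLI mode also requires the aws binary on the PATH of the Tamr Core VM.
aws_cli_on_path() {
  command -v aws > /dev/null 2>&1
}
```

For example, hbase_backup_mode true false "" prints aws-cli, while hbase_backup_mode false true "" prints invalid because no EMR cluster ID was supplied. When the mode is aws-cli, running aws_cli_on_path on the Tamr Core VM confirms the requirement in the note above.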

If you are using the terraform-aws-tamr-config module, you must set values for the following fields to configure backups using the AWS CLI.

module "tamr-config" {
  source = "git::git@github.com:Datatamer/terraform-aws-tamr-config"
  tamr_file_based_hbase_backup_enabled = true
  tamr_unify_backup_path = "tamr/backups"
  ...
}

If you are configuring backups using s3-dist-cp on EMR, you must set values for the following fields.

module "tamr-config" {
  source = "git::git@github.com:Datatamer/terraform-aws-tamr-config"
  tamr_file_based_hbase_backup_enabled = true
  tamr_backup_emr_cluster_id = module.emr.tamr_emr_cluster_id
  ...
}

Backup Cloud-Native AWS Deployment

Follow the instructions for single-node Backup.

Restore Cloud-Native AWS Deployment

Important: This procedure uses the terraform taint and terraform apply commands. Do not taint the S3 buckets that Tamr Core and the EMR cluster(s) use.

Step 1 Example:

The terraform taint command takes the address of the specific resource you want to delete. The following helper script invokes the terraform taint command against each resource provisioned by a module. Navigate to the directory where your Tamr Core AWS terraform modules are configured. For a module with the local name tamr-rds-postgres, you would run ./taint-module.sh tamr-rds-postgres after making the following script executable.

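#!/bin/bash
# Usage: ./taint-module.sh <module-local-name>
module=$1

# terraform state list prints full addresses (module.<name>.<...>), which is
# the form terraform taint expects, so each address is passed through unchanged.
# Data sources cannot be tainted; those calls report an error and the loop continues.
terraform state list | grep -F "module.${module}." | while IFS= read -r resource; do
  terraform taint "${resource}" < /dev/null
done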
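The helper script above handles one module per call. As a rough sketch, a small wrapper can run it for several module local names in turn and, with DRY_RUN=1, print the commands first so you can confirm that no S3 bucket module is included. The function taint_modules and the DRY_RUN variable are hypothetical, and the local names in the usage example are placeholders for the names in your own Terraform configuration.

```shell
#!/bin/bash
# Sketch: run taint-module.sh for each module local name given as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
taint_modules() {
  local module
  for module in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "./taint-module.sh ${module}"
    else
      ./taint-module.sh "${module}"
    fi
  done
}
```

For example, after running chmod +x taint-module.sh, the call DRY_RUN=1 taint_modules tamr-rds-postgres tamr-emr prints one ./taint-module.sh line per module without tainting anything. Dropping DRY_RUN=1 runs the script for real from the directory that holds your Terraform modules.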

To restore a cloud-native AWS deployment:

  1. Reapply all resources, except S3 buckets.
    a. Use the terraform taint command to taint the AWS cloud-native deployment terraform resources created from the following modules:
    - terraform-aws-rds-postgres
    - terraform-aws-emr
    - terraform-aws-es
    - terraform-aws-tamr-vm
    The terraform taint command skips data sources, which is expected. The command returns a message that the data sources cannot be tainted. See Deploying Tamr Core on AWS for information about terraform resources.
    b. Run terraform apply to recreate the resources you marked as tainted. If needed, run terraform apply twice: once to create the new resources and a second time to upload the updated Tamr Core configuration to S3.
  2. Start Tamr Core.
    Note: If you are running a static Spark cluster and chose not to recreate it in the previous step, you can add TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS: '["TAMR_JOB_EMR_CLUSTER_ID"]' to the Tamr Core configuration file. This setting restores that value along with the other backup configuration values. It applies only to a cloud-native environment running a static EMR cluster that is dedicated to Spark. If you are running both Spark and HBase on the same cluster, you must recreate that cluster.
    a. Mount the Tamr Core zip file and updated Tamr Core configuration to the new Tamr Core VM instance.
    b. Set the configuration values. See Creating or Updating a Configuration Variable.
    c. Start Tamr Core and its dependencies. See Restarting. Make sure to reset the Tamr Core configuration with the new values rendered by the terraform-aws-tamr-config module.
  3. Restore from backup by running POST /v1/instance/restore. Specify the path to a timestamped backup located at the URI set in the TAMR_UNIFY_BACKUP_URI configuration variable.
    Note: Tamr Core enters read-only mode for the duration of the restore.
  4. Stop Tamr Core and its dependencies. See Restarting.
  5. Terminate and reapply the HBase cluster.
    a. Run terraform taint 'module.<hbase module local name>.module.emr-cluster.aws_emr_cluster.emr-cluster[0]' to mark the HBase cluster for replacement.
    b. Run terraform apply to reapply the cluster.
  6. Start Tamr Core and its dependencies. See Restarting. Make sure to reset the Tamr Core configuration with the new values rendered by the terraform-aws-tamr-config module.
  7. Repopulate Elasticsearch indices.
    Upon restore, the Elasticsearch instance is not automatically restored. Restoring Elasticsearch requires running the re-indexing process, which may take several hours. Consult the Tamr Help Center for details on re-indexing Elasticsearch.
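The restore request in step 3 can be sent with any HTTP client. The sketch below uses curl and should be read as an illustration rather than the documented invocation: only the POST /v1/instance/restore endpoint comes from this page. The TAMR_URL base URL, the TAMR_AUTH header value, the text/plain content type, and sending the backup path as the request body are all assumptions to check against the Tamr Core API Reference. The function name restore_from_backup and the DRY_RUN variable are hypothetical.

```shell
#!/bin/bash
# Sketch: call the restore endpoint named in step 3.
# ASSUMPTIONS (not from this page): TAMR_URL (API base URL), TAMR_AUTH
# (Authorization header value), and a plain-text body holding the backup path.
restore_from_backup() {
  local backup_path=$1
  local base=${TAMR_URL:-}
  local url="${base%/}/v1/instance/restore"

  if [ -z "$backup_path" ]; then
    echo "usage: restore_from_backup <path-to-timestamped-backup>" >&2
    return 2
  fi

  # DRY_RUN=1 prints the request instead of sending it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST ${url} body=${backup_path}"
    return 0
  fi

  # --fail makes curl return a non-zero status on an HTTP error response.
  curl --fail --silent --show-error \
    -X POST "$url" \
    -H "Authorization: ${TAMR_AUTH:-}" \
    -H "Content-Type: text/plain" \
    --data "$backup_path"
}
```

Running it first with DRY_RUN=1 shows the URL and body that would be sent, which is a cheap way to catch a wrong base URL or backup path before Tamr Core enters read-only mode for the restore.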
