
AWS Backup and Restore

Steps to back up and restore a Tamr Core deployment on AWS.

This topic explains how to back up and restore:

  • A single-node AWS deployment
  • A cloud-native AWS deployment

Single-Node AWS Deployment Backup and Restore

Before You Begin:

Configuring a Single-Node AWS S3 Backup Location

To back up to S3, set the following configuration. See Setting Configuration Variables for details on how to set and apply Tamr Core configuration.

---
TAMR_UNIFY_BACKUP_AWS_ROLE_BASED_ACCESS: true
TAMR_UNIFY_BACKUP_URI: "s3a://<bucket-name>/<path-to-backup>"
TAMR_BACKUP_FS_EXTRA_CONFIG: "{'fs.s3a.server-side-encryption-algorithm':'AES256'}"
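TAMR_UNIFY_BACKUP_URI must use the s3a://<bucket-name>/<path-to-backup> shape shown above. The following bash function is a hypothetical pre-flight check, not part of Tamr Core, that rejects other shapes (for example an s3:// scheme or a missing path) and prints the bucket name so it can be reused elsewhere. The function name backup_uri_bucket is illustrative.

```shell
# Hypothetical helper (not part of Tamr Core): validate a TAMR_UNIFY_BACKUP_URI
# value and print its bucket name. Returns non-zero for any other shape.
backup_uri_bucket() {
  local uri="$1"
  case "$uri" in
    s3a://?*/?*) ;;                       # s3a scheme, non-empty bucket, non-empty path
    *)
      echo "expected s3a://<bucket-name>/<path-to-backup>, got: $uri" >&2
      return 1
      ;;
  esac
  local rest="${uri#s3a://}"              # strip the scheme
  printf '%s\n' "${rest%%/*}"             # bucket = everything before the first '/'
}
```

For example, backup_uri_bucket "s3a://my-bucket/tamr/backups" prints my-bucket, while a value that starts with s3:// is rejected with a message on stderr.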

Requirements to Back Up to S3

Backing up to AWS S3 requires the EC2 instance to have an instance profile whose IAM role grants the following read/write permissions:

  • s3:GetBucketLocation
  • s3:GetBucketCORS
  • s3:GetObjectVersionForReplication
  • s3:GetObject
  • s3:GetBucketTagging
  • s3:GetObjectVersion
  • s3:GetObjectTagging
  • s3:ListMultipartUploadParts
  • s3:ListBucketByTags
  • s3:ListBucket
  • s3:ListObjects
  • s3:ListObjectsV2
  • s3:ListBucketMultipartUploads
  • s3:PutObject
  • s3:PutObjectTagging
  • s3:HeadBucket
  • s3:DeleteObject
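If you manage the instance profile role by hand, the permissions above can be collected into an IAM policy document. The bash function below is a hypothetical sketch, not a Tamr-provided policy: it grants exactly the listed actions on one bucket, applying them to both the bucket ARN and the objects in it. The function name and the single-statement layout are assumptions; adapt them to your organization's IAM conventions.

```shell
# Hypothetical helper: print an IAM policy document granting the S3 actions
# listed above on a single bucket (bucket-level and object-level ARNs).
s3_backup_policy() {
  local bucket="$1"
  local actions=(
    s3:GetBucketLocation s3:GetBucketCORS s3:GetObjectVersionForReplication
    s3:GetObject s3:GetBucketTagging s3:GetObjectVersion s3:GetObjectTagging
    s3:ListMultipartUploadParts s3:ListBucketByTags s3:ListBucket
    s3:ListObjects s3:ListObjectsV2 s3:ListBucketMultipartUploads
    s3:PutObject s3:PutObjectTagging s3:HeadBucket s3:DeleteObject
  )
  # Quote each action and join them with ", " to form the JSON array body.
  local joined
  joined=$(printf '"%s", ' "${actions[@]}")
  joined="${joined%, }"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [${joined}],
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ]
    }
  ]
}
EOF
}
```

For example, s3_backup_policy <bucket-name> > policy.json writes the document, which can then be attached with aws iam put-role-policy --role-name <instance-profile-role> --policy-name <policy-name> --policy-document file://policy.json. Review the generated document before attaching it; policy validators such as IAM Access Analyzer may warn about entries they do not recognize.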

Backup Single-Node AWS Deployment

Follow the instructions for single-node Backup.

Restore Single-Node AWS Deployment

Follow the instructions for single-node Restore.

Cloud-Native AWS Deployment Backup and Restore

Configuring a Cloud-Native AWS Backup

The Tamr terraform-aws-tamr-config module sets up a default backup configuration. Follow the examples in the module to configure backup settings.

The following settings require configuration:

  • Set tamr_unify_backup_path to an S3 path in the tamr_data_bucket to which the Tamr Core VM and EMR instance profiles have read/write permission.
  • Set tamr_backup_emr_cluster_id to the EMR cluster ID of the HBase cluster.

module "tamr-config" {
  source = "git::git@github.com:Datatamer/terraform-aws-tamr-config"
  # EMR cluster ID of the HBase cluster
  tamr_backup_emr_cluster_id = module.emr.tamr_emr_cluster_id
  # S3 path, within tamr_data_bucket, where backups are written
  tamr_unify_backup_path = "tamr/backups"
  ...
}

Backup Cloud-Native AWS Deployment

Follow the instructions for single-node Backup.

Restore Cloud-Native AWS Deployment

Important: This procedure uses the Terraform taint and apply commands. Do not taint the S3 buckets used by Tamr Core and the EMR cluster(s).

Step 1 Example:

The terraform taint command takes the full address of a resource and marks that resource for replacement, so that the next terraform apply destroys and recreates it. The following helper script runs terraform taint against each resource provisioned by a module. Navigate to the directory where your Tamr Core AWS Terraform modules are configured. For a module with the local name tamr-rds-postgres, make the script executable and then run ./taint-module.sh tamr-rds-postgres.

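#!/bin/bash
# taint-module.sh: mark every resource in one Terraform module for replacement.
# Usage: ./taint-module.sh <module-local-name>
module="${1:?usage: $0 <module-local-name>}"

# "terraform state list module.<name>" prints the full address of each resource in
# the module (including nested modules); terraform taint needs that full address.
terraform state list "module.${module}" | while read -r resource; do
  # Data sources cannot be tainted; terraform reports an error for them, which is expected.
  terraform taint "${resource}"
done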
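Because an S3 bucket must never be tainted during a restore, it can help to filter the addresses before passing them to terraform taint. The bash function below is a hypothetical safety filter, not part of the Tamr Terraform modules: it reads resource addresses (one per line, as printed by terraform state list), strips the module path to find the resource part, and drops data sources and aws_s3_bucket resources. It assumes the AWS provider's aws_s3_bucket resource type is what creates your buckets.

```shell
# Hypothetical filter: read Terraform resource addresses on stdin and print only
# those that are safe to taint (no data sources, no aws_s3_bucket resources).
taintable_addresses() {
  local address local_part
  while IFS= read -r address; do
    # Remove leading module segments such as "module.emr." or "module.x[0]."
    local_part=$(printf '%s\n' "$address" | sed -E 's/^(module\.[^.[]+(\[[^]]*\])?\.)+//')
    case "$local_part" in
      data.*) continue ;;            # data sources cannot be tainted
      aws_s3_bucket.*) continue ;;   # never taint the S3 buckets
    esac
    printf '%s\n' "$address"
  done
}
```

For example: terraform state list module.tamr-rds-postgres | taintable_addresses | while read -r a; do terraform taint "$a"; done.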

To restore a cloud-native AWS deployment:

  1. Recreate all resources except the S3 buckets.
    a. Use the terraform taint command to mark for replacement the Terraform resources created by the following modules:
    - terraform-aws-rds-postgres
    - terraform-aws-emr
    - terraform-aws-es
    - terraform-aws-tamr-vm
    The terraform taint command does not taint data sources and returns a message saying that they cannot be tainted; this is expected. See Deploying Tamr Core on AWS for information about the Terraform resources.
    b. Run terraform apply to recreate the resources you marked as tainted. If needed, run apply twice: once to create the new resources and a second time to upload the updated Tamr Core configuration to S3.
  2. Start Tamr Core.
    Note: If you are running with a static Spark cluster and chose not to recreate it in the previous step, you can add TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS: '["TAMR_JOB_EMR_CLUSTER_ID"]' to the Tamr Core configuration file. With this setting, the restore also restores TAMR_JOB_EMR_CLUSTER_ID along with the other backed-up configuration values. This applies only to a cloud-native environment with a static EMR cluster dedicated to running Spark. If you run both Spark and HBase on the same cluster, you must recreate that cluster.
    a. Mount the Tamr Core zip file and updated Tamr Core configuration to the new Tamr Core VM instance.
    b. Set the configuration values. See Setting Configuration Variables.
    c. Start Tamr Core and its dependencies. See Restarting. Make sure to reset the Tamr Core configuration with the new values rendered by the terraform-aws-tamr-config module.
  3. Restore from backup by calling POST /v1/instance/restore with the path to a timestamped backup located under the URI set by the TAMR_UNIFY_BACKUP_URI configuration variable.
    Note: Tamr Core enters read-only mode for the duration of the restore.
  4. Stop Tamr Core and its dependencies. See Restarting.
  5. Terminate and reapply the HBase cluster.
    a. Run terraform taint 'module.<hbase module local name>.module.emr-cluster.aws_emr_cluster.emr-cluster[0]' to mark the HBase cluster for replacement.
    b. Run terraform apply to reapply the cluster.
  6. Start Tamr Core and its dependencies. See Restarting. Make sure to reset the Tamr Core configuration with the new values rendered by the terraform-aws-tamr-config module.
  7. Repopulate OpenSearch indices.
    The restore does not restore the OpenSearch instance. Restoring OpenSearch requires running the re-indexing process, which may take several hours. Consult the Tamr Help Center for details on re-indexing OpenSearch (formerly "Elasticsearch").
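The restore call in step 3 can be scripted so that a mistyped backup path is caught before Tamr Core enters read-only mode. The bash function below is a hypothetical sketch, not Tamr tooling: it checks that the backup path lies under TAMR_UNIFY_BACKUP_URI and then sends POST /v1/instance/restore with curl. The base URL variable TAMR_API_BASE (for example http://<tamr-host>:9100/api/versioned), the Authorization header value in TAMR_AUTH_HEADER, and the JSON-string request body are all assumptions; confirm them in the Tamr Core API reference for your version.

```shell
# Hypothetical helper for restore step 3.
# Assumed environment (verify for your Tamr Core version):
#   TAMR_UNIFY_BACKUP_URI  same value as the Tamr Core configuration variable
#   TAMR_API_BASE          versioned API root, e.g. http://<tamr-host>:9100/api/versioned
#   TAMR_AUTH_HEADER       complete Authorization header value
tamr_restore() {
  local backup_path="$1"
  local backup_uri="${TAMR_UNIFY_BACKUP_URI%/}"   # drop a trailing slash, if any
  # Refuse paths that are not inside the configured backup location.
  case "$backup_path" in
    "$backup_uri"/?*) ;;
    *)
      echo "backup path must be under ${backup_uri}/, got: ${backup_path}" >&2
      return 1
      ;;
  esac
  # --fail makes curl exit non-zero on an HTTP error status.
  # Assumption: the request body is the backup path as a JSON string.
  curl --fail -sS -X POST \
    -H "Authorization: ${TAMR_AUTH_HEADER}" \
    -H "Content-Type: application/json" \
    -d "\"${backup_path}\"" \
    "${TAMR_API_BASE}/v1/instance/restore"
}
```

For example, with the three variables set, run tamr_restore s3a://<bucket-name>/<path-to-backup>/<timestamped-backup>. If your instance expects a different request body, adjust only the -d argument.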