AWS Backup and Restore
Steps to back up and restore a Tamr Core deployment on AWS.
This topic explains how to back up and restore single-node and cloud-native AWS deployments.
Single-Node AWS Deployment Backup and Restore
Before You Begin:
- Understand Backup Configuration requirements.
- Verify that Tamr Core is deployed following the instructions in Deploying Tamr Core on AWS.
Configuring a Single-Node AWS S3 Backup Location
The following configuration variables must be set to back up to S3. See Setting Configuration Variables for details on how to set and apply Tamr Core configuration.
---
TAMR_UNIFY_BACKUP_AWS_ROLE_BASED_ACCESS: true
TAMR_UNIFY_BACKUP_URI: "s3a://<bucket-name>/<path-to-backup>"
TAMR_BACKUP_FS_EXTRA_CONFIG: "{'fs.s3a.server-side-encryption-algorithm':'AES256'}"
Requirements to Backup to S3
Backing up to AWS S3 requires that the EC2 instance is configured with an instance profile whose role grants the following permissions for read/write access:
- s3:GetBucketLocation
- s3:GetBucketCORS
- s3:GetObjectVersionForReplication
- s3:GetObject
- s3:GetBucketTagging
- s3:GetObjectVersion
- s3:GetObjectTagging
- s3:ListMultipartUploadParts
- s3:ListBucketByTags
- s3:ListBucket
- s3:ListObjects
- s3:ListObjectsV2
- s3:ListBucketMultipartUploads
- s3:PutObject
- s3:PutObjectTagging
- s3:HeadBucket
- s3:DeleteObject
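One way to confirm that the instance profile grants working read/write access is to write, list, and delete a small probe object from the EC2 instance. The following is a minimal sketch, assuming the AWS CLI is installed on the instance. The function names and the probe file name are hypothetical and are not part of Tamr Core, and the check exercises only a few of the permissions listed above.

```shell
#!/bin/bash
# Hypothetical helper: convert the s3a:// URI used in TAMR_UNIFY_BACKUP_URI
# into the s3:// form that the AWS CLI expects.
s3a_to_s3() {
  case "$1" in
    s3a://*) printf 's3://%s\n' "${1#s3a://}" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

# Hypothetical helper: write, list, and delete a probe object under the backup path.
# Returns non-zero at the first operation that the instance role cannot perform.
check_backup_access() {
  base="$(s3a_to_s3 "$1")"
  base="${base%/}"
  probe="${base}/tamr-backup-access-check.txt"   # assumed probe file name
  echo "probe" | aws s3 cp - "${probe}" || return 1   # needs s3:PutObject
  aws s3 ls "${base}/" || return 1                    # needs s3:ListBucket
  aws s3 rm "${probe}" || return 1                    # needs s3:DeleteObject
}
```

For example, after sourcing the file on the EC2 instance, run check_backup_access "s3a://<bucket-name>/<path-to-backup>" with the same value you set for TAMR_UNIFY_BACKUP_URI.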
Backup Single-Node AWS Deployment
Follow the instructions for single-node Backup.
Restore Single-Node AWS Deployment
Follow the instructions for single-node Restore.
Cloud-Native AWS Deployment Backup and Restore
Configuring a Cloud-Native AWS Backup
The Tamr terraform-aws-tamr-config module sets up default backup configuration. Follow the examples in the module to configure backup settings.
The following settings require configuration:
- Set tamr_unify_backup_path to an S3 path in the tamr_data_bucket to which the Tamr Core VM and EMR instance profiles have read/write permission.
- Set tamr_backup_emr_cluster_id to the EMR cluster ID of the HBase cluster.
module "tamr-config" {
source = "git::git@github.com:Datatamer/terraform-aws-tamr-config"
tamr_backup_emr_cluster_id = module.emr.tamr_emr_cluster_id
tamr_unify_backup_path = "tamr/backups"
...
}
Backup Cloud-Native AWS Deployment
Follow the instructions for single-node Backup.
Restore Cloud-Native AWS Deployment
Important: This procedure uses the Terraform taint and apply commands. Do not taint the S3 buckets that were provisioned for use by Tamr Core and the EMR cluster(s).
Step 1 Example:
The terraform taint command takes the address of the specific resource you want to mark for recreation. The following helper script invokes the terraform taint command against each resource provisioned by a module. Navigate to the directory where your Tamr Core AWS terraform modules are configured. For a module with the local name tamr-rds-postgres, you would run ./taint-module.sh tamr-rds-postgres after making the following script executable.
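#!/bin/bash
# Taint every resource that belongs to the module whose local name is passed as $1.
module=$1
# terraform taint takes the full resource address, so the "module.<name>." prefix is kept.
for resource in $(terraform state list | grep "^module\.${module}\."); do
terraform taint "${resource}"
done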
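Because the S3 buckets must not be tainted, a more defensive variant of the helper can filter the address list before tainting anything. The following is a sketch under that assumption, not the script shipped with the Tamr modules. The function names are hypothetical, and the aws_s3_bucket filter deliberately also skips related resource types such as bucket policies.

```shell
#!/bin/bash
# Hypothetical helper: list the addresses in one module that are safe to taint.
# Skips data sources (they cannot be tainted) and anything whose address
# contains aws_s3_bucket (the buckets must be kept).
taintable_resources() {
  terraform state list \
    | grep "^module\.$1\." \
    | grep -v '\.data\.' \
    | grep -v 'aws_s3_bucket'
}

# Hypothetical helper: taint each remaining address, one at a time.
taint_module() {
  taintable_resources "$1" | while IFS= read -r resource; do
    terraform taint "${resource}" < /dev/null
  done
}
```

Running taintable_resources tamr-rds-postgres first prints the addresses without changing state, so you can review the list before calling taint_module tamr-rds-postgres.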
To restore a cloud-native AWS deployment:
- Reapply all resources, except S3 buckets.
a. Use the terraform taint command to taint the AWS cloud-native deployment terraform resources created from the following modules:
- terraform-aws-rds-postgres
- terraform-aws-emr
- terraform-aws-es
- terraform-aws-tamr-vm
The terraform taint command skips data sources, which is expected. The command returns a message that the data sources cannot be tainted. See Deploying Tamr Core on AWS for information about terraform resources.
b. Run terraform apply to recreate the resources you marked as tainted. If needed, run apply twice: once to apply new resources and a second time to upload the updated Tamr Core configuration to S3.
- Start Tamr Core.
Note: If you are running with a static Spark cluster and chose not to recreate the Spark cluster in the previous step, you can add the following setting to the Tamr Core configuration file:
TAMR_UNIFY_BACKUP_EXTRA_CONFIG_PROPS: '["TAMR_JOB_EMR_CLUSTER_ID"]'
This setting restores the TAMR_JOB_EMR_CLUSTER_ID value along with the other backup configuration values. It applies only to a cloud-native environment running a static EMR cluster that is dedicated to running only Spark. If you are running both Spark and HBase on the same cluster, you must recreate this resource.
a. Mount the Tamr Core zip file and updated Tamr Core configuration to the new Tamr Core VM instance.
b. Set the configuration values. See Setting Configuration Variables.
c. Start Tamr Core and its dependencies. See Restarting. Make sure to reset the Tamr Core configuration with the new values rendered by the terraform-aws-tamr-config module.
- Restore from backup by running POST /v1/instance/restore. Specify the path to a timestamped backup located at the URI set by the TAMR_UNIFY_BACKUP_URI configuration variable.
Note: Tamr Core enters read-only mode for the duration of the restore.
- Stop Tamr Core and its dependencies. See Restarting.
- Terminate and reapply the HBase cluster.
a. Run terraform taint 'module.<hbase module local name>.module.emr-cluster.aws_emr_cluster.emr-cluster[0]' to mark the HBase cluster for replacement.
b. Run terraform apply to reapply the cluster.
- Start Tamr Core and its dependencies. See Restarting. Make sure to reset the Tamr Core configuration with the new values rendered by the terraform-aws-tamr-config module.
- Repopulate OpenSearch indices.
Upon restore, the OpenSearch instance is not automatically restored. Restoring OpenSearch requires running the re-indexing process, which may take several hours. Consult the Tamr Help Center for details on re-indexing OpenSearch (formerly "Elasticsearch").
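The restore request in the procedure above can be sent with curl. The following is only a sketch: the TAMR_API_BASE and TAMR_AUTH_HEADER variables, the function name, and the plain-text request body are assumptions made for illustration, so check the Tamr Core API reference for the exact base URL, authentication scheme, and body format before using it.

```shell
#!/bin/bash
# Hypothetical helper: send the restore request for one timestamped backup.
# Assumptions (not confirmed by this topic):
#   TAMR_API_BASE    - base URL to which the /v1/instance/restore path is appended
#   TAMR_AUTH_HEADER - value of the Authorization header for an admin user
#   request body     - the backup path sent as plain text
restore_from_backup() {
  backup_path=$1
  curl --fail --silent --show-error \
    --request POST \
    --header "Authorization: ${TAMR_AUTH_HEADER}" \
    --header "Content-Type: text/plain" \
    --data "${backup_path}" \
    "${TAMR_API_BASE}/v1/instance/restore"
}
```

With --fail, curl exits with a non-zero status when the server responds with an HTTP error, so a wrapper script can stop before continuing to the steps that stop Tamr Core and reapply the HBase cluster.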