How to Automate Backup of Tamr on a Daily or Weekly Basis

Scheduling Tamr Backups

Take regular backups of your Tamr environment so that you can recover from unexpected failures. Backups can be automated in a number of ways, but the simplest is a Python script scheduled with the command-line utility crontab.

Note: tamr-toolbox must be installed as a prerequisite for the Python script below.

An example Python script that creates a backup:


import tamr_toolbox as tbox
import click
from typing import Dict

@click.command()
@click.option("--config_file_path", required=True, help="Path to the YAML configuration file")
def main(config_file_path: str) -> Dict[str, str]:
   """Takes a backup of a Tamr instance

   Args:
       config_file_path: File path to the yaml file containing configuration information

   Returns:
       JSON of completed backup information

   Raises:
       RuntimeError: Raised if the backup state is not "SUCCEEDED" upon completion
   """

   # Load the configuration from the file path provided
   config = tbox.utils.config.from_yaml(path_to_file=config_file_path)

   # Set up a logger that writes to the configured logging directory
   logger = tbox.utils.logger.create(__name__, log_directory=config["logging_dir"])

   # Create the tamr client
   tamr_client = tbox.utils.client.create(**config["my_instance_name"])

   # Run a backup of Tamr and wait until it completes
   logger.info("About to run backup")
   op = tbox.workflow.backup.initiate_backup(tamr_client, connection_retry_timeout_seconds=3600)
   state = op.json()["state"]

   if state == "SUCCEEDED":
       logger.info(f"Completed backup successfully: {op.json()}")
       return op.json()
   else:
       failure_message = f"Backup failed: {op.json()}"
       logger.error(failure_message)
       raise RuntimeError(failure_message)

if __name__ == "__main__":
   main()

The script can be used with a YAML configuration file such as this one:

my_instance_name:
   host: "0.0.0.0"
   protocol: "http"
   port: "9100"
   username: "my_user_name"
   password: $MY_PASSWORD

logging_dir: "../logs"

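Update this configuration file with the connection information for your Tamr instance. The my_instance_name section supplies the connection settings and logging_dir is the directory where the script writes its logs.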
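Because the scheduled job runs unattended, it can help to check the configuration before relying on it. The following is a minimal sketch, not part of tamr-toolbox: find_config_problems is a hypothetical helper that takes the configuration as a plain dictionary (as produced by loading the YAML file above) and reports any of the keys used by the backup script that are missing or empty.

```python
from typing import Any, Dict, List

# Keys that appear in the example configuration above
REQUIRED_INSTANCE_KEYS = ("host", "protocol", "port", "username", "password")


def find_config_problems(
    config: Dict[str, Any], instance_key: str = "my_instance_name"
) -> List[str]:
    """Return a list of problems found; an empty list means no problems were found.

    Hypothetical helper for illustration only. It checks for presence of values,
    not whether the connection details are actually correct.
    """
    problems = []

    # The instance section must exist and be a mapping
    instance = config.get(instance_key)
    if not isinstance(instance, dict):
        problems.append(f"missing section: {instance_key}")
    else:
        for key in REQUIRED_INSTANCE_KEYS:
            if not instance.get(key):
                problems.append(f"missing or empty value: {instance_key}.{key}")

    # The backup script also reads the logging directory
    if not config.get("logging_dir"):
        problems.append("missing or empty value: logging_dir")

    return problems
```

Calling find_config_problems on the loaded configuration and stopping when the returned list is non-empty surfaces a misconfigured file at setup time rather than at the first scheduled run.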

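To schedule the script, add a line to your crontab. The examples below use the per-user crontab format (edit it with crontab -e). If you add the entry to the system file /etc/crontab instead, insert the name of the user that should run the job between the schedule fields and the command. Two example schedules are shown below.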

To run every Saturday at 1AM:

0 1 * * 6 python /home/username/scripts/backup.py --config_file_path /home/username/conf/standard-config.yaml >/dev/null 2>&1

To run on the first of the month at 11PM:

0 23 1 * * python /home/username/scripts/backup.py --config_file_path /home/username/conf/standard-config.yaml >/dev/null 2>&1
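Each entry above consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. As an illustration of that structure, here is a minimal sketch that assembles such a line. The function build_cron_line is a hypothetical helper, not part of cron or tamr-toolbox, and it only checks that the schedule has five fields, not that each field is valid.

```python
import shlex


def build_cron_line(
    schedule: str, script_path: str, config_path: str, python_bin: str = "python"
) -> str:
    """Build a per-user crontab entry that runs the backup script.

    Hypothetical helper for illustration only.
    """
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(
            "schedule must have 5 fields: minute hour day-of-month month day-of-week"
        )

    # Quote each argument so paths containing spaces survive the shell
    command = " ".join(
        shlex.quote(part)
        for part in [python_bin, script_path, "--config_file_path", config_path]
    )
    schedule_text = " ".join(fields)

    # Discard stdout and stderr, matching the examples above
    return f"{schedule_text} {command} >/dev/null 2>&1"


# Every Saturday at 1 AM
print(
    build_cron_line(
        "0 1 * * 6",
        "/home/username/scripts/backup.py",
        "/home/username/conf/standard-config.yaml",
    )
)
```

Since the entries redirect all output to /dev/null, the log files written to logging_dir are where a failed backup would be recorded.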

Important

Store backups on a system separate from the Tamr installation to minimize the risk that the backup and the installation are both affected by the same failure event. We recommend configuring your instance of Tamr to write backups directly to cloud storage on S3, GCS, or ADLS. See our public docs on backup and restore.