Tamr Documentation

Upgrading Tamr

Upgrade a single-node Tamr installation.

Upgrading from Versions Earlier than 2019.019

If you are upgrading from a Tamr version earlier than 2019.019, contact Tamr Support for upgrade assistance.

Checkpoint Versions and Upgrades

To upgrade Tamr version 2019.019 or later to a more recent version, you must upgrade to each of the checkpoint versions released between your version and the newer, target version.

The following Tamr versions are checkpoint versions:

  • v2021.002
  • v2020.016
  • v2020.004

For example, upgrading from v2020.012 to v2020.019 requires two upgrade stages: first from v2020.012 to the checkpoint version v2020.016, and then from v2020.016 to v2020.019.

The upgrade utility prevents you from upgrading past a checkpoint version. The Release Notes also indicate each checkpoint release.
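
The staging rule above can be sketched as a small shell helper. This is an illustrative sketch only, not part of the Tamr tooling: the function name upgrade_path and the CHECKPOINTS variable are hypothetical, and the sketch assumes unpatched, zero-padded version strings of the form vYYYY.NNN so that plain string comparison matches release order.

```shell
#!/bin/sh
# Checkpoint versions listed in this document, oldest first.
CHECKPOINTS="v2020.004 v2020.016 v2021.002"

# Print each version to upgrade to, in order, one per line.
# Arguments: <current-version> <target-version>
# Assumption: versions look like vYYYY.NNN (no patch suffix).
upgrade_path() {
    current=$1
    target=$2
    for ckpt in $CHECKPOINTS; do
        # Keep checkpoints strictly between the current and target versions.
        if expr "$ckpt" \> "$current" >/dev/null &&
           expr "$ckpt" \< "$target" >/dev/null; then
            echo "$ckpt"
        fi
    done
    # The final stage is always the target version itself.
    echo "$target"
}

upgrade_path v2020.012 v2020.019
```

For the example in the text, this prints v2020.016 followed by v2020.019, one stage per line.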

Upgrading from Any Version to a Patched Version

Patches provide critical updates, such as fixes for support issues and security improvements. We strongly recommend upgrading to available patches for your release version. The upgrade process is the same as upgrading to a newer version of Tamr.

Upgrading to a Patched Checkpoint Release

If you are upgrading to patched checkpoint release v2020.004.2, run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patch release. Otherwise, you will receive a validation error indicating that you need to first upgrade to the non-patched version of the checkpoint release (v2020.004.0).

When upgrading to patched checkpoint release v2020.016.4, v2021.002.1, or later, you do not need to run with the --skipUpgradeStatusValidation flag.

For more information about upgrade validation checks, see Validation.
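
As a rough illustration of the rule above, the following sketch picks the extra flag based on the target version. The function name status_validation_flag is hypothetical, and the sketch only covers v2020.004.2, the one release this section names as requiring the flag.

```shell
#!/bin/sh
# Echo the extra upgrade flag needed for a target version, or nothing.
# Assumption: only v2020.004.2 needs the flag, as stated in this section.
status_validation_flag() {
    case "$1" in
        v2020.004.2) echo "--skipUpgradeStatusValidation" ;;
        *) : ;;  # v2020.016.4, v2021.002.1, and later need no extra flag
    esac
}

flag=$(status_validation_flag v2020.004.2)
echo "extra flag: ${flag:-none}"
```

The returned value could be appended to the unify-admin.sh --upgrade command line when it is not empty.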

About Spark Upgrades

Periodically, Tamr upgrades its version of Spark. This change is reflected in release notes for the Tamr version in which Spark was upgraded. Upgrading the Spark version occurs as part of the upgrade process to the Tamr version that contains the upgraded version of Spark.

Starting with Tamr v2020.015.0, Tamr uses Spark 2.4.5. When you upgrade to Tamr v2020.015.0 or greater, the upgrade process leaves the Spark 2.2 directory, ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7, as is. After you complete the upgrade and run the upgrade validation checks, you can copy any files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory that you wish to keep and move them to the corresponding directory for Spark 2.4. You can then remove the Spark 2.2 directory.
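
The copy step described above might look like the following sketch. The function name copy_spark_files is hypothetical, and the name of the Spark 2.4 directory is not given in this document, so it is passed in as an argument. Files that already exist at the destination are skipped rather than overwritten.

```shell
#!/bin/sh
# Copy selected files from the old Spark directory into the new one,
# skipping any file that is missing or already present at the destination.
# Arguments: <old-spark-dir> <new-spark-dir> <relative-path>...
copy_spark_files() {
    old_dir=$1
    new_dir=$2
    shift 2
    for rel in "$@"; do
        src="$old_dir/$rel"
        dst="$new_dir/$rel"
        if [ ! -f "$src" ]; then
            echo "skip (missing): $src" >&2
        elif [ -e "$dst" ]; then
            echo "skip (exists): $dst" >&2
        else
            mkdir -p "$(dirname "$dst")"
            cp -p "$src" "$dst"   # -p preserves mode and timestamps
            echo "copied: $rel"
        fi
    done
}
```

A call would pass ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 as the first argument, the Spark 2.4 directory of your installation as the second, and then the relative paths to keep, for example conf/spark-defaults.conf if you customized it. Remove the Spark 2.2 directory only after confirming that the copied files are in place.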

About Elasticsearch Upgrades

Periodically, Tamr upgrades its version of Elasticsearch. This change is reflected in release notes for the Tamr version in which Elasticsearch was upgraded. When an upgrade to Elasticsearch is required, Tamr must reindex all projects and datasets after upgrading. As a result, upgrading from a Tamr version that uses the previous Elasticsearch version to one that uses the upgraded Elasticsearch version takes longer than a normal upgrade.

Upgrading and Primary Key Management for LOOKUP Statements

Starting with Tamr v2020.016, Tamr automatically assigns primary keys to all LOOKUP statements with non-equality join conditions that you add in this version or in subsequent versions. This means that Tamr will change primary keys (tamr_ids) for such LOOKUP statements.

To avoid disruptions to LOOKUP statements written in versions before v2020.016, during the upgrade to this version, Tamr automatically runs an upgrade script that disables automatic assignment of primary keys for existing LOOKUP statements with non-equality join conditions. For more information, see Lookup.

The script prevents breaking any current projects that contain LOOKUP statements with non-equality join conditions and that depend on primary keys staying the same as in the Tamr version from which you are upgrading.

The script adds the text hint(pkmanagement.manual) in front of these statements. See Labels, Hints, and Scope. Once the upgrade script completes, it issues a report listing all the projects and their transformations that were changed. It also lists any projects and transformations that could not be updated with the text hint(pkmanagement.manual) due to parsing or linting errors.

Upgrading Tamr

Checklist before proceeding:

  • The current Tamr version is at least 2019.019.
  • The current user is the functional user, such as tamr.
  • The Tamr software bundle unify.zip of the target version, and any interim checkpoint versions, is available.
  • Tamr and its dependencies are running.
  • PostgreSQL is upgraded to the required version. See Requirements and Upgrading Postgres.
  • If starting from version v2020.021.0 or later, run the CleanupIncompletelyDeletedProjects maintenance utility and then delete any unnecessary datasets. See Dataset Cleanup.
  • (v2021.016.0 and earlier) Verify that ulimit and vm.max_map_count are set correctly for the target version. See Setting ulimit Limits.
  • Verify that there is at least 30-40% of free disk space available on the instance to store backups. (Elasticsearch will not allocate shards if more than 85% of disk space is utilized.) See the Support Help Center knowledge base for instructions.
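
The free disk space item in the checklist above can be checked with a short script such as the following sketch. The function names free_disk_percent and has_free_disk are hypothetical, the 40% threshold is taken from the upper end of the range in the checklist, and the path to check (here /) is an assumption that should be replaced with the filesystem that stores your Tamr backups.

```shell
#!/bin/sh
# Print the percentage of free space on the filesystem that holds <path>.
# df -P emits POSIX-format output; column 5 is the used capacity, e.g. "45%".
# Assumption: the filesystem name contains no spaces.
free_disk_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Succeed if at least <min-free-percent> of the filesystem is free.
# Arguments: <path> <min-free-percent>
has_free_disk() {
    free=$(free_disk_percent "$1")
    [ "$free" -ge "$2" ]
}

if has_free_disk / 40; then
    echo "disk check passed"
else
    echo "less than 40% of disk space is free"
fi
```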

Pre-upgrade Health Checks

Before upgrading, ensure that the following complete successfully:

  • The re-index data scale API (/api/reindex/all-datascale) and the jobs it starts in Tamr (this can take a few hours or more depending on the data scale)
  • The re-index human scale API (/api/reindex/all-humanscale)

If any of these fail, you must resolve the failure before the upgrade can continue. Contact Customer Support for assistance.

Skipping Validation Checks before Upgrades

Validation checks run before upgrades by default and we recommend that you do not skip them. However, the --skipEnvironmentValidation flag for the <tamr-home-directory>/tamr/utils/unify-admin.sh --upgrade command allows you to skip system validation checks at the start of the upgrade command.

This flag is useful, for example, if you have upgraded Tamr dependent components, such as Postgres, in your current version of Tamr, and before upgrading to the Tamr version in which a specific version of Postgres is required. Since the upgrade process checks for the required versions of all dependent components for both release versions involved in the upgrade, you may use this flag to avoid an upgrade check failure.

When set, this flag allows the Tamr upgrade process to proceed with a potentially invalid configuration, which could cause the upgrade to fail. For more information about validation checks in Tamr, see Validation.

Upgrade Options

  • --backup [multi-node, single-node] [optional] Set the system to back up before upgrading.
  • --healthcheckTimeout <healthcheckTimeout> [optional] Set how long to wait for the healthchecks to time out.
  • --help [optional] Print out the help message.
  • --installDir <installDir> [single-node] The current installation on disk.
  • --nobackup [multi-node, single-node] [optional] Set the system not to back up before upgrading.
  • --options <options> [multi-node] [optional] The options used to build the marathon application configuration.
  • --rerun [multi-node, single-node] [optional] Re-run the upgrade against the current version of the product. Useful for when an error occurs during upgrade and the user wants to re-attempt the upgrade.
  • --upgradeDir <upgradeDir> [single-node] [optional] The directory where the upgrade version of Tamr exists, if the upgrade ZIP file has been extracted.
  • --zipFile <zipFile> [single-node] [optional] The path to the target upgrade zip file.
  • --tempDir <tempDir> [single-node] [optional] A path to which to extract the Zip file. If not specified, defaults to system temp directory.
  • --zookeeper <full-zk-conf-node-url> [single-node] The ZooKeeper URL of the Tamr configuration node, such as zk://localhost:21281/tamr/unify001/conf.
  • --skipEnvironmentValidation [multi-node, single-node] [optional] Avoid running scripts to validate whether the current environment meets the requirements for the upgrade version of the product. Checks include Postgres version compatibility.
  • --forceDatasetMaterialize [multi-node, single-node] [optional] After the upgrade process completes, run scripts to re-materialize all datasets (this includes unified datasets, results datasets, and internal datasets) to Elasticsearch. This triggers reindexing jobs in Tamr.

Upgrade Procedure

To Upgrade Tamr to a Newer Version:

  1. Back up the Tamr version you are upgrading from, by following the backup procedure. See Backup.
  2. If you are using any auxiliary services, disable them before proceeding with the Tamr upgrade. See Disabling an Auxiliary Service.
  3. If upgrading from a non-patched version 2019.019 or later, run the administrative utility unify-admin.sh with the --upgrade command and the arguments --zipFile, --installDir, and --zookeeper. Optionally include --tempDir. For example:
       cd <tamr-home-directory>/tamr/utils
       ./unify-admin.sh --upgrade --zipFile <full-path-to-target-version-unify-zip> --installDir <full-path-to-tamr-unify-home> --zookeeper zk://localhost:21281/tamr/unify001/conf --tempDir <full-path-to-target-unzip-directory>
  4. If you are upgrading from a patched Tamr version, for example, v2020.008.1, and restored from a backup of a major version (without a patch), for example, v2020.008.0, then run the upgrade with the --skipUpgradeStatusValidation flag to automatically ignore the check for a patched release.
  5. Validate the upgrade. See Validation.
  6. If you are using any auxiliary services, install the version that matches your upgraded Tamr instance. See Installing an Auxiliary Service.
  7. Clear your web browser cache before logging into Tamr.
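
Before running the upgrade command from the procedure above, it can help to confirm that its path arguments exist. The following is a minimal dry-run sketch under stated assumptions: the function name upgrade_dry_run is hypothetical, and it only prints the unify-admin.sh command line instead of executing it.

```shell
#!/bin/sh
# Print the upgrade command (dry run) after checking that the ZIP file and
# the installation directory exist. Nothing is executed.
# Arguments: <zip-file> <install-dir> <zookeeper-url> [temp-dir]
upgrade_dry_run() {
    zip_file=$1
    install_dir=$2
    zk_url=$3
    temp_dir=${4:-}

    [ -f "$zip_file" ] || { echo "zip file not found: $zip_file" >&2; return 1; }
    [ -d "$install_dir" ] || { echo "install dir not found: $install_dir" >&2; return 1; }

    cmd="./unify-admin.sh --upgrade --zipFile $zip_file --installDir $install_dir --zookeeper $zk_url"
    # --tempDir is optional; add it only when a fourth argument is given.
    if [ -n "$temp_dir" ]; then
        cmd="$cmd --tempDir $temp_dir"
    fi
    echo "$cmd"
}
```

Review the printed command, then run it from the <tamr-home-directory>/tamr/utils directory as the functional user.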

Post-Upgrade Steps

  • Because Spark was upgraded from 2.2 to 2.4 in the Tamr v2020.015.0 release, after you upgrade to this Tamr version or later, you may need to examine the files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory, and move any that you wish to keep to the corresponding directory for Spark 2.4.x. This step is rarely needed. In most cases, Tamr deployments do not contain any Spark customizations.
  • Due to changes in the schema mapping model, you may need to rerun the Learn from mappings job before using the Generate mapping suggestions job on the Schema Mapping page.

Upgrade Troubleshooting Tips

If Tamr times out when starting up:

  • Do not stop and restart Tamr; upgrade scripts may still be running. Interrupting the scripts can break the system and/or result in the need to rerun the upgrade.
  • Use the service health API to investigate the issue.
  • Refer to the unify.log file to check whether progress is being made in starting Tamr.
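
One rough way to tell whether the log is still being written is to check its modification time, as in the sketch below. The function name modified_within is hypothetical, the location of unify.log depends on your installation and is passed in as an argument, and the -mmin test is an extension available in GNU, BSD, and BusyBox find rather than part of POSIX.

```shell
#!/bin/sh
# Succeed if <file> was modified less than <minutes> minutes ago.
# Arguments: <file> <minutes>
# A missing file counts as "not modified" (find prints nothing).
modified_within() {
    [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}
```

For example, calling modified_within with the path to unify.log and 5 succeeds only if the log changed in the last five minutes. A recent modification suggests that startup or upgrade scripts are still making progress. It does not prove it, so also read the latest log entries.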

If upgrade fails due to an Elasticsearch issue:

  • Do not immediately clear Elasticsearch.
  • Refer to the Elasticsearch logs to troubleshoot the underlying issue. (See Elasticsearch logging for single-node on-premise deployments, or cloud platform service logs for cloud deployments.) When you have corrected the issue, rerun the upgrade with --rerun.

See the Support Help Center knowledge base for additional upgrade troubleshooting information.

