Upgrading Tamr

Upgrade a single-node Tamr installation.

Upgrading from Older Versions to the Current Version

If you are upgrading from Tamr version 0.40.0 or earlier, first upgrade to version 2019.019. For more information, see Version 2019.019 Upgrading.

If you are upgrading from Tamr version 2019.019 or greater, first upgrade to version 2020.016.0. Tamr v2020.016.0 is a checkpoint release that has a patch. By default, you must first upgrade to v2020.016.0 and then upgrade to the patched version. To upgrade directly to the patched version, v2020.016.4, specify the --skipCheckpointReleaseValidation flag when upgrading.

Note: If a particular Tamr version is a checkpoint release, this is indicated in this documentation and in the release notes. A checkpoint release is a release that you must upgrade to as a stepping stone before upgrading to the most recent version.
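
The two rules above can be sketched as a small shell helper. This is an illustration only, not part of the Tamr tooling: the function names version_le and first_upgrade_target are hypothetical, and the comparison assumes a sort command that supports the -V (version sort) option, such as GNU sort. Versions that the rules above do not cover fall through to a reminder to check the release notes.

```shell
# Succeeds if version $1 is less than or equal to version $2.
# Assumes a sort that supports -V (version sort), such as GNU sort.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Prints the first upgrade target for the given current Tamr version,
# based only on the two rules stated in this section (hypothetical helper).
first_upgrade_target() {
  local current="${1#v}"   # strip an optional leading "v"
  if version_le "$current" "0.40.0"; then
    echo "2019.019"
  elif version_le "2019.019" "$current" && ! version_le "2020.016.0" "$current"; then
    echo "2020.016.0"
  else
    # Not covered by the rules above; later checkpoints may apply.
    echo "see release notes"
  fi
}
```

For example, first_upgrade_target 0.39.2 prints 2019.019, and first_upgrade_target v2019.023.1 prints 2020.016.0.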

Upgrading from Any Version to a Patched Version

Patches provide critical updates, such as fixes for support issues and security improvements. We strongly recommend upgrading to available patches for your release version. The upgrade process is the same as upgrading to a newer version of Tamr.

Upgrading to a Patched Checkpoint Release

If you are upgrading to patched checkpoint release v2020.004.2, run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patch release. Otherwise, you will receive a validation error indicating that you need to first upgrade to the non-patched version of the checkpoint release (v2020.004.0).

When upgrading to patched checkpoint release v2020.016.4, v2021.002.1, or later, you do not need to run with the --skipUpgradeStatusValidation flag.

For more information about upgrade validation checks, see Validation.

About Spark Upgrades

Periodically, Tamr upgrades its version of Spark. This change is reflected in release notes for the Tamr version in which Spark was upgraded. Upgrading the Spark version occurs as part of the upgrade process to the Tamr version that contains the upgraded version of Spark.

Starting with Tamr v2020.015.0, Tamr uses Spark 2.4.5. When you upgrade to Tamr v2020.015.0 or greater, the upgrade process leaves the Spark 2.2 directory, ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7, as is. After you complete the upgrade and run the upgrade validation checks, you can copy any files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory that you wish to keep and move them to the corresponding directory for Spark 2.4. You can then remove the Spark 2.2 directory.
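
As a minimal sketch of the copy step described above, the following shell function copies a chosen set of files from the old Spark directory into the new one, preserving their relative paths. The function name migrate_spark_files is hypothetical. The Spark 2.4 directory name is not given in this guide, so it is passed in as an argument rather than assumed.

```shell
# Copy selected files from the old Spark directory to the new one,
# keeping each file's relative path (hypothetical helper).
#   $1      old Spark directory (for example, the Spark 2.2 directory)
#   $2      new Spark directory (the Spark 2.4 directory; name not assumed)
#   $3...   paths of the files to keep, relative to the old directory
migrate_spark_files() {
  local old_dir="$1" new_dir="$2"
  shift 2
  local rel
  for rel in "$@"; do
    # Create the target subdirectory if it does not exist yet.
    mkdir -p "$new_dir/$(dirname "$rel")"
    # -p preserves mode and timestamps; the original file is left in place.
    cp -p "$old_dir/$rel" "$new_dir/$rel"
  done
}
```

A call might look like migrate_spark_files "${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7" <spark-2.4-directory> conf/spark-defaults.conf, where conf/spark-defaults.conf is only an example of a file you may have customized. Because the function copies rather than moves, remove the Spark 2.2 directory only after you have confirmed the copied files.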

About Elasticsearch Upgrades

Periodically, Tamr upgrades its version of Elasticsearch. This change is reflected in release notes for the Tamr version in which Elasticsearch was upgraded. When an upgrade to Elasticsearch is required, Tamr must reindex all projects and datasets after upgrading. As a result, upgrading from a Tamr version that uses the previous Elasticsearch version to one that uses the upgraded Elasticsearch version takes longer than a normal upgrade.

Upgrading and Primary Key Management for LOOKUP Statements

Starting with Tamr v2020.016, Tamr automatically assigns primary keys to all LOOKUP statements with non-equality join conditions that you add in this version or in subsequent versions. This means that Tamr will change primary keys (tamr_ids) for such LOOKUP statements.
To avoid disruptions to LOOKUP statements written in versions before v2020.016, during the upgrade to this version, Tamr automatically runs an upgrade script that disables automatic assignment of primary keys for existing LOOKUP statements with non-equality join conditions. For more information, see Lookup.

The script prevents breaking any current projects that contain LOOKUP statements with non-equality join conditions and that depend on primary keys staying the same as in the Tamr version from which you are upgrading.

The script adds the text hint(pkmanagement.manual) in front of these statements. See Labels, Hints, and Scope. Once the upgrade script completes, it issues a report listing all the projects and their transformations that were changed. It also lists any projects and transformations that could not be updated with the text hint(pkmanagement.manual) due to parsing or linting errors.

Upgrading Tamr

Checklist before proceeding:

  • The current Tamr version is at least 2019.019.
  • The current user is the functional user, such as tamr.
  • The Tamr software bundle unify.zip of the target version is available.
  • Tamr and its dependencies are running.
  • PostgreSQL is upgraded to the required version. See Requirements and Upgrading Postgres.
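
Two of the checklist items above (the current user and the availability of the software bundle) can be checked mechanically. The following is a minimal sketch under that assumption; the function name preflight is hypothetical, and it does not check the Tamr version, running services, or the PostgreSQL version.

```shell
# Check that the current user is the expected functional user and that
# the upgrade bundle exists (hypothetical helper).
#   $1  expected functional user, such as tamr
#   $2  path to the target-version unify.zip
# Prints one line per failed check and returns nonzero if any check fails.
preflight() {
  local expected_user="$1" zip_file="$2" failed=0
  if [ "$(id -un)" != "$expected_user" ]; then
    echo "FAIL: current user is $(id -un), expected $expected_user"
    failed=1
  fi
  if [ ! -f "$zip_file" ]; then
    echo "FAIL: upgrade bundle not found: $zip_file"
    failed=1
  fi
  return "$failed"
}
```

For example, preflight tamr <full-path-to-target-version-unify-zip> prints nothing and returns 0 when both checks pass.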

Skipping Validation Checks before Upgrades

Validation checks run before upgrades by default, and we recommend that you do not skip them. However, the --skipEnvironmentValidation flag for the <tamr-home-directory>/tamr/utils/unify-admin.sh --upgrade command allows you to skip system validation checks at the start of the upgrade.

This flag is useful, for example, if you have already upgraded a component that Tamr depends on, such as Postgres, while still running your current version of Tamr, and before upgrading to the Tamr version that requires that specific version of Postgres. Because the upgrade process checks for the required versions of all dependent components for both release versions involved in the upgrade, you can use this flag to avoid an upgrade check failure.

When specified, this flag allows the Tamr upgrade to proceed with a potentially invalid configuration, which could cause the upgrade to fail. For more information about validation checks in Tamr, see Validation.

Upgrade Options

  • --backup [multi-node, single-node] [optional] Set the system to back up before upgrading.
  • --healthcheckTimeout <healthcheckTimeout> [optional] Set how long to wait for the healthchecks to time out.
  • --help [optional] Print out the help message.
  • --installDir <installDir> [single-node] The current installation on disk.
  • --nobackup [multi-node, single-node] [optional] Set the system not to back up before upgrading.
  • --options <options> [multi-node] [optional] The options used to build the marathon application configuration.
  • --rerun [multi-node, single-node] [optional] Re-run the upgrade against the current version of the product. Useful for when an error occurs during upgrade and the user wants to re-attempt the upgrade.
  • --upgradeDir <upgradeDir> [single-node] [optional] The directory where the upgrade version of Tamr exists, if the upgrade ZIP file has been extracted.
  • --zipFile <zipFile> [single-node] [optional] The path to the target upgrade ZIP file.
  • --tempDir <tempDir> [single-node] [optional] The path to which to extract the ZIP file. If not specified, defaults to the system temp directory.
  • --zookeeper <full-zk-conf-node-url> [single-node] The ZooKeeper URL of the Tamr configuration node, such as zk://localhost:21281/tamr/unify001/conf.
  • --skipEnvironmentValidation [multi-node, single-node] [optional] Avoid running scripts to validate whether the current environment meets the requirements for the upgrade version of the product. Checks include Postgres version compatibility.
  • --forceDatasetMaterialize [multi-node, single-node] [optional] After the upgrade process completes, run scripts to re-materialize all datasets (this includes unified datasets, results datasets, and internal datasets) to Elasticsearch. This triggers reindexing jobs in Tamr.
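
To show how the optional flags above combine with the required single-node arguments, here is a minimal sketch that only prints an upgrade command for review and does not run it. The function name build_upgrade_cmd is hypothetical, and the sketch assumes that optional flags are appended after the required arguments. It does no quoting, so it is not suitable for paths that contain spaces.

```shell
# Print (but do not run) a single-node upgrade command (hypothetical helper).
#   $1      path to the target-version ZIP file
#   $2      current Tamr installation directory
#   $3      ZooKeeper URL of the Tamr configuration node
#   $4...   any optional flags from the list above, passed through as given
build_upgrade_cmd() {
  local zip_file="$1" install_dir="$2" zk_url="$3"
  shift 3
  printf './unify-admin.sh --upgrade --zipFile %s --installDir %s --zookeeper %s' \
    "$zip_file" "$install_dir" "$zk_url"
  local flag
  for flag in "$@"; do
    printf ' %s' "$flag"
  done
  printf '\n'
}
```

For example, passing --nobackup --skipEnvironmentValidation as the trailing arguments appends both flags to the printed command, which you can then review and run from the <tamr-home-directory>/tamr/utils directory.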

Upgrade Procedure

To Upgrade Tamr to a Newer Version:

  1. Back up the Tamr version you are upgrading from, by following the backup procedure. See Backup.
  2. If you are using any auxiliary services, disable them before proceeding with the Tamr upgrade. See Disabling an Auxiliary Service.
  3. If upgrading from version 2019.019 or greater, run the administrative utility unify-admin.sh with the --upgrade command and the arguments --zipFile, --installDir, and --zookeeper. Optionally include --tempDir. For example:
     cd <tamr-home-directory>/tamr/utils
     ./unify-admin.sh --upgrade --zipFile <full-path-to-target-version-unify-zip> --installDir <full-path-to-tamr-unify-home> --zookeeper zk://localhost:21281/tamr/unify001/conf --tempDir <full-path-to-target-unzip-directory>
  4. Validate the upgrade. See Validation.
  5. If you are upgrading from a patched Tamr version, for example, v2020.008.1, and you restored from a backup of a major version (without a patch), for example, v2020.008.0, run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patched release.
  6. If you are using any auxiliary services, install the version that matches your upgraded Tamr instance. See Installing an Auxiliary Service.
  7. Clear your web browser cache before logging into Tamr.

Post-Upgrade Steps

  • Spark was upgraded from 2.2 to 2.4 in the Tamr v2020.015.0 release. After you upgrade to this Tamr version or greater, you may need to examine the files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory and move any that you wish to keep to the corresponding directory for Spark 2.4.x. This applies only in the rare case that you have customized Spark.
  • Due to changes in the schema mapping model, you may need to rerun the Learn from mappings job before using the Generate mapping suggestions job on the Schema Mapping page.