
Upgrading Tamr Core

Upgrade a single-node Tamr Core installation.

Checkpoint Versions and Upgrades

When you upgrade, you must upgrade to each checkpoint version, in order, that was released between your current version and the newer target version.

Checkpoint Versions:

The following Tamr Core versions are checkpoint versions:

  • v2021.002
  • v2020.016
  • v2020.004

For example, upgrading from v2020.012 to v2020.019 requires two upgrade stages: from v2020.012 to v2020.016, and then from v2020.016 to v2020.019.

The upgrade utility prevents you from upgrading past a checkpoint version. The Release Notes also indicate each checkpoint release.
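The staging rule can be sketched as a small shell function. This is an illustrative helper, not part of the Tamr Core tooling: the function name upgrade_stages is hypothetical, the checkpoint list is copied from this page, and the string comparison assumes versions in the fixed-width vYYYY.NNN form without patch suffixes.

```shell
#!/bin/sh
# Checkpoint versions listed in this guide, in ascending order.
CHECKPOINTS="v2020.004 v2020.016 v2021.002"

# upgrade_stages CURRENT TARGET (hypothetical helper)
# Prints the target of each upgrade stage, one per line.
# Versions are compared as strings, which is only valid for the
# fixed-width vYYYY.NNN format; patch suffixes are not handled.
upgrade_stages() {
  current=$1
  target=$2
  for cp in $CHECKPOINTS; do
    # A checkpoint is a required stop if it lies strictly between
    # the current version and the target version.
    if LC_ALL=C expr "$cp" \> "$current" >/dev/null &&
       LC_ALL=C expr "$cp" \< "$target" >/dev/null; then
      echo "$cp"
    fi
  done
  # The final stage is always the target itself.
  echo "$target"
}

upgrade_stages v2020.012 v2020.019
```

For the example above, the function prints v2020.016 followed by v2020.019, matching the two stages described. Always confirm the checkpoint list against the Release Notes for your versions.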

Upgrading from Any Version to a Patched Version

Patches provide critical updates, such as fixes for support issues and security improvements. Tamr strongly recommends upgrading to available patches for your release version. The upgrade process is the same as upgrading to a newer version of Tamr Core.

Upgrading to a Patched Checkpoint Release

If you are upgrading to the patched v2020.004.3 checkpoint release, run the upgrade with the --skipUpgradeStatusValidation option to ignore the check for a patch release. Otherwise, a validation error indicates that you need to first upgrade to the non-patched version of the checkpoint release (v2020.004.0).

You do not need to include --skipUpgradeStatusValidation when upgrading to the patched versions of the v2020.016 or v2021.002 checkpoint releases.

For more information about upgrade validation checks, see Validation.

About Spark Upgrades

Periodically, Tamr Core upgrades the Spark version. The release notes indicate these changes. The Spark upgrade occurs as part of upgrading to the Tamr Core release that includes the new Spark version.

Starting with v2020.015.0, Tamr Core uses Spark 2.4.5. When you upgrade to v2020.015.0 or greater, the upgrade process leaves the Spark 2.2 directory, ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7, as is. After you complete the upgrade and run the upgrade validation checks, you can copy any files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory that you wish to keep and move them to the corresponding directory for Spark 2.4. You can then remove the Spark 2.2 directory.
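As a rough sketch of that cleanup, the helper below copies only the files you explicitly name from the old Spark directory into the new one. The function name keep_spark_files, the SPARK24_DIR variable, and the example file conf/spark-defaults.conf are assumptions for illustration; this page does not state the name of the Spark 2.4 directory, so substitute the actual path from your installation.

```shell
#!/bin/sh
# keep_spark_files SRC DST FILE... (hypothetical helper)
# Copies each named file (a path relative to SRC) into the same
# relative location under DST. Files that do not exist in SRC are
# reported on stderr and skipped. Nothing is deleted.
keep_spark_files() {
  src=$1
  dst=$2
  shift 2
  for rel in "$@"; do
    if [ -f "$src/$rel" ]; then
      mkdir -p "$dst/$(dirname "$rel")"
      cp -p "$src/$rel" "$dst/$rel"
      echo "copied $rel"
    else
      echo "skipped $rel (not found in $src)" >&2
    fi
  done
}

# Example call (SPARK24_DIR and the file name are assumptions):
# keep_spark_files "${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7" "$SPARK24_DIR" conf/spark-defaults.conf
```

Naming files explicitly, rather than copying the whole directory, avoids carrying Spark 2.2 binaries into the Spark 2.4 directory. Remove the Spark 2.2 directory only after you have verified the copied files.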

About Elasticsearch Upgrades

Periodically, Tamr Core upgrades the Elasticsearch version. The release notes indicate these changes. When an upgrade to Elasticsearch is required, Tamr Core must reindex all projects and datasets after upgrading. As a result, it takes longer to upgrade to a release with a new version of Elasticsearch than other release upgrades.

Upgrading and Primary Key Management for LOOKUP Statements

Starting with v2020.016, Tamr Core automatically assigns primary keys to all LOOKUP statements with non-equality join conditions that you add in this version or in subsequent versions. This means that Tamr Core will change primary keys (tamr_ids) for such LOOKUP statements.

To avoid disruptions to LOOKUP statements written in versions before v2020.016, during the upgrade to this version, Tamr Core automatically runs an upgrade script that disables automatic assignment of primary keys for existing LOOKUP statements with non-equality join conditions. For more information, see Lookup.

The script prevents breaking any current projects that contain LOOKUP statements with non-equality join conditions and that depend on primary keys staying the same as in the Tamr Core version from which you are upgrading.

The script adds the text hint(pkmanagement.manual) in front of these statements. See Labels, Hints, and Scope. Once the upgrade script completes, it issues a report listing all the projects and their transformations that were changed. It also lists any projects and transformations that could not be updated with the text hint(pkmanagement.manual) due to parsing or linting errors.

Prerequisites for Upgrading Tamr Core

Before You Begin:

  • The current Tamr Core version is at least v2019.019.
  • The current user is the functional user, such as tamr.
  • The software bundle unify.zip of the target version, and any interim checkpoint versions, is available.
  • Tamr Core and its dependencies are running.
  • PostgreSQL is upgraded to the required version. See Requirements and Upgrading Postgres.
  • Version v2020.021.0 or later: Run the CleanupIncompletelyDeletedProjects maintenance utility and then delete any unnecessary datasets. See Dataset Cleanup.
  • Version v2021.016.0 and earlier: Verify that ulimit and vm.max_map_count are set correctly for the target version. See Setting ulimit Limits.
  • Verify that there is at least 30-40% of free disk space available on the instance to store backups. (Elasticsearch does not allocate shards if more than 85% of disk space is utilized.) See the Support Help Center knowledge base for instructions.
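The free disk space prerequisite can be checked with a short script such as the following. This is a minimal sketch: the function names are hypothetical, the 30% threshold is the lower bound of the range stated above, and checking the filesystem that holds ${TAMR_HOME} is an assumption, since backups may be stored elsewhere in your deployment.

```shell
#!/bin/sh
# free_pct PATH (hypothetical helper)
# Prints the percentage of free space on the filesystem holding PATH.
free_pct() {
  # df -P prints POSIX-format output; column 5 is the used capacity,
  # for example "63%". Strip the percent sign and subtract from 100.
  used=$(df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  echo $((100 - used))
}

# has_enough_space FREE_PCT MIN_PCT (hypothetical helper)
# Succeeds when FREE_PCT is at least MIN_PCT.
has_enough_space() {
  [ "$1" -ge "$2" ]
}

# Example: warn if less than 30% is free where TAMR_HOME lives.
# free=$(free_pct "${TAMR_HOME:-/}")
# has_enough_space "$free" 30 || echo "Only ${free}% free; free up space before upgrading" >&2
```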

Pre-upgrade Health Checks

Before upgrading, ensure that the following complete successfully:

  • The re-index data scale API (/api/reindex/all-datascale) and the jobs it starts in Tamr Core (this can take a few hours or more depending on the data scale).
  • The re-index human scale API (/api/reindex/all-humanscale).

If any of these fail, you must resolve the failure before the upgrade can continue. Contact Tamr Customer Support for assistance.

Skipping Validation Checks Before Upgrades

Validation checks run before upgrades by default, and Tamr recommends that you do not skip them. However, the --skipEnvironmentValidation flag for the <tamr-home-directory>/tamr/utils/unify-admin.sh --upgrade command allows you to skip all system validation checks, or one specified check, at the start of the upgrade command.

This flag is useful, for example, if you have already upgraded a component that Tamr Core depends on, such as PostgreSQL, while still running your current version of Tamr Core, ahead of upgrading to the Tamr Core version that requires that PostgreSQL version. Because the upgrade process checks the required versions of all dependent components for both release versions involved in the upgrade, you can use this flag to avoid an upgrade check failure.

If used, this flag allows the Tamr Core upgrade process to proceed with a potentially invalid configuration, which can cause the upgrade to fail. For more information about validation checks, see Validation.

Upgrade Options

The following options are required:

  • --installDir <installDir> The current installation on disk.
  • --zipFile <zipFile> or --upgradeDir <upgradeDir> The path to the target upgrade ZIP file or to the directory that contains the extracted upgrade ZIP file. Use only one of these options.

The following options are optional:

  • --zookeeper <full-zk-conf-node-url> The Zookeeper URL of the Tamr Core configuration node, such as zk://localhost:21281/tamr/unify001/conf. If not included, the script checks the admin utilities properties file for this URL.
  • --backup Set the system to back up before upgrading.
  • --healthcheckTimeout <healthcheckTimeout> Set how long to wait for the health checks to time out.
  • --help Print out a help message.
  • --nobackup Set the system to not back up before upgrading.
  • --rerun Re-run the upgrade against the current version of the product. Useful if an error occurs during upgrade and you want to re-attempt the upgrade. To use, include --rerun immediately after --upgrade.
  • --tempDir <tempDir> A path to which to extract the ZIP file. If not specified, defaults to system temp directory.
  • --skipEnvironmentValidation <name of validator> Skip all validation scripts, or one specified script, that check whether the current environment meets the requirements for the upgrade version of the product. See Skipping Validation Checks Before Upgrades. To use, include --skipEnvironmentValidation as the final option.
  • --forceDatasetMaterialize After the upgrade process completes, run scripts to re-update all datasets (this includes unified datasets, results datasets, and internal datasets) to Elasticsearch. This triggers reindexing jobs in Tamr Core.
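Two of the options above have positional requirements: --rerun goes immediately after --upgrade, and --skipEnvironmentValidation goes last. The sketch below assembles a command line that respects both. The function name build_upgrade_cmd and the example paths are hypothetical, and treating the flag without a validator name as "skip all checks" is an assumption based on the option description; the function only prints the command, it does not run it.

```shell
#!/bin/sh
# build_upgrade_cmd ZIP_FILE INSTALL_DIR RERUN SKIP (hypothetical helper)
#   RERUN: "yes" to add --rerun, anything else to omit it
#   SKIP:  "none", "all", or the name of a single validator
# ZIP_FILE and INSTALL_DIR must not contain spaces, because the
# result is returned as a plain string.
build_upgrade_cmd() {
  zip_file=$1
  install_dir=$2
  rerun=$3
  skip=$4

  cmd="./unify-admin.sh --upgrade"
  # --rerun must come immediately after --upgrade.
  if [ "$rerun" = "yes" ]; then
    cmd="$cmd --rerun"
  fi
  cmd="$cmd --zipFile $zip_file --installDir $install_dir"
  # --skipEnvironmentValidation must be the final option.
  case "$skip" in
    none|"") ;;
    all) cmd="$cmd --skipEnvironmentValidation" ;;
    *)   cmd="$cmd --skipEnvironmentValidation $skip" ;;
  esac
  printf '%s\n' "$cmd"
}

# Example paths are placeholders, not defaults.
build_upgrade_cmd /opt/downloads/unify.zip /data/tamr yes none
```

Printing the command first lets you review the option order before running it from <tamr-home-directory>/tamr/utils.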

Upgrade Procedure

To upgrade Tamr Core to a newer version:

  1. Back up the Tamr Core version you are upgrading from. See Backup.
  2. If you are using any auxiliary services, disable them before proceeding with the upgrade. See Disabling an Auxiliary Service.
  3. If upgrading from an unpatched version, run the administrative utility unify-admin.sh with the --upgrade command and the options --zipFile and --installDir. Optionally, include --zookeeper and --tempDir. For example:
     cd <tamr-home-directory>/tamr/utils
     ./unify-admin.sh --upgrade --zipFile <full-path-to-target-version-unify-zip> --installDir <full-path-to-tamr-unify-home> --zookeeper zk://localhost:21281/tamr/unify001/conf --tempDir <full-path-to-target-unzip-directory>
  4. If you are upgrading from a patched version (for example, v2020.008.1) and restored from a backup of the corresponding major version without a patch (for example, v2020.008.0), run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patched release.
  5. Validate the upgrade. See Validation.
  6. If you are using any auxiliary services, install the version that matches your upgraded Tamr Core instance. See Installing an Auxiliary Service.
  7. Clear your web browser cache before signing in to Tamr Core.

Post-Upgrade Steps

  • Version v2020.015.0 or later: Because the v2020.015.0 release upgrades Spark from 2.2 to 2.4, after you upgrade to this version or later you may need to examine the files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory and move any that you wish to keep to the corresponding directory for Spark 2.4.x. This precaution is rarely needed: in most cases, Tamr Core deployments do not contain any Spark customizations.

Upgrade Troubleshooting Tips

If Tamr Core times out when starting up:

  • Do not stop and restart Tamr Core; upgrade scripts could still be running. Interrupting the scripts can break the system and/or result in the need to rerun the upgrade.
  • Use the service health API to investigate the issue.
  • Refer to the unify.log file to check whether progress is being made in starting Tamr Core.

If upgrade fails due to an Elasticsearch issue:

  • Do not immediately clear Elasticsearch.
  • Refer to the Elasticsearch logs to troubleshoot the underlying issue. See Elasticsearch logging for single-node on-premises deployments or cloud platform service logs for cloud deployments. When you have corrected the issue, rerun the upgrade with --rerun.

See the Support Help Center knowledge base for additional upgrade troubleshooting information.
