Upgrading Tamr
Upgrade a single-node Tamr installation.
Upgrading from Older Versions to the Current Version
If you are upgrading from Tamr version 0.40.0 or earlier, first upgrade to version 2019.019. For more information, see Version 2019.019 Upgrading.
If you are upgrading from Tamr version 2019.019 or greater, first upgrade to version 2020.016.0. Tamr v2020.016.0 is a checkpoint release that has a patch. By default, you must first upgrade to v2020.016.0 and then upgrade to the patched version. To upgrade directly to the patched version v2020.016.4, specify the --skipCheckpointReleaseValidation flag when upgrading.
Note: If a particular Tamr version is a checkpoint release, this is indicated in this documentation and in the release notes. A checkpoint release is a release that you use as a stepping stone when upgrading Tamr to the most recent version.
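The two rules above can be sketched as a small shell helper. This is an illustration only, not part of the Tamr tooling: the function names version_lt and next_stepping_stone are hypothetical, and the sketch encodes only the two stepping stones named in this section. Other checkpoint releases listed in the release notes are not covered, and versions between 0.40.0 and 2019.019 are reported as unknown because this section does not describe them.

```shell
# version_lt A B: succeed when version A sorts strictly before version B.
# Components are compared numerically, so 2019.19 and 2019.019 are equal.
version_lt() {
  awk -v a="${1#v}" -v b="${2#v}" 'BEGIN {
    split(a, x, "[.]"); split(b, y, "[.]")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1
  }'
}

# next_stepping_stone CURRENT: print the intermediate version to upgrade to
# first, "none" if this section names no stepping stone for CURRENT, or
# "unknown" if this section does not cover CURRENT.
next_stepping_stone() {
  current=$1
  if ! version_lt 0.40.0 "$current"; then
    echo 2019.019            # 0.40.0 or earlier
  elif version_lt "$current" 2019.019; then
    echo unknown             # not described in this section
  elif version_lt "$current" 2020.016.0; then
    echo 2020.016.0          # checkpoint release
  else
    echo none
  fi
}
```

For example, next_stepping_stone v2019.023.1 prints 2020.016.0. Always confirm the result against the release notes for your target version.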
Upgrading from Any Version to a Patched Version
Patches provide critical updates, such as fixes for support issues and security improvements. We strongly recommend upgrading to available patches for your release version. The upgrade process is the same as upgrading to a newer version of Tamr.
Upgrading to a Patched Checkpoint Release
If you are upgrading to patched checkpoint release v2020.004.2, run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patch release. Otherwise, you will receive a validation error indicating that you need to first upgrade to the non-patched version of the checkpoint release (v2020.004.0).
When upgrading to patched checkpoint release v2020.016.4, v2021.002.1, or later, you do not need to run with the --skipUpgradeStatusValidation flag.
For more information about upgrade validation checks, see Validation.
About Spark Upgrades
Periodically, Tamr upgrades its version of Spark. This change is reflected in release notes for the Tamr version in which Spark was upgraded. Upgrading the Spark version occurs as part of the upgrade process to the Tamr version that contains the upgraded version of Spark.
Starting with Tamr v2020.015.0, Tamr uses Spark 2.4.5. When you upgrade to Tamr v2020.015.0 or greater, the upgrade process leaves the Spark 2.2 directory, ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7, as is. After you complete the upgrade and run the upgrade validation checks, you can copy any files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory that you wish to keep to the corresponding directory for Spark 2.4. You can then remove the Spark 2.2 directory.
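The copy step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name keep_spark_files is hypothetical, the Spark 2.4 directory is passed in as a parameter because its name is not given here, and conf/spark-defaults.conf is only an example of a file you might have customized. As a conservative choice, the sketch never overwrites a file that already exists in the new directory; merge such files by hand.

```shell
# keep_spark_files OLD_DIR NEW_DIR FILE...
# Copy each listed file (path relative to OLD_DIR) to the same relative
# location under NEW_DIR. Existing files in NEW_DIR are left untouched.
keep_spark_files() {
  old_dir=$1; new_dir=$2; shift 2
  for rel in "$@"; do
    if [ ! -f "$old_dir/$rel" ]; then
      echo "skip (not found): $rel" >&2
      continue
    fi
    if [ -e "$new_dir/$rel" ]; then
      echo "skip (already exists): $rel" >&2
      continue
    fi
    mkdir -p "$(dirname "$new_dir/$rel")"
    cp -p "$old_dir/$rel" "$new_dir/$rel"
    echo "copied: $rel"
  done
}

# Example call; SPARK_24_DIR is a placeholder for your Spark 2.4 directory.
# keep_spark_files "${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7" "$SPARK_24_DIR" conf/spark-defaults.conf
# Remove the old directory only after you have confirmed the copies:
# rm -rf "${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7"
```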
About Elasticsearch Upgrades
Periodically, Tamr upgrades its version of Elasticsearch. This change is reflected in release notes for the Tamr version in which Elasticsearch was upgraded. When an upgrade to Elasticsearch is required, Tamr must reindex all projects and datasets after upgrading. As a result, upgrading from a Tamr version that uses the previous Elasticsearch version to one that uses the upgraded Elasticsearch version takes longer than a normal upgrade.
Upgrading and Primary Key Management for LOOKUP Statements
Starting with Tamr v2020.016, Tamr automatically assigns primary keys to all LOOKUP statements with non-equality join conditions that you add in this version or in subsequent versions. This means that Tamr will change primary keys (tamr_ids) for such LOOKUP statements.
To avoid disruptions to LOOKUP statements written in versions before v2020.016, during the upgrade to this version, Tamr automatically runs an upgrade script that disables automatic assignment of primary keys for existing LOOKUP statements with non-equality join conditions. For more information, see Lookup.
The script prevents breaking any current projects that contain LOOKUP statements with non-equality join conditions and that depend on primary keys staying the same as in the Tamr version from which you are upgrading.
The script adds the text hint(pkmanagement.manual) in front of these statements. See Labels, Hints, and Scope. Once the upgrade script completes, it issues a report listing all the projects and their transformations that were changed. It also lists any projects and transformations that could not be updated with the text hint(pkmanagement.manual) due to parsing or linting errors.
Upgrading Tamr
Checklist before proceeding:
- The current Tamr version is at least 2019.019.
- The current user is the functional user, such as tamr.
- The Tamr software bundle unify.zip of the target version is available.
- Tamr and its dependencies are running.
- PostgreSQL is upgraded to the required version. See Requirements and Upgrading Postgres.
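Two of the checklist items can be verified from the shell before you start. The sketch below is illustrative only: the function name preflight is hypothetical, and it checks just the current user and the presence of the upgrade bundle. It does not check the Tamr version, whether Tamr and its dependencies are running, or the PostgreSQL version.

```shell
# preflight FUNCTIONAL_USER ZIP_PATH
# Print one line per failed check and return non-zero if any check fails.
preflight() {
  expected_user=$1; zip_path=$2; failed=0
  if [ "$(id -un)" != "$expected_user" ]; then
    echo "FAIL: current user is $(id -un), expected $expected_user"
    failed=1
  fi
  if [ ! -r "$zip_path" ]; then
    echo "FAIL: upgrade bundle not readable: $zip_path"
    failed=1
  fi
  if [ "$failed" -eq 0 ]; then
    echo "OK: preflight checks passed"
  fi
  return "$failed"
}

# Example call, assuming the functional user is tamr:
# preflight tamr <full-path-to-target-version-unify-zip>
```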
Skipping Validation Checks before Upgrades
Validation checks run before upgrades by default, and we recommend that you do not skip them. However, the --skipEnvironmentValidation flag for the <tamr-home-directory>/tamr/utils/unify-admin.sh --upgrade command allows you to skip system validation checks at the start of the upgrade command.
This flag is useful, for example, if you upgraded a component that Tamr depends on, such as Postgres, while still on your current Tamr version, in preparation for a Tamr version that requires a specific Postgres version. Because the upgrade process checks for the required versions of all dependent components for both release versions involved in the upgrade, you may use this flag to avoid an upgrade check failure.
If set to true, this flag allows the Tamr upgrade process to proceed with a potentially invalid configuration, which could cause the upgrade to fail. For more information about validation checks in Tamr, see Validation.
Upgrade Options
--backup
[multi-node, single-node] [optional] Set the system to back up before upgrading.
--healthcheckTimeout <healthcheckTimeout>
[optional] Set how long to wait for the healthchecks to time out.
--help
[optional] Print out the help message.
--installDir <installDir>
[single-node] The current installation on disk.
--nobackup
[multi-node, single-node] [optional] Set the system not to back up before upgrading.
--options <options>
[multi-node] [optional] The options used to build the marathon application configuration.
--rerun
[multi-node, single-node] [optional] Re-run the upgrade against the current version of the product. Useful when an error occurs during an upgrade and you want to re-attempt the upgrade.
--upgradeDir <upgradeDir>
[single-node] [optional] The directory where the upgrade version of Tamr exists, if the upgrade ZIP file has been extracted.
--zipFile <zipFile>
[single-node] [optional] The path to the target upgrade ZIP file.
--tempDir <tempDir>
[single-node] [optional] A path to which to extract the ZIP file. If not specified, defaults to the system temp directory.
--zookeeper <full-zk-conf-node-url>
[single-node] The ZooKeeper URL of the Tamr configuration node, such as zk://localhost:21281/tamr/unify001/conf.
--skipEnvironmentValidation
[multi-node, single-node] [optional] Avoid running scripts to validate whether the current environment meets the requirements for the upgrade version of the product. Checks include Postgres version compatibility.
--forceDatasetMaterialize
[multi-node, single-node] [optional] After the upgrade process completes, run scripts to re-materialize all datasets (this includes unified datasets, results datasets, and internal datasets) to Elasticsearch. This triggers reindexing jobs in Tamr.
Upgrade Procedure
To Upgrade Tamr to a Newer Version:
- Back up the Tamr version you are upgrading from, by following the backup procedure. See Backup.
- If you are using any auxiliary services, disable them before proceeding with the Tamr upgrade. See Disabling an Auxiliary Service.
- If upgrading from version 2019.019 or greater, run the administrative utility unify-admin.sh with the --upgrade command and the arguments --zipFile, --installDir, and --zookeeper. Optionally include --tempDir. For example:
cd <tamr-home-directory>/tamr/utils
./unify-admin.sh --upgrade --zipFile <full-path-to-target-version-unify-zip> --installDir <full-path-to-tamr-unify-home> --zookeeper zk://localhost:21281/tamr/unify001/conf --tempDir <full-path-to-target-unzip-directory>
- Validate the upgrade. See Validation.
- If you are upgrading from a patched Tamr version, for example, v2020.008.1, and restored from a backup of a major version (without a patch), for example, v2020.008.0, run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patched release.
- If you are using any auxiliary services, install the version that matches your upgraded Tamr instance. See Installing an Auxiliary Service.
- Clear your web browser cache before logging into Tamr.
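To review the upgrade command before running it, you can assemble it with a small helper that only prints the command line. This is a sketch, not part of unify-admin.sh: the function name build_upgrade_cmd is hypothetical, and it assumes that none of the paths contain spaces. Any extra flags documented on this page, such as --tempDir <tempDir> or --skipUpgradeStatusValidation, are passed through unchanged.

```shell
# build_upgrade_cmd ZIP INSTALL_DIR ZK_URL [EXTRA_FLAG...]
# Print the upgrade command line instead of running it.
build_upgrade_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: build_upgrade_cmd ZIP INSTALL_DIR ZK_URL [EXTRA_FLAG...]" >&2
    return 2
  fi
  zip_file=$1; install_dir=$2; zk_url=$3; shift 3
  printf './unify-admin.sh --upgrade --zipFile %s --installDir %s --zookeeper %s' \
    "$zip_file" "$install_dir" "$zk_url"
  # Append any optional flags exactly as given.
  for flag in "$@"; do
    printf ' %s' "$flag"
  done
  printf '\n'
}
```

Run the printed command from the <tamr-home-directory>/tamr/utils directory once you have confirmed that every argument is correct.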
Post-Upgrade Steps
- Spark was upgraded from 2.2 to 2.4 in the Tamr v2020.015.0 release. After you upgrade to this Tamr version or greater, you may need to examine the files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory and move any that you wish to keep to the corresponding directory for Spark 2.4.x. This applies only in the rare case that you have customized Spark.
- Due to changes in the schema mapping model, you may need to rerun the Learn from mappings job before using the Generate mapping suggestions job on the Schema Mapping page.