Upgrading Tamr Core
Upgrade a single-node Tamr Core installation.
Preparing for Your Upgrade
Before You Upgrade to This Version | Be Sure to Review |
---|---|
v2022.005 or later | v2022.005.0 Upgrade Considerations; Prerequisites for Upgrading Tamr Core |
v2022.002 (Checkpoint) | Checkpoint Versions and Upgrades; Prerequisites for Upgrading Tamr Core; About HBase Upgrades |
v2022.001 (Checkpoint) | Checkpoint Versions and Upgrades; Prerequisites for Upgrading Tamr Core |
v2021.002 (Checkpoint) | Checkpoint Versions and Upgrades; Prerequisites for Upgrading Tamr Core |
v2020.021.0 or later | Prerequisites for Upgrading Tamr Core; Dataset Cleanup |
v2020.016 (Checkpoint) | Checkpoint Versions and Upgrades; Upgrading and Primary Key Management for LOOKUP Statements; Prerequisites for Upgrading Tamr Core; Setting ulimit Limits |
v2020.015.0 or later | About Spark Upgrades; Prerequisites for Upgrading Tamr Core; Setting ulimit Limits |
v2020.004 (Checkpoint) | Checkpoint Versions and Upgrades; Upgrading to a Patched Checkpoint Release; Prerequisites for Upgrading Tamr Core; Setting ulimit Limits |
Checkpoint Versions and Upgrades
When you upgrade, you must upgrade to each of the checkpoint versions released between your version and the newer, target version.
The following Tamr Core versions are checkpoint versions:
- v2022.002
  Important: For single-node deployments, during the upgrade to v2022.002.x you must provide an additional flag, --exportHBaseSnapshots, to the admin utility (unify-admin.sh). To prevent data loss, see Prerequisites for v2022.002 before you upgrade.
- v2022.001
- v2021.002
- v2020.016
- v2020.004
For example, an upgrade from v2020.012 to v2020.019 requires two upgrade stages: from v2020.012 to v2020.016, and then to v2020.019.
The upgrade utility prevents you from upgrading past a checkpoint version. The release notes also indicate each checkpoint release.
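The checkpoint rule can be sketched as a small shell helper that prints the stages for a given pair of versions. This is an illustration only, not part of the Tamr tooling: the function names version_key and upgrade_path are hypothetical, the checkpoint list is copied from this page and must be extended for later releases, and the comparison assumes version strings with a four-digit year and a three-digit release number.

```shell
#!/bin/sh
# Checkpoint versions listed on this page, oldest first.
CHECKPOINTS="2020.004 2020.016 2021.002 2022.001 2022.002"

# Turn v2020.016 or 2020.016.0 into the sortable integer 2020016.
# Assumes a four-digit year and a three-digit release number; any patch
# number is ignored.
version_key() (
  v="${1#v}"
  year="${v%%.*}"
  rest="${v#*.}"
  release="${rest%%.*}"
  echo "${year}${release}"
)

# Print every version to upgrade to, in order, ending with the target.
upgrade_path() (
  current="$(version_key "$1")"
  target="$(version_key "$2")"
  for cp in $CHECKPOINTS; do
    key="$(version_key "$cp")"
    # Keep only checkpoints strictly between the current and target versions.
    if [ "$key" -gt "$current" ] && [ "$key" -lt "$target" ]; then
      echo "v$cp"
    fi
  done
  echo "$2"
)

# The example from this page: prints v2020.016, then v2020.019.
upgrade_path v2020.012 v2020.019
```

Each printed version is one run of the upgrade procedure, performed in the order shown.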
Upgrading from Any Version to a Patched Version
Patches provide critical updates, such as fixes for support issues and security improvements. Tamr strongly recommends upgrading to available patches for your release version. The upgrade process is the same as upgrading to a newer version of Tamr Core.
Upgrading to a Patched Checkpoint Release
If you are upgrading to the patched v2020.004.3 checkpoint release, run the upgrade with the --skipUpgradeStatusValidation option to ignore the check for a patch release. Otherwise, a validation error indicates that you need to first upgrade to the non-patched version of the checkpoint release (v2020.004.0).
You do not need to include --skipUpgradeStatusValidation when upgrading to the patched versions of the v2020.016 or v2021.002 checkpoint releases.
For more information about upgrade validation checks, see Validation.
About HBase Upgrades
This section lists information and requirements for upgrading HBase 1.3.1 to HBase 2.3.6, as part of the process of upgrading from v2022.001 to v2022.002.
Prerequisites for v2022.002
Important:
- For single-node deployments, you must provide an additional flag, --exportHBaseSnapshots, to the admin utility (unify-admin.sh) during upgrade.
  Note: If you’re upgrading to v2022.002 and the upgrade attempt fails for any reason after HBase snapshots have been successfully taken, rerun your upgrade with the --rerun flag but do not include the --exportHBaseSnapshots flag again.
- Check that the configuration variable TAMR_HBASE_SNAPSHOT_URI points to a directory with sufficient disk space. This variable sets the location to which HBase table snapshots are exported and from which they are imported. See Configuring HBase. The HBase upgrade is orchestrated by exporting and importing snapshots, so you must have sufficient disk space to store these snapshots. Do not let your disk exceed 80% utilization during the upgrade process.
- Upgrading HBase versions requires significant upgrade time; expect the upgrade to take longer than usual for this release. Upgrade time is highly dependent on the number of projects in your pipeline. For example, if you have 20 projects, expect the upgrade to take at least 3 hours.
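One way to watch the 80% utilization limit is a short check built on df. The sketch below is an assumption-laden illustration: the function names and the SNAPSHOT_DIR variable are hypothetical, and it assumes the snapshot location resolves to a local directory. This page does not describe the format of TAMR_HBASE_SNAPSHOT_URI, so set the path by hand.

```shell
#!/bin/sh
# Print the utilization (0-100) of the filesystem that holds the directory.
disk_usage_percent() (
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
)

# Succeed only if utilization is at or below the limit (default 80).
check_disk_headroom() (
  dir="$1"
  limit="${2:-80}"
  used="$(disk_usage_percent "$dir")"
  if [ "$used" -gt "$limit" ]; then
    echo "WARNING: $dir is at ${used}% utilization (limit ${limit}%)"
    return 1
  fi
  echo "OK: $dir is at ${used}% utilization (limit ${limit}%)"
)

# SNAPSHOT_DIR is a hypothetical local path standing in for the directory
# behind TAMR_HBASE_SNAPSHOT_URI; /tmp is only a runnable default.
SNAPSHOT_DIR="${SNAPSHOT_DIR:-/tmp}"
check_disk_headroom "$SNAPSHOT_DIR" || true
```

Running the check before the upgrade and again while snapshots are exporting gives early warning before the disk fills.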
HBase Upgrade Process
Because of the significant impact of upgrading HBase versions, two checkpoint releases are required to facilitate this upgrade: to v2022.001, which includes an upgrade to the last HBase 1.x release, and then to v2022.002, which includes the first HBase 2.x release.
When you run the upgrade command to upgrade from v2022.001 to v2022.002, you must provide an additional flag, --exportHBaseSnapshots, to the admin utility (unify-admin.sh). In addition to running the upgrade, this flag takes a snapshot of each table in HBase 1.3.1 and exports it to a directory that you specify using the configuration variable TAMR_HBASE_SNAPSHOT_URI. Once Tamr Core upgrades to v2022.002, snapshots automatically import into HBase 2.3.6 as part of the post-upgrade scripts.
Note: If you’re upgrading to v2022.002 and the first upgrade attempt fails for any reason, rerun your upgrade with the --rerun flag but do not include the --exportHBaseSnapshots flag again.
About Spark Upgrades
Periodically, Tamr Core upgrades the Spark version. The release notes indicate these changes. Upgrading the Spark version occurs as part of the upgrade process to the version that contains the upgraded version of Spark.
Starting with v2020.015.0, Tamr Core uses Spark 2.4.5. When you upgrade to v2020.015.0 or greater, the upgrade process leaves the Spark 2.2 directory, ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7, as is. After you complete the upgrade and run the upgrade validation checks, you can copy any files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory that you wish to keep and move them to the corresponding directory for Spark 2.4. You can then remove the Spark 2.2 directory.
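To decide which files are worth keeping, it can help to list the files that exist only in the old Spark directory. The sketch below is a hypothetical helper, not a Tamr utility. This page does not name the Spark 2.4 directory, so NEW_SPARK is a placeholder that you must set yourself.

```shell
#!/bin/sh
# List files under the old directory that have no counterpart at the same
# relative path in the new directory. Pass the new directory as an absolute
# path, because the function changes into the old directory first.
files_missing_from_new() (
  old="$1"
  new="$2"
  cd "$old" || exit 1
  find . -type f | sort | while IFS= read -r f; do
    [ -e "$new/$f" ] || echo "${f#./}"
  done
)

# OLD_SPARK uses the directory name given on this page. NEW_SPARK is a
# placeholder for the Spark 2.4 directory of your installation.
OLD_SPARK="${TAMR_HOME:-}/spark-2.2.0-bin-hadoop2.7"
NEW_SPARK="${NEW_SPARK:-}"
if [ -d "$OLD_SPARK" ] && [ -d "$NEW_SPARK" ]; then
  files_missing_from_new "$OLD_SPARK" "$NEW_SPARK"
fi
```

The helper only reports differences. Copying and removing files stays a manual decision.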
About Elasticsearch Upgrades
Periodically, Tamr Core upgrades the Elasticsearch version. The release notes indicate these changes. When an upgrade to Elasticsearch is required, Tamr Core must reindex all projects and datasets after upgrading. As a result, it takes longer to upgrade to a release with a new version of Elasticsearch than other release upgrades.
v2022.005.0 Upgrade Considerations
Previously, Tamr Core provided an API-only auxiliary service, df-connect, which enabled developers to import and export data files between Tamr Core and a variety of cloud storage providers. Beginning in v2022.005.0, Tamr integrated this service into Tamr Core as the Core Connect feature. This feature is currently available through the Connect API. To learn more about Core Connect, see Core Connect.
The Core Connect APIs can also be used in place of the Data Movement Service (DMS).
Important notes on Parquet support: In versions v2022.005 through v2022.008, Core Connect does not support Parquet files. See For Customers Using the Data Movement Service.
Discontinuing Use of the df-connect Auxiliary Service
Starting with v2022.005, the Core Connect service installs as part of the standard Tamr Core installation and upgrade. As a result, you will no longer use the df-connect auxiliary service.
When you upgrade from a version prior to v2022.005, such as the v2022.002 checkpoint release, to v2022.005 or later, you discontinue df-connect and begin using Core Connect in its place. Follow the before-upgrade and after-upgrade instructions below.
Before Upgrading to v2022.005 or Later
Disable the df-connect auxiliary service to avoid errors during upgrade.
After Upgrading to v2022.005 or Later
- As part of integrating the df-connect service into Tamr Core, the new Core Connect API is significantly expanded and improved. Additionally, the default port for the Core Connect service changes from 9030 to 9050 during upgrade. The new Core Connect API is also available through port 9100 (for example, http://localhost:9100/api/connect/jdbcIngest). These differences require updates to your import/export scripts.
- Several new keys are available in the response JSON for calls to the Connect API ingest and export endpoints. Optionally, update your import/export scripts to use these keys.
  - config, which includes:
    - authTokenEndpoint
    - clientID
    - datasetName
    - primaryKey
    - region
    - url
  - jobID
  - jobStatus
  - submittedTimeUtc
See the Connect API Swagger documentation, available at http://<localhost>:9100/docs, for complete information about each API call.
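Because the port change affects existing import/export scripts, a quick search can show which scripts still reference the old port. The following is a minimal sketch: the function name and the SCRIPTS_DIR variable are hypothetical, and the match is a plain text search for ":9030", so review each hit by hand.

```shell
#!/bin/sh
# Print the files under a directory that mention ":<port>" anywhere.
scripts_using_port() (
  dir="$1"
  port="$2"
  grep -rl -- ":${port}" "$dir" | sort
)

# SCRIPTS_DIR is a hypothetical location for your import/export scripts.
SCRIPTS_DIR="${SCRIPTS_DIR:-}"
if [ -d "$SCRIPTS_DIR" ]; then
  scripts_using_port "$SCRIPTS_DIR" 9030
fi
```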
Tamr plans additional changes and enhancements to the Core Connect API. Be sure to reference the release notes and Using the Core Connect API for more information.
If you have questions, contact Tamr Support.
For Customers Using the Data Movement Service
Important notes on Parquet support: Currently, Core Connect does not support Parquet files. If you currently use the Data Movement Service (DMS) API and require Parquet support, continue to use DMS. If you currently use the DMS API and do not require Parquet support, you can update your scripts to use Connect instead, following the guidance in the bullets below.
- Truncation on import: To import datasets destructively (that is, to delete all data from the target dataset before file import), update import scripts to include the truncateTamrDataset parameter.
- Profiling on import: If you need to profile datasets upon upload, you must do this manually beginning in v2022.005. Due to this change, the record count in the Tamr Core UI may be incorrect. To fix the count, re-profile and refresh. You can do this by updating your scripts to call the profile API after running an ingest job in Core Connect.
If you have any questions, contact Tamr Support.
Upgrading and Primary Key Management for LOOKUP Statements
Starting with v2020.016, Tamr Core automatically assigns primary keys to all LOOKUP statements with non-equality join conditions that you add in this version or in subsequent versions. This means that Tamr Core will change primary keys (tamr_ids) for such LOOKUP statements.
To avoid disruptions to LOOKUP statements written in versions before v2020.016, during the upgrade to this version, Tamr Core automatically runs an upgrade script that disables automatic assignment of primary keys for existing LOOKUP statements with non-equality join conditions. For more information, see Lookup.
The script prevents breaking any current projects that contain LOOKUP statements with non-equality join conditions and that depend on primary keys staying the same as in the Tamr Core version from which you are upgrading.
The script adds the text hint(pkmanagement.manual) in front of these statements. See Labels, Hints, and Scope. Once the upgrade script completes, it issues a report listing all the projects and their transformations that were changed. It also lists any projects and transformations that could not be updated with the text hint(pkmanagement.manual) due to parsing or linting errors.
Prerequisites for Upgrading Tamr Core
Before You Begin:
Important: Version v2020.021.0 or later: You must run the dataset cleanup maintenance utility, CleanupIncompletelyDeletedProjects, and delete any unnecessary datasets. See Dataset Cleanup.
- The current Tamr Core version must be at least 2019.019.
- The current user is the functional user, such as tamr.
- The software bundle unify.zip of the target version, and any interim checkpoint versions, is available.
- Tamr Core and its dependencies are running.
- PostgreSQL is upgraded to the required version. See Requirements and Upgrading Postgres.
- Version v2021.016.0 and earlier: Verify that ulimit and vm.max_map_count are set correctly for the target version. See Setting ulimit Limits.
- Verify that there is at least 30-40% of free disk space available on the instance to store backups. (Elasticsearch does not allocate shards if more than 85% of disk space is utilized.) See the Tamr Core Help Center for instructions.
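Some of these prerequisites can be checked from the shell before you start. The sketch below covers the functional user, the availability of the upgrade bundles, and the current limit values. The function names and the FUNCTIONAL_USER variable are hypothetical, and the script only prints the limits: compare them yourself against the values in Setting ulimit Limits, which this page does not list.

```shell
#!/bin/sh
# Succeed if the current user matches the expected functional user.
is_functional_user() (
  [ "$(id -un)" = "$1" ]
)

# Succeed if every listed upgrade bundle exists and is readable.
bundles_available() (
  for zip in "$@"; do
    if [ ! -r "$zip" ]; then
      echo "missing or unreadable bundle: $zip"
      exit 1
    fi
  done
)

# Print current limits so they can be compared with the documented values.
show_limits() (
  echo "open files (ulimit -n): $(ulimit -n)"
  if [ -r /proc/sys/vm/max_map_count ]; then
    echo "vm.max_map_count: $(cat /proc/sys/vm/max_map_count)"
  fi
)

# FUNCTIONAL_USER defaults to the example user named on this page.
is_functional_user "${FUNCTIONAL_USER:-tamr}" || echo "not running as the functional user"
show_limits
```

For a multi-stage upgrade, pass the bundle for each checkpoint stage and for the target version to bundles_available in one call.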
Skipping Validation Checks Before Upgrades
Validation checks run before upgrades by default, and Tamr recommends that you do not skip them. However, the --skipEnvironmentValidation flag for the <tamr-home-directory>/tamr/utils/unify-admin.sh --upgrade command lets you skip all system validation checks, or a specified one, at the start of the upgrade command.
This flag is useful, for example, if you have already upgraded a component that Tamr Core depends on, such as PostgreSQL, while still running your current version of Tamr Core, in preparation for a Tamr Core version that requires the newer PostgreSQL version. Because the upgrade process checks for the required versions of all dependent components for both release versions involved in the upgrade, you can use this flag to avoid an upgrade check failure.
If used, this flag allows the upgrade process to proceed with a potentially invalid configuration, which can cause the upgrade to fail. For more information about validation checks, see Validation.
Upgrade Options
The following options are required:
--installDir <installDir>
  The current installation on disk.
--zipFile <zipFile> or --upgradeDir <upgradeDir>
  The path to the target upgrade ZIP file or to the directory that contains the extracted upgrade ZIP file. Use only one of these options.
--exportHBaseSnapshots
  Required only for single-node deployments when upgrading from v2022.001.0 to v2022.002 as part of the HBase upgrade. After Tamr Core upgrades to v2022.002, snapshots automatically import into HBase 2.3.6 as part of the post-upgrade scripts.
  Note: If you’re upgrading to v2022.002 and the first upgrade attempt fails for any reason, rerun your upgrade with the --rerun flag but do not include the --exportHBaseSnapshots flag again.
The following options are optional:
--exportHBaseSnapshots
  Optional before v2022.001. Takes a snapshot of each table in HBase 1.3.1 and exports it to a directory that you specify using the configuration variable TAMR_HBASE_SNAPSHOT_URI.
--zookeeper <full-zk-conf-node-url>
  The Zookeeper URL of the Tamr Core configuration node, such as zk://localhost:21281/tamr/unify001/conf. If not included, the script checks the admin utilities properties file for this URL.
--backup
  Set the system to back up before upgrading.
--healthcheckTimeout <healthcheckTimeout>
  Set how long to wait for the health checks to time out.
--help
  Print out a help message.
--nobackup
  Set the system to not back up before upgrading.
--rerun
  Re-run the upgrade against the current version of the product. Useful if an error occurs during upgrade and you want to re-attempt the upgrade. To use, include --rerun immediately after --upgrade.
--tempDir <tempDir>
  A path to which to extract the ZIP file. If not specified, defaults to the system temp directory.
--skipEnvironmentValidation <name of validator>
  Avoid running all scripts, or a specified script, that validate whether the current environment meets the requirements for the upgrade version of the product. See Skipping Validation Checks Before Upgrades. To use, include --skipEnvironmentValidation as the final option.
--forceDatasetMaterialize
  After the upgrade process completes, run scripts to re-update all datasets (this includes unified datasets, results datasets, and internal datasets) to Elasticsearch. This triggers reindexing jobs in Tamr Core.
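The ordering rules for these options (--rerun immediately after --upgrade, --skipEnvironmentValidation last) can be captured in a small dry-run helper that only prints the command. This is a sketch under stated assumptions: the function name and the mode keywords (rerun, export-hbase, skip-validation) are hypothetical, the paths are placeholders, and the position of --exportHBaseSnapshots after --installDir is an assumption, because this page does not specify where that flag must appear. The helper also refuses to combine a rerun with a second snapshot export, following the note above.

```shell
#!/bin/sh
# Print an unify-admin.sh upgrade command with options in the order this
# page describes. Arguments: <zipFile> <installDir> [mode...]
# Modes are keywords used only by this sketch: rerun, export-hbase,
# skip-validation. Paths containing spaces are not quoted in the output.
build_upgrade_command() (
  zip="$1"
  install="$2"
  shift 2
  rerun=""
  export_flag=""
  skip=""
  for mode in "$@"; do
    case "$mode" in
      rerun) rerun=" --rerun" ;;
      export-hbase) export_flag=" --exportHBaseSnapshots" ;;
      skip-validation) skip=" --skipEnvironmentValidation" ;;
      *) echo "unknown mode: $mode" >&2; exit 1 ;;
    esac
  done
  # Per the note on this page, a rerun must not export the snapshots again.
  if [ -n "$rerun" ] && [ -n "$export_flag" ]; then
    echo "do not combine rerun with export-hbase" >&2
    exit 1
  fi
  echo "./unify-admin.sh --upgrade${rerun} --zipFile ${zip} --installDir ${install}${export_flag}${skip}"
)

# First attempt of the v2022.001 to v2022.002 upgrade on a single node.
build_upgrade_command /path/to/unify.zip /path/to/tamr export-hbase
# Second attempt after a failure, skipping environment validation.
build_upgrade_command /path/to/unify.zip /path/to/tamr rerun skip-validation
```

Printing the command instead of running it leaves room to review the option order before starting an upgrade.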
Upgrade Procedure
Tip: When you run the upgrade script, it automatically deletes any files that are unrelated to Tamr Core from the home directory. Be sure to move any scripts, data directories, or other files to another directory before you upgrade.
To upgrade Tamr Core to a newer version:
- Back up the Tamr Core version you are upgrading from. See Backup.
- If you are using any auxiliary services, disable them before proceeding with the upgrade. See Disabling an Auxiliary Service.
- If upgrading from an unpatched version, run the administrative utility unify-admin.sh with the command upgrade and the options --zipFile and --installDir. Optionally include --zookeeper, --tempDir, and other options above. For example:
cd <tamr-home-directory>/tamr/utils
./unify-admin.sh --upgrade --zipFile <full-path-to-target-version-unify-zip> --installDir <full-path-to-tamr-unify-home> --zookeeper zk://localhost:21281/tamr/unify001/conf --tempDir <full-path-to-target-unzip-directory>
  Important: If you are upgrading from checkpoint v2022.001.0 to v2022.002.0, for single-node deployments, you must provide an additional flag, --exportHBaseSnapshots, to the admin utility (unify-admin.sh) during upgrade. The export can require significant time; see Prerequisites for v2022.002 for more information.
  Note: If you’re upgrading to v2022.002 and the first upgrade attempt fails for any reason, rerun your upgrade with the --rerun flag but do not include the --exportHBaseSnapshots flag again.
- If you are upgrading from a patched version, for example, v2020.008.1, and restored from a backup of a major version (without a patch), for example, v2020.008.0, then run the upgrade with the --skipUpgradeStatusValidation flag to ignore the check for a patched release.
- Validate the upgrade. See Validation.
- If you are using any auxiliary services, install the version that matches your upgraded Tamr Core instance. See Installing an Auxiliary Service.
- Clear your web browser cache before signing in to Tamr Core.
Post-Upgrade Steps
Version v2020.015.0 or later: The v2020.015.0 release upgrades Spark from 2.2 to 2.4. After you upgrade to this version or greater, you may need to review the files in the ${TAMR_HOME}/spark-2.2.0-bin-hadoop2.7 directory and move any that you wish to keep to the corresponding directory for Spark 2.4.x. This precaution is rarely needed. In most cases, Tamr Core deployments do not contain any Spark customizations.
Upgrade Troubleshooting Tips
If Tamr Core times out when starting up:
- Do not stop and restart Tamr Core; upgrade scripts could still be running. Interrupting the scripts can break the system and/or result in the need to rerun the upgrade.
- Use the service health API to investigate the issue.
- Refer to the unify.log file to check whether progress is being made in starting Tamr Core.
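One simple way to tell whether startup is still making progress is to check whether the log file keeps growing. The sketch below is a hypothetical helper, not a Tamr tool. LOG_FILE is a placeholder because this page does not give the location of unify.log, and a quiet interval does not prove that startup has stalled.

```shell
#!/bin/sh
# Succeed if the file grew during the wait interval (seconds, default 30).
log_is_growing() (
  file="$1"
  wait_seconds="${2:-30}"
  # Arithmetic expansion strips any padding that wc adds on some systems.
  before=$(( $(wc -c < "$file") ))
  sleep "$wait_seconds"
  after=$(( $(wc -c < "$file") ))
  [ "$after" -gt "$before" ]
)

# LOG_FILE is a placeholder; point it at the unify.log of your deployment.
LOG_FILE="${LOG_FILE:-}"
if [ -n "$LOG_FILE" ] && [ -f "$LOG_FILE" ]; then
  if log_is_growing "$LOG_FILE" 30; then
    echo "startup is still writing to the log"
  else
    echo "no new log output in the last 30 seconds"
  fi
fi
```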
If upgrade fails due to an Elasticsearch issue:
- Do not immediately clear Elasticsearch.
- Refer to the Elasticsearch logs to troubleshoot the underlying issue. See Elasticsearch logging for single-node on-premises deployments or cloud platform service logs for cloud deployments. When you have corrected the issue, rerun the upgrade with --rerun.
See the Tamr Core Help Center for additional upgrade troubleshooting information.