Tamr Documentation

Release Notes

Notes for Tamr v2021.3.0-2021.6.0

Tamr v2021.002.0 is a checkpoint release. When you upgrade from an earlier version of Tamr, you must upgrade to each intervening checkpoint version, including v2020.016 and v2021.002, before upgrading to a later version. The upgrade utility prevents you from upgrading past a checkpoint version. For example, these upgrade paths are allowed: v2020.017 -> v2021.002 and v2021.001 -> v2021.002. These upgrade paths are prevented: v2020.016 -> v2021.003 and v2019.019 -> v2021.002.
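
The checkpoint rule above can be sketched as a small pre-upgrade check. This is an illustrative sketch only, not the logic of the Tamr upgrade utility: the function names are hypothetical, the checkpoint list contains only the two checkpoints named in these notes (others may exist), and versions are compared on their year and release number.

```python
import re

# Checkpoint releases named in these notes; other checkpoints may exist.
CHECKPOINTS = ["v2020.016", "v2021.002"]


def parse_version(version):
    """Return (year, release) from a string such as 'v2021.002' or 'v2021.002.0'."""
    match = re.match(r"^v(\d{4})\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_upgrade_allowed(current, target, checkpoints=CHECKPOINTS):
    """Allow an upgrade only if no checkpoint lies strictly between the two versions."""
    start, end = parse_version(current), parse_version(target)
    if end <= start:
        raise ValueError("Target version must be later than the current version")
    return not any(start < parse_version(cp) < end for cp in checkpoints)


# The four example paths from the paragraph above.
for current, target in [
    ("v2020.017", "v2021.002"),
    ("v2021.001", "v2021.002"),
    ("v2020.016", "v2021.003"),
    ("v2019.019", "v2021.002"),
]:
    status = "allowed" if is_upgrade_allowed(current, target) else "prevented"
    print(f"{current} -> {target}: {status}")
```

Upgrading to a checkpoint, or starting from one, is treated as allowed; only a path that skips over a checkpoint is rejected.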

For information about installing a new version, see Upgrading Tamr.

Release Notes - Version v2021.004.0

These release notes list corrected issues in this release and known issues.

Fixed Support Issues

This release corrects the following errors.

  • Tamr often fails to provide error messages for job failures on AWS scale out. Affects versions: v2020.020.2. Fix versions: v2021.004.0.
  • Publish clusters job initially fails without an error message, and succeeds after resubmission. Affects versions: v2020.020.1. Fix versions: v2021.004.0.
  • Clusters records job initially fails with "TreeNodeException", and succeeds after resubmission. Affects versions: v2020.020.2. Fix versions: v2021.004.0.
  • Publish clusters job initially fails with "NullPointerException", and succeeds after resubmission. Affects versions: v2020.020.2. Fix versions: v2021.004.0.
  • "Generate SM suggestions" button not clickable after model import. Affects versions: v2020.012.0. Fix versions: v2021.004.0.

Release Notes - Version v2021.003.0

These release notes list what's new in this release, corrected issues, and known issues.

What's New

This release includes:

Spark configuration overrides are changing for ephemeral EMR Spark.

Only two fields within the sparkDeploymentConfig map can now be overridden: clusterNamePrefix and runJobFlowRequest. Their values correspond to what you would set for the following Tamr configuration variables:

  • clusterNamePrefix -> TAMR_DATASET_EMR_CLUSTER_NAME_PREFIX
  • runJobFlowRequest -> TAMR_DATASET_EMR_RUN_JOB_FLOW_REQUEST

TAMR_JOB_SPARK_CONFIG_OVERRIDES: '[{"name": "adjustedInstanceCount", "sparkDeploymentConfig": {"clusterNamePrefix":"", "runJobFlowRequest": "..."} }]'
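
One way to avoid quoting mistakes in a value like the one above is to generate the JSON rather than write it by hand. The following is a minimal sketch under stated assumptions: the helper name build_spark_overrides is hypothetical, the check enforces only the two fields these notes list as supported, and the runJobFlowRequest value is left as the elided "..." placeholder from the example.

```python
import json

# The only keys these notes list as supported inside sparkDeploymentConfig.
SUPPORTED_KEYS = {"clusterNamePrefix", "runJobFlowRequest"}


def build_spark_overrides(name, deployment_config):
    """Return a JSON string holding a single named override entry."""
    unsupported = set(deployment_config) - SUPPORTED_KEYS
    if unsupported:
        raise ValueError(
            f"Unsupported sparkDeploymentConfig keys: {sorted(unsupported)}"
        )
    entry = {"name": name, "sparkDeploymentConfig": dict(deployment_config)}
    return json.dumps([entry])


overrides = build_spark_overrides(
    "adjustedInstanceCount",
    # "..." is elided in the notes; substitute your own request value.
    {"clusterNamePrefix": "", "runJobFlowRequest": "..."},
)
print(overrides)
```

The resulting string is what you would place between the single quotes of the TAMR_JOB_SPARK_CONFIG_OVERRIDES value.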

Fixed Support Issues

This release corrects the following errors.

  • CSV export download does not work on AWS scale-out. Affects versions: v2020.013.0. Fix versions: v2021.003.0.

Known Issues

The following are known issues.

  • If no clusters are generated, the project's dashboard and the project's clusters page throw an HTTP 500 error.
  • In golden records projects, custom expressions that use the collect_set function now produce an "Unable to fetch preview argument type mismatch" error. A possible workaround is to replace collect_set with collect_list in your expression. See Aggregate Functions.
  • When you upload a dataset into Tamr, the datatype of the primary key column must be a string type. Tamr does not verify the datatype, so subsequent processes produce errors if the primary key has an integer or other datatype.
  • Status field (text and icon) on the Jobs page is not centered and the icon is truncated.
  • In a project with chained datasets, the Mapped/Unmapped attribute filters do not work on any downstream project after an upgrade to v2019.023.1 or later. If you encounter this issue, contact Tamr Support for information about a workaround (running an internal-only API request that calculates attribute mappings in this case).
  • Column resizing on the Users page does not behave as expected.
  • Schema mapping projects do not show when they are out of date.
  • The Unified Dataset page throws an error in the user interface when you are logged in as a reviewer.
  • Job submission for chained projects may not appear immediately on the Jobs page after choosing Submit. Submit is not disabled in this case. Pre-processing of dataset versions takes place before Tamr submits the jobs to Spark and Tamr is not currently accounting for this time on the Jobs page.
  • The job for Updating results is not showing the project it is associated with on the Jobs page.
  • The upgrade process automatically updates all record pair feedback to use unified record IDs instead of origin record IDs. However, this update depends on the Elasticsearch index for the unified dataset being up to date before you start the upgrade. If the index is not up to date when you upgrade to v2019.024 or later, the update has no effect, and the pre-upgrade pair feedback is neither migrated nor deleted. As a workaround, index the unified dataset in Elasticsearch before you upgrade. After you upgrade, call the following endpoint manually: /api/dedup/pairs/feedback/migrate, and then run the job that updates pairs for your project.
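
For the primary key datatype issue listed above, one possible precaution is to convert key values to strings before you upload a dataset. The following is a minimal sketch that assumes the records are available as Python dictionaries; the function name and record layout are hypothetical, and the code does not call any Tamr API.

```python
def stringify_primary_key(records, key_column):
    """Return copies of the records with the primary key value converted to str."""
    cleaned = []
    for index, record in enumerate(records):
        # A missing or empty key cannot be repaired by conversion, so fail early.
        if key_column not in record or record[key_column] is None:
            raise ValueError(f"Record {index} has no value for {key_column!r}")
        updated = dict(record)  # copy so the caller's data is left unchanged
        updated[key_column] = str(record[key_column])
        cleaned.append(updated)
    return cleaned


rows = [{"id": 1, "name": "a"}, {"id": "2", "name": "b"}]
cleaned_rows = stringify_primary_key(rows, "id")
print([type(row["id"]).__name__ for row in cleaned_rows])
```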

Known Issues with Geospatial Support

Features for working with geospatial data are currently available for beta testing only. For information about the current feature set, see Working with Geospatial Data.


