Tamr Documentation

Release Notes

Notes for Tamr v2021.007.0-2021.010.0

Tamr v2021.002.0 is a checkpoint release. When you upgrade from an earlier version of Tamr, you must upgrade to each intervening checkpoint version, including v2020.016 and v2021.002, before upgrading to a later version. The upgrade utility prevents you from upgrading past a checkpoint version. For example, these upgrade paths are allowed: v2020.017 -> v2021.002 and v2021.001 -> v2021.002. These upgrade paths are prevented: v2020.016 -> v2021.003 and v2019.019 -> v2021.002.
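The checkpoint rule can be sketched in a few lines of Python. This is only an illustration of the rule as described in these notes, not the upgrade utility's own logic: the function names are hypothetical, and the checkpoint list contains only the two versions named here (other checkpoint versions may exist).

```python
import re

# Checkpoint versions named in these notes; this list may be incomplete.
CHECKPOINTS = ["v2020.016", "v2021.002"]


def parse_version(version: str) -> tuple:
    """Turn 'v2021.002' or 'v2021.002.0' into a comparable tuple of ints."""
    parts = [int(p) for p in re.findall(r"\d+", version)]
    # Pad a missing patch number so 'v2021.002' equals 'v2021.002.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def is_upgrade_allowed(current: str, target: str) -> bool:
    """Return True if no known checkpoint lies strictly between the versions."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return False  # not an upgrade
    return not any(cur < parse_version(c) < tgt for c in CHECKPOINTS)


if __name__ == "__main__":
    for path in [("v2020.017", "v2021.002"), ("v2020.016", "v2021.003")]:
        print(path, "allowed" if is_upgrade_allowed(*path) else "prevented")
```

A path is treated as allowed when it ends at or before the next checkpoint, which matches the four example paths listed above.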

For information about installing a new version, see Upgrading Tamr.

Release Notes - Version v2021.010.0

Not yet released.

Release Notes - Version v2021.009.0

These release notes list what's new in this release, corrected issues, and known issues.

New Features and Improvements

The following new features are included in this release.

  • Make HBase Peak/OffPeak windows configurable.
  • SUP-4847 Implement conditionality for 'View cluster metrics'.

Fixed Support Issues

This release corrects the following errors.

  • AWS EMR Ephemeral Spark cluster instance groups not being named correctly. Affects versions: v2021.008.0. Fix versions: v2021.009.0.
  • Make HBase Peak/OffPeak windows configurable. Affects versions: . Fix versions: v2021.009.0.
  • SUP-4879 LLM not working during backup. Affects versions: . Fix versions: v2021.009.0.
  • SUP-4847 Implement conditionality for 'View cluster metrics'. Affects versions: . Fix versions: v2021.009.0.

Release Notes - Version v2021.008.0

These release notes list what's new in this release, corrected issues, and known issues.

New Features and Improvements

The following new features are included in this release.

  • The dropdown list for attributes on the blocking model page of a mastering project now provides a tooltip on mouseover with the full attribute name. Previously, the list was too narrow to show long attribute names.
  • New Tamr configuration variable for setting AMI in RunJobFlowRequest.

Fixed Support Issues

This release corrects the following errors.

  • Default value for TAMR_BIGQUERY_ENABLED causes errors in dataset.log when BigQuery is not in use. Affects versions: v2021.006.0.
  • Show full attribute name on blocking model page. Affects versions: v2019.023.1.
  • Unified Attribute side of schema mapping does not show correct number of source attributes. Affects versions: v0.39.0, v2021.001.0.

Release Notes - Version v2021.007.0

These release notes list what's new in this release, corrected issues, and known issues.

What's New

This release includes:

  • Support for overriding the following Databricks-specific parameters using TAMR_JOB_SPARK_CONFIG_OVERRIDES:
    • minWorkers - Maps to TAMR_JOB_DATABRICKS_MIN_WORKERS
    • maxWorkers - Maps to TAMR_JOB_DATABRICKS_MAX_WORKERS
    • databricksNodeType - Maps to TAMR_JOB_DATABRICKS_NODE_TYPE

These are members of the sparkDeploymentConfig map.

The following example overrides only these values and includes the required name property:

TAMR_JOB_SPARK_CONFIG_OVERRIDES: "[{
  name: databricksOverrides,
  sparkDeploymentConfig: {
    minWorkers: 5,
    maxWorkers: 6,
    databricksNodeType: Standard_DS4_v2
  }
}]"
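To make the key-to-variable mapping concrete, here is a minimal Python sketch. It only illustrates which configuration variable each override key stands in for, as listed above. The helper name and the base values are hypothetical, and the sketch does not reflect how Tamr resolves overrides internally.

```python
# Override keys and the configuration variables they map to, per these notes.
OVERRIDE_TO_VARIABLE = {
    "minWorkers": "TAMR_JOB_DATABRICKS_MIN_WORKERS",
    "maxWorkers": "TAMR_JOB_DATABRICKS_MAX_WORKERS",
    "databricksNodeType": "TAMR_JOB_DATABRICKS_NODE_TYPE",
}


def apply_overrides(base_config: dict, spark_deployment_config: dict) -> dict:
    """Hypothetical helper: return a copy of base_config with overrides applied."""
    effective = dict(base_config)
    for key, value in spark_deployment_config.items():
        variable = OVERRIDE_TO_VARIABLE.get(key)
        if variable is None:
            raise KeyError(f"not a Databricks override covered here: {key}")
        effective[variable] = value
    return effective


# Made-up base values, for illustration only.
base = {
    "TAMR_JOB_DATABRICKS_MIN_WORKERS": 1,
    "TAMR_JOB_DATABRICKS_MAX_WORKERS": 2,
    "TAMR_JOB_DATABRICKS_NODE_TYPE": "Standard_DS3_v2",
}
effective = apply_overrides(base, {"minWorkers": 5, "maxWorkers": 6})
print(effective)
```

Keys that are not overridden, such as the node type in this usage, keep their base values.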

New Features and Improvements

The following new features are included in this release.

  • Support Spark overrides for Databricks cluster specifications.

Fixed Support Issues

This release corrects the following errors.

  • Unable to apply feedback and update classification results; the job fails with the error java.lang.OutOfMemoryError: Java heap space. Affects versions: v2021.004.0. Fix versions: v2021.007.0.

Recently Added Support Help Center Articles

We recently added several articles to the Support Help Center knowledge base. To view them, visit the Support Help Center. If you do not have a Help Center account, you must create one first.

Known Issues

The following are known issues.

  • In Categorization projects, attempting to un-verify a categorization by clicking the verified icon in the record’s Details tab on the Transactions page results in an error. To work around this issue, un-verify a categorization by selecting New Categorization from the top menu or Details tab. Then, choose Clear categorization.
  • If no clusters are generated, the mastering project's dashboard and the project's clusters page throw an HTTP 500 error.
  • In golden records projects, custom expressions that use the collect_set function now produce an "Unable to fetch preview argument type mismatch" error. A possible workaround is to replace collect_set with collect_list in your expression. See Aggregate Functions.
  • When you upload a dataset into Tamr, the datatype of the primary key column must be a string type. Tamr does not verify the datatype, so subsequent processes fail with errors if the primary key has an integer or other datatype.
  • Status field (text and icon) on the Jobs page is not centered and the icon is truncated.
  • After an upgrade to v2019.023.1 or later, Mapped/Unmapped attribute filters do not work on any downstream project in a project with chained datasets. If you encounter this issue, contact Tamr Support for information about a workaround (running an internal-only API request that calculates attribute mappings in this case).
  • Column resizing on the Users page does not behave as expected.
  • Schema mapping projects do not indicate when a project is out of date.
  • The Unified Dataset page throws an error in the user interface when you are logged in as a reviewer.
  • Job submission for chained projects may not appear immediately on the Jobs page after choosing Submit. Submit is not disabled in this case. Pre-processing of dataset versions takes place before Tamr submits the jobs to Spark and Tamr is not currently accounting for this time on the Jobs page.
  • The job for Updating results is not showing the project it is associated with on the Jobs page.
  • The upgrade process updates all record pair feedback to use unified record IDs instead of origin record IDs. This process runs automatically during an upgrade, but it depends on the Elasticsearch index for the unified dataset being up to date before you start the upgrade. If the index is not up to date when you upgrade to v2019.024 or later, the process has no effect and the pre-upgrade pair feedback is neither migrated nor deleted. As a workaround, index the unified dataset in Elasticsearch before you upgrade. After you upgrade, run the /api/dedup/pairs/feedback/migrate endpoint manually, and then run the job that updates pairs for your project.
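Because Tamr does not verify the primary key datatype on upload (see the known issue above), you may want to check it yourself beforehand. The following is a minimal sketch that assumes the records are already loaded as Python dictionaries. The function names and sample data are hypothetical and are not part of any Tamr client.

```python
def non_string_keys(records, key_column):
    """Return the primary key values that are not Python strings."""
    return [r[key_column] for r in records if not isinstance(r[key_column], str)]


def with_string_keys(records, key_column):
    """Return copies of the records with the primary key cast to str."""
    return [{**r, key_column: str(r[key_column])} for r in records]


# Made-up sample data: the first record has an integer primary key.
records = [{"id": 1, "name": "Acme"}, {"id": "2", "name": "Globex"}]

bad = non_string_keys(records, "id")
if bad:
    print(f"{len(bad)} primary key value(s) are not strings; casting")
    records = with_string_keys(records, "id")
```

The cast returns new dictionaries rather than modifying the input, so the original records remain available for comparison.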

Known Issues with Geospatial Support

Features for working with geospatial data are currently available for beta testing only. For information about the current feature set, see Working with Geospatial Data.

Important Notes for DMS

  • The current version of the data movement service (DMS) supports API interaction through command-line utilities, including cURL, only.
  • DMS does not support Parquet files that include arrays with nulls.
  • DMS jobs:
    • DMS jobs are not persisted; upon restart, previous and in-progress DMS jobs are no longer listed.
    • For DMS jobs, the job ID is a GUID created by DMS and uses a different format than the numeric job IDs created by Tamr.
    • For failed DMS jobs, tmp/.tmp files created during the upload process are not deleted as expected, and can consume a large amount of disk space. For successful DMS jobs, the tmp/.tmp files are deleted.
    • DMS job status does not appear immediately and the progress bar is not in sync with the job status.
    • For successfully completed DMS jobs, the status is completed instead of succeeded, which is the status reported for other Tamr jobs.
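Scripts that monitor both DMS jobs and other Tamr jobs need to handle the two job ID formats and the two success statuses noted above. A minimal sketch follows. The helper names are hypothetical, and it assumes that DMS GUIDs use the standard UUID text format and that statuses arrive as plain strings.

```python
import uuid

# Success is reported as "completed" by DMS and "succeeded" by other Tamr jobs.
SUCCESS_STATUSES = {"succeeded", "completed"}


def is_dms_job_id(job_id) -> bool:
    """Return True if job_id parses as a GUID rather than a numeric Tamr job ID."""
    try:
        uuid.UUID(str(job_id))
    except ValueError:
        return False
    return True


def is_successful(status: str) -> bool:
    """Treat either success status as success, ignoring case and whitespace."""
    return status.strip().lower() in SUCCESS_STATUSES


print(is_dms_job_id("3f2b8c1e-9d4a-4e6b-8f1a-2c7d5e9b0a13"))
print(is_dms_job_id("42"))
print(is_successful("completed"), is_successful("succeeded"))
```

Note that a 32-digit numeric string would also parse as a GUID, so this check is a heuristic rather than a guarantee.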


