
2021 Tamr Core Release Notes

These release notes describe new features, improvements, and corrected issues in each Tamr Core 2021 release.

See Tamr Core Release Notes for important information for all releases, including upgrade instructions and checkpoint releases.

Other Tamr Core releases:

Tamr Core 2021 Releases

v2021.021.0 Release Notes

Important Support Notes for this Release

This release is not supported for GCP Native deployments.

If you have configured the Data Movement Service (DMS) for data import and export, note that this release supports a maximum of 4 threads for data import.


New Features and Improvements

The following new features are included in this release.

Update Tamr Core to use log4j 2.17.1. This release addresses the following Apache Log4j vulnerabilities by updating Tamr Core to use Apache Log4j version 2.17.1:

  • Apache Log4j CVE-2021-44832
  • Apache Log4j CVE-2021-45105
  • Apache Log4j CVE-2021-45046
  • Apache Log4j CVE-2021-44228

For full details regarding these vulnerabilities and Tamr Core, refer to Tamr’s page Updates on Apache Log4j Vulnerabilities.

This release fully remediates these vulnerabilities in Tamr Core and Elasticsearch. Install this release regardless of whether you have taken any of the remediation steps described on that page.

  • Add sorting for Submitted and Ended columns to the Jobs table. In the Jobs table, sorting is now available for the “Submitted” and “Ended” columns.
  • Add “Categorized by” filter for Categorization projects. On the categorized records page, a new “Categorized by” filter lets you show only the records categorized by specific users.

Fixed Issues

This release corrects the following errors.

  • Logging in via SAML authentication results in HTTP 500 error. Found in: v2021.014.0. Fix versions: v2021.021.0, v2021.014.1.
  • Tamr starts despite ulimit validation failing. Found in: v2021.018.0. Fix versions: v2021.021.0.
  • Input transformations not updated when a dataset is removed from a project. Found in: v2021.009.1. Fix versions: v2021.021.0.
  • Searching username metadata in categorization projects doesn't work. Found in: v2021.006.0, v2021.002.2. Fix versions: v2021.021.0.
  • Single-node Tamr configured to run one job at a time runs concurrent jobs, leading to failures. Found in: v2021.010.0. Fix versions: v2021.021.0.
  • Schema Mapping mapped/unmapped filters are not working. Found in: v2020.007.0, v2020.004.1, v2021.002.1. Fix versions: v2021.021.0.
  • Long cluster names hide the “two-pane” button on the Clusters page. Found in: v2021.010.0. Fix versions: v2021.021.0.
  • Filter on the 'Sample, Group-by and API-derived' page of the Dataset Catalog is broken. Found in: v2021.006.0. Fix versions: v2021.021.0.
  • Filter in Datasets page returns incorrect results. Found in: v2021.006.0. Fix versions: v2021.021.0.
  • Filtering to mapped source attributes in schema mapping does not load correctly. Found in: v2021.006.0. Fix versions: v2021.021.0.



v2021.020.2 Patch Release Notes

This patch provides the following updates to the Data Movement Service (DMS):

  • The ability to import and export files with over 20M records in Parquet format on ADLS Gen2.
  • Intermediary data from Tamr Core is streamed in Avro format instead of JSON, which allows faster and more compressed streaming.
  • Local download, instead of streaming, works around HTTPIdleTimeouts and reduces load on server resources while allowing for a variety of Parquet formats (including the slower gzipped Parquet files).
  • Parquet output is partitioned into chunks of 2.5M records, which allows Parquet data to be processed in parallel.
  • Snappy compression on Parquet reduces file size, saves storage space, and improves performance. See the important notes below for compression recommendations.
  • Bloom filters, which allow for enhanced downstream analytics capabilities.
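To make the 2.5M-record partitioning concrete, here is a minimal sketch that computes the record ranges each partition would cover. The helper name and the half-open (start, end) representation are illustrative assumptions, not part of DMS.

```python
from typing import List, Tuple

# Chunk size described in these release notes: 2.5M records per partition.
PARTITION_SIZE = 2_500_000

def partition_ranges(total_records: int, size: int = PARTITION_SIZE) -> List[Tuple[int, int]]:
    """Return half-open (start, end) record ranges, one per partition (hypothetical helper)."""
    if total_records < 0 or size <= 0:
        raise ValueError("total_records must be >= 0 and size must be > 0")
    # The last partition is shorter when total_records is not a multiple of size.
    return [(start, min(start + size, total_records))
            for start in range(0, total_records, size)]

# A 20M-record file splits into partitions that can be processed in parallel.
ranges = partition_ranges(20_000_000)
print(len(ranges), ranges[-1])
```

For a 20M-record file this yields 8 partitions, the last covering records 17,500,000 through 19,999,999.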

Important:

DMS Hadoop Logging must be turned off. In order for ingest operations to succeed on ADLS Gen2, DMS Hadoop logging must be turned off.
Compression recommendations. If using compression, Tamr recommends using:

  1. Snappy compression.
  2. GZip at or below level 6. Using GZip above level 6 results in significantly increased load time (16x higher).
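If you generate Parquet files for DMS yourself, a small guard like the following can enforce these recommendations. It is a sketch that assumes you write Parquet with pyarrow, whose pyarrow.parquet.write_table function accepts compression and compression_level arguments; the helper name is hypothetical.

```python
from typing import Any, Dict, Optional

# Recommendation from these notes: GZip at or below level 6.
MAX_GZIP_LEVEL = 6

def parquet_compression_options(codec: str, level: Optional[int] = None) -> Dict[str, Any]:
    """Return pyarrow write_table keyword arguments that follow the recommendations."""
    codec = codec.lower()
    if codec == "snappy":
        if level is not None:
            raise ValueError("Snappy does not use a compression level")
        return {"compression": "snappy"}
    if codec == "gzip":
        # Default to the highest recommended level rather than zlib's default of 9.
        level = MAX_GZIP_LEVEL if level is None else level
        if not 1 <= level <= MAX_GZIP_LEVEL:
            raise ValueError(f"use a GZip level from 1 to {MAX_GZIP_LEVEL}, got {level}")
        return {"compression": "gzip", "compression_level": level}
    raise ValueError(f"unrecommended codec {codec!r}; use 'snappy' or 'gzip'")
```

For example, pq.write_table(table, "out.parquet", **parquet_compression_options("gzip", 5)) would write a GZip level 5 file with pyarrow.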

v2021.020.1 Patch Release Notes

This patch addresses the following Apache Log4j vulnerabilities by updating Tamr Core to use Apache Log4j version 2.17.0:

  • Apache Log4j CVE-2021-45105
  • Apache Log4j CVE-2021-45046
  • Apache Log4j CVE-2021-44228

For full details regarding these vulnerabilities and Tamr Core, refer to Tamr's Updates on Apache Log4j Vulnerabilities article.

This patch fully remediates these three vulnerabilities in Tamr Core and Elasticsearch. Install this patch regardless of whether you have taken any of the remediation steps in the article referenced above.

v2021.020.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Add a config for maxEdgesPerPartition. A new configuration variable, TAMR_MAX_EDGES_PER_PARTITION, is available to help you tune clustering performance. See the Configuration Variable Reference.
  • Provide the ability to search the Jobs page. The Jobs page now offers a Search field to filter the list of jobs by description. See Viewing Job Details.
  • Allow performing incremental enrichment.

Fixed Issues

This release corrects the following errors.

  • Dataset generated by an enrichment project produces a 400 error. Found in: v2021.010.2. Fix versions: v2021.020.0.
  • Project export job failing. Found in: v2021.014.0. Fix versions: v2021.020.0.
  • Project name can contain characters that break project export. Found in: v2021.006.0. Fix versions: v2021.020.0. This change addresses problems caused by project names that include unsupported characters. Beginning with this release, files created by the project movement export operation are named export-<project_id>-<timestamp>, where <project_id> replaces <project_name>. This change affects only newly created exports; existing exports keep the old naming format.
  • Start DMS via start-unify, not start-dependencies. Fix versions: v2021.020.0.
  • Security enhancements for user lists.
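Scripts that locate project exports by file name may need to recognize the new naming convention. The sketch below distinguishes new-format names from old ones; it assumes project IDs are numeric and treats everything after the second hyphen as the timestamp, since the exact timestamp format is not specified here. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: project IDs are numeric, so new-format names start with "export-<digits>-".
NEW_EXPORT_NAME = re.compile(r"^export-(?P<project_id>\d+)-(?P<timestamp>.+)$")

def parse_new_export_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (project_id, timestamp) for new-format export names, or None otherwise.

    A project whose name consists only of digits would also match, so treat
    matches on older exports with care.
    """
    match = NEW_EXPORT_NAME.match(filename)
    if match is None:
        return None
    return match.group("project_id"), match.group("timestamp")
```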



v2021.019.1 Patch Release Notes

  • This patch release corrects auxiliary service installation failures due to missing configuration.

v2021.019.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Browser tab naming for easier browser tab navigation.
  • Add project name as page header in Golden Records > Rules page.
  • Resolve warning messages when running start-dependencies.sh or unify-admin.sh.
  • Security enhancements for user lists.

Fixed Support Issues

This release corrects the following errors.

  • High-impact pairs don't show up in UI after running train/predict with new feedback. Affects versions: v2021.015.0. Fix versions: v2021.019.0.
  • Add project name as page header in Golden Records > Rules page. Fix versions: v2021.019.0.
  • Memory allocated to Tamr dependencies and micro-services is greater than available memory in the machine. Affects versions: v2020.016.4. Fix versions: v2021.019.0.
  • Categorization dashboard now available for reviewer role. Affects versions: v2021.006.0. Fix versions: v2021.019.0.
  • Improve base memory and validator calculations. Affects versions: v2021.010.0. Fix versions: v2021.019.0.



v2021.018.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • DMS now uses array types for source data in datasets newly created from CSV files. Existing datasets created by DMS continue to have string types, and data continues to be appended to them as strings.
  • The value of a new ZooKeeper variable, TAMR_JOB_SPARK_NAME_TEMPLATE, is now appended to job names in Spark, allowing you to customize Spark job names.

Fixed Support Issues

This release corrects the following errors.

  • Fixed versioned GET taxonomy API issues for projects with taxonomies uploaded via UI before v2021.016. Affects versions: v2021.002.2, v2021.006.0. Fix versions: v2021.018.0.
  • Fixed Tamr UI performance issue. Affects versions: v2020.012.0. Fix versions: v2021.018.0, v2021.002.3, v2020.016.5.
  • Fixed Bulk Match API intermittently failing on AWS with "AmazonS3Exception Slow Down" error. Affects versions: v2020.020.1. Fix versions: v2021.018.0, v2020.020.3.
  • Fixed issue with reliable pair estimate at scale on AWS due to "AmazonS3Exception Slow Down" error. Affects versions: v2020.020.2. Fix versions: v2021.018.0, v2020.020.3.



v2021.017.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Optimize pair generation jobs when using unweighted tokenizations.

Validation improvements:

  • Validate that the vm.max_map_count (which specifies the maximum number of memory map areas) is at least the required value of 262144.
  • Run ulimit validator as part of preupgrade validation.
  • For the ulimit validator, check the open file ulimit against a maximum of 1000000, instead of 66000.
  • Run certain validators, including the ulimit validator, every time Tamr is started.
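As a rough pre-check before starting Tamr, the following sketch compares vm.max_map_count against the documented minimum and reports the process's open-file soft limit. It is not Tamr's validator: the function names are hypothetical, it assumes a Linux host with procfs at /proc, and it only reports the open-file limit because how the validator applies the 1000000 threshold is not described here.

```python
# Required minimum for vm.max_map_count, as stated in these release notes.
REQUIRED_MAX_MAP_COUNT = 262_144

def max_map_count_ok(value: int) -> bool:
    """True if vm.max_map_count meets the documented minimum."""
    return value >= REQUIRED_MAX_MAP_COUNT

def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current value on Linux (assumes procfs is mounted at /proc)."""
    with open(path) as handle:
        return int(handle.read().strip())

def open_files_soft_limit() -> int:
    """Return the soft open-file limit (what `ulimit -n` reports) for this process."""
    import resource  # POSIX-only module, so import it lazily
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

On a Linux host, max_map_count_ok(read_max_map_count()) returns whether the current setting meets the minimum.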

Fixed Support Issues

This release corrects the following errors.

  • Improve behavior of Tamr when dataset migration takes a long time during upgrade. Affects versions: v2020.014.0. Fix versions: v2021.017.0.
  • Ingesting a file via DMS fails when values include unescaped commas. Affects versions: v2021.006.1. Fix versions: v2021.017.0.
  • Parquet files greater than 2GB created by DMS cannot be read. Affects versions: v2021.006.0. Fix versions: v2021.017.0.
  • Input transformation JSON lacks input dataset information. Affects versions: v2020.024.1. Fix versions: v2021.017.0.
  • Run validation scripts by default in start-dependencies and start-unify, not just during upgrade. Fix versions: v2021.017.0.
  • Project Movement failing to materialize transformations. Affects versions: v2021.010.2. Fix versions: v2021.017.0.
  • UX improvements to reduce confusion when uploading dataset via DMS. Affects versions: v2021.010.2. Fix versions: v2021.017.0.
  • Selecting multiple input datasets in project transformations does not update the backend. Fix versions: v2021.017.0.



v2021.016.0 Release Notes

What's New

Persistent Cluster ID Improvements

Tamr Core now assigns persistent cluster IDs to records in mastering projects the first time you run the "Apply feedback and update results" or "Update results only" job. Cluster IDs are re-assigned and updated to reflect your feedback each time you run the Review and publish clusters job. Previously, persistent IDs were not assigned to clusters on creation.

New Verifier Guide to Support Verifier Role

A Verifier Guide is available to support users in the Verifier role. Verifiers are subject matter experts who use Tamr Core to provide input to mastering and categorization workflows and manage review assignments. Verifiers can complete all tasks that Reviewers complete. Additionally, similar to Curators, Verifiers act as coordinators in mastering and categorization projects by assigning tasks and verifying pairs and clusters or categorizations. To support this new role, topics in the Curator Guide that apply to both Verifiers and Curators have been moved to the Verifier Guide.

New Topics Related to Deploying Tamr Core on AWS

Two new topics are available in the System Administrator Guide related to deploying Tamr Core on AWS:

Fixed Support Issues

This release corrects the following errors.

  • Ingesting via the DMS UI does not expose the correct attributes in CSV files. Affects versions: v2021.010.2. Fix versions: v2021.016.0.
  • Using the DMS UI browser for ADLS crashes DMS. Affects versions: v2021.010.2. Fix versions: v2021.016.0.
  • TAMR_JOB_SPARK_CONFIG_OVERRIDES not getting deserialized, preventing Tamr from starting. Affects versions: v2021.010.0. Fix versions: v2021.016.0.
  • Failed DMS jobs show their state as running indefinitely, preventing further DMS jobs. Affects versions: v2021.006.0. Fix versions: v2021.016.0.
  • Spark overrides should not need to be fully specified. Affects versions: v2021.003.0. Fix versions: v2021.016.0.
  • Versioned GET taxonomy endpoint does not work. Affects versions: v2021.002.0, v2021.006.0. Fix versions: v2021.016.0.



v2021.015.0 Release Notes

These release notes list what's new in this release, corrected issues, and known issues.

New Features and Improvements

The following new features are included in this release.

New API endpoint for LLM

A new endpoint, GET /api/v1/projects/{projectName}/lastLlmUpdate, has been added to the match service API. It returns the last modification timestamp, which represents the start of the job that takes a snapshot for the LLM update. If the project is LLM-enabled, the timestamp is returned in ISO 8601/RFC 3339 format. If the project exists but is not LLM-enabled, the response is empty. If the specified project does not exist, the endpoint returns a 404 error code.

Important: After upgrading to this version, calls to this endpoint for existing LLM projects return empty responses until one of the following occurs:

  • A user calls updateLLM (http://<host>:9100/api/versioned/v1/projects/{project}:updateLLM).
  • A user publishes clusters.

Note: The LLM feature is in limited release.
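Client code that polls this endpoint needs to handle the three documented outcomes. The sketch below builds the request URL and interprets a response without making the HTTP call itself. The function names are hypothetical; the base URL (scheme, host, and match service port) is supplied by the caller, and treating 200 and 204 as success codes and accepting either a bare or JSON-quoted timestamp are assumptions.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.parse import quote

def last_llm_update_url(base_url: str, project_name: str) -> str:
    """Build the lastLlmUpdate URL; base_url is caller-supplied (no default assumed)."""
    name = quote(project_name, safe="")
    return f"{base_url.rstrip('/')}/api/v1/projects/{name}/lastLlmUpdate"

def parse_last_llm_update(status: int, body: str) -> Optional[datetime]:
    """Interpret a lastLlmUpdate response as described in these release notes.

    Returns a datetime for LLM-enabled projects, None for an empty response,
    and raises LookupError for a 404 (unknown project).
    """
    if status == 404:
        raise LookupError("project does not exist")
    if status not in (200, 204):  # assumption: success status codes
        raise RuntimeError(f"unexpected HTTP status {status}")
    text = body.strip()
    if not text:
        # Not LLM-enabled, or an existing LLM project that has not been
        # updated or had clusters published since the upgrade.
        return None
    if text.startswith('"'):
        # Assumption: the timestamp may arrive as a JSON string literal.
        text = json.loads(text)
    if text.endswith("Z"):
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)
```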

Fixed Support Issues

This release corrects the following errors.

  • Project Movement failIfNotPresent flag too sensitive to use. Affects versions: v2021.010.2. Fix versions: v2021.015.0.
  • Cannot filter to attributes mapped to unified attributes. Affects versions: v2021.010.0, v2021.011.0. Fix versions: v2021.015.0.



v2021.014.1 Patch Release Notes

This patch release corrects the following issues.

  • Logging in via SAML authentication results in HTTP 500 error

v2021.014.0 Release Notes

What's New

Verifier Role

This release includes a new Verifier role. This role is best suited for subject matter experts who will use Tamr Core only to provide input to your workflow and assign review tasks to other users. Verifiers cannot perform any actions that affect the underlying model.

This role allows subject matter experts to perform the following actions in mastering projects:

  • View unified data
  • Create and manage pair and cluster review assignments
  • Label pairs and clusters
  • Verify pairs and clusters

In categorization projects, Verifiers can assign and verify categories.

If users require permission to configure and run the underlying model, grant them Curator role access instead.

For more information about Verifier role permissions, see the Permissions Matrix by User Role.

New Features and Improvements

  • The system is set to a read-only state when you use the Project Movement API to import or export projects. This change protects the project and its data during the operation. When the operation is complete, the system returns to a read-write state, and you can then perform actions and run jobs.
  • Additional configuration options are available for the Databricks client library features.

Fixed Support Issues

This release corrects the following errors.

  • The backup process did not clear the tmp directory, causing the database connection to fail. Affects versions: v2021.010.0. Fix versions: v2021.014.0.
  • Due to an issue with the cluster metrics dataset, the “Estimate cluster metrics” option was unavailable and predict clusters jobs were cancelled. Affects versions: v2021.010.0. Fix versions: v2021.014.0, v2021.010.2.
  • When running unify-admin.sh validate, erroneous warnings were generated for missing optional files. Affects versions: v2021.010.0. Fix versions: v2021.014.0.
  • Upgrade failed due to inability to connect to dataset service. Affects versions: v2021.010.0. Fix versions: v2021.014.0, v2021.010.2.
  • When using relative Hausdorff distance, you cannot get to the geo shape overlay on the pairs page. Affects versions: v2019.017.0, v2020.024.1. Fix versions: v2021.014.0.
  • Datasets appear to have a smaller number of records than expected in deployments that use HBase. Affects versions: v2021.010.0. Fix versions: v2021.014.0, v2021.010.2.
  • Using geometry fields with PreGroupBy caused the generate pairs job to fail. Affects versions: v2021.009.0. Fix versions: v2021.014.0.
  • Project Import fails due to changes to Unified Attributes causing invalid transformations. Affects versions: v2021.010.1. Fix versions: v2021.014.0.
  • Match service should throw an error if the project does not exist. Affects versions: v2021.006.0. Fix versions: v2021.014.0.



v2021.012.1 Patch Release Notes

The patch version addresses the following issues:

  • For single-node deployments, the unify-data directory was not included in the backup, which could cause dataset exports to fail.
  • Restored mastering workflows failed because the Tamr restore process did not pull in a needed piece of information from the backup file.
  • Restore failed if the Low-Latency Match (LLM) service automatically polled for updates during restore. The LLM service no longer automatically polls when the system is in read-only mode.

v2021.012.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Enable changing the order of columns for clusters in GR projects.

Fixed Support Issues

This release corrects the following errors.

  • Cannot change the order of columns for clusters in GR projects. Affects versions: v2020.017.0. Fix versions: v2021.012.0.
  • Dozens of hbase_configNNNN... folders appear in the /tmp folder. Affects versions: v2021.002.0. Fix versions: v2021.012.0.
  • Adding a return character at the end of the license key doesn't break installation but breaks upgrade. Fix versions: v2021.012.0.
  • Enable changing the order of columns for clusters in GR projects. Fix versions: v2021.012.0.



v2021.011.0 Release Notes

New Features and Improvements

The following new feature is included in this release.

In Tamr mastering projects, the pairs page now automatically shows pairs with Tamr suggestions that have a medium (M) confidence level, as well as suggestions with high (H) and low (L) confidence levels. This change is the result of a new, lower default value for the TAMR_PAIR_CONFIDENCE_THRESHOLD_MEDIUM configuration variable. For more information, see Configuring Tamr.

Fixed Support Issues

This release corrects the following errors.

  • Add ability to group on nulls/empties for specific aggregation fields in pregroupby. Affects versions: v2020.020.0. Fix versions: v2021.011.0.
  • Medium-confidence pairs should be possible. Fix versions: v2021.011.0.



v2021.010.2 Patch Release Notes

The patch version addresses the following issues:

  • For single-node deployments, the unify-data directory was not included in the backup, which could cause dataset exports to fail.
  • Restored mastering workflows failed because the Tamr restore process did not pull in a needed piece of information from the backup file.
  • Restore failed if the Low-Latency Match (LLM) service automatically polled for updates during restore. The LLM service no longer automatically polls when the system is in read-only mode.
  • Upgrade failed due to inability to connect to the dataset service.
  • Ingestion of CSV files from ADLS Gen2 failed when using the Data Movement Service (DMS).
  • Unable to select a primary key when adding a dataset from Azure cloud storage via DMS in the Tamr UI, because the Primary Key dropdown menu was not populated.
  • Datasets appear to have a smaller number of records than expected in deployments that use HBase.
  • An issue with the cluster metrics dataset which caused the “Estimate cluster metrics” option to be disabled and predict clusters jobs to be cancelled.

v2021.010.1 Patch Release Notes

This patch corrects a timeout error that occurred when deleting datasets with interdependencies on derived datasets, including published clusters. Due to the inability to delete datasets, users were not able to delete related Mastering projects after publishing clusters.

Additionally, this patch corrects a related issue in which the DELETE API returned the following timeout error: "com.tamr.platform.tasq.TaskException: Timed out waiting for tasks to finish".

v2021.010.0 Release Notes

New Features and Improvements

The following new features and improvements are included in this release.

  • For Clusters, when a metrics estimation job completes, the "Estimate" option automatically updates to the "View cluster metrics" option.
  • For Data Movement Service, improved append to dataset functionality when uploading a dataset.
  • For Data Movement Service, support for uploading Parquet files with complex schema.

Fixed Support Issues

This release corrects the following errors.

  • Expose highImpactThreshold as a configurable variable in CategorizationInfo recipe. Affects versions: v2021.002.1, v2021.002.2. Fix versions: v2021.010.0.
  • Tag filter not working on the page for adding datasets to a project. Affects versions: v2021.006.0. Fix versions: v2021.010.0.
  • UI-uploaded filenames are not encoded as UTF-8 Strings. Affects versions: v2021.005.0. Fix versions: v2021.010.0.
  • Unable to see 'View cluster metrics' button after upgrading to v2020.020.0. Affects versions: v2020.020.0. Fix versions: v2021.010.0.
  • Expose highImpactThreshold as a configurable variable in CategorizationInfo recipe. Fix versions: v2021.010.0, v2021.002.2.
  • View Metrics appears without user clicking to another page. Fix versions: v2021.010.0.
  • Update copy API docs - LLM. Affects versions: v2021.006.0. Fix versions: v2021.010.0.



Earlier Releases in 2021

v2021.009.1 Patch Release Notes

The patch version addresses the following issues:

  • For single-node deployments, the unify-data directory was not included in the backup, which could cause dataset exports to fail.
  • Restored mastering workflows failed because the Tamr restore process did not pull in a needed piece of information from the backup file.
  • Restore failed if the Low-Latency Match (LLM) service automatically polled for updates during restore. The LLM service no longer automatically polls when the system is in read-only mode.

v2021.009.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Make HBASE Peak/OffPeak windows configurable.
  • Implement conditionality for 'View cluster metrics'.

Fixed Support Issues

This release corrects the following errors.

  • AWS EMR Ephemeral Spark cluster instance groups not being named correctly. Affects versions: v2021.008.0. Fix versions: v2021.009.0.
  • Make HBASE Peak/OffPeak windows configurable. Fix versions: v2021.009.0.
  • LLM not working during backup. Fix versions: v2021.009.0.
  • Implement conditionality for 'View cluster metrics'. Fix versions: v2021.009.0.



v2021.008.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • The dropdown list for attributes on the blocking model page of a mastering project now provides a tooltip on mouseover with the full attribute name. Previously, the list was too narrow to show long attribute names.
  • New Tamr configuration variable for setting AMI in RunJobFlowRequest.

Fixed Support Issues

This release corrects the following errors.

  • Default value for TAMR_BIGQUERY_ENABLED causes errors in dataset.log when BigQuery is not used. Affects versions: v2021.006.0.
  • Show full attribute name on blocking model page. Affects versions: v2019.023.1.
  • Unified Attribute side of schema mapping does not show correct number of source attributes. Affects versions: v0.39.0, v2021.001.0.



v2021.007.0 Release Notes

What's New

This release includes:

  • We now support overriding the following Databricks-specific parameters using TAMR_JOB_SPARK_CONFIG_OVERRIDES:
    • minWorkers - Maps to TAMR_JOB_DATABRICKS_MIN_WORKERS
    • maxWorkers - Maps to TAMR_JOB_DATABRICKS_MAX_WORKERS
    • databricksNodeType - Maps to TAMR_JOB_DATABRICKS_NODE_TYPE

These are members of the sparkDeploymentConfig map.

The following example overrides only these values and includes the required name property:

TAMR_JOB_SPARK_CONFIG_OVERRIDES: '[{"name": "databricksOverrides", "sparkDeploymentConfig": {"minWorkers": 5, "maxWorkers": 6, "databricksNodeType": "Standard_DS4_v2"}}]'
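Generating the override value programmatically avoids hand-editing JSON inside a quoted configuration value. The sketch below builds a TAMR_JOB_SPARK_CONFIG_OVERRIDES value with one named Databricks override. The helper name is hypothetical; it deliberately accepts only the three keys listed for this release, the minWorkers/maxWorkers ordering check is an added sanity check rather than documented Tamr behavior, and it assumes the value is placed in a YAML configuration file.

```python
import json
from typing import Any

# Databricks-specific sparkDeploymentConfig keys listed for v2021.007.0,
# mapped to the Tamr configuration variable each one overrides.
DATABRICKS_KEYS = {
    "minWorkers": "TAMR_JOB_DATABRICKS_MIN_WORKERS",
    "maxWorkers": "TAMR_JOB_DATABRICKS_MAX_WORKERS",
    "databricksNodeType": "TAMR_JOB_DATABRICKS_NODE_TYPE",
}

def databricks_overrides(name: str, **deployment: Any) -> str:
    """Return a JSON value for TAMR_JOB_SPARK_CONFIG_OVERRIDES with one named override."""
    unknown = sorted(set(deployment) - set(DATABRICKS_KEYS))
    if unknown:
        raise ValueError(f"not a Databricks override key: {unknown}")
    low, high = deployment.get("minWorkers"), deployment.get("maxWorkers")
    if low is not None and high is not None and low > high:
        raise ValueError("minWorkers must not exceed maxWorkers")
    return json.dumps([{"name": name, "sparkDeploymentConfig": deployment}])

value = databricks_overrides(
    "databricksOverrides", minWorkers=5, maxWorkers=6, databricksNodeType="Standard_DS4_v2"
)
# Single quotes keep the JSON intact as one YAML scalar.
print(f"TAMR_JOB_SPARK_CONFIG_OVERRIDES: '{value}'")
```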

New Features and Improvements

The following new features are included in this release.

  • Support spark overrides for Databricks cluster specifications.

Fixed Support Issues

This release corrects the following errors.

  • Unable to apply feedback and update classification results; the job fails with the error java.lang.OutOfMemoryError: Java heap space. Affects versions: v2021.004.0. Fix versions: v2021.007.0.



v2021.006.4 Patch Release Notes

This patch addresses the following Apache Log4j vulnerabilities by updating Tamr Core to use Apache Log4j version 2.17.0:

  • Apache Log4j CVE-2021-45105
  • Apache Log4j CVE-2021-45046
  • Apache Log4j CVE-2021-44228

For full details regarding these vulnerabilities and Tamr Core, refer to Tamr's Updates on Apache Log4j Vulnerabilities article.

This patch fully remediates these three vulnerabilities in Tamr Core and Elasticsearch. Install this patch regardless of whether you have taken any of the remediation steps in the article referenced above.

v2021.006.3 Patch Release Notes

This patch corrects an issue for single-node deployments in which the unify-data directory was not included in the backup, which could potentially cause dataset exports to fail.

v2021.006.2 Patch Release Notes

This patch addresses two issues:

  • The restore service was failing to pull in a needed piece of information from the backup file, causing restored mastering workflows to fail.
  • Automatic polling by the Low-Latency Match (LLM) service during restore caused restore to fail. The LLM service no longer automatically polls when the system is in read-only mode.

v2021.006.1 Patch Release Notes

This patch corrects an issue that affected AWS backup and restore.

v2021.006.0 Release Notes

What's New

This release includes:

  • When TAMR_HBASE_REMOTE_DOWNLOAD_ENABLED is set to false, filesystem connection information is not included in the job spec.
  • Security improvements

Fixed Support Issues

This release corrects the following errors.

  • Get all datasets API failed when searching for a deleted dataset. Affects versions: v2020.024.1. Fix versions: v2021.006.0.
  • Tamr job status never updates for Terminated Databricks cluster on Azure. Affects versions: v2021.002.1. Fix versions: v2021.006.0.
  • Pages in schema mapping load slowly. Affects versions: v2020.012.0, v2020.016.3. Fix versions: v2021.006.0.
  • Job status doesn't update from Azure Databricks, possibly due to ADLS Gen 1. Affects versions: v2020.026.0. Fix versions: v2021.006.0.
  • Project steps dialogue in UI does not reflect the updates that have been done via API. Affects versions: v2020.015.0. Fix versions: v2021.006.0.



v2021.005.0 Release Notes

What's New

This release includes the following new features.

Project Movement
The new Tamr Core project movement feature can be used to create, update, or back up project artifacts within and across instances. Use the project movement API to export projects and then, optionally, import them into existing or distinct new projects.

To learn more about project movement, see:

Data Movement Service (DMS)
You can now import and export data files between Tamr Core and your cloud storage with the new Data Movement Service (DMS).

To learn more about DMS, see the following:

Important Notes for DMS

  • The current version of DMS supports API interaction only through command-line utilities, such as cURL.
  • DMS does not support Parquet files that include arrays with nulls.
  • Appending uploaded data to an existing dataset:
    • When appending uploaded data with multiple threads (and multiple files) to an existing dataset, the original data is overwritten by the uploaded data and no longer appears in the dataset. This issue is fixed in release v2021.010.0.
    • When appending an uploaded dataset to an existing dataset, if the new dataset does not include all of the columns in the original dataset, the schema of the existing dataset is changed to have only the columns included in the new dataset. This issue is fixed in release v2021.010.0; the schema no longer changes and omitted columns have null values in their respective cells.
  • DMS jobs:
    • DMS jobs are not persisted; upon restart, previous and in progress DMS jobs are no longer listed.
    • For DMS jobs, the job ID is a GUID created by DMS and uses a different format than the numeric job IDs created by Tamr.
    • For failed DMS jobs, tmp/.tmp files created during the upload process are not deleted as expected, and can consume a large amount of disk space. For successful DMS jobs, the tmp/.tmp files are deleted.
    • DMS job status does not appear immediately and the progress bar is not in sync with the job status.
    • For successfully completed DMS jobs, the status is completed, instead of succeeded which is reported for other Tamr jobs.
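Monitoring scripts that track both DMS jobs and other Tamr jobs need to account for the different ID formats and success statuses described above. The following is a minimal sketch; the function names are hypothetical, and comparing statuses case-insensitively is an assumption.

```python
import uuid

def is_dms_job_id(job_id: str) -> bool:
    """DMS job IDs are GUIDs; other Tamr job IDs are numeric."""
    if job_id.isdigit():
        return False  # numeric Tamr job ID (checked first: 32 digits would also parse as a UUID)
    try:
        uuid.UUID(job_id)
    except ValueError:
        return False
    return True

def job_succeeded(job_id: str, status: str) -> bool:
    """DMS reports "completed" where other Tamr jobs report "succeeded"."""
    normalized = status.strip().lower()  # assumption: case-insensitive comparison
    expected = "completed" if is_dms_job_id(job_id) else "succeeded"
    return normalized == expected
```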

New Features and Improvements

The following new features and improvements are included in this release.

  • Create versioned APIs for project movement.
  • Show the number of blocks per record when estimating the stats of the blocking model.
  • Need configuration parameter for Elasticsearch to avoid "too_long_frame_exception" with the reason "An HTTP line is larger than 4096 bytes".

Fixed Support Issues

This release corrects the following errors.

  • ADLS Gen 1 credentials exposed in multiple logs for HBaseSiteConnectionHandler. Affects versions: v2020.026.0. Fix versions: v2021.005.0, v2021.002.1.
  • Give more information to user about pair estimate complexity. Affects versions: v2020.020.2. Fix versions: v2021.005.0.
  • Error loading similar entities when clicking on clusters. Affects versions: v2020.023.0. Fix versions: v2021.005.0.
  • Need configuration parameter for Elasticsearch to avoid "too_long_frame_exception" with the reason "An HTTP line is larger than 4096 bytes". Affects versions: v2020.012.0, v2021.001.0. Fix versions: v2021.005.0.
  • Show the number of blocks per record when estimating the stats of the blocking model. Fix versions: v2021.005.0.
  • Error loading similar entities when clicking on clusters. Fix versions: v2021.005.0.



v2021.004.0 Release Notes

Fixed Support Issues

This release corrects the following errors.

  • Tamr often fails to provide error messages for job failures on AWS scale out. Affects versions: v2020.020.2. Fix versions: v2021.004.0.
  • Publish clusters job initially fails without an error message, and succeeds after resubmission. Affects versions: v2020.020.1. Fix versions: v2021.004.0.
  • Clusters records job initially fails with "TreeNodeException", and succeeds after resubmission. Affects versions: v2020.020.2. Fix versions: v2021.004.0.
  • Publish clusters job initially fails with "NullPointerException", and succeeds after resubmission. Affects versions: v2020.020.2. Fix versions: v2021.004.0.
  • "Generate SM suggestions" button not clickable after model import. Affects versions: v2020.012.0. Fix versions: v2021.004.0.



v2021.003.0 Release Notes

What's New

This release includes:

Spark config overrides have changed for ephemeral EMR Spark clusters.

Only two fields in the sparkDeploymentConfig map can now be overridden: clusterNamePrefix and runJobFlowRequest. Their values correspond to what you would set for the following Tamr configuration variables:

  • clusterNamePrefix > TAMR_DATASET_EMR_CLUSTER_NAME_PREFIX
  • runJobFlowRequest > TAMR_DATASET_EMR_RUN_JOB_FLOW_REQUEST

TAMR_JOB_SPARK_CONFIG_OVERRIDES: '[{"name": "adjustedInstanceCount", "sparkDeploymentConfig": {"clusterNamePrefix":"", "runJobFlowRequest": "..."} }]'
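Because other sparkDeploymentConfig fields are no longer supported for ephemeral EMR Spark, a pre-flight check can catch stale overrides before you apply a configuration. The sketch below parses a JSON value like the one above and rejects unsupported fields. The function name is hypothetical, and treating name as required is an assumption carried over from the v2021.007.0 Databricks example.

```python
import json
from typing import Any, Dict, List

# Since v2021.003.0, only these sparkDeploymentConfig fields can be overridden
# for ephemeral EMR Spark clusters.
EMR_OVERRIDE_FIELDS = {"clusterNamePrefix", "runJobFlowRequest"}

def check_emr_overrides(raw: str) -> List[Dict[str, Any]]:
    """Parse a TAMR_JOB_SPARK_CONFIG_OVERRIDES value and reject unsupported EMR fields."""
    overrides = json.loads(raw)
    if not isinstance(overrides, list):
        raise ValueError("expected a JSON array of overrides")
    for entry in overrides:
        if not isinstance(entry, dict) or "name" not in entry:
            raise ValueError("each override must be an object with a name")
        extra = sorted(set(entry.get("sparkDeploymentConfig", {})) - EMR_OVERRIDE_FIELDS)
        if extra:
            raise ValueError(f"override {entry['name']!r} sets unsupported fields: {extra}")
    return overrides
```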

Fixed Support Issues

This release corrects the following errors.

  • CSV export download does not work on AWS scale-out. Affects versions: v2020.013.0. Fix versions: v2021.003.0.



v2021.002.5 Patch Release Notes

Fixed Issues

This patch release corrects the following issue.

Verifying clusters or records that are already verified causes all records in the project to be stored in the cluster log dataset, which causes the verification operation to take a long time.



v2021.002.4 Patch Release Notes

This patch addresses the following Apache Log4j vulnerabilities by updating Tamr Core to use Apache Log4j version 2.17.0:

  • Apache Log4j CVE-2021-45105
  • Apache Log4j CVE-2021-45046
  • Apache Log4j CVE-2021-44228

For full details regarding these vulnerabilities and Tamr Core, refer to Tamr's Updates on Apache Log4j Vulnerabilities article.

This patch fully remediates these three vulnerabilities in Tamr Core and Elasticsearch. Install this patch regardless of whether you have taken any of the remediation steps in the article referenced above.

v2021.002.3 Patch Release Notes

This patch release improves Tamr Core UI performance.

v2021.002.2 Patch Release Notes

This patch release provides a fix for large scale categorization projects.

v2021.002.1 Patch Release Notes

This patch release provides a security improvement.

v2021.002.0 Release Notes

Tamr Core v2021.002.0 is a checkpoint release. For information about how checkpoint releases affect upgrades, see Upgrading Tamr.

What's New

This release includes:

  • For categorization projects, you can now upload and re-use a taxonomy file in multiple projects without requiring a unique name in each project. Tamr now generates a new dataset for the taxonomy in each project, (unified_dataset_name)_categories, which you can view and export from the Dataset Catalog page.
  • The workflow for categorization projects has also changed. Now, you must create the unified dataset for the project before you upload the taxonomy file.

Fixed Support Issues

This release corrects the following errors.

  • Enrichment returns empty results. Affects versions: v2021.001.0. Fix versions: v2021.002.0.
  • WriteLockException on _unified_dataset_dedup_suggested_clusters_log table after concurrent record verification actions. Affects versions: v2020.020.2, v2020.024.1. Fix versions: v2021.002.0.
  • CASE statement not defaulting to null when ELSE is not specified. Affects versions: v2020.020.0. Fix versions: v2021.002.0.



v2021.001.0 Release Notes

This release contains minor updates that improve the experience of using Tamr Core.
