2022 Tamr Core Release Notes

These release notes describe new features, improvements, and corrected issues in each Tamr Core 2022 release.

See Tamr Core Release Notes for important information for all releases, including upgrade instructions and checkpoint releases.

Tamr Core 2022 Releases

v2022.013.1 Patch Release Notes

This patch release corrects the following issues.

  • Long delay after upgrade between job submission, the job appearing in the UI, and the job starting to run. This patch fixes a regression in processing upstream datasets that resulted in long job start times for datasets with many upstream datasets, especially in chained mastering projects.
  • Jobs stuck "waiting for resources". This patch fixes a regression in processing upstream datasets that resulted in long job start times for datasets with many upstream datasets, especially in chained mastering projects.

v2022.013.0 Release Notes

New Features and Improvements

This release includes the following new features.

Record Grouping

This release introduces an optional feature for mastering projects, record grouping. As the first de-duplication stage in the workflow, record grouping compares values for certain key attributes to find obviously-matching records. Tamr Core organizes records that have identical values for all of the selected "grouping key" attributes into groups. For the other attributes, curators choose an aggregation function to apply to the set of values that might be found. Applying this stage to one or more of the input datasets can make the pair-labeling and clustering stages in the mastering workflow more efficient.

You enable and configure record grouping in a mastering project on the new Group Records page. To help you fine-tune the settings for each attribute, this page provides metrics and a preview of the groups that Tamr Core will create when you save the current settings. See Grouping Obvious Duplicates.
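
The grouping idea described above can be illustrated with a minimal Python sketch. It is not Tamr Core's implementation: the function name, the record layout (a list of dictionaries), and the sample attributes are all hypothetical, chosen only to show records with identical grouping-key values being collapsed while an aggregation function is applied to the remaining attributes.

```python
from collections import defaultdict


def group_records(records, grouping_keys, aggregations):
    """Group records that share identical values for every grouping key.

    records: list of dicts (hypothetical layout for this sketch).
    grouping_keys: attribute names whose values must all match exactly.
    aggregations: attribute name -> function applied to the list of values
    found for that attribute within a group.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(k) for k in grouping_keys)
        groups[key].append(record)

    grouped = []
    for key, members in groups.items():
        row = dict(zip(grouping_keys, key))
        for attr, aggregate in aggregations.items():
            row[attr] = aggregate([m.get(attr) for m in members])
        grouped.append(row)
    return grouped


# Hypothetical sample data: the first two records match on both keys.
records = [
    {"name": "Acme Corp", "zip": "02139", "phone": "555-0100"},
    {"name": "Acme Corp", "zip": "02139", "phone": "555-0199"},
    {"name": "Globex", "zip": "10001", "phone": "555-0111"},
]
result = group_records(
    records,
    grouping_keys=["name", "zip"],
    aggregations={"phone": lambda values: sorted(set(values))},
)
for row in result:
    print(row)
```

Here three input records become two groups, which is why applying this stage first can reduce the number of pairs that later stages need to consider.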

Add control to disable Core Connect for testing

This release also adds a new configuration variable, TAMR_CONNECT_ENABLED. This variable controls whether Core Connect is started, and should be disabled only at the direction of Tamr support.


v2022.012.1 Patch Release Notes

This patch release corrects the following issues.

  • Long delay after upgrade between job submission, the job appearing in the UI, and the job starting to run. This patch fixes a regression in processing upstream datasets that resulted in long job start times for datasets with many upstream datasets, especially in chained mastering projects.
  • Jobs stuck "waiting for resources". This patch fixes a regression in processing upstream datasets that resulted in long job start times for datasets with many upstream datasets, especially in chained mastering projects.

v2022.012.0 Release Notes

New Features and Improvements

This release includes the following new features.

New Author Role Allows Project Creation

Author role grants ability to create projects and add datasets without access to all projects/datasets

A new user role, Author, is now available. An Author has all of the permissions of a Curator, plus privileges to create projects, add datasets (without requiring access to all datasets), and edit projects and datasets in the authorization policies in which they have the Author role. Policies that include members with the Author role must also include the policy itself as a resource. See Author Tasks and Responsibilities and Using Policies to Control Access.

Core Connect Replaces DMS

Implement user interface for Core Connect

This release adds user interface options for importing and exporting datasets. You can now:

  • Import or export to S3, GCP, or ADLS2 cloud storage (comma- or tab-separated values files or Avro files). Requires configuration: see Configuring Core Connect.
  • Use a JDBC driver to import and export to relational databases or cloud storage. JDBC drivers support import and export to SQL databases (Oracle, Hive, and so on), Parquet files in cloud storage, Google BigQuery, and more.

You can monitor the status of jobs initiated by these options on the Jobs page.

See Uploading a Dataset into a Project and Exporting a Dataset.

Add profile and recipeId to all ingest endpoints in Core Connect including JDBC

This release updates the Core Connect API service to add a new POST /jdbcIngest/preview endpoint to allow a preview of the dataset before upload to Tamr Core. It also adds the following optional keys to ingest endpoints: profile, to queue the profile job after upload, and recipeId to specify the project to add the dataset to.

Allow Core Connect to create datasets within policies

This release updates the Core Connect API service to support the new Author role. You can now specify policyIds on all ingest endpoints (optional). When datasets are created, they are added as a resource to the authorization policies specified in the policyIds list. This list does NOT update the policyIds of a dataset that already exists and is being updated.

This release also removes primaryKey and truncateTamrDataset from non-ingestion endpoints where they were present. These parameters are only relevant for ingestion endpoints.
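
As a rough illustration of the optional ingest keys described above, the following Python sketch adds profile, recipeId, and policyIds to a request body only when the caller supplies them. The helper name, the value types (a boolean for profile, integers for the IDs), and the placeholder field in the base body are assumptions made for this example; consult the Core Connect API documentation for the actual request schema.

```python
import json


def with_ingest_options(body, profile=None, recipe_id=None, policy_ids=None):
    """Return a copy of an ingest request body with optional keys added.

    All three keys are optional, so a key is included only when a value
    is supplied. Value types here are assumptions for this sketch.
    """
    result = dict(body)
    if profile is not None:
        result["profile"] = profile  # queue the profile job after upload
    if recipe_id is not None:
        result["recipeId"] = recipe_id  # project to add the dataset to
    if policy_ids is not None:
        result["policyIds"] = list(policy_ids)  # policies for a new dataset
    return result


# The base body is a stand-in for the endpoint's other fields.
base_body = {"placeholderField": "other request fields go here"}
payload = with_ingest_options(base_body, profile=True, recipe_id=12, policy_ids=[3, 7])
print(json.dumps(payload, indent=2))
```

Because policyIds applies only when a dataset is created, sending it for a dataset that already exists does not change that dataset's policies.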

Data Movement Service retired

Tamr offers the Core Connect service to facilitate the import and export of large data files between Tamr Core and your cloud storage provider or other data store. Starting with this release, Tamr no longer supports use of the Data Movement Service (DMS). See Core Connect and Upgrading Tamr Core.

API Authentication by JWT Now Offers Dynamic Retrieval of Public Keys

Add support for pulling dynamic keys for the JWT

This release adds the TAMR_AUTH_JWKS_URI configuration variable to use as an alternate authentication method for JWTs. See Authenticating API Requests.

Fixed Issues

This release corrects the following errors.

  • Issue with JDBC Ingest. Found in: v2022.001.1. Fix versions: v2022.012.0. This release upgrades the JDBC driver for Snowflake from 3.9.0 to 3.13.23, and the JDBC driver for Redshift from 1.2.10.1009 to 2.1.0.9.
  • Curators should be able to "create unified dataset". Found in: v2020.016.3. Fix versions: v2022.012.0. Curators can now use the Create unified dataset option in recently-created projects. Previously, a "permission denied" error message appeared.


v2022.011.1 Patch Release Notes

This patch release corrects the following issues.

  • Long delay after upgrade between job submission, the job appearing in the UI, and the job starting to run. This patch fixes a regression in processing upstream datasets that resulted in long job start times for datasets with many upstream datasets, especially in chained mastering projects.
  • Jobs stuck "waiting for resources". This patch fixes a regression in processing upstream datasets that resulted in long job start times for datasets with many upstream datasets, especially in chained mastering projects.

v2022.011.0 Release Notes

Important Note for this Release

After upgrading to this release, you must republish clusters before running golden record jobs. This requirement is due to a schema change for clusters that added several internally-used fields.

New Features and Improvements

This release includes the following new features.

  • Add No Clustering Within Sources option to blocking model.
    For mastering projects, this release adds a new Exclude clustering within these sources option for the blocking model. Clusters produced by this mastering project will contain at most one record originating from each dataset selected here.
  • Add INPUT_DATASET_DO_NOT_MAPS to project movement.
    Do-not-map information for attributes of input datasets can now be included in project imports and exports. The new artifact INPUT_DATASET_DO_NOT_MAPS is imported additively by default, but it can also be imported destructively or excluded from import. This artifact is a Schema Mapping project artifact.
  • Core Connect - Jobs API.
    This release adds Jobs API endpoints for Core Connect. This enables polling for import and export jobs.
  • Support for RHEL 8.6.
    This release adds support for RHEL v8.6, in addition to RHEL 7.
  • Dialog windows no longer close when the user clicks outside the dialog.
    To prevent inadvertent loss of work, clicking outside of an open dialog box no longer closes that box. All dialog boxes now include an X (close) icon or Close button.
  • Project selector should be wider to allow more text.
    When you select a project by moving your cursor over the Tamr logo, this release increases the width of the dropdown from 240px to 650px to accommodate longer project names.
  • Update project home page styling.
    To improve readability and usability, this release includes updates to the width, height, and color of elements on the project home page and project creation dialog box.
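
The guarantee provided by the Exclude clustering within these sources option in the list above (at most one record per selected dataset in each cluster) can be expressed as a simple check. This Python sketch is illustrative only: the cluster layout, a list of (record ID, source dataset) pairs, and the dataset names are hypothetical.

```python
from collections import Counter


def sources_with_multiple_records(cluster, excluded_sources):
    """Return the excluded sources that contribute more than one record.

    cluster: list of (record_id, source_dataset) pairs (hypothetical layout).
    excluded_sources: datasets selected for the exclusion option.
    An empty result means the cluster satisfies the constraint.
    """
    counts = Counter(source for _, source in cluster if source in excluded_sources)
    return sorted(source for source, n in counts.items() if n > 1)


cluster = [("r1", "crm"), ("r2", "erp"), ("r3", "crm"), ("r4", "web"), ("r5", "web")]
print(sources_with_multiple_records(cluster, {"crm", "erp"}))  # prints ['crm']
```

In this example "web" also contributes two records, but it is not flagged because it was not selected for exclusion.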

Fixed Issues

This release corrects the following errors.

  • Golden record custom expressions causing UI linting errors on v2022.008.0. Found in: v2022.008.0. Fix versions: v2022.008.1, v2022.011.0, v2022.010.2.
  • null values are transferred as "null" string values with JDBC ingest from Core Connect. Found in: v2022.004.0, v2022.009.0, v2022.010.0. Fix versions: v2022.011.0. When using JDBC ingestion, true null values were being ingested as the string "null" instead of as true null values. This release fixes this error.


v2022.010.3 Patch Release Notes

This release corrects the following issue.

Export to S3 fails with Cannot construct instance of com.tamr.core.connect.model.jdbc.ExportInfo (no Creators, like default constructor, exist).
This release corrects an issue that caused Core Connect exports and imports to S3, GCS, and ADLS2 to fail with "Cannot construct instance" errors.
Found in: v2022.010.1, v2022.010.0, v2022.010.2. Fix versions: v2022.011.0, v2022.010.3.

v2022.010.2 Patch Release Notes

This patch release corrects the following issue.

  • Golden record custom expressions causing UI linting errors on v2022.008.0.

v2022.010.1 Patch Release Notes

This patch release corrects the following issue.

  • Errors from an internal dataset that blocked scripts and caused the Projects and Datasets Catalog pages to fail to load.

v2022.010.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Add/implement JWT validation in Auth service.
    JWT authentication for the Tamr Core API can now be used in place of basic authentication. To support JWT authentication, this release adds configuration variables to store the values required for JWT validation. See Authentication with JSON Web Tokens (JWT).
  • Support Microsoft SQL Server 19 via JDBC Driver 7.4.
    This release adds support for the 7.4.1.jre8 JDBC driver version for Microsoft SQL Server / Azure SQL. See Supported JDBC Driver Versions.
  • Add CDATA SAP Hana driver to Core Connect.
    Core Connect can now import from and export to SAP HANA through the JDBC endpoints, using the CData driver. See CData JDBC Driver for SAP HANA and Supported JDBC Driver Versions. Use of this feature requires it to be specifically activated as part of your Tamr license.
  • Add CDATA Azure Synapse driver to Core Connect.
    Core Connect can now import from and export to Azure Synapse through the JDBC endpoints, using the CData driver. See CData JDBC Driver for Azure Synapse and Supported JDBC Driver Versions. Use of this feature requires it to be specifically activated as part of your Tamr license.
  • Improvements to policy management UI.
    • The Policy Management dialog that opens from the Dataset Catalog page now sorts dataset policies alphabetically.
    • In the same dialog, the count for datasets included now increases or decreases when you add or remove a dataset from the policy.
    • When you edit a project on the home page, the Permissions tab now sorts policies alphabetically.
    • On the same Permissions tab, the search feature now searches for the entered string in project descriptions as well as project names.
  • Migration to GCP scale-out from any single-node modality via backup and restore.
    This release adds additional support for datastore-agnostic backup and restore. This capability now supports instance migration from any single-node Tamr Core deployment modality to a scaled-out instance hosted on GCP. See Selecting a Backup and Restore Approach and Migrating to a GCP Scale-Out Instance.
  • Move categorization feedback from persistence to storage.
    This release moves categorization assignment and unverified feedback information from the persistence storage (that is, PostgreSQL) to the dataset storage (HBase). The persistence tables “persistence.feedback_ns_current” and “persistence.feedback_ns_log” are no longer used for storing feedback. Instead, a new dataset is created for each categorization project to store feedback. The dataset name is <unified_dataset>_classification_feedback where <unified_dataset> is the name of the unified dataset associated with the project. The dataset has the same schema as the previous persistence table. See Datasets in a Categorization Project.

Fixed Issues

This release corrects the following errors.

  • Unify-admin auxiliary service install not placing start-<service>.sh script in correct folder. Found in: v2022.008.0. Fix versions: v2022.010.0. During installation of an auxiliary service, the default location was improperly calculated. This has been fixed.
  • UI customization buttons only accept text/plain content but should accept application/json. Found in: v2022.008.0. Fix versions: v2022.010.0. API calls made by buttons on a custom toolbar now accept bodies in application/json format in addition to text/plain format.
  • Export to BigQuery failing due to Out of memory. Found in: v2022.005.0. Fix versions: v2022.010.0. Export to BigQuery ran out of memory on repeated exports when using InsertMode=GCSStaging. This has been fixed by upgrading the JDBC driver from 21.0.8017 to 22.0.8297.
  • Ticktime not set for the Tamr Zookeeper instance. Found in: . Fix versions: v2022.009.0, v2022.010.0. This release corrects an error where the configuration variable TAMR_ZK_SESSION_TIMEOUT resulted in a maximum 60 second timeout regardless of what value you set.
    If you experience problems with Zookeeper timing out prior to upgrading to this release, complete the following steps:
    1. Disable updates to zoo.cfg by editing /lib/zk_functions.sh. On line 23, remove the following: ${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh zkSetup, and save.
    2. Add the following to ZooKeeper-3.4.14/conf/zoo.cfg: tickTime=24000
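
For context on why tickTime matters here: by default, ZooKeeper bounds negotiated session timeouts between 2 and 20 ticks (its minSessionTimeout and maxSessionTimeout settings). The Python sketch below shows that arithmetic, assuming those two settings are left at their defaults. The function name is hypothetical, and the 3000 ms tick in the first call is used only because it yields the 60-second ceiling described above.

```python
def effective_session_timeout(requested_ms, tick_time_ms):
    """Clamp a requested session timeout to ZooKeeper's default bounds.

    Assumes minSessionTimeout and maxSessionTimeout are not overridden,
    so the bounds are 2 ticks and 20 ticks respectively.
    """
    lower = 2 * tick_time_ms
    upper = 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


# A 5-minute request is capped at 60 seconds with a 3000 ms tick...
print(effective_session_timeout(300_000, 3_000))   # 60000
# ...but is honored in full with tickTime=24000.
print(effective_session_timeout(300_000, 24_000))  # 300000
```

Note that under the same assumptions, raising tickTime to 24000 also raises the lower bound to 48 seconds.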


v2022.009.1 Patch Release Notes

This patch release corrects errors from an internal dataset that blocked scripts and caused the Projects and Datasets Catalog pages to fail to load.

v2022.009.0 Release Notes

New Features and Improvements

The following new features are included in this release.

Backup and Restore across Platforms and Deployments

This release adds a datastore-agnostic backup and restore capability. The new backup and restore capability supports instance migration from the single-node Tamr Core deployment modality hosted on GCP to a scaled-out instance hosted on GCP. To enable datastore-agnostic backup on the source instance, you change the new TAMR_STORAGE_DRIVER_DATA_STORE_BACKUP_ENABLED configuration parameter to true. After restoring to the destination instance, and in all other circumstances, TAMR_STORAGE_DRIVER_DATA_STORE_BACKUP_ENABLED should be set to its default value of false so that the previous backup capability is used. See Configuring Tamr Core.

Parquet Support Available for Core Connect Service

Core Connect enables users to import and export data files between Tamr Core and a variety of cloud storage providers. This release introduces support for importing and exporting Parquet files. Tamr Core supports importing and exporting Parquet files to and from the following connections:

  • S3
  • ADLS Gen 2
  • GCS
  • Server local file system

See supported file types and cloud platforms. See Apache Parquet documentation for sizing guidelines on Parquet files.

To learn more about Core Connect, see:

If you are upgrading from a release prior to v2022.005.0, see Upgrade considerations.

Supported JDBC Drivers

To import or export Parquet files, you use the JDBC Driver for Parquet. Refer to the supported databases and driver versions.

Additional Features for Custom Buttons

New options are available to enhance your custom UI buttons:

  • For buttons that redirect users to a specified URL, you can now specify whether you want the linked URL to open in a new browser tab.
  • For buttons that complete a POST call to an API endpoint, you can now select which keys to include in the POST body.

For more information, see Adding a Custom Toolbar Button.

Optionally Add Authorization and Authentication to Low-Latency Endpoints

This release adds optional authorization and authentication for low-latency match and categorization endpoints.

  • By default, this feature is disabled, and the configuration variable that enables it is set to false. When enabled, authorization is consistent with the user's access to the related unified dataset in your project.
  • To enable, set the configuration variable TAMR_LOW_LATENCY_AUTH_ENABLED to true, and then restart Tamr Core.

Important: Low-latency matching is in limited release. Before using this feature, contact Tamr Support to discuss your use case and for configuration assistance. Low-latency categorization is available for testing purposes only.

New Low-Latency Match Endpoints Available

To improve the usability of the low-latency match service, two new endpoints are available:

  • POST v1/projects/{project}:matchRecords returns a stream of record match and non-match results.
  • POST v1/projects/{project}:matchClusters returns a stream of cluster match probabilities.

These endpoints replace POST v1/projects/{project}:match, which returned both types of results. For backward compatibility, the :match endpoint remains available for use. However, its Swagger documentation is no longer published.

Other Improvements

  • Add hover text over job descriptions and make columns wider by default. On the Jobs page, complete descriptions are now available as tooltips when you move your cursor over a cell in the Description, Project, or Step columns. In addition, the columns on this page are resized to optimize the data that appears.
  • This release improves error messaging for search strings that include invalid syntax.
  • Improve dataset tagging visibility. This release includes several small changes to enhance usability and consistency of the dialog boxes for adding and managing dataset tags.
  • Ability to open a project and top navigation in different browser tabs. To allow for side by side data comparisons, you can now open any link in the UI in a separate browser window.

Fixed Issues

This release corrects the following errors.

  • Selecting Edit for empty Mastering project turns screen blank in 2022.008. Found in: v2022.008.0. Fix versions: v2022.009.0. Fixes bug where empty mastering projects were not editable.
  • Ticktime not set for the Tamr Zookeeper instance. Found in: . Fix versions: v2022.009.0. This release corrects an error where the configuration variable TAMR_ZK_SESSION_TIMEOUT resulted in a maximum 60 second timeout regardless of what value you set.
    If you experience problems with Zookeeper timing out prior to upgrading to this release, complete the following steps:
    1. Disable updates to zoo.cfg by editing /lib/zk_functions.sh. On line 23, remove the following: ${TAMR_UNIFY_HOME}/tamr/utils/unify-admin.sh zkSetup, and save.
    2. Add the following to ZooKeeper-3.4.14/conf/zoo.cfg: tickTime=24000


v2022.008.1 Patch Release Notes

This patch release corrects the following issue.

  • Golden record custom expressions causing UI linting errors on v2022.008.0.

v2022.008.0 Release Notes

New Features and Improvements

The following new features are included in this release.

Add Customized Buttons to the Tamr Core User Interface

Add configuration for customizable buttons for use in Tamr Core extensions. System administrators can now customize Tamr Core to display a toolbar of additional buttons on any page of the user interface. Using a YAML file, you can design the buttons to either redirect users to a different URL or to complete a POST API call. See Adding a Custom Toolbar Button.

Help Tamr Core Learn through Cluster Verification

A new user interface control is now available to enable the learned pairs feature in mastering projects. When enabled, Tamr Core uses the changes that experts make to record clusters to label existing pairs or generate and label new pairs. To enable learned pairs, see Learned Pairs and the feature’s recommended setting.

Other Improvements

For Schema Mapping projects, the editing dialog no longer includes the fields for currency symbol and spend, as these options do not apply to this project type.

Fixed Issues

This release corrects the following error.

Tamr Core UI freezes or crashes when navigating pages and adding categorization labels. Found in: v2022.005.0. Fix versions: v2022.008.0, v2022.005.2. Includes performance improvements on both the front end and backend for how Tamr Core commits updates for categorization feedback.


v2022.007.0 Release Notes

Important Support Notes for this Release

This release is not supported for AWS cloud-native deployments.

New Features and Improvements

The following new features are included in this release.

  • Delete and update records API should fail when called on a non-source dataset. Versioned APIs for modifying the content of datasets now return an error when used on non-source datasets, preventing users from entering an undesirable state. APIs updated: DELETE /v1/datasets/{datasetId}/records and POST /v1/datasets/{datasetId}:updateRecords
  • Make auxiliary install location configurable. This release adds a new configuration variable, TAMR_AUXILIARY_SERVICES_HOME, which determines the location where auxiliary service configurations are stored in a Tamr Core installation.
  • Ability to rename unified attributes in user interface. You can now rename unified attributes. After you rename, you must rerun all jobs of a project in order for the name change to propagate to all pages of the project.

Fixed Issues

This release corrects the following errors.

  • Renamed projects cannot be retrieved by the new name in versioned API. Found in: v2021.021.0. Fix versions: v2022.007.0. Tamr Core’s versioned API endpoint GET /v1/projects now supports filtering by the project name that is visible in the versioned API. Previously, if a project had been renamed, this endpoint required the original name in the request.
  • Setting keys in the S3 client does not work on EMR. Found in: v2022.003.0, v2022.004.0, v2022.005.0, v2022.006.0. Fix versions: v2022.007.0. Fixes a bug that prevented jobs from running on EMR.
  • Team city failure: backup restore failed due to big table exception: resource exhausted. Fix versions: v2022.007.0. This release adds more robust handling of quota (resource exhausted) exceptions for Bigtable backups.


v2022.006.1 Patch Release Notes

This patch release corrects the 'argument "src" is null' error that could occur after upgrading from v2019.026.0 or earlier to v2022.003.0 or later. This fix reinstates a schema non-null check in the storage driver.

v2022.006.0 Release Notes

Important Support Notes for this Release

This release is not supported for AWS cloud-native deployments.

New Features and Improvements

The following new features are included in this release.

  • New versioned API endpoints are available to generate test records and high-impact clusters, and to compute cluster accuracy metrics. Three new versioned API endpoints are available, which allow you to continuously monitor model performance as part of a continuous mastering pipeline.
    • POST http://localhost:9100/api/versioned/v1/projects/{project}/testRecords:refresh, which generates test records and clusters for users to curate.
    • POST http://localhost:9100/api/versioned/v1/projects/{project}/trainingClusters:refresh, which generates high-impact clusters for users to curate.
    • POST http://localhost:9100/api/versioned/v1/projects/{project}/clustersAccuracy:refresh, which computes cluster accuracy metrics, including Precision and Recall.
  • New Absolute Cosine similarity function is available. Like cosine similarity, this function applies to text values and represents the similarity between two "bags of words". However, this function does not normalize the resulting feature vectors, so the similarity range is [0, infinity).
  • Disable the ZooKeeper AdminServer which consumes the valuable port 8080. The AdminServer defaults to port 8080, which conflicts with many other services. This feature is not needed; it is now disabled and port 8080 is available.
  • Improve documentation for how COALESCE works with arrays. The description of the COALESCE function now clarifies that arrays that are empty and arrays that contain only nulls are not themselves null. COALESCE returns these arrays as the first non-null element.
  • Curators can edit projects in UI. In addition to using the API, curators can now access a UI control to edit project settings.
  • Curators can delete projects in UI. In addition to using the API, curators can now access a UI control to delete projects.
  • Large deltas should force non-incremental updates automatically. By default, Tamr now automatically disables incremental updates if changes since the last update exceed 5%. Tamr continues to respect the setting for the TAMR_DEDUP_DISABLE_INCREMENTAL configuration variable. If this variable is set to true, Tamr always disables incremental updates. If this variable is set to false, Tamr uses the new threshold to determine whether to disable incremental updates.
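
The interaction between TAMR_DEDUP_DISABLE_INCREMENTAL and the new 5% threshold, described in the last item above, can be summarized as a small decision function. This Python sketch is an interpretation rather than Tamr's code: the function name is hypothetical, and it assumes the amount of change is expressed as a fraction between 0 and 1.

```python
def should_disable_incremental(disable_flag, changed_fraction, threshold=0.05):
    """Decide whether an update should run non-incrementally.

    disable_flag: value of TAMR_DEDUP_DISABLE_INCREMENTAL.
    changed_fraction: share of data changed since the last update
    (an assumed representation for this sketch).
    """
    if disable_flag:
        # The configuration variable takes precedence when set to true.
        return True
    # Otherwise the threshold decides.
    return changed_fraction > threshold


print(should_disable_incremental(False, 0.02))  # False: small delta stays incremental
print(should_disable_incremental(False, 0.08))  # True: large delta forces a full update
print(should_disable_incremental(True, 0.0))    # True: variable set to true
```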

Fixed Issues

This release corrects the following errors.

  • Backup to GCS fails if directory is empty. Found in: v2021.002.3. Fix versions: v2022.006.0. Added a recursive check in v2022.006 for empty directories before gsutil copy tasks.
  • Connect Profile API endpoint gives 500 error. Found in: v2021.021.0. Fix versions: v2022.006.0. A regression caused the /api/urlIngest/serverfs/delimited/profile endpoint to return a NotImplementedException starting in core-connect version tamr-core-2021.021.0-3.15.0. The issue has been resolved.
  • Null Pointer Exception trying to cancel snapshot operation. Found in: v2022.001.0, v2022.002.0. Fix versions: v2022.006.0, v2022.002.1. Fixes an issue when canceling a snapshot operation after restarting the service.


v2022.005.2 Patch Release Notes

This patch includes performance improvements on both the front end and backend for how Tamr Core commits updates for categorization feedback.

v2022.005.1 Patch Release Notes

This patch release corrects the 'argument "src" is null' error that could occur after upgrading from v2019.026.0 or earlier to v2022.003.0 or later. This fix reinstates a schema non-null check in the storage driver.

v2022.005.0 Release Notes

Important Support Notes for this Release

This release is not supported for AWS cloud-native deployments.

New Features and Improvements

The following new features are included in this release.

  • A new visual transformation, MultiFormula, is available. Use this transformation to apply the same transformation logic to multiple columns.
  • Core Connect is now available. Details follow.
  • Support reading data from and writing data to BigQuery and Salesforce through Core Connect. A separate license is required.

Core Connect Service Available

In past releases, Tamr provided an API-only auxiliary service, df-connect, which enabled developers to import and export data files between Tamr Core and a variety of cloud storage providers. This release integrates this service into Tamr Core as the Core Connect feature, available through the expanded Core Connect API. Interactive Swagger documentation for the Connect API is available at http://<tamr_ip>:9100/docs. To learn more about Core Connect, see the following:

Upgrade Considerations for Current Users of df-connect

  • Current users of df-connect can now use the Core Connect service instead. As part of integrating the df-connect service into Tamr Core, the new Core Connect API is significantly expanded and improved. The default port for Core Connect is 9050, while the df-connect port is 9030. The Core Connect API is also available through port 9100. For example, http://localhost:9100/api/connect/jdbcIngest.

Note: These differences require updates to your import/export scripts.

Before upgrading, you must disable the df-connect auxiliary service. After upgrade, you must update import/export scripts to use the new Core Connect API. See upgrade guidance for df-connect users.

Upgrade Considerations for Current Users of the Data Movement Service

Current users of the Data Movement Service (DMS) API for importing and exporting between Tamr Core and cloud storage can now use Core Connect instead. See upgrade guidance for DMS users.

Note: To import or export files in Parquet format in this release, you must continue to use DMS. See supported file types and cloud platforms.

Supported Database Connections

Core Connect supports connections to many databases. Refer to the Tamr Core documentation for the currently supported databases and driver versions.

Supported File Types and Cloud Platforms

Core Connect supports import and export for the following file types:

  • Avro and delimited files for S3, ADLSGen2, HDFS, GCS, and the server local file system.
  • Newline-delimited JSON files for S3 and server local file system (export only).

Note: Currently, Core Connect does not support Parquet files. To import and export Parquet files, continue to use the Data Movement Service (DMS). Contact Support if you have more questions.

Fixed Issues

This release corrects the following errors.

  • Parquet files on ADLSGen2 larger than 2 GB created by DMS cannot be read. Found in: v2021.006.0. Fix versions: v2022.005.0. Fixed in the version of DMS that ships with v2022.005. This fix corrects a Parquet writer bug that affected large files, and improves handling of null values, empty arrays, and arrays of nulls when passed through Tamr Core.
  • Job duration does not show while job is running. Found in: v2022.002.0, v2022.003.0. Fix versions: v2022.005.0, v2022.002.1. This release adds a UI fix that enables job duration information to display on the Jobs page.
  • Column expander broken in CSV upload preview UI. Found in: v2021.014.0. Fix versions: v2022.005.0. This release adds a UI fix that enables you to expand and shrink columns using a blue vertical line tracker.


v2022.004.1 Patch Release Notes

This patch release corrects the 'argument "src" is null' error that could occur after upgrading from v2019.026.0 or earlier to v2022.003.0 or later. This fix reinstates a schema non-null check in the storage driver.

v2022.004.0 Release Notes

Important Support Notes for this Release

This release is not supported for AWS cloud-native deployments.

New Features and Improvements

The following new features are included in this release.

  • Increase default value of TAMR_HTTP_IDLE_TIMEOUT to 300s.
  • EnrichmentComponent does not handle attribute value where first array element is null. This release changes the handling of attributes that are arrays of strings. Previously, if the first element of the array was null, the enrichment failed. Now, enrichment uses the first non-null element as the value to be enriched. If all elements are null, the array is empty, or the attribute is null, the default value "" is used.
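
The new array handling described in the last item above can be sketched in Python as follows. This is a minimal illustration of the stated rules, not the EnrichmentComponent itself; the function name is hypothetical, and a Python list with None stands in for an array containing nulls.

```python
def value_to_enrich(attribute, default=""):
    """Pick the value to enrich from an array-of-strings attribute.

    Returns the first non-null element, or the default when the attribute
    is null, the array is empty, or every element is null.
    """
    if attribute is None:
        return default
    for element in attribute:
        if element is not None:
            return element
    return default


print(repr(value_to_enrich([None, "Boston", "Cambridge"])))  # 'Boston'
print(repr(value_to_enrich([None, None])))                   # ''
print(repr(value_to_enrich([])))                             # ''
print(repr(value_to_enrich(None)))                           # ''
```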


v2022.003.1 Patch Release Notes

This patch release corrects the 'argument "src" is null' error that could occur after upgrading from v2019.026.0 or earlier to v2022.003.0 or later. This fix reinstates a schema non-null check in the storage driver.

v2022.003.0 Release Notes

Important Support Notes for this Release

This release is not supported for AWS cloud-native deployments.

New Features and Improvements

The following new features are included in this release.

  • Optimize cluster editing operations in mastering projects. This change increases performance when processing edit requests. Note that when upgrading to this release, you must run an “Update results” job from the Pairs page before you can edit clusters.
  • For DMS, remove the ability to select a thread count greater than 8 in the UI and API. Tamr supports up to 8 threads for data import when using the Data Movement Service (DMS); the UI and API have been updated to reflect this maximum supported thread count.
  • Remove the Google BigQuery option in the Connect to Source page. Tamr Core has deprecated support for BigQuery, and as of this release the BigQuery option is no longer available in the Connect to Sources page when uploading datasets.
  • Browser support for Chrome and Edge on Windows 7, 8, and 10 for versions going forward. Browser support for IE11 is deprecated in all versions of Tamr Core. See Requirements for Installing Tamr Core.

Fixed Issues

This release corrects the following errors.

  • Preview button not working. Found in: v2021.020.0. Fix versions: v2022.003.0. This release corrects an issue in which the “Preview” button in the transformations cell did not work when writing any type of transformation for both input and unified datasets.
  • Bootstrapping Do Not Map attributes should not create empty unified attributes. Fix versions: v2022.003.0. Do Not Map attributes are now ignored when bootstrapping multiple source attributes.
  • Token weighting should be hidden in categorization projects. Fix versions: v2022.003.0. Because token weighting is not utilized for categorization projects, the option to select token weighting for machine learning attributes has been removed in Schema Mapping for these projects.

v2022.002.2 Patch Release Notes

This patch addresses a configuration issue that affected a temporary directory used during upgrade. The problem that caused upgrades to fail if any of the HBase tables did not fit on /tmp is now corrected.

v2022.002.1 Patch Release Notes

This patch release corrects the following issues.

  • Adds a prompt during upgrade if the --exportHBaseSnapshots option is not included.
  • Null Pointer Exception trying to cancel snapshot operation. Fixes a bug when canceling a snapshot operation after restarting the service.
  • Disable the ZooKeeper AdminServer which consumes the valuable port 8080.
  • This release adds a UI fix, which enables job duration information to display on the Jobs page.

v2022.002.0 Release Notes

New Features

The following new features are included in this release.

New Checkpoint Releases

Tamr Core releases v2022.001.0 and v2022.002.0 are checkpoint releases. When you upgrade Tamr Core, you must first upgrade to v2022.001.0, and then to v2022.002.0, before upgrading to a later version.
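
The ordering rule can be illustrated with a short sketch that lists the stops an upgrade would make. It assumes only the two checkpoints named in this section (earlier checkpoint releases are not modeled), and the helper names parse_version and upgrade_path are hypothetical, not part of any Tamr tooling.

```python
from typing import List, Tuple

# Checkpoint releases named in this section; earlier checkpoints are not modeled.
CHECKPOINTS = ["v2022.001.0", "v2022.002.0"]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v2022.002.0' into (2022, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(current: str, target: str) -> List[str]:
    """List the releases to install, in order, to get from current to target."""
    cur, tgt = parse_version(current), parse_version(target)
    # Keep only the checkpoints that lie strictly between the two versions.
    stops = [c for c in CHECKPOINTS if cur < parse_version(c) < tgt]
    return stops + [target]


if __name__ == "__main__":
    print(upgrade_path("v2021.020.0", "v2022.004.0"))
```

For an instance on v2021.020.0 moving to v2022.004.0, the sketch yields both checkpoints followed by the target release.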

Upgrade to HBase 2.x Client

This release includes an upgrade of the HBase Java libraries used by Tamr Core from 1.3.1 to 2.2.3. Additionally, the version of HBase that is installed on single-node instances has been upgraded from 1.3.1 to 2.3.6. See Upgrading Tamr Core. If you are upgrading a cloud-native deployment, please contact Tamr Support for guidance.

❗️ Important

  • For single-node deployments, you must provide an additional flag, --exportHBaseSnapshots, to the admin utility (unify-admin.sh) during upgrade. To prevent data corruption, see prerequisites before upgrade.
  • Upgrading HBase versions requires significant upgrade time; expect upgrade to take longer than usual for this release. Upgrade time is highly dependent on the number of projects in your pipeline. For example, if you have 20 projects, expect that upgrade to take at least 3 hours.

Improvements

  • Schema mapping projects: user interface improvement for mapping suggestion counts. The number next to lightbulbs now indicates the top suggested mappings for an attribute at the specified similarity threshold.

Fixed Issues

This release corrects the following errors.

  • Schema mapping suggestion counts are zero or negative for some attributes. Found in: v2021.015.0. Fix versions: v2022.002.0.

Known Issues

Note: For this release, IAM role-based authentication for S3 on DMS storage is not supported on EC2 instances. Tamr recommends using a service principal to import or export data from AWS.

v2022.001.3 Patch Release Notes

This patch addresses a configuration issue that affected a temporary directory used during upgrade. The problem that caused upgrades to fail if any of the HBase tables did not fit on /tmp is now corrected.

v2022.001.2 Patch Release Notes

This patch release corrects an issue in which upgrade to v2022.001.1 succeeds, but recipe upgrade for projects fails. Found in: v2022.001.1.

Release v2022.001.0 added edit checking to ensure that project and dataset names do not include the characters /, \, or :, a leading ., or leading or trailing white spaces. This patch identifies projects with names that include these characters or spaces and removes them from the project names.
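
To check names ahead of an upgrade, the naming rules above can be expressed as a small validation sketch. It covers only the rules stated here; the function name name_problems is hypothetical, and Tamr Core's own check may differ in detail.

```python
from typing import List

# Characters the release note says names must not include.
FORBIDDEN_CHARS = ("/", "\\", ":")


def name_problems(name: str) -> List[str]:
    """Return a list of rule violations for a project or dataset name."""
    problems = []
    for char in FORBIDDEN_CHARS:
        if char in name:
            problems.append(f"contains {char!r}")
    if name.startswith("."):
        problems.append("starts with '.'")
    if name != name.strip():
        problems.append("has leading or trailing whitespace")
    return problems


if __name__ == "__main__":
    for candidate in ["Customers 2022", ".hidden", "sales/emea", " padded "]:
        print(repr(candidate), name_problems(candidate))
```

An empty list means the name satisfies every rule listed in this note.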

Contact Tamr Support ([email protected]) for assistance if your dataset names include these characters or spaces.

v2022.001.1 Patch Release Notes

This patch release corrects the following issues.

  • Malformed YAML issue when upgrading from v2021.006 to v2022.001.

v2022.001.0 Release Notes

New Features and Improvements

The following new features are included in this release.

  • Upgrade the bundled JDK version.
  • Show correct empty state when using the filters in Dataset Catalog page. When using the "Results and Internals" or "System" filter in Dataset Catalog, the message shown in the empty state is now “No datasets matching your filters.”
  • Add disk space available check to upgrade utility. Validation scripts now include a utility to verify that at least 20% of disk space is available when starting or upgrading Tamr Core.
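
The 20% free-space rule can be approximated with Python's standard library. This is a rough sketch rather than the validation script that ships with Tamr Core; the function names and the default path are assumptions.

```python
import shutil

# Threshold stated in the release note: at least 20% of disk space available.
MIN_FREE_FRACTION = 0.20


def enough_free(total_bytes: int, free_bytes: int,
                min_free: float = MIN_FREE_FRACTION) -> bool:
    """Return True when the free share of the disk meets the threshold."""
    return free_bytes / total_bytes >= min_free


def has_enough_disk_space(path: str = "/",
                          min_free: float = MIN_FREE_FRACTION) -> bool:
    """Check the filesystem that contains path (the default path is an assumption)."""
    usage = shutil.disk_usage(path)
    return enough_free(usage.total, usage.free, min_free)


if __name__ == "__main__":
    print("Enough free disk space:", has_enough_disk_space("."))
```

Separating the arithmetic from the disk_usage call keeps the threshold logic easy to test without depending on the state of a real disk.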

Fixed Issues

This release corrects the following errors.

  • Clustering job stuck. Found in: v2021.010.2. Fix versions: v2022.001.0.
  • Validation script does not correctly identify disk usage scenarios that will break Tamr Core. Found in: v2021.019.0. Fix versions: v2022.001.0.
  • Tamr Core now enforces that project names cannot include the '/', '\', or ':' characters or leading or trailing white spaces. Fix versions: v2022.001.0.
