2023 Tamr Core Release Notes
These release notes describe new features, improvements, and corrected issues in each Tamr Core 2023 release.
See Tamr Core Release Notes for important information for all releases, including upgrade instructions and checkpoint releases.
Other Tamr Core releases:
Tamr Core 2023 Releases
Important: All v2023.001.x and v2023.003.x versions have been replaced by the latest v2023.004.x release.
- v2023.004.1 patch
- v2023.003.1 patch
- v2023.003.0
- v2023.002.2 patch
- v2023.002.1 patch
- v2023.002.0
- v2023.001.0
v2023.004.1 Patch Release Notes
This patch corrects an edge case, introduced by the upgrade to Spark 3, that affected automatic primary key management alongside `LOOKUP` statements.
v2023.004.0 Release Notes
This release replaces all Tamr Core v2023.001.x and 2023.003.x versions. It includes all of the new features and fixed issues in the v2023.001.x, v2023.002.x, and v2023.003.x releases, and the additional changes below.
New Features and Improvements
This release includes the following new features.
- Update and publish golden records jobs can now be run from the UI without needing to run profiling first
- A new Tamr configuration variable, `TAMR_DEPENDENCY_STARTUP_WAIT_SECONDS`, has been added to control how long the startup scripts wait for each supporting service to start. The default value is 450 seconds. With this improvement, the timeout of `wait_until_healthy` in the startup scripts is configurable.
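The configurable wait can be sketched as a simple polling loop. This is an illustrative Python sketch, not Tamr's actual startup script; the `check_health` callable is a hypothetical stand-in for a service health probe.

```python
import os
import time

def wait_until_healthy(check_health, timeout_s=None, poll_s=5):
    """Poll a health check until it passes or the timeout elapses.

    timeout_s defaults to the TAMR_DEPENDENCY_STARTUP_WAIT_SECONDS
    environment variable (450 seconds if unset), mirroring the new
    configuration variable described above.
    """
    if timeout_s is None:
        timeout_s = int(os.environ.get("TAMR_DEPENDENCY_STARTUP_WAIT_SECONDS", "450"))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check_health():
            return True  # dependency came up within the window
        time.sleep(poll_s)
    return False  # timed out; startup would report the dependency as down
```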
Other Changes
The legacy `legacy.top()` aggregate function is no longer supported. Use the `TOP` function in its place.
Fixed Issues
This release corrects the following issues:
- With the upgrade to Spark 3, the `PARSE_JSON_ARRAY` transformation returned null for multiple attributes. Found in: v2023.001.0, v2023.002.0, v2023.003.0. Fix versions: v2023.004.0.
- With the upgrade to Spark 3, the behavior of the `legacy.hash` transformation changed, resulting in loss of verified labels and changes in clustering. Found in: v2023.003.0. Fix versions: v2023.004.0.
- The versioned API for predict models (mastering and classification) did not bring projects up to date. With this fix, the UI no longer shows projects as out-of-date when they have been run via the API. Found in: v2019.023.1, v2022.013.0. Fix versions: v2023.004.0.
- Fetching jobs could fail if more than 32767 jobs had been run by the system. Found in: . Fix versions: v2023.004.0.
- The versioned API `GET /v1/operations` failed due to a Java heap space error. Found in: v2021.020.0. Fix versions: v2023.004.0.
- Corrected an issue in which project-specific policies did not give access unless all derived datasets were included in the policy. Found in: v2023.002.1. Fix versions: v2023.004.0.
- Before the whole page loaded, the Schema Mapping tab in Schema Mapping and Mastering projects showed a misleading, incorrect message that "The unified dataset has been deleted". This message no longer displays erroneously. Found in: v2022.008.0, v2020.024.4. Fix versions: v2023.004.0.
- Project import missed newly added input datasets. INPUT_DATASETS has been added as an artifact in project export and import. If a project is out of date and INPUT_DATASETS are imported using the INCLUDE DESTRUCTIVE option, project import fails. Found in: v2022.005.1, v2022.006.0. Fix versions: v2023.004.0.
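The 32767 job limit noted above is the maximum value of a signed 16-bit integer (2^15 - 1), which suggests the job counter was stored in a 16-bit field; that interpretation is an inference from the number, not stated in the release notes. A minimal illustration of the overflow:

```python
import struct

SHORT_MAX = 2**15 - 1  # 32767, the largest value a signed 16-bit field can hold
assert SHORT_MAX == 32767

# Packing SHORT_MAX into a signed 16-bit ("h") field is fine...
struct.pack("<h", SHORT_MAX)

# ...but one more overflows, the kind of failure the fix above addresses.
try:
    struct.pack("<h", SHORT_MAX + 1)
except struct.error as exc:
    print("overflow:", exc)
```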
v2023.003.1 Patch Release Notes
This patch release corrects the following issues:
- When working with categorization projects, the UI became unstable when attempting to load large amounts of data.
- The status for Core Connect export jobs in the UI was still listed as RUNNING when the jobs were 100% complete.
- Core Connect now supports Snowflake JDBC driver version 3.13.29.
- Security improvements.
v2023.003.0 Release Notes
New Features and Improvements
This release includes the following new features.
Support for Red Hat Enterprise Linux (RHEL) 9
- Tamr Core can now be deployed on the RHEL 9 operating system. See Requirements for Installing Tamr Core for all supported operating systems.
Two dot_product transformation functions
- `math.dot_product`: Computes the dot product of the input arrays.
- `math.normalized_dot_product`: Computes the normalized dot product between two vectors of numbers using the Modal Assurance Criterion.
See Functions for more details.
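Assuming the standard Modal Assurance Criterion formula, MAC(x, y) = (x·y)² / ((x·x)(y·y)), the two functions compute quantities like the following Python sketch; whether Tamr's implementations match this exactly is an assumption:

```python
def dot_product(xs, ys):
    """Dot product of two equal-length numeric arrays."""
    return sum(x * y for x, y in zip(xs, ys))

def normalized_dot_product(xs, ys):
    """Modal Assurance Criterion: (x.y)^2 / ((x.x)(y.y)).

    Returns a value in [0, 1]: 1.0 for parallel vectors,
    0.0 for orthogonal vectors, regardless of magnitude.
    """
    num = dot_product(xs, ys) ** 2
    den = dot_product(xs, xs) * dot_product(ys, ys)
    return num / den
```

For example, `normalized_dot_product([2, 4], [1, 2])` is 1.0 because the vectors are parallel, while `dot_product` alone would return 10.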
API operations for canceling incomplete and pending activities
In the `instance` resource, invoking the `:cancel` operation attempts to cancel all incomplete (including pending) activities for the instance, including:
- Incomplete backups.
- Incomplete project imports.
- Incomplete project exports.
- Incomplete dataset create operations, rolling back any changes.
- Incomplete dataset delete operations, rolling back any changes.
- Any other incomplete operations, rolling back any changes.
- Incomplete dataset transactions that are not associated with an operation, rolling back any changes.
If no error occurs while canceling activities, the operation returns success; otherwise it returns the first error encountered. Examine the server logs to identify any canceled activities.
See the API Reference Guide for more information about the operation to Cancel all running and pending activities.
Functionality to poll disk utilization and stop jobs before disk is exhausted
To prevent disk utilization from approaching 100%, the Tamr system now monitors free space in the storage locations to which Tamr Core and its supporting services are configured to write. If any of these storage locations drops below 20% free space, the health check reports the system status as `unhealthy` and an alert displays in the user interface. The health check message and the Tamr logs describe which storage system or systems have less than 20% free space, and which configured directories are associated with them.
If free space drops below 10%, Tamr cancels any running jobs, including backups, project imports, and so on. This behavior is also triggered if any storage system drops below 10 GB free.
Free space must be brought back above 10% for the system to resume running jobs, and above 20% for it to return to `healthy`.
Several new configuration variables are available to configure this behavior:
- `TAMR_STORAGE_LEVEL_WARN`: Fraction of free space below which the system becomes unhealthy. Default: 0.2 (20%).
- `TAMR_STORAGE_LEVEL_STOP`: Fraction of free space below which the system automatically cancels jobs. Default: 0.1 (10%).
- `TAMR_STORAGE_LEVEL_POLL_INTERVAL`: The interval at which Tamr Core checks the storage level of configured storage locations. Default: 1m (1 minute).
See System Health Status for more information.
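The threshold logic described above can be sketched as follows. This is an illustrative Python sketch only; the function names are hypothetical and the real monitoring is internal to Tamr, but the thresholds mirror the documented defaults.

```python
import shutil

WARN_LEVEL = 0.2           # TAMR_STORAGE_LEVEL_WARN default: unhealthy below 20% free
STOP_LEVEL = 0.1           # TAMR_STORAGE_LEVEL_STOP default: cancel jobs below 10% free
STOP_BYTES = 10 * 1024**3  # jobs also stop below 10 GB free, regardless of fraction

def classify(free_bytes, total_bytes):
    """Classify a storage location as 'healthy', 'unhealthy', or 'stop'."""
    free_fraction = free_bytes / total_bytes
    if free_fraction < STOP_LEVEL or free_bytes < STOP_BYTES:
        return "stop"        # cancel running jobs (backups, project imports, ...)
    if free_fraction < WARN_LEVEL:
        return "unhealthy"   # report unhealthy; alert displays in the UI
    return "healthy"

def storage_status(path):
    """Apply the classification to a real mount point."""
    usage = shutil.disk_usage(path)
    return classify(usage.free, usage.total)
```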
Fixed Issues
This release corrects the following errors.
- Removed the DAG viewer from the job details dialog to resolve an issue introduced by a related dependency. Found in: v2023.002.0. Fix versions: v2023.003.0.
- Upgrade failed due to insufficient root disk space. Found in: v2022.002.1. Fix versions: v2023.003.0, v2023.002.1. The `--skipEnvironmentValidation` upgrade flag now excludes all validators. Note: Use this flag with caution. It allows the upgrade process to proceed even with an invalid system configuration, which can cause the process to fail later.
- While renaming field names in S3 JSON export (s3 json), an error occurred when a `mergedArrayValuesDelimiter` value was not provided. Found in: v2023.002.0. Fix versions: v2023.003.0. A default value of `||` is now used when none is provided by the user.
- Re-promotion of a golden record project was missing a rule update for certain attributes. Found in: v2022.012.1. Fix versions: v2023.003.0.
Known Issues
The Group Records page can return a blank page. Workaround: Complete schema mapping and update the unified dataset.
v2023.002.2 Patch Release Notes
This patch includes the following new features, improvements, and fixes.
New Features and Improvements
This release includes the following new features.
- Update and publish golden records jobs can now be run from the UI without needing to run profiling first
- A new Tamr configuration variable, `TAMR_DEPENDENCY_STARTUP_WAIT_SECONDS`, has been added to control how long the startup scripts wait for each supporting service to start. The default value is 450 seconds. With this improvement, the timeout of `wait_until_healthy` in the startup scripts is configurable.
- Support for Red Hat Enterprise Linux (RHEL) 9
- Tamr Core can now be deployed on the RHEL 9 operating system. See Requirements for Installing Tamr Core for all supported operating systems.
- Two dot_product transformation functions
  - `math.dot_product`: Computes the dot product of the input arrays.
  - `math.normalized_dot_product`: Computes the normalized dot product between two vectors of numbers using the Modal Assurance Criterion.
See Functions for more details.
- API operations for canceling incomplete and pending activities
In the `instance` resource, invoking the `:cancel` operation attempts to cancel all incomplete (including pending) activities for the instance, including:
  - Incomplete backups.
  - Incomplete project imports.
  - Incomplete project exports.
  - Incomplete dataset create operations, rolling back any changes.
  - Incomplete dataset delete operations, rolling back any changes.
  - Any other incomplete operations, rolling back any changes.
  - Incomplete dataset transactions that are not associated with an operation, rolling back any changes.
If no error occurs while canceling activities, the operation returns success; otherwise it returns the first error encountered. Examine the server logs to identify any canceled activities.
See the API Reference for more information.
- Functionality to poll disk utilization and stop jobs before disk is exhausted
To prevent disk utilization from approaching 100%, the Tamr system now monitors free space in the storage locations to which Tamr Core and its supporting services are configured to write. If any of these storage locations drops below 20% free space, the health check reports the system status as `unhealthy` and an alert displays in the user interface. The health check message and the Tamr logs describe which storage system or systems have less than 20% free space, and which configured directories are associated with them.
If free space drops below 10%, Tamr cancels any running jobs, including backups, project imports, and so on. This behavior is also triggered if any storage system drops below 10 GB free.
Free space must be brought back above 10% for the system to resume running jobs, and above 20% for it to return to `healthy`.
Several new configuration variables are available to configure this behavior:
  - `TAMR_STORAGE_LEVEL_WARN`: Fraction of free space below which the system becomes unhealthy. Default: 0.2 (20%).
  - `TAMR_STORAGE_LEVEL_STOP`: Fraction of free space below which the system automatically cancels jobs. Default: 0.1 (10%).
  - `TAMR_STORAGE_LEVEL_POLL_INTERVAL`: The interval at which Tamr Core checks the storage level of configured storage locations. Default: 1m (1 minute).
See System Health Status for more information.
Fixed Issues
This release corrects the following issues:
- With the upgrade to Spark 3, the `PARSE_JSON_ARRAY` transformation returned null for multiple attributes. Found in: v2023.001.0, v2023.002.0, v2023.003.0. Fix versions: v2023.002.2, v2023.004.0.
- With the upgrade to Spark 3, the behavior of the `legacy.hash` transformation changed, resulting in loss of verified labels and changes in clustering. Found in: v2023.003.0. Fix versions: v2023.002.2, v2023.004.0.
- The versioned API for predict models (mastering and classification) did not bring projects up to date. With this fix, the UI no longer shows projects as out-of-date when they have been run via the API. Found in: v2019.023.1, v2022.013.0. Fix versions: v2023.002.2, v2023.004.0.
- Fetching jobs could fail if more than 32767 jobs had been run by the system. Found in: . Fix versions: v2023.002.2, v2023.004.0.
- The versioned API `GET /v1/operations` failed due to a Java heap space error. Found in: v2021.020.0. Fix versions: v2023.002.2, v2023.004.0.
- Corrected an issue in which project-specific policies did not give access unless all derived datasets were included in the policy. Found in: v2023.002.1. Fix versions: v2023.002.2, v2023.004.0.
- Before the whole page loaded, the Schema Mapping tab in Schema Mapping and Mastering projects showed a misleading, incorrect message that "The unified dataset has been deleted". This message no longer displays erroneously. Found in: v2022.008.0, v2020.024.4. Fix versions: v2023.002.2, v2023.004.0.
- Project import missed newly added input datasets. INPUT_DATASETS has been added as an artifact in project export and import. If a project is out of date and INPUT_DATASETS are imported using the INCLUDE DESTRUCTIVE option, project import fails. Found in: v2022.005.1, v2022.006.0. Fix versions: v2023.002.2, v2023.004.0.
- Removed the DAG viewer from the job details dialog to resolve an issue introduced by a related dependency. Found in: v2023.002.0. Fix versions: v2023.002.2, v2023.003.0.
- Upgrade failed due to insufficient root disk space. Found in: v2022.002.1. Fix versions: v2023.002.2, v2023.003.0, v2023.002.1. The `--skipEnvironmentValidation` upgrade flag now excludes all validators. Note: Use this flag with caution. It allows the upgrade process to proceed even with an invalid system configuration, which can cause the process to fail later.
- While renaming field names in S3 JSON export (s3 json), an error occurred when a `mergedArrayValuesDelimiter` value was not provided. Found in: v2023.002.0. Fix versions: v2023.002.2, v2023.003.0. A default value of `||` is now used when none is provided by the user.
- Re-promotion of a golden record project was missing a rule update for certain attributes. Found in: v2022.012.1. Fix versions: v2023.002.2, v2023.003.0.
- When working with categorization projects, the UI became unstable when attempting to load large amounts of data.
- The status for Core Connect export jobs in the UI was still listed as RUNNING when the jobs were 100% complete.
- Security improvements.
Known Issues
The Group Records page can return a blank page. Workaround: Complete schema mapping and update the unified dataset.
v2023.002.1 Patch Release Notes
This patch extends the `--skipEnvironmentValidation` upgrade flag to exclude all validators.
Note: Use this flag with caution. It allows the upgrade process to proceed even with an invalid system configuration, which can cause the process to fail later.
v2023.002.0 Release Notes
New Features and Improvements
This release includes the following new features.
- Add ability to disable live preview on Group Records page
You can now control whether the live preview of record grouping results appears: an Enable Preview toggle is now included on the Group Records page. Turning preview off can speed up editing of the record grouping definitions when many edits need to be made at once. It is recommended, but not required, that you re-enable preview before saving and running record grouping so that you can review your changes before committing them.
- New Core Connect API parameter to keep data types
This release adds an option, `simplifiedDataTypesEnable`, for Core Connect JSON exports to S3 and serverfs. When you set `simplifiedDataTypesEnable` to false, data is exported using the data types registered for the dataset you are exporting within Tamr Core. By default, `simplifiedDataTypesEnable` is true, which maintains the previous behavior of JSON exports: data types are simplified and all columns are exported as arrays of strings, unless `flattenEnable` is also true, in which case all columns are exported as strings.
- When `simplifiedDataTypesEnable=true` and `flattenEnable=false`, all data is exported as array-of-string type.
- When `simplifiedDataTypesEnable=true` and `flattenEnable=true`, all data is exported as string type.
- When `simplifiedDataTypesEnable=false`, data is exported matching the type for its column as defined in the dataset service (regardless of the `flattenEnable` setting).
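The three cases above can be illustrated with a toy serializer. This is a hedged sketch that mimics the documented behavior for a single cell value, not Core Connect code; the `||` join delimiter mirrors the documented `mergedArrayValuesDelimiter` default.

```python
def export_value(value, simplified_data_types_enable=True, flatten_enable=False):
    """Mimic how one cell would be typed in a Core Connect JSON export."""
    if not simplified_data_types_enable:
        return value  # keep the data type registered for the column
    if flatten_enable:
        # simplified + flattened: every column becomes a single string
        if isinstance(value, list):
            return "||".join(str(v) for v in value)
        return str(value)
    # simplified, not flattened: every column becomes an array of strings
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]
```

For example, an integer cell `42` exports as `["42"]` by default, as `"42"` when flattened, and stays `42` when `simplifiedDataTypesEnable` is false.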
Fixed Issues
This release corrects the following errors.
- Non-functional Max Learned Pair UI checkbox. Found in: v2022.011.0, v2022.012.0. Fix versions: v2023.002.0. This release addresses two issues with the Max Learned Pair setting in the Edit Project dialog. Previously, the value for this setting was reset to 0 if the browser was refreshed. In addition, editing the value in one project updated the value within the UI for all projects. Both errors have been fixed.
- NullPointerException on EmrFileSystem.exists. Found in: v2023.001.0. Fix versions: v2023.002.0. This release fixes an issue on AWS scale-out deployments. Spark jobs no longer fail when a file doesn't exist on S3.
- New keys for ingest endpoints are also included for export endpoints. Found in: . Fix versions: v2023.002.0. Previously, Core Connect import-related JSON keys (policyIds, primaryKey, profile, recipeId, truncateTamrDataset) were also being shown in the Swagger examples for export endpoints. These superfluous keys have been removed from the export Swagger examples.
- S3 client couldn't upload large files (over 5 GB). Found in: v2022.009.0. Fix versions: v2023.002.0. This release corrects an error that prevented the Tamr Core S3 client from uploading files larger than 5 GB to S3.
v2023.001.0 Release Notes
New Features and Improvements
This release includes the following new feature:
- Upgrade the Hadoop/Spark libraries for single-node deployments
Upgraded the version of Spark used by Tamr Core to Spark 3.1.3. Starting with this release, Tamr Core uses Spark 3.1.3 instead of Spark 2.4.5, which was used in previous releases. The upgrade to Spark 3.1.3 takes place automatically as you upgrade to this release.
For more information, see Upgrading Tamr Core.
Fixed Issues
This release corrects the following error:
- Pairs did not appear after non-ES backup restoration in a project with record grouping. Found in: v2022.006.0, v2022.011.0. Fix versions: v2023.001.0. The Reindex API now reindexes record groups when record grouping is enabled.