Tamr Documentation

v0.51 Notes

New Features

API

  • 5 new clustering endpoints have been added for Mastering in the versioned API:
    • POST projects/<project>/recordClusters:refresh and POST projects/<project>/recordClustersWithData:refresh will both run clustering.
    • Train a model: POST projects/<project>/recordPairsWithPredictions/model:refresh
    • Predict pairs: POST projects/<project>/recordPairsWithPredictions:refresh
    • Generate high impact pairs: POST projects/<project>/highImpactPairs:refresh
    • In addition, the High Impact Pairs dataset has been aliased as: projects/<project>/highImpactPairs.
  • You can now retrieve and update the published clusters garbage collection policy.
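The refresh endpoints listed above all share one shape: a POST to a path under projects/<project>. As a rough illustration only, the sketch below builds those relative paths in Python. The helper name `refresh_path`, the operation keys, and the example project identifier `1` are assumptions made for this example. The host, the versioned API prefix, and authentication are left out because these notes do not specify them.

```python
from urllib.parse import quote

# Relative paths copied from the notes above; "{project}" stands in for <project>.
# The dictionary keys are hypothetical labels, not names used by the API.
MASTERING_REFRESH_PATHS = {
    "train_model": "projects/{project}/recordPairsWithPredictions/model:refresh",
    "predict_pairs": "projects/{project}/recordPairsWithPredictions:refresh",
    "high_impact_pairs": "projects/{project}/highImpactPairs:refresh",
    "cluster": "projects/{project}/recordClusters:refresh",
    "cluster_with_data": "projects/{project}/recordClustersWithData:refresh",
}


def refresh_path(operation: str, project: str) -> str:
    """Return the relative POST path for one of the new refresh operations."""
    template = MASTERING_REFRESH_PATHS[operation]
    # Percent-encode the project identifier so it is safe inside a URL path.
    return template.format(project=quote(str(project), safe=""))


if __name__ == "__main__":
    # Print each path for a hypothetical project identified as "1".
    for name in MASTERING_REFRESH_PATHS:
        print("POST", refresh_path(name, "1"))
```

Keeping the paths in one table makes it easy to prepend whatever base URL and API prefix your deployment uses before sending the request with an HTTP client of your choice.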

Environment Changes

  • TAMR_HADOOP_NAME_NODE_URI is deprecated; use TAMR_FS_URI instead. The variable is replaced automatically during upgrade, but if its value is stored in local-env.sh, you must update it there manually.
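For the manual local-env.sh update, a minimal Python sketch of the rename might look like the following. It assumes the file refers to the variable by its plain name (for example in an `export NAME=value` line); the function name `rename_deprecated_variable` and the sample value are hypothetical. Review the result before replacing the real file.

```python
import re

OLD_NAME = "TAMR_HADOOP_NAME_NODE_URI"
NEW_NAME = "TAMR_FS_URI"

# Match the old name only as a whole identifier, so longer names that merely
# contain it are left untouched.
_OLD_NAME_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])" + re.escape(OLD_NAME) + r"(?![A-Za-z0-9_])"
)


def rename_deprecated_variable(env_text: str) -> str:
    """Return env_text with every whole-word use of the old name replaced."""
    return _OLD_NAME_PATTERN.sub(NEW_NAME, env_text)


if __name__ == "__main__":
    # Hypothetical line as it might appear in local-env.sh.
    sample = 'export TAMR_HADOOP_NAME_NODE_URI="hdfs://namenode:8020"\n'
    print(rename_deprecated_variable(sample), end="")
```

The function works on text rather than on a file path, so you can read local-env.sh, inspect the rewritten text, and write it back only once it looks right.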

General Improvements and Major Bug Fixes

Categorization

Transformations

  • New functions array.nulls() and array.non_nulls() have been added.
  • Unsaved Transformation changes will be restored if you close and reopen the tab while editing.
    • This also applies if your browser crashes for any reason. Note, however, that changes are not saved until the Save Changes button has been clicked.
  • The Transformations code editor now uses less browser memory and works up to 3x faster.
  • Bugfix: Transformations syntax highlighting now always works after Save Changes or Cancel Changes.

Mastering

  • The Clusters page is now referred to as such, rather than as $supplier.
  • The Clusters page has improved sorting behavior.
    • Bugfix: in two-pane view, cluster records in both panes were always sorted the same way. The panes have been decoupled so that each can be sorted according to its own criteria.
    • Supplier sort has been moved into a popover (and removed from cluster column headers).
  • Style updates to the Cluster and Record tables on the Clusters page.

API

  • The endpoints for adding a dataset to a project and removing a dataset from a project now accept the relative ID of the dataset, in addition to its unique Resource ID. This change is backwards compatible.

General

  • Bugfix: page number now updates when searching and sorting on tables.

Upgrade

See the upgrading page for instructions.