Examples of Cluster ID Changes

Running jobs for record pairs and clusters assigns and updates the persistent IDs of clusters.

In a mastering project, you refine cluster membership by merging clusters and moving records between clusters. The first time you run Apply feedback and update results or Update results only on the Pairs page, Tamr Core assigns persistent IDs to all clusters. Each time you run Review and publish clusters on the Clusters page after that, Tamr Core assigns persistent IDs to any newly created clusters and deletes the IDs of any empty clusters.

Using the persistent IDs, you can:

  • Merge the records in each cluster to form a single, merged record that describes an entity. See Golden Records Projects.
  • Use the ID as a key in other systems.
  • Track cluster changes over time using metrics and system-generated datasets of cluster membership.

See Publishing Clusters.

How Do Cluster IDs Change Over Time?

As you create, update, and delete clusters, cluster IDs can change. A resulting cluster may retain the ID of a previous cluster, and new clusters are issued new IDs.

This section explains how Tamr Core handles cluster IDs and what you can do to review the cluster changes that take place over time:

  • Surviving is the retention of an existing cluster ID. A cluster ID survives when a current cluster keeps the ID of a cluster that existed previously.
  • Minting is the creation of a new cluster ID. Cluster IDs are minted when new clusters are created.
  • Retiring is the removal of a cluster ID. Cluster IDs are retired when existing clusters are deleted or emptied.

The following examples explain how IDs evolve over time.

Example 1: Surviving and Minting of Cluster IDs

Consider a published cluster A that splits into two clusters. If one of the two clusters keeps the cluster ID A and the other cluster obtains a newly created cluster ID B, then cluster ID A has "survived" and cluster ID B has been "minted".

Cluster IDs are unique. Tamr Core never re-issues a previously minted cluster ID.

Cluster A splits into two clusters. One of the two current clusters keeps the ID A, while the other obtains the new ID B.

Example 2: Surviving and Retiring of Cluster IDs

Consider two published clusters A and B that merge into one cluster. If the merged cluster keeps the ID A, then cluster ID A has survived and cluster ID B is retired.

Tamr Core does not re-issue retired cluster IDs.

Cluster A and cluster B merge into one cluster. The merged cluster keeps the ID A of one of the two merged clusters. The cluster ID B is retired.
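
The two examples above can be read as set operations on cluster IDs. The following is a minimal sketch, not Tamr Core code: it assumes each snapshot (latest published and current) is available as a plain dictionary mapping a cluster ID to the set of record IDs in that cluster. The function name `classify_cluster_ids` and the record IDs `r1`, `r2`, `r3` are hypothetical.

```python
def classify_cluster_ids(published, current):
    """Compare two snapshots, each mapping cluster ID -> set of record IDs.

    Assumption: an ID present in both snapshots has survived, an ID only in
    the current snapshot was minted, and an ID only in the published
    snapshot was retired.
    """
    published_ids = set(published)
    current_ids = set(current)
    return {
        "survived": published_ids & current_ids,
        "minted": current_ids - published_ids,
        "retired": published_ids - current_ids,
    }


# Example 1: cluster A splits; ID A survives and ID B is minted.
split = classify_cluster_ids(
    published={"A": {"r1", "r2", "r3"}},
    current={"A": {"r1", "r2"}, "B": {"r3"}},
)

# Example 2: clusters A and B merge; ID A survives and ID B is retired.
merge = classify_cluster_ids(
    published={"A": {"r1", "r2"}, "B": {"r3"}},
    current={"A": {"r1", "r2", "r3"}},
)
```

Because cluster IDs are never re-issued, an ID that appears in the retired set of one comparison should not reappear as minted in a later comparison.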

Automatic and Manual Survivorship of Cluster IDs

Tamr Core automatically survives, mints, and retires cluster IDs between the latest published clusters and the current clusters:

  • When two or more clusters merge, the cluster ID of the cluster with the largest number of records automatically survives on the merged cluster. In the event of a tie, Tamr Core chooses the cluster that has the highest absolute record overlap with the merged cluster.
  • When a cluster splits into two or more clusters, the cluster ID of the cluster that is being split automatically survives on the new cluster with the largest number of records. In the event of a tie, Tamr Core chooses the new cluster that has the highest absolute record overlap with the original cluster.
  • Users with the curator or admin role can override and specify which ID should persist when merging and splitting clusters.
  • When you merge two or more clusters via drag and drop, the cluster ID of the destination cluster survives, while one or more cluster IDs of the dragged clusters are retired. See Merging Using Drag and Drop.
  • When you split a cluster into two or more clusters via Move to new, the cluster ID of the given cluster survives on the remaining records, while a new cluster ID is minted for the cluster resulting from the moved records. See Moving Records to a New Cluster.
  • When you publish clusters, Tamr Core publishes the resulting cluster IDs and they become the latest published clusters.
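
The automatic merge and split rules in the list above can be sketched as a ranking by record count, then by record overlap. This is an illustration under stated assumptions rather than Tamr Core's implementation: clusters are modeled as sets of record IDs, the function names are hypothetical, and the final alphabetical tie-break is an arbitrary choice made here so the result is deterministic (the rules above do not say what happens if both counts tie).

```python
def surviving_id_for_merge(source_clusters, merged_records):
    """Pick the published cluster ID that survives on a merged cluster.

    source_clusters: dict of published cluster ID -> set of record IDs.
    merged_records: set of record IDs in the merged (current) cluster.
    """
    def rank(cluster_id):
        records = source_clusters[cluster_id]
        # Largest cluster first, then highest overlap with the merged cluster.
        return (len(records), len(records & merged_records))

    best = max(rank(cluster_id) for cluster_id in source_clusters)
    tied = sorted(c for c in source_clusters if rank(c) == best)
    return tied[0]  # assumption: alphabetical tie-break, not from the docs


def piece_keeping_id_after_split(original_records, pieces):
    """Pick which new cluster keeps the ID of the cluster that was split.

    original_records: set of record IDs in the published cluster.
    pieces: dict of hypothetical label -> set of record IDs per new cluster.
    """
    def rank(label):
        records = pieces[label]
        # Largest new cluster first, then highest overlap with the original.
        return (len(records), len(records & original_records))

    best = max(rank(label) for label in pieces)
    tied = sorted(label for label in pieces if rank(label) == best)
    return tied[0]  # assumption: alphabetical tie-break, not from the docs


# Merge: A (3 records) is larger than B (1 record), so ID A survives.
merge_winner = surviving_id_for_merge(
    {"A": {"r1", "r2", "r3"}, "B": {"r4"}},
    merged_records={"r1", "r2", "r3", "r4"},
)

# Merge with a size tie: r2 left the merged cluster, so B overlaps more.
tie_winner = surviving_id_for_merge(
    {"A": {"r1", "r2"}, "B": {"r3", "r4"}},
    merged_records={"r1", "r3", "r4"},
)

# Split: the larger new cluster keeps the original ID.
split_winner = piece_keeping_id_after_split(
    original_records={"r1", "r2", "r3", "r4", "r5"},
    pieces={"x": {"r1", "r2", "r3"}, "y": {"r4", "r5"}},
)
```

A manual override by a curator or admin, as described in the list, would replace the result of either function with the ID the user selects.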

Reporting Cluster Changes

You can access the historical information about all past cluster versions through RESTful APIs:

