Publishing Clusters
Publish the current record clusters as the latest version available to downstream record consumers.

Review cluster change metrics, such as the total number of clusters, for published clusters over time.
After the record clusters are ready for downstream consumption, a Curator publishes them. See Publishing Clusters.
What Happens When You Publish Clusters?
When you publish the current record clusters, Tamr saves them as the latest version visible to downstream consumers. Publishing clusters creates or updates a number of Tamr-Generated Datasets.
You can export published clusters and create reports on cluster metrics over time.
After the first cluster version is published, the Curator can continue through any number of curation and review iterations as new data or feedback becomes available, and then publish the next version. Tamr dynamically captures cluster change metrics, such as the number of clusters with new members, between the current clusters and the latest published clusters. Tamr presents these metrics in the clustering Curator and Review workflows. See Reviewing Cluster Change Metrics.
How Do Cluster IDs Change Over Time?
As you create, update, and delete clusters, their IDs may change. The resulting clusters may retain the IDs of previous clusters, and new clusters can be issued new IDs. This section explains how Tamr handles cluster IDs and how you can review the cluster changes that took place over time.
- Retaining an existing cluster ID is known as surviving. A cluster ID survives when a new cluster retains the ID of a cluster that existed previously.
- Creating a new cluster ID is known as minting. A cluster ID is minted when a new cluster is created.
- Removing a cluster ID is known as retiring. A cluster ID is retired when an existing cluster is deleted or emptied.
The following examples explain how IDs evolve over time.
Example 1: Surviving and Minting of Cluster IDs
Consider a published cluster A that splits into two clusters. If one of the two clusters keeps the cluster ID A and the other cluster obtains a newly created cluster ID B, then cluster ID A survives and cluster ID B is minted.
Cluster IDs are unique. Tamr never re-issues a previously minted cluster ID.

Cluster A splits into two clusters. One of the two current clusters keeps the ID A, while the other obtains the new ID B.
Example 2: Surviving and Retiring of Cluster IDs
Consider two published clusters A and B that merge into one cluster. If the merged cluster keeps the ID A, then cluster ID A survives and cluster ID B is retired or emptied.
Tamr does not re-issue retired cluster IDs.

Cluster A and cluster B merge into one cluster. The merged cluster keeps the ID A of one of the two merging clusters. The cluster ID B is retired or emptied.
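The two examples above can be reproduced with a small comparison between the latest published clusters and the current clusters. The following is only an illustrative sketch, not Tamr's implementation: the function name `classify_cluster_ids` and the dictionary layout (cluster ID mapped to a set of record IDs) are assumptions made for this example, and an emptied cluster is treated the same as a deleted one.

```python
def classify_cluster_ids(published, current):
    """Classify cluster IDs as survived, minted, or retired.

    Both arguments map a cluster ID to the set of record IDs in that cluster.
    This layout is an assumption for illustration, not a Tamr data format.
    """
    # A cluster with no records counts as emptied, so its ID is not "live".
    published_ids = {cid for cid, records in published.items() if records}
    current_ids = {cid for cid, records in current.items() if records}
    return {
        "survived": sorted(published_ids & current_ids),  # ID kept
        "minted": sorted(current_ids - published_ids),    # ID newly created
        "retired": sorted(published_ids - current_ids),   # ID removed
    }


# Example 1: cluster A splits; one part keeps ID A, the other gets new ID B.
split_result = classify_cluster_ids(
    published={"A": {"r1", "r2", "r3", "r4"}},
    current={"A": {"r1", "r2"}, "B": {"r3", "r4"}},
)
print(split_result)  # {'survived': ['A'], 'minted': ['B'], 'retired': []}

# Example 2: clusters A and B merge; the merged cluster keeps ID A.
merge_result = classify_cluster_ids(
    published={"A": {"r1", "r2"}, "B": {"r3"}},
    current={"A": {"r1", "r2", "r3"}, "B": set()},
)
print(merge_result)  # {'survived': ['A'], 'minted': [], 'retired': ['B']}
```

Because cluster IDs are never re-issued, an ID that appears under "retired" in one comparison should not reappear under "minted" in a later one.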
Automatic and Manual Survivorship of Cluster IDs
Tamr automatically survives, mints, and retires cluster IDs between the latest published clusters and the current clusters.
- When two or more clusters merge, the cluster ID of the cluster with the largest number of records automatically survives on the merged cluster. In the event of a tie, Tamr chooses the cluster that has the highest absolute record overlap with the merged cluster.
- When a cluster splits into two or more clusters, the cluster ID of the cluster that is being split automatically survives on the new cluster with the largest number of records. In the event of a tie, Tamr chooses the split cluster that has the highest absolute record overlap with the given cluster.
- Users with the Curator or Admin role can override the automatic choice and specify which ID persists when merging and splitting clusters:
  - When you merge two or more clusters via drag-and-drop, the cluster ID of the drop (or destination) cluster survives, while the cluster IDs of the dragged clusters are retired. See Merging Using Drag and Drop.
  - When you split a cluster into two or more clusters via Move to new, the cluster ID of the given cluster survives on the remaining records, while a new cluster ID is minted for the cluster resulting from the moved records. See Moving Records to a New Cluster.
- When you publish clusters, Tamr publishes the resulting cluster IDs and they become the latest published clusters.
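The automatic rules above (largest number of records first, then highest absolute record overlap) can be sketched as follows. This is a simplified illustration under stated assumptions, not Tamr's actual code: the function names and the set-based data layout are hypothetical, and the final fallback to the smallest ID when both criteria tie is an assumption added only to make the sketch deterministic, since the text does not say what happens in that case.

```python
def surviving_id_after_merge(source_clusters, merged_records):
    """Pick which published cluster ID survives on a merged cluster.

    source_clusters maps each merging cluster ID to its set of record IDs;
    merged_records is the set of record IDs in the merged cluster.
    """
    def rank(cluster_id):
        records = source_clusters[cluster_id]
        # First criterion: number of records. Tie-break: absolute overlap
        # with the merged cluster.
        return (len(records), len(records & merged_records))

    best = max(rank(cid) for cid in source_clusters)
    # Assumption: any remaining tie is broken by the smallest ID.
    return min(cid for cid in source_clusters if rank(cid) == best)


def piece_keeping_id_after_split(original_records, pieces):
    """Pick which new cluster inherits the ID of the cluster being split.

    pieces maps a temporary label for each new cluster to its record IDs.
    """
    def rank(label):
        records = pieces[label]
        # Same ordering: size first, then overlap with the original cluster.
        return (len(records), len(records & original_records))

    best = max(rank(label) for label in pieces)
    # Assumption: any remaining tie is broken by the smallest label.
    return min(label for label in pieces if rank(label) == best)


# Merge: A has more records than B, so ID A survives.
print(surviving_id_after_merge(
    {"A": {"r1", "r2", "r3"}, "B": {"r4", "r5"}},
    {"r1", "r2", "r3", "r4", "r5"},
))  # A

# Split: the larger piece keeps the original cluster's ID.
print(piece_keeping_id_after_split(
    {"r1", "r2", "r3", "r4", "r5"},
    {"left": {"r1", "r2"}, "right": {"r3", "r4", "r5"}},
))  # right
```

A manual override, such as a drag-and-drop merge, would bypass this ranking and use the ID of the destination cluster instead.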
Reporting Cluster Changes
You can access the historical information about all past cluster versions through RESTful APIs:
- Fetch cluster history. See Retrieve Published Clusters Given Cluster IDs.
- Fetch record history. See Retrieve Published Clusters Given Record IDs.
- Configure the time-to-live for clusters. See Update Published Clusters Configuration.
- Obtain only the latest version of published clusters. See Using Low Latency Match with Published Clusters.