Tamr Documentation

Working with Clusters

Curators use Tamr to group records into clusters that contain all of the records that refer to the same real-world entity, and only those records.

When you apply feedback and update record pairs, Tamr also generates the first iteration of record clusters. A cluster can contain one or more records, all of which should represent the same distinct entity. In a given data mastering project, cluster size ranges from one record, known as a singleton cluster, to thousands or tens of thousands of records in a cluster.

The iterative review and curation of important clusters allows Tamr to accurately and automatically cluster records into distinct entities.

Figure: The process of data mastering. Step 1 is a darker shade of blue to indicate that Tamr completes this step.

To achieve one record cluster for each entity, containing all records for that entity and only records for that entity, you and your experts review a small number of important clusters and take the following actions to improve the Tamr model:

  • Merge any clusters that contain records for the same entity.
  • Move records that were placed in a cluster for a different entity into the existing cluster for the entity they represent.
  • Separate records into a new cluster for an entity that does not already have a cluster.
  • Verify records as correctly belonging to a cluster. When you verify each record's membership in a cluster you can choose whether Tamr can use that verified membership to make suggestions about future cluster members.
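
The four curation actions above can be pictured as operations on a mapping from cluster IDs to sets of record IDs. The following is a minimal sketch for illustration only, not Tamr's API: the `Clustering` class, its method names, and the `allow_suggestions` flag are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Clustering:
    """Hypothetical in-memory model of clusters (not Tamr's data model)."""

    # cluster ID -> set of record IDs in that cluster
    clusters: dict = field(default_factory=dict)
    # verified record ID -> whether the verification may inform suggestions
    verified: dict = field(default_factory=dict)

    def merge(self, target: str, source: str) -> None:
        """Fold every record of `source` into `target` and drop `source`."""
        self.clusters[target] |= self.clusters.pop(source)

    def move(self, record: str, source: str, target: str) -> None:
        """Move one record from `source` into the existing cluster `target`."""
        self.clusters[source].remove(record)
        self.clusters[target].add(record)

    def separate(self, records: set, source: str, new_id: str) -> None:
        """Split `records` out of `source` into a brand-new cluster."""
        if new_id in self.clusters:
            raise ValueError(f"cluster {new_id!r} already exists")
        self.clusters[source] -= set(records)
        self.clusters[new_id] = set(records)

    def verify(self, record: str, cluster: str, allow_suggestions: bool) -> None:
        """Mark `record` as correctly belonging to `cluster`."""
        if record not in self.clusters[cluster]:
            raise ValueError(f"{record!r} is not in cluster {cluster!r}")
        self.verified[record] = allow_suggestions


c = Clustering({"c1": {"r1", "r2"}, "c2": {"r3"}, "c3": {"r4", "r5"}})
c.merge("c1", "c2")              # c1 and c2 describe the same entity
c.move("r4", "c3", "c1")         # r4 was in the wrong cluster
c.separate({"r1"}, "c1", "c4")   # r1 is an entity with no cluster yet
c.verify("r2", "c1", allow_suggestions=True)
```

In this sketch a move or separate can leave a cluster with no records; the sketch keeps such clusters in place rather than deleting them.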

After you review high-impact clusters to verify member records and make other changes, you can generate precision and recall metrics to help you track model accuracy over time.
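
As a rough illustration of what precision and recall mean for clusters, the sketch below uses the common pairwise definition: a pair of records counts as "together" if both sit in the same cluster. This page does not say how Tamr computes its metrics, so treat the definition, the function names, and the sample data as assumptions.

```python
from itertools import combinations


def same_cluster_pairs(clusters: dict) -> set:
    """Return every unordered pair of records that share a cluster."""
    pairs = set()
    for members in clusters.values():
        # sorted() gives each pair one canonical order
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_precision_recall(predicted: dict, verified: dict) -> tuple:
    """Compare predicted clusters against verified (ground-truth) clusters."""
    predicted_pairs = same_cluster_pairs(predicted)
    verified_pairs = same_cluster_pairs(verified)
    true_positives = len(predicted_pairs & verified_pairs)
    # With no pairs to judge, report 1.0 rather than divide by zero
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 1.0
    recall = true_positives / len(verified_pairs) if verified_pairs else 1.0
    return precision, recall


predicted = {"p1": {"r1", "r2", "r3"}, "p2": {"r4"}}
verified = {"v1": {"r1", "r2"}, "v2": {"r3", "r4"}}
precision, recall = pairwise_precision_recall(predicted, verified)
# 1 of 3 predicted pairs is correct; 1 of 2 verified pairs was found
```

Under this definition, over-merged clusters lower precision and over-split clusters lower recall, which is why tracking both over time is informative.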

Tip: The first time that you initiate an Apply feedback and update results or Update results only job in a mastering project, Tamr “publishes” the initial set of clusters by assigning persistent IDs. As you work with clusters, you choose when to manually republish by running a Review and publish clusters job; this job assigns persistent IDs to any new clusters and deletes any empty clusters. Each time you republish, Tamr saves a snapshot of the clusters and recomputes recall and precision metrics.
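
The publishing behavior described in the tip can be sketched as follows: clusters that lack a persistent ID receive one, empty clusters are dropped, and a snapshot is saved on every publish. The `Publisher` class, the `cluster-N` ID format, and the dictionary layout are hypothetical and stand in for whatever Tamr does internally.

```python
import copy
from itertools import count


class Publisher:
    """Hypothetical model of publishing clusters (not Tamr's implementation)."""

    def __init__(self):
        self._next_id = count(1)   # source of new persistent IDs
        self.snapshots = []        # one saved copy per publish

    def publish(self, clusters: list) -> list:
        """Each cluster is a dict with 'persistent_id' (str or None) and 'records' (set)."""
        published = []
        for cluster in clusters:
            if not cluster["records"]:
                continue  # empty clusters are deleted on publish
            if cluster.get("persistent_id") is None:
                # new cluster: assign a persistent ID that never changes afterwards
                cluster = {**cluster, "persistent_id": f"cluster-{next(self._next_id)}"}
            published.append(cluster)
        # deep copy so later edits do not alter the saved snapshot
        self.snapshots.append(copy.deepcopy(published))
        return published


pub = Publisher()
first = pub.publish([
    {"persistent_id": None, "records": {"r1", "r2"}},
    {"persistent_id": None, "records": {"r3"}},
])
first[1]["records"].clear()                                   # curation empties a cluster
draft = first + [{"persistent_id": None, "records": {"r3"}}]  # and creates a new one
second = pub.publish(draft)
```

Existing clusters keep their IDs across publishes in this sketch, so downstream consumers can refer to a cluster by ID even as its membership changes.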

The iterative curation of important clusters allows Tamr to accurately cluster all records into distinct entities.

Both curators and verifiers can review, filter, assign, and verify clusters.



