Tamr Documentation

Verifying Clusters

Verifying records in a cluster involves collecting expert feedback about the clusters Tamr generates and then making changes to improve clustering.

When are Cluster Verification Options Useful?

Use cluster verification options to collect feedback on the records Tamr groups into a cluster. They allow you to:

  • Assess how well Tamr performs at clustering records.
  • Indicate that you reviewed the results of the clustering job.
  • Manually edit clusters to account for exceptions you don't want the Tamr model to learn. For example, two records that look different may need to be merged manually into the same cluster because they refer to the same company after an acquisition.

Cluster Verification Options

The following verification options are available when you verify records in a cluster.

Cluster verification options: lock, lightbulb (enable suggestions), checkmark (move)

Verify and enable suggestions

This option verifies that one or more selected records belong to the current cluster, and lets Tamr continue making clustering suggestions for these records. Tamr does not apply new suggestions automatically. This is useful if you know that more records will arrive, or when building clusters will take a few more iterations.

Tip: The light bulb icon turns from white to yellow when Tamr has a suggestion for one of these verified records.

Verify and auto-accept suggestions

This option verifies that one or more selected records belong to the current cluster, and also lets Tamr move the records to another cluster based on future suggestions. It is useful, for example, once you are further along in creating clusters and Tamr has been trained well enough that you trust its future suggestions.

This option indicates that you agree with Tamr's current suggestion, but Tamr can still move the records if it disagrees with you in the next round of cluster suggestions. You can use this option only to confirm that records are in the correct cluster, not to merge or split clusters. Use it when you want to audit a cluster, indicate to your team that a record assignment is correct at a point in time, and continue collecting feedback on that assignment.

Verify and disable suggestions

This option confirms that one or more selected records are associated with the current cluster, and prevents Tamr from making further suggestions for these records. Choose this option if you are satisfied with the current verification and do not expect the records or datasets to change over time.

You can also use this option to override Tamr suggestions in edge cases, to prevent pair generation for very large clusters, or when you are sure that the clustering is correct and don’t want to know if Tamr thinks otherwise.

This option is equivalent to the Lock option available for cluster verification in releases before Tamr version 2019.026.

Note: The Verify and disable suggestions option is available for backward compatibility. Tamr does not recommend using it because of the following limitations:

  • Tamr doesn’t make suggestions for records that use this configuration (also known as "locked cluster records"). As a result, you cannot compare suggestions made by Tamr with human feedback.
  • Records that use this configuration require human intervention if the data or record relationships change. For example, if a locked cluster contains a record and a company merger or split occurs, you must manually correct the cluster. Similarly, if a record in a locked cluster is updated with a new field, you must manually correct this locked record to reflect the new information.

Remove verification

If you choose this option, Tamr removes the existing cluster verification from the selected records, and they return to the pool of records for clustering. Choose this option if you do not agree with the current verification.
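To make the differences between the four options concrete, here is a minimal toy model of how one new clustering suggestion affects a record under each option. It is a sketch under stated assumptions, not Tamr's implementation or API: the names `Verification`, `Record`, `apply_suggestion`, and `remove_verification` are hypothetical, and treating an unverified record as simply following the next suggestion is a simplification of "returns to the pool of records for clustering".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verification(Enum):
    # Hypothetical names for the four options described above.
    UNVERIFIED = "unverified"        # Remove verification: record returns to the clustering pool
    ENABLE_SUGGESTIONS = "enable"    # Verify and enable suggestions
    AUTO_ACCEPT = "auto_accept"      # Verify and auto-accept suggestions
    DISABLE_SUGGESTIONS = "disable"  # Verify and disable suggestions (a "locked" record)


@dataclass
class Record:
    record_id: str
    cluster_id: str
    verification: Verification = Verification.UNVERIFIED
    pending_suggestion: Optional[str] = None  # suggestion shown for review, not applied


def apply_suggestion(record: Record, suggested_cluster: str) -> None:
    """Toy model: how one new clustering suggestion affects a record."""
    if record.verification is Verification.DISABLE_SUGGESTIONS:
        # Locked records receive no suggestions, so nothing changes.
        return
    if record.verification is Verification.ENABLE_SUGGESTIONS:
        # The suggestion is recorded for review but never applied automatically.
        record.pending_suggestion = suggested_cluster
        return
    # AUTO_ACCEPT and (by simplifying assumption) UNVERIFIED records follow the suggestion.
    record.cluster_id = suggested_cluster
    record.pending_suggestion = None


def remove_verification(record: Record) -> None:
    """Clear the verification so the record rejoins the pool for clustering."""
    record.verification = Verification.UNVERIFIED
    record.pending_suggestion = None
```

For example, a record verified with suggestions enabled keeps its cluster and only gains a pending suggestion, while an auto-accept record moves: `r = Record("r1", "c1", Verification.ENABLE_SUGGESTIONS); apply_suggestion(r, "c2")` leaves `r.cluster_id == "c1"` and `r.pending_suggestion == "c2"`.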

Verifying Records

To verify records in the cluster:

  1. In a mastering project, open the Clusters page.
  2. Select one or more records. You can select a cluster to show only records currently in that cluster, or use the cluster or record filtering options.
  3. Choose Verify and enable suggestions, or use the drop-down menu to choose Verify and auto-accept suggestions, Verify and disable suggestions, or Remove verification.

Verification options for records in a cluster

When you choose a verification option, the Clusters page refreshes as follows:

  • Records in the table reflect verification states and allow you to take action.
  • You can use verification filters that match existing cluster verification states. See Filtering Clusters.
  • The cluster panel shows verification aggregations, such as the number of records in the cluster that may need to be moved to another cluster, or other actions stemming from the new verification states.

Accepting suggestions or keeping the current cluster assignments

Note: Verify and enable suggestions has two possible outcomes, shown by the color of the light bulb icon:

  • The bulb is yellow when you and the Tamr model disagree about the record's cluster placement: Tamr's suggestion differs from the verified cluster. Review the suggestion and decide whether to move the record to another cluster.

You and the Tamr model disagree

  • The bulb is grey when you and the Tamr model agree about the record's cluster placement: Tamr's suggestion is the same as the verified cluster.

You and the Tamr model agree
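The bulb rule above amounts to a comparison between the verified cluster and Tamr's suggested cluster. The following minimal sketch models it; `bulb_color` and `records_to_review` are hypothetical helpers, not part of Tamr, and showing grey when there is no suggestion at all is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def bulb_color(verified_cluster: str, suggested_cluster: Optional[str]) -> str:
    """Toy model of the light bulb for a record verified with suggestions enabled."""
    if suggested_cluster is not None and suggested_cluster != verified_cluster:
        return "yellow"  # you and the model disagree: review the suggestion
    return "grey"  # you and the model agree (or, by assumption, there is no suggestion)


def records_to_review(
    assignments: Dict[str, Tuple[str, Optional[str]]]
) -> List[str]:
    """Return, sorted, the IDs of records whose bulb is yellow.

    `assignments` maps record ID -> (verified cluster, suggested cluster).
    """
    return sorted(
        record_id
        for record_id, (verified, suggested) in assignments.items()
        if bulb_color(verified, suggested) == "yellow"
    )
```

A filter like `records_to_review` mirrors the idea of working through only the records where you and the model disagree.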

Pinning Cluster Records

You can pin records to the top of the records panel to make comparisons easier, or to keep track of them before you move them to another cluster.

Tip: This feature helps you keep track of a specific record or set of records while you work. Pinning a record does not verify the record in a cluster, and is not used to update the Tamr model.

To pin records to the top of the records panel:

  1. In a mastering project, open the Clusters page.
  2. Select one or more records.
  3. Choose Pin. Pinned records are highlighted in green at the top of the page.

To unpin records:

  1. Select one or more pinned records.
  2. Choose Unpin.

Moving Records to a New Cluster

Moving records to a different cluster verifies the moved records in their new cluster.

To move records from a cluster to another cluster:

  1. In a mastering project, open the Clusters page.
  2. On the right-hand side, select one or more records from the list and open record details.
  3. On the record details side panel, select Accept suggestion to move the record to the cluster that Tamr suggests. Tamr displays the ID of the suggested cluster.
  4. In the Move record dialog, choose whether to move the record to the new cluster with Verify and enable suggestions or with Verify and disable suggestions.

Moving a record to another cluster

Moving Records to an Existing Cluster

Moving records to an existing cluster verifies only the moved records in their new cluster. Records that were already in the target cluster are not verified.

Moving Records with Drag and Drop

To move records from a cluster to an existing cluster using drag and drop:

  1. In a mastering project, open the Clusters page.
  2. On the right-hand side, select one or more records via Ctrl + Click or Shift + Click.
  3. Drag and drop the record(s) onto the new cluster.

Tip: In the two-paned cluster browser you can drag records across panes.

Merging Clusters

Merging clusters is directional. When you merge clusters A and B, you choose whether to merge A into B or B into A. These are different actions: when you merge A into B, all records become associated with the persistent ID of cluster B. Therefore, before merging clusters, decide which cluster's persistent ID you want the records to have. This matters because the cluster suggestions that Tamr provides are associated with a particular cluster persistent ID.

Note: There is a difference between moving records and merging clusters:

  • If you merge cluster A into cluster B, Tamr verifies all records in the merged cluster B, including the records that were already there before merging.
  • If you move all records of cluster A into cluster B, Tamr verifies all records from cluster A in their new cluster B, but it does not verify records that were already in the cluster B.
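The directionality rule and the move-versus-merge difference can be sketched with plain sets. This is a toy model under stated assumptions, not Tamr's implementation: clusters are a hypothetical dictionary from persistent cluster ID to record IDs, verification is reduced to a single set of verified record IDs, and `move_records` and `merge_clusters` are illustrative names.

```python
from typing import Dict, Set

Clusters = Dict[str, Set[str]]  # hypothetical: persistent cluster ID -> record IDs


def move_records(clusters: Clusters, verified: Set[str],
                 records: Set[str], source: str, target: str) -> None:
    """Move records from source to target; only the moved records become verified."""
    if not records <= clusters[source]:
        raise ValueError("all moved records must belong to the source cluster")
    clusters[source] -= records
    clusters[target] |= records
    verified |= records  # records already in target keep their previous state


def merge_clusters(clusters: Clusters, verified: Set[str],
                   source: str, target: str) -> None:
    """Merge source INTO target: target's persistent ID survives and every
    record in the merged cluster becomes verified."""
    if source == target:
        raise ValueError("cannot merge a cluster into itself")
    clusters[target] |= clusters.pop(source)
    verified |= clusters[target]
```

With `{"A": {"a1", "a2"}, "B": {"b1"}}`, moving both records of A into B verifies only `a1` and `a2`, while merging A into B verifies `a1`, `a2`, and `b1` and leaves only the persistent ID `B`. Merging B into A instead would keep `A`.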

Merging Using Drag and Drop

  1. In a mastering project, open the Clusters page.
  2. On the left-hand side, select one or more clusters via Ctrl + Click or Shift + Click.
  3. Use the small vertical bars to the left of a cluster name to drag the selected clusters and drop them onto the cluster you want to merge them into.
  4. Choose a verification status for the merged cluster: Verify and enable suggestions, or Verify and disable suggestions.

Merging clusters by dragging and dropping

Tip: In the two-paned cluster browser you can drag clusters across panes.

Merging Clusters in the Left Panel

When you merge two or more clusters using Actions > Merge, Tamr automatically applies survivorship: the merged cluster keeps the cluster ID of the selected cluster with the largest number of records. See Automatic and Manual Survivorship.
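The survivorship rule for Actions > Merge can be sketched as choosing the selected cluster with the most records and folding the others into it. This is a hypothetical model, not Tamr code: `surviving_cluster_id` and `merge_selected` are illustrative names, and breaking ties in favor of the first selected cluster is an assumption this page does not specify.

```python
from typing import Dict, List, Set

Clusters = Dict[str, Set[str]]  # hypothetical: cluster ID -> record IDs


def surviving_cluster_id(clusters: Clusters, selected: List[str]) -> str:
    """ID of the selected cluster with the most records.

    Tie-breaking is an assumption: max() keeps the first of equally large clusters.
    """
    return max(selected, key=lambda cid: len(clusters[cid]))


def merge_selected(clusters: Clusters, selected: List[str]) -> str:
    """Merge every selected cluster into the survivor, using all of its
    records (record filters are ignored, as for left-hand panel Merge)."""
    survivor = surviving_cluster_id(clusters, selected)
    for cid in selected:
        if cid != survivor:
            clusters[survivor] |= clusters.pop(cid)
    return survivor
```

Given clusters `c1` (1 record), `c2` (3 records), and `c3` (2 records), `merge_selected` keeps the ID `c2` and moves all six records into it.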

Note: Merge in the left-hand panel merges all records of the selected clusters, regardless of any record filters that are applied.

To merge clusters in the left-hand panel:

  1. In a mastering project, open the Clusters page.
  2. On the left-hand side, select two or more clusters via Ctrl + Click or Shift + Click.
  3. Select Actions > Merge. Then choose whether Tamr should verify and enable suggestions, or verify and disable suggestions, for the merged cluster's record assignments.

Note: In the two-paned cluster browser, you can merge clusters from the left-hand panel only if they are in the same pane.

Navigating with the Two-Paned Cluster Browser

You can compare clusters by viewing two cluster panes at the same time. You can also drag and drop records and clusters across panes for easy editing. Note that sorting and column order are always synced across both panes.

To open a second pane for another cluster or record, select the icon with two horizontal, blue bars next to its name.

The two-paned view is available in both the cluster browser and the records browser. Its icon appears when you hover over any record or cluster, as shown in the following example:

Select to view a cluster in the bottom pane

Reviewing Cluster Information

To view details about a cluster:

  1. In a mastering project, open the Clusters page.
  2. Select a cluster from the left-hand panel.
  3. Select Open details to view the cluster information.

The cluster pane displays the cluster name, the number of records in it, and the number of verified records with suggestions disabled.

If a curator has manually published clusters, the cluster pane displays the following information in addition to the name, number of records, and number of locked records:

  • A graph of cluster size over time.
  • The number of records that have been added to and removed from the cluster since it was last published, and the date when it was last published.
  • The cluster ID. This ID is permanent for the cluster and is guaranteed to never change.

See Publishing Clusters.
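One way to think about the "added" and "removed" counts in the cluster pane is as two set differences between the cluster's membership at its last publish and its membership now. This minimal sketch assumes membership is tracked as record IDs; `changes_since_publish` is a hypothetical helper, and this page does not describe how Tamr actually computes these counts.

```python
from typing import Set, Tuple


def changes_since_publish(published: Set[str], current: Set[str]) -> Tuple[int, int]:
    """Count records added to and removed from a cluster since its last publish."""
    added = current - published    # in the cluster now, absent at publish time
    removed = published - current  # present at publish time, gone now
    return len(added), len(removed)
```

For a cluster published with records `r1`, `r2`, `r3` that now contains `r2`, `r3`, `r4`, `r5`, this returns two added records and one removed record.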
