Curating Clusters
Manually review, approve or change cluster membership.
Reviewing and curating a cluster involves collecting feedback from reviewers about clusters generated by Tamr. Only users with the curator role in Tamr can verify cluster assignments.
Note: Publish clusters before providing verification feedback to Tamr. Cluster record verification relies on persistent cluster IDs, and you obtain them only by publishing clusters. You can use the cluster verification options only after you publish clusters.
When Are Cluster Verification Options Useful?
Use cluster verification options to collect cluster feedback. They allow you to:
- Assess how well Tamr is performing at clustering records.
- Indicate that you have reviewed the results of the clustering job.
- Manually edit clusters to account for exceptions that you don't want the ML algorithm to learn. For example, cluster verification options help in cases where two records that look different must correspond to the same company because of an acquisition, and should be manually merged into the same cluster.
Cluster Verification Options
You can use the following granular verification states when verifying records in the cluster. In particular, you can:

Cluster verification options: lock, lightbulb (enable suggestions), checkmark (move)
- Verify and enable suggestions. Choosing this option allows you to verify the current cluster associated with one or more records and enable Tamr to make further clustering suggestions for these records. However, new suggestions will not be applied automatically. This is useful if you know that more records will arrive, or when you are working on creating your clusters and this process needs to go through a few additional iterations.
- Verify and auto-accept suggestions. Choosing this option allows Tamr to verify the current cluster associated with one or more records and also allows for the possibility of moving the records to another cluster based on future clustering suggestions that Tamr recommends. This option is useful, for example, if you decide that you are now further along in the process of creating clusters of records and that Tamr has been trained sufficiently so that you can trust suggestions that Tamr makes in the future. This option is similar in its action to upvoting, as it indicates that you agree with Tamr, but Tamr can still move the records around if Tamr disagrees with you in the next round of its cluster suggestions. You can only use this option to confirm that records are in the correct cluster and not to merge or split clusters. Use it when you want to audit a cluster and indicate to your team that this record assignment is correct at a point in time, and to continue collecting more feedback on this cluster assignment.
- Remove verification. If you choose this option, Tamr removes previously made cluster verifications and the record goes back to the pool of records that must be clustered. Choose this option if you do not agree with the current verification.
- Verify and disable suggestions. This option is equivalent to the Lock option available for cluster verification in the releases before Tamr version 2019.026. Use it to override Tamr suggestions in edge cases, prevent generation of pairs for very large clusters, or when you are sure that the clustering is correct and you don’t want to know if Tamr thinks otherwise.
This option confirms the current cluster associated with one or more records and prevents Tamr from making further suggestions for these records. Choose this option if you are satisfied with the current verification and do not anticipate that records or datasets will change over time.
Note that the Verify and disable suggestions option is available for backward compatibility. We do not recommend using it because of the following limitations:
- Tamr doesn’t make suggestions for records that use this configuration (also known as "locked cluster records"). As a result, you cannot compare suggestions made by Tamr with human feedback.
- Records that use this configuration require human intervention if the data or the record relationships change. For example, if a locked cluster contains a record for a company that later merges or splits, you must manually correct the cluster. Similarly, if a record in a locked cluster is updated with a new field (attribute value), for example the record named "engine" gains a new field "horse power", you must manually correct this locked record to reflect the new information.
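The differences between the verification options can be summarized in a small model. The following Python sketch is only an illustration of the behavior described above: the enum, its member names, and the `apply_suggestion` function are hypothetical and do not correspond to any Tamr API.

```python
from enum import Enum


class Verification(Enum):
    """Hypothetical names for the verification states described above."""
    UNVERIFIED = "unverified"            # state after Remove verification
    ENABLE_SUGGESTIONS = "enable"        # Verify and enable suggestions
    AUTO_ACCEPT = "auto_accept"          # Verify and auto-accept suggestions
    DISABLE_SUGGESTIONS = "disable"      # Verify and disable suggestions ("locked")


def apply_suggestion(current_cluster, state, suggested_cluster):
    """Model what happens to one record after a new clustering run.

    Returns (cluster_after_run, suggestion_shown_to_curator).
    """
    if state is Verification.DISABLE_SUGGESTIONS:
        # Locked records stay put and no suggestion is produced for them.
        return current_cluster, None
    if state is Verification.ENABLE_SUGGESTIONS:
        # The record stays put; the suggestion is shown but not applied.
        return current_cluster, suggested_cluster
    # AUTO_ACCEPT and UNVERIFIED: the record follows the new suggestion.
    return suggested_cluster, None


# A record verified into cluster "c1" while the next run suggests "c2".
print(apply_suggestion("c1", Verification.ENABLE_SUGGESTIONS, "c2"))
print(apply_suggestion("c1", Verification.AUTO_ACCEPT, "c2"))
print(apply_suggestion("c1", Verification.DISABLE_SUGGESTIONS, "c2"))
```

In this model, only Verify and enable suggestions keeps both the curator's decision and the suggestion visible for comparison, which is the comparison that locked records rule out.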
Verifying Records
To verify records in the cluster:
- Navigate to the Clusters page of a Mastering project.
- Select one or more records.
- Choose Verify and enable suggestions, or use the drop-down menu to choose Verify and auto-accept suggestions, Verify and disable suggestions, or Remove verification.

Cluster verification options
When you choose any of these verification options, the Clusters page behaves in the following ways:
- Records in the table reflect verification states and allow you to take action. See the following screenshot that illustrates this point.
- As a curator, you can use verification filters that match existing cluster verification states.
- The cluster table shows verification aggregations, such as the number of records in the cluster that may need to be moved to another cluster, or other actions stemming from the new verification states.

Accepting suggestions or keeping the current cluster assignments
Note that Verify and enable suggestions has two different possible outcomes.
- The bulb is yellow when you and Tamr ML disagree about the record's cluster placement: the Tamr ML suggestion differs from the verified cluster. You may need to verify the suggestion again and decide whether you need to move the record to another cluster.

You and Tamr ML disagree
- The bulb is grey when you and Tamr ML agree about the record's cluster placement: the Tamr ML suggestion is the same as the verified cluster.

You and Tamr ML agree
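The two outcomes reduce to a single comparison between the verified cluster and the suggested cluster. The sketch below illustrates that rule only; the function name and the string cluster IDs are assumptions for illustration, not part of Tamr.

```python
def bulb_color(verified_cluster: str, suggested_cluster: str) -> str:
    """Return the bulb color for a record verified with suggestions enabled.

    Grey: the suggestion matches the verified cluster (agreement).
    Yellow: the suggestion points to a different cluster (disagreement).
    """
    return "grey" if suggested_cluster == verified_cluster else "yellow"


# Hypothetical records: (record name, verified cluster, suggested cluster).
records = [
    ("r1", "cluster-a", "cluster-a"),
    ("r2", "cluster-a", "cluster-b"),
]

# Records that a curator may want to review again.
to_review = [name for name, verified, suggested in records
             if bulb_color(verified, suggested) == "yellow"]
print(to_review)
```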
Using Filters for Cluster Records
As a curator, you can use more expressive verification filters that match the cluster verification states, and filter clusters down to records in one of these states. You can also view verification states in the record sidebar. For example, you can use filters to see the number of records on whose cluster assignment Tamr and curators agree. The following screenshot shows filter options.

Filter records to show records that match a particular verification state
Filtering Based on Tamr Suggestions
In addition to filtering cluster records based on their verification status, you can also filter records based on Tamr suggestions and then take actions. For example, you can review all records that Tamr suggested to move to another cluster and accept Tamr suggestions (move records). The following screenshot shows Tamr suggestion filters.

Cluster verification suggestions
Pinning Cluster Records
Pin records to the top of the records panel to compare them or move them to another cluster. You can pin up to ninety-nine records at once.
To pin records to the top of the records panel:
- Navigate to the Clusters page of a Mastering project.
- Select one or more records.
- Choose Pin.
- The records are highlighted in green at the top of the page.
Moving Records to a New Cluster
Moving records to a new cluster locks all records in the new cluster.
To move records from a cluster to a new cluster:
- Navigate to the Clusters page of a Mastering project.
- On the right-hand side, select one or more records from the list and open record details.
- On the record details side panel, select Accept suggestion to add a record to a specific cluster that Tamr suggests. Tamr offers the ID of this new suggested cluster. The tooltip states: "Tamr recommends moving this record to cluster <clusterID> which does not exist yet."
- In the Move record dialog, you can further decide if you want to move the record to the new cluster and verify and enable suggestions, or verify and disable further suggestions.

Moving a record to another cluster
Moving Records to an Existing Cluster
Moving records to an existing cluster locks only the records being moved.
Moving Records with Drag and Drop
To move records from a cluster to an existing cluster using drag and drop:
- Navigate to the Clusters page of a Mastering project.
- On the right-hand side, select one or more records via Ctrl + Click or Shift + Click.
- Drag and drop the record(s) onto the cluster to which they will be moved.
Note that in the Two-Paned Cluster Browser, records can be dragged across panes.
Moving Records by Cluster Addition
You can move records by using a plus sign and adding them to a cluster.
To move records by adding them to a cluster:
- Navigate to the Clusters page of a Mastering project.
- On the right-hand side, select one or more records via Ctrl + Click or Shift + Click.
- On the left-hand side, hover over the existing cluster.
- Select the + (plus) icon that appears.
Note that in the Two-Paned Cluster Browser, the records and the cluster can be in different panes.
Merging Clusters
Merging clusters is directional. When you merge clusters A and B you choose whether to move A into B, or B into A. These are different actions because when you merge A into B, all records will now be associated with the persistent ID for cluster B. Therefore, when merging clusters, you must decide which cluster's persistent ID you want your records to have after you merge them. This matters because cluster suggestions that Tamr provides are associated with a particular cluster persistent ID.
Note: There is a difference between moving records and merging clusters:
- If you merge cluster A into cluster B, Tamr verifies all records in the merged cluster B, including the records that were already there before merging.
- If you move all records of cluster A into cluster B, Tamr verifies all records from cluster A in their new cluster B, but it does not verify records that were already in the cluster B.
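The difference between merging and moving can be made concrete with a toy model. The sketch below is an assumption-laden illustration of the two rules above: the `Cluster` class and the `merge` and `move_all` functions are hypothetical names and not Tamr functionality.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    persistent_id: str
    # Maps a record name to whether the record is verified.
    records: dict = field(default_factory=dict)


def merge(source: Cluster, target: Cluster) -> None:
    """Merge source into target: the target's persistent ID is kept and
    every record in the merged cluster becomes verified."""
    target.records.update(source.records)
    source.records.clear()
    target.records = {name: True for name in target.records}


def move_all(source: Cluster, target: Cluster) -> None:
    """Move all records of source into target: only the moved records
    become verified; records already in target are left unchanged."""
    for name in list(source.records):
        target.records[name] = True
    source.records.clear()


# Merge A into B: both r1 and r2 end up verified under persistent ID "B".
a, b = Cluster("A", {"r1": False}), Cluster("B", {"r2": False})
merge(a, b)
print(b.persistent_id, b.records)

# Move all of A into B: r1 is verified, r2 keeps its previous state.
a2, b2 = Cluster("A", {"r1": False}), Cluster("B", {"r2": False})
move_all(a2, b2)
print(b2.persistent_id, b2.records)
```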
Merging Using Drag and Drop
- Navigate to the Clusters page of a Mastering project.
- On the left-hand side, select one or more clusters via Ctrl + Click or Shift + Click.
- Use the small vertical bars to the left of the cluster name to drag the clusters and drop them onto another cluster to be merged.
- Choose which cluster to merge into using the dialog. Any of the clusters that are currently being dragged, as well as the receiving cluster, are listed.
- Select Merge. When you merge clusters, you can further decide if you want Tamr to verify and enable suggestions, or verify and disable further suggestions for cluster assignments.

Merging clusters
Note that in the Two-Paned Cluster Browser, you can drag clusters across panes.
Merging Clusters in the Left Panel
When merging two or more clusters via Actions > Merge, Tamr automatically applies survivorship and keeps the cluster ID of the cluster with the largest number of records. See Automatic and Manual Survivorship.
Note: Left-hand-side panel Merge merges all records of the selected clusters, regardless of any record filters that are applied.
To merge clusters in the left-hand-side panel:
- Navigate to the Clusters page of a Mastering project.
- On the left-hand side, select two or more clusters via Ctrl + Click or Shift + Click.
- Select Actions > Merge. When you merge clusters, you can further decide if you want Tamr to verify and enable suggestions, or verify and disable further suggestions for cluster assignments.
- A confirmation dialog appears. Choose Confirm.
Note that in the Two-Paned Cluster Browser you can only merge clusters in the same pane in the left-hand-side panel.
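The automatic survivorship rule for Actions > Merge, in which the cluster with the largest number of records keeps its ID, can be sketched as follows. This is an illustration only: the function name and input shape are hypothetical, and the source does not state how ties are resolved, so the tie behavior here (first cluster encountered) is an assumption.

```python
def surviving_cluster_id(cluster_sizes: dict[str, int]) -> str:
    """Return the ID of the cluster with the most records.

    cluster_sizes maps a cluster ID to its record count. On a tie, this
    sketch returns the first such ID in the mapping (an assumption).
    """
    return max(cluster_sizes, key=cluster_sizes.get)


# Three selected clusters and their record counts (made-up values).
selected = {"A": 3, "B": 10, "C": 7}
print(surviving_cluster_id(selected))
```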
Reviewing and Merging Similar Clusters
When viewing a selected cluster, its top three most similar clusters are displayed beneath its name.
- To view a similar cluster in the current web browser tab, select the name of the similar cluster.
- To open a similar cluster in a new web browser tab, select the pop-out link icon beside the name.
- To merge two similar clusters, choose the Merge icon beside the name, then choose which cluster to merge into using the dialog, and then choose Merge. When you merge clusters, you can further decide if you want Tamr to verify and enable suggestions, or verify and disable further suggestions for cluster assignments.
Navigation Using Two-Paned Cluster Browser
You can compare clusters by viewing two cluster browsers at the same time. You can also drag and drop records and clusters across panes for easy editing. Note that sorting and column order are always synced across both panes.
The two-paned browser is available in the Similar clusters text, the cluster browser, or the records browser. It appears upon hover for any record or cluster. Its icon is two horizontal bars, as shown in the following example:

Similar clusters icon
Reviewing Cluster Information
To view details about a cluster:
- Navigate to the Clusters page of a Mastering project and select a cluster from the left-hand panel.
- Select Open details to view the cluster change metrics.
The cluster card displays the cluster name, the number of records in it, and the number of records that have been verified with suggestions disabled (this option is also known as "locked" and in the user interface appears as Verify and Disable Suggestions).
If you have previously published clusters, then in addition to the name, number of records, and number of locked records, the cluster card also displays the following information:
- A graph of cluster size over time.
- The number of records that have been added to and removed from the cluster since it was last published, and the date when it was last published.
- The cluster ID. This ID is permanent for the cluster and is guaranteed to never change.
After you generate, curate, and review clusters, you can publish them. Publishing clusters saves the current clusters as the latest version visible to downstream consumers, and creates a snapshot of the current state of clusters in Tamr. This snapshot is used to track changes over time.
For more information, see Publishing Clusters.