Filtering Clusters
To make your work with clusters more efficient, you can apply filters to clusters, records, or both.
Based on record grouping criteria and feedback for pairs, Tamr Core identifies records as duplicates and organizes them into clusters.
Tip: When Tamr Core initially generates clusters, all records that are in the same record group are assigned to the same cluster. The groups themselves do not appear on the Clusters page.
When you review clusters, you typically focus on a sample of representative clusters. Filters help you select clusters and records so that you can evaluate whether all records in a cluster are homogeneous and represent the same real-world entity, and whether any records are missing from the cluster.
Filters for Clusters and Records
The Clusters page presents both a list of clusters on the left side of the page, and a list of records on the center-right side of the page. Each list has a dedicated filter to help you locate clusters and records for analysis, assignment, or review.
Applying a filter to the list of clusters on the left does not change the list of records shown. Similarly, applying a filter to the list of records does not change the list of clusters. However, when you select one or more clusters, the list of records is reduced to show only the records in those clusters. Applying a filter to that set of records can reduce the list further. For example, you can filter to high-impact clusters only, select a cluster to examine further, and then filter to records from a specific source dataset within that high-impact cluster.
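The interaction described above can be pictured with a small conceptual model. This is only an illustrative sketch, not Tamr Core's API: the `Record` class, the `visible_records` function, and the sample dataset names are all hypothetical. It shows that selecting clusters narrows the records list first, and that a record filter then narrows that set further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields, chosen only for illustration.
    record_id: str
    cluster_id: str
    source: str


def visible_records(records, selected_clusters=None, record_filter=None):
    """Return the records shown after cluster selection and a record filter."""
    shown = list(records)
    # Selecting one or more clusters reduces the list to their records.
    if selected_clusters:
        shown = [r for r in shown if r.cluster_id in selected_clusters]
    # A record filter then reduces that set further.
    if record_filter is not None:
        shown = [r for r in shown if record_filter(r)]
    return shown


records = [
    Record("r1", "c1", "crm"),
    Record("r2", "c1", "erp"),
    Record("r3", "c2", "crm"),
]

all_shown = visible_records(records)
in_c1 = visible_records(records, selected_clusters={"c1"})
in_c1_from_crm = visible_records(
    records,
    selected_clusters={"c1"},
    record_filter=lambda r: r.source == "crm",
)
```

In this model, a filter on the clusters list would only change which clusters are offered for selection. It never touches `records` directly, which mirrors the independence of the two filters.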
Different options for filtering are available for clusters and records.
Options for Filtering Clusters
You can select one or more of the following options to filter clusters.
Filter | Description and Options |
---|---|
My open assignments | Filter to your open or resolved assignments. Options:
|
High-impact | Filter to high-impact clusters. High-impact clusters are those clusters from which the Tamr model learns the most. In your initial cluster review, use the high-impact filter and curate all of the listed clusters. This helps ensure meaningful precision and recall metrics for clusters. |
Verification | Filter by verification status of clusters and records. Options:
|
Average confidence | Filter by Tamr Core's average confidence in its cluster suggestions. Options:
|
Similarity | Filter by cluster similarity, which is the measure of how similar the records within the cluster are to one another. Specify a percentage from 0 to 100. |
Cluster changes from last publish | Filter by clusters that have or have not changed since they were last published. Options:
|
Source | Filter by selected source datasets. Clusters meet the filter if they include one or more records from the selected datasets. |
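The matching rule for the Source filter can be expressed as a short sketch. The function name, the dictionary layout, and the dataset names below are assumptions made for illustration and do not reflect how Tamr Core stores or queries clusters. The point is the "one or more records" rule: a cluster matches as soon as any of its records comes from a selected dataset.

```python
def clusters_matching_sources(cluster_sources, selected_sources):
    """Return ids of clusters with at least one record from a selected dataset.

    cluster_sources: dict mapping a cluster id to the set of source dataset
    names that its records come from (a hypothetical layout).
    """
    selected = set(selected_sources)
    # A non-empty intersection means at least one record matches.
    return [cid for cid, sources in cluster_sources.items() if sources & selected]


cluster_sources = {
    "c1": {"crm", "erp"},
    "c2": {"crm"},
    "c3": {"web"},
}

matches = clusters_matching_sources(cluster_sources, ["erp", "web"])
```

Here `c1` matches even though it also contains records from `crm`, which is not selected. The filter does not require every record in the cluster to come from the selected datasets.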
Options for Filtering Records
You can select one or more of the following options to filter records.
Filter | Description and Options |
---|---|
Verification | Filter by verification status of records. Options:
|
Suggestions | Filter by whether suggestions are enabled, disabled, or auto-suggested. Options:
|
Comments | Filter by whether reviewers have entered comments for the record. |
Test records | Filter by test records with and without problems to help identify issues with your data. Tamr Core uses test records to compute cluster precision and recall metrics. See test datasets. Options:
|
Record changes from last publish | Filter by records that have changed since they were last published. Options:
|
Sources | Filter by selected source datasets. Records meet the filter if they are included in one or more of the selected datasets. |
Removing Filters
To remove your filtering choices, select remove next to filter.