Filtering Clusters
To make your work with clusters more efficient, you can apply filters to clusters, records, or both.
Based on feedback for record pairs, Tamr Core identifies records as duplicates and groups them into clusters. When you review clusters, you typically focus on a sample of representative clusters. Filters help you select clusters and records so that you can evaluate whether all records in a cluster are homogeneous and represent the same real-world entity, and whether any records that belong in the cluster are missing from it.
Filters for Clusters and Records
The Clusters page presents a list of clusters on the left side of the page and a list of records on the center-right side of the page. Each list has a dedicated filter to help you locate clusters and records for analysis, assignment, or review.
Applying a filter to the list of clusters on the left does not change the list of records shown. Similarly, applying a filter to the list of records does not change the list of clusters. However, when you select one or more clusters, the list of records is reduced to show only the records in those clusters. Applying a filter to that set of records can reduce the list further. For example, you can filter to high-impact clusters only, select a cluster to examine further, and then filter to records from a specific source dataset within that high-impact cluster.
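The interaction described above can be sketched in a few lines of Python. This is only an illustrative model, not Tamr Core's API: the `Cluster` and `Record` classes, their fields, and both functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


# Hypothetical data model for illustration; not part of Tamr Core.
@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    high_impact: bool


@dataclass(frozen=True)
class Record:
    record_id: str
    cluster_id: str
    source: str


def visible_clusters(clusters, cluster_filter=None):
    """The cluster list depends only on the cluster filter."""
    return [c for c in clusters if cluster_filter is None or cluster_filter(c)]


def visible_records(records, selected_cluster_ids=(), record_filter=None):
    """The record list is narrowed by the cluster selection, then by the record filter."""
    selected = set(selected_cluster_ids)
    shown = [r for r in records if not selected or r.cluster_id in selected]
    return [r for r in shown if record_filter is None or record_filter(r)]


clusters = [Cluster("c1", True), Cluster("c2", False)]
records = [
    Record("r1", "c1", "crm"),
    Record("r2", "c1", "erp"),
    Record("r3", "c2", "crm"),
]

# Filtering clusters leaves the record list untouched.
high_impact = visible_clusters(clusters, lambda c: c.high_impact)
all_records = visible_records(records)

# Selecting a cluster narrows the records; a record filter narrows them further.
in_c1_from_crm = visible_records(records, ["c1"], lambda r: r.source == "crm")
```

In this sketch, the cluster filter never touches `records`; only an explicit selection (`selected_cluster_ids`) links the two lists, which mirrors the behavior described above.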
Different options for filtering are available for clusters and records.
Options for Filtering Clusters
You can select one or more of the following options to filter clusters.
Filter | Description and Options |
---|---|
My open assignments | Filter to your open or resolved assignments. Options: - My open assignments - My resolved assignments |
High-impact | Filter to high-impact clusters. High-impact clusters are those clusters from which the Tamr model learns the most. In your initial cluster review, use the high-impact filter and curate all of the listed clusters. This helps ensure meaningful precision and recall metrics for clusters. |
Verification | Filter by verification status of cluster and records. Options: - Has records verified in current cluster - With move suggested - Has records verified in another cluster - Has no verified records |
Average confidence | Filter by Tamr Core's average confidence in its cluster suggestions. Options: - High - Medium - Low - Custom range |
Similarity | Filter by cluster similarity, which is the measure of how similar the records within the cluster are to one another. Specify a percentage from 0 to 100. |
Cluster changes from last publish | Filter to clusters that have or have not changed since they were last published. Options: - Unchanged - With changes - Records added from new or updated sources - Records moved from other clusters - Records moved to other clusters - Records deleted from sources - New clusters - Empty clusters |
Source | Filter by selected source datasets. Clusters meet the filter if they include one or more records from the selected datasets. |
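As a rough illustration of the Source filter's matching rule for clusters, the sketch below keeps a cluster when at least one of its records comes from a selected dataset. The function name and the dictionary layout (cluster ID mapped to the source datasets of its records) are assumptions made for this example, not Tamr Core structures.

```python
def clusters_matching_sources(cluster_sources, selected_sources):
    """Return IDs of clusters with at least one record from a selected dataset.

    cluster_sources: hypothetical mapping of cluster ID -> source datasets
    of the records in that cluster.
    """
    selected = set(selected_sources)
    return [
        cluster_id
        for cluster_id, sources in cluster_sources.items()
        if selected & set(sources)  # non-empty overlap means the cluster matches
    ]


cluster_sources = {
    "c1": ["crm", "erp"],  # matches: includes a record from "crm"
    "c2": ["erp"],         # no match: no record from a selected dataset
    "c3": ["web"],         # matches: includes a record from "web"
}
matches = clusters_matching_sources(cluster_sources, ["crm", "web"])
```

A cluster does not need all of its records to come from the selected datasets; one matching record is enough, so `c1` is kept even though it also contains a record from `erp`.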
Options for Filtering Records
You can select one or more of the following options to filter records.
Filter | Description and Options |
---|---|
Verification | Filter by verification status of records. Options: - Verified - Verified in current cluster - Verified in another cluster - Not verified |
Suggestions | Filter by whether suggestions are enabled, disabled, or auto-accepted. Options: - Suggestions enabled - Move suggested - No move suggested - Suggestions disabled - Suggestions auto-accepted |
Comments | Filter by whether reviewers have entered comments for the record. |
Test records | Filter by test records with and without problems to help identify issues with your data. Tamr Core uses test records to compute cluster precision and recall metrics. See test datasets. Options: - Test records problems - Test records with only precision problems - Test records with only recall problems - Test records with both precision and recall problems - Test records with no problems |
Record changes from last publish | Filter by how records have changed since they were last published. Options: - New records added from updated sources - Records moved between clusters - Records stayed in current clusters - Records deleted from updated sources |
Sources | Filter by selected source datasets. Records meet the filter if they are included in one or more of the selected datasets. |
Removing Filters
To remove your filtering choices, select remove next to the filter.