Filtering Clusters

To make your work with clusters more efficient, you can apply filters to clusters, records, or both.

Based on record grouping criteria and feedback for pairs, Tamr Core identifies records as duplicates and organizes them into clusters.

Tip: When Tamr Core initially generates clusters, all records that are in the same record group are assigned to the same cluster. The groups themselves do not appear on the Clusters page.

When you review clusters, you typically focus on a sample of representative clusters. Filters help you select clusters and records so that you can evaluate whether all records in a cluster are homogeneous and represent the same real-world entity, and whether any records are missing from the cluster.

Filters for Clusters and Records

The Clusters page presents both a list of clusters on the left side of the page, and a list of records on the center-right side of the page. Each list has a dedicated filter to help you locate clusters and records for analysis, assignment, or review.

The filter icon on the left side of the Clusters page applies to clusters, and the icon in the center of the page applies to records.

Applying a filter to the list of clusters on the left does not change the list of records shown. Similarly, applying a filter to the list of records does not change the list of clusters. However, when you select one or more clusters, the list of records is reduced to show only the records in those clusters. Applying a filter to that set of records can reduce the list further. For example, you can filter to high-impact clusters only, select a cluster to examine further, and then filter to records from a specific source dataset within that high-impact cluster.
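The interaction between the two lists can be pictured with a small Python model. This is only an illustrative sketch, not Tamr Core code or its API: the Cluster and Record classes, their fields, and the two helper functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for the two lists on the Clusters page.
@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    high_impact: bool


@dataclass(frozen=True)
class Record:
    record_id: str
    cluster_id: str
    source: str


def visible_clusters(clusters, cluster_filter=None):
    """Cluster list: narrowed only by the cluster filter."""
    return [c for c in clusters if cluster_filter is None or cluster_filter(c)]


def visible_records(records, selected_cluster_ids=(), record_filter=None):
    """Record list: narrowed by the selected clusters, then by the record filter."""
    selected = set(selected_cluster_ids)
    shown = [r for r in records if not selected or r.cluster_id in selected]
    return [r for r in shown if record_filter is None or record_filter(r)]


clusters = [Cluster("c1", True), Cluster("c2", False)]
records = [
    Record("r1", "c1", "crm"),
    Record("r2", "c1", "erp"),
    Record("r3", "c2", "crm"),
]

# Filtering clusters leaves the record list untouched.
high_impact = visible_clusters(clusters, lambda c: c.high_impact)
all_records = visible_records(records)

# Selecting a cluster narrows the records; a record filter narrows them further.
in_c1 = visible_records(records, ["c1"])
in_c1_from_crm = visible_records(records, ["c1"], lambda r: r.source == "crm")
```

In this model, the cluster filter never touches the record list. Only an explicit cluster selection does, and the record filter is then applied to whatever remains.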

Different options for filtering are available for clusters and records.

Options for Filtering Clusters

You can select one or more of the following options to filter clusters.

Filter

Description and Options

My open assignments

Filter to your open or resolved assignments.

Options:

  • My open assignments
  • My resolved assignments

High-impact

Filter to high-impact clusters.

High-impact clusters are those clusters from which the Tamr model learns the most. In your initial cluster review, use the high-impact filter and curate all of the listed clusters. This helps ensure meaningful precision and recall metrics for clusters.

Verification

Filter by verification status of clusters and records.

Options:

  • Has records verified in current cluster
    • With move suggested
  • Has records verified in another cluster
  • Has no verified records

Average confidence

Filter by Tamr Core's average confidence in its cluster suggestions.

Options:

  • High
  • Medium
  • Low
  • Custom range

Similarity

Filter by cluster similarity, which is the measure of how similar the records within the cluster are to one another.

Specify a percentage from 0 to 100.

Cluster changes from last publish

Filter to clusters that have or have not changed since they were last published.

Options:

  • Unchanged
  • With changes
    • Records added from new or updated sources
    • Records moved from other clusters
    • Records moved to other clusters
    • Records deleted from sources
    • New clusters
    • Empty clusters

Source

Filter by selected source datasets.

Clusters meet the filter if they include one or more records from the selected datasets.

Options for Filtering Records

You can select one or more of the following options to filter records.

Filter

Description and Options

Verification

Filter by verification status of records.

Options:

  • Verified
    • Verified in current cluster
    • Verified in another cluster
  • Not verified

Suggestions

Filter by whether suggestions are enabled, disabled, or auto-accepted.

Options:

  • Suggestions enabled
    • Move suggested
    • No move suggested
  • Suggestions disabled
  • Suggestions auto-accepted

Comments

Filter by whether reviewers have entered comments for the record.

Test records

Filter by test records with and without problems to help identify issues with your data. Tamr Core uses test records to compute cluster precision and recall metrics. See test datasets.

Options:

  • Test records problems
    • Test records with only precision problems
    • Test records with only recall problems
    • Test records with both precision and recall problems
  • Test records with no problems

Record changes from last publish

Filter to records based on whether and how they have changed since they were last published.

Options:

  • New records added from updated sources
  • Records moved between clusters
  • Records stayed in current clusters
  • Records deleted from updated sources

Sources

Filter by selected source datasets.

Records meet the filter if they are included in one or more of the selected datasets.
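The source filter matches differently for clusters than for records, as the short Python sketch below shows. The function names and dataset names are hypothetical and chosen only for illustration; the sketch assumes each record belongs to exactly one source dataset and does not reflect how Tamr Core implements the filter.

```python
def cluster_matches_sources(cluster_record_sources, selected_sources):
    """A cluster matches if at least one of its records comes from a selected dataset."""
    selected = set(selected_sources)
    return any(source in selected for source in cluster_record_sources)


def record_matches_sources(record_source, selected_sources):
    """A record matches if its own source dataset is among the selected datasets."""
    return record_source in set(selected_sources)


selected = ["crm.csv", "erp.csv"]  # hypothetical dataset names

# A cluster whose records come from a mix of datasets still matches.
mixed_cluster = cluster_matches_sources(["crm.csv", "web.csv"], selected)
# A cluster with no records from the selected datasets does not.
other_cluster = cluster_matches_sources(["web.csv"], selected)

crm_record = record_matches_sources("crm.csv", selected)
web_record = record_matches_sources("web.csv", selected)
```

In both cases, selecting more datasets widens the match rather than narrowing it, because one matching dataset is enough.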

Removing Filters

To remove your filtering choices, select the remove (close) icon next to the filter icon.