Datasets Generated by Tamr Core

A reference to the system-generated datasets that are categorized as Results and Internals in the dataset catalog.

This page lists datasets that you can locate using the Results and Internals filter on the Dataset Catalog page.

Note: In the sections that follow, Dataset Catalog Export refers to whether the Export option is available for datasets of that type when you use the Tamr Core user interface. You can export all datasets by using the API.

Datasets in a Mastering Project

_dedup_features

Description: Features for all the rows and values in the source dataset.

Dataset Catalog Export: No

Delete Support: Yes, can be recreated by generating pairs.

_dedup_idf

Description: The inverse document frequency for all the fields in the source dataset.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by generating pairs.
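
For readers unfamiliar with the term, the following minimal sketch shows the textbook definition of inverse document frequency, log(N / df), where N is the number of records and df is the number of records that contain a token. It is an illustration only: the function name, the pre-tokenized input, and the exact formula are assumptions, and Tamr Core may use a different tokenization or IDF variant when it builds this dataset.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(documents: List[List[str]]) -> Dict[str, float]:
    """Textbook IDF: log(N / df) for each token (hypothetical helper).

    `documents` is a list of already-tokenized field values, one per record.
    This is not Tamr Core's implementation; it only illustrates the concept.
    """
    total = len(documents)
    document_frequency: Counter = Counter()
    for tokens in documents:
        # Count each token at most once per record.
        document_frequency.update(set(tokens))
    return {
        token: math.log(total / count)
        for token, count in document_frequency.items()
    }


# Example: "acme" appears in every record, so it carries no weight.
idf = inverse_document_frequency([["acme", "inc"], ["acme", "corp"]])
```

Tokens that appear in every record get a weight of zero, while rare tokens get higher weights, which is why IDF is useful for comparing field values.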

_dedup_non_null_count

Description: The non-null count for each feature.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by generating pairs.

_dedup_dnf_binning

Description: Blocking done across all features.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by generating pairs.

_dedup_clusters_with_data

Description: Unified dataset plus cluster IDs, cluster names, and whether the record is locked.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating results.

_dedup_clusters_with_stats

Description: Record clustering with statistics.

Dataset Catalog Export: No

Delete Support: Yes, can be recreated by updating results.

_dedup_cluster_stats

Description: Statistics about the clusters.

Dataset Catalog Export: Yes

Delete Support:

_dedup_cluster_average_linkage

Description: The average pairwise match probabilities in clusters.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by running a predict-clustering job.
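
As a rough illustration of what an average pairwise match probability means, the sketch below averages the match probability over every unordered pair of records in one cluster. The function name, the input shapes, and the assumption that a probability is available for every pair are hypothetical; the actual computation in Tamr Core may differ, for example in how it handles pairs that were never generated.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, List, Optional


def average_linkage(
    members: List[str],
    pair_probability: Dict[FrozenSet[str], float],
) -> Optional[float]:
    """Mean match probability over all unordered record pairs in a cluster.

    Hypothetical helper: assumes `pair_probability` has an entry for every
    pair of records in `members`. Returns None for clusters with fewer than
    two records, because they contain no pairs.
    """
    pairs = [frozenset(pair) for pair in combinations(members, 2)]
    if not pairs:
        return None
    return mean(pair_probability[pair] for pair in pairs)


# Example: a three-record cluster has three pairs.
probabilities = {
    frozenset({"r1", "r2"}): 0.9,
    frozenset({"r1", "r3"}): 0.8,
    frozenset({"r2", "r3"}): 0.7,
}
linkage = average_linkage(["r1", "r2", "r3"], probabilities)
```

A value close to 1 suggests that the records in the cluster match each other strongly; a low value suggests a loosely connected cluster.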

_dedup_published_clusters

Description: Published clusters, keyed by unified dataset record ID.

Dataset Catalog Export: Yes

Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.

_dedup_published_clusters_stats

Description: Record clustering with statistics, keyed by persistent cluster ID.

Dataset Catalog Export: No

Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.

_dedup_published_clusters_with_data

Description: Unified dataset plus cluster IDs, cluster names, and whether the record is locked, keyed by unified dataset record ID.

Dataset Catalog Export: Yes

Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.

<unified_dataset_name>_dedup_published_cluster_counts

Description: Record and cluster counts as of the latest cluster publication.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by publishing clusters.

_dedup_all_persistent_ids

Description: Contains all persistent cluster IDs ever created for the instance, keyed by the unified persistent cluster ID.

Dataset Catalog Export: Yes

Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.

_dedup_clusters_union

Description: Current clusters and published clusters associated with records.

Dataset Catalog Export: Yes

Delete Support:

_dedup_cluster_stats_union

Description: Statistics of current clusters and published clusters.

Dataset Catalog Export: No

Delete Support:

_dedup_clusters_with_stats_union

Description: Records in the current and published clusters joined with statistics of the associated current and published clusters.

Dataset Catalog Export: No

Delete Support:

_dedup_imported_cluster_members

Description: Imported clusters.

Dataset Catalog Export:

Delete Support:

_dedup_high_impact_questions

Description: All record pairs that are marked as high-impact questions.

Dataset Catalog Export: Yes

Delete Support:

_dedup_grouped_dnf_binning

Description: Blocking data grouped by clause and bin IDs.

Dataset Catalog Export: Yes

Delete Support:

_dedup_labels

Description: Human-labeled pairs.

Dataset Catalog Export: Yes

Delete Support:

_dedup_feedback

Description: Pairs label feedback.

Dataset Catalog Export: No

Delete Support:

_dedup_pair_comments

Description: Pairs with comments.

Dataset Catalog Export: Yes

Delete Support:

_dedup_signals

Description: All signals generated while comparing records (that is, pairs and similarities).

Dataset Catalog Export: Yes

Delete Support:

_dedup_human_signals

Description: All signals generated using human labels, comments, and feedback.

Dataset Catalog Export: Yes

Delete Support:

_dedup_dnf_signals

Description: All signals generated using the deduplication model.

Dataset Catalog Export: Yes

Delete Support:

_dedup_signals_predictions

Description: All signals along with the predictions and confidence scores.

Dataset Catalog Export:

Delete Support:

_dedup_model

Description: The deduplication model decision tree.

Dataset Catalog Export: No

Delete Support: Yes, can be recreated by training on pair labels.

_important_pairs

Description: Pairs with labels, feedback, or comments.

Dataset Catalog Export: Yes

Delete Support:

recordPairLabel

Description: Raw pair labels from persistence.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by generating pairs.

record_pair_feedback

Description: Raw pair feedback from persistence.

Dataset Catalog Export: No

Delete Support: Yes, can be recreated by generating pairs.

record_pair_id_string

Description: Raw record pair ID strings from persistence.

Dataset Catalog Export: Yes

Delete Support:

internal_links

Description: Raw links from persistence.

Dataset Catalog Export: No

Delete Support:

cluster_member

Description: Raw verified cluster members from persistence.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by generating pairs.

cluster_feedback

Description: Raw cluster feedback from persistence.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating the model.

Datasets in a Categorization Project

_categories

Description: The taxonomy for the project.

Dataset Catalog Export: Yes

Delete Support:

_classifications

Description: All the categorizations, manual and suggested, for the project.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating categorizations.

_classifications_with_data

Description: All the categorizations, manual and suggested, with input record fields. The attributes in this dataset include:

  • trainingFunctionCategoryPath (limited release): The category predicted by the machine learning model for a record that has a manually assigned category label.
  • manualClassificationPath: The category label manually assigned by an expert. Can be null.
  • finalCategoryPath: The categorization accepted for the record. The final categorization combines manual labels, categorization functions of type override, and model predictions, applied in that order of precedence.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating categorizations.
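
The way finalCategoryPath in _classifications_with_data combines its inputs can be sketched as a first-non-null lookup. This sketch assumes that "in that order" means order of precedence, that each input is either a category path string or null, and that the function and parameter names are hypothetical; it is not the product's implementation.

```python
from typing import Optional


def final_category_path(
    manual: Optional[str],
    override: Optional[str],
    predicted: Optional[str],
) -> Optional[str]:
    """Pick the accepted category using the documented order (hypothetical).

    Precedence: manual label, then an override-type categorization function,
    then the model prediction. Returns None if no source has a value.
    """
    for candidate in (manual, override, predicted):
        if candidate is not None:
            return candidate
    return None


# A manual label wins over both other sources.
accepted = final_category_path("Hardware > Fasteners", None, "Hardware > Tools")
```

Under these assumptions, a record without a manual label falls back to the override function's result, and only records with neither take the model's prediction.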

_classification_histogram_boundaries

Description: The histogram boundaries for numeric attributes.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating categorizations.

_classification_model

Description: The categorization model dataset.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating categorizations.

_classifications_average_confidences

Description: The average confidences of all records' categorizations.

Dataset Catalog Export: Yes

Delete Support: Yes, can be recreated by updating categorizations.

category

Description: Raw categories from HBase.

Dataset Catalog Export: No

Delete Support: Yes, can be recreated by updating categorizations.

categorization

Description: Raw categorizations from HBase.

Dataset Catalog Export:

Delete Support:

Datasets in a Schema Mapping Project

mapping_recommendations_

Description: The schema mapping recommendations dataset.

Dataset Catalog Export: Yes

Delete Support: No

mapping_recommendation_model_

Description: The schema mapping recommendation model dataset.

Dataset Catalog Export: Yes

Delete Support: No

Datasets in a Golden Records Project

_golden_records_overrides

Description: All manual overrides that have been applied to attributes in a golden records project, including the user and timestamp.

Dataset Catalog Export: Yes

Delete Support: No

_golden_records_draft

Description: The golden records dataset before golden records are published.

Dataset Catalog Export: Yes

Delete Support: No

_golden_records_rule_output

Description: The golden records dataset that results from applying the golden record rules.

Dataset Catalog Export: Yes

Delete Support: No

_golden_records

Description: The golden records dataset that results from applying both the golden record rules and manual overrides.

Dataset Catalog Export: Yes

Delete Support: No

_golden_records_pinned_cluster_input

Description: Record-level tamr_id values with their associated golden record values.

Dataset Catalog Export: Yes

Delete Support: No

_golden_records_cluster_profile

Description: The relationship between input cluster IDs and the IDs of the records contained in each cluster.

Dataset Catalog Export: Yes

Delete Support: No