
Datasets Generated by Tamr

This page lists datasets that you can locate under Results and Internals on the Dataset Catalog page in Tamr.

Note that the Export Support column reflects the export label shown in the Tamr user interface. Regardless of that label, you can export all datasets listed on this page through the API.
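
As a rough illustration of exporting through the API, the sketch below builds (but does not send) an HTTP request for a dataset's records. The endpoint path `/v1/datasets/{id}/records`, the `BasicCreds` authorization scheme, and the helper name `build_records_request` are assumptions made for illustration and are not taken from this page; confirm them against the API Reference for your Tamr version before relying on them.

```python
import base64
import urllib.request


def build_records_request(base_url, dataset_id, username, password):
    """Build a GET request for the records of one dataset.

    Assumptions (not confirmed by this page): records are served at
    <base_url>/v1/datasets/<dataset_id>/records and credentials are sent
    as "BasicCreds <base64(username:password)>".
    """
    url = f"{base_url.rstrip('/')}/v1/datasets/{dataset_id}/records"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"BasicCreds {token}",
        "Accept": "application/json",
    }
    # The request is only constructed here; the caller decides when to send it,
    # for example with urllib.request.urlopen(request).
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical host, dataset ID, and credentials, for illustration only.
request = build_records_request(
    "http://tamr.example.com:9100/api/versioned", 42, "my_user", "my_password"
)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any call reaches the server.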


Datasets in a Mastering Project

| Dataset Name | Description | Export Support | Delete Support |
| --- | --- | --- | --- |
| _dedup_features | Features for all the rows and values in the source dataset. | No | Yes, can be recreated by generating pairs. |
| _dedup_idf | The inverse document frequency for all the fields in the source dataset. | Yes | Yes, can be recreated by generating pairs. |
| _dedup_non_null_count | The non-null count for each feature. | Yes | Yes, can be recreated by generating pairs. |
| _dedup_dnf_binning | Blocking done across all features. | Yes | Yes, can be recreated by generating pairs. |
| _dedup_clusters_with_data | Unified dataset plus cluster IDs, cluster names, and whether the record is locked. | Yes | Yes, can be recreated by updating results. |
| _dedup_clusters_with_stats | Record clustering with statistics. | No | Yes, can be recreated by updating results. |
| _dedup_cluster_stats | Statistics about the clusters. | Yes | |
| _dedup_published_clusters | Published clusters, keyed by unified dataset record ID. | Yes | Not recommended. Can be deleted and recreated but stored data will be lost. |
| _dedup_published_clusters_stats | Record clustering with statistics, keyed by persistent cluster ID. | No | Not recommended. Can be deleted and recreated but stored data will be lost. |
| _dedup_published_clusters_with_data | Unified dataset plus cluster IDs, cluster names, and whether the record is locked, keyed by unified dataset record ID. | Yes | Not recommended. Can be deleted and recreated but stored data will be lost. |
| <unified_dataset_name>_dedup_published_cluster_counts | Record and cluster counts as of the latest cluster publication. | Yes | Yes, can be recreated by publishing clusters. |
| _dedup_all_persistent_ids | All persistent cluster IDs ever created for the Tamr deployment, keyed by unified persistent cluster ID. | Yes | Not recommended. Can be deleted and recreated but stored data will be lost. |
| _dedup_clusters_union | Current clusters and published clusters associated with records. | Yes | |
| _dedup_cluster_stats_union | Statistics of current clusters and published clusters. | No | |
| _dedup_clusters_with_stats_union | Records in the current and published clusters joined with statistics of the associated current and published clusters. | No | |
| _dedup_imported_cluster_members | Imported clusters. | | |
| _dedup_high_impact_questions | All record pairs that are marked as high-impact questions. | Yes | |
| _dedup_grouped_dnf_binning | Blocking data grouped by clause and bin IDs. | Yes | |
| _dedup_labels | Human-labeled pairs. | Yes | |
| _dedup_feedback | Pair label feedback. | No | |
| _dedup_pair_comments | Pairs with comments. | Yes | |
| _dedup_signals | All signals generated while comparing records (i.e., pairs and similarities). | Yes | |
| _dedup_human_signals | All signals generated using human labels, comments, and feedback. | Yes | |
| _dedup_dnf_signals | All signals generated using the deduplication model. | Yes | |
| _dedup_signals_predictions | All signals along with the predictions and confidence scores. | | |
| _dedup_model | The deduplication model decision tree. | No | Yes, can be recreated by training on pair labels. |
| _important_pairs | Pairs with labels, feedback, or comments. | Yes | |
| recordPairLabel | Raw pair labels from persistence. | Yes | Yes, can be recreated by generating pairs. |
| record_pair_feedback | Raw pair feedback from persistence. | No | Yes, can be recreated by generating pairs. |
| record_pair_id_string | Raw record pair ID strings from persistence. | Yes | |
| internal_links | Raw links from persistence. | No | |
| cluster_member | Raw verified cluster members from persistence. | Yes | Yes, can be recreated by generating pairs. |
| cluster_feedback | Raw cluster feedback from persistence. | Yes | Yes, can be recreated by updating model. |
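
Names in the table above that begin with an underscore appear to be suffixes appended to the unified dataset name, as the `<unified_dataset_name>_dedup_published_cluster_counts` entry suggests. The following is a minimal sketch, assuming that naming convention holds for every underscore-prefixed entry, of how a list of catalog names might be matched against a few of these suffixes. The helper name `find_generated_datasets`, the sample suffix tuple, and the `customers` catalog are hypothetical.

```python
# A small sample of the mastering suffixes listed in the table above.
MASTERING_SUFFIXES = (
    "_dedup_features",
    "_dedup_idf",
    "_dedup_clusters_with_data",
    "_dedup_published_clusters",
    "_dedup_published_cluster_counts",
)


def find_generated_datasets(catalog_names, unified_dataset_name,
                            suffixes=MASTERING_SUFFIXES):
    """Map each known suffix to the full dataset name present in the catalog.

    Assumes generated datasets are named <unified_dataset_name><suffix>.
    Suffixes with no matching catalog entry are omitted from the result.
    """
    available = set(catalog_names)
    found = {}
    for suffix in suffixes:
        candidate = unified_dataset_name + suffix
        if candidate in available:
            found[suffix] = candidate
    return found


# Hypothetical catalog listing for a unified dataset named "customers".
catalog = [
    "customers",
    "customers_dedup_features",
    "customers_dedup_idf",
    "recordPairLabel",
]
generated = find_generated_datasets(catalog, "customers")
```

Exact matching on the full name, rather than a substring or `endswith` check, avoids confusing one suffix with a longer one that contains it, such as `_dedup_clusters_with_stats` and `_dedup_clusters_with_stats_union`.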

Datasets in a Categorization Project

| Dataset Name | Description | Export Support | Delete Support |
| --- | --- | --- | --- |
| _classifications | All the classifications, manual and suggested, for the project. | Yes | Yes, can be recreated by updating categorizations. |
| _classifications_with_data | All the classifications, manual and suggested, with input record fields. | Yes | Yes, can be recreated by updating categorizations. |
| _classification_histogram_boundaries | The histogram boundaries for numeric attributes. | Yes | Yes, can be recreated by updating categorizations. |
| _classification_model | The categorization model dataset. | Yes | Yes, can be recreated by updating categorizations. |
| _classifications_average_confidences | The average confidences of all records' classifications. | Yes | Yes, can be recreated by updating categorizations. |
| category | Raw categories from persistence. | No | Yes, can be recreated by updating categorizations. |
| categorization | Raw categorizations from persistence. | | |

Datasets in a Schema Mapping Project

| Dataset Name | Description | Export Support | Delete Support |
| --- | --- | --- | --- |
| mappingrecommendations_recipe<sm_recommendations_recipe_number> | The schema mapping recommendations dataset. | Yes | No |
| mappingrecommendation_model_recipe<sm_recommendations_recipe_number> | The schema mapping recommendation model dataset. | Yes | No |