This page lists the datasets that appear under Results and Internals on the Dataset Catalog page in Tamr.

The Export Support column indicates whether you can export the dataset from the Tamr user interface. You can export every dataset listed on this page through the API.
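
As a minimal sketch of the API export path, the Python below builds a request for a dataset's records using only the standard library. The endpoint path `/api/versioned/v1/datasets/{id}/records`, the `BasicCreds` authorization scheme, the default port 9100, and the one-JSON-object-per-line response format are assumptions about Tamr's versioned REST API. Confirm them against the API reference for your Tamr version. The function names `build_export_request` and `parse_records` are hypothetical.

```python
import base64
import json
import urllib.request


def build_export_request(host: str, dataset_id: str, username: str, password: str,
                         port: int = 9100) -> urllib.request.Request:
    """Build (but do not send) a GET request for all records of one dataset.

    ASSUMPTIONS: the /api/versioned/v1/datasets/{id}/records path, the
    BasicCreds auth scheme, and port 9100. Verify them for your deployment.
    """
    # Tamr's BasicCreds scheme is assumed to take base64("username:password").
    creds = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"http://{host}:{port}/api/versioned/v1/datasets/{dataset_id}/records"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"BasicCreds {creds}", "Accept": "application/json"},
        method="GET",
    )


def parse_records(lines):
    """Yield one dict per non-blank line, assuming newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank separator lines
            yield json.loads(line)
```

To run the export, pass the request to `urllib.request.urlopen` and iterate `parse_records` over the response. The `{id}` path segment is assumed to be the dataset's ID rather than its name.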


Datasets in a Mastering Project

Dataset Name	Description	Export Support	Delete Support
_dedup_features	Features for all the rows and values in the source dataset.	No	Yes, can be recreated by generating pairs
_dedup_idf	The inverse document frequency for all the fields in the source dataset.	Yes	Yes, can be recreated by generating pairs
_dedup_non_null_count	The non-null count for each feature.	Yes	Yes, can be recreated by generating pairs
_dedup_dnf_binning	Binning done across all features.	Yes	Yes, can be recreated by generating pairs
_dedup_clusters_with_data	Unified dataset plus cluster IDs, cluster names, and whether the record is locked.	Yes	Yes, can be recreated by updating results
_dedup_clusters_with_stats	Record clustering with statistics.	No	Yes, can be recreated by updating results
_dedup_cluster_stats	Statistics about the clusters.	Yes
_dedup_published_clusters	Published clusters, keyed by unified dataset record ID.	Yes	Not recommended. Can be deleted and recreated but stored data will be lost.
_dedup_published_clusters_stats	Record clustering with statistics, keyed by persistent cluster ID.	No	Not recommended. Can be deleted and recreated but stored data will be lost.
_dedup_published_clusters_with_data	Unified dataset plus cluster IDs, cluster names, and whether the record is locked, keyed by unified dataset record ID.	Yes	Not recommended. Can be deleted and recreated but stored data will be lost.
<unified_dataset_name>_dedup_published_cluster_counts	Record and cluster counts as of the latest cluster publication.	Yes	Yes, can be recreated by publishing clusters.
_dedup_all_persistent_ids	All persistent cluster IDs ever created for the Tamr deployment, keyed by unified persistent cluster ID.	Yes	Not recommended. Can be deleted and recreated but stored data will be lost.
_dedup_clusters_union	Current clusters and published clusters associated with records.	Yes
_dedup_cluster_stats_union	Statistics of current clusters and published clusters.	No
_dedup_clusters_with_stats_union	Records in the current and published clusters joined with statistics of the associated current and published clusters.	No
_dedup_imported_cluster_members	Imported clusters.
_dedup_high_impact_questions	All record pairs which are marked as high impact questions.	Yes
_dedup_grouped_dnf_binning	Binning data grouped by clause and bin IDs.	Yes
_dedup_labels	Human-labeled pairs.	Yes
_dedup_feedback	Pair label feedback.	No
_dedup_pair_comments	Pairs with comments.	Yes
_dedup_signals	All signals generated while comparing records (i.e., pairs and similarities).	Yes
_dedup_human_signals	All signals generated using human labels, comments, and feedback.	Yes
_dedup_dnf_signals	All signals generated using the deduplication model.	Yes
_dedup_signals_predictions	All signals along with the predictions and confidence scores.
_dedup_model	The deduplication model decision tree.	No	Yes, can be recreated by training on pair labels.
_important_pairs	Pairs with labels, feedback, or comments.	Yes
recordPairLabel	Raw pair labels from persistence.	Yes	Yes, can be recreated by generating pairs
record_pair_feedback	Raw pair feedback from persistence.	No	Yes, can be recreated by generating pairs.
record_pair_id_string	Raw record pair ID strings from persistence.	Yes
internal_links	Raw links from persistence.	No
cluster_member	Raw verified cluster members from persistence.	Yes	Yes, can be recreated by generating pairs.
cluster_feedback	Raw cluster feedback from persistence.	Yes	Yes, can be recreated by updating the model.
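
If you script cleanup of a mastering project, the Delete Support column can be encoded as a lookup. The sketch below assumes, based on the `<unified_dataset_name>_dedup_published_cluster_counts` entry, that each `_dedup_*` dataset name is the unified dataset name followed by the suffix listed in the table. The names `recreate_step`, `RECREATE_BY_SUFFIX`, and `RAW_TABLES` are hypothetical. Datasets marked "Not recommended" or without a Delete Support entry return `None`.

```python
from typing import Optional

# Operation that recreates each deletable mastering dataset, taken from the
# Delete Support column. "Not recommended" and blank entries are omitted.
RECREATE_BY_SUFFIX = {
    "_dedup_features": "generating pairs",
    "_dedup_idf": "generating pairs",
    "_dedup_non_null_count": "generating pairs",
    "_dedup_dnf_binning": "generating pairs",
    "_dedup_clusters_with_data": "updating results",
    "_dedup_clusters_with_stats": "updating results",
    "_dedup_published_cluster_counts": "publishing clusters",
    "_dedup_model": "training on pair labels",
}

# Raw persistence datasets are listed by their bare names.
RAW_TABLES = {
    "recordPairLabel": "generating pairs",
    "record_pair_feedback": "generating pairs",
    "cluster_member": "generating pairs",
    "cluster_feedback": "updating the model",
}


def recreate_step(dataset_name: str, unified_dataset_name: str) -> Optional[str]:
    """Return the operation that recreates dataset_name, or None if deletion
    is not recommended or not documented.

    ASSUMPTION: _dedup_* datasets are named <unified_dataset_name><suffix>.
    """
    if dataset_name in RAW_TABLES:
        return RAW_TABLES[dataset_name]
    if dataset_name.startswith(unified_dataset_name):
        suffix = dataset_name[len(unified_dataset_name):]
        return RECREATE_BY_SUFFIX.get(suffix)
    return None
```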

Datasets in a Categorization Project

Dataset Name	Description	Export Support	Delete Support
_classifications	All the classifications, manual and suggested, for the project.	Yes	Yes, can be recreated by updating categorizations
_classifications_with_data	All the classifications, manual and suggested, with input record fields.	Yes	Yes, can be recreated by updating categorizations
_classification_histogram_boundaries	The histogram boundaries for numeric attributes.	Yes	Yes, can be recreated by updating categorizations
_classification_model	The categorization model dataset.	Yes	Yes, can be recreated by updating categorizations
_classifications_average_confidences	The average confidences of all records' classifications.	Yes	Yes, can be recreated by updating categorizations
category	Raw categories from persistence.	No	Yes, can be recreated by updating categorizations
categorization	Raw categorizations from persistence.

Datasets in a Schema Mapping Project

Dataset Name	Description	Export Support	Delete Support
mappingrecommendations_recipe<sm_recommendations_recipe_number>	The schema mapping recommendations dataset.	Yes	No
mappingrecommendation_model_recipe<sm_recommendations_recipe_number>	The schema mapping recommendation model dataset.	Yes	No