Datasets Generated by Tamr Core
A reference to the system-generated datasets that are categorized as Results and Internals in the dataset catalog.
This page lists datasets that you can locate using the Results and Internals filter on the Dataset Catalog page.
Note: In the sections that follow, Dataset Catalog Export indicates whether the dataset can be exported in CSV format from the user interface (UI). "No" means that CSV export is not available from the UI, although other formats may be. All formats can be exported through the API.
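The dataset names in the sections that follow use <unifiedDatasetName> as a placeholder for the name of a project's unified dataset. As a minimal sketch, the substitution can be expressed in Python. The function name expand and the example name customers_unified are illustrative only and are not part of Tamr Core.

```python
# Illustrative only: expands the <unifiedDatasetName> placeholder used on this page.
# The function name and the example dataset name are hypothetical.
PLACEHOLDER = "<unifiedDatasetName>"


def expand(pattern, unified_dataset_name):
    """Replace the placeholder in a name pattern with a concrete unified dataset name."""
    return pattern.replace(PLACEHOLDER, unified_dataset_name)


# Most generated datasets append a suffix to the unified dataset name.
print(expand("<unifiedDatasetName>_dedup_idf", "customers_unified"))
# Schema mapping datasets put the unified dataset name last.
print(expand("mapping_recommendations_<unifiedDatasetName>", "customers_unified"))
```

Note that the position of the placeholder differs between project types, so a pattern-based substitution is less error-prone than always appending a suffix.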
Datasets in a Mastering Project
<unifiedDatasetName>_dedup_features
Description: Features for all the rows and values in the source dataset.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by generating pairs.
<unifiedDatasetName>_dedup_idf
Description: The inverse document frequency for all the fields in the source dataset.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
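For readers unfamiliar with the term, inverse document frequency gives rare values more weight than common ones. The sketch below uses the textbook definition, log(N / df). This page does not state the exact formula or tokenization that Tamr Core uses, so both are assumptions here, and the function name is hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(records):
    """Textbook IDF for one field: log(N / df) per token.

    records: one list of tokens per record. This is an assumed input shape,
    not the schema of the <unifiedDatasetName>_dedup_idf dataset.
    """
    total = len(records)
    document_frequency = Counter()
    for tokens in records:
        # Count each token once per record, however often it repeats.
        document_frequency.update(set(tokens))
    return {token: math.log(total / count) for token, count in document_frequency.items()}


idf = inverse_document_frequency(
    [["acme", "corp"], ["acme", "inc"], ["globex", "corp"], ["initech"]]
)
# "initech" appears in one record and "acme" in two, so "initech" scores higher.
print(idf["initech"] > idf["acme"])
```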
<unifiedDatasetName>_dedup_non_null_count
Description: The non-null count for each feature.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
<unifiedDatasetName>_dedup_grouped_entities
Description: The grouped version of the unified dataset (the output of record grouping).
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by generating pairs.
<unifiedDatasetName>_dedup_entity_group_mapping
Description: Mapping between the tamr_ids of the unified dataset and the groupUnifiedIds of the <unifiedDatasetName>_dedup_grouped_entities dataset.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by generating pairs.
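A mapping of this kind can be read in either direction: from a record to its group, or from a group to its member records. The following sketch assumes the exported rows carry columns named tamr_id and groupUnifiedId. Those column names are inferred from the description above, and the sample rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; column names are assumed from the description.
mapping_rows = [
    {"tamr_id": "r1", "groupUnifiedId": "g1"},
    {"tamr_id": "r2", "groupUnifiedId": "g1"},
    {"tamr_id": "r3", "groupUnifiedId": "g2"},
]


def group_of(rows):
    """Map each record ID to the ID of the group that contains it."""
    return {row["tamr_id"]: row["groupUnifiedId"] for row in rows}


def members_of(rows):
    """Map each group ID to the list of record IDs it contains."""
    members = defaultdict(list)
    for row in rows:
        members[row["groupUnifiedId"]].append(row["tamr_id"])
    return dict(members)


print(group_of(mapping_rows)["r2"])
print(members_of(mapping_rows))
```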
<unifiedDatasetName>_dedup_dnf_binning
Description: Blocking done across all features.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
<unifiedDatasetName>_dedup_clusters_with_data
Description: Unified dataset plus cluster IDs, cluster names, and whether the record is locked.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating results.
<unifiedDatasetName>_dedup_clusters_with_stats
Description: Record clustering with statistics.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by updating results.
<unifiedDatasetName>_dedup_cluster_stats
Description: Statistics about the clusters.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_cluster_average_linkage
Description: The average pairwise match probabilities in clusters.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by running a predict-clustering job.
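To illustrate what an average pairwise match probability means, the sketch below averages the probabilities of every pair of records within one cluster. It assumes that every pair in the cluster has a probability available and that pairs are keyed by sorted record ID. Neither assumption is confirmed by this page, and the function name is hypothetical.

```python
from itertools import combinations


def average_linkage(members, pair_probability):
    """Average the match probability over every pair of records in a cluster.

    pair_probability is keyed by (smaller_id, larger_id). Returns None for a
    cluster with fewer than two records, because it has no pairs.
    """
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return None
    return sum(pair_probability[pair] for pair in pairs) / len(pairs)


# Hypothetical probabilities for a three-record cluster.
probabilities = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7}
score = average_linkage(["a", "b", "c"], probabilities)
print(round(score, 3))
```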
<unifiedDatasetName>_dedup_published_clusters
Description: Published clusters, keyed by unified dataset record ID.
Dataset Catalog Export: Yes
Delete Support: Not recommended. Can be deleted and recreated; however, stored data is lost.
<unifiedDatasetName>_dedup_published_clusters_stats
Description: Record clustering with statistics, keyed by persistent cluster ID.
Dataset Catalog Export: No
Delete Support: Not recommended. Can be deleted and recreated; however, stored data is lost.
<unifiedDatasetName>_dedup_published_clusters_with_data
Description: Unified dataset plus cluster IDs, cluster names, and whether the record is locked, keyed by unified dataset record ID.
Dataset Catalog Export: Yes
Delete Support: Not recommended. Can be deleted and recreated; however, stored data is lost.
<unifiedDatasetName>_dedup_published_cluster_counts
Description: Record and cluster counts as of the latest cluster publication.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by publishing clusters.
<unifiedDatasetName>_dedup_all_persistent_ids
Description: Contains all persistent cluster IDs ever created for the instance, keyed by the unified persistent cluster ID.
Dataset Catalog Export: Yes
Delete Support: Not recommended. Can be deleted and recreated; however, stored data is lost.
<unifiedDatasetName>_dedup_clusters_union
Description: Current clusters and published clusters associated with records.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_cluster_stats_union
Description: Statistics of current clusters and published clusters.
Dataset Catalog Export: No
Delete Support:
<unifiedDatasetName>_dedup_clusters_with_stats_union
Description: Records in the current and published clusters joined with statistics of the associated current and published clusters.
Dataset Catalog Export: No
Delete Support:
<unifiedDatasetName>_dedup_imported_cluster_members
Description: Imported clusters.
Dataset Catalog Export:
Delete Support:
<unifiedDatasetName>_dedup_high_impact_questions
Description: All pairs that are marked as high-impact questions.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_grouped_dnf_binning
Description: Blocking data grouped by clause and bin IDs.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_labels
Description: Human-labeled pairs.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_feedback
Description: Pair label feedback.
Dataset Catalog Export: No
Delete Support:
<unifiedDatasetName>_dedup_pair_comments
Description: Pairs with comments.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_signals
Description: All signals generated while comparing records (that is, pairs and similarities).
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_human_signals
Description: All signals generated using human labels, comments, and feedback.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_dnf_signals
Description: All signals generated using the deduplication model.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_dedup_signals_predictions
Description: All signals along with the predictions and confidence scores.
Dataset Catalog Export:
Delete Support:
<unifiedDatasetName>_dedup_model
Description: The deduplication model decision tree.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by training on pair labels.
<unifiedDatasetName>_important_pairs
Description: Pairs with labels, feedback, or comments.
Dataset Catalog Export: Yes
Delete Support:
recordPairLabel
Description: Raw pair labels from persistence.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
record_pair_feedback
Description: Raw pair feedback from persistence.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by generating pairs.
record_pair_id_string
Description: Raw pair ID strings from persistence.
Dataset Catalog Export: Yes
Delete Support:
internal_links
Description: Raw links from persistence.
Dataset Catalog Export: No
Delete Support:
cluster_member
Description: Raw verified cluster members from persistence.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
cluster_feedback
Description: Raw cluster feedback from persistence.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating model.
Datasets in a Categorization Project
<unifiedDatasetName>_categories
Description: The taxonomy for the project.
Dataset Catalog Export: Yes
Delete Support:
<unifiedDatasetName>_classification_feedback
Description: Stores categorization assignment and unverified feedback information.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by the feedback service.
<unifiedDatasetName>_classifications
Description: All the categorizations, manual and suggested, for the project.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
<unifiedDatasetName>_classifications_with_data
Description: All the categorizations, manual and suggested, with input record fields. The attributes in this dataset include:
trainingFunctionCategoryPath
(limited release) The category predicted by the machine learning model for a record that has a manually-assigned category label.
manualClassificationPaths
The category labels manually assigned by an expert. Can be null.
finalCategoryPath
The categorization accepted for the record. The final categorization combines manual labels, categorization functions of type override, and model predictions (in that order).
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
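The precedence described for finalCategoryPath can be read as "take the first source that has a value." The sketch below illustrates that reading only. The function name, the parameter names, and the representation of a category path as a list of tier names are assumptions, and the actual resolution logic in Tamr Core may handle cases this sketch ignores.

```python
from typing import List, Optional

CategoryPath = List[str]  # assumed representation: one entry per taxonomy tier


def resolve_final_category(
    manual: Optional[CategoryPath],
    override: Optional[CategoryPath],
    predicted: Optional[CategoryPath],
) -> Optional[CategoryPath]:
    """Return the first available path in the documented order:
    manual label, then override function, then model prediction."""
    for candidate in (manual, override, predicted):
        if candidate is not None:
            return candidate
    return None


# A manual label wins even when an override and a prediction exist.
print(resolve_final_category(["Parts", "Bolts"], ["Parts", "Nuts"], ["Tools"]))
# With no manual label, the override function takes precedence over the model.
print(resolve_final_category(None, ["Parts", "Nuts"], ["Tools"]))
```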
<unifiedDatasetName>_classification_histogram_boundaries
Description: The histogram boundaries for numeric attributes.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
<unifiedDatasetName>_classification_model
Description: The categorization model dataset.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
<unifiedDatasetName>_classifications_average_confidences
Description: The average confidences of all records' categorizations.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
category
Description: Raw categories from HBase.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by updating categorizations.
categorization
Description: Raw categorizations from HBase.
Dataset Catalog Export:
Delete Support:
Datasets in a Schema Mapping Project
mapping_recommendations_<unifiedDatasetName>
Description: The schema mapping recommendations dataset.
Dataset Catalog Export: Yes
Delete Support: No
mapping_recommendation_model_<unifiedDatasetName>
Description: The schema mapping recommendation model dataset.
Dataset Catalog Export: Yes
Delete Support: No
Datasets in a Golden Records Project
<unifiedDatasetName>_golden_records_overrides
Description: Dataset listing all the manual overrides that have been applied to attributes within a golden records project, including the user and timestamp of each override.
Dataset Catalog Export: Yes
Delete Support: No
<unifiedDatasetName>_golden_records_draft
Description: Golden Records dataset prior to publishing golden records.
Dataset Catalog Export: Yes
Delete Support: No
<unifiedDatasetName>_golden_records_rule_output
Description: Golden Records dataset as a result of applying the golden record rules.
Dataset Catalog Export: Yes
Delete Support: No
<unifiedDatasetName>_golden_records
Description: Golden Records dataset as a result of applying both golden record rules and manual overrides.
Dataset Catalog Export: Yes
Delete Support: No
<unifiedDatasetName>_golden_records_pinned_cluster_input
Description: Dataset containing tamr_id values on a record level with their associated Golden Records values.
Dataset Catalog Export: Yes
Delete Support: No
<unifiedDatasetName>_golden_records_cluster_profile
Description: Dataset containing the relationship between input cluster IDs and the IDs of the records contained in each cluster.
Dataset Catalog Export: Yes
Delete Support: No