Datasets Generated by Tamr Core
A reference to the system-generated datasets that are categorized as Results and Internals in the dataset catalog.
This page lists datasets that you can locate using the Results and Internals filter on the Dataset Catalog page.
Note: In the sections that follow, Dataset Catalog Export indicates whether the Export option is available for datasets of that type in the Tamr Core user interface. You can export any dataset by using the API.
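Because the API can export datasets that the user interface cannot, a small client helps. The sketch below is not an official client. It assumes a versioned records endpoint of the form /api/versioned/v1/datasets/{id}/records that streams newline-delimited JSON. The endpoint path, the host, and the auth_header value are all assumptions, so confirm them against the API reference for your Tamr Core version.

```python
import json
from typing import Dict, Iterable, Iterator, List
from urllib.request import Request, urlopen


def records_url(base_url: str, dataset_id: str) -> str:
    """Build the (assumed) URL that streams a dataset's records."""
    return f"{base_url.rstrip('/')}/api/versioned/v1/datasets/{dataset_id}/records"


def parse_records(lines: Iterable[bytes]) -> Iterator[Dict]:
    """Parse newline-delimited JSON records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def export_dataset(base_url: str, dataset_id: str, auth_header: str) -> List[Dict]:
    """Fetch every record of one dataset over HTTP.

    auth_header is whatever Authorization value your instance expects
    (hypothetical parameter; not exercised offline).
    """
    request = Request(
        records_url(base_url, dataset_id),
        headers={"Accept": "application/json", "Authorization": auth_header},
    )
    with urlopen(request) as response:  # iterating the response yields raw lines
        return list(parse_records(response))
```

Keeping the parsing separate from the HTTP call means the NDJSON handling can be checked without a running instance, for example `list(parse_records([b'{"a": 1}']))`.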
Datasets in a Mastering Project
Description: Features for all the rows and values in the source dataset.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by generating pairs.
Description: The inverse document frequency for all the fields in the source dataset.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
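To make "inverse document frequency" concrete, the sketch below uses the textbook definition idf(t) = log(N / df(t)). Here each record is one document, and df(t) counts the records whose field value contains token t. Tamr Core's exact variant (tokenization, smoothing, log base) is not described on this page, so treat this as an illustration of the idea, not the product's computation. The function name field_idf is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def field_idf(records: List[Dict[str, Optional[str]]], field: str) -> Dict[str, float]:
    """Textbook IDF, log(N / df), over lowercase whitespace tokens of one field.

    Records with a missing or empty value still count toward N in this sketch.
    """
    n = len(records)
    doc_freq: Counter = Counter()
    for record in records:
        # A set, so a token repeated within one value counts once per record.
        tokens = set((record.get(field) or "").lower().split())
        doc_freq.update(tokens)
    return {token: math.log(n / df) for token, df in doc_freq.items()}
```

Rare tokens such as a distinctive company name get a high weight. Common tokens such as "corp" get a low one, which is why IDF is useful when comparing records.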
Description: The non-null count for each feature.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
Description: Blocking done across all features.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
Description: Unified dataset plus cluster IDs, cluster names, and whether the record is locked.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating results.
Description: Record clustering with statistics.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by updating results.
Description: Statistics about the clusters.
Dataset Catalog Export: Yes
Delete Support:
Description: The average pairwise match probabilities in clusters.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by running a predict-clustering job.
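One way to read "average pairwise match probabilities in clusters" is the mean probability over the scored pairs whose two records share a cluster. The sketch below computes that. It is an assumption about the definition, not Tamr Core's code. The input shapes (a record-to-cluster mapping and (left, right, probability) tuples) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def avg_pairwise_probability(
    cluster_of: Dict[str, str],
    scored_pairs: Iterable[Tuple[str, str, float]],
) -> Dict[str, float]:
    """Mean match probability of the scored pairs that fall inside each cluster.

    Pairs that span two clusters are ignored. Record pairs that were never
    scored (for example, because blocking did not pair them) are simply absent.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for left, right, probability in scored_pairs:
        cluster = cluster_of.get(left)
        if cluster is not None and cluster == cluster_of.get(right):
            totals[cluster] += probability
            counts[cluster] += 1
    return {cluster: totals[cluster] / counts[cluster] for cluster in counts}
```

Clusters with no internal scored pair, such as singletons, do not appear in the result rather than receiving an arbitrary default.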
Description: Publish clusters, keyed by unified dataset record ID.
Dataset Catalog Export: Yes
Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.
Description: Record clustering with statistics, keyed by persistent cluster ID.
Dataset Catalog Export: No
Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.
Description: Unified dataset plus cluster IDs, cluster names, and whether the record is locked, keyed by unified dataset record ID.
Dataset Catalog Export: Yes
Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.
Description: Record and cluster counts as of the latest cluster publication.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by publishing clusters.
Description: Contains all persistent cluster IDs ever created for the instance, keyed by the unified persistent cluster ID.
Dataset Catalog Export: Yes
Delete Support: Not recommended. Can be deleted and recreated, but stored data is lost.
Description: Current clusters and published clusters associated with records.
Dataset Catalog Export: Yes
Delete Support:
Description: Statistics of current clusters and published clusters.
Dataset Catalog Export: No
Delete Support:
Description: Records in the current and published clusters joined with statistics of the associated current and published clusters.
Dataset Catalog Export: No
Delete Support:
Description: Imported clusters.
Dataset Catalog Export:
Delete Support:
Description: All record pairs that are marked as high-impact questions.
Dataset Catalog Export: Yes
Delete Support:
Description: Blocking done across all features.
Dataset Catalog Export: Yes
Delete Support:
Description: Blocking data grouped by clause and bin IDs.
Dataset Catalog Export: Yes
Delete Support:
Description: Human-labeled pairs.
Dataset Catalog Export: Yes
Delete Support:
Description: Pairs label feedback.
Dataset Catalog Export: No
Delete Support:
Description: Pairs with comments.
Dataset Catalog Export: Yes
Delete Support:
Description: All signals generated while comparing records (that is, pairs and similarities).
Dataset Catalog Export: Yes
Delete Support:
Description: All signals generated using human labels, comments, and feedback.
Dataset Catalog Export: Yes
Delete Support:
Description: All signals generated using the deduplication model.
Dataset Catalog Export: Yes
Delete Support:
Description: All signals along with the predictions and confidence scores.
Dataset Catalog Export:
Delete Support:
Description: The deduplication model decision tree.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by training on pair labels.
Description: Pairs with labels, feedback, or comments.
Dataset Catalog Export: Yes
Delete Support:
Description: Raw pair labels from persistence.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
Description: Raw pair feedback from persistence.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by generating pairs.
Description: Raw record pair ID strings from persistence.
Dataset Catalog Export: Yes
Delete Support:
Description: Raw links from persistence.
Dataset Catalog Export: No
Delete Support:
Description: Raw verified cluster members from persistence.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by generating pairs.
Description: Raw cluster feedback from persistence.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating model.
Datasets in a Categorization Project
Description: The taxonomy for the project.
Dataset Catalog Export: Yes
Delete Support:
Description: All the categorizations, manual and suggested, for the project.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
Description: All the categorizations, manual and suggested, with input record fields. The attributes in this dataset include:
(limited release) The category predicted by the machine learning model for a record that has a manually assigned category label.
manualClassificationPath: The category label manually assigned by an expert. Can be null.
finalCategoryPath: The categorization accepted for the record. The final categorization combines manual labels, categorization functions of type override, and model predictions (in that order).
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
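The finalCategoryPath description lists three sources in order. A minimal reading is "take the first one that is not null". The sketch below encodes that reading. It assumes a category path is represented as a list of category names from root to leaf, and the function name is hypothetical. It illustrates the documented ordering and does not reproduce Tamr Core's implementation.

```python
from typing import List, Optional

# Assumption: a category path is a list of names, root first, e.g. ["Office", "Paper"].
CategoryPath = List[str]


def final_category_path(
    manual: Optional[CategoryPath],
    override: Optional[CategoryPath],
    model_prediction: Optional[CategoryPath],
) -> Optional[CategoryPath]:
    """Return the first non-null path in the documented order:
    manual label, then override-type categorization function, then model prediction.
    """
    for candidate in (manual, override, model_prediction):
        if candidate is not None:
            return candidate
    return None
```

For example, a record with no manual label but an override result of ["Office", "Paper"] ends up in that category even if the model predicted something else.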
Description: The histogram boundaries for numeric attributes.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
Description: The categorization model dataset.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
Description: The average confidences of all records' categorizations.
Dataset Catalog Export: Yes
Delete Support: Yes, can be recreated by updating categorizations.
Description: Raw categories from HBase.
Dataset Catalog Export: No
Delete Support: Yes, can be recreated by updating categorizations.
Description: Raw categorizations from HBase.
Dataset Catalog Export:
Delete Support:
Datasets in a Schema Mapping Project
Description: The schema mapping recommendations dataset.
Dataset Catalog Export: Yes
Delete Support: No
Description: The schema mapping recommendation model dataset.
Dataset Catalog Export: Yes
Delete Support: No
Datasets in a Golden Records Project
Description: Dataset listing all the manual overrides that have been applied to attributes within a golden records project, including the user and timestamp for each override.
Dataset Catalog Export: Yes
Delete Support: No
Description: Golden Records dataset prior to publishing golden records.
Dataset Catalog Export: Yes
Delete Support: No
Description: Golden Records dataset as a result of applying the Golden Record rules.
Dataset Catalog Export: Yes
Delete Support: No
Description: Golden Records dataset as a result of applying both golden record rules and manual overrides.
Dataset Catalog Export: Yes
Delete Support: No
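The difference between the rules-only dataset and the rules-plus-overrides dataset can be pictured as layering overrides on top of rule output. The sketch below assumes that a manual override for an attribute replaces the rule-derived value for that cluster. This is an assumption about precedence that this page does not state. The dictionary shapes and function name are hypothetical.

```python
from typing import Any, Dict

# Hypothetical shape: one golden record per cluster ID, attribute name -> value.
GoldenRecord = Dict[str, Any]


def apply_overrides(
    rule_output: Dict[str, GoldenRecord],
    overrides: Dict[str, Dict[str, Any]],
) -> Dict[str, GoldenRecord]:
    """Layer manual overrides (cluster ID -> {attribute: value}) over rule output.

    Overrides for clusters missing from the rule output are ignored here,
    and neither input is mutated.
    """
    result: Dict[str, GoldenRecord] = {}
    for cluster_id, record in rule_output.items():
        merged = dict(record)  # copy so the rules-only record stays intact
        merged.update(overrides.get(cluster_id, {}))
        result[cluster_id] = merged
    return result
```

Copying instead of updating in place mirrors the page's description of separate datasets for the rules-only and rules-plus-overrides results.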
Description: Dataset containing tamr_id values at the record level with their associated golden record values.
Dataset Catalog Export: Yes
Delete Support: No
Description: Dataset containing the relationship between input cluster IDs and the IDs of the records contained in each cluster.
Dataset Catalog Export: Yes
Delete Support: No