Data Attributes Generated by Tamr Core
A reference to the system-generated attributes in datasets that are categorized as Results and Internals in the dataset catalog.
For each project type, Tamr Core generates a set of output datasets, which are listed under Results and Internals on the Dataset Catalog page. See Datasets Generated by Tamr for more information about these datasets.
These datasets include system-generated attributes that you can use in data analytics. The table that follows describes these attributes and the project types in which they are used. In cases where the same definition applies to several attributes, they are grouped into a single table row.
Important: When creating unified attributes, do not use names that match these reserved Tamr-generated attribute names. The names are case-insensitive, so a name that differs only in capitalization still conflicts.
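A pre-check for such naming conflicts can be sketched in Python. This is a minimal illustration and not part of Tamr Core: `RESERVED_NAMES` holds only a few of the names from the table on this page, and `find_reserved_conflicts` is a hypothetical helper introduced here for the example.

```python
# Illustrative subset of the reserved names listed in the attribute table;
# add the remaining names from the table before relying on this check.
RESERVED_NAMES = {
    "sourceId",
    "tamr_id",
    "origin_source_name",
    "originSourceId",
    "origin_entity_id",
    "clusterId",
    "persistentId",
}

# Compare in lowercase because the reserved names are case-insensitive.
_RESERVED_LOWER = {name.lower() for name in RESERVED_NAMES}


def find_reserved_conflicts(candidate_names):
    """Return the candidate names that collide with a reserved name."""
    return [name for name in candidate_names if name.lower() in _RESERVED_LOWER]


if __name__ == "__main__":
    proposed = ["full_name", "ClusterID", "TAMR_ID", "postal_code"]
    print(find_reserved_conflicts(proposed))  # prints ['ClusterID', 'TAMR_ID']
```

Running the check before you create unified attributes catches names such as `ClusterID` that differ from a reserved name only in capitalization.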
Attribute | Definition | Project Type |
---|---|---|
sourceId | Name of the unified dataset associated with the project. | All |
tamr_id | Unique primary key associated with each record in Tamr, system-generated by default. | All |
origin_source_name, originSourceId | Name of the source dataset associated with the record for the current project. | Schema Mapping |
origin_entity_id | Primary key of the record within the source dataset for the current project. | Schema Mapping |
clusterId | Unique system-generated ID associated with each cluster. | Mastering |
persistentId | Persistent and unique system-generated ID associated with each published cluster. | Mastering |
suggestedClusterId | The 'clusterId' to which the clustering model believes the record should belong. This attribute can be different from 'persistentId' or 'clusterId' if the record is verified. | Mastering |
verificationType | The verification status of the record. | Mastering |
verifiedClusterId | The 'persistentId' or 'clusterId' of the cluster in which the record is verified. This attribute can be different from 'suggestedClusterId' if the clustering model disagrees with your verification. Null for records that are not verified. | Mastering |
clusterName | Name of the cluster to which the record belongs. This is the most common value, across all records in the cluster, of the attribute selected to represent the entity being mastered. | Mastering |
groupUnifiedId | For projects with the record grouping feature enabled, the unique ID for all records grouped together by the grouping key(s). | Mastering |
manualClassificationPaths | The list of full paths in the taxonomy to the node manually labelled for that record. | Categorization |
manualClassificationIds | The list of ids in the taxonomy of the node manually labelled for that record. | Categorization |
suggestedClassificationPath | The full path in the taxonomy to the node that the categorization model suggests for the record. | Categorization |
suggestedClassificationId | The id in the taxonomy of the node that the categorization model suggests for the record. | Categorization |
suggestedClassificationConfidence | The overall confidence [0-1] that the categorization model has in the categorization it suggests. | Categorization |
suggestedClassificationTierConfidences | The confidence [0-1] that the categorization model has, at each tier of the taxonomy, in the categorization it suggests. | Categorization |
suggestedClassificationPathAboveThreshold | The full path in the taxonomy to the node that the categorization model suggests for the record and that does not contradict the configured minimum categorization confidence threshold. | Categorization |
suggestedClassificationIdAboveThreshold | The id in the taxonomy of the node that the categorization model suggests for the record and that does not contradict the configured minimum categorization confidence threshold. | Categorization |
suggestedClassificationConfidenceAboveThreshold | The overall confidence [0-1] that the categorization model has in the suggested categorization that does not contradict the configured minimum categorization confidence threshold. | Categorization |
overrideFunctionCategoryId | The id in the taxonomy of the node labelled for that record by the implemented override categorization functions. If multiple functions are valid for the record, this is the output of function conflict resolution. An override categorization function overrides the label of the ML model but does not override a manual label. | Categorization |
overrideFunctionCategoryPath | The full path in the taxonomy to the node labelled for that record by the implemented override categorization functions. If multiple functions are valid for the record, this is the output of function conflict resolution. An override categorization function overrides the label of the ML model but does not override a manual label. | Categorization |
trainingFunctionCategoryId | The id in the taxonomy of the node labelled for that record by the implemented training categorization functions. If multiple functions are valid for the record, this is the output of function conflict resolution. A training categorization function automatically creates further training labels to train your ML model. | Categorization |
trainingFunctionCategoryPath | The full path in the taxonomy to the node labelled for that record by the implemented training categorization functions. If multiple functions are valid for the record, this is the output of function conflict resolution. A training categorization function automatically creates further training labels to train your ML model. | Categorization |
functionCategoryPaths | The output category paths of all categorization functions of all types (training, override and validation). | Categorization |
functionCategoryIds | The output category ids in the taxonomy for all categorization functions of all types (training, override and validation). | Categorization |
finalCategoryId | The id in the taxonomy of the node labelled for that record (taking into account manual, function and ML labels). This is the output you should use downstream. | Categorization |
finalCategoryPath | The full path in the taxonomy to the node labelled for that record (taking into account manual, function and ML labels). This is the output you should use downstream. | Categorization |
questionImpactRank | A long value indicating the rank of the record for high impact training. | Categorization |
highImpactRank | A Boolean flag that indicates whether a record belongs to the top-ranked high-impact questions. By default, the top 1% of the records with the highest questionImpactRank values are labeled as high impact. | Categorization |
Sources | The number of sources from which the records originated. | Golden Records |
Cluster Size | The number of records included in a cluster. | Golden Records |
is_new_since_last_publish | Whether the record is new since the last publish. | Golden Records |
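
As one example of using these attributes in analytics, the sketch below counts records and distinct source datasets for each published cluster. It rests on assumptions: the rows are exported records that carry both `persistentId` and `origin_source_name`, the sample values are invented, and `summarize_clusters` is a hypothetical helper. It does not describe how Tamr Core itself computes the Cluster Size or Sources attributes.

```python
from collections import defaultdict


def summarize_clusters(records):
    """Count records and distinct sources for each persistentId.

    `records` is an iterable of dicts that each contain the keys
    "persistentId" and "origin_source_name" (assumed export shape).
    """
    sizes = defaultdict(int)
    sources = defaultdict(set)
    for record in records:
        cluster = record["persistentId"]
        sizes[cluster] += 1
        sources[cluster].add(record["origin_source_name"])
    return {
        cluster: {"records": sizes[cluster], "sources": len(sources[cluster])}
        for cluster in sizes
    }


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [
        {"persistentId": "p-1", "origin_source_name": "crm.csv"},
        {"persistentId": "p-1", "origin_source_name": "erp.csv"},
        {"persistentId": "p-1", "origin_source_name": "crm.csv"},
        {"persistentId": "p-2", "origin_source_name": "erp.csv"},
    ]
    for cluster, stats in summarize_clusters(sample).items():
        print(cluster, stats)
```

Grouping on `persistentId` rather than `clusterId` is a deliberate choice here, because the table describes `persistentId` as the persistent ID of each published cluster.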