Data Attributes Generated by Tamr Core

A reference to the system-generated attributes in datasets that are categorized as Results and Internals in the dataset catalog.

For each project type, Tamr Core generates a set of output datasets, which are listed under Results and Internals on the Dataset Catalog page. See Datasets Generated by Tamr for more information about these datasets.

These datasets include system-generated attributes that you can use in data analytics. The table that follows describes these attributes and the project types in which they are used. In cases where the same definition applies to several attributes, they are grouped into a single table row.

Important: When creating unified attributes, do not use names that match these reserved Tamr-generated attribute names. Matching is case-insensitive.
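
The naming rule above can be checked before you create unified attributes. The following is a minimal sketch, not part of Tamr Core: the helper `find_reserved_conflicts` and the constant `RESERVED_ATTRIBUTE_NAMES` are hypothetical names, and the list is simply transcribed from the attribute table on this page.

```python
# Reserved names transcribed from the attribute table on this page.
# This list and the helper below are illustrative; they are not a Tamr API.
RESERVED_ATTRIBUTE_NAMES = [
    "sourceId", "tamr_id", "entityId", "recordId",
    "origin_source_name", "originSourceId",
    "origin_entity_id", "originEntityId",
    "clusterId", "persistentId", "suggestedClusterId",
    "verificationType", "verifiedClusterId", "clusterName", "groupUnifiedId",
    "manualClassificationPaths", "manualClassificationIds",
    "suggestedClassificationPath", "suggestedClassificationId",
    "suggestedClassificationConfidence",
    "suggestedClassificationTierConfidences",
    "suggestedClassificationPathAboveThreshold",
    "suggestedClassificationIdAboveThreshold",
    "suggestedClassificationConfidenceAboveThreshold",
    "overrideFunctionCategoryId", "overrideFunctionCategoryPath",
    "trainingFunctionCategoryId", "trainingFunctionCategoryPath",
    "functionCategoryPaths", "functionCategoryIds",
    "finalCategoryId", "finalCategoryPath",
    "questionImpactRank", "highImpactRank",
    "Sources", "Cluster Size", "is_new_since_last_publish",
]

# Case-folded copy used for case-insensitive comparison.
_RESERVED_FOLDED = {name.casefold() for name in RESERVED_ATTRIBUTE_NAMES}


def find_reserved_conflicts(candidate_names):
    """Return the candidate names that match a reserved name, ignoring case."""
    return [
        name for name in candidate_names
        if name.casefold() in _RESERVED_FOLDED
    ]
```

For example, `find_reserved_conflicts(["ClusterID", "customer_name", "TAMR_ID"])` returns `["ClusterID", "TAMR_ID"]`, so those two names would need to be changed.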

Attribute

Definition

Project Type

sourceId

Name of the unified dataset associated with the project.

All

tamr_id
entityId
recordId

Unique primary key associated with each record in Tamr, system-generated by default.

All

origin_source_name
originSourceId

Name of the source dataset associated with the record for the current project.

Schema Mapping
Mastering
Categorization

origin_entity_id
originEntityId

Primary key of the record within the source dataset for the current project.

Schema Mapping
Mastering
Categorization

clusterId

Unique system-generated ID associated with each cluster.

Mastering
Golden Records

persistentId

Persistent and unique system-generated ID associated with each published cluster.

Mastering
Golden Records

suggestedClusterId

The 'clusterId' to which the clustering model believes the record should belong. This attribute can be different from 'persistentId' or 'clusterId' if the record is verified.

Mastering

verificationType

The verification status of the record:

  • SUGGEST: verify and enable suggestion.
  • MOVABLE: verify and auto accept suggestion.
  • LOCK: verify and disable suggestion.
  • UNVERIFIED: previously verified, currently unverified.
  • null: never been verified.

Mastering

verifiedClusterId

The 'persistentId' or 'clusterId' in which the record is verified. This attribute can be different from 'suggestedClusterId' if the clustering model disagrees with your verification. Null for records that are not verified.

Mastering

clusterName

Name of the cluster to which the record belongs. This is the most common value, across all records in the cluster, of the attribute selected to represent the entity being mastered.

Mastering

groupUnifiedId

For projects with the record grouping feature enabled, the unique ID shared by all records grouped together by the grouping key(s).

Mastering

manualClassificationPaths

The list of full paths in the taxonomy to the nodes manually labelled for that record.

Categorization

manualClassificationIds

The list of IDs in the taxonomy of the nodes manually labelled for that record.

Categorization

suggestedClassificationPath

The full path in the taxonomy to the node that the categorization model suggests for that record.

Categorization

suggestedClassificationId

The ID in the taxonomy of the node that the categorization model suggests for that record.

Categorization

suggestedClassificationConfidence

The overall confidence, in the range [0, 1], that the categorization model has in the categorization it suggests.

Categorization

suggestedClassificationTierConfidences

The confidence, in the range [0, 1], that the categorization model has at each tier of the taxonomy in the categorization it suggests.

Categorization

suggestedClassificationPathAboveThreshold

The full path in the taxonomy to the node that the categorization model suggests for that record, where the suggestion does not contradict the minimum categorization confidence threshold that is set.

Categorization

suggestedClassificationIdAboveThreshold

The ID in the taxonomy of the node that the categorization model suggests for that record, where the suggestion does not contradict the minimum categorization confidence threshold that is set.

Categorization

suggestedClassificationConfidenceAboveThreshold

The overall confidence, in the range [0, 1], that the categorization model has in the categorization it suggests, where the suggestion does not contradict the minimum categorization confidence threshold that is set.

Categorization

overrideFunctionCategoryId

The ID in the taxonomy of the node labelled for that record by the implemented override categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. An override categorization function overrides the label of the ML model but does not override a manual label.

Categorization

overrideFunctionCategoryPath

The full path in the taxonomy to the node labelled for that record by the implemented override categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. An override categorization function overrides the label of the ML model but does not override a manual label.

Categorization

trainingFunctionCategoryId

The ID in the taxonomy of the node labelled for that record by the implemented training categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. A training categorization function automatically creates additional training labels to train your ML model.

Categorization

trainingFunctionCategoryPath

The full path in the taxonomy to the node labelled for that record by the implemented training categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. A training categorization function automatically creates additional training labels to train your ML model.

Categorization

functionCategoryPaths

The output category paths of all categorization functions of all types (training, override, and validation).

Categorization

functionCategoryIds

The output category IDs in the taxonomy for all categorization functions of all types (training, override, and validation).

Categorization

finalCategoryId

The ID in the taxonomy of the node labelled for that record, taking into account manual, function, and ML labels. This is the output you should use downstream.

Categorization

finalCategoryPath

The full path in the taxonomy to the node labelled for that record, taking into account manual, function, and ML labels. This is the output you should use downstream.

Categorization

questionImpactRank

A long value indicating the rank of the record for high-impact training.

Categorization

highImpactRank

A Boolean flag that indicates whether a record belongs to the top-ranked high-impact questions. By default, the top 1% of records with the highest questionImpactRank values are labeled as high impact.

Categorization

Sources

The number of sources from which the records originated.

Golden Records

Cluster Size

The number of records included in a cluster.

Golden Records

is_new_since_last_publish

Whether the record is new since the last publish.

Golden Records
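
As an illustration of using these attributes in analytics, the sketch below finds verified records where the clustering model disagrees with the verification, based on the definitions of verifiedClusterId and suggestedClusterId in the table. It assumes the records have already been exported and loaded as Python dictionaries keyed by attribute name, and that both columns hold the same kind of cluster ID in your export. The function name `find_model_disagreements` and the sample values are hypothetical.

```python
def find_model_disagreements(records):
    """Return verified records whose suggested cluster differs from the verified one.

    Records with a null (None) verifiedClusterId are not verified and are skipped.
    """
    return [
        record for record in records
        if record.get("verifiedClusterId") is not None
        and record.get("suggestedClusterId") != record.get("verifiedClusterId")
    ]


# Hypothetical sample rows; real exports contain many more attributes.
SAMPLE_RECORDS = [
    {"tamr_id": "1", "verifiedClusterId": "c1", "suggestedClusterId": "c1"},
    {"tamr_id": "2", "verifiedClusterId": "c1", "suggestedClusterId": "c2"},
    {"tamr_id": "3", "verifiedClusterId": None, "suggestedClusterId": "c3"},
]

disagreements = find_model_disagreements(SAMPLE_RECORDS)
print([record["tamr_id"] for record in disagreements])
```

With the sample rows, only the record with tamr_id "2" is returned: record "1" agrees with the model, and record "3" is not verified.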