Data Attributes Generated by Tamr Core

For each project type, Tamr Core generates a set of output datasets, which are listed under Results and Internals on the Dataset Catalog page. See Datasets Generated by Tamr for more information about these datasets.

These datasets include system-generated attributes that you can use in data analytics. The table that follows describes these attributes and the project types in which they are used. In cases where the same definition applies to several attributes, they are grouped into a single table row.

Important: When creating unified attributes, do not use names that match these reserved, case-insensitive Tamr-generated attribute names.
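
As a minimal sketch of how the rule above could be checked before you create unified attributes, the following Python compares proposed names against the reserved names case-insensitively. The list is copied from the attribute table on this page. The function name `find_reserved_conflicts` and the sample input are hypothetical and are not part of Tamr Core.

```python
# Reserved attribute names, copied from the table on this page.
RESERVED_NAMES = [
    "sourceId", "tamr_id", "entityId", "recordId",
    "origin_source_name", "originSourceId",
    "origin_entity_id", "originEntityId",
    "clusterId", "persistentId", "suggestedClusterId",
    "verificationType", "verifiedClusterId", "clusterName",
    "groupUnifiedId",
    "manualClassificationPath", "manualClassificationId",
    "suggestedClassificationPath", "suggestedClassificationId",
    "suggestedClassificationConfidence",
    "suggestedClassificationTierConfidences",
    "suggestedClassificationPathAboveThreshold",
    "suggestedClassificationIdAboveThreshold",
    "suggestedClassificationConfidenceAboveThreshold",
    "overrideFunctionCategoryId", "overrideFunctionCategoryPath",
    "trainingFunctionCategoryId", "trainingFunctionCategoryPath",
    "functionCategoryPaths", "functionCategoryIds",
    "finalCategoryId", "finalCategoryPath",
    "questionImpactRank", "highImpactRank",
    "Sources", "Cluster Size", "is_new_since_last_publish",
]


def find_reserved_conflicts(proposed_names):
    """Return the proposed names that match a reserved name, ignoring case.

    Hypothetical helper for illustration only; Tamr Core does not ship it.
    """
    reserved = {name.casefold() for name in RESERVED_NAMES}
    return [name for name in proposed_names if name.casefold() in reserved]


if __name__ == "__main__":
    # Sample unified attribute names (made up for this example).
    print(find_reserved_conflicts(["ClusterID", "customer_name", "TAMR_ID"]))
```

Because the comparison is case-insensitive, a name such as `ClusterID` is reported as a conflict even though its capitalization differs from `clusterId`.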

Attribute	Definition	Project Type
sourceId	Name of the unified dataset associated with the project.	All
tamr_id entityId recordId	Unique primary key associated with each record in Tamr, system-generated by default.	All
origin_source_name originSourceId	Name of the source dataset associated with the record for the current project.	Schema Mapping Mastering Categorization
origin_entity_id originEntityId	Primary key of the record within the source dataset for the current project.	Schema Mapping Mastering Categorization
clusterId	Unique system-generated ID associated with each cluster.	Mastering Golden Records
persistentId	Persistent and unique system-generated ID associated with each published cluster.	Mastering Golden Records
suggestedClusterId	The 'clusterId' to which the clustering model believes the record should belong. This attribute can be different from 'persistentId' or 'clusterId' if the record is verified.	Mastering
verificationType	The verification status of the record: - SUGGEST: verify and enable suggestion. - MOVABLE: verify and auto accept suggestion. - LOCK: verify and disable suggestion. - UNVERIFIED: previously verified, currently unverified. - null: never been verified.	Mastering
verifiedClusterId	'persistentId' or 'clusterId' in which the record is verified. This attribute can be different from 'suggestedClusterId' if the clustering model disagrees with your verification. Null for records that are not verified.	Mastering
clusterName	Name of the cluster to which the record belongs. This is the most common value, across all records in the cluster, of the attribute selected to represent the entity being mastered.	Mastering
groupUnifiedId	If the pregroupby feature (limited release) is active, this is the unique ID for all records grouped together by the pregroupby attributes.	Mastering
manualClassificationPath	The full path in the taxonomy to the node manually labelled for that record.	Categorization
manualClassificationId	The id in the taxonomy of the node manually labelled for that record.	Categorization
suggestedClassificationPath	The full path in the taxonomy to the node that the categorization model suggests as the label for that record.	Categorization
suggestedClassificationId	The id in the taxonomy of the node that the categorization model suggests as the label for that record.	Categorization
suggestedClassificationConfidence	The overall confidence [0-1] the categorization model has in the categorization it suggests.	Categorization
suggestedClassificationTierConfidences	The confidence [0-1] the categorization model has in the categorization it suggests, at each tier of the taxonomy.	Categorization
suggestedClassificationPathAboveThreshold	The full path in the taxonomy to the node that the categorization model suggests as the label for that record and that does not contradict the minimum categorization confidence threshold set.	Categorization
suggestedClassificationIdAboveThreshold	The id in the taxonomy of the node that the categorization model suggests as the label for that record and that does not contradict the minimum categorization confidence threshold set.	Categorization
suggestedClassificationConfidenceAboveThreshold	The overall confidence [0-1] the categorization model has in the categorization it suggests that does not contradict the minimum categorization confidence threshold set.	Categorization
overrideFunctionCategoryId	The id in the taxonomy of the node labelled for that record by implemented override categorization functions. This is the output of the function conflict resolution if multiple functions are valid for that record. Override categorization functions override the label of the ML model but do not override a manual label.	Categorization
overrideFunctionCategoryPath	The full path in the taxonomy to the node labelled for that record by implemented override categorization functions. This is the output of the function conflict resolution if multiple functions are valid for that record. Override categorization functions override the label of the ML model but do not override a manual label.	Categorization
trainingFunctionCategoryId	The id in the taxonomy of the node labelled for that record by implemented training categorization functions. This is the output of the function conflict resolution if multiple functions are valid for that record. Training categorization functions automatically create further training labels to train your ML model.	Categorization
trainingFunctionCategoryPath	The full path in the taxonomy to the node labelled for that record by implemented training categorization functions. This is the output of the function conflict resolution if multiple functions are valid for that record. Training categorization functions automatically create further training labels to train your ML model.	Categorization
functionCategoryPaths	The output category paths of all categorization functions of all types (training, override and validation).	Categorization
functionCategoryIds	The output category ids in the taxonomy for all categorization functions of all types (training, override and validation).	Categorization
finalCategoryId	The id in the taxonomy of the node labelled for that record (taking into account manual, function and ML labels). This is the output you should use downstream.	Categorization
finalCategoryPath	The full path in the taxonomy to the node labelled for that record (taking into account manual, function and ML labels). This is the output you should use downstream.	Categorization
questionImpactRank	A long value indicating the rank of the record for high impact training.	Categorization
highImpactRank	A Boolean flag that indicates whether a record belongs to the top-ranked high impact questions. By default, the top 1% of the records with the highest questionImpactRank values are labeled as high impact.	Categorization
Sources	The number of sources from which the records originated.	Golden Records
Cluster Size	The number of records included in a cluster.	Golden Records
is_new_since_last_publish	Whether the record is new since the last publish.	Golden Records
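
As one example of using these attributes in analytics, the sketch below finds Mastering records whose verified cluster differs from the cluster the model suggests, based on the definitions of verifiedClusterId and suggestedClusterId in the table. It assumes the records have already been exported and loaded as Python dictionaries keyed by attribute name. The function name and the sample records are hypothetical.

```python
def find_verification_disagreements(records):
    """Return records whose verified cluster differs from the suggested one.

    Records with a null (None) verifiedClusterId are not verified and are
    skipped. Hypothetical helper for illustration only.
    """
    disagreements = []
    for record in records:
        verified = record.get("verifiedClusterId")
        if verified is None:
            continue  # not verified, so there is nothing to compare
        if verified != record.get("suggestedClusterId"):
            disagreements.append(record)
    return disagreements


if __name__ == "__main__":
    # Made-up sample rows; real rows come from your exported dataset.
    sample = [
        {"tamr_id": "r1", "verifiedClusterId": "c1", "suggestedClusterId": "c1"},
        {"tamr_id": "r2", "verifiedClusterId": "c1", "suggestedClusterId": "c2"},
        {"tamr_id": "r3", "verifiedClusterId": None, "suggestedClusterId": "c3"},
    ]
    for record in find_verification_disagreements(sample):
        print(record["tamr_id"])
```

Records returned by this check are the ones where the clustering model disagrees with a verification, which can make them useful candidates for review.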