
Data Attributes Generated by Tamr Core

A reference to the system-generated attributes in datasets that are categorized as Results and Internals in the dataset catalog.

For each project type, Tamr Core generates a set of output datasets, which are listed under Results and Internals on the Dataset Catalog page. See Datasets Generated by Tamr for more information about these datasets.

These datasets include system-generated attributes that you can use in data analytics. The table that follows describes these attributes and the project types in which they are used. In cases where the same definition applies to several attributes, they are grouped into a single table row.

Important: When creating unified attributes, do not use names that match these reserved Tamr-generated attribute names. The match is case-insensitive.
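As a rough illustration of the reserved-name rule above, the following Python sketch checks proposed unified attribute names against the reserved names without regard to case. The list is transcribed from the attribute table on this page; the function name `find_reserved_conflicts` and the sample names are hypothetical and are not part of any Tamr API.

```python
# Reserved names transcribed from the attribute table on this page.
RESERVED_ATTRIBUTE_NAMES = [
    "sourceId", "tamr_id", "entityId", "recordId",
    "origin_source_name", "originSourceId",
    "origin_entity_id", "originEntityId",
    "clusterId", "persistentId", "suggestedClusterId",
    "verificationType", "verifiedClusterId", "clusterName", "groupUnifiedId",
    "manualClassificationPath", "manualClassificationId",
    "suggestedClassificationPath", "suggestedClassificationId",
    "suggestedClassificationConfidence",
    "suggestedClassificationTierConfidences",
    "suggestedClassificationPathAboveThreshold",
    "suggestedClassificationIdAboveThreshold",
    "suggestedClassificationConfidenceAboveThreshold",
    "overrideFunctionCategoryId", "overrideFunctionCategoryPath",
    "trainingFunctionCategoryId", "trainingFunctionCategoryPath",
    "functionCategoryPaths", "functionCategoryIds",
    "finalCategoryId", "finalCategoryPath",
    "questionImpactRank", "highImpactRank",
    "Sources", "Cluster Size", "is_new_since_last_publish",
]

# Case-folded lookup set, because the reserved names are case-insensitive.
_RESERVED_FOLDED = {name.casefold() for name in RESERVED_ATTRIBUTE_NAMES}


def find_reserved_conflicts(proposed_names):
    """Return the proposed names that collide with a reserved name (hypothetical helper)."""
    return [name for name in proposed_names if name.casefold() in _RESERVED_FOLDED]


# Hypothetical unified attribute names, for illustration only.
print(find_reserved_conflicts(["ClusterID", "customer_name", "TAMR_ID"]))
# → ['ClusterID', 'TAMR_ID']
```

A check like this could run before you define unified attributes, so that naming collisions surface early.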

Attribute

Definition

Project Type

sourceId

Name of the unified dataset associated with the project.

All

tamr_id
entityId
recordId

Unique primary key associated with each record in Tamr, system-generated by default.

All

origin_source_name
originSourceId

Name of the source dataset associated with the record for the current project.

Schema Mapping
Mastering
Categorization

origin_entity_id
originEntityId

Primary key of the record within the source dataset for the current project.

Schema Mapping
Mastering
Categorization

clusterId

Unique system-generated ID associated with each cluster.

Mastering
Golden Records

persistentId

Persistent and unique system-generated ID associated with each published cluster.

Mastering
Golden Records

suggestedClusterId

The 'clusterId' to which the clustering model believes the record should belong. This attribute can be different from 'persistentId' or 'clusterId' if the record is verified.

Mastering

verificationType

The verification status of the record:

  • SUGGEST: verify and enable suggestion.
  • MOVABLE: verify and auto accept suggestion.
  • LOCK: verify and disable suggestion.
  • UNVERIFIED: previously verified, currently unverified.
  • null: never been verified.

Mastering

verifiedClusterId

The 'persistentId' or 'clusterId' of the cluster in which the record is verified. This attribute can be different from 'suggestedClusterId' if the clustering model disagrees with your verification.

Null for records that are not verified.

Mastering

clusterName

Name of the cluster to which the record belongs. This is the most common value, across all records in the cluster, of the attribute selected to represent the entity being mastered.

Mastering

groupUnifiedId

If the pregroupby feature (limited release) is active, this is the unique ID for all records grouped together by the pregroupby attributes.

Mastering

manualClassificationPath

The full path in the taxonomy to the node manually labelled for that record.

Categorization

manualClassificationId

The id in the taxonomy of the node manually labelled for that record.

Categorization

suggestedClassificationPath

The full path in the taxonomy to the node that the categorization model suggests as the label for that record.

Categorization

suggestedClassificationId

The ID in the taxonomy of the node that the categorization model suggests as the label for that record.

Categorization

suggestedClassificationConfidence

The overall confidence (from 0 to 1) that the categorization model has in the categorization it suggests.

Categorization

suggestedClassificationTierConfidences

The confidence (from 0 to 1) that the categorization model has, at each tier of the taxonomy, in the categorization it suggests.

Categorization

suggestedClassificationPathAboveThreshold

The full path in the taxonomy to the node that the categorization model suggests as the label for that record and that does not contradict the configured minimum categorization confidence threshold.

Categorization

suggestedClassificationIdAboveThreshold

The ID in the taxonomy of the node that the categorization model suggests as the label for that record and that does not contradict the configured minimum categorization confidence threshold.

Categorization

suggestedClassificationConfidenceAboveThreshold

The overall confidence (from 0 to 1) that the categorization model has in the suggested categorization that does not contradict the configured minimum categorization confidence threshold.

Categorization

overrideFunctionCategoryId

The ID in the taxonomy of the node labelled for that record by the implemented override categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. Override categorization functions override the label of the ML model but do not override a manual label.

Categorization

overrideFunctionCategoryPath

The full path in the taxonomy to the node labelled for that record by the implemented override categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. Override categorization functions override the label of the ML model but do not override a manual label.

Categorization

trainingFunctionCategoryId

The ID in the taxonomy of the node labelled for that record by the implemented training categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. Training categorization functions automatically create further training labels to train your ML model.

Categorization

trainingFunctionCategoryPath

The full path in the taxonomy to the node labelled for that record by the implemented training categorization functions. If multiple functions are valid for that record, this is the output of function conflict resolution. Training categorization functions automatically create further training labels to train your ML model.

Categorization

functionCategoryPaths

The output category paths of all categorization functions of all types (training, override and validation).

Categorization

functionCategoryIds

The output category ids in the taxonomy for all categorization functions of all types (training, override and validation).

Categorization

finalCategoryId

The id in the taxonomy of the node labelled for that record (taking into account manual, function and ML labels). This is the output you should use downstream.

Categorization

finalCategoryPath

The full path in the taxonomy to the node labelled for that record (taking into account manual, function and ML labels). This is the output you should use downstream.

Categorization

questionImpactRank

A long value indicating the rank of the record for high impact training.

Categorization

highImpactRank

A Boolean flag that indicates whether a record belongs to the top-ranked high-impact questions. By default, the top 1% of records with the highest questionImpactRank values are labeled as high impact.

Categorization

Sources

The number of sources from which the records originated.

Golden Records

Cluster Size

The number of records included in a cluster.

Golden Records

is_new_since_last_publish

Whether the record is new since the last publish.

Golden Records
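
As one example of using these attributes in analytics, the sketch below lists verified Mastering records for which the clustering model disagrees with the verification, that is, records whose verifiedClusterId is not null and differs from suggestedClusterId. It assumes that the records have been exported as a list of Python dictionaries and that both attributes hold IDs of the same kind in that export (for example, both 'clusterId' values); check this against your own data before relying on the comparison. The function name and the sample records are hypothetical.

```python
def find_model_disagreements(records):
    """Return verified records whose verified cluster differs from the model's suggestion.

    Assumes verifiedClusterId and suggestedClusterId use the same kind of ID.
    """
    return [
        record
        for record in records
        # verifiedClusterId is null (None) for records that are not verified.
        if record.get("verifiedClusterId") is not None
        and record.get("verifiedClusterId") != record.get("suggestedClusterId")
    ]


# Invented sample records, for illustration only.
sample_records = [
    {"recordId": "r1", "verificationType": "LOCK",
     "verifiedClusterId": "c1", "suggestedClusterId": "c2"},
    {"recordId": "r2", "verificationType": "SUGGEST",
     "verifiedClusterId": "c1", "suggestedClusterId": "c1"},
    {"recordId": "r3", "verificationType": None,
     "verifiedClusterId": None, "suggestedClusterId": "c3"},
]

print([record["recordId"] for record in find_model_disagreements(sample_records)])
# → ['r1']
```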

