Tamr Documentation

Categorization

Understand the basics of a categorization project, such as taxonomy design and expert sourcing.

A categorization project places records into categories. It is a top-down organizational project that classifies individual records into a hierarchical collection of categories, referred to as a taxonomy.
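To make the idea of a hierarchical taxonomy concrete, the following is a minimal sketch that assumes each category is written as a path from the top level down. The tuple representation, the helper names `missing_parents` and `leaf_categories`, and the sample categories are all hypothetical and are not Unify's taxonomy format.

```python
def missing_parents(taxonomy):
    """Return parent paths that are referenced by a category but never defined."""
    nodes = set(taxonomy)
    return sorted({path[:-1] for path in nodes
                   if len(path) > 1 and path[:-1] not in nodes})


def leaf_categories(taxonomy):
    """Return categories that have no child category beneath them."""
    nodes = set(taxonomy)
    parents = {path[:-1] for path in nodes if len(path) > 1}
    return sorted(nodes - parents)


# Hypothetical taxonomy: each tuple is a path from the top level to a category.
taxonomy = [
    ("Fasteners",),
    ("Fasteners", "Bolt"),
    ("Fasteners", "Washer"),
    ("Tools", "Wrench"),  # the parent ("Tools",) was never added
]

gaps = missing_parents(taxonomy)
leaves = leaf_categories(taxonomy)
```

A check like `missing_parents` is one way to catch gaps in a taxonomy file before loading it into a project.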

You can start a project either by adding data or by adding a taxonomy; the two tasks can be completed in either order.

Working with the Unified Dataset

One possible first step in a categorization project is to add one or more sources from Unify's registered datasets to the project's datasets. A project's sources should focus on a single logical entity, e.g. customers or products. Once added to the project, sources must be mapped to a single unified dataset and given an initial configuration so that Unify's machine learning can interpret them (see Working with the Unified Dataset).

Working with the Taxonomy

The alternative first step is to load the target taxonomy into the project. The success of the project depends on understanding and working with the taxonomy as Unify classifies records with increasing confidence and reviewers provide more feedback (see Working with the Taxonomy).

Categorizing Records

The next step is to begin categorizing records into the taxonomy. Once at least 5 records have been categorized, Unify can begin to identify matches between the words (tokens) in each dataset record's values and the words (tokens) already associated with each category of the taxonomy. From the initial model it generates, Unify can then suggest a category for each record (see Categorizing Records).
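The token matching described above can be illustrated with a small sketch. This is not Unify's model: it is a simplified token-overlap scorer written under the assumption that values are split on whitespace, and the function names and sample records are hypothetical.

```python
from collections import Counter, defaultdict

MIN_LABELED = 5  # minimum number of categorized records stated in the text


def tokenize(text):
    """Lowercase a value and split it into whitespace-delimited tokens."""
    return text.lower().split()


def build_category_tokens(labeled):
    """Map each category to a Counter of tokens seen in its labeled records."""
    tokens_by_category = defaultdict(Counter)
    for text, category in labeled:
        tokens_by_category[category].update(tokenize(text))
    return tokens_by_category


def suggest_category(text, tokens_by_category):
    """Return (category, score) for the category sharing the largest
    fraction of the record's tokens, or (None, 0.0) if nothing matches."""
    record_tokens = set(tokenize(text))
    if not record_tokens:
        return None, 0.0
    best_category, best_score = None, 0.0
    for category, counts in tokens_by_category.items():
        shared = sum(1 for token in record_tokens if token in counts)
        score = shared / len(record_tokens)
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score


# Hypothetical labeled records: (record value, category).
labeled = [
    ("1 inch turbine bolts", "Bolt"),
    ("hex bolts steel", "Bolt"),
    ("carriage bolts zinc", "Bolt"),
    ("copper washers flat", "Washer"),
    ("rubber washers 2 inch", "Washer"),
]
assert len(labeled) >= MIN_LABELED

model = build_category_tokens(labeled)
suggestion = suggest_category("steel turbine bolts", model)
```

Here every token of "steel turbine bolts" already appears under "Bolt", so that category is suggested. A record with no known tokens gets no suggestion, which is the kind of case that calls for human categorization.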

Curator and Reviewer Categorizations

Unify then produces simple, high-impact questions about whether certain records, each representative of a large portion of the unified dataset, are categorized appropriately. For example, if Unify has low confidence that a record pertaining to “1 inch turbine bolts” belongs to the “Bolt” category in the organization’s taxonomy, it asks a reviewer for feedback, which improves accuracy and future automation. The reviewer's feedback is then incorporated into the dataset and Unify’s models (see Curator and Reviewer Categorizations).
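One way to think about "high-impact" questions is to rank suggested categorizations by how uncertain they are and how many similar records they stand for. The sketch below assumes a simple score of (1 - confidence) × represented records. The `Candidate` fields, the scoring formula, and the sample values are hypothetical and do not describe how Unify actually selects questions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    record: str
    suggested_category: str
    confidence: float  # hypothetical model confidence in [0, 1]
    represents: int    # hypothetical count of similar records in the dataset


def impact(candidate):
    """Score a candidate: low confidence and broad coverage rank highest."""
    return (1.0 - candidate.confidence) * candidate.represents


def pick_questions(candidates, limit):
    """Return the `limit` candidates with the highest impact score."""
    return sorted(candidates, key=impact, reverse=True)[:limit]


candidates = [
    Candidate("1 inch turbine bolts", "Bolt", confidence=0.55, represents=400),
    Candidate("hex bolts steel", "Bolt", confidence=0.95, represents=900),
    Candidate("misc hardware kit", "Washer", confidence=0.30, represents=10),
]

questions = pick_questions(candidates, limit=2)
```

With these sample values, the "1 inch turbine bolts" record ranks first: the suggestion is uncertain and it represents many similar records, so a single reviewer answer would affect a large share of the data.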

