Taxonomy Design Principles
A well-designed taxonomy is easy to understand and consume, and it makes categorizing data straightforward.
Use the following tips to design or improve your taxonomy. Each of these tips is explained in further detail in this topic.
- Identify the purpose of the taxonomy.
- Identify candidates for the taxonomy.
- Follow the MECE principle (mutually exclusive, collectively exhaustive).
- Consider tier or level homogeneity.
- Decide whether you need leaf categories in all cases.
- Identify properties and attributes that don't require a taxonomy.
Identifying the Taxonomy Purpose
A critical aspect of taxonomy design is the purpose of the taxonomy. The purpose of a taxonomy is specific to its consumer audience.
For example, you can use a taxonomy to help procurement leaders understand annual spend. Such a taxonomy needs to categorize transactions into broad categories that reflect high-level segmentation of supply markets.
Identifying Candidates for a Taxonomy
It is important to identify what is being categorized.
For example, for categorizing records of purchase orders for machine parts, it is important to establish whether it is the part that is purchased or the final product sold that should be categorized. Clarifying any ambiguity early allows for consistent categorization.
Using the Mutually Exclusive, Collectively Exhaustive (MECE) Principle
The MECE principle is a methodology for arranging data. The more closely a taxonomy follows this principle, the easier it is to use for categorizing records. Though MECE can be difficult to realize in practice, remembering this principle when designing a taxonomy helps ensure that each record has exactly one category in which it belongs and that categories are comprehensive.
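As a rough illustration of what MECE means for categorized records, the following Python sketch reports records that fit no category (the taxonomy is not exhaustive) and records that fit more than one (the categories are not exclusive). The function name `mece_report`, the record IDs, and the category names are hypothetical and are not part of any Tamr API.

```python
def mece_report(assignments, categories):
    """Summarize MECE violations.

    assignments: dict mapping a record ID to the list of categories it fits.
    categories:  iterable of all category names in the taxonomy (hypothetical data).
    """
    # Records that fit no category: the taxonomy is not collectively exhaustive.
    uncovered = sorted(r for r, cats in assignments.items() if len(cats) == 0)
    # Records that fit several categories: the categories are not mutually exclusive.
    overlapping = sorted(r for r, cats in assignments.items() if len(cats) > 1)
    # Categories that no record uses; worth reviewing, though not a MECE violation.
    used = {c for cats in assignments.values() for c in cats}
    unused = sorted(set(categories) - used)
    return {"uncovered": uncovered, "overlapping": overlapping, "unused": unused}


# Hypothetical example data.
example_categories = ["Fasteners", "Hardware", "Electrical"]
example_assignments = {
    "po-1": ["Fasteners"],
    "po-2": ["Fasteners", "Hardware"],
    "po-3": [],
}
example_report = mece_report(example_assignments, example_categories)
```

In this example, `po-2` appears under `overlapping` and `po-3` under `uncovered`, which suggests either redefining the overlapping categories or deciding whether the uncovered record is in scope.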
Considering Tier or Level Homogeneity
Categories within a single tier should be of a single kind.
For example, the biology taxonomy has perfectly homogeneous tiers: all first-tier categories are of the single kind domain, and all second-tier categories are of the kind kingdom. The MECE principle applies to the groups within a tier just as it applies to categories across tiers. For the vast majority of taxonomies, it is not possible to realize perfect tier homogeneity. Instead, a taxonomy should aim to have as few tier-inhomogeneous groups as possible.
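One way to check tier homogeneity is to label each category with its kind and count the distinct kinds per tier. The sketch below assumes each category is given as a path from the root plus a kind label. The `kind` label is an annotation introduced here for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def kinds_per_tier(categories):
    """Group kind labels by tier, where the tier is the length of the category path."""
    tiers = defaultdict(set)
    for path, kind in categories:
        tiers[len(path)].add(kind)
    return dict(tiers)


def inhomogeneous_tiers(categories):
    """Return only the tiers that mix more than one kind of category."""
    return {
        tier: sorted(kinds)
        for tier, kinds in kinds_per_tier(categories).items()
        if len(kinds) > 1
    }


# Homogeneous: tier 1 holds only domains, tier 2 only kingdoms.
biology = [
    (("Bacteria",), "domain"),
    (("Eukarya",), "domain"),
    (("Eukarya", "Animalia"), "kingdom"),
    (("Eukarya", "Plantae"), "kingdom"),
]

# Inhomogeneous (hypothetical): tier 1 mixes a product group with a supplier.
mixed = [
    (("Office Supplies",), "product group"),
    (("Acme Corp",), "supplier"),
]
```

Here `inhomogeneous_tiers(biology)` returns an empty dictionary, while `inhomogeneous_tiers(mixed)` flags tier 1 as mixing two kinds.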
Deciding Whether You Need Leaf Categories
A leaf category does not have to exist for every tier in the taxonomy.
Avoid adding catch-all leaf categories, such as "other" or "miscellaneous". For example, if a dataset contains transactions for which you cannot find matching leaf categories in the taxonomy, Tamr recommends that you categorize transactions at a lower tier or rule them out of scope.
Note: "Out of scope" is not considered the same as "failed or unable to categorize". If Tamr Core cannot categorize a record, this typically indicates a data quality issue.
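The following Python sketch shows one possible way to avoid a catch-all leaf. It assumes that categorizing at a different tier means assigning the record to the most specific category that already exists along its path, which is an interpretation made for this example rather than documented Tamr Core behavior. The function names and the taxonomy contents are hypothetical.

```python
OUT_OF_SCOPE = "out of scope"


def deepest_existing_category(proposed_path, taxonomy_paths):
    """Return the longest prefix of proposed_path that exists in the taxonomy, or None."""
    for end in range(len(proposed_path), 0, -1):
        prefix = tuple(proposed_path[:end])
        if prefix in taxonomy_paths:
            return prefix
    return None


def categorize(proposed_path, taxonomy_paths):
    """Use a non-leaf category when no leaf matches; otherwise rule the record out of scope."""
    match = deepest_existing_category(proposed_path, taxonomy_paths)
    return match if match is not None else OUT_OF_SCOPE


# Hypothetical taxonomy, stored as a set of root-to-node paths.
taxonomy = {
    ("Hardware",),
    ("Hardware", "Fasteners"),
    ("Hardware", "Fasteners", "Bolts"),
}

# No "Rivets" leaf exists, so the record lands on the existing parent category.
rivets = categorize(("Hardware", "Fasteners", "Rivets"), taxonomy)
# Nothing along this path exists in the taxonomy, so the record is out of scope.
catering = categorize(("Catering", "Lunch"), taxonomy)
```

Note that the `OUT_OF_SCOPE` marker in this sketch is a deliberate outcome and is distinct from a failure to categorize.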
Identifying Properties and Attributes that Don't Require a Taxonomy
Treat attributes or properties as a separate single-level, or flat, taxonomy.
For example, the elements of the periodic table, or colors of the rainbow, constitute a flat taxonomy because these attributes do not contain hierarchical information.
Do not build attributes and properties that have a small, finite set of values into a hierarchical taxonomy. For example, consider these four categories: Fruit > Green, Fruit > Red, Vegetable > Green, and Vegetable > Red. There is no meaningful "parent to child" information conveyed in this hierarchy. To make this apparent, interchange the tiers: Green > Fruit, Green > Vegetable, Red > Fruit, and Red > Vegetable describe the same items equally well.
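The tier interchange can be sketched in a few lines of Python. The check below treats a two-tier hierarchy as suspect when its paths form a full cross product of parent and child values, because swapping the tiers then yields an equally valid hierarchy. The function names are hypothetical, and the cross-product test is a heuristic chosen for this example, not a rule stated by Tamr.

```python
from itertools import product


def swap_tiers(paths):
    """Interchange the two tiers of each (parent, child) path."""
    return sorted((child, parent) for parent, child in paths)


def is_cross_product(paths):
    """True when every parent is paired with every child, so neither tier refines the other."""
    parents = {parent for parent, _ in paths}
    children = {child for _, child in paths}
    return set(paths) == set(product(parents, children))


produce = [
    ("Fruit", "Green"),
    ("Fruit", "Red"),
    ("Vegetable", "Green"),
    ("Vegetable", "Red"),
]

swapped = swap_tiers(produce)
looks_flat = is_cross_product(produce)
```

When `is_cross_product` returns `True`, as it does for `produce`, the two tiers are better modeled as two independent flat taxonomies, for example one for produce type and one for color.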