Golden Records Workflow
Curators compose rules to consolidate the values of specific attributes, and review and edit the resulting golden records.
When an author or admin creates a golden records project, they select a dataset from a mastering project that has published clusters. Next, a curator composes attribute-specific consolidation rules that Tamr Core uses to generate golden records.
As a curator, you can review the resulting records, directly edit their values, and refine the consolidation rules used to generate them.
Creating and Previewing Consolidation Rules
For each golden record attribute, you create a consolidation rule. Each consolidation rule consists of:
- Input attributes
- (Optional) Conditions
- Aggregation functions
- (Optional) Expression aggregation functions
For convenience, when an author or admin creates a golden records project, Tamr Core automatically generates a golden records dataset using a default rule with the "most common value" aggregation function, and with golden record attributes that match the attributes of the input dataset 1-to-1. You can then customize this rule for each attribute. For more information, see Editing Golden Record Consolidation Rules.
Previewing Rules
After you edit a rule, you can preview how it behaves before you apply it to the entire cluster or group of records. See Previewing Rules for Golden Records.
Conditions
A condition filters the records within each cluster or group. Tamr Core applies the condition before it applies the aggregation function. This way, you can use a condition such as "is not empty" to filter down to the records that meet it, and then apply the aggregation function to the values that remain. For the list of conditions, see Conditions.
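The order of operations described above (condition first, aggregation second) can be illustrated with a minimal Python sketch. This is not Tamr Core code: the helper names `is_not_empty`, `most_common_value`, and `consolidate` are hypothetical and only model the behavior on an in-memory list of values.

```python
from collections import Counter

def is_not_empty(value):
    """Hypothetical condition: keep values that are neither None nor blank."""
    return value is not None and str(value).strip() != ""

def most_common_value(values):
    """Hypothetical aggregation: the most frequent value, or None if nothing remains."""
    counts = Counter(values)
    return counts.most_common(1)[0][0] if counts else None

def consolidate(values, condition, aggregate):
    # The condition runs first, so the aggregation only sees the surviving values.
    kept = [v for v in values if condition(v)]
    return aggregate(kept)

# One cluster's values for a single attribute; empty strings outnumber "Ohio".
cluster_states = ["", "", "", "Ohio", "Ohio", None]

with_condition = consolidate(cluster_states, is_not_empty, most_common_value)
without_condition = most_common_value(cluster_states)
```

In this sketch, skipping the condition lets the empty string win the vote, while filtering first yields "Ohio". That difference is why the order matters.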
Aggregation Functions
You specify an aggregation function for every consolidation rule. To find the best value for an attribute, Tamr Core applies the aggregation function to the records in each cluster or group.
For example, when you apply the "most common value" function to the attribute state, the rule returns the value Massachusetts for records that have a value of 101 for the published_id grouping key.
published_id | state |
---|---|
101 | Massachusetts |
101 | Massachusetts |
101 | Ohio |
101 | Massachusetts |
101 | Ohio |
101 | Massachusetts |
For the list of aggregation functions you can use, see Aggregation Functions.
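As a rough illustration of the example above, the following Python sketch groups the sample rows by published_id and picks the most frequent state in each group. The function name `most_common_by_group` is hypothetical, and the tie-breaking shown here (the first value encountered wins) comes from Python's `Counter`; how Tamr Core resolves ties is not covered here.

```python
from collections import Counter, defaultdict

# The sample rows from the table above.
records = [
    {"published_id": 101, "state": "Massachusetts"},
    {"published_id": 101, "state": "Massachusetts"},
    {"published_id": 101, "state": "Ohio"},
    {"published_id": 101, "state": "Massachusetts"},
    {"published_id": 101, "state": "Ohio"},
    {"published_id": 101, "state": "Massachusetts"},
]

def most_common_by_group(rows, key, attribute):
    """Return {grouping key value: most frequent attribute value}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[attribute])
    return {
        group: Counter(values).most_common(1)[0][0]
        for group, values in groups.items()
    }

golden = most_common_by_group(records, "published_id", "state")
```

Massachusetts appears four times and Ohio twice for key 101, so the sketch selects Massachusetts, matching the result described above.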
Expression Aggregation Functions
You can use a code editor to compose custom aggregation functions. These functions are based on transformations. Tamr Core first applies conditions, filtering down to the records that meet them, and then applies expression aggregation functions. See Expression Aggregation Functions.
Editing Values
As a curator, you review golden records and can directly edit their values. The values you enter override the values selected by the consolidation rule for a golden record attribute. See Creating or Editing a Value Override for a Golden Record.
Note: The user interface for golden records projects offers this option to override or insert values in a dataset. Other project types do not support edits through the user interface.
If you create new or update existing consolidation rules, value overrides remain unchanged. You can view and filter the number of value overrides for a given attribute in the rules panel.
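The relationship between rule output and value overrides can be pictured with a small Python sketch. It is a conceptual model only, not how Tamr Core stores overrides: the function `apply_overrides` and the dictionaries are hypothetical.

```python
def apply_overrides(rule_output, overrides):
    """Return golden record values with curator overrides taking precedence."""
    merged = dict(rule_output)   # start from what the consolidation rules chose
    merged.update(overrides)     # curator-entered values win for their attributes
    return merged

# A curator has overridden the state attribute of one golden record.
overrides = {"state": "Massachusetts"}

# Output of the original rules, then of the rules after they were changed.
before_rule_change = apply_overrides({"state": "Ohio", "city": "Boston"}, overrides)
after_rule_change = apply_overrides({"state": "Vermont", "city": "Boston"}, overrides)
```

In both cases the overridden attribute keeps the curator's value, while attributes without an override follow whatever the current rules produce.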
Publishing Golden Records
After golden records are ready for downstream consumption, curators can publish the current version of golden records. Publishing makes golden records available to consumers.
The process of publishing golden records achieves the following:
- Saves the current golden records as the latest version visible to downstream consumers. For more information, see Publishing Golden Records.
- Creates or updates the system-generated dataset, <project-name>_golden_records. You can export it and run reports on it over time.
After you publish the first version of the golden records, you can review and update them as new data or feedback become available and publish the next version. See Updating Golden Records.