When an admin creates a golden records project, they either select a dataset from a mastering project that has published clusters or upload a dataset with a record grouping key. Next, a curator composes attribute-specific consolidation rules that Tamr uses to generate golden records.
As a curator, you can review the resulting records, directly edit their values, and refine the consolidation rules used to generate them.
Creating and Previewing Consolidation Rules
For each golden record attribute, you create a consolidation rule. Each consolidation rule consists of:
- Input attributes
- Conditions (optional)
- Aggregation functions
- Expression aggregation functions (optional)
For convenience, when an admin creates a golden records project, Tamr automatically generates a golden records dataset. It uses a default rule with the "most common value" aggregation function, and the golden record attributes match the attributes of the input dataset 1-to-1. You can then customize this rule for each attribute. For more information, see Editing Golden Record Consolidation Rules.
Rules Preview
After you edit a rule, you can preview how it will behave before you apply it to every cluster or group of records. See Previewing Rules for Golden Records.
Conditions
A condition filters the records within each cluster or group. Tamr applies the condition before it applies the aggregation function. This way, you can use a condition such as "is not empty" to filter down to the records that meet it, and then apply the aggregation function to the values that remain. For the list of conditions, see Conditions.
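The order of operations described above (condition first, then aggregation) can be sketched in Python. This is only an illustration of the idea, not Tamr's implementation: the helper names `is_not_empty` and `most_common_value` and the sample values are hypothetical.

```python
from collections import Counter

def is_not_empty(value):
    """Hypothetical stand-in for an "is not empty" condition."""
    return value is not None and str(value).strip() != ""

def most_common_value(values):
    """Return the most frequent value, or None if no values remain."""
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

# Values of one attribute within a single cluster (made-up sample data).
cluster_values = ["", "", "", "Boston", "Boston", None]

# Without a condition, the empty string is the most common value.
unfiltered = most_common_value(cluster_values)

# Step 1: apply the condition. Step 2: aggregate the values that remain.
remaining = [v for v in cluster_values if is_not_empty(v)]
filtered = most_common_value(remaining)

print(repr(unfiltered), repr(filtered))
```

In this sketch the condition changes the outcome: the empty values no longer outvote the populated ones.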
Aggregation Functions
You specify an aggregation function for every consolidation rule. To find the best value for an attribute, Tamr applies the aggregation function to the records in each cluster or group.
For example, when you apply the "most common value" function to the attribute state, the rule returns the value Massachusetts for records that have a value of 101 for the published_id grouping key.
| published_id | state |
| --- | --- |
| 101 | Massachusetts |
| 101 | Massachusetts |
| 101 | Ohio |
| 101 | Massachusetts |
| 101 | Ohio |
| 101 | Massachusetts |
For the list of aggregation functions you can use, see Aggregation Functions.
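As a rough sketch of the example above, the following Python groups the six sample records by the published_id grouping key and picks the most common state in each group. The function name `most_common_by_group` is hypothetical, and the code only mimics the behavior described here; it is not how Tamr computes golden records.

```python
from collections import Counter, defaultdict

# The six sample records from the example above.
records = [
    {"published_id": 101, "state": "Massachusetts"},
    {"published_id": 101, "state": "Massachusetts"},
    {"published_id": 101, "state": "Ohio"},
    {"published_id": 101, "state": "Massachusetts"},
    {"published_id": 101, "state": "Ohio"},
    {"published_id": 101, "state": "Massachusetts"},
]

def most_common_by_group(rows, key, attribute):
    """Group rows by `key`, then return the most frequent `attribute` per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[attribute])
    return {
        group: Counter(values).most_common(1)[0][0]
        for group, values in grouped.items()
    }

golden = most_common_by_group(records, "published_id", "state")
print(golden)  # {101: 'Massachusetts'}
```

Massachusetts appears four times and Ohio twice, so Massachusetts is returned for group 101. How ties are broken is an assumption of this sketch (`Counter.most_common` keeps the first value encountered) and may differ from Tamr's behavior.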
Expression Aggregation Functions
You can use a code editor to compose custom aggregation functions. These functions are based on Tamr transformations. Tamr first applies conditions, filtering down to the records that meet them, and then applies expression aggregation functions. See Expression Aggregation Functions.
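Conceptually, a custom aggregation is a function that takes the values remaining after the conditions and returns a single value. The Python below is a sketch of that idea only. It is not written in Tamr's transformation syntax, and `longest_value` and `consolidate` are hypothetical names chosen for illustration.

```python
def longest_value(values):
    """Hypothetical custom aggregation: return the longest string, or None."""
    return max(values, key=len) if values else None

def consolidate(values, condition, aggregate):
    """Apply the condition first, then aggregate the values that remain."""
    remaining = [v for v in values if condition(v)]
    return aggregate(remaining)

# Made-up name variants from one cluster.
names = ["Acme", "", "Acme Corporation", "Acme Corp."]

result = consolidate(names, lambda v: v != "", longest_value)
print(result)  # Acme Corporation
```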
Editing Values
As a curator, you review golden records and can directly edit their values. The values you enter override the values selected by the consolidation rule for a golden record attribute. See Creating or Editing a Value Override for a Golden Record.
If you create new or update existing consolidation rules, value overrides remain unchanged. You can see the number of value overrides for a given attribute in the rules panel and filter to them. See Filtering To Records with Value Overrides.
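The precedence described above can be sketched as a simple merge in which an override, when present, wins over the value produced by the rule. This is an illustrative model under that assumption; `apply_overrides` and the sample values are hypothetical, not part of Tamr.

```python
def apply_overrides(rule_values, overrides):
    """Return the final attribute values; an override wins over the rule's value."""
    return {**rule_values, **overrides}

# A curator's override for one golden record (made-up sample data).
overrides = {"state": "MA"}

# Values produced by the original consolidation rules.
before = apply_overrides({"state": "Massachusetts", "city": "Boston"}, overrides)

# The rule for "state" is edited and now produces a different value,
# but the override is unchanged and still takes effect.
after = apply_overrides({"state": "Ohio", "city": "Boston"}, overrides)

print(before)  # {'state': 'MA', 'city': 'Boston'}
print(after)   # {'state': 'MA', 'city': 'Boston'}
```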
Publishing Golden Records
After golden records are ready for downstream consumption, curators can publish the current version of golden records. Publishing makes golden records available to consumers.
The process of publishing golden records achieves the following:
- Saves the current golden records as the latest version visible to downstream consumers. For more information, see Publishing Golden Records.
- Creates or updates the Tamr-generated dataset, <project-name>_golden_records. You can export it and run reports on it over time.
After you publish the first version of the golden records, you can review and update them as new data or feedback become available and publish the next version. See Updating Golden Records.