Curators and admins can add metadata to input datasets and attributes, which can then be used to expand the options available for transforming data.

You add metadata by specifying a property for the input dataset as a whole or for one or more of its input attributes. Each property contains a key:value pair. See Adding a Metadata Property.

Use Cases for Metadata

Example use cases for transformations that use metadata follow.

Filter by Confidentiality Level

Add a privacy key and a value of P1, P2, P3, or P4 to each attribute to filter records to meet access level requirements.

Filter to Newest Records

Add a last_modified_date key and corresponding value to the input dataset. Use it and the transaction_date attribute in your unified dataset to filter out records more than n months or years old relative to the date of the dataset.

Make Location-Specific Changes

In a project that harmonizes datasets from different countries, add a country_of_origin key and a corresponding value to identify each input dataset.

Group Datasets by Type

In a project with a large number of datasets that fall into a few general types, instead of (or in addition to) using dataset tags to identify a dataset’s type, add a type key and value to allow for type-specific transformations.

Adding a Metadata Property

Curators and admins can add metadata to an input dataset in a mastering, categorization, or schema mapping project.

To add a metadata property:

Open a mastering, categorization, or schema mapping project.
On the project’s Datasets page, select a dataset.
Choose Open properties. A panel opens on the right side of the page.
Use the drop-down list to select the object you want to add metadata to: the dataset itself, or one of the attributes in the dataset.
Enter a name to identify the property key.
Enter a value for the key.
Select Add property. The Properties panel populate the first row of a table with the selected object and the specified key and value.

Repeat these steps to add more properties to the input dataset.

Example: Using Metadata in a Transformation

The metadata properties that you add for an input dataset are treated like a separate dataset, named metadata, with columns for attribute , datasetName, and key.

You access these columns using syntax like USE metadata("source_dataset_1.csv", "source_dataset_2.csv").

In the example that follows, metadata has been added to identify the source country of the input datasets: one input dataset contains data for Canadian companies, the other for US companies. The transformation validates the formatting of postal code values in the unified dataset.

// Join the metadata to your unified dataset, filtering to the metadata key 
// and attribute that you need.
// When using metadata on the dataset as a whole (not attribute specific) the 
// attribute field should be null.
LEFT JOIN WITH metadata("healthcare_facilities_canada.csv", "us_companies.csv") AS metadata
ON origin_source_name == metadata.datasetName AND metadata.attribute is null AND metadata.key == 'country_of_origin';

// You can use it in transformations directly
// Setup data types
SELECT *, to_string(postal_code) AS postal_code;

SELECT *,
CASE 
  // CA post codes should follow the pattern letter,number,letter number,letter,number
  WHEN metadata.value == 'CA' THEN matches('\\w\\d\\w \\d\\w\\d', postal_code)
  //US post codes should be exactly 5 digits
  WHEN metadata.value == 'US' THEN matches('\\d{5}', postal_code)
END AS postal_code_is_valid;

// or you can save the metadata value as a new column
SELECT *, metadata.value AS country;