User GuidesAPI ReferenceRelease Notes
Doc HomeHelp CenterLog In
User Guides

Overview

Join allows you to combine rows from two datasets. Learn more about SQL-style joins here.
In general, joins have a left dataset and a right dataset. In Transformations, the left dataset is always the result of the previous statement. Therefore the left dataset is implicit and is not explicitly listed as is the case in standard SQL. The right dataset is specified by name.

Example

Here's a simple example, for two datasets named "parents.csv" and "children.csv."

In this example, source attributes from "parents.csv" were mapped to two unified attributes. Empty unified attributes were also created to store information about the children.

1408

This creates a preview with 3 empty attributes.

1866

Using Join in Scripts, you can populate each of these empty columns. After the Join, Select is used to display the columns with names matching those of the Unified Attributes. Note that the unified dataset is the left dataset, and the right dataset is specified by name (i.e. "children.csv"). It is given the alias "Children" in order to make selecting attributes easier.

784

The final result looks as follows:

1786

📘

Previewing Joins

Preview executes transformations on small samples of each dataset. When executing a join transformation between two sampled datasets, the probability of joining any pair of records is low. The preview may thus contain very few, or even no, joined records.

To see the true join, apply the transformation to the non-sampled datasets by clicking Update Unified Dataset and subsequently view all records.

Additional Information

  • Supported join types are: inner, full outer, left outer, right outer, and cross. The join type can be written in front of the join keyword (e.g. full outer join with Children. If it is not present, the join type is inner by default.
  • join prefixes every added column with the name of the dataset it came from. This can be changed by writing join with <dataset> as <prefix> on
  • join statements can be chained together
  • It is usually a good idea to follow up a join with a select that keeps all the relevant columns out of the joins.
  • Remember that all join keywords are not case sensitive
  • cross join is not recommended, as it is very easy to cause performance problems.