Learn more about SQL-style joins here. In general, joins have a left dataset and a right dataset. In Tamr transformations, the left dataset is always the result of the previous statement. Therefore the left dataset is implicit and is not explicitly listed as is the case in standard SQL. The right dataset is specified by name.

Example

Here's a simple example, for two datasets named parents.csv and children.csv.

In this example, source attributes from parents.csv were mapped to two unified attributes. Empty unified attributes were also created to store information about the children.

This creates a preview with 3 empty attributes.

Using JOIN in a script, you can populate each of these empty columns. After the JOIN, SELECT is used to display the columns with names matching those of the Unified Attributes. Note that the unified dataset is the left dataset, and the right dataset is specified by name (in this example, "children.csv"). It is given the alias "Children" in order to make selecting attributes easier.

The final result looks as follows:

📘
Previewing Joins
Preview executes transformations on small samples of each dataset. When executing a JOIN transformation between two sampled datasets, the probability of joining any pair of records is low. The preview may thus contain very few, or even no, joined records.
To see the true join, apply the transformation to the non-sampled datasets by clicking Update Unified Dataset and subsequently view all records.

Additional Information

Supported JOIN types are: INNER, FULL OUTER, LEFT OUTER, RIGHT OUTER, and CROSS. The JOIN type can be written in front of the JOIN keyword (e.g. FULL OUTER JOIN with Children. If it is not present, the JOIN type is INNER by default.
Note: CROSSJOIN is not recommended, as it is very easy to cause performance problems.
JOIN prefixes every added column with the name of the dataset it came from. This can be changed by writing JOIN WITH <dataset> AS <prefix> ON .
JOIN statements can be chained together.
It is usually a good idea to follow up a JOIN with a SELECT that keeps all the relevant columns out of the joins.
All JOIN keywords are case insensitive.

Example

📘Previewing Joins

Additional Information

📘
Previewing Joins