Join
JOIN
allows you to combine rows from two datasets.
Learn more about SQL-style joins here. In general, joins have a left dataset and a right dataset. In Tamr transformations, the left dataset is always the result of the previous statement. Therefore the left dataset is implicit and is not explicitly listed as is the case in standard SQL. The right dataset is specified by name.
Example
Here's a simple example, for two datasets named parents.csv
and children.csv
.
In this example, source attributes from parents.csv
were mapped to two unified attributes. Empty unified attributes were also created to store information about the children.

This creates a preview with 3 empty attributes.

Using JOIN
in a script, you can populate each of these empty columns. After the JOIN
, SELECT
is used to display the columns with names matching those of the Unified Attributes. Note that the unified dataset is the left dataset, and the right dataset is specified by name (in this example, "children.csv"
). It is given the alias "Children" in order to make selecting attributes easier.

The final result looks as follows:

Previewing Joins
Preview executes transformations on small samples of each dataset. When executing a
JOIN
transformation between two sampled datasets, the probability of joining any pair of records is low. The preview may thus contain very few, or even no, joined records.To see the true join, apply the transformation to the non-sampled datasets by clicking Update Unified Dataset and subsequently view all records.
Additional Information
- Supported
JOIN
types are:INNER
,FULL OUTER
,LEFT OUTER
,RIGHT OUTER
, andCROSS
. TheJOIN
type can be written in front of theJOIN
keyword (e.g.FULL OUTER JOIN with Children
. If it is not present, theJOIN
type isINNER
by default.
Note:CROSSJOIN
is not recommended, as it is very easy to cause performance problems. JOIN
prefixes every added column with the name of the dataset it came from. This can be changed by writingJOIN WITH <dataset> AS <prefix> ON
.JOIN
statements can be chained together.- It is usually a good idea to follow up a
JOIN
with aSELECT
that keeps all the relevant columns out of the joins. - All
JOIN
keywords are case insensitive.
Updated over 4 years ago