Union All
UNION ALL
concatenates tables together.
For example, if you have a table for every state, you can use UNION ALL
to concatenate all of them into a single table with values for the whole country.
UNION ALL WITH table1, table2, ...;
The schema of each table is carried into the unioned table; no attribute is dropped. If the initial schemas match, then the unioned table has the same exact schema. If there is an attribute in one schema that is different, then the unioned table includes it at the end of the schema.
Note: Matching attributes must have matching types. If the schemas have a matching attribute, the attribute must be the same type in each schema for them to be unioned correctly.
Important: Changing a
UNION ALL
statement can result in an unintentional change totamr_id
values in the unified dataset. To avoid data loss and maintain the stability oftamr_id
in mastering and categorization projects, as a best practice Tamr recommends that you explicitly set thetamr_id
to a value you're sure is stable in the last transformation. See Managing Primary Keys. In addition, both theorigin_source_name
andorigin_entity_id
must exist in the datasets being unioned or you must create and populate them with strings.
UNION ALL Example
For example, you have Massachusetts and Vermont datasets containing resident information, and you want to union them.
Massachusetts.csv contains:
ID | Name | Surname |
---|---|---|
002 | John | Smith |
003 | Sarah | Brown |
004 | Alex | Jones |
Vermont.csv contains:
First_Name | Surname |
---|---|
Abby | Elliot |
Ben | Anderson |
Carol | Baldwin |
To union these two tables together, you write a script such as:
USE Massachusetts.csv; UNION ALL WITH Vermont.csv
After the transformation runs, the new, unioned table contains:
ID | Name | Surname | First_Name |
---|---|---|---|
002 | John | Smith | |
003 | Sarah | Brown | |
004 | Alex | Jones | |
Elliot | Abby | ||
Anderson | Ben | ||
Baldwin | Carol |
Updated almost 3 years ago