
Uploading a Dataset Into Tamr

Upload a dataset into Tamr.

📘

Dataset File Format

Format: Comma-separated values (.csv). The default delimiter, quote, and escape characters are a comma (,), a double quote ("), and a double quote ("), respectively. To change them, select Show advanced CSV options in the dataset upload window.

Encoding: UTF-8, or UTF-8 with BOM.

Header: A file must contain a header row.
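The format requirements above can be checked locally before uploading. The following is a minimal sketch in Python, not part of Tamr: the function name `check_csv_for_upload` is hypothetical, and it only verifies that the file decodes as UTF-8 (with or without a BOM), parses with the default delimiter and quote character, and has a non-empty header row.

```python
import csv
import io


def check_csv_for_upload(data: bytes, delimiter: str = ",", quotechar: str = '"'):
    """Hypothetical pre-upload check. Returns (header, data_row_count).

    Raises ValueError if the bytes are not UTF-8 or the header row is missing.
    """
    # "utf-8-sig" decodes plain UTF-8 and also strips a leading BOM if present.
    try:
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(f"file is not valid UTF-8: {exc}") from exc

    # doublequote=True means a quote inside a quoted field is written as "",
    # which corresponds to an escape character equal to the quote character.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
        doublequote=True,
        strict=True,
    )

    header = next(reader, None)
    if not header or all(not name.strip() for name in header):
        raise ValueError("file has no header row")

    # Count non-empty data rows; blank lines are ignored.
    row_count = sum(1 for row in reader if row)
    return header, row_count


if __name__ == "__main__":
    sample = b'\xef\xbb\xbfid,name\n1,"Smith, ""Jo"""\n'
    print(check_csv_for_upload(sample))
```

If the file uses a different delimiter or quote character, pass it to the function and set the same values under Show advanced CSV options during upload.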

To upload a dataset into Tamr:

  1. Navigate to the Dataset Catalog tab.
  2. Select Add new dataset.
  3. Select Choose File and then choose your delimited file. By default, Tamr treats it as a CSV file.
  4. Optionally, choose Show advanced CSV options to configure the delimiter, quote, and escape characters.
  5. Optionally, select Description and enter a description for the dataset.
  6. Choose a Primary Key for the file. If the file has no suitable column, select No Primary Key; Tamr then creates a primary key for you, using the row number as its value.
  7. The Profile Dataset checkbox is selected by default, so Tamr profiles the dataset as soon as it is uploaded. See Profiling a Dataset.
  8. Click Add Dataset.

🚧

Dataset Primary Key

Datasets must have a primary key column so that Tamr can reference individual records. Each primary key value must not be null and is expected to be unique. If Tamr encounters a record whose primary key value duplicates that of an existing record, it overwrites the existing record.

When you upload a dataset without a primary key, Tamr generates a Primary Key attribute and populates it with internally generated values that are guaranteed to be unique. You must reference these IDs when updating the dataset’s records using the Tamr API. See Modify a Dataset's Records.
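Because records with duplicate primary key values are overwritten, it can be worth checking a candidate key column before uploading. The sketch below is one way to do that in Python; the function name `primary_key_problems` is hypothetical, and treating an empty or whitespace-only value as null is an assumption, not documented Tamr behavior.

```python
import csv
import io
from collections import Counter


def primary_key_problems(text: str, key_column: str, delimiter: str = ","):
    """Hypothetical check of a candidate primary key column.

    Returns (null_rows, duplicates): the 1-based data row numbers whose key
    is missing, and the sorted key values that occur more than once.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    if key_column not in (reader.fieldnames or []):
        raise ValueError(f"column {key_column!r} not found in header")

    null_rows = []
    counts = Counter()
    for row_number, row in enumerate(reader, start=1):
        value = row[key_column]
        # Assumption: an empty or whitespace-only value counts as null.
        if value is None or value.strip() == "":
            null_rows.append(row_number)
        else:
            counts[value] += 1

    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return null_rows, duplicates


if __name__ == "__main__":
    sample = "id,name\n1,a\n2,b\n1,c\n,d\n"
    print(primary_key_problems(sample, "id"))
```

If either list is non-empty, fix the column or select No Primary Key during upload so that Tamr generates the key.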

📘

Access to a Dataset

To ensure that users in your group can access your dataset, attach a policy to it.

