Create a Dataset

Create a new dataset and define the schema

Create a dataset by specifying its name, its type, and (optionally) its external ID.

Optional keys include description, externalId, and tags. If no description is provided, the field is blank. If no externalId is provided, it is generated at creation time. External IDs must be unique and are case-insensitive. If no tags are provided, the tags array will remain empty.

For example, the POST body

{
  "name": "Dataset created with pubapi",
  "keyAttributeNames": ["F1"],
  "description": "So much data in here!",
  "externalId": "Dataset created with pubapi",
  "tags": ["my-project"]
}

creates a new dataset named "Dataset created with pubapi" whose primary key column is F1.
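As a rough illustration, the request body above could be assembled in Python along these lines. The helper name build_dataset_body is hypothetical and not part of the Tamr API; it only builds the JSON and leaves out optional keys that are not supplied, so that the defaults described above apply on the server side.

```python
import json


def build_dataset_body(name, key_attribute, description=None, external_id=None, tags=None):
    """Assemble a create-dataset POST body (hypothetical helper, not a Tamr API)."""
    body = {"name": name, "keyAttributeNames": [key_attribute]}
    # Optional keys are omitted when not supplied so the service applies its defaults.
    if description is not None:
        body["description"] = description
    if external_id is not None:
        body["externalId"] = external_id
    if tags is not None:
        body["tags"] = list(tags)
    return body


body = build_dataset_body(
    "Dataset created with pubapi",
    "F1",
    description="So much data in here!",
    external_id="Dataset created with pubapi",
    tags=["my-project"],
)
print(json.dumps(body, indent=2))
```

Sending the body to the endpoint is deliberately left out, since the URL and authentication depend on your deployment.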

🚧

Key Attributes

The key attribute is the field in your dataset that Tamr uses as the unique identifier for each record. Compound keys are not supported at this time, so keyAttributeNames can contain only one field.

Creating a new dataset automatically creates a string type attribute for the field in keyAttributeNames.
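The single-key restriction can be checked on the client before a request is sent. The following is a minimal sketch; validate_key_attribute_names is a hypothetical helper, not something the API provides, and the service may enforce additional rules.

```python
def validate_key_attribute_names(key_attribute_names):
    """Return the single key attribute name, or raise ValueError (hypothetical client-side check)."""
    if not isinstance(key_attribute_names, (list, tuple)):
        raise ValueError("keyAttributeNames must be a list")
    # Compound keys are not supported, so exactly one name is expected.
    if len(key_attribute_names) != 1:
        raise ValueError("exactly one key attribute is supported; compound keys are not")
    name = key_attribute_names[0]
    if not isinstance(name, str) or not name:
        raise ValueError("the key attribute name must be a non-empty string")
    return name


print(validate_key_attribute_names(["F1"]))
```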

Loading a file from an external storage provider

To create a dataset backed by files in a storage provider, include the optional externalDatasetConfig field in your POST body. This field must include the storage provider name and the path to a file or to a directory containing multiple files. For example:

{
  "name": "External Dataset",
  "description": "my dataset from foo",
  "keyAttributeNames": ["id"],
  "externalDatasetConfig": {
    "storageProviderName": "foo",
    "filePath": "/dataset.avro"
  }
}

If filePath points to a directory, all of the Avro files in that directory are combined into the dataset that is added to Tamr.
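A body like the one above could be built as follows. This is only a sketch: build_external_dataset_body is a hypothetical helper, and the check that rejects paths ending in ".csv" is a client-side convenience based on the Avro-only restriction, not a description of how the service validates paths.

```python
import json


def build_external_dataset_body(name, key_attribute, provider_name, file_path, description=None):
    """Assemble a POST body for a dataset backed by Avro files (hypothetical helper)."""
    # Only Avro is supported for storage providers, so reject an obvious CSV path early.
    if file_path.lower().endswith(".csv"):
        raise ValueError("CSV files are not supported for storage providers; use Avro")
    body = {
        "name": name,
        "keyAttributeNames": [key_attribute],
        "externalDatasetConfig": {
            "storageProviderName": provider_name,
            "filePath": file_path,  # a single Avro file or a directory of Avro files
        },
    }
    if description is not None:
        body["description"] = description
    return body


external_body = build_external_dataset_body(
    "External Dataset", "id", "foo", "/dataset.avro", description="my dataset from foo"
)
print(json.dumps(external_body, indent=2))
```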

🚧

File Types in Storage Providers

Only Avro files are supported for storage providers; CSV files are not.

Exporting a Tamr dataset to a file in an external storage provider

You can link an upstream dataset in Tamr to a downstream file in an external storage provider. To do this, include both the optional externalDatasetConfig and upstreamDatasetIds fields in your POST body. The filePath in externalDatasetConfig must point to a directory, not a file, and each entry in upstreamDatasetIds must be the full ID of a Tamr dataset.

When you materialize the external dataset, the contents of the upstream dataset are written to one or more Avro files, overwriting anything that previously existed in that directory.

{
  "name": "External Dataset",
  "description": "my dataset from foo",
  "keyAttributeNames": ["id"],
  "externalDatasetConfig": {
    "storageProviderName": "foo",
    "filePath": "myDirectory/mySubdirectory/"
  },
  "upstreamDatasetIds": ["unify://unified-data/v1/datasets/1"]
}
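The export body can be sketched in the same way. The helper build_export_body is hypothetical. Its check that an upstream ID is not a bare number is an assumption drawn from the example ID format above; the service itself decides what counts as a full ID.

```python
import json


def build_export_body(name, key_attribute, provider_name, directory_path, upstream_dataset_ids):
    """Assemble a POST body that links upstream datasets to an external directory (hypothetical helper)."""
    if not upstream_dataset_ids:
        raise ValueError("at least one upstream dataset id is required")
    for dataset_id in upstream_dataset_ids:
        # Assumption: a bare numeric id such as "1" is not a full dataset id.
        if str(dataset_id).isdigit():
            raise ValueError(f"expected a full dataset id, got {dataset_id!r}")
    return {
        "name": name,
        "keyAttributeNames": [key_attribute],
        "externalDatasetConfig": {
            "storageProviderName": provider_name,
            "filePath": directory_path,  # must be a directory; existing contents are overwritten
        },
        "upstreamDatasetIds": list(upstream_dataset_ids),
    }


export_body = build_export_body(
    "External Dataset",
    "id",
    "foo",
    "myDirectory/mySubdirectory/",
    ["unify://unified-data/v1/datasets/1"],
)
print(json.dumps(export_body, indent=2))
```

Because materializing overwrites the target directory, it may be worth confirming the path before sending the request.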

Response Fields

If the request succeeds, this endpoint returns a dataset object describing the created dataset.
