API Reference

Create a dataset

You can POST to this endpoint to create a dataset.

Required keys:

  • name
  • keyAttributeNames. The array must contain at least one attribute name.

Optional keys:

  • description. If not provided, the field is left blank.
  • externalId. If not provided, generated on creation.
  • externalDatasetConfig. If not provided, the field is left blank.
  • tags. If not provided, the array is left empty.
  • upstreamDatasetIds. If not provided, the array is left empty.

Example: to create a dataset named "Dataset created with pubapi" with a single string-type attribute, "F1", as the primary key:

{
  "name": "Dataset created with pubapi",
  "keyAttributeNames": ["F1"],
  "description": "So much data in here!",
  "externalId": "Dataset created with API",
  "tags": ["my-project"]
}
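
The same request can be assembled programmatically. The following is a minimal Python sketch, not an official client: the endpoint URL is a placeholder, the helper names build_dataset_body and build_create_request are hypothetical, and authentication is omitted because it depends on your deployment. It checks the required keys described above before building the POST request.

```python
import json
import urllib.request

# Placeholder URL (an assumption); substitute the endpoint for your deployment.
DATASETS_URL = "http://localhost:9100/api/versioned/v1/datasets"

# Optional keys listed in this reference.
OPTIONAL_KEYS = {
    "description",
    "externalId",
    "externalDatasetConfig",
    "tags",
    "upstreamDatasetIds",
}


def build_dataset_body(name, key_attribute_names, **optional):
    """Return a request body dict after checking the required keys."""
    if not name:
        raise ValueError("name is required")
    if not key_attribute_names:
        raise ValueError("keyAttributeNames needs at least one entry")
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unrecognized keys: {sorted(unknown)}")
    body = {"name": name, "keyAttributeNames": list(key_attribute_names)}
    body.update(optional)
    return body


def build_create_request(body, url=DATASETS_URL):
    """Wrap the body in a JSON POST request (authentication headers not included)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_dataset_body(
    "Dataset created with pubapi",
    ["F1"],
    description="So much data in here!",
    externalId="Dataset created with API",
    tags=["my-project"],
)
request = build_create_request(body)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Validating on the client side only catches the rules stated on this page; the server remains the authority on what it accepts.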

Loading a File from an External Storage Provider

To create a dataset backed by files in a storage provider, include the optional externalDatasetConfig key in the request body. This object must include the storageProviderName and a filePath that points to either a single file or a directory containing multiple files.

Example:

{
  "name": "External Dataset",
  "description": "my dataset from foo",
  "keyAttributeNames": ["id"],
  "externalDatasetConfig": {
    "storageProviderName": "foo",
    "filePath": "/dataset.avro"
  }
}

If the filePath points to a directory, all of the .avro files in that directory are combined into the dataset that is added to Tamr Core.
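
As an illustration, a body with an externalDatasetConfig could be built as in the Python sketch below. The helper name external_dataset_body is hypothetical, and the only checks it performs are the ones stated in this section: both storageProviderName and filePath must be present. Whether the path names a file or a directory is decided by the storage provider, so the sketch does not try to inspect it.

```python
import json


def external_dataset_body(name, key_attribute_names, storage_provider_name,
                          file_path, description=None):
    """Build a create-dataset body backed by files in a storage provider."""
    if not key_attribute_names:
        raise ValueError("keyAttributeNames needs at least one entry")
    if not storage_provider_name or not file_path:
        raise ValueError("externalDatasetConfig needs storageProviderName and filePath")
    body = {
        "name": name,
        "keyAttributeNames": list(key_attribute_names),
        "externalDatasetConfig": {
            "storageProviderName": storage_provider_name,
            "filePath": file_path,
        },
    }
    if description is not None:
        body["description"] = description
    return body


single_file = external_dataset_body(
    "External Dataset", ["id"], "foo", "/dataset.avro",
    description="my dataset from foo",
)
print(json.dumps(single_file, indent=2))
```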

Exporting a Tamr Core Dataset to a File in an External Storage Provider

You can link an upstream dataset from Tamr Core to a downstream file in an external storage provider. To do this, include both the optional externalDatasetConfig and upstreamDatasetIds keys in your request body. The filePath of the externalDatasetConfig must point to a directory, not a single file, and the upstreamDatasetIds must reference the full id of a Tamr Core dataset.

When you materialize the external dataset, the contents of the upstream dataset are written to one or more .avro files, overwriting anything that may have previously existed in that directory.
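
A body for this kind of export request could be assembled as in the following Python sketch. The helper name export_dataset_body is hypothetical. The check that rejects purely numeric IDs is an assumption made for illustration: it reflects the requirement that upstreamDatasetIds hold full dataset ids, but the server may apply different or stricter rules.

```python
def export_dataset_body(name, key_attribute_names, storage_provider_name,
                        directory_path, upstream_dataset_ids):
    """Build a body linking upstream datasets to a directory in a storage provider."""
    if not key_attribute_names:
        raise ValueError("keyAttributeNames needs at least one entry")
    if not upstream_dataset_ids:
        raise ValueError("upstreamDatasetIds needs at least one dataset id")
    for dataset_id in upstream_dataset_ids:
        # Assumed heuristic: a bare number such as "1" is not a full dataset id.
        if str(dataset_id).isdigit():
            raise ValueError(f"expected a full dataset id, got {dataset_id!r}")
    return {
        "name": name,
        "keyAttributeNames": list(key_attribute_names),
        "externalDatasetConfig": {
            "storageProviderName": storage_provider_name,
            # Must name a directory, not a single file (not verifiable client-side).
            "filePath": directory_path,
        },
        "upstreamDatasetIds": list(upstream_dataset_ids),
    }


export_body = export_dataset_body(
    "External Dataset",
    ["id"],
    "foo",
    "myDirectory/mySubdirectory/",
    ["unify://unified-data/v1/datasets/1"],
)
```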

Example:

{
  "name": "External Dataset",
  "description": "my dataset from foo",
  "keyAttributeNames": ["id"],
  "externalDatasetConfig": {
    "storageProviderName": "foo",
    "filePath": "myDirectory/mySubdirectory/"
  },
  "upstreamDatasetIds": ["unify://unified-data/v1/datasets/1"]
}

Response Fields

On success, this call returns a dataset object describing the dataset created.
