Tamr Documentation

Quickstart

Get up and running quickly with common use cases.

ALPHA

This feature is currently in ALPHA.

If you would like to know more, please reach out to Tamr support.

Before starting, make sure you install the API Client for your language.

Configure your Client

Start by importing the API Client library and authentication provider:

import unify_api_v1 as api
from unify_api_v1.auth import UsernamePasswordAuth

Next, create an authentication provider and use that to create an authenticated client:

# replace with your credentials
auth = UsernamePasswordAuth('username', 'password')
unify = api.Client(auth)

Secure credentials

We hardcode the credentials in the code snippet for simplicity.

In production, read your credentials securely from a config file or environment variables instead.
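Here is a minimal sketch of the environment-variable approach. The helper reads a username and password from the process environment. The variable names UNIFY_USERNAME and UNIFY_PASSWORD are assumptions for illustration. They are not names the client looks for.

```python
import os


def load_credentials(user_var="UNIFY_USERNAME", password_var="UNIFY_PASSWORD"):
    """Read Unify credentials from environment variables.

    The default variable names are hypothetical; use whatever names
    your deployment defines.
    """
    try:
        username = os.environ[user_var]
        password = os.environ[password_var]
    except KeyError as missing:
        # Fail fast with a clear message instead of sending empty credentials.
        raise RuntimeError(
            f"missing environment variable: {missing.args[0]}"
        ) from None
    return username, password
```

You could then pass the result to the authentication provider with auth = UsernamePasswordAuth(*load_credentials()).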

By default, the client tries to find the Unify instance on localhost.
To point to a different host, set the host argument when instantiating the Client.
For example, to connect to 10.20.0.1:

unify = api.Client(auth, host='10.20.0.1')
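If you prefer a config file, one option is Python's standard configparser module. The sketch below assumes a hypothetical INI file with a [unify] section that holds host, username, and password. The section and key names are illustrative. The client does not define this format.

```python
import configparser


def load_unify_config(path):
    """Read host and credentials from an INI-style config file.

    Assumes a hypothetical [unify] section such as:

        [unify]
        host = 10.20.0.1
        username = alice
        password = s3cret
    """
    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed; empty means not found.
    if not parser.read(path):
        raise FileNotFoundError(path)
    section = parser["unify"]
    return {
        # Fall back to localhost, mirroring the client's default host.
        "host": section.get("host", "localhost"),
        "username": section["username"],
        "password": section["password"],
    }


# Usage with the client from this guide (not executed here):
#   config = load_unify_config("unify.ini")
#   auth = UsernamePasswordAuth(config["username"], config["password"])
#   unify = api.Client(auth, host=config["host"])
```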

Loop over Top-level Collections

The API Clients expose several top-level collections, such as Projects, Datasets, and Operations.

You can access these collections through the client and loop over their members with simple for-loops. For example, to print the name of every Project:

for project in unify.projects:
    print(project.name)
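Collections support plain iteration, so ordinary Python idioms such as list comprehensions work on them too. The sketch below filters members by name. It uses SimpleNamespace objects as a stand-in for unify.projects so it runs without a Unify instance. It assumes only that each member has a .name attribute.

```python
from types import SimpleNamespace


def names_matching(collection, substring):
    """Return the names of members whose name contains `substring`.

    Works with any iterable of objects that have a `.name` attribute.
    """
    return [member.name for member in collection if substring in member.name]


# Stand-in for unify.projects so the sketch runs without a Unify instance.
fake_projects = [
    SimpleNamespace(name="Customer Mastering"),
    SimpleNamespace(name="Supplier Mastering"),
    SimpleNamespace(name="Part Classification"),
]
print(names_matching(fake_projects, "Mastering"))
# → ['Customer Mastering', 'Supplier Mastering']
```

With a real client, you would pass unify.projects in place of fake_projects.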

Fetch a specific resource

If you know the identifier for a specific resource, you can ask for it directly.

Top-level collections expose by_* methods (e.g. ProjectCollection.by_relative_id or DatasetCollection.by_name) that fetch a specific resource by the corresponding identifier. For example, to fetch a specific project by its relative ID:

relative_id = "projects/1" # replace with your relative ID
project = unify.projects.by_relative_id(relative_id)
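Suppose you only know a resource's name, and the collection you are using does not offer a by_name method in your client version. A linear search over the collection is a simple fallback. This is a hypothetical helper, not part of the API Client. It assumes members expose a .name attribute. Prefer a by_* method where one exists, because this helper iterates the whole collection.

```python
from types import SimpleNamespace


def find_by_name(collection, name):
    """Return the first member whose `.name` equals `name`.

    Hypothetical fallback; raises KeyError if no member matches.
    """
    for member in collection:
        if member.name == name:
            return member
    raise KeyError(f"no member named {name!r}")


# Stand-in collection so the sketch runs without a Unify instance.
fake_projects = [
    SimpleNamespace(name="Customer Mastering", relative_id="projects/1"),
    SimpleNamespace(name="Part Classification", relative_id="projects/2"),
]
print(find_by_name(fake_projects, "Part Classification").relative_id)
# → projects/2
```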

Models, not data

Note that the API Clients return models rather than the JSON data exposed by the RESTful HTTP API.

You can access a representation of the data via the .data accessor on a Model object (e.g. project.data) if necessary, but we recommend you use the Model objects directly.
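To illustrate the model/data split, here is a hypothetical, simplified model class that wraps a JSON resource. Common fields are exposed as attributes, and the raw JSON remains available through .data. The class and the JSON field names are assumptions for illustration. They do not reflect the client's actual implementation.

```python
class ProjectModel:
    """Hypothetical, simplified model wrapping a JSON resource.

    Not the API Client's actual class; it only illustrates the idea of
    attribute access backed by raw JSON.
    """

    def __init__(self, data):
        self._data = data

    @property
    def name(self):
        return self._data["name"]

    @property
    def relative_id(self):
        return self._data["relativeId"]

    @property
    def data(self):
        # Return a shallow copy so callers can't mutate the model's state.
        return dict(self._data)


project = ProjectModel({
    "name": "My Project",
    "relativeId": "projects/1",
    "description": "Deduplicate customer records",
})
print(project.name)                 # attribute access on the model
print(project.data["description"])  # raw JSON for fields without an accessor
```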

Access models through relations

Models often have relations to other models (e.g. a Project has a Unified Dataset).
You can access related models via convenience methods.

For example, to get the Unified Dataset for a particular Project, you could determine the Dataset ID for the Unified Dataset and then fetch it. But it's much easier to simply use the accessor from the Project Model object:

project = unify.projects.by_relative_id("projects/1")
ud = project.unified_dataset()
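The sketch below uses hypothetical stand-in classes to show why a relation accessor is convenient: it looks up the related resource on the caller's behalf. The class names, the name-based lookup, and the dataset names are all assumptions for illustration. The API Client does not necessarily resolve the Unified Dataset this way.

```python
from types import SimpleNamespace


class DatasetCollection:
    """Hypothetical stand-in for a dataset collection with by_name lookup."""

    def __init__(self, datasets):
        self._by_name = {dataset.name: dataset for dataset in datasets}

    def by_name(self, name):
        return self._by_name[name]


class Project:
    """Hypothetical project model with a relation accessor."""

    def __init__(self, name, unified_dataset_name, datasets):
        self.name = name
        self._unified_dataset_name = unified_dataset_name
        self._datasets = datasets

    def unified_dataset(self):
        # The accessor does the "determine the ID, then fetch it" steps,
        # so callers don't need to know how the relation is stored.
        return self._datasets.by_name(self._unified_dataset_name)


datasets = DatasetCollection([
    SimpleNamespace(name="customers_raw"),
    SimpleNamespace(name="customers_unified"),
])
project = Project("Customers", "customers_unified", datasets)
ud = project.unified_dataset()
print(ud.name)  # → customers_unified
```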

Kick off a Unify operation

Some methods on Model objects can kick off long-running Unify operations.

Here, we kick off a "Unified Dataset refresh" operation and check that it succeeded:

operation = project.unified_dataset().refresh()
assert operation.succeeded()

By default, the API Clients expose a synchronous interface for Unify operations: methods such as refresh() return only after the operation has finished.
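One way to picture the synchronous default is as the asynchronous path followed by an immediate wait. The sketch below uses a hypothetical FakeOperation that finishes after a short delay, plus a hypothetical start_refresh function. Only the wait() and succeeded() method names come from this guide. The fake never fails, so in this sketch finishing counts as succeeding.

```python
import time


class FakeOperation:
    """Hypothetical stand-in for a Unify operation that never fails."""

    def __init__(self, duration_seconds):
        self._done_at = time.monotonic() + duration_seconds

    def wait(self):
        # Block until the simulated operation has finished.
        while time.monotonic() < self._done_at:
            time.sleep(0.01)

    def succeeded(self):
        # The fake never fails, so "finished" means "succeeded".
        return time.monotonic() >= self._done_at


def start_refresh(asynchronous=False):
    """Hypothetical method that kicks off an operation.

    By default it waits before returning, so the caller receives a
    finished operation. With asynchronous=True it returns immediately.
    """
    operation = FakeOperation(duration_seconds=0.1)
    if not asynchronous:
        operation.wait()
    return operation


operation = start_refresh()
assert operation.succeeded()
```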

Asynchronous operations

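You can opt in to an asynchronous interface by passing the asynchronous keyword argument to methods that kick off Unify operations. The method then returns while the operation is still running, and you call wait() when you need it to finish.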

operation = project.unified_dataset().refresh(asynchronous=True)
# do other work while the operation runs
operation.wait()  # blocks until the operation finishes
assert operation.succeeded()
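You may want to interleave your own work with status checks instead of calling operation.wait() once. A polling loop with a deadline is a common pattern for this. In this sketch, is_done and do_work are hypothetical callables that you supply. How you check an operation's status depends on your client version, so the sketch does not assume any particular method.

```python
import time


def poll_until(is_done, do_work, poll_interval=0.5, timeout=60.0):
    """Call do_work() between checks of is_done() until it returns True.

    is_done and do_work are caller-supplied callables (hypothetical here).
    Raises TimeoutError if the deadline passes before is_done() is True.
    """
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not finished after {timeout} seconds")
        do_work()
        time.sleep(poll_interval)
```

Once the loop returns, you can check the result with operation.succeeded() as in the example above. A deadline keeps a stalled operation from blocking your script forever.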

What's Next

See the step-by-step guides for our supported workflows.

Workflows
Reference documentation