The request body is a series of records to match, one JSON object per line. Each record uses the same attributes as the unified dataset for the project. The request looks the same whether you are matching against records or clusters. For instance:

{"recordId":"8793219","record":{"NAME":["MANNY'S CAR WASH"],"CITY":["OAKLAND"],"ZIP":["94603"],"PHONE":["5556325115"],"STATE_CODE":["CA", "MA"]}}
{"recordId":"8800364","record":{"NAME":["BEST BEAUTY SALON"],"CITY":["SONORA"],"ZIP":["95370"],"PHONE":["5555324000"],"STATE_CODE":["CA"]}}
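A body in this shape can be produced programmatically. The following is a minimal sketch, assuming each record is available as an ID plus a dictionary of attribute names mapped to lists of string values; the helper name `build_request_body` is hypothetical and not part of any Tamr client.

```python
import json


def build_request_body(records):
    """Serialize (record_id, attributes) pairs as newline-delimited JSON bytes."""
    lines = []
    for record_id, attributes in records:
        # Each attribute value is a list of strings, as in the example above.
        payload = {"recordId": record_id, "record": attributes}
        lines.append(json.dumps(payload, separators=(",", ":")))
    # One JSON object per line; records are separated by a single newline.
    return "\n".join(lines).encode("utf-8")


body = build_request_body([
    ("8793219", {"NAME": ["MANNY'S CAR WASH"], "CITY": ["OAKLAND"]}),
    ("8800364", {"NAME": ["BEST BEAUTY SALON"], "CITY": ["SONORA"]}),
])
```

Because `json.dumps` escapes any newline inside a string value, each record stays on a single line.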

Requirements for Streaming Records

If you need to provide multiple records as an input, or stream records, use these tips:

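Swagger endpoints available within Tamr do not support a streaming response. To add multiple input records, use curl for this endpoint.
When making an LLM match request with curl, use the `--data-binary` option instead of `-d`. When reading data from a file, `-d` strips newlines, which would merge the newline-separated records into a single line.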
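For clients that do not use curl, the same requirement applies: the body must reach the server with its newlines intact. The sketch below builds such a request with Python's standard library. The URL is a hypothetical placeholder, the `Content-Type` value is an assumption, and the request is only constructed here, not sent.

```python
import urllib.request

# Hypothetical placeholder; substitute the match endpoint of your deployment.
MATCH_URL = "http://tamr.example.invalid/match-endpoint"


def build_match_request(url, body):
    """Wrap a newline-delimited body in a POST request without altering it."""
    return urllib.request.Request(
        url,
        data=body,  # bytes are passed through unchanged, so separators survive
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


body = (
    b'{"recordId":"1","record":{"NAME":["A"]}}\n'
    b'{"recordId":"2","record":{"NAME":["B"]}}'
)
request = build_match_request(MATCH_URL, body)
```

Passing `request` to `urllib.request.urlopen` would send it; the returned response object can then be read line by line as results arrive.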

Response Fields

The response body looks different depending on whether the posted records were matched against records or clusters.

Output

Matching record information is returned as a response stream, so matches are returned as soon as the first batch of match records is processed. For records, the response is similar to the following example:

{"queryRecordId":"8793219","matchedRecordId":"7117244409972542111","matchedOriginSourceId":"source1.csv","matchedOriginRecordId":"rec-654-org","suggestedLabel":"MATCH","suggestedLabelConfidence":1.0,"attributeSimilarities":{"name_default_cosine":1.0,"city_default_cosine":1.0,"phone_default_cosine":1.0}}
{"queryRecordId":"8800364","matchedRecordId":"7117244409972542111","matchedOriginSourceId":"source1.csv","matchedOriginRecordId":"rec-6541-org","suggestedLabel":"NON_MATCH","suggestedLabelConfidence":1.0,"attributeSimilarities":{"name_default_cosine":1.0,"city_default_cosine":0.0,"phone_default_cosine":1.0}}
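Because each response line is a complete JSON object, a client can process matches as they arrive instead of waiting for the whole body. The sketch below assumes the response is available as an iterable of byte lines (an in-memory stream stands in for it here) and groups the records labelled `MATCH` by query record. The function names are illustrative.

```python
import io
import json


def iter_match_lines(stream):
    """Yield one parsed JSON object per non-empty line of a byte stream."""
    for raw_line in stream:
        line = raw_line.strip()
        if line:
            yield json.loads(line)


def matches_by_query(stream):
    """Collect matchedRecordId values labelled MATCH under each queryRecordId."""
    grouped = {}
    for row in iter_match_lines(stream):
        if row["suggestedLabel"] == "MATCH":
            grouped.setdefault(row["queryRecordId"], []).append(row["matchedRecordId"])
    return grouped


# Shortened copies of the two example response lines above.
sample = io.BytesIO(
    b'{"queryRecordId":"8793219","matchedRecordId":"7117244409972542111",'
    b'"suggestedLabel":"MATCH","suggestedLabelConfidence":1.0}\n'
    b'{"queryRecordId":"8800364","matchedRecordId":"7117244409972542111",'
    b'"suggestedLabel":"NON_MATCH","suggestedLabelConfidence":1.0}\n'
)
grouped = matches_by_query(sample)
```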

For clusters of records, the response looks similar to this example:

{"entityId": "8793219", "clusterId": "c3", "avgMatchProb": 0.73}
{"entityId": "8800364", "clusterId": "c2", "avgMatchProb": 0.89}
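A consumer of the cluster response might keep only the assignments it considers reliable. The sketch below filters rows by `avgMatchProb`; the threshold of 0.8 is an arbitrary value chosen for illustration, not a recommended setting.

```python
import json


def confident_assignments(lines, min_prob):
    """Map entityId to clusterId for rows whose avgMatchProb is at least min_prob."""
    kept = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between records
        row = json.loads(line)
        if row["avgMatchProb"] >= min_prob:
            kept[row["entityId"]] = row["clusterId"]
    return kept


# The two example response lines above.
response_lines = [
    '{"entityId": "8793219", "clusterId": "c3", "avgMatchProb": 0.73}',
    '{"entityId": "8800364", "clusterId": "c2", "avgMatchProb": 0.89}',
]
assignments = confident_assignments(response_lines, min_prob=0.8)
```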

Record Parameters

Field	Description
queryRecordId	The ID of the record from the POST body.
matchedRecordId	The Tamr ID of the record returned as a match.
matchedOriginSourceId	The origin dataset of the record returned as a match.
matchedOriginRecordId	The origin ID of the record returned as a match.
suggestedLabel	`MATCH` or `NON_MATCH` for the record.
suggestedLabelConfidence	The confidence level of the label.
attributeSimilarities	A JSON object that maps each compared attribute to its similarity score.

Cluster Parameters

Field	Description
entityId	The ID of the record from the POST body.
clusterId	The ID of the cluster the record was compared against.
avgMatchProb	The average of the matching probability for the record against each record in the cluster.

API Properties

Request Type: Synchronous. Match requests use the Mastering project's most recent model.
Request Processing: Streaming
Response Processing: Streaming
Implementation Details: The following datasets are materialized:
- Features of unified source (tokens, parsed numbers)
- Binning data of the unified source
- Clustering of the unified source

Steps in the LLM Process

The matching operation performs these steps:

Runs pre-processing for similarity functions. Tokenizes records by parsing numbers as numeric values and converting text to tokens.
Bins records.
Generates record pairs, that is, generates pairs of (input-record, existing-record) that pass the binning model.
Predicts match or no match using the current matching model:

If using a 'record' match, rolls up pair match probabilities to get input record, or existing cluster associations.
If using a 'cluster' match, for each input record, selects the existing cluster with the highest similarity.
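The roll-up and selection in the final step can be illustrated with a toy example. This is a sketch under stated assumptions, not Tamr's implementation: it assumes pair probabilities are already available for the pairs that passed binning, averages only over those scored pairs, and uses made-up IDs and probabilities.

```python
from collections import defaultdict
from statistics import fmean


def best_cluster_per_record(pair_probs, cluster_of):
    """Average pair probabilities per (input record, cluster), then keep the top cluster."""
    by_cluster = defaultdict(list)
    for (input_id, existing_id), prob in pair_probs.items():
        # Roll up: collect each pair probability under the existing record's cluster.
        by_cluster[(input_id, cluster_of[existing_id])].append(prob)

    best = {}
    for (input_id, cluster_id), probs in by_cluster.items():
        avg = fmean(probs)
        # Select: keep the cluster with the highest average for each input record.
        if input_id not in best or avg > best[input_id][1]:
            best[input_id] = (cluster_id, avg)
    return best


# Hypothetical pair probabilities and cluster memberships.
pair_probs = {
    ("q1", "r1"): 0.9,
    ("q1", "r2"): 0.7,
    ("q1", "r3"): 0.2,
}
cluster_of = {"r1": "c1", "r2": "c1", "r3": "c2"}
result = best_cluster_per_record(pair_probs, cluster_of)
```

Here `q1` is assigned to `c1`, whose two scored pairs average higher than the single pair in `c2`.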