Perform LLM Match

Run a low latency match (LLM) job to find matches for input records, either against individual records or against clusters of records.

The request body is a series of JSON records to match, one per line. Each record has the same attributes as the project's unified dataset. The request body looks the same whether you are matching against records or against clusters. For instance:

{"recordId":"8793219","record":{"NAME":["MANNY'S CAR WASH"],"CITY":["OAKLAND"],"ZIP":["94603"],"PHONE":["5556325115"],"STATE_CODE":["CA", "MA"]}}
{"recordId":"8800364","record":{"NAME":["BEST BEAUTY SALON"],"CITY":["SONORA"],"ZIP":["95370"],"PHONE":["5555324000"],"STATE_CODE":["CA"]}}

Requirements for Streaming Records

To provide multiple records as input, or to stream records, follow these tips:

  • The Swagger endpoints available within Tamr do not support a streaming response. To submit multiple input records, call this endpoint with curl instead.
  • When making an LLM match request with curl, use the --data-binary option instead of -d. The -d option strips newlines when it reads data from a file, and newlines are what separate the records.
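
The reason for preferring --data-binary can be illustrated offline. When curl reads a file with -d, it removes carriage returns and newlines before sending, whereas --data-binary sends the bytes unchanged. The Python sketch below only imitates that stripping on a small sample body (the two sample records are made up for illustration) and shows that a line-by-line JSON parser can no longer separate the records. How the server itself reacts to such a body is not covered here.

```python
import json

# Two illustrative records, one per line.
body = (
    '{"recordId":"1","record":{"NAME":["A"]}}\n'
    '{"recordId":"2","record":{"NAME":["B"]}}\n'
)


def parse_lines(payload):
    """Parse newline-separated JSON, skipping blank lines."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Body sent unchanged, as --data-binary would send it.
intact = parse_lines(body)

# Body with line breaks removed, imitating -d reading from a file.
stripped = body.replace("\r", "").replace("\n", "")
try:
    parse_lines(stripped)
    merged_ok = True
except json.JSONDecodeError:
    # The two objects are now glued together on one line.
    merged_ok = False

print(len(intact), merged_ok)
```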

Response Fields

The response body looks different depending on whether the posted records were matched against records or clusters.

Output

Matching record information is returned as a response stream, so matches are returned as soon as the first batch of match records is processed. For records, the response is similar to the following example:

{"queryRecordId":"8793219","matchedRecordId":"7117244409972542111","matchedOriginSourceId":"source1.csv","matchedOriginRecordId":"rec-654-org","suggestedLabel":"MATCH","suggestedLabelConfidence":1.0,"attributeSimilarities":{"name_default_cosine":1.0,"city_default_cosine":1.0,"phone_default_cosine":1.0}}
{"queryRecordId":"8800364","matchedRecordId":"7117244409972542111","matchedOriginSourceId":"source1.csv","matchedOriginRecordId":"rec-6541-org","suggestedLabel":"NON_MATCH","suggestedLabelConfidence":1.0,"attributeSimilarities":{"name_default_cosine":1.0,"city_default_cosine":0.0,"phone_default_cosine":1.0}}

For clusters of records, the response looks similar to this example:

{"entityId": "8793219", "clusterId": "c3", "avgMatchProb": 0.73}
{"entityId": "8800364", "clusterId": "c2", "avgMatchProb": 0.89}

Record Parameters

  • queryRecordId: The ID of the record from the POST body.
  • matchedRecordId: The Tamr ID of the record returned as a match.
  • matchedOriginSourceId: The origin dataset of the record returned as a match.
  • matchedOriginRecordId: The origin ID of the record returned as a match.
  • suggestedLabel: MATCH or NON_MATCH for the record.
  • suggestedLabelConfidence: The confidence level of the label.
  • attributeSimilarities: A JSON object that lists each attribute compared and the similarity score for that attribute.

Cluster Parameters

  • entityId: The ID of the record from the POST body.
  • clusterId: The ID of the cluster the record was compared against.
  • avgMatchProb: The average of the matching probability for the record against each record in the cluster.

API Properties

  • Request Type: Synchronous. Match requests use the Mastering project's most recent model.
  • Request Processing: Streaming
  • Response Processing: Streaming
  • Implementation Details: The following datasets are materialized:
    • Features of unified source (tokens, parsed numbers)
    • Binning data of the unified source
    • Clustering of the unified source

Steps in the LLM Process

The matching operation performs these steps:

  1. Runs pre-processing for similarity functions. Tokenizes records by treating numbers as numbers and converting text records to tokens. Bins records.
  2. Generates record pairs, that is, generates pairs of (input-record, existing-record) that pass the binning model.
  3. Predicts match or no match using the current matching model:
    • If using a 'record' match, rolls up pair match probabilities to get input record, or existing cluster associations.
    • If using a 'cluster' match, for each input record, selects the existing cluster with the highest similarity.
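
The roll-up in step 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tamr's implementation: the pair probabilities are taken as given (in the real process they come from the matching model), the function name roll_up_to_clusters and both input mappings are hypothetical, and averaging is used because avgMatchProb is described as an average. Only pairs present in the input are averaged.

```python
from collections import defaultdict


def roll_up_to_clusters(pair_probs, cluster_of):
    """Average pair probabilities per (input, cluster) and pick the best.

    pair_probs: {(input_id, existing_id): match probability}
    cluster_of: {existing_id: cluster_id}
    Returns {input_id: (cluster_id, average probability)}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # [sum, count] per (input, cluster)
    for (input_id, existing_id), prob in pair_probs.items():
        acc = totals[(input_id, cluster_of[existing_id])]
        acc[0] += prob
        acc[1] += 1

    best = {}
    for (input_id, cluster_id), (total, count) in totals.items():
        avg = total / count
        # Keep the cluster with the highest average for each input record.
        if input_id not in best or avg > best[input_id][1]:
            best[input_id] = (cluster_id, avg)
    return best


# Made-up probabilities for one input record against three existing records.
pair_probs = {("q1", "r1"): 1.0, ("q1", "r2"): 0.5, ("q1", "r3"): 0.25}
cluster_of = {"r1": "c1", "r2": "c1", "r3": "c2"}
print(roll_up_to_clusters(pair_probs, cluster_of))
```

In this toy input, the record q1 is paired with two members of cluster c1 and one member of c2, so c1 is selected with an average of 0.75.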