Perform LLM Match

Run a low latency match (LLM) job to find matches for records or for associated clusters of records.

The request body is a series of records to match, separated by newlines, which have the same attributes as the unified dataset for the project. The request will look the same whether you are matching records or clusters. For instance:

{"recordId":"8793219","record":{"NAME":["MANNY'S CAR WASH"],"CITY":["OAKLAND"],"ZIP":["94603"],"PHONE":["5556325115"],"STATE_CODE":["CA", "MA"]}}
{"recordId":"8800364","record":{"NAME":["BEST BEAUTY SALON"],"CITY":["SONORA"],"ZIP":["95370"],"PHONE":["5555324000"],"STATE_CODE":["CA"]}}

Requirements for Streaming Records

To provide multiple records as input, or to stream records, follow these tips:

  • The Swagger endpoints available within Tamr do not support a streaming response. To send multiple input records, call this endpoint with curl instead.
  • When making an LLM match request with curl, use the '--data-binary' option instead of '-d'. When curl reads the body from a file, '-d' strips newlines, which removes the separators between records.
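The effect of losing the newlines can be illustrated with a short Python sketch. It imitates the stripping that curl applies to a file passed with '-d' and counts how many lines still parse as standalone JSON records. The sample body and the helper name `count_parseable_records` are illustrative assumptions; how the server responds to a fused body is not shown here.

```python
import json

body = (
    '{"recordId":"1","record":{"NAME":["A"]}}\n'
    '{"recordId":"2","record":{"NAME":["B"]}}\n'
)

def count_parseable_records(payload):
    """Count non-empty lines that parse as standalone JSON documents."""
    count = 0
    for line in payload.splitlines():
        if not line.strip():
            continue
        try:
            json.loads(line)
            count += 1
        except json.JSONDecodeError:
            pass  # two objects fused on one line are not a valid JSON document
    return count

# Imitation of the stripping: carriage returns and newlines are removed.
stripped = body.replace("\r", "").replace("\n", "")

intact_count = count_parseable_records(body)     # separators preserved
fused_count = count_parseable_records(stripped)  # separators lost
```

With the separators intact both records parse; once they are stripped, the two objects run together on one line and neither can be read as a separate record.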

Response Fields

The response body looks different depending on whether the posted records were matched against records or clusters.

Output

Matching record information is returned as a response stream, so matches are returned as soon as the first batch of match records is processed. For records, the response is similar to the following example:

{"queryRecordId":"8793219","matchedRecordId":"7117244409972542111","matchedOriginSourceId":"source1.csv","matchedOriginRecordId":"rec-654-org","suggestedLabel":"MATCH","suggestedLabelConfidence":1.0,"attributeSimilarities":{"name_default_cosine":1.0,"city_default_cosine":1.0,"phone_default_cosine":1.0}}
{"queryRecordId":"8800364","matchedRecordId":"7117244409972542111","matchedOriginSourceId":"source1.csv","matchedOriginRecordId":"rec-6541-org","suggestedLabel":"NON_MATCH","suggestedLabelConfidence":1.0,"attributeSimilarities":{"name_default_cosine":1.0,"city_default_cosine":0.0,"phone_default_cosine":1.0}}

For clusters of records, the response looks similar to this example:

{"entityId": "8793219", "clusterId": "c3", "avgMatchProb": 0.73}
{"entityId": "8800364", "clusterId": "c2", "avgMatchProb": 0.89}

Record Parameters

Field                     Description
queryRecordId             The ID of the record from the POST body.
matchedRecordId           The Tamr ID of the record returned as a match.
matchedOriginSourceId     The origin dataset of the record returned as a match.
matchedOriginRecordId     The origin ID of the record returned as a match.
suggestedLabel            MATCH or NON_MATCH for the record.
suggestedLabelConfidence  The confidence level of the label.
attributeSimilarities     A JSON object listing each attribute compared and the similarity score for that attribute.

Cluster Parameters

Field         Description
entityId      The ID of the record from the POST body.
clusterId     The ID of the cluster the record was compared against.
avgMatchProb  The average of the matching probability for the record against each record in the cluster.

API Properties

  • Request Type: Synchronous. Match requests use the Mastering project's most recent model.
  • Request Processing: Streaming
  • Response Processing: Streaming
  • Implementation Details: The following datasets are materialized:
    • Features of unified source (tokens, parsed numbers)
    • Binning data of the unified source
    • Clustering of the unified source

Steps in the LLM Process

The matching operation performs these steps:

  1. Runs pre-processing for similarity functions. Tokenizes records by treating numbers as numbers and converting text records to tokens. Bins records.
  2. Generates record pairs, that is, generates pairs of (input-record, existing-record) that pass the binning model.
  3. Predicts match or no match using the current matching model:
    • If using a 'record' match, rolls up pair match probabilities to get input record, or existing cluster associations.
    • If using a 'cluster' match, for each input record, selects the existing cluster with the highest similarity.
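The steps above can be pictured with a toy Python sketch. Everything in it is an illustrative assumption rather than Tamr's implementation: the whitespace tokenizer, the "share at least one token" binning rule, and the Jaccard overlap used as a stand-in for the trained matching model are all hypothetical, as are the sample records and cluster assignments.

```python
from collections import defaultdict

def tokenize(value):
    """Split on whitespace; digit-only tokens become ints, others lowercase strings."""
    return {int(part) if part.isdigit() else part.lower() for part in value.split()}

def candidate_ids(query, existing):
    """Toy binning: keep existing records that share at least one token with the query."""
    query_tokens = tokenize(query)
    return [rid for rid, text in existing.items() if query_tokens & tokenize(text)]

def toy_match_probability(a, b):
    """Stand-in for the matching model: Jaccard overlap of the token sets."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb)

def best_cluster(query, existing, cluster_of):
    """Average pair probabilities per cluster and return the top (cluster, average)."""
    per_cluster = defaultdict(list)
    for rid in candidate_ids(query, existing):
        prob = toy_match_probability(query, existing[rid])
        per_cluster[cluster_of[rid]].append(prob)
    if not per_cluster:
        return None  # no existing record passed the binning step
    averages = {cid: sum(p) / len(p) for cid, p in per_cluster.items()}
    top = max(averages, key=averages.get)
    return top, averages[top]

existing = {
    "r1": "MANNYS CAR WASH 94603",
    "r2": "MANNY CAR WASH 94603",
    "r3": "BEST BEAUTY SALON 95370",
}
cluster_of = {"r1": "c1", "r2": "c1", "r3": "c2"}
result = best_cluster("MANNYS CAR WASH 94603", existing, cluster_of)
```

In this example the query shares no tokens with "r3", so only the pairs with "r1" and "r2" are generated, and their probabilities are averaged into a single score for cluster "c1". The averaging mirrors the avgMatchProb description in the Cluster Parameters table.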