Returns a stream of PredictionPair
Note: This endpoint pairs each incoming query record that passes the blocking model with records in an existing mastering project, and returns a match/no-match label for each pair. Along with the v1/projects/{project}:matchClusters endpoint, it replaces the functionality of the v1/projects/{project}:match endpoint, which remains available for backward compatibility.
The request body is a set of external records to match against records in an existing mastering project. Separate the records with newlines. Each external record must have the same attributes as the unified dataset for the project.
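As a minimal sketch of the newline-separated body described above, the following Python helper serializes a list of records with one record per line. It assumes that each record is sent as a JSON object; the function name build_request_body and the attribute names in the usage lines are hypothetical and are not part of the Tamr API.

```python
import json


def build_request_body(records):
    """Serialize records as newline-separated JSON, one record per line.

    Assumption: each record is a JSON object whose keys match the
    attributes of the project's unified dataset.
    """
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    # A trailing newline terminates the final record.
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical records; real attribute names come from the unified dataset.
example_records = [
    {"name": "Acme Corp", "city": "Boston"},
    {"name": "Acme Corporation", "city": "Boston"},
]
example_body = build_request_body(example_records)
```

Compact separators keep each record on a single line, so that a newline only ever marks a record boundary.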
Requirements for Streaming Records
To provide multiple records as an input, or stream records, follow these guidelines:
- Swagger endpoints available within Tamr Core do not support streaming. To add multiple input records, use cURL for this endpoint.
- When making an LLM match request with cURL, use the --data-binary option instead of the -d option. When -d reads data from a file, cURL strips the newlines that separate the records.
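For clients other than cURL, the --data-binary guidance amounts to sending the request body byte for byte. The sketch below builds (but does not send) such a request with Python's standard library. The function name make_match_request is hypothetical, the endpoint URL and any authentication headers are left to the caller, and this is an illustration rather than a documented Tamr client.

```python
import urllib.request


def make_match_request(endpoint_url, body, headers=None):
    """Build a POST request whose body is passed through unchanged.

    Assumptions: endpoint_url is the full URL of the endpoint, and
    headers carries whatever authentication the deployment requires.
    """
    if not isinstance(body, bytes):
        # Requiring bytes avoids any implicit re-encoding of the
        # newline separators between records.
        raise TypeError("body must be bytes so that newlines are sent unchanged")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers=headers or {},
        method="POST",
    )
```

Passing the resulting object to urllib.request.urlopen sends the request, and the response object it returns can be read line by line.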
Response Fields
Match/no-match record information is returned as a response stream, so matches are returned as soon as the first batch of records is processed. For record pairs, the response output is similar to the following example:
{
"queryRecordId": "query-record-id",
"matchedRecordId": "matched-record-id",
"suggestedLabel": "MATCH",
"suggestedLabelConfidence": 0.94,
"matchProbability": 0.55,
"attributeSimilarities": {
"f1": 0.0,
"f2": 1.0,
"f3": 0.5
},
"userDefinedSignals": {
"s1": 0.6,
"s2": 0.8
},
"matchingFunctionsPredictions": {
"mf1": "MATCH",
"mf2": "NON_MATCH"
},
"overrideMatchingFunctionMajorityPrediction": 0.46,
"finalMatchProbability": 0.81
}
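Because results arrive as a stream, a client can handle each pair as soon as it is received. The following sketch assumes that every pair arrives as one JSON object on its own line, which the pretty-printed example above does not confirm; the function name iter_prediction_pairs is hypothetical.

```python
import json


def iter_prediction_pairs(lines):
    """Yield one dict per non-empty line of the response stream.

    Assumption: the stream is newline-delimited JSON, with one
    prediction pair per line. Accepts str or bytes lines.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if raw:  # skip blank lines
            yield json.loads(raw)


def matched_ids(lines):
    """Collect (queryRecordId, matchedRecordId) for pairs labeled MATCH."""
    return [
        (pair["queryRecordId"], pair["matchedRecordId"])
        for pair in iter_prediction_pairs(lines)
        if pair.get("suggestedLabel") == "MATCH"
    ]
```

A generator is used so that pairs are processed incrementally instead of waiting for the whole response.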
Record Parameters
Field | Description |
---|---|
queryRecordId | The ID of the record from the POST body. |
matchedRecordId | The Tamr ID of the record paired with the query record by the blocking model. |
matchedOriginSourceId | The origin dataset of the record returned as a match. |
matchedOriginRecordId | The origin ID of the record returned as a match. |
suggestedLabel | The suggested label for the record pair: MATCH or NON_MATCH. |
suggestedLabelConfidence | The confidence level of the label. |
attributeSimilarities | A JSON object that maps each attribute that was compared to its similarity score for the record pair. |
matchProbability | The match probability of the record pair. This value is in the range [0, 1], where 0 indicates a highly probable non-match and 1 indicates a highly probable match. |
userDefinedSignals | A JSON object that maps the external ID of each user-defined signal to the signal value for the record pair. |
matchingFunctionsPredictions | A JSON object that maps the external ID of each matching function to the function output for the record pair. |
overrideMatchingFunctionMajorityPrediction | The overall match prediction across all override matching functions (if any), where 0 indicates a non-match and 1 indicates a match. |
finalMatchProbability | The final match probability of the record pair. It equals overrideMatchingFunctionMajorityPrediction if that value is defined; otherwise, it equals matchProbability. |
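The rule for finalMatchProbability in the table above can be sketched as a small function. It treats a missing or null overrideMatchingFunctionMajorityPrediction as "not defined", which is an assumption about how the field is serialized; the function name is hypothetical.

```python
def final_match_probability(pair):
    """Return the override prediction if defined, else matchProbability.

    Assumption: "not defined" means the key is absent or its value is null.
    """
    override = pair.get("overrideMatchingFunctionMajorityPrediction")
    # Compare against None explicitly: an override of 0 is a valid
    # value (non-match) and must not fall through to matchProbability.
    if override is not None:
        return override
    return pair["matchProbability"]
```

The explicit None check matters because a truthiness test would discard an override of 0.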
API Properties
- Request Type: Synchronous. Match requests use the mastering project's most recent model.
- Request Processing: Streaming
- Response Processing: Streaming
Note: This method has two query parameters that are available in limited release: transform and defaultSourceName. Tamr recommends using these parameters for testing purposes only.