Searching Records
Search records in the unified dataset using basic or advanced search syntax.
Basic Search
Search is case-insensitive.
Exact Phrases
To search for exact phrases, use double quotes.
"Machine Learning"
Attribute Names
To search for values in a specific dataset attribute, prefix the attribute name with tamr__.
tamr__vendor_name:Tamr
Golden Records
To search for values in a specific golden record attribute, prefix the attribute name with gr__.
gr__<attribute_name>:"<value>"
To find empty or blank values in golden records, first select all records with a wildcard search on the attribute, and then negate the same wildcard search on the attribute's .raw facet:
gr__<attribute_name>:* AND NOT gr__<attribute_name>.raw:*
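The two prefixes and the golden record empty-value pattern can be captured in small string-building helpers. This is a minimal sketch: field_query and golden_empty_query are hypothetical helpers, not part of Tamr Core, and they assume values contain no double quotes or other reserved characters.

```python
def field_query(attribute: str, value: str, golden: bool = False) -> str:
    """Build an attribute-scoped exact-phrase search (hypothetical helper).

    Dataset attributes take the tamr__ prefix; golden record attributes take gr__.
    Assumes `value` contains no double quotes or other reserved characters.
    """
    prefix = "gr__" if golden else "tamr__"
    return f'{prefix}{attribute}:"{value}"'


def golden_empty_query(attribute: str) -> str:
    """Golden record empty-value search: wildcard select, then negate the .raw wildcard."""
    return f"gr__{attribute}:* AND NOT gr__{attribute}.raw:*"


print(field_query("vendor_name", "Tamr"))               # tamr__vendor_name:"Tamr"
print(field_query("vendor_name", "Tamr", golden=True))  # gr__vendor_name:"Tamr"
print(golden_empty_query("vendor_name"))                # gr__vendor_name:* AND NOT gr__vendor_name.raw:*
```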
Logical Operators
The logical operators AND, OR, and NOT must be written in capital letters. Instead of AND, OR, and NOT, you can also use &&, ||, and !, respectively.
tamr__vendor_name:("Tamr" OR "Datatamr") AND NOT tamr__part_description:deterministic
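Longer queries like this one are easier to assemble from parts than to type by hand. The sketch below rebuilds the example query; any_of and and_not are hypothetical helpers, and the values are assumed to need no escaping.

```python
def any_of(attribute: str, values: list[str]) -> str:
    """Match a dataset attribute against any of several exact phrases."""
    quoted = " OR ".join(f'"{v}"' for v in values)
    return f"tamr__{attribute}:({quoted})"


def and_not(include: str, exclude: str) -> str:
    """Combine two clauses so results match `include` but not `exclude`."""
    return f"{include} AND NOT {exclude}"


query = and_not(
    any_of("vendor_name", ["Tamr", "Datatamr"]),
    "tamr__part_description:deterministic",
)
print(query)
# tamr__vendor_name:("Tamr" OR "Datatamr") AND NOT tamr__part_description:deterministic
```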
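The + operator requires that results contain a term. The - operator requires that results do not contain a term.
For example, the following query returns records whose description must contain sheet and may also contain plate: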
tamr__description:(plate +sheet)
Note: If you omit an operator between terms, OR is used by default.
For example, the following queries are equivalent:
tamr__description:(plate sheet)
and
tamr__description:(plate OR sheet)
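To make the +, -, and default OR behavior concrete, here is a simplified model of how the terms inside one group decide whether a record matches. It assumes the usual Lucene boolean semantics behind Elasticsearch query strings, works on already-tokenized lowercase terms, and ignores relevance scoring; matches is a hypothetical function, not Tamr Core code.

```python
def matches(tokens: set[str], terms: list[str]) -> bool:
    """Simplified model of +term, -term, and bare terms within one query group."""
    required = {t[1:] for t in terms if t.startswith("+")}
    prohibited = {t[1:] for t in terms if t.startswith("-")}
    optional = {t for t in terms if not t.startswith(("+", "-"))}

    # Every +term must be present, and no -term may be present.
    if (not required <= tokens) or (prohibited & tokens):
        return False
    # With at least one +term, bare terms are optional extras.
    if required:
        return True
    # Otherwise bare terms are combined with OR: at least one must be present.
    return bool(optional & tokens)


print(matches({"steel", "sheet"}, ["plate", "+sheet"]))  # True: sheet is required and present
print(matches({"plate"}, ["plate", "+sheet"]))           # False: required sheet is missing
print(matches({"plate"}, ["plate", "sheet"]))            # True: no operator means OR
print(matches({"plate", "sheet"}, ["plate", "-sheet"]))  # False: sheet is prohibited
```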
Range Filtering
To use a range filter in search, use the operators < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to). You can also use = (equals), but this is equivalent to a normal search.
Note: Do not add spaces between the attribute name, the operator, or the filter boundary.
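To run a search that compares string type values, use syntax as in the following example. This search filters to records where the string type attribute age is greater than or equal to 18. String values are compared character by character rather than numerically; to compare numbers, add .numeric as described below.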
tamr__age:>=18
You can combine range filtering in searches with other operators, for example:
tamr__age:(>=18 AND <=25)
Comparing string type values is useful for filtering on dates. If dates use ISO format (YYYY-MM-DD), their string order matches their chronological order, so you can compare them as strings, as in the following example. This search finds records with dates in January 2020.
tamr__date:(>=2020-01-01 AND <2020-02-01)
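Because the upper bound of a month is the first day of the following month, month ranges are easy to get wrong around December. A minimal sketch of a hypothetical helper that builds a half-open month range, assuming the attribute stores ISO dates:

```python
from datetime import date


def month_range_query(attribute: str, year: int, month: int) -> str:
    """Build a half-open range covering one calendar month (hypothetical helper).

    Relies on ISO dates (YYYY-MM-DD) sorting as strings in chronological order.
    The upper bound is the first day of the next month, excluded with <.
    """
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    # No spaces between the attribute name, the operator, and the boundary.
    return f"tamr__{attribute}:(>={start.isoformat()} AND <{end.isoformat()})"


print(month_range_query("date", 2020, 1))   # tamr__date:(>=2020-01-01 AND <2020-02-01)
print(month_range_query("date", 2020, 12))  # tamr__date:(>=2020-12-01 AND <2021-01-01)
```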
To run a search that compares numeric values, add .numeric after the attribute name, as in the following example. Both integer and floating-point numbers are supported.
tamr__age.numeric:>=18
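The difference between string and numeric comparison can be illustrated with Python's own comparisons. This sketch assumes the index orders string values character by character, as Python does (the same property that makes the ISO date example work); it illustrates the idea rather than reproducing Tamr Core internals.

```python
values = ["9", "18", "25", "100"]

# String comparison: "9" >= "18" because the character "9" sorts after "1",
# while "100" < "18" because "0" sorts before "8".
string_matches = [v for v in values if v >= "18"]

# Numeric comparison, which is what the .numeric suffix asks for.
numeric_matches = [v for v in values if float(v) >= 18]

print(string_matches)   # ['9', '18', '25']
print(numeric_matches)  # ['18', '25', '100']
```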
Advanced Search
Fixed Length Expressions
To match a fixed number of unknown characters, use the ? wildcard, which matches exactly one character. Use one ? for each unknown character.
tamr__part_id: BD???908
Variable Length Expressions
To match a variable number of characters, use the * wildcard, which matches zero or more characters.
tamr__vendor_name: Tam*r
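The two wildcards can be modeled by translating them into an anchored regular expression, where ? becomes exactly one character and * becomes zero or more. This is a sketch for checking patterns against sample values locally; it assumes the pattern is matched against one whole value and ignores how Elasticsearch tokenizes text. wildcard_match is a hypothetical helper.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Return True if `value` matches a ? / * wildcard pattern in full."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "*":
            parts.append(".*")           # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # fullmatch anchors the pattern to the whole value; search is case-insensitive.
    regex = "".join(parts)
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


print(wildcard_match("BD???908", "BD123908"))  # True
print(wildcard_match("BD???908", "BD12908"))   # False: only two characters between BD and 908
print(wildcard_match("Tam*r", "Tamr"))         # True: * can match zero characters
print(wildcard_match("Tam*r", "Tamber"))       # True
```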
Blank Values
To search for records with blank values, search for an empty string on the .raw facet of the attribute.
tamr__vendor_spend.raw:""
Null Values
To find null values, first select all records, and then negate a wildcard search on the .raw facet of the attribute. That is, match every record (here, with a wildcard on tamr__tamr_id), and then use AND NOT to exclude records where the attribute has any value.
tamr__tamr_id:* AND NOT tamr__vendor_spend.raw:*
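The blank and null patterns differ only in how they use .raw, which makes them easy to mix up. A minimal sketch with hypothetical helpers that reproduce the two example queries; the default select_all clause assumes every record has a tamr_id attribute, as in the example above.

```python
def blank_query(attribute: str) -> str:
    """Records whose untokenized (.raw) value is an empty string."""
    return f'tamr__{attribute}.raw:""'


def null_query(attribute: str, select_all: str = "tamr__tamr_id:*") -> str:
    """Records with no value: select every record, then exclude any .raw value."""
    return f"{select_all} AND NOT tamr__{attribute}.raw:*"


print(blank_query("vendor_spend"))  # tamr__vendor_spend.raw:""
print(null_query("vendor_spend"))   # tamr__tamr_id:* AND NOT tamr__vendor_spend.raw:*
```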
Regular Expressions
Regular expressions must be wrapped in forward slashes (/).
tamr__vendor_number: /0*100250/
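Note: You can use regular expressions to find records that contain digits in a given field. Because a regular expression must match an entire term, not just part of it, surround the pattern with .* to match a digit anywhere. For example, to return records that contain any digit 0-9 in the vendor_number column, use:
tamr__vendor_number: /.*[0-9].*/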
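Elasticsearch regular expressions are anchored: a pattern must match a whole term, not a substring of it. Python's re.fullmatch has the same property, so it is a convenient way to try a pattern on sample values before searching. This is only a sketch: Lucene's regular expression syntax differs from Python's in places (for example, it has no ^ or $ anchors), though the simple patterns on this page behave the same in both.

```python
import re

# re.fullmatch succeeds only if the pattern matches the entire string,
# mirroring how a Lucene regular expression must match a whole term.
print(re.fullmatch(r"0*100250", "000100250") is not None)  # True
print(re.fullmatch(r"0*100250", "1002501") is not None)    # False: trailing 1 is not matched
print(re.fullmatch(r"[0-9]", "100250") is not None)        # False: [0-9] is one character
print(re.fullmatch(r".*[0-9].*", "100250") is not None)    # True: a digit anywhere
```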
Searching for Values with Punctuation
In regular searches for attribute values, Elasticsearch tokenizes the value text, and tokenization discards special characters.
To search for characters that tokenization discards, search on the .raw facet of the attribute. For example, instead of tamr__text:"search string", search on tamr__text.raw:"search string".
When searching with .raw, make sure your search matches the entire text of the attribute values being searched, and use regular expressions when needed. For example, the following search finds values that contain a pipe character (|):
tamr__myattribute.raw:/.*\|.*/
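The sketch below shows why .raw is needed. It uses a crude stand-in for tokenization (runs of word characters, lowercased), which is an assumption; the actual Elasticsearch analyzer differs in detail, but like this stand-in it does not keep punctuation such as | as a searchable token.

```python
import re

value = "Acme|Industrial"

# Rough stand-in for tokenization: keep runs of word characters, lowercased.
tokens = [t.lower() for t in re.findall(r"\w+", value)]
print(tokens)  # ['acme', 'industrial']: the pipe is gone

# The .raw facet keeps the whole untokenized value, so a regex spanning the
# entire value (leading and trailing .*) can find the pipe character.
print(re.fullmatch(r".*\|.*", value) is not None)  # True
```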
Excluding Records
To run a successful search query that excludes items, first select records, and then use NOT to filter out some records from those results.
Note: Negative search queries require a wildcard * in the first part of the query and the negative search in the second part. First, select all records with a wildcard, and then add a negative search to filter out records based on your criteria.
For example, to exclude a keyword from search, first use the wildcard * to select all records, and then use NOT to filter out the keyword.
tamr__vendor_name:(* NOT "Tamr")
To find records that do not have a level 2 categorization, first use the wildcard * to select all records that have a value in the Level1 attribute, and then exclude records that have a value in the Level2 attribute, as in the following example:
tamr__Level1:* AND NOT tamr__Level2:*
To exclude records with a specified category, select all records using the wildcard *, and then exclude the category by specifying its categoryId, as in the following example:
* AND NOT suggestedCategorization.categoryId: 852
For more information, see the Elasticsearch Reference.
Columns With Spaces
To search for an attribute that has spaces in its name, escape each space with a backslash (\).
tamr__Column\ With\ Spaces:value
Escaping Reserved Characters
The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
To search for values that contain reserved characters, escape each reserved character with a backslash (\).
tamr__vendor_formula:\(1\+1\)\=2
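Escaping by hand is error-prone, so a small helper can apply these rules. A minimal sketch with hypothetical helpers: escape_value backslash-escapes the reserved characters listed above (escaping each & and | individually to cover && and ||), and escape_attribute escapes spaces in attribute names. One assumption departs from the list: the Elasticsearch Reference states that < and > cannot be escaped at all, so this sketch removes them instead.

```python
# Reserved characters from the list on this page, except < and >, which the
# Elasticsearch Reference says cannot be escaped and are dropped below.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_value(text: str) -> str:
    """Backslash-escape reserved characters in a search value (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in "<>":
            continue  # cannot be escaped, so drop it from the value
        if ch in RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def escape_attribute(name: str) -> str:
    """Escape each space in an attribute name with a backslash."""
    return name.replace(" ", "\\ ")


print("tamr__vendor_formula:" + escape_value("(1+1)=2"))
# tamr__vendor_formula:\(1\+1\)\=2
print("tamr__" + escape_attribute("Column With Spaces") + ":value")
# tamr__Column\ With\ Spaces:value
```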
Advanced Metadata Search
If you categorize and add your own labels to records, Tamr Core attaches metadata to these records. Mastering and categorization projects have their own lists of metadata that become associated with records or record pairs.
Searching in Categorization Projects
You can search the metadata that Tamr Core adds to categorized records.
To search any of this metadata, use the form manualCategorization.[metadata]: <searchTerm>, where [metadata] is one of the fields in the following list. Depending on the field, you can replace manualCategorization with suggestedCategorization, as indicated in the list.
Tamr Core adds the following metadata to categorized records:
- categoryId (manualCategorization or suggestedCategorization). Data type: Long. The unique ID of the category, used in searches.
- reason (manualCategorization). Data type: String. The description that users can add when categorizing records. You can search these descriptions by keyword, exact phrase, or regular expression match.
- score (suggestedCategorization). Data type: Long. The confidence score that the machine learning model associates with the suggested categorization. The score range is [0,1].
- timestamp (manualCategorization or suggestedCategorization). Data type: Int. The timestamp when the record was categorized.
- username (manualCategorization). Data type: String. The username of the user who created the manual categorization.
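The list above says which metadata belongs to manual and which to suggested categorizations. A minimal sketch that encodes it as a lookup table so a hypothetical query builder can reject combinations the list does not define, such as a score on a manual categorization. The search terms in the usage lines are illustrative only.

```python
# Which categorization objects carry each metadata field, per the list above.
METADATA_SOURCES = {
    "categoryId": {"manualCategorization", "suggestedCategorization"},
    "reason": {"manualCategorization"},
    "score": {"suggestedCategorization"},
    "timestamp": {"manualCategorization", "suggestedCategorization"},
    "username": {"manualCategorization"},
}


def metadata_query(source: str, field: str, search_term: str) -> str:
    """Build source.field:search_term, rejecting combinations the list does not define."""
    if source not in METADATA_SOURCES.get(field, set()):
        raise ValueError(f"{source} does not carry {field!r} metadata")
    return f"{source}.{field}:{search_term}"


print(metadata_query("suggestedCategorization", "categoryId", "852"))
# suggestedCategorization.categoryId:852
print(metadata_query("manualCategorization", "reason", '"duplicate vendor"'))
# manualCategorization.reason:"duplicate vendor"
```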
Searching in Mastering Projects
To search for records on the Clusters tab of a Mastering project, use the following metadata. For more information, see Searching Cluster Records.
- Cluster Id. Use this field to find records in a newly formed cluster (that is, a cluster that does not yet have a persistent cluster Id). For example, cluster.id.raw:"<cluster-id>".
- Published Cluster Id. Use this field to find records associated with a persistent cluster Id. For example, publishedCluster.id.raw:"<cluster-id>".
For example, to search for clusters that contain more than 40 records, you can use this syntax:
cluster.recordCount:(>40)
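Choosing between the two cluster ID fields depends on whether the cluster already has a persistent ID. A minimal sketch with hypothetical helpers; the cluster IDs in the usage lines are made up, and IDs are passed through unescaped, which assumes they contain no double quotes.

```python
def cluster_query(cluster_id: str, published: bool = False) -> str:
    """Find the records in one cluster (hypothetical helper).

    published=True searches persistent (published) cluster IDs; published=False
    searches newly formed clusters that do not have a persistent ID yet.
    """
    field = "publishedCluster.id.raw" if published else "cluster.id.raw"
    return f'{field}:"{cluster_id}"'


def larger_than_query(record_count: int) -> str:
    """Find clusters that contain more than `record_count` records."""
    return f"cluster.recordCount:(>{record_count})"


print(cluster_query("abc123"))                  # cluster.id.raw:"abc123"
print(cluster_query("abc123", published=True))  # publishedCluster.id.raw:"abc123"
print(larger_than_query(40))                    # cluster.recordCount:(>40)
```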