
Searching Records

Search records in the unified dataset using basic or advanced search syntax.

Basic Search

Search is case-insensitive.

Exact Phrases

To search for exact phrases, use double quotes.

"Machine Learning"

Attribute Names

To search for attribute names in datasets, use the prefix tamr__.

tamr__vendor_name:Tamr

Golden Records

To search for attribute names in golden records, use the prefix gr__.

gr__<attribute_name>:"<value>"

To search for empty or blank values in golden records, first select all records that have the attribute, and then negate the .raw keyword:

gr__<attribute_name>:* AND NOT gr__<attribute_name>.raw:*

Logical Operators

The logical operators AND, OR, and NOT must be capitalized.
Instead of AND, OR, and NOT, you can also use &&, ||, and !, respectively.

tamr__vendor_name:("Tamr" OR "Datatamr") AND NOT tamr__part_description:deterministic

The + operator requires that results contain a term.
The - operator requires that results do not contain a term.

For example:

tamr__description:(plate +sheet)

Note: Omitting an operator defaults to OR.

For example, the following queries are equivalent:

tamr__description:(plate sheet)

and

tamr__description:(plate OR sheet)

Range Filtering

To use a range filter in search, use the operators < (less than), <= (less than or equal to), > (greater than), and >= (greater than or equal to). You can also use = (equals), but this is equivalent to a normal search.

Note: Do not add spaces between the attribute name, the operator, and the filter boundary.

To run a search that compares string type values, use syntax as in the following example. This search filters to records where a string type attribute age is greater than or equal to 18.

tamr__age:>=18

You can combine range filtering in searches with other operators, for example:

tamr__age:(>=18 AND <=25)

Comparing string type values is useful for filtering on dates. If dates use ISO format (YYYY-MM-DD), string order matches chronological order, so you can run searches that compare the dates as strings, as in the following example. This search finds records with dates in January 2020.

tamr__date:(>=2020-01-01 AND <2020-02-01)
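String comparison works for ISO dates because, for zero-padded YYYY-MM-DD values, lexicographic order is the same as chronological order. The short Python illustration below uses plain string comparison as a stand-in for the search engine's string ordering; that equivalence is an assumption, and the sample dates are made up.

```python
# Lexicographic comparison of zero-padded ISO dates matches date order.
dates = ["2020-01-15", "2019-12-31", "2020-02-01", "2020-01-01"]

# Same bounds as the query tamr__date:(>=2020-01-01 AND <2020-02-01)
in_january_2020 = sorted(d for d in dates if "2020-01-01" <= d < "2020-02-01")
print(in_january_2020)  # ['2020-01-01', '2020-01-15']

# Dates in other formats do not sort chronologically as strings:
# 01/02/2021 is the later date, but it compares as smaller.
print("01/02/2021" < "12/31/2020")  # True
```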

To run a search that compares numeric values, specify .numeric after the attribute name, as in the following example. Both integer and floating point numbers are supported.

tamr__age.numeric:>=18
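As a rough sketch, a range filter can be assembled with a small Python helper that enforces the no-spaces rule and optionally appends .numeric. The function name range_query is hypothetical and not part of any Tamr API; it only builds the query string.

```python
RANGE_OPERATORS = {"<", "<=", ">", ">="}


def range_query(attribute, operator, boundary, numeric=False):
    """Build a range filter such as tamr__age:>=18 (hypothetical helper)."""
    if operator not in RANGE_OPERATORS:
        raise ValueError(f"unsupported range operator: {operator!r}")
    field = f"tamr__{attribute}" + (".numeric" if numeric else "")
    # No spaces between the attribute name, the operator, and the boundary.
    return f"{field}:{operator}{boundary}"


print(range_query("age", ">=", 18))                # tamr__age:>=18
print(range_query("age", ">=", 18, numeric=True))  # tamr__age.numeric:>=18

# Why .numeric can matter, assuming string ranges compare lexicographically:
print("9" >= "18")  # True as strings
print(9 >= 18)      # False as numbers
```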

Advanced Search

Fixed Length Expressions

To run a search with fixed length expressions, use one ? wildcard for each unknown character. Each ? matches exactly one character.

tamr__part_id: BD???908

Variable Length Expressions

To run a search with variable length expressions, use the * wildcard for zero or more characters.

tamr__vendor_name: Tam*r
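To get a feel for what the two wildcards match, the sketch below applies the same patterns locally with Python's fnmatch module, where ? also matches exactly one character and * matches zero or more. This only illustrates the wildcard semantics; it does not reproduce how the search engine tokenizes or lowercases values, and the sample values are invented.

```python
from fnmatch import fnmatchcase

# ? matches exactly one character, so only 8-character IDs qualify.
part_ids = ["BD123908", "BDABC908", "BD12908", "BD1234908"]
fixed = [p for p in part_ids if fnmatchcase(p, "BD???908")]
print(fixed)  # ['BD123908', 'BDABC908']

# * matches zero or more characters between "Tam" and the final "r".
vendors = ["Tamr", "Tammer", "Tamar", "Tamara"]
variable = [v for v in vendors if fnmatchcase(v, "Tam*r")]
print(variable)  # ['Tamr', 'Tammer', 'Tamar']
```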

Blank Values

To search for records with blank values, use the .raw keyword.

tamr__vendor_spend.raw:""

Null Values

To find null values, first select all records, and then negate the .raw keyword for the attribute in question. That is, search on an attribute that every record has (such as tamr__tamr_id), and then use AND NOT to exclude records where the attribute in question has any value.

tamr__tamr_id:* AND NOT tamr__vendor_spend.raw:*
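The difference between the blank and null searches can be sketched as two Python helpers. The function names are hypothetical, and the anchor parameter assumes an attribute that every record has, such as tamr_id in the example above.

```python
def blank_query(attribute):
    """Match records where the attribute is present but blank."""
    return f'tamr__{attribute}.raw:""'


def null_query(attribute, anchor="tamr_id"):
    """Select all records through the anchor attribute, then keep only
    those where the target attribute has no value at all."""
    return f"tamr__{anchor}:* AND NOT tamr__{attribute}.raw:*"


print(blank_query("vendor_spend"))  # tamr__vendor_spend.raw:""
print(null_query("vendor_spend"))   # tamr__tamr_id:* AND NOT tamr__vendor_spend.raw:*
```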

Regular Expressions

Regular expressions must be wrapped in forward slashes (/).

tamr__vendor_number: /0*100250/

Note: Use regular expressions to search for all records that contain numbers in a given field. For example, to return records that contain the digits 0-9 in the vendor_number column, use:

tamr__vendor_number: /[0-9]/

Searching for Values with Punctuation

In regular searches for attribute values, value text is tokenized by Elasticsearch. Tokenization throws away any special characters.

To search for characters that are excluded by tokenization, search on the .raw facet of an attribute. For example, instead of tamr__text:"search string", search on tamr__text.raw:"search string".

When searching using .raw, ensure that your search matches the entire text of the attribute values being searched and use regular expressions when needed. For example:

tamr__myattribute.raw:/.*\|.*/
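The leading and trailing .* in this pattern are what let it match the entire value. The Python sketch below mimics whole-value matching with re.fullmatch; it is an approximation for illustration only, because Python's regular expression syntax is not identical to the one the search engine uses, and the sample values are invented.

```python
import re

values = ["a|b", "no pipe here", "|", "x | y"]

# Pattern from the example: any text, a literal pipe, any text.
whole_value = re.compile(r".*\|.*")
print([v for v in values if whole_value.fullmatch(v)])  # ['a|b', '|', 'x | y']

# Without the wildcards, only a value that is exactly "|" matches.
bare_pipe = re.compile(r"\|")
print([v for v in values if bare_pipe.fullmatch(v)])  # ['|']
```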

Excluding Records

To run a successful search query that excludes items, first select records, and then use NOT to filter out some records from those results.

Note: A negative search query has two parts. The first part selects records, typically with the wildcard * operator, and the second part uses a negative search to filter out records based on your criteria.

For example, to exclude a keyword from search, first use the wildcard * to select all records, and then use NOT to filter out the keyword.

tamr__vendor_name:(* NOT "Tamr")

To find records that do not have a level 2 categorization, first use the wildcard * to select all records that have a "Level1" category, and then exclude records that have a "Level2" category, as in the following example:

tamr__Level1:* AND NOT tamr__Level2:*

To exclude records with a specified category, select all records using the wildcard *, and then exclude the category, specifying the categoryId, as in the following example:

* AND NOT suggestedCategorization.categoryId: 852

For more information, see the Elasticsearch Reference.

Columns With Spaces

To search for an attribute that has spaces in its name, escape each space with a backslash (\).

tamr__Column\ With\ Spaces:value

Escaping Reserved Characters

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /.

To run searches for items that contain reserved characters, escape them with a backslash \.

tamr__vendor_formula:\(1\+1\)\=2
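If you build queries programmatically, the escaping can be automated. The following is a minimal Python sketch with a hypothetical helper name; it escapes & and | one character at a time, which is a simplification of the && and || entries in the list above.

```python
# Reserved characters from the list above. The & and | characters are
# escaped individually (an assumption of this sketch).
RESERVED = set('+-=&|><!(){}[]^"~*?:\\/')


def escape_reserved(text):
    """Prefix every reserved character in text with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print("tamr__vendor_formula:" + escape_reserved("(1+1)=2"))
# tamr__vendor_formula:\(1\+1\)\=2
```

Apply the helper only to the value being searched, not to the whole query, so that operators you intend to use (such as : or *) stay active.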

Advanced Metadata Search

If you categorize and add your own labels to records, Tamr Core attaches metadata to these records. Mastering and categorization projects have their own lists of metadata that become associated with records or record pairs.

Searching in the Categorization Projects

You can search metadata that Tamr adds to categorized records.

To search any of the metadata:

Use the form manualCategorization.[metadata]: <searchTerm>, where [metadata] is any of the metadata from the following list. You can choose to replace manualCategorization with suggestedCategorization.

Tamr Core adds the following metadata to categorized records:

  • categoryId. manualCategorization or suggestedCategorization. Data type: Long. The unique ID of the category that will be used in searches.
  • reason. manualCategorization. Data type: String. The description that users can add when categorizing records. You can search these descriptions by keyword, exact phrase, or regular expression.
  • score. suggestedCategorization. Data type: Long. The confidence score that the machine learning model associates with the suggested categorization. The score range is [0,1].
  • timestamp. manualCategorization or suggestedCategorization. Data type: Int. The timestamp when the record was categorized.
  • username. manualCategorization. Data type: String. The username of the user who created the manual categorization.
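The list above can be turned into a small lookup that checks whether a metadata field is available for manual or suggested categorizations before building the query. This is a sketch only: metadata_query and METADATA_SOURCES are hypothetical names, not part of Tamr Core.

```python
# Which prefix each metadata field applies to, taken from the list above.
METADATA_SOURCES = {
    "categoryId": {"manualCategorization", "suggestedCategorization"},
    "reason": {"manualCategorization"},
    "score": {"suggestedCategorization"},
    "timestamp": {"manualCategorization", "suggestedCategorization"},
    "username": {"manualCategorization"},
}


def metadata_query(source, metadata, search_term):
    """Build a query of the form <source>.<metadata>: <searchTerm>."""
    allowed = METADATA_SOURCES.get(metadata)
    if allowed is None:
        raise ValueError(f"unknown metadata field: {metadata!r}")
    if source not in allowed:
        raise ValueError(f"{metadata!r} is not available on {source!r}")
    return f"{source}.{metadata}: {search_term}"


print(metadata_query("suggestedCategorization", "categoryId", 852))
# suggestedCategorization.categoryId: 852
print(metadata_query("manualCategorization", "username", "admin"))
# manualCategorization.username: admin
```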

Searching in Mastering Projects

To search for records on the Clusters tab of a Mastering project, use the following metadata. For more information, see Searching Cluster Records.

  • Cluster Id. Use this variable to find records in a newly formed cluster (that is, a cluster that does not yet have a persistent cluster Id). For example, cluster.id.raw:"<cluster-id>".
  • Published Cluster Id. Use this variable to find records associated with a persistent cluster Id. For example, publishedCluster.id.raw:"<cluster-id>".

For example, to search for clusters that contain more than 40 records, you can use this syntax:

cluster.recordCount:(>40)