Working with Geospatial Data

You can work with geospatial data arriving from multiple sources and use it in your Tamr projects.

Support for geospatial data is in a beta release

Geospatial functions are in a beta release. While they are supported in the API, they are new and may present a different performance and stability profile compared with other Tamr features. Loading the geospatial data is possible only through the APIs. Golden records and the preview in Transformations are not supported for datasets containing geospatial data.

After adding the datasets with the geospatial data, you can work with pairs, find matches and duplicates, and run transformations. You can then put records in clusters based on information extracted from geospatial data, and create a categorization project to align records with an existing taxonomy that you might have in place.

Data Formats for Geospatial Data

Tamr APIs allow you to work with geospatial attributes represented as GeoJSON (RFC 7946). Similarly, you can export data from Tamr in the GeoJSON format. See Geospatial Data Types.
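As a rough illustration of the GeoJSON geometry objects defined in RFC 7946, the sketch below builds a point and a polygon with Python's standard json module. The coordinate values, the "name" property, and the is_closed_ring helper are hypothetical, and the exact record layout that Tamr expects is described in Geospatial Data Types, not here.

```python
import json

# GeoJSON positions are [longitude, latitude] in WGS84 (RFC 7946).
point = {"type": "Point", "coordinates": [-71.0589, 42.3601]}

# A Polygon is a list of linear rings. Each ring repeats its first
# position at the end, and the exterior ring runs counterclockwise.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-71.06, 42.36],
        [-71.05, 42.36],
        [-71.05, 42.37],
        [-71.06, 42.37],
        [-71.06, 42.36],
    ]],
}


def is_closed_ring(ring):
    """Hypothetical helper: a linear ring needs 4+ positions and must be closed."""
    return len(ring) >= 4 and ring[0] == ring[-1]


# Wrap the geometry in a Feature and serialize it to GeoJSON text.
feature = {
    "type": "Feature",
    "geometry": polygon,
    "properties": {"name": "example block"},  # illustrative property only
}
geojson_text = json.dumps(feature)
```

Checking ring closure and coordinate order before loading can catch the most common formatting mistakes, such as swapped latitude and longitude.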

Geospatial Coordinate Systems

Tamr supports the WGS84 coordinate system. If the input data uses another coordinate system, such as Universal Transverse Mercator (UTM), convert it to WGS84 before loading it into Tamr.

Pair Matching for Geospatial Data

You can use similarity metrics on geospatial data. These metrics help you determine whether a pair of geographic objects represents the same real-world entity. There are two types of metrics: one calculates the absolute distance between objects, and the other measures the relative geometric similarity between the objects. Both metrics rely on the concept of Hausdorff distance.

The Hausdorff distance between two sets is the greatest distance from any point in one set to the nearest point in the other set, taken in both directions. The smaller the Hausdorff distance between two geometry objects, the more likely it is that they are similar.
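The definition above can be sketched for finite point sets in a few lines of Python. This is a minimal illustration, not Tamr's implementation: it assumes planar coordinates already expressed in meters, and it compares vertices only, which merely approximates the distance between the full outlines of lines and polygons. The function names and sample shapes are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# A 10 m square and a copy shifted by 3 m east and 4 m north.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in square]

print(hausdorff(square, shifted))  # 5.0
```

Taking the maximum over both directions matters: a single point at (0, 0) lies at distance 0 from the set {(0, 0), (10, 0)}, yet that set contains a point 10 m away from it, so the symmetric distance is 10.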

When creating pairs for matching geospatial type attributes, you can select these metric types from the drop-down menu on the Pairs Generation page:

  • Hausdorff Distance measures how far two objects are from each other, within a metric space. This metric represents the absolute Hausdorff distance in meters between two geometries.
  • Relative Hausdorff measures the ratio of the Hausdorff distance between two objects to the smaller of their diameters. This metric represents the degree of similarity between the two objects. It is useful when you need to assess the similarity of two geographic objects that differ in scale or size, such as small and large buildings. Possible values are between 0 and 1. Identical objects, such as objects of the same size that completely overlap, have a Relative Hausdorff value of 1.0. Use the Relative Hausdorff metric for lineStrings and polygons. Do not use it for attributes with the point geospatial type.
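To show why dividing by the diameter makes the comparison independent of scale, the sketch below computes the raw ratio of the Hausdorff distance to the smaller diameter for vertex sets in planar meters. It is an illustration under stated assumptions only: the function names are hypothetical, and the raw ratio shrinks toward 0 as shapes coincide, so the way Tamr maps it onto a score where identical objects receive 1.0 is not covered here.

```python
import math
from itertools import combinations


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(s, t):
        return max(min(math.dist(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))


def diameter(points):
    """Largest distance between any two vertices (needs at least two points)."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))


def hausdorff_to_diameter_ratio(a, b):
    """Raw ratio of the Hausdorff distance to the smaller of the two diameters."""
    return hausdorff(a, b) / min(diameter(a), diameter(b))


def square(side):
    return [(0.0, 0.0), (side, 0.0), (side, side), (0.0, side)]


def shift(points, dx, dy):
    return [(x + dx, y + dy) for x, y in points]


# The same 5 m offset applied to a small and a large building footprint.
small_ratio = hausdorff_to_diameter_ratio(square(10.0), shift(square(10.0), 3.0, 4.0))
large_ratio = hausdorff_to_diameter_ratio(square(100.0), shift(square(100.0), 3.0, 4.0))
```

The same 5 m offset yields a much larger ratio for the 10 m footprint than for the 100 m one, which is the scale sensitivity the metric is meant to capture. A single point has a diameter of zero, so the ratio is undefined for it; this is one plausible reason the metric is not suited to point attributes.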

Once you specify the metric, Tamr generates record pairs with similarity above a specified threshold. Tamr uses 64-bit double-precision floating-point values for its calculations on geospatial data.

Comparisons, Statements and Transformation Functions on Geospatial Data

You can use the following comparison expressions, statements and transformation functions on your geospatial records:

  • Comparison Expressions on Records in Geospatial Data Types
  • Supported Statements
  • GIS Functions

Comparison Expressions on Records in Geospatial Data Types

You can compare geospatial records with matching geospatial data types, such as when both records are of geospatial type point.

You can run the following comparison operations in Tamr on geospatial data:

  • EQUALITY
  • IS (NOT) NULL/EMPTY
  • hash
  • Conversion functions

Supported Statements

Tamr supports running the following statements on records with geospatial data types:

NOTE: In this list, geo* denotes one of the geospatial data types supported in Tamr.

SELECT <geo*> as <myColumn>
GROUP BY <geo*>
GROUP min(<geo*>) as min, max(<geo*>) as max
GROUP top(<geo*>, 5) as <value> by …
WINDOW for all the GROUP BY cases
WINDOW … ORDER BY <geo*> RANGE/ROWS …
MERGE BY <geo*>
FILTER <geo*> IS (NOT) NULL/EMPTY
ORDER BY <geo*> ASC/DESC NULLS FIRST/LAST
LEFT/RIGHT/OUTER/INNER JOIN WITH <myTable> ON <geo*> = table.<geo*>
PARTITION BY <geo*>