Working with Geospatial Data

Match pairs of geospatial records using geospatial similarity metrics, and put matched records into clusters. Configure tile servers to view records on the map.

Support for geospatial features is in a beta release.

While geospatial features are supported in the API, they are new and may present a different performance and stability profile compared with other Tamr features. Adding geospatial data to Tamr is possible through internal APIs. Golden records and the preview in Transformations are not supported for datasets containing geospatial data.

Adding Datasets with Geospatial Data

To load geospatial data, contact Tamr Support. Importing datasets requires using internal Tamr APIs. Tamr parses the data in the GeoJSON format and creates records in geospatial data types.

What Can You Do with Geospatial Data in Tamr?

After adding datasets with geospatial data, you can:

  • Configure Tamr to use an OSM or WMTS tile server. Note that public servers such as OpenStreetMap and Thunderforest require that you abide by their terms of use. You can then view geospatial record pairs, clusters, and shapes, such as polygons, on the Leaflet-based map. See Configuring Geospatial Map Tiles.
  • Use a Leaflet-based map on Pairs and Clusters pages in Tamr. If you configure two or more tile servers, you can switch between them and use different maps for pair matching and clustering. You can zoom and pan on the map to refetch geospatial data as the map adjusts interactively. See Selecting a Tile Server.
  • Match pairs of records that contain geospatial data. On the Schema Mapping page, you can configure pair similarity metrics, such as Hausdorff, Relative Hausdorff, and Directional Hausdorff Distances. You can then view record pairs on the map, along with their similarity metrics and location. See Similarity Metrics for Geospatial Data.
  • Put matched records into clusters based on features extracted from geospatial data and eliminate duplicates. On the Clusters page, view a cluster of records on the map, and configure Tamr to display records that are adjacent to a specific cluster of geospatial records. See Working with Pairs and Clusters of Geospatial Records.
  • Use geospatial records in unified datasets.
  • Align records containing geospatial data with existing taxonomies.
  • Run geospatial transformations on input records, such as constructing records of geospatial data types from their geographic coordinates. For information, see GIS Functions.
  • Run geospatial-boundary searches on clusters of geospatial records.

Geospatial Data Formats

Tamr APIs allow you to work with geospatial attributes represented as GeoJSON (RFC7946). Similarly, you can export data from Tamr in the GeoJSON format. See Geospatial Data Types.
Tamr uses 64-bit double-precision floating-point format for its calculations on geospatial data.
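
As an illustration of the format, the following Python sketch builds a GeoJSON Polygon geometry as defined in RFC 7946 and serializes it. The helper name make_polygon and the coordinate values are hypothetical, and the sketch does not show how Tamr expects the geometry to be wrapped for import.

```python
import json


def make_polygon(ring):
    """Return a GeoJSON Polygon geometry from one exterior ring of (lon, lat) pairs."""
    coords = [list(p) for p in ring]
    # RFC 7946 requires a linear ring to be closed: first and last positions match.
    if coords and coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    if len(coords) < 4:
        raise ValueError("a linear ring needs at least four positions")
    return {"type": "Polygon", "coordinates": [coords]}


# Hypothetical building footprint; positions are [longitude, latitude] in WGS84,
# listed counterclockwise as RFC 7946 recommends for exterior rings.
footprint = make_polygon([
    (-71.0892, 42.3601),
    (-71.0888, 42.3601),
    (-71.0888, 42.3604),
    (-71.0892, 42.3604),
])
print(json.dumps(footprint))
```

Python floats are 64-bit double-precision values, so coordinates built this way carry the same precision that Tamr uses for its calculations.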

Geospatial Coordinate Systems

Tamr supports the WGS84 coordinate system. If the input data uses another coordinate system, such as Universal Transverse Mercator (UTM), convert it to WGS84 before loading it into Tamr.
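
Before loading, a quick range check can catch data that was never converted. The Python sketch below is only a heuristic and does not perform any conversion: it flags GeoJSON geometries whose positions fall outside the valid longitude and latitude ranges, which is typical of projected coordinates such as UTM eastings and northings. The function names and sample values are hypothetical. The conversion itself needs a projection library such as pyproj.

```python
def iter_positions(coords):
    """Yield every [lon, lat] position from a nested GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def looks_like_wgs84(geometry):
    """Return True if every position lies within the longitude/latitude ranges."""
    return all(
        -180.0 <= pos[0] <= 180.0 and -90.0 <= pos[1] <= 90.0
        for pos in iter_positions(geometry["coordinates"])
    )


wgs84_point = {"type": "Point", "coordinates": [-71.0589, 42.3601]}
# Hypothetical UTM easting and northing in meters, not degrees.
utm_point = {"type": "Point", "coordinates": [330000.0, 4690000.0]}

print(looks_like_wgs84(wgs84_point))  # True
print(looks_like_wgs84(utm_point))    # False
```

A geometry that passes this check is not guaranteed to be in WGS84; the check only rules out values that cannot be degrees.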

Similarity Metrics for Geospatial Data

You can use similarity metrics on geospatial data. These metrics help you determine whether a pair of geographic objects represents the same real-world entity.

The metrics rely on the concept of Hausdorff distance. For two sets of points, the Hausdorff distance is the greatest of all the distances from a point in one set to the nearest point in the other set. The smaller the Hausdorff distance between two geometry objects, the more likely it is that they are similar, both in shape and in location.

When creating pairs for matching geospatial type attributes, you can select these metric types:

  • Hausdorff Distance measures how far apart two objects are within a metric space. This metric represents the absolute Hausdorff distance, in meters, between two geometric objects.
  • Relative Hausdorff represents the degree of similarity between two objects. It is computed by dividing the standard Hausdorff distance from A to B by the diameter of the smaller of the two objects (A or B), and then subtracting this quotient from one. The Relative Hausdorff distance is bounded by 0 and 1.0, so if the resulting number is less than 0, the Relative Hausdorff distance is set to 0. With H(A, B) as the Hausdorff distance and diam as the object diameter:
    Relative Hausdorff(A, B) = max(0, 1 - H(A, B) / min(diam(A), diam(B)))
    This metric is useful when you need to determine possible similarity between two geographic objects that have different scales or sizes, such as small or large buildings. Relative Hausdorff uses true shape diameter for its calculations. Identical objects, such as objects of the same size that completely overlap, have a Relative Hausdorff value of 1.0. Use Relative Hausdorff for attributes of type lineString and polygon. Do not use it for attributes with the point geospatial type.
  • Directional Hausdorff similarity metric helps you match contained objects. For example, you can use it to see if a small section of a road is matched against the entire road, or whether a portion of a building matches an entire building.
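
The three metrics above can be sketched in a few lines of Python. This is a simplified illustration, not Tamr's implementation: it assumes each object is reduced to a list of vertices in a planar coordinate system measured in meters, it compares vertices only rather than every point along an edge, and it treats the directional metric as the one-way distance from the first object to the second. All function names and sample coordinates are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def diameter(points):
    """Largest distance between any two points of the same object."""
    return max(math.dist(p, q) for p in points for q in points)


def relative_hausdorff(a, b):
    """1 - distance / smaller diameter, clamped so it never drops below 0."""
    smaller = min(diameter(a), diameter(b))
    if smaller == 0:
        raise ValueError("relative Hausdorff is undefined for a single point")
    return max(0.0, 1.0 - hausdorff(a, b) / smaller)


# Hypothetical vertices: a road and a section that covers its first half.
road = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
section = [(0.0, 0.0), (50.0, 0.0)]

print(hausdorff(road, section))           # 50.0
print(directed_hausdorff(section, road))  # 0.0
print(relative_hausdorff(road, road))     # 1.0
print(relative_hausdorff(road, section))  # 0.0
```

In this example the one-way distance from the section to the road is zero because every vertex of the section lies on the road, which is why a directional metric suits contained objects. The zero-diameter guard also shows why the relative metric does not apply to point attributes.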

To specify geospatial similarity metrics for an attribute:

  1. On the Schema Mapping page, select an attribute that you would like to treat as geospatial.
  • If you have more than one such attribute, then before this procedure, you may want to map them to one attribute that you will designate as a geospatial attribute in Tamr. This is because Tamr displays the map for the first geospatial attribute it finds.
  • The attribute that will become a geospatial attribute must have values that represent its geographic coordinates, and be of any of the supported Geospatial Data Types.
  2. On the right side of the screen, click the three vertical dots for this attribute to open its properties.
  3. Activate the Geospatial Attribute toggle to mark the attribute as geospatial, and then choose Advanced.
  4. Select one of the Hausdorff metric types, as shown in the following screenshot.

Specify geospatial similarity metrics for a geospatial attribute.

Once you specify the metric, Tamr generates record pairs with similarity satisfying a specified threshold.

Configuring Geospatial Map Tiles

Tamr works with tile servers that use the OSM (OpenStreetMap) tile format or the Web Map Tile Service (WMTS) protocol.

To configure a tile server:

  1. Create a YAML file similar to the following example.
  2. Add this file to Tamr using <tamr-home-directory>/tamr/utils/unify-admin.sh config:set --file <path-to-file>/my-config.yaml. For information, see Setting configuration variables.

When creating the YAML file that will describe your tile server configuration, use these tips:

  • name is required. This label for the tile server will appear in the dropdown menu of tile servers from which to choose. See the following animated screenshot to observe the action of choosing a preconfigured tile server from the dropdown menu.
  • The urlTemplate URI should include all of the coordinate variables that specify the zoom level and the x, y tile location.
  • You can specify options that are specific to the tile server, such as a minimum and maximum zoom or the tile matrix set.

In this example, the first urlTemplate uses the OSM (OpenStreetMap) tile server format. If you are configuring a tile server that uses the Web Map Tile Service (WMTS) protocol instead, specify "wmts": true and provide a URI that conforms to that protocol, as shown in the second urlTemplate.

TAMR_TILE_SERVERS: |
    [
      {
        "name": "openstreetmap_example",
        "urlTemplate": "https://{s}.tile.serverName.org/{z}/{x}/{y}.png",
        "options": {
          "minZoom": 0,
          "maxZoom": 18
        }
      },
      {
        "name": "wmts_example",
        "urlTemplate": "https://tile.serverName.com/{tileMatrixSet}/{tileMatrix}/{tileCol}/{tileRow}.png",
        "wmts": true,
        "options": {
          "tilematrixSet": "GLOBAL_WEBMERCATOR"
        }
      }
    ]
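
To see how the variables in an OSM-style urlTemplate are filled in, the following Python sketch computes the tile that contains a given longitude and latitude using the standard Web Mercator tile numbering, and then substitutes the values into the template. The function name tile_url and the default subdomain are hypothetical. In Tamr, the Leaflet-based map performs this substitution for you, so the sketch is useful mainly for checking that a template resolves to a valid tile URL.

```python
import math


def tile_url(url_template, lon, lat, zoom, subdomain="a"):
    """Fill an OSM-style urlTemplate for the tile that contains (lon, lat)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that positions on the map edge still resolve to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return (
        url_template.replace("{s}", subdomain)
        .replace("{z}", str(zoom))
        .replace("{x}", str(x))
        .replace("{y}", str(y))
    )


# The placeholder template from the configuration example above.
template = "https://{s}.tile.serverName.org/{z}/{x}/{y}.png"
print(tile_url(template, 0.0, 0.0, 1))
# https://a.tile.serverName.org/1/1/1.png
```

If a variable is missing from the template, the resulting URL will not change as the map pans or zooms, which is a quick way to spot an incomplete urlTemplate.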

Selecting a Tile Server

After you have configured multiple tile servers, you can select the server you would like to use for the map to display geospatial records in Tamr. You can switch between tile servers.

To select a tile server for the map:

  1. On the Record details side panel, choose the tile server icon.
  2. Select the tile server from the dropdown menu.

Use the tile server icon to select the tile server for the map.

The following animated screenshot illustrates how to select a tile server from the dropdown menu on the Pair details right-side panel.

Choose a tile server on the Pair details side panel.

Working with Pairs and Clusters of Geospatial Records

Once you configure map tiles, you can explore groups of geospatial records on the Pairs and Clusters pages.

To view record pair details:

  1. On the Pairs page, select a pair of geospatial records.
  2. To display the Pair details side panel for these records with the map, click the blue link in the geospatial attribute's column on the selected pair. In the screenshot, the geospatial attribute's column is titled "geometry". The records display in the Pair details side panel on the map that is powered by the tile server you have configured. Colors distinguish two different records.

The Pair details side panel on the Pairs page shows two geospatial records on the map.

The following screenshot shows two clusters of geospatial data on a single screen on the Clusters page.

The Clusters page shows two clusters of geospatial records.

The following screenshot shows a cluster view of records. In addition, you can use the side panel to view a single geospatial record. In this example, the main screen and the side panel rely on different tile servers.

A cluster view of records with two different tile servers used for the main page and the Record details side panel.

You can indicate whether Tamr should display adjacent records. The following screenshot shows:

  1. The control to display the map.
  2. The toggle to show records that are adjacent to the selected cluster of geospatial records.
  3. The informational message about the limit of displaying up to one thousand clusters, when zooming out.

Callouts show the icon to choose to display the map, the toggle to show adjacent records, and the informational message about the limit for the number of clusters.

The following screenshot shows a cluster of records along with adjacent records, which appear when you choose to display adjacent records. Adjacent records that are not part of the cluster are shown in black.

The cluster of records is shown in blue, whereas the adjacent records display in black.

Troubleshooting Tips

Use these tips when working with geospatial records:

  • A geospatial attribute must be marked as such with the Geospatial Attribute toggle. If the attribute is not one of the geospatial types, the map does not display, and the following error message appears in its place: No geometry features specified.
  • If you marked more than one attribute as geospatial, the map will display for the first attribute, by default.

If a cluster or a pair of records does not have geospatial attributes, an error message displays instead of the map.