Teradata Geospatial Extensions: a Case Study in User-Defined Types
Many database systems are highly extensible through user-defined types (UDTs), methods (UDMs) and functions (UDFs). Database system vendors often provide UDFs to assist users with migrating from a competitor's DBMS to their own. But UDTs, UDFs and UDMs also provide domain-specific functionality, as seen in Teradata's Geospatial Extensions. Tableau offers a simple, powerful way to exploit these DBMS extensions that can enhance the analytical expressiveness of your data visualizations.
Teradata's implementation of their Geospatial Extensions largely follows the SQL/MM standard. The ST_POINT datatype allows for two floating point values representing coordinates. To Tableau this appears as a binary data string, but Tableau's calculated fields can extract the longitude and latitude components using Teradata's user-defined methods for geospatial types. For example, given an ST_POINT column named AIRPORT_LOC we would create the following calculated fields named "lon" and "lat":
- lon: RAWSQL_REAL("%1.ST_X()", [AIRPORT_LOC])
- lat: RAWSQL_REAL("%1.ST_Y()", [AIRPORT_LOC])
Note that these are automatically recognized by Tableau as their appropriate geographic roles. To demonstrate Teradata’s Geospatial Extensions, we will explore a sample data set taken from twenty years of flight records. Teradata's user-defined method called 'sphericaldistance' performs great-circle calculations between two ST_POINT types. To determine the distance between an origin and destination (assuming a spherical path at sea level), we can create the following calculated field which includes a meters-to-miles conversion:
- distance: RAWSQL_REAL("%1.sphericaldistance(%2)", [ORIGIN_LOC], [DEST_LOC]) * 0.000621371192
With the distance computations we can explore a new aspect of the sample airline data. The view below examines delays by distance, in bins of 500 miles, and reveals an interesting spike in delays for flights between 3000 and 3500 miles. Interestingly, there are rarely weather delays at that distance.
Another quick way to view the flights in this category is a map of origin and destination points, filtered on distance. The business dashboard below helps isolate the delays to a handful of popular destinations, most notably Hawaii.
Tableau's calculated fields also allow us to construct UDTs that we can use in queries to the database. For example, the plot below shows all airports within 100 miles of New York City. The marks are size-encoded by the number of unique destinations served by each airport, and color-encoded by the average severity of delays. Of course, it’s no surprise that the three airports typically associated with NYC have high numbers of unique destinations; note that of the three, JFK has the highest severity of delays.
In order to create this view, we first need a calculated field representing New York City:
- New York City: RAWSQL_STR("NEW ST_POINT(-73.986951,40.756054)")
We can create a distance calculation and place it on the Filters shelf to limit our view to distances of less than 100 miles:
- Distance from NYC: RAWSQL_REAL("%1.sphericaldistance(%2)",[New York City],[ORIGIN_LOC])*0.000621371192
With fairly simple SQL expressions we can extend Tableau's support for new types of data and data analysis on any database system that supports user-defined types, methods and functions.