How to Find Patterns and Anomalies Using Spatial Data Distributions

Six maps of the Boston area, each telling a different story. Clockwise, from top left: Where (too much data); where (base map options for context); by ZIP code; density/heatmap; by neighborhood; by Census tract; by Hexbin

Each of the six maps tells a slightly different story about the distribution. Clockwise, from top left: Where (too much data); where (base map options for context); by ZIP code; density/heatmap; by neighborhood; by Census tract; by Hexbin

Looking at the location of data points on a map is the fundamental element in understanding spatial data. If you don’t know where it is, you don’t have a map! There are a lot of ways to explore spatial data distributions in Tableau. Let’s explore how they differ in helping us find patterns in our data—and how they can help us find problems in the underlying data set.

About the data: 311 service requests

I used a data set of 311 service requests from the open data portal for the city of Boston, Massachusetts for this post. Citizens can submit non-emergency 311 requests to enlist city service support, such as fixing potholes or cleaning up graffiti. The data set includes several attributes we can use for mapping:

  • latitude and longitude to map requests at specific point locations
  • ZIP codes to map to polygons with geocoding using Tableau’s built-in geographic roles
  • other attributes (such as neighborhood name, police district, city council district, etc.) to match up with external spatial files for mapping

Let’s examine how we can use these attributes to uncover patterns—and anomalies—in our data.

Show me all the data

I always start by dropping all the data onto a map to see what I can see. Is there anything interesting in the distribution? Clusters with more data? Areas with sparse data?  

To look at all the data points for this data set, just add latitude and longitude, along with the Case Enquiry ID (a unique identifier), into the worksheet. As quick as that, we have a map. But, it’s hard to see patterns because there’s so much data! In fact, it’s about 230,000 data points—which look like a big blue blob. We’ll need to explore a bit more to uncover any patterns…

Tableau mapping of longitude, latitude, and case enquiry IDs from the city of Boston appears as a big blue blob.

Sometimes there’s just too much data, as illustrated by this mapping of Boston’s 311 service requests. We’ll need to explore further to uncover patterns and insights.

We can zoom in to see more detail and understand localized patterns. Adjust the base map in Tableau to customize the map’s context and understand the local situation for our data: use one of the built-in Background Map styles or customize the layers (streets, points of interest, city names, etc.).

Adjust the built-in map styles in Tableau to dig deeper into your data set and optimize your view.

Here are a few examples of the built-in base map styles in Tableau:

Three side-by-side views of Tableau map styles: streets (blue dots on street map), light (blue dots on grey street grid), and satellite (aerial view of street grid)

Sample built-in base map styles in Tableau, from left to right: Streets, light, satellite

Aggregate the data 

If the raw data doesn’t provide a clear view (like the big blue blob above), or if you have specific groupings of data that are important for your analytic questions (such as, how many requests came in for each ZIP code?), you’ll want to look at aggregation. Tableau loves aggregation, so it’s easy to group point locations to make more simplified versions of your map for analysis.  

However, before you decide you understand the true patterns in your data, aggregate your data in multiple ways. Why? 

  • Most regions of interest (state, ZIP code, police district, etc.) are different shapes and sizes. If you’re counting up points inside them, remember that larger areas generally contain more points—and a map that simply shows that big places have more stuff isn’t very interesting.
  • Data is rarely clean. You may encounter incorrect attributes (like a mistyped ZIP code) or errors in location.
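One common way to account for the size effect, sketched here in Python rather than Tableau, is to divide each region's count by its area before comparing regions. The region names, counts, and areas below are made up for illustration and are not taken from the 311 data; the field names are hypothetical too.

```python
# Hypothetical region summaries: request counts and area in square km.
# These numbers are invented for illustration, not taken from the 311 data.
regions = [
    {"name": "Region A", "requests": 4000, "area_km2": 16.0},
    {"name": "Region B", "requests": 1500, "area_km2": 3.0},
    {"name": "Region C", "requests": 900, "area_km2": 4.5},
]


def requests_per_km2(region):
    """Return the request count divided by the region's area."""
    return region["requests"] / region["area_km2"]


# Rank the same regions two ways: by raw count and by count per unit area.
by_count = sorted(regions, key=lambda r: r["requests"], reverse=True)
by_density = sorted(regions, key=requests_per_km2, reverse=True)

print("By raw count:", [r["name"] for r in by_count])
print("By requests per km2:", [r["name"] for r in by_density])
```

In this toy example the largest region tops the raw-count ranking, but a smaller region moves to the top once area is taken into account.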

Let’s explore aggregation methods with this data set to see this in action.

The 311 data includes a ZIP code attribute, which makes it easy to map in Tableau using the built-in geographic roles. Just drop the Location ZIP Code attribute on the map and drop the Count of records on color. Done! 

Hmm… not quite. Look out for these common gotchas:

Tableau mapping of longitude and latitude by ZIP code in the Boston area where the highest value (61,070) isn’t visible on the map because it represents records with null ZIP codes

Map of 311 requests by ZIP code. There are 61,070 records with null ZIP codes, which skews the values in the legend.

Always check for null values and confirm that your map and legend match up, or it will be hard to see the true pattern! In this example, once we filter out the null records, we see a truer picture of how the data is distributed by ZIP code:

Tableau mapping of longitude and latitude of 311 requests by ZIP code in the Boston area where density of requests is represented by blue

Map of 311 requests by ZIP code. With the null ZIP codes filtered out, we see a clearer pattern of distribution of 311 requests.
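The same null check can be done on the raw table before it ever reaches a map. Here is a minimal sketch in Python, assuming each record is a dictionary with a ZIP code field; the field names and sample rows are hypothetical, not the actual column names from the Boston export.

```python
from collections import Counter

# Hypothetical sample rows; field names are assumptions for illustration.
rows = [
    {"case_enquiry_id": 1, "location_zipcode": "02124"},
    {"case_enquiry_id": 2, "location_zipcode": None},
    {"case_enquiry_id": 3, "location_zipcode": "02124"},
    {"case_enquiry_id": 4, "location_zipcode": ""},
    {"case_enquiry_id": 5, "location_zipcode": "02118"},
]


def is_missing(value):
    """Treat None and blank strings as missing ZIP codes."""
    return value is None or str(value).strip() == ""


# Count the records that would show up as "null" in the legend.
null_count = sum(1 for row in rows if is_missing(row["location_zipcode"]))

# Count requests per ZIP code using only the records that have one.
zip_counts = Counter(
    row["location_zipcode"]
    for row in rows
    if not is_missing(row["location_zipcode"])
)

print("Records with no ZIP code:", null_count)
print("Requests per ZIP code:", dict(zip_counts))
```

Reporting the null count separately keeps those records visible as a data quality issue instead of silently dropping them.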

But, what if your data doesn’t align with one of Tableau’s built-in geographic roles? You can add other spatial data to define your areas of interest by setting up a join based on an attribute or by geography.

To see our 311 points aggregated into neighborhoods, we can use a spatial file from the Boston Open Data portal for neighborhoods and set up a join or relationship based on the neighborhood name.

Tableau hover state: Relationship: 311 data to Boston_Neighborhoods.geojson; Cardinality: Many to Many; Related fields: Neighborhood = Name

The neighborhood name field can be used to set up a relationship between the 311 data set and a spatial file with boundaries of Boston neighborhoods, allowing for easy mapping of 311 requests within neighborhood boundaries.

Now we can quickly drop the neighborhood geometry onto a map and use the Count of records on color to see that the Dorchester neighborhood has the most 311 requests (41,024). This also looks like the largest neighborhood in the city, so we might expect it to have a high count because, if you recall, bigger things tend to hold more stuff.

Tableau mapping of 311 requests in Boston by neighborhood with Dorchester selected (41,024 requests)

Map of 311 requests by neighborhood.
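A name-based match only works for records whose names line up with the boundary file, so it is worth checking what fails to match. The sketch below is a rough Python illustration of that check, not how Tableau evaluates a relationship; the names, the sample values, and the whitespace and case normalization are all my own assumptions.

```python
from collections import Counter

# Hypothetical "Name" values from a neighborhood boundary file.
boundary_names = {"Dorchester", "Roxbury", "South Boston"}

# Hypothetical "Neighborhood" values from the request table.
request_neighborhoods = [
    "Dorchester",
    "Roxbury",
    "dorchester ",
    "South Boston",
    "Dorchester",
    "",
]


def normalize(name):
    """Ignore surrounding whitespace and letter case when comparing names."""
    return name.strip().casefold()


# Map each normalized boundary name back to its original spelling.
lookup = {normalize(name): name for name in boundary_names}

matched = Counter()
unmatched = Counter()
for raw in request_neighborhoods:
    canonical = lookup.get(normalize(raw))
    if canonical is None:
        unmatched[raw] += 1  # no boundary polygon for this value
    else:
        matched[canonical] += 1

print("Requests per neighborhood:", dict(matched))
print("Values with no matching boundary:", dict(unmatched))
```

Any value that lands in the unmatched bucket would not be drawn inside a neighborhood polygon, so a large unmatched count is a hint to look at the names more closely.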

If there isn’t a nice attribute that we can match up by name, we can always use a spatial intersection join to match up the points in our data set to locations in another data set, just like this:

311 data made of 2 tables: 311 data and Census_2010_Tracts

The 311 requests and US Census tracts can be joined by location using the Intersects join type—based on the latitude and longitude for the 311 requests and the polygon geometry for the Census tracts.

Here, I match our 311 records to a set of spatial data for US Census Tracts from the Census Cartographic Boundary Files. I use the MakePoint() calculation with the latitude and longitude from the 311 data source to create a point geometry to find the intersections between these points and the US Census Tract polygons. 
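To make the idea of an intersection join concrete, here is a minimal Python sketch of the underlying test: deciding which polygon each point falls inside and counting the matches. It uses a basic ray-casting check on made-up square "tracts" in simple planar coordinates. It is not Tableau's implementation, and real tract geometry (multi-part polygons, holes, points exactly on a boundary) would need a proper GIS library.

```python
from collections import Counter


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring.

    ring is a list of (lon, lat) vertices; the last vertex connects back
    to the first. Points exactly on an edge may land on either side.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical square "tracts" and points, invented for illustration.
tracts = {
    "tract_A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "tract_B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
points = [(1.0, 1.0), (3.0, 1.0), (1.5, 0.5), (5.0, 5.0)]


def find_tract(lon, lat):
    """Return the id of the first tract containing the point, or None."""
    for tract_id, ring in tracts.items():
        if point_in_polygon(lon, lat, ring):
            return tract_id
    return None


# Count points per tract; points outside every tract are counted under None.
counts = Counter(find_tract(lon, lat) for lon, lat in points)
print(dict(counts))
```

Because the match depends only on where a point sits, not on any name attribute, a record with a wrong or placeholder coordinate is counted wherever that coordinate happens to fall.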

But, wait, something weird happens when I map the data! This looks nothing like the earlier distribution.

Tableau mapping of 311 requests in Boston by census tract with one tract showing 68,357 requests

Map of 311 requests by Census tract. Because this is based on a spatial intersection join we can catch a location anomaly in the data set when we view the distribution—there are an unexpected 68,357 data points in one tract!

What’s going on? In the earlier maps we matched records by named location, and in this map we match them by where the points actually fall. This tells us that a ton of points fall inside one specific Census tract, which might be an anomaly in our data.

Another way to look for glitches in our data is using the density mark type to see relative counts of points in a “heat map” view. Here are two versions of the same map: The map on the left matches the map with Census tracts above—one very hot spot with tons of data. The map on the right shows the data distribution with a set of anomalous data points removed.  

Side by side map views in Tableau of density/heat map. Left map shows a single blue point; right map shows larger blue heat map.

Density map for 311 requests. The map on the left shows the density of 311 requests using the entire data set. The map on the right shows the density of requests after filtering out the anomalous data points.

The single “hot spot” on the density map and in the Census map seemed strange. When I zoomed to that hot spot, I saw one point, but when selecting it I realized it was really ~66,000 points stacked on top of each other! That’s odd.  

Tableau mapping of Density / Heatmap showing single blue hotspot and tooltip showing 66,120 items selected

Zoomed in on the density map to find a single, very “hot” hotspot on the map with 66,120 data points all stacked on top of one another.

My guess: This is the default location given to points that don’t otherwise have a defined latitude and longitude. When we remove those points, the data is more realistically distributed. Without looking at multiple map types, we may never have found this issue in our data.
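A quick way to look for this kind of stacking in the raw table is to count how many records share each exact coordinate pair. The following is a minimal Python sketch under stated assumptions: the coordinates are made up, and the 25% threshold for calling a location suspicious is an arbitrary choice for illustration, not a rule from the post.

```python
from collections import Counter

# Hypothetical (latitude, longitude) pairs, invented for illustration.
# Four records share one location, as a placeholder coordinate might.
points = [
    (42.3594, -71.0587),
    (42.3594, -71.0587),
    (42.3011, -71.0672),
    (42.3594, -71.0587),
    (42.3382, -71.0795),
    (42.3594, -71.0587),
]

# Count records per coordinate pair, rounding to smooth out float noise.
coordinate_counts = Counter(
    (round(lat, 5), round(lon, 5)) for lat, lon in points
)

top_coordinate, top_count = coordinate_counts.most_common(1)[0]
share = top_count / len(points)

# Arbitrary threshold: flag a location holding over 25% of all records.
SUSPICIOUS_SHARE = 0.25
is_suspicious = share > SUSPICIOUS_SHARE

# Drop the stacked records to see the remaining distribution.
filtered = [
    (lat, lon)
    for lat, lon in points
    if (round(lat, 5), round(lon, 5)) != top_coordinate
]

print("Most common location:", top_coordinate, "with", top_count, "records")
print("Flagged as suspicious:", is_suspicious)
print("Records left after filtering:", len(filtered))
```

A flagged location is only a candidate for review. Some places, such as a large apartment building, can legitimately generate many requests from a single point.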

Side by side view of Tableau mapping of Boston, left by Census tract and right by density with blue heat map.

Comparing Census tract and density distribution maps after the anomalous data points have been filtered out.

The takeaway: To truly understand distributions in your data, build multiple spatial data maps. You’ll more deeply explore the patterns and find the hidden glitches that may be lurking in your data!

For a closer look at the data and maps referenced in this post, check out the corresponding Tableau Public workbook.
