GoDaddy scales data governance on 13TB of data/day with Tableau, Alation Data Catalog, and Hadoop


Increased governance over 13+ TB of data stored in Hadoop
Reduced reporting errors with IT-certified data sources
Optimized product development for better customer experience

GoDaddy is an international web hosting and internet domain registrar that serves 17 million customers and gathers 13 terabytes (TB) of data each day, stored and managed in Hadoop. With Tableau and Alation, the Enterprise Data team has increased control over data accuracy, empowering business units to access curated data sources certified by GoDaddy’s data stewards. Analysts now spend less time searching for accurate data and more time analyzing it. As a result, the team continues to drive product development with data-driven decisions for a better customer experience.

Enabling self-service at scale for 1400+ Tableau users

With hundreds of data sources spread across tens of platforms and 1400+ Tableau users, the team at GoDaddy needed a better way to give the organization trusted, accessible data.

“We had multiple replications of that data. We had calculated fields that weren’t documented,” recalls Sharon Graves, Enterprise Data Evangelist and Tableau Server administrator at GoDaddy. “From a business intelligence perspective, we had no real certainty of saying, ‘OK, yes, this is exactly what I need to use to do my reporting.’”

To empower self-service at scale, GoDaddy’s Enterprise Data team adopted the Alation Data Catalog, which inventories data sources and provides usage-based business context, alongside Tableau.

Alation provides a complement to Tableau’s approach to data governance by allowing end users to easily discover data from multiple sources and deeply understand the nuances of that data. Alation does this in a uniquely automated way by crawling and cataloguing an organization’s data, the business semantics related to the data, and the logic of analysis embedded in an organization’s analytic history captured through SQL query logs.

The joint solution from Tableau and Alation sits on top of a Hadoop footprint at GoDaddy comprising Apache Pig, a scripting platform for processing and analyzing large data sets; Apache Spark, a cluster computing framework; and Apache Hive.

With Alation and Tableau, GoDaddy's Enterprise Data team was able to examine the lineage of a table, search multiple data sources for a field, and increase visibility and control.

With this new data platform, GoDaddy automated many manual processes, including alerts that verify data has loaded properly. Business rules are also applied uniformly at the time of processing.
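As an illustration only, a load-verification alert of the kind described here might look like the following minimal sketch. It is not GoDaddy's implementation and uses no Tableau or Alation API; the `LoadResult` type, the `check_load` function, and the 1% tolerance are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadResult:
    """Hypothetical summary of one table load (not a real product schema)."""
    table: str
    expected_rows: int
    loaded_rows: int


def check_load(result: LoadResult, tolerance: float = 0.01) -> Optional[str]:
    """Return an alert message if the row count deviates too far, else None.

    The tolerance is an assumed threshold: the allowed relative difference
    between expected and loaded row counts.
    """
    if result.expected_rows == 0:
        # Nothing was expected, so any loaded rows are treated as unexpected.
        if result.loaded_rows == 0:
            return None
        return f"ALERT: {result.table} loaded {result.loaded_rows} unexpected rows"

    deviation = abs(result.expected_rows - result.loaded_rows) / result.expected_rows
    if deviation > tolerance:
        return (
            f"ALERT: {result.table} loaded {result.loaded_rows} "
            f"of {result.expected_rows} expected rows"
        )
    return None


if __name__ == "__main__":
    for load in (LoadResult("orders", 1000, 1000), LoadResult("traffic", 1000, 900)):
        message = check_load(load)
        print(message or f"OK: {load.table}")
```

In practice the message would be routed to whatever notification channel the team uses; the sketch only shows the check itself.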

And that’s not all. “Much of the metadata is captured automatically using machine learning in Alation, scanning query log files and profiling data on GoDaddy’s servers,” says Sharon.

Data stewards use the combination of Tableau and Alation to curate data that is automatically inventoried in the Alation Data Catalog. Using Alation’s curation features, stewards can confirm automatically recommended business semantics in order to capture the context of data. They can propagate tags for data sets that contain PII data or data sets that require management through data usage policies for other compliance considerations. They can also endorse or deprecate data sources in order to certify accuracy for corporate use and guide analysts and power users when they build analyses.

"The creation of the Enterprise Data team along with the implementation of these products gave a home to everything data at Godaddy," explains Sharon. "It helped us solidify rules and management, which has greatly improved our end-user experience."

Optimized product experience for over 17 million customers

GoDaddy’s Enterprise Data team deals with an influx of 13 terabytes per day: everything from website traffic metrics to customer purchase histories and internal statistics. Sharon recalls how, prior to the new solution, existing processes made data access slow, confusing, and frustrating for analysts at GoDaddy.

“Our power users who may not be intimately familiar with the data didn’t know where to find the appropriate data for their analytics, or, if they did know where it was, they were not sure how to use it to meet their needs,” Sharon says.

This new platform signaled a shift towards self-service analytics at GoDaddy. “By creating a self-service environment,” Sharon says, “GoDaddy product managers and business users can leverage data to create a better customer experience and find and design the product that will meet their needs by identifying trends and anticipating issues.”

Now that users can find the analytics they need, they spend less time searching for data and more time improving processes, resulting in a better product experience for their 17 million customers.

Product managers and functional teams use Tableau dashboards to identify trends and spot potential issues before they arise. With data at their fingertips, they can dig into website trends and email campaigns to optimize product development.

“By putting data in our end users’ hands, they then had the ability to quickly pull together their own base-level reporting,” Sharon explains. “If we see shoppers dropping out of the process at a given point, we can go in and relook at that flow to see if there’s a better approach. These individuals were closest to the products and application changes and could quickly identify where something may need adjustment.”

Reducing costly errors with better data governance

Because previous analyst reporting involved pulling data from SQL, dropping it into a spreadsheet, and then sending out results via email, the Enterprise Data team had little insight into how data was used. Errors were hard to trace and even harder to stamp out.

Now GoDaddy data stewards use Alation to scan thousands of Tableau workbooks, showing top users of each data source. With this knowledge, the Analytics and Business teams can get a better understanding of data usage throughout the company—and they now know who to ping for further insight. This helps reduce large-scale, costly errors.

“With the combination of Alation and Tableau, GoDaddy's Enterprise Data team was able to examine the lineage of a table, search multiple data sources for a field, and increase visibility and control,” says Sharon.

In just a few clicks, they can view the popularity of various data sources within Tableau Server, where data is being used and by whom, and can search through hundreds of data sources for a specific field.
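To make these lookups concrete, here is a small sketch of how data source popularity, top users, and a field search could be computed from usage records. It is a simplified illustration, not how Alation or Tableau Server works internally; the `USAGE` records, the `FIELDS` inventory, and all function names are hypothetical sample data and helpers invented for the example.

```python
from collections import Counter

# Hypothetical usage records: (workbook, data_source, user).
USAGE = [
    ("Sales Overview", "orders_certified", "ana"),
    ("Churn Watch", "orders_certified", "ben"),
    ("Churn Watch", "customers_certified", "ben"),
    ("Email Funnel", "campaigns_raw", "ana"),
    ("Email Funnel", "orders_certified", "ana"),
]

# Hypothetical field inventory: data source -> field names.
FIELDS = {
    "orders_certified": ["order_id", "customer_id", "order_total"],
    "customers_certified": ["customer_id", "signup_date"],
    "campaigns_raw": ["campaign_id", "send_date"],
}


def source_popularity(usage):
    """Rank data sources by how many usage records reference them."""
    return Counter(source for _, source, _ in usage).most_common()


def top_users(usage, source):
    """Rank users of one data source by number of usage records."""
    return Counter(user for _, s, user in usage if s == source).most_common()


def find_field(fields, name):
    """Return the data sources that contain a field, sorted by name."""
    return sorted(src for src, cols in fields.items() if name in cols)


if __name__ == "__main__":
    print(source_popularity(USAGE))
    print(top_users(USAGE, "orders_certified"))
    print(find_field(FIELDS, "customer_id"))
```

The top users returned for a data source are the people an analyst would contact with questions about it.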

To learn more about how Alation works with Tableau to enable governance at scale, check out this webinar and read this whitepaper.
