New MapR Connector Provides Enterprise-Class Hadoop Support

Hola Tableau followers! Today Tableau is announcing a connector to MapR's Hadoop distribution, adding onto the Hadoop integration we started last year. Before I get into that, let me tell you about myself and talk a bit about why our customers are interested in Big Data.

My name is Ted Wasserman and I work on the product management team at Tableau. I am responsible for all data-related areas of the product, from expanding our integration with existing database vendors to evaluating new technologies we would like to support in the future. I will be contributing to this blog more often to tell you about exciting new projects we’re working on and to share my thoughts on interesting technologies and trends I’m seeing in the field.

Hadoop and Big Data

Today, let’s talk about Big Data and Hadoop. There’s certainly a lot of marketplace buzz about it. I’ve found that everyone I talk to seems to have a different definition of what Big Data means or how it applies to them. When I think of Big Data, I like using Gartner’s definition, sometimes called the “3 V’s”: Volume, Variety, and Velocity. I find that many people only talk about the first dimension (volume). The amount of data available to analyze and process today is enormous; however, volume has always been a problem relative to the technology of the time. One terabyte of data seemed unfathomable 10 years ago; today the hard disk on my laptop holds more than that! So, while data volumes are certainly much larger, the other two dimensions capture the essence of what differentiates this topic from the past. Velocity refers to the rate at which data is being added, while Variety refers to the type and structure of the data being captured. New innovations and approaches are needed to handle the intersection of these three dimensions.

Tableau knows how to work with Big Data. We’ve continually invested in supporting new data sources that our customers use to store, manage, and analyze their data. If you’ve been following Tableau’s activity in this space, you’ll recall that we made a significant investment in Hadoop last year with a connector for Cloudera’s CDH distribution. Since then, we’ve continued to move forward and have been working with the team at MapR to build a connector to MapR’s Hadoop distribution.

Why MapR?

Our customers are looking to build scalable, highly available, high-performing systems to support their business. Customers we spoke with during their evaluation or implementation of MapR told us that they selected the MapR Hadoop stack because it had many of the enterprise-class capabilities they needed.

For example, the MapR storage services layer supports both the Hadoop FileSystem API and NFS interfaces, allowing a customer to choose the distributed filesystem that best meets their needs. Some of the high availability features were also appealing: snapshots enable point-in-time recovery, while mirroring makes data protection more robust.

How To Get It

Similar to the Cloudera connector, Tableau connects to the MapR distribution through the Hive layer, using MapR’s ODBC driver. The MapR connector is included in version 7.0.7, available today, so upgrade Tableau Desktop to get access to it.

I’m very interested in hearing about what you are doing with Hadoop. What types of data and projects are you using it for? What factors went into choosing Hadoop over another technology (e.g. a relational database)? How do you envision its long-term place in your company’s infrastructure? You can reach me at twasserman at tableausoftware.com; I would love to hear from you.