Cloudera Impala Integration: Fast Analytics for Hadoop
Last week, I blogged about the Big Data announcements we made in conjunction with the Strata-Hadoopworld conference. One additional highlight didn't make that post because the technology had not yet been announced. It is a very significant announcement for customers using Tableau on top of Cloudera's Hadoop distribution (CDH). At the conference, Cloudera announced a new Hadoop technology called Impala, a real-time query processing engine for Hadoop. Tableau was chosen as one of the first partners to integrate with Cloudera Impala, and we previewed an early version of the Impala connector at the conference.
Cloudera Impala was developed in response to one of the biggest complaints about using Hadoop (and Hive) for analytics: latency. While Hadoop was great for batch-oriented workloads that churned through massive volumes of data, it did not provide a fast, interactive experience for users doing ad hoc analytics. Even very simple queries could take 20 seconds or more because of the MapReduce overhead occurring behind the scenes.
Impala bypasses the MapReduce layer in Hadoop. In its internal tests, Cloudera has reported that Impala is anywhere from 3x to 90x faster than Hive, depending on the type of query and workload. This should provide significant performance gains over Tableau's existing Hive connectivity.
Customers interested in trying out a preview version of the Impala connector should contact their Tableau account manager for details about how to participate in the early access program. For more information about Impala, I encourage you to read Cloudera's blog post that describes the technology in more detail.
PS: Many thanks to Franz Funk for creating the dashboard above that we showed at the Strata-Hadoopworld conference!
Learn more about the powerful combination of Tableau, Hadoop, and Impala
Want to learn more about how Tableau and Hadoop can work together? To find more case studies, user stories, news and info, visit our Hadoop resource page.