Despite all the hubbub and hype around Hadoop, few business intelligence (BI) professionals know much about what Hadoop is, how it does what it does, or in which situations they should deploy it. In fact, numerous myths about Hadoop persist among BI professionals and their business counterparts.
Here are some myths that need busting:
- It’s a distributed file system, not a database management system
- It’s also an analytic processor called MapReduce, not just the Hadoop Distributed File System (HDFS)
- It depends on MapReduce, Pig, and Hive to express queries and analyses, not standard SQL (although this will change)
- It’s for diverse document and data types, not just for big volumes of structured data
- It can handle real-time data, not just historical data
- It’s for analyzing diverse and real-time big data, not just managing big data
- It’s open source software, but also available in enterprise-ready versions from several vendors
- It’s a complement to BI platforms and data warehouses, not a standalone replacement for them
- It’s for insights a business can get only by analyzing big data, not for generic reporting and analytics
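To make the MapReduce point above concrete, here is a toy sketch of the map/shuffle/reduce pattern in plain Python, counting word frequencies across documents. This is only an illustration of the programming model; real Hadoop jobs are written against the Java MapReduce API or expressed in Pig or Hive, and the framework distributes these phases across a cluster rather than running them in one process.

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all intermediate values by key,
    # the step Hadoop performs between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: aggregate the values for each key (here, sum the counts).
    return {word: sum(counts) for word, counts in grouped.items()}

docs = [
    "Hadoop is not a database",
    "Hadoop is a distributed file system",
]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["hadoop"])  # -> 2
```

The same aggregation would be a one-line GROUP BY in HiveQL, which is why Hive is often the gentler on-ramp for SQL-fluent BI developers.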
It’s no wonder that myths persist. Hadoop and related technologies differ sharply from the standard BI and data warehouse technology stack. Yet TDWI expects Hadoop technologies to become common components of that stack. Drawing on a soon-to-be-published TDWI Checklist Report on Hadoop, Philip Russom will lead a panel discussion on how BI professionals can prepare for that eventuality, busting the myths of Hadoop to reveal its true value for BI.