What’s Better for Big Data Analytics: In Memory or On Disk?

InformationWeek’s 2012 Big Data Survey of 231 business technology pros shows that the biggest concern is speed of accessibility: “Let’s be clear: Disk-based databases, with their high-latency I/O [input/output] bottlenecks, place a severe constraint on how fast your business can move.” The authors add: “Disk I/O is the weakest link in IT’s efforts to reduce latency in high-speed analytics and transactional applications.”

For most organizations, the answer to "which is better?" is both. Lack of an in-memory solution will constrain analysis of massive, slow data sets. But always being forced into an in-memory approach can negate the investment you've made in a fast analytical database.



Live connections are preferable when:

  • You have a fast database
  • You need up-to-the-minute data

In memory is ideal when:

  • Your database is too slow for interactive analytics
  • You need to take load off a transactional database
  • You need to be offline and can't connect to your data live

IN-MEMORY DATA IS BETTER:
WHEN YOUR DATABASE IS TOO SLOW FOR INTERACTIVE ANALYTICS…

If you’re working with live data and queries are too slow for interactive, speed-of-thought analysis, you may want to bring your data into memory on your local machine. The advantage of working interactively with your data is that you can follow your train of thought and explore new ideas without constantly waiting for queries to return.
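As a rough, tool-agnostic sketch of that pattern (not a description of Tableau’s internals), the Python snippet below pays the query cost once and then answers every follow-up question from an in-memory copy. The connection string, table, and column names are hypothetical.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string and schema; substitute your own.
    engine = create_engine("postgresql://analyst@dbhost/sales")

    # One (slow) round trip to the live database...
    orders = pd.read_sql(
        "SELECT region, product, amount, order_date FROM orders", engine
    )

    # ...then every follow-up question is answered from the in-memory copy,
    # with no further database I/O.
    by_region = orders.groupby("region")["amount"].sum()
    top_products = orders.groupby("product")["amount"].sum().nlargest(10)
    recent = orders[orders["order_date"] >= "2012-01-01"]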

…AND WHEN YOU NEED TO TAKE LOAD OFF A TRANSACTIONAL DATABASE
If a database is the primary workhorse for your transactional systems, you may want to take non-transactional load off it. That includes analytics: analytical queries can tax a transactional database and slow it down. So bring a subset of that data into memory to do fast analytics without compromising the speed of critical business systems.
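A minimal sketch of the offloading idea, again independent of any particular tool: copy only the slice you need out of the transactional system into a local in-memory store (SQLite here), then point all analytical queries at that copy. The connection string and schema below are assumptions for illustration.

    import sqlite3
    import pandas as pd
    from sqlalchemy import create_engine

    # The production (transactional) database -- hypothetical connection string.
    oltp = create_engine("postgresql://readonly@oltp-host/shop")

    # Extract only the slice needed for analysis, in a single pass.
    extract = pd.read_sql(
        "SELECT customer_id, order_date, total FROM orders "
        "WHERE order_date >= '2012-01-01'",
        oltp,
    )

    # Load it into a local in-memory SQLite database for analytics.
    mem = sqlite3.connect(":memory:")
    extract.to_sql("orders", mem, index=False)

    # Analytical queries now run locally; the transactional system is untouched.
    monthly = pd.read_sql(
        "SELECT strftime('%Y-%m', order_date) AS month, SUM(total) AS revenue "
        "FROM orders GROUP BY month ORDER BY month",
        mem,
    )
    print(monthly)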

You don’t have to choose between in-memory and live connections. You should be able to switch between the two as needed.

THE TABLEAU DATA ENGINE
Tableau combines in-memory and live direct connections. The Tableau Data Engine is a high-performance analytics database on your PC that enables ad-hoc, in-memory analysis of millions of rows of data in seconds.

Some databases gain performance simply by using the top levels of the memory hierarchy found on common laptops and requiring all data to be memory resident. Tableau’s architecture-aware design represents the next generation of in-memory solutions: by using different levels of memory at different times, it lets you take advantage of the computing power of every PC without limiting the data to what fits in memory.
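Tableau has not published the Data Engine’s internals, but the general technique of working with data larger than RAM can be sketched with a memory-mapped file: the operating system pages only the portions you touch into memory, so the full column never has to be resident at once. The file name below is hypothetical.

    import numpy as np

    # Hypothetical column of float64 measures stored on disk (e.g. several GB).
    # A memory map lets us address it like an in-memory array; the operating
    # system pulls only the pages we touch into RAM and evicts them as needed.
    values = np.memmap("measure_column.dat", dtype=np.float64, mode="r")

    # Scanning a slice touches only the pages backing that slice...
    first_million_sum = values[:1_000_000].sum()

    # ...while a full aggregate streams through the file without ever needing
    # the whole column to be resident in memory at once.
    total = values.sum()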

Dr. Robin Bloor writes about Tableau's in-memory Data Engine in his whitepaper Analytics at the Speed of Thought: "It alters the way in which BI can be carried out. Because Tableau can now do analytics so swiftly and gives people the choice to connect directly to fast databases or use Tableau’s in-memory Data Engine, it has become much more powerful in respect of data exploration and data discovery. This leads to analytical insights that would most likely have been missed before.”

FLEXIBLE DATA MODEL
With Tableau, you can combine data from an unlimited number of sources and formats. Blend relational, semi-structured and raw data in real time, without expensive up-front integration. Users don’t need to know the details of how data is stored to ask and answer questions. Whether your data is in spreadsheets, cubes, databases, a data warehouse, an open-source file system like Hadoop, or all of those, users can quickly connect to the data they need, consolidate it, and interactively explore the blended data. When working in memory, schedule automatic refreshes or refresh with a click so that your data stays up to date, regardless of which combination of in-memory and live direct connections you use.
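The blending idea itself is tool-agnostic. A hedged Python sketch of its general shape: rows from a spreadsheet joined to an aggregate from a database on a shared key, with the combined result explored in memory. The file, connection, and column names are made up for the example.

    import pandas as pd
    from sqlalchemy import create_engine

    # Source 1: quota targets kept in a spreadsheet
    # (hypothetical file with columns: region, quota).
    quotas = pd.read_excel("regional_quotas.xlsx")

    # Source 2: actuals in a relational database (hypothetical connection).
    engine = create_engine("postgresql://analyst@dbhost/sales")
    actuals = pd.read_sql(
        "SELECT region, SUM(amount) AS actual FROM orders GROUP BY region",
        engine,
    )

    # Blend the two sources on the shared 'region' key and explore in memory.
    blended = actuals.merge(quotas, on="region", how="left")
    blended["attainment"] = blended["actual"] / blended["quota"]
    print(blended.sort_values("attainment", ascending=False))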

eBay’s data architecture comprises Teradata, Hadoop and Tableau. Explains eBay Analytics Platform Director Kiril Evtimov: “Tableau’s capabilities and ease of use enable eBay’s teams to take a collaborative approach to exploring data – and to making results available seamlessly across the business.” For more on eBay’s data architecture and use of Tableau for big data visualization, see InfoWorld, “Big data visualization: A big deal for eBay”, December 6, 2012.

Already have an in-memory database? Connect it to Tableau using a live direct connection. David Ives, General Manager of Karabina Solutions, explains: “We love the fact that Tableau is open unlike other proprietary in-memory data visualization tools. It means clients are able to quickly compile and analyze large volumes of data from nearly any data source in minutes and then create interactive data visualizations, dashboards and analytics.”