The exponential growth of data is putting a strain on traditional business intelligence and analytic solutions. New strategies to manage this information and leverage greater business value are top of mind with both business users and IT management.
In this whitepaper, Shawn Rogers, VP of Research, Enterprise Management Associates (EMA), will review the elements of a successful Big Data strategy such as self-service analytics, in-memory data processing and augmenting corporate data. In addition, he will profile Yahoo!’s approach to large amounts of advertising data in a real-world, interactive example.
We've also pulled out the first several pages of the whitepaper for you to read.
The exponential growth of data is putting a strain on traditional Business Intelligence (BI) and analytic solutions. New strategies to manage this information and leverage greater business value are top of mind with both business users and IT management. Supporting the needs of self-service users and connecting to a wider assortment of data add hurdles to achieving a stronger BI environment, but both are critical for success. This ENTERPRISE MANAGEMENT ASSOCIATES® (EMA™) report defines Big Data, identifies the necessary components to better manage it, and helps the user understand how to create BI value through Big Data.
Understanding Big Data
Over the past decade, enterprise data has grown at an exponential rate. The data that companies manage has quickly evolved from gigabytes to terabytes to petabytes, with no limits in sight. In 2005, WinterCorp, a leading expert in large database management issues, identified the world’s first 100-terabyte data warehouse. It came as no surprise that an Internet company, Yahoo, had set this new standard. Only six years later, Wal-Mart, the world’s largest retailer, is logging one million customer transactions per day, feeding information into databases estimated at 2.5 petabytes in size. The Large Hadron Collider at CERN, the European Organization for Nuclear Research, can generate 40 terabytes every second during experiments, and Boeing jet engines produce ten terabytes of information every 30 minutes of operation.[4] A four-engine jumbo jet can create 640 terabytes of data on just one Atlantic crossing. We have officially entered the Big Data era of computing.
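The jumbo-jet figure can be checked with simple arithmetic. The sketch below uses the per-engine rate and the engine count from the text. The eight-hour crossing time is an assumption, not a figure from the report, chosen because it is the duration at which the numbers agree.

```python
# Figures taken from the text.
TB_PER_ENGINE_PER_HALF_HOUR = 10
ENGINES = 4

# Assumption: the report does not state a flight time; 8 hours is illustrative.
CROSSING_HOURS = 8


def crossing_terabytes(engines, hours, tb_per_half_hour=TB_PER_ENGINE_PER_HALF_HOUR):
    """Total terabytes produced by all engines over a flight of the given length."""
    half_hours = hours * 2
    return engines * half_hours * tb_per_half_hour


print(crossing_terabytes(ENGINES, CROSSING_HOURS))  # 640
```

Under that assumption, four engines running for sixteen half-hour intervals at ten terabytes each give the 640 terabytes quoted above.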
The term Big Data has yet to be universally defined, but generally speaking, Big Data represents data sets that can no longer be easily managed with traditional or common data management tools and methods. The velocity and scale of Big Data bring new challenges to access, search, integration, discovery and exploration, reporting and system maintenance. This new age brings with it new data sources that are adding stress to existing infrastructure. This is especially true of real-time data and unstructured data. Sensor and machine data are leading the way in overall data growth, as highlighted in the Boeing jet example above. Internet commerce sites generate massive amounts of data tracking consumer behavior in ways that were not possible in an earlier age. Social networking data is a new and highly valuable source of information for the enterprise, but again its sheer volume and speed make it difficult to utilize. The micro-blogging site Twitter serves over 150 million users who produce 90 million “tweets” per day. That’s 800 “tweets” per second. Each of these “tweets” is approximately 200 bytes in size. On an average day this traffic equals 12 gigabytes, and throughout the Twitter ecosystem the company produces a total of eight terabytes of data per day. In comparison, the NYSE produces just one terabyte per day.[5] Social data brings with it new data types such as location, behavioral, sentiment, social graph and rich media data that all represent new value to the enterprise. Big Data is fueling new value in analytics and business intelligence.
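The Twitter figures can be sanity-checked the same way. The sketch below takes only the numbers quoted in the text (90 million tweets per day, 800 tweets per second, roughly 200 bytes each) and uses decimal gigabytes. The quoted figures do not reconcile exactly with one another. They may come from different sources or measurement dates, though that is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tweets_per_second(tweets_per_day):
    """Average tweet rate implied by a daily total."""
    return tweets_per_day / SECONDS_PER_DAY


def daily_volume_gb(tweets_per_day, bytes_per_tweet):
    """Daily tweet text volume in decimal gigabytes (1 GB = 10**9 bytes)."""
    return tweets_per_day * bytes_per_tweet / 1e9


# Starting from the quoted daily total of 90 million tweets.
print(round(tweets_per_second(90_000_000)))   # 1042
print(daily_volume_gb(90_000_000, 200))       # 18.0

# Starting instead from the quoted rate of 800 tweets per second.
print(800 * SECONDS_PER_DAY)                  # 69120000
```

Either starting point gives a daily text volume in the low tens of gigabytes, which is the order of magnitude the text describes and is small next to the eight terabytes per day attributed to the wider Twitter ecosystem.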
Managing Big Data
Having more data should foster better decision-making. It should increase accuracy and insight. In years past, BI professionals had to forgo speed and deep insight because they were forced to archive or eliminate historical data due to cost considerations or the limitations of their BI tools and systems. Selecting too much data for a query could relegate you to the production queue, where your job could wait for hours before the data gatekeepers allowed your large report to run. Or perhaps worse, you could bring your systems to a grinding halt by attempting to include Big Data in your decision processes. These restrictions curtailed the growth of pervasive BI and forced end users to settle for “good enough” business intelligence. They also injected a control point, or barrier, to data that was often governed by IT. This created an adversarial relationship that negatively affected the impact of BI within the enterprise.
Many companies are no longer tolerating these types of physical or cultural constraints on their mission-critical BI environments. They have issued a mandate to corporate IT and the vendor community demanding innovation and new strategies to overcome these issues. This innovation will require new architectures and functionality, along with tools that are specifically designed to leverage Big Data.
Components necessary for leveraging Big Data to add value to enterprise BI include:
- Self-Service BI is key to growing a successful BI community. It is a paradigm that can only be driven with tools that bridge the gap between IT and the business user. Fluid, powerful and easy-to-operate User Interfaces (UIs) that support both exploration and presentation of information are key to enabling more business end users to embrace BI and to make better and stronger decisions. The most successful companies will add a culture of meritocracy in which using these tools rewards employees for partnering across IT and business owners to run the company better.
- In-Memory Analytics is a technical advancement that adds speed to analytics by accessing data that has been loaded into RAM. Storing data in-memory makes it easier for end users to slice and dice data and supports faster exploration without the limitations often imposed by multidimensional cubes. This opens the door to analyzing greater amounts of data in real-time or in an ad-hoc fashion. Removing the Input/Output (I/O) bottleneck of traditional disk drives creates instant value and faster decisions. Depending on the specific computing environment and the analytic needs of the end users, the sheer volume of Big Data can create limitations for some in-memory environments. The ability to connect directly to Big Data can add flexibility and speed, and can support larger and more advanced workloads. The opportunity to choose between in-memory and direct data access will help users solve significant Big Data challenges.
- Wide Data Access supports the needs of self-service BI users who can’t or don’t want to get involved with the intricate aspects of data integration and ETL. This type of feature empowers the business user and relieves the IT group of unnecessary chores. End users look for opportunities to connect system information with spreadsheets on the desktop to do analysis. Innovative BI platforms need to support this ability to federate data from multiple sources in an easy-to-understand manner.
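The in-memory and wide-data-access ideas above can be illustrated with a minimal sketch. It loads a small table of "system" data and a desktop spreadsheet (exported as CSV) into an in-memory SQLite database and joins them in one query. SQLite is only a stand-in here, not a tool the report names, and the table names, column names and sample values are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical "system" data, standing in for a corporate sales table.
SYSTEM_ROWS = [("east", 120.0), ("west", 95.5), ("east", 30.0)]

# Hypothetical desktop spreadsheet, exported as CSV (per-region targets).
SPREADSHEET_CSV = "region,target\neast,140\nwest,100\n"


def build_in_memory_db():
    """Load both sources into a database that lives entirely in RAM."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", SYSTEM_ROWS)
    conn.execute("CREATE TABLE targets (region TEXT, target REAL)")
    reader = csv.DictReader(io.StringIO(SPREADSHEET_CSV))
    conn.executemany(
        "INSERT INTO targets VALUES (?, ?)",
        [(row["region"], float(row["target"])) for row in reader],
    )
    return conn


def sales_vs_target(conn):
    """Join system data with spreadsheet data and total sales per region."""
    query = """
        SELECT s.region, SUM(s.amount) AS total, t.target
        FROM sales AS s
        JOIN targets AS t ON t.region = s.region
        GROUP BY s.region, t.target
        ORDER BY s.region
    """
    return conn.execute(query).fetchall()


for region, total, target in sales_vs_target(build_in_memory_db()):
    print(region, total, target)
```

Because both tables sit in memory, the user can re-slice the data with a different query without another trip to disk. The join shows, on a very small scale, what federating a spreadsheet with system data looks like from the analyst's side.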