Links for EMC's The Human Face of Big Data

Global Themes

Data is giving us new insight into our cultural lives. Is it changing our discussions? Music, dance, and film are a common thread throughout the world. EMC collected tweets about our shared cultural pastimes using a combination of big data technologies such as Hadoop and Amazon Web Services. The tweets were then grouped and visualized in Tableau Desktop. You’ll find tweets about favorite bands, dance teams, and independent films.

Business and Commerce

How we do business around the world varies. Does how we talk about it vary too? Here’s a peek into the daily discussion about the business of business. See if your business conversation matches that of your country-mates.
Starting with a set of data that EMC pulled from the Twitter “fire hose” of all tweets over a period of time with business-related keywords, this visualization drills into conversations focusing on business, deals and jobs. Certain unrelated phrases, such as “good job” and “deal with”, were excluded. Ellie Fields used Tableau to combine four views of the data into a dashboard: 1. A summary of the number of tweets for each term; 2. A trend over time, using an area chart to show the aggregate number of tweets broken down by keyword; 3. A view of the data on the map; and 4. A detail showing individual tweets. Filters from one view to another allow the user to drill down into the data; for example, clicking on a slice of a pie chart in the map filters the other views on the page. One interesting finding from this data is how ubiquitous business terms are in conversations across society. While the detail view shows many conversations about business, it also includes tweets about politics and relationships.
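The keyword filtering described above, with phrases like “good job” and “deal with” excluded, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EMC's actual pipeline: the keyword list, the exclusion list beyond the two phrases named in the text, and the function names are all hypothetical.

```python
import re

# Assumed keyword list: the text names business, deals and jobs as topics.
KEYWORDS = ("business", "deal", "job")
# The two excluded phrases mentioned in the text; a real list would be longer.
EXCLUDED_PHRASES = ("good job", "deal with")


def matching_keywords(text):
    """Return the keywords found in a tweet after excluded phrases are blanked out."""
    cleaned = text.lower()
    for phrase in EXCLUDED_PHRASES:
        cleaned = cleaned.replace(phrase, " ")
    # Match whole words only, allowing a simple trailing "s" (deal/deals, job/jobs).
    return [
        kw for kw in KEYWORDS
        if re.search(r"\b" + re.escape(kw) + r"s?\b", cleaned)
    ]


def count_by_keyword(tweets):
    """Count how many tweets mention each keyword (the per-term summary view)."""
    counts = {kw: 0 for kw in KEYWORDS}
    for text in tweets:
        for kw in matching_keywords(text):
            counts[kw] += 1
    return counts
```

Blanking out the excluded phrases before matching means a tweet such as “Good job closing that deal” still counts toward deals, while “good job” on its own does not count toward jobs.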

Health and Wellness

Do we talk about exercise more than we really do it? The Human Face of Big Data looks into the ways Big Data can improve our health and here we get a look at what we all think and say about our exercise habits.
As obesity and diabetes spread throughout the world, discussions about exercise follow. How do those discussions break down? EMC collected and stored tweets containing “#exercise” using a combination of Hadoop and Amazon Web Services. In Tableau Desktop, three large groups of tweets were identified: those mentioning weight, run and health. Over the collection period, discussions about weight spiked in the early morning hours.
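As a rough illustration of how tweets might be grouped by term and by hour of day to surface a spike like the one described, here is a minimal Python sketch. The actual grouping was done in Tableau Desktop; the input shape (ISO timestamp plus text), the substring matching, and the function names are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime

# The three groups named in the text.
GROUPS = ("weight", "run", "health")


def hourly_counts(tweets):
    """Count tweets per (group, hour of day).

    `tweets` is an iterable of (iso_timestamp, text) pairs, an assumed format.
    Matching is a plain substring test, so "run" also matches "running".
    """
    counts = Counter()
    for timestamp, text in tweets:
        hour = datetime.fromisoformat(timestamp).hour
        lowered = text.lower()
        for group in GROUPS:
            if group in lowered:
                counts[(group, hour)] += 1
    return counts


def peak_hour(counts, group):
    """Return the hour with the most tweets for a group, or None if it has none."""
    hours = {h: n for (g, h), n in counts.items() if g == group}
    return max(hours, key=hours.get) if hours else None
```

Calling `peak_hour(hourly_counts(tweets), "weight")` would return the hour of day in which weight-related tweets were most frequent in whatever sample is supplied.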

Image credit: istockphoto

Government Affairs

No means of communication is more democratic than Twitter… so it makes sense to use Twitter to see how the world would vote in the Presidential election. Here we get a look at both the macro view by country—and the individual sentiments that determine which lever would get pulled in a global race for the White House.
The US Presidential election is one of the most closely watched political races in the world. EMC collected tweets containing hashtags commonly used by political geeks such as #election, #p2, #tcot, #tlot and #teaparty. By far the largest number of tweets contained mentions of the two candidates in the US Presidential race: Barack Obama and Mitt Romney. In this visualization, we look at tweets about Obama and Romney from around the world. The tweets were collected over a period of 11 days in August and September 2012 and they were processed using a combination of Hadoop and Amazon Web Services.


Big data can tell us a lot about the quality of the world we live in. Through this visualization we can see more about how people all over the world are discussing ways to make their diets healthier and more sustainable.

As the world population grows, we must develop the resources to feed every mouth. In recent days, however, it seems we’re beginning to also pay special care to the food we choose to eat. In the data pulled by EMC, three topics stood out: organic, local and sustainability.
This visualization shows which countries had tweeters discussing each of the three topics as well as the growth in users who contribute to the movement. People have a desire to support their immediate communities, eat foods that are less processed and ensure an agricultural future. Each view is linked, allowing you to see what people are specifically tweeting about in each country, and about which topic.

Safety, Crime & City Life

Here we combine large-scale data sets about crime rates and peacefulness with the very personal comments people make about their safety. The Human Face of Big Data project also takes a look into specific ways big data is being used to fight crime, neighborhood by neighborhood, in major cities.
As the majority of the world moves to large urban centers, how do people communicate about important aspects of city life? How does big data help control crime, ease traffic and monitor pollution?
This visualization takes public data on safety, quality of life and crime around the world and maps each country according to its performance given a selected metric. That data is then crossed with topics that people post about on Twitter. This allows you to see not only global trends regarding safety, but also reveals what people tweeting from that country are talking about.
While there are many aspects to city life, this data homes in on topics that are strictly about crime, safety and the general perception of the population—topics such as crime, safety and assaults.

London Stories


It was to bring the real world to GPS that Israelis Amir Shinar, Uri Levine, and Ehud Shabtai created Waze, a free community-driven application for mobile phones that gathers real-time travel data from users—and continuously updates its maps. Remarkably, it can even learn from a user’s usual travel times and routes and then offer personalized traffic updates and suggested reroutes. Nowhere is traffic more challenging than in the roads around London, as we see in this visualization of traffic—and the online debate it spurs.
Waze is an Israeli startup founded in 2009 that makes one of the most popular free mobile applications for traffic. Downloaded by more than 20 million users worldwide, Waze combines GPS navigation with real-time updates on traffic as reported by users on the road. Users can report heavy traffic, accidents, gas prices, speed traps, or other road hazards. The application officially launched in the United Kingdom in June 2011. What kind of potential is there for Waze in the UK?

From August 28 through September 6, 2012, EMC collected tweets about traffic and identified the country of each Twitter user based on their reported location. Lori Williams used Tableau to identify tweets from UK users who tweeted about traffic in cities around the United Kingdom. In this visualization, see the drivers’ tweets and watch a demo of Waze version 3.2.
Title image credit: David Townsend, Flickr User highwaysagency


An estimated 700,000 lives are claimed globally by counterfeit drugs. It’s a $75-billion-per-year business, according to Fast Company, and the problem is global: 25 percent of all drugs in the developing world, and an estimated 10 percent worldwide, are counterfeit.
That’s why Bright Simons founded mPedigree, a nonprofit that aims to make drug counterfeiting tougher by making it easy for patients to find out instantly if the medicine they are about to purchase is real or fake.

Where Mosquitos
Malaria is a disease that kills thousands every year, particularly children or those in rural areas. In August, however, scientists from the University of Cape Town announced they may have developed a pill that can effectively eliminate malaria. Media organizations around the world reported the exciting news, but word on Twitter also spread like wildfire.
This visualization lets you explore the reactions of both outlets, with links to various news articles and an interactive trio of views underneath that contain Twitter data. You can select a country or a topic to see relevant tweets related to malaria, the announcement of the cure and what potential effects this could have in the world.

Singapore Stories

Japan Recovers

Soon after reactors at the Fukushima Daiichi Nuclear Power Plant began to melt down, a group of three techie friends (one Japanese, living in Boston; one Dutch, living in Japan; and one American, living in LA) started looking for information about the radiation exposure in Japan. They could not find any, so they took it upon themselves to help solve the problem. Within a week, they launched a website where people could upload radiation levels and map them. Here we see a marriage of statistical data about radiation levels and very human discussions of the impacts and issues around it.
Safecast collects and shares radiation measurements to empower people with data about their environments. EMC pulled tweets from the Twitter “fire hose” from hashtags and keywords related to the March 2011 tsunami and Fukushima nuclear accident, and stored this Twitter data in a combination of Hadoop and Amazon Web Services. In Tableau, Brett Sheppard combined the Safecast sensor measurements with exploratory analytics of the millions of tweets using database extracts EMC sent Tableau via Aspera high-speed file transfer together with a direct database connection for incremental updates. This combination of social media and sensor data highlights individual stories of how citizens and residents of Japan are overcoming this tragedy.


The Integrated Marine Observing System (IMOS) is a network of sensor floats, underwater autonomous vehicles, animal tags, scientific monitoring stations, and remote satellite sensing. Designed and deployed by Australian scientists, IMOS monitors nearly a third of the world's oceans, gathering real-time information about factors like salinity and water temperature. Here we see discussions of the project and a look at some of the data about the oceans themselves.
The Integrated Marine Observing System is a fascinating story of big data in Australia’s oceans. The world conversation about oceans and marine life is strongly tied to climate change and, as a result, politics. In this visualization we started with data provided by EMC with keywords including Marine and Ocean, but excluding unrelated terms like “Frank Ocean” and “royal marine.” In Tableau, Ellie Fields created a time trend of the data and a detail view of tweets ranked by the number of times they were tweeted. A color scheme shows how often each tweet, often a retweet itself, was then retweeted, creating a tidal wave for some comments. The top tweet, “to this day, 95% of the world’s oceans remain undiscovered,” speaks to the importance of IMOS’s mission to collect and share information about the ocean.

Singapore Taxis

A study of taxicabs in Singapore made some interesting findings: Taxi drivers stopped driving when it rained, because the financial impact of a car accident far outweighed any upside from the fares they could get. Singaporeans have a viewpoint on this—and other topics, and you can see here how they think about their traffic…and their taxis.
Singaporeans talk a lot about traffic. Some would go so far as to say they ‘complain,’ but do they have good reason to? Taxis are an integral form of public transportation, and it’s important to be able to get one at an affordable price. At the same time, there are also stories of people being hurt by complaints. Is there a balance?

This set of data collected by EMC shows the different facets of the discussion, including cabs, taxis, traffic and more. Different views show the popularity of each topic and at what time, as well as specifically what the tweets are about. Each topic is also linked to a story revolving around that topic, whether it’s the price increase of taxi fares or the fatal crash of a Ferrari and a taxi. These are topics that affect Singaporeans daily; they are topics that affect all with a human face.

New York Stories

Major League Baseball

No sport is more data-centric than baseball. Every element of the game is tracked, from batting average and ERA to fielding percentage. And all over the world, baseball fans who may hate to even balance a checkbook will engage in heated arguments over arcane statistics about players long retired from the game. Now Twitter shows us how they feel about their team and the game from a totally different view.
Despite being relegated to hobby status, advanced baseball analysis has been pushing the limits of applied economics and statistics for decades. Hobbyist baseball analysts have sought to use data to answer every conceivable question about baseball. When given more data, those ‘stat-geeks’ have jumped on the opportunity to draw powerful new insights.
Exploring the possibility of baseball-related insights from Big Data is a natural continuation of the baseball analysis tradition. Is there a correlation between the number of tweets that reference a team and how many games it is winning? Using Twitter data extracted by EMC, stored in Hadoop, hosted on Amazon Web Services, and combined with Major League Baseball win records in Tableau Desktop, we can see how well the teams are doing and how their fans are reacting.
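One simple way to put a number on the correlation question above is the Pearson correlation coefficient between each team's tweet count and its win total. The sketch below is an assumption about how one might check this outside Tableau, not a description of the analysis behind the visualization; the function name and input lists are hypothetical, and no real team data is included.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists.

    Returns a value between -1 and 1; values near 0 suggest no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / sqrt(spread_x * spread_y)
```

Here `xs` would hold tweets per team and `ys` the matching win totals, in the same team order. A high coefficient would only show that the two move together, not that tweeting drives winning or the reverse.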
Reference: (Baseball’s Particle Accelerator;

Personal Health Monitoring

As everything analog shifts to digital, we can collect a huge amount of data about ourselves. The Quantified Self movement advocates for self-knowledge through numbers that measure such things as how long we sleep or how many stairs we can climb in a day. The rapidly growing movement includes fitness buffs, techno-geeks and patients with chronic conditions who obsessively monitor various personal metrics. This visualization shows us how we talk and think about exercise around the world.
The Quantified Self is a term coined by Gary Wolf and Kevin Kelly to describe the self-tracking movement that they founded. The idea is to use technology and quantitative data to understand personal patterns.
EMC collected tweets containing different things we track such as miles, calories, steps, and minutes from Twitter users around the world. In this visualization, see twitter conversations about tracking from around the globe and watch Gary Wolf talk about the meaning of quantified self data.
Title Image credit: Flickr user sterlic


Do people vote with their conscience or their pocketbooks? It’s a complicated mix, but here we get a look at one way those worlds collide. See the ways financial discussions intermingle with views on political candidates, taken from the grassroots world of the Twitter stream.
You might expect most finance talk to be centered on stock markets, investment banks and private equity firms. You would be right, but as this visualization shows, those discussions are also often framed around politics, especially as the US presidential election looms.
Lots of discussion surrounds GOP-nominee Romney’s days at Bain Capital, how the stock market reflects on President Obama and more. Unsurprisingly, given Romney’s financial background, more tweets about finance mention Romney in some fashion. Each part of the dashboard is connected, allowing you to drill down to see the tweets that mention each presidential candidate in addition to popular financial terms that are also being discussed.