Hey, data is everywhere. You probably know that by now: it’s constantly in the news, it’s a growing professional field, and data skills are increasingly valuable in every job market. However, data isn’t just for big businesses, and you don’t have to collect your own data to analyse it. There are tonnes of public data sets out there!
If you’re looking to learn how to analyse data, create data visualisations or just boost your data literacy skills, public data sets are a perfect place to start. Here are some great public data sets you can analyse for free right now. If you need help presenting your findings, we also have write-ups on data visualisation blogs to follow and the best data visualisation examples for inspiration.
Curated by: Google
Example data set: "Cupcake" search results
This is one of the broadest and most interesting public data sets to analyse. Google’s vast search engine tracks search term data to show us what people are searching for and when. You can explore statistics on search volume for almost any search term since 2004. Enter any search term, or a handful of search terms, and click the download button to analyse the data outside of the Trends website.
There are a variety of filters to narrow down trends according to location (worldwide or by country), various time ranges, categories or even specific search types (web vs image vs YouTube search results). You can easily see what topics are popular at the moment and what is currently trending on the Trends homepage. Google also highlights several interesting examples of trends with data visuals on that homepage.
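Once you have downloaded a file, a few lines of Python are enough for a first look at it. The snippet below is a minimal sketch, not an official recipe: it assumes the export is a CSV with a short preamble (a category line and a blank line) followed by one row per time period, and that very low interest is written as "<1". The sample values are made up for illustration, and `parse_trends_csv` is a hypothetical helper name.

```python
import csv
import io
from statistics import mean

# Made-up sample in the assumed shape of a Trends export:
# a category line, a blank line, then a header and one row per period.
SAMPLE_EXPORT = """\
Category: All categories

Month,cupcake: (Worldwide)
2004-01,12
2004-02,14
2004-03,<1
2004-04,20
"""


def parse_trends_csv(text):
    """Return (period, interest) pairs from a Trends-style CSV export."""
    lines = text.splitlines()
    # Skip the preamble: the table starts at the first line containing a comma.
    start = next(i for i, line in enumerate(lines) if "," in line)
    reader = csv.reader(io.StringIO("\n".join(lines[start:])))
    next(reader)  # drop the header row
    rows = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        period, value = row[0], row[1].strip()
        # Assumption: "<1" marks very low interest; treat it as 0 here.
        interest = 0 if value.startswith("<") else int(value)
        rows.append((period, interest))
    return rows


rows = parse_trends_csv(SAMPLE_EXPORT)
peak_period, peak_value = max(rows, key=lambda r: r[1])
average = mean(value for _, value in rows)
print(f"Peak interest: {peak_value} in {peak_period}")
print(f"Average interest: {average:.1f}")
```

To try it on a real download, read the file's contents into a string and pass that to `parse_trends_csv` instead of `SAMPLE_EXPORT`. If your export is laid out differently, adjust the preamble handling accordingly.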
Curated by: National Centers for Environmental Information (part of NOAA; formerly the National Climatic Data Center)
Example data set: Local Climatological Data (LCD)
If weather and climate science is your thing, you can’t get much more detailed than the National Climatic Data Center. The centre has since been rebranded: it merged with the other National Oceanic and Atmospheric Administration (NOAA) data centres to become the National Centers for Environmental Information (NCEI).
Here you can find an archive of climate and weather data sets across the US, the largest archive of environmental data in the world. It is a huge resource for all kinds of weather data, including meteorological, oceanic, climate, atmospheric and geophysical data.
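As a rough illustration of what you can do with an hourly weather file such as the Local Climatological Data, the sketch below averages hourly temperature readings into one value per day. The column names `DATE` and `HourlyDryBulbTemperature` are assumptions about the CSV layout, the readings are invented, and `daily_means` is a hypothetical helper, so check the headers of the file you actually download.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented hourly readings; the column names are assumed, not guaranteed.
SAMPLE_LCD = """\
DATE,HourlyDryBulbTemperature
2020-01-01T00:53:00,41
2020-01-01T06:53:00,38
2020-01-01T12:53:00,
2020-01-01T18:53:00,47
2020-01-02T00:53:00,40
2020-01-02T12:53:00,50
"""


def daily_means(text):
    """Average the hourly temperature readings for each calendar day."""
    by_day = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["HourlyDryBulbTemperature"].strip()
        if not raw:
            continue  # skip hours with no reading
        day = row["DATE"][:10]  # keep only the YYYY-MM-DD part
        by_day[day].append(float(raw))
    return {day: mean(values) for day, values in by_day.items()}


means = daily_means(SAMPLE_LCD)
for day, value in sorted(means.items()):
    print(f"{day}: {value:.1f}")
```

Real files may also contain non-numeric flags in the temperature column. This sketch only skips empty cells, so extra cleaning may be needed.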
Curated by: World Health Organization (WHO)
Example data set: Universal access to reproductive health
As part of their core goal for better health information worldwide, the World Health Organization make their data on global health publicly available through the Global Health Observatory (GHO). The GHO acts as a portal with which to access and analyse health situations and important themes.
The various data sets are organised according to themes such as mortality, health systems, communicable and non-communicable diseases, medicines and vaccines, health risks, and so on. The WHO’s health statistics are the go-to source for global health information and are also used in the work of the US Centers for Disease Control and Prevention.
Curated by: Singaporean government
Example data set: Singapore Residents By Age Group, Ethnic Group And Gender, End June, Annual (2017)
There are actually a lot of great government data websites on the internet, and most of them hold an incredible wealth of data and information. The US has one of the best known at data.gov, and the UK and Australia also have great corresponding sites. Between them, these sites cover large populations and give us a lot of data to access. So why Singapore?
Frankly, Singapore’s government data website is just so visually accessible. The homepage is full of small visualisations telling stories about each data set. Good data visualisation not only displays information in an accurate and relevant format, but is also appealing enough to catch interest. Most government data sites are utilitarian and simple, doing just enough to get the data across in an easy-to-understand way. Singapore, however, brightens things up with colourful graphs and a “Similar Datasets” section at the bottom of every data set to encourage readers to explore.
Curated by: NASA
Example data set: Atmospheric Electricity (Lightning)
Earthdata is part of NASA’s Earth Science Data Systems Program, specifically the Earth Observing System Data and Information System (EOSDIS). EOSDIS acts as a means to process and distribute Earth science data from Earth observation satellites, aircraft and field measurements.
Via Earthdata, the public can access NASA’s data, news and event information. It covers data from Earth’s atmosphere, solar radiance, the cryosphere (arctic/frozen areas), the ocean, land surface (gravity, geomagnetism, tectonics) and human environments.
Curated by: Amazon
Example data set: 1000 Genomes Project
As more organisations make their data available for public access, Amazon has created a registry to find and share those various data sets. There are over 50 public data sets supported through Amazon’s registry, ranging from IRS filings to NASA satellite imagery to DNA sequencing to web crawling. The data sets also include usage examples, showing what other organisations and groups have done with the data.
7. Pew Internet
Curated by: Pew Research Center
Example data set: Teens, Social Media & Technology 2018
The Pew Research Center’s mission is to collect and analyse data from all over the world. They cover all sorts of topics like politics, social media, journalism, the economy, online privacy, religion and demographic trends. While they do their own non-partisan, non-advocacy research and analysis, they also offer their raw data for public access. Access simply requires a brief registration on the site and credit to Pew Research Center as the source of the data, along with an acknowledgement that Pew is not responsible for any conclusions others draw from it.
In a way, making data accessible is another research project for Pew. They already know how they use the data in their own research, and they are interested in learning how others use it as well. They have one request: contact them by email if you publish anything based on the data you acquired.