Data Sources

These are free Internet sources of real data and datasets available for public use. They are great for Statistics teachers and students to use in the classroom, for projects, or for additional exploration. You may want to read some of my comments about these sources first. Please make sure to cite data appropriately. Feel free to email me if you have corrections, suggestions, or kind comments.

Arts and Entertainment
Business
  • Bureau of Labor Statistics – BLS databases, tables, and calculators organized by topic
  • Bureau of Transportation Statistics – data regarding planes, trains, and automobiles...
    • Flight Delays – including airport code, flight time, delay time, etc.
  • Fuel Economy – from U.S. Department of Energy and Environmental Protection Agency; car fuel economy data and specifications for all models through 2020.
  • Inside Airbnb – data sourced from Airbnb listings in major world cities on neighborhood locations, listing prices, room types, availability, etc. Interactive dataviz available via the site for each city, neighborhood, etc.
  • PayScale – Salary Data & Career Research Center (U.S.); searchable database of salary info by job title, degree/major, employer name or type, state, city, years of experience, etc. Available for other countries as well.
  • Tax Foundation – non-profit think tank; includes federal and state data on tax rates, revenue, and burdens; interactive tools and calculators; discussion of international taxes as well.
  • World Bank – international data organized by country, topic, and indicator
  • Yahoo Finance – historical stock information including daily, weekly, and monthly prices, open and close prices, and volume.
COVID-19
  • Apple Mobility Data – data and data visualizations for Apple mobility data.
  • CDC COVID Data Tracker – data visualizations for cases, trends, demographics, cases & deaths by county, testing, serology surveillance, emergency visits, mobility, underlying medical conditions, social impact, and school closures.
    • COVID-19 Case Surveillance Public Use Data – patient-level deidentified data, including demographic characteristics, exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities.
  • covdata – R data package from Kieran Healy that collects and bundles datasets related to the COVID-19 pandemic from a variety of sources.
  • COVID Tracking Project – from The Atlantic, data compiled from 56 US states and territories in three main areas: testing, patient outcomes, and, via The COVID Racial Data Tracker, racial and ethnic demographic information. 
  • Google Mobility Data – data and reports for Google mobility data.
  • Gothamist Coronavirus Statistics – summary statistics and visualizations for New York state and city, including different benchmarks for reopening, testing, positivity rate, etc. NYC data available by zip code and borough.
  • Johns Hopkins Coronavirus Resource Center – tracking (including this global map: https://coronavirus.jhu.edu/map.html), testing, tracing, by region (state and country).
  • NYC COVID-19 – data and data visualizations from NYC Department of Health. Cases, hospitalizations, deaths; view by age, sex, race/ethnicity, poverty, and borough. 
  • NY Times COVID-19 Data – An ongoing repository of data on coronavirus cases, deaths, and mask use in the U.S. by state. Historical data available.
  • Our World in Data Coronavirus Pandemic (COVID-19) – data and data visualizations for world pandemic data by country, cases, deaths, tests, mortality risks, and policy responses. Linear and log scales available for interactive data visualizations.
  • Rt COVID-19 – interactive data visualization (and data) of the effective reproduction number (Rt) of the virus, calculated for each locale (state, US region, top 10, etc.).
  • See more health links below.
Education
  • The Chronicle of Higher Education – data on higher ed institutions, students, and staff; some with interactive features
  • College Board – AP, SAT, PSAT/NMSQT data
  • College InSight – higher education data inc. student debt, financial aid, cost of attendance, economic diversity, racial diversity, enrollment, and student success.
  • College Scorecard Data – from the U.S. Department of Education; comprehensive and well-documented data from higher ed institutions for the past 20 years inc. financial aid, earnings info, performance of schools, student outcomes, etc. 
  • See NYC and NYS education data links below
Health, Medicine, and Biological Sciences
  • American Cancer Society – facts and statistics reports give global, national, and state info on cancer inc. cases, deaths, risk factors, prevention, cancer screening, treatment, breast cancer, and colorectal cancer, with separate reports for African Americans and Hispanics/Latinos. Bonus: Infographics Gallery.
  • Centers for Disease Control and Prevention – CDC data organized by topic; inc. vital statistics, state data, tools, and resources
  • Columbia Prediction of Infectious Diseases – CPID from Columbia University has compiled up-to-date Ebola data for Guinea, Liberia, Sierra Leone (as well as by province) from the World Health Organization and included three interactive predictive models for infection and mortality numbers: an optimized scenario, a no-change scenario, and a degraded scenario. Data is conveniently organized in tabular form. Models and maps for influenza incidence as well.
  • Cornell Bird Data – From Cornell's Lab of Ornithology; crowd-sourced bird data including bird species abundance, distribution, movements, breeding, population trends, and spread of diseases. Additional education guides; info about data collection, bird songs, feathers, and plumages.
  • HealthData.gov – From the U.S. Dept. of Health and Human Services (in beta); data and datasets from various federal and state sources inc. the CDC, NIH, and CMS. Filters for topic, publisher, and format.
    • National Cancer Institute Fast Stats – From the NCI Surveillance, Epidemiology, and End Results (SEER) Program via the NIH, "Fast Stats is an interactive tool for quick access to key SEER and US cancer statistics for major cancer sites by age, sex, race/ethnicity and data type. Statistics are presented as graphs and tables."
  • Nutritionix – Large nutrition database with information on common foods, packaged products, and restaurant items. 
  • Shark Attack Data – shark/human interaction data and graphs compiled from the Shark Research Institute; sortable by country, U.S. state, provocation status, and fatality status; incident details available; 1900 to present. Note: although very interesting, because these data are not in a format that can be easily analyzed without compiling first, these data might not be a good choice for a project.
  • World Health Organization – WHO data; some interactive graphs available
  • USDA National Nutrient Database for Standard Reference – food nutrient information
  • See NYC health data links below
  • See COVID-19 data links above
Miscellaneous
  • infoplease.com – lots of information from this Pearson Education site: everything from a word of the day to calendars to biographies to maps to general summary statistics. Lots of quantitative and qualitative info about business, economy, geography, sports, medicine, science, U.S. history and government, world history, technology, the 50 U.S. States, and more.
New York and Federal
  • Citi Bike – data from NYC Citi Bike inc. trip histories, daily ridership and membership, real-time data, and monthly operating reports.
  • Data.gov – U.S. government open data site for federal, state, and local data; 100,000+ datasets searchable by topic and keyword; resources and tools available for data visualizations
  • Data 2 Go NYC – created by the nonprofit Measure of America with funding from The Leona M. and Harry B. Helmsley Charitable Trust. Data and data visualizations by NYC community district for various indicators of demographics; education; environment; food systems; health; housing & infrastructure; political engagement; public funding & services; safety & security; and work, wealth, & poverty.
  • MTA Ridership – average and total ridership for NYC subway and bus (2012 – 17) by borough, station/route, by time of week, etc.
  • NYC Department of Education – NYCDOE data inc. school reports, graduation rates, and test results
  • NYC Department of Health and Mental Hygiene – inc. vital statistics, HIV/AIDS data
  • New York State Education Department – NYSED public data inc. enrollment, teacher performance review ratings, and school report cards
  • NYC Open Data – NYC government open data site; 1300+ data sets, maps, and charts from various New York City agencies
  • New York Police Department – NYPD site with current and historical crime data, traffic data, crime statistics, reports, and information.
  • StreetEasy – monthly housing data regarding rents and sales in each neighborhood
Social Sciences
  • Bureau of Justice Statistics – BJS databases and tables organized by topic inc. courts, corrections, crime, and law enforcement data
  • Census Bureau – US census data tables (inc. those from the American Community Survey) organized by topic and geography; tools, infographics, and some visualizations available.
  • Center for American Women and Politics – from Rutgers University; data on women's participation in federal, state, and local elected offices (including women of color); and voting.
  • Encyclopedia Titanica – data and information about the RMS Titanic and its 1912 sinking: deckplans, passenger and crew lists; statistics grouped by passenger class, boarding location, nationality, gender, age group, and survival status.
  • Federal Bureau of Investigation – FBI statistics site inc. data on hate crimes, terrorism, white-collar crimes, and campus crimes
  • General Social Survey – GSS data via the University of Chicago; since 1972, data collected regarding American people and society. Explore GSS questions, variables, and publications by subject, year, or keyword; downloadable in various formats.  
  • Gun Violence Archive – a not-for-profit group that provides free online public access to accurate information about gun-related violence in the United States. They claim to check for accuracy and provide some information about gun-related incidents.
  • Pew Research Center – downloadable datasets from nonpartisan public opinion polls, demographics, and social science research; some interactive visualizations available
  • Police Data Initiative – open datasets from various U.S. police departments regarding topics from accidents/crashes to workforce/demographics.
  • United Nations – UN searchable data on a wide variety of topics
  • US Mass Shootings, 1982-2019 – From Mother Jones’ Investigation. Available in multiple formats with variables including date, location, number of fatalities, type of weapon, etc.
  • Voter Turnout Database – From the International Institute for Democracy and Electoral Assistance (International IDEA) in Sweden. Data regarding country, year, number of registered voters, total voter turnout, voting-age population, and other election-related variables. Includes 200+ countries with historical data as far back as 1945.
Sports
  • Advanced Football Analytics – NFL team and player stats; team, player, salary, and game analysis; game probabilities; playoff projections; stats glossary; calculators, visualization, and other tools; discussions about home field advantage, correlations, football fallacies, and more.
  • Baseball Prospectus – baseball data galore; data organized by offense, pitching, general, team, manager, and splits categories. 1871 to present.
  • Major League Baseball (MLB) – lots o' baseball data inc. pitching, hitting, batting, by team, by player, etc. 1876 to present.
  • Major League Soccer (MLS) – U.S. soccer data inc. team, player, leader, and all-time stats.
  • National Basketball Association – NBA data; team stats, player stats, lineups, etc.
  • National Collegiate Athletic Association (NCAA) – men's and women's college sports stats searchable by sport, student-athlete, and team; records data, championship summaries, and championship results.
  • National Football League (NFL) – American football data; player stats, team stats, player info, history, rules, etc.
  • Sports Reference – Links to baseball, pro basketball, pro football, college basketball, college football, hockey, and Olympic sports reference sites. Excellent for summary and micro data.
  • USA Today Sports – data for various professional and college sports inc. scores, schedules, player stats, team stats, salaries, odds, standings, and more.
Weather and Physical Sciences
  • Climate Stations – historical temperature and precipitation data for various cities
  • MySound – From UConn; real-time weather, water quality, and wave data for Long Island Sound.
  • National Aeronautics and Space Administration (NASA) – Catalog of aerospace, applied science, earth science, management/operations, and space science data.
  • National Oceanic and Atmospheric Administration – NOAA's National Climatic Data Center inc. marine/ocean, satellite, radar, climate, and weather data.
  • Roller Coaster Database – world roller coaster information inc. name, park, location, operating status, number of inversions, speed, height, length, layout shape, manufacturer, and more.
  • US Geological Survey – USGS data searchable by geographic location, keyword, USGS mission area, data source, and scientist.
Data Libraries and Archives – These types of sources contain datasets that may be good for teaching but may not be suitable for projects and assignments for which you wish students to produce work that is original in nature.
  • Census at School Random Sampler – draws random samples of past student responses to the 40-question U.S. Census at School questionnaire.
  • The Data and Story Library – DASL via Data Desk; data sets with real-world context; search options: by title, by topic, by statistical method, by subject, and full-text.
  • Data is Plural – Google spreadsheet of thoughtfully archived datasets from Jeremy Singer-Vine, many of which are linked to current events and recent topics of interest. The source for many of the datasets listed above on this page.
  • FiveThirtyEight – Data and code behind the stories at FiveThirtyEight. Well documented; linked to FiveThirtyEight articles and graphics. (Also available on the FiveThirtyEight GitHub page.)
  • Gapminder – compiled data from various sources with many different socioeconomic, health, and environmental indicators; dynamic visualizations à la Hans Rosling available.
  • Google Dataset Search – launched Sept. 2018, in beta. Exactly what it sounds like... a Google search engine for public data.
  • Inter-university Consortium for Political and Social Research – ICPSR data via the University of Michigan; searchable by topic, series, geographical location, and investigator; some online analysis, replication datasets, and learning guides available.
  • Journal of Statistics Education Data Archive – "Datasets and Stories," peer-reviewed, from JSE via ASA; teaching guides and resources; links to associated JSE articles featuring those data.
  • ProPublica – data archives for this independent, nonprofit newsroom that produces investigative journalism in the public interest.
  • Rice Virtual Lab Case Studies – via Rice University; small number of case studies with datasets, analysis, computer output, and teaching guides.
Minor updates made 02/23/2021.