Data Sources

These are free Internet sources for real data and datasets available for public use. They are great for use by statistics teachers and students, in the classroom, for projects, or for additional exploration. You may care to read some of my comments about these sources first. Please make sure to cite data appropriately. Feel free to email me if you have corrections, suggestions, or kind comments.

Arts and Entertainment
Education
  • The Chronicle of Higher Education – data on higher ed institutions, students, and staff; some with interactive features
  • College Board – AP, SAT, PSAT/NMSQT data
  • College InSight – higher education data inc. student debt, financial aid, cost of attendance, economic diversity, racial diversity, enrollment, and student success.
  • College Scorecard Data – from the U.S. Department of Education; comprehensive and well-documented data from higher ed institutions for the past 20 years inc. financial aid, earnings info, performance of schools, student outcomes, etc. 
  • PayScale – Salary Data & Career Research Center (U.S.); searchable database of salary info by job title, degree/major, employer name or type, state, city, years of experience, etc. Available for other countries as well.
  • See NYC and NYS education data links below
Health, Medicine, and Biological Sciences
  • American Cancer Society – facts and statistics reports give global, national, and state info on cancer inc. cases, deaths, risk factors, prevention, cancer screening, treatment, breast cancer, colorectal cancer, and reports specific to African Americans and Hispanics/Latinos. Bonus: Infographics Gallery.
  • Centers for Disease Control and Prevention – CDC data organized by topic; inc. vital statistics, state data, tools, and resources
  • Columbia Prediction of Infectious Diseases – CPID from Columbia University has compiled up-to-date Ebola data for Guinea, Liberia, Sierra Leone (as well as by province) from the World Health Organization and included three interactive predictive models for infection and mortality numbers: an optimized scenario, a no-change scenario, and a degraded scenario. Data is conveniently organized in tabular form. Models and maps for influenza incidence as well.
  • Cornell Bird Data – From Cornell's Lab of Ornithology; crowd-sourced bird data including bird species abundance, distribution, movements, breeding, population trends, and spread of diseases. Additional education guides; info about data collection, bird songs, feathers, and plumages.
  • General Social Survey – GSS data via the University of Chicago; since 1972, data collected regarding American people and society. Explore GSS questions, variables, and publications by subject, year, or keyword; downloadable in various formats.  
  • HealthData.gov – From the U.S. Dept. of Health and Human Services (in beta); data and datasets from various federal and state sources inc. the CDC, NIH, and CMS. Filters for topic, publisher, and format.
  • Nutritionix – Large nutrition database with information on common foods, packaged products, and restaurant items. 
  • Shark Attack Data – shark/human interaction data and graphs compiled from the Shark Research Institute; sortable by country, U.S. state, provocation status, and fatality status; incident details available; 1900 to present.
  • World Health Organization – WHO data; some interactive graphs available
  • USDA National Nutrient Database for Standard Reference – food nutrient information
  • See NYC health data links below
Miscellaneous
  • infoplease.com – lots of information from this Pearson Education site: everything from a word of the day to calendars to biographies to maps to general summary statistics. Lots of quantitative and qualitative info about business, economy, geography, sports, medicine, science, U.S. history and government, world history, technology, the 50 U.S. States, and more.
New York and Federal
  • Data.gov – U.S. government open data site for federal, state, and local data; 100,000+ datasets searchable by topic and keyword; resources and tools available for data visualizations
  • NYC Department of Education – NYCDOE data inc. school reports, graduation rates, and test results
  • NYC Department of Health and Mental Hygiene – inc. vital statistics, HIV/AIDS data
  • New York State Education Department – NYSED public data inc. enrollment, teacher performance review ratings, and school report cards
  • NYC Open Data – NYC government open data site; 1300+ data sets, maps, and charts from various New York City agencies; to be more complete by 2018.
  • New York Police Department – NYPD site with current and historical crime data, traffic data, crime statistics, reports, and information.
Social Sciences
  • Bureau of Labor Statistics – BLS databases, tables, and calculators organized by topic
  • Bureau of Justice Statistics – BJS databases and tables organized by topic inc. courts, corrections, crime, and law enforcement data
  • Census Bureau – US census data tables (inc. those from the American Community Survey) organized by topic and geography; tools, infographics, and some visualizations available.
  • Encyclopedia Titanica – data and information from the RMS Titanic and its 1912 sinking: deckplans, passenger and crew lists; statistics grouped by passenger class, boarding location, nationality, gender, age group, and survival status.
  • Federal Bureau of Investigation – FBI statistics site inc. data on hate crimes, terrorism, white-collar crimes, and campus crimes
  • Pew Research Center – downloadable datasets from nonpartisan public opinion polls, demographics, and social science research; some interactive visualizations available
  • Tax Foundation – non-profit think tank; includes federal and state data on tax rates, revenue, and burdens; interactive tools and calculators; discussion of international taxes as well.
  • United Nations – UN searchable data on a wide variety of topics
  • World Bank – international data organized by country, topic, and indicator
Sports
  • Advanced Football Analytics – NFL team and player stats; team, player, salary, and game analysis; game probabilities; playoff projections; stats glossary; calculators, visualization, and other tools; discussions about home field advantage, correlations, football fallacies, and more.
  • Baseball Prospectus – baseball data galore; data organized by offense, pitching, general, team, manager, and splits categories. 1871 to present.
  • Major League Baseball (MLB) – lots o' baseball data inc. pitching, hitting, batting, by team, by player, etc. 1876 to present.
  • Major League Soccer (MLS) – U.S. soccer data inc. team, player, leader, and all-time stats.
  • National Basketball Association – NBA data; team stats, player stats, lineups, etc.
  • National Collegiate Athletic Association (NCAA) – men's and women's college sports stats searchable by sport, student-athlete, and team; records data, championship summaries, and championship results.
  • National Football League (NFL) – American football data; player stats, team stats, player info, history, rules, etc.
  • Sports Reference – Links to baseball, pro basketball, pro football, college basketball, college football, hockey, and Olympic sports reference sites. Excellent for summary and micro data.
  • USA Today Sports – data for various professional and college sports inc. scores, schedules, player stats, team stats, salaries, odds, standings, and more.
Weather and Physical Sciences
  • MySound – From UConn; real-time weather, water quality, and wave data for Long Island Sound.
  • National Aeronautics and Space Administration (NASA) – Catalog of aerospace, applied science, earth science, management/operations, and space science data.
  • National Oceanic and Atmospheric Administration – NOAA's National Climatic Data Center inc. marine/ocean, satellite, radar, climate, and weather data.
  • Roller Coaster Database – world roller coaster information inc. name, park, location, operating status, number of inversions, speed, height, length, layout shape, manufacturer, and more.
  • US Geological Survey – USGS data searchable by geographic location, keyword, USGS mission area, data source, and scientist.
Data Libraries and Archives – These types of sources contain datasets that may be good for teaching but may not be suitable for projects and assignments for which you wish students to produce work that is original in nature.
  • The Data and Story Library – DASL via Data Desk; data sets with real-world context; search options: by title, by topic, by statistical method, by subject, and full-text.
  • Data is Plural – Google spreadsheet of archived datasets from Jeremy Singer-Vine.
  • FiveThirtyEight – Data and code behind the stories at FiveThirtyEight. Available to download as .csv files. Well documented; linked to FiveThirtyEight articles and graphics.
  • Gapminder – compiled data from various sources with many different socioeconomic, health, and environmental indicators; dynamic visualizations à la Hans Rosling available.
  • Google Public Data Directory – limited archive for public data sets. Users may also upload their own datasets.
  • Inter-university Consortium for Political and Social Research – ICPSR data via the University of Michigan; searchable by topic, series, geographical location, and investigator; some online analysis, replication datasets, and learning guides available.
  • Journal of Statistics Education Data Archive – "Datasets and Stories," peer-reviewed, from JSE via ASA; teaching guides and resources; links to associated JSE articles featuring those data.
  • Rice Virtual Lab Case Studies – via Rice University; small number of case studies with datasets, analysis, computer output, and teaching guides.
Minor updates made 02/25/2017.