Wikipedia data dump

Stack Exchange data dump

US Supreme Court

US postal codes

Weather data

Million Song Dataset

Transportation Dataset

Catalog of 33 data sources

Baseball game data: Dataset

Project Challenges

YouTube dataset

Datasets for machine learning

Company profiles

Microfinance Lending Data

World Bank - International Debt Statistics

AWS public datasets

S3 bucket for public datasets: s3://aws-publicdatasets

Amazon product review data

USDA Food database

US Federal Commission Data

Data published by

NYC Open Data

World Bank

Open Data Catalog

US Geological Survey Science Data Catalog

Geolocation and IP mapping

List of cities

Graph Data

ICON is a comprehensive index of research-quality network data sets from all domains of network science, including social, web, information, biological, ecological, connectome, transportation, and technological networks.

Each network record in the index is annotated with its graph properties, description, size, etc., and can be searched or browsed by these attributes; many records link to multiple networks. The contents of ICON are curated by volunteer experts from Prof. Aaron Clauset's research group at the University of Colorado Boulder.

KONECT is a comprehensive archive that provides not only the data (dozens of networks) but also summary statistics for each dataset.

Social Network Data

MEDLINE database: a database of academic papers published in journals covering the life sciences and medicine

With 3.5 billion nodes and 128 billion edges, this is the largest known freely available real-world graph dataset.

Case Studies on Benefits of Open Data

  • Business case for open data

English Dictionary Database

Awesome Public Datasets

CTR (click-through rate) prediction

  • Criteo:
  • Avazu:
  • Outbrain:
  • RecSys 2015: