Back to Jeff Huang's Main Page
Web Search Query Log Downloads
Query logs from real search engines are hard to find. Here are the ones I have been able to download without too much difficulty. Keep in mind that nearly all of them come with a license you need to agree to before downloading, and are for non-commercial use only.
English Search Engines
- AOL Query Logs (2006) [36M queries]: Includes anonymized user ids and click data. The queries are unfiltered. If you're using this for a research publication, be aware that some reviewers will be unhappy about the controversy surrounding the release of these logs. Mirrors of the dataset are listed at the link, but only one still seemed to be working when I last checked.
- MSN Query Logs (2006 and 2007) [14M and 100M]: Includes session ids (no user ids) and click data. You will have to request this dataset from Microsoft (Evelyne Viegas or Nick Craswell are appropriate contacts).
Non-English Search Engines
- Sogou Query Logs (2008) [44M]: Queries from a Chinese search engine. Includes anonymized user ids and click data. Porn and "illegal" queries are filtered out. Queries without clicks also seem to be filtered out. To get this dataset, go to the link and follow the download link at the bottom; you will be asked to register and will then be given an FTP link, to which you need to prepend your username and password to download the data.
- Yandex Query Logs (2009) [341M]: From the most popular Russian search engine. Query text is replaced by anonymized ids, so it's not useful for any processing of the query text itself. Includes session ids and click data (also anonymized). Commercial queries are removed. To get this dataset, go to the link and register; you will be sent a download link.
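For the Sogou entry, "prepending" a username and password to an FTP link most likely means the standard `ftp://user:password@host/path` URL form, though that is an assumption; the instructions you receive after registering take precedence. A minimal Python sketch of building such a URL follows. The function name `with_credentials`, the host, the path, and the credentials are all hypothetical placeholders, not the real Sogou values.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(ftp_url: str, username: str, password: str) -> str:
    """Return ftp_url with 'username:password@' placed in front of the host."""
    parts = urlsplit(ftp_url)
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be confused with the URL's own separators.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment)
    )


# Hypothetical values for illustration only.
url = with_credentials("ftp://ftp.example.org/logs/archive.zip", "alice", "p@ss:word")
print(url)  # ftp://alice:p%40ss%3Aword@ftp.example.org/logs/archive.zip
```

The resulting URL can be pasted into most FTP-capable download tools. Keep in mind that credentials embedded in a URL tend to end up in shell history and logs.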
If you know of more resources, please let me know and I will add them to the list.