Two cool things came to my attention today.
First, Google are opening up their BigQuery service. It lets companies scan billions of rows in seconds and get results back for complex analytical queries. Neat!
Secondly, Common Crawl have released a 7-billion-page archive of the websites they've crawled. If you think you can write a better search engine than Google, you can now test your theories against their ready-made dataset. You'll need an Amazon EC2 account to do so, but it's far cheaper and more convenient than crawling the web yourself.