
Techmeme
shared a link post in group #Stream of Goodies

www.techmeme.com
An in-depth look at Common Crawl, the 9.5PB web crawl archive that dates back to 2008 and is run by a small nonprofit: its role in generative AI, its dataset, and more
From Mozilla Foundation.