Hacker News: Front Page · 2 years ago
Shared a link post in group #Stream of Goodies

pile.eleuther.ai
The Pile
The Pile is an 825 GiB diverse, open-source language modelling dataset that consists of 22 smaller, high-quality datasets combined together.