Pleias Releases Common Corpus, the Largest Open Multilingual Dataset for LLM Training
As part of the Open Trusted Data Initiative, Pleias is releasing Common Corpus, the largest open and permissively licensed dataset for training LLMs,…