BigCode is an open scientific collaboration working on the responsible development and use of large language models for code (Code LLMs), empowering the machine learning and open source communities through open governance. Highlights of BigCode's contributions to the community include The Stack dataset (6.4 TB of permissively licensed source code in 358 programming languages) and the StarCoder LLMs, along with supporting papers, code, tools, demos, and a project governance card.

BigCode invites AI researchers to work together on the development of state-of-the-art code LLMs, and collaborate on research topics such as:

1) Constructing a representative evaluation suite for code LLMs, covering a diverse set of tasks and programming languages;

2) Developing new methods for faster training and inference of LLMs;

3) The legal, ethical, governance, and safety aspects of code LLMs.