What does it do?

  • Code Generation
  • Programming Assistance
  • Large Language Models
  • GitHub Integration

How is it used?

  • Code to Code

Who is it good for?

  • Education
  • Research & Development
  • Technology

Details & Features

  • Made By

  • Released On


StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub. The training data covers more than 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks. StarCoderBase, a model with approximately 15 billion parameters, was trained on 1 trillion tokens, a scale similar to that of the LLaMA model. StarCoderBase was then fine-tuned on 35 billion Python tokens, producing a new model known as StarCoder.

Reported evaluations indicate that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks. It also matches or surpasses closed models such as code-cushman-001 from OpenAI, the original Codex model used in early versions of GitHub Copilot.

The StarCoder models have several key features:
- They can process a context length of over 8,000 tokens, which was more than any other open LLM at the time of release.
- They can be used in a variety of applications. For instance, they can act as a technical assistant when provided with a series of dialogues as prompts.
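
To illustrate the technical-assistant use mentioned above, here is a minimal Python sketch of how a series of dialogues might be assembled into a single prompt. The preamble text, the "Human:"/"Assistant:" labels, the separator, and the function name `build_dialogue_prompt` are all assumptions made for illustration; the source does not specify a prompt format.

```python
from typing import List, Tuple

# Hypothetical preamble, role labels, and separator: the source does not
# specify a dialogue format, so these are illustrative choices only.
SYSTEM_PREAMBLE = "Below is a dialogue between a human and a helpful technical assistant."
SEPARATOR = "-----"


def build_dialogue_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join example (human, assistant) turns and a new question into one prompt."""
    parts = [SYSTEM_PREAMBLE, ""]
    for human, assistant in examples:
        parts.append(f"Human: {human}")
        parts.append(f"Assistant: {assistant}")
        parts.append(SEPARATOR)
    # End with an open assistant turn so the model continues from here.
    parts.append(f"Human: {question}")
    parts.append("Assistant:")
    return "\n".join(parts)


examples = [
    ("How do I reverse a list in Python?",
     "Use `my_list[::-1]` for a copy or `my_list.reverse()` in place."),
]
prompt = build_dialogue_prompt(examples, "How do I read a file line by line?")
print(prompt)
```

The resulting string would then be passed to the model as ordinary text input, and the model's continuation read as the assistant's reply. How many example dialogues fit depends on the context length noted above.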

Developers seeking a powerful LLM for code generation may find StarCoder to be a useful tool.

  • Supported ecosystems
    Hugging Face
