What does it do?

  • Code Generation
  • Programming Assistance
  • Large Language Models
  • GitHub Integration

How is it used?

  • Code to Code

Who is it good for?

  • Education
  • Research & Development
  • Technology

Details & Features


StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub. The training data covers more than 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, a model with approximately 15 billion parameters was trained on 1 trillion tokens; this is StarCoderBase. StarCoderBase was then fine-tuned on 35 billion Python tokens, producing a new model called StarCoder.

The developers report that StarCoderBase outperforms existing open Code LLMs on popular programming benchmarks. It also matches or surpasses closed models such as code-cushman-001 from OpenAI, the original Codex model that powered early versions of GitHub Copilot.

The StarCoder models have several key features:
- They can process a context length of over 8,000 tokens, which at release was reported to be more than any other open LLM.
- They support a variety of applications. For example, when prompted with a series of dialogues, they can act as a technical assistant.
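
To make the dialogue-prompting idea concrete, here is a minimal sketch of how such a prompt could be assembled as plain text. The role labels, the separator, and the helper names (`build_dialogue_prompt`, `fit_to_budget`) are illustrative assumptions, not the format published with StarCoder. The default token counter just splits on whitespace, which only roughly approximates a real tokenizer.

```python
from typing import Callable, List, Tuple

# Assumed role labels and separator; hypothetical, not taken from the StarCoder release.
HUMAN, ASSISTANT, SEP = "Human", "Assistant", "\n-----\n"


def build_dialogue_prompt(system: str, turns: List[Tuple[str, str]], question: str) -> str:
    """Join a preamble, earlier (question, answer) pairs, and a new question."""
    parts = [system.strip()]
    for asked, answered in turns:
        parts.append(f"{HUMAN}: {asked.strip()}\n\n{ASSISTANT}: {answered.strip()}")
    # End on an open assistant label so the model continues with the answer.
    parts.append(f"{HUMAN}: {question.strip()}\n\n{ASSISTANT}:")
    return SEP.join(parts)


def fit_to_budget(
    system: str,
    turns: List[Tuple[str, str]],
    question: str,
    budget: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> str:
    """Drop the oldest turns until the prompt fits the token budget.

    The default counter is a whitespace split (an assumption); swap in the
    model's own tokenizer for a real context-length check.
    """
    kept = list(turns)
    while kept and count_tokens(build_dialogue_prompt(system, kept, question)) > budget:
        kept.pop(0)
    return build_dialogue_prompt(system, kept, question)
```

The resulting string would be passed to the model as an ordinary completion prompt. When checking against the context length of roughly 8,000 tokens, `count_tokens` should be replaced with a count from the model's actual tokenizer.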

Developers seeking a powerful LLM for code generation may find StarCoder to be a useful tool.

  • Supported ecosystems
    Hugging Face
  • What does it do?
    Code Generation, Programming Assistance, Large Language Models, GitHub Integration, Technical Assistance


  • GitHub Copilot generates code suggestions in real-time to enhance developer productivity.
  • CodeGPT lets you create custom AI assistants without coding, integrate them into projects via API, and boost developer productivity.
  • CodeT5 and CodeT5+ are open-source language models that automate coding tasks for developers.
  • Bloop converts legacy COBOL code into modern, readable Java code with identical behavior using AI.
  • Magic develops advanced code models that act as a coworker for developers, with a focus on AGI safety.
  • EasyCode is an AI-powered coding assistant that provides context-aware suggestions to enhance developer productivity.
  • Enhance coding with real-time suggestions, chat for queries, and multi-language support.
  • Sourcery provides instant AI-powered code reviews and refactoring suggestions for GitHub and GitLab pull requests.
  • Figstack helps developers understand, document, and optimize code using generative AI.
  • Wolfia is an AI platform that automates security reviews and questionnaires, providing accurate, contextual answers to save time.