SEA-LION Documentation



SEA-LION


Southeast Asian Languages in One Network

Built for Southeast Asia, by Southeast Asia

Southeast Asian Languages in One Network (SEA-LION) is a family of open-source Large Language Models (LLMs) that better understands Southeast Asia’s (SEA) diverse contexts, languages, and cultures.

It is an open-source project anchored by the Products Pillar of AI Singapore. Our work on SEA-LION aims to create LLMs that cater to under-represented population groups and low-resource languages in the SEA region. You can read more about our motivations for SEA-LION here.

This site provides information and resources on SEA-LION, including how to access the models, hosting options, and how-to guides.
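For a quick taste of model access, the sketch below shows how an instruct-tuned SEA-LION model might be loaded with the Hugging Face transformers library. The model ID is an assumption for illustration; always confirm the exact identifier and requirements on the Hugging Face model card of the release you want.

# A minimal sketch of loading a SEA-LION instruct model with Hugging Face
# transformers. The model ID below is an assumption -- confirm the exact
# identifier on the model card of the release you want.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/Llama-SEA-LION-v3-8B-IT"  # assumed ID; see model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs float32; use float32 on CPU
    device_map="auto",           # spread layers across available devices
)

# Instruct models expect a chat-formatted prompt; the tokenizer's chat
# template applies the model-specific formatting for you.
messages = [{"role": "user", "content": "Apa khabar? Please reply in Malay."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Using device_map="auto" lets transformers place layers on whatever accelerators are available, which matters for the larger 70B variants.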

Key Features of SEA-LION

Model Collection | Size | Context Length | Training Strategy                | Available in
SEA-LION v3.5    | 8B   | 128K           | SFT¹ of Llama-SEA-LION-v3-8B-IT  | Reasoning, GGUF
SEA-LION v3.5    | 70B  | 128K           | SFT of Llama-SEA-LION-v3-70B-IT  | Reasoning, GGUF
SEA-LION v3      | 9B   | 8192           | CPT² of Gemma2                   | Base, Instruct, GGUF
SEA-LION v3      | 8B   | 128K           | CPT of Llama 3.1 8B              | Base, Instruct, GGUF
SEA-LION v3      | 70B  | 128K           | CPT of Llama 3.1 70B             | Base, Instruct, GGUF
SEA-LION v2      | 8B   | 8192           | CPT of Llama3                    | Base, Instruct, GGUF
SEA-LION v1      | 3B   | 2048           | Pre-training from scratch        | Base
SEA-LION v1      | 7B   | 2048           | Pre-training from scratch        | Instruct

¹ Supervised Fine-Tuning

² Continued Pre-Training
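Where a collection lists GGUF among its formats, quantized builds are published for local inference runtimes. As a minimal sketch, assuming you have downloaded one of the GGUF files linked from a model card (the filename below is a placeholder), such a build can be run with llama-cpp-python:

# A rough sketch of running a GGUF build locally with llama-cpp-python.
# The model path is a placeholder -- download the actual GGUF file listed
# on the model card of the release you want.
from llama_cpp import Llama

llm = Llama(
    model_path="./sea-lion-gguf-build-q4_k_m.gguf",  # placeholder filename
    n_ctx=8192,  # context window; the v3/v3.5 models support far more
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise this news item in Thai: ..."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])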

Performance and Benchmarks

SEA-LION has seen:

  • In v1, the ability to outperform most models on SEA-HELM (Southeast Asian Holistic Evaluation of Language Models) at the time of its release

  • In v2, outperformance on SEA tasks, while retaining credible performance on standard (English) benchmarks

  • In v2.1, key improvements in conversational abilities across SEA languages, with more helpful and contextually appropriate responses to user prompts

  • In v3, outperformance of similarly sized open-source models, and even some larger models, in both general and SEA capabilities

  • In v3.5, the ability to handle reasoning tasks, with the versatility to handle general tasks as well, while maintaining performance comparable to state-of-the-art models

We use a holistic approach to evaluation, including not just traditional Natural Language Processing (NLP) benchmarking tasks (such as sentiment analysis and question answering) but also meticulously handcrafted linguistic and cultural diagnostic tests tailored to Southeast Asia.

Visit our Leaderboard for a more detailed breakdown of:

  1. How SEA-LION compares to other available models along different metrics

  2. What SEA-HELM is and the four key capabilities it is evaluated on: English performance, proficiency in SEA chat, instruction-following, and linguistic tasks

  3. What each of these globally recognized metrics means under SEA-HELM

Licensing

Transparent and Open Source

We have benefited greatly from the open-source community and believe that efforts to better represent our region will similarly be well served by open-source efforts.

All SEA-LION releases will therefore embrace an open-source ethos under the MIT license as much as possible; however, the exact licensing terms may vary depending on the underlying base model's restrictions or requirements. For instance, if the model leverages Meta's Llama3 codebase, it may be bound by the Llama3 License, which places certain restrictions on commercial use. Similarly, the Gemma-based variants may carry different terms. Users should always refer to the Hugging Face model card of each specific SEA-LION model for the most accurate, up-to-date license information.

SEA-LION will also be open and transparent in the following areas throughout this guide:

  1. Pre-Training data

  2. Model training code

  3. Fine-Tuning data

  4. Evaluation benchmarks
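Because licensing can differ between base models, it is worth confirming the declared license of the exact checkpoint you deploy. A minimal sketch using the huggingface_hub client (the model ID is an assumption; substitute the release you intend to use):

# A small sketch of reading a model's declared license from its Hugging Face
# model card metadata. The model ID is an assumption.
from huggingface_hub import model_info

info = model_info("aisingapore/Gemma-SEA-LION-v3-9B")  # assumed ID
if info.card_data and info.card_data.license:
    print(info.card_data.license)
else:
    print("No license metadata found; check the model card directly.")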

Community

We welcome contributions to SEA-LION! Check out the contributing guide to get started.

Some ways to contribute:

  • Report bugs and issues

  • Enhance the documentation

  • Add more model evaluation tasks and metrics

  • Train versions of the model in more SEA languages

Also check out our collaborations guide for possible ways to further enhance and expand the capabilities of SEA-LION together.

To Cite SEA-LION

If you use SEA-LION in your work, please cite it as:

@misc{sea_lion_2024,
  title={SEA-LION (Southeast Asian Languages In One Network): A Family of Large Language Models for Southeast Asia},
  author={AI Singapore},
  year={2024},
  howpublished={\url{https://github.com/aisingapore/sealion}}
}

If you are using SEA-LION v3 for your work, please cite it as:

@misc{2504.05747,
  title={SEA-LION: Southeast Asian Languages in One Network},
  author={Raymond Ng and Thanh Ngan Nguyen and Yuli Huang and Ngee Chia Tai and Wai Yi Leong and Wei Qi Leong and Xianbin Yong and Jian Gang Ngui and Yosephine Susanto and Nicholas Cheng and Hamsawardhini Rengarajan and Peerat Limkonchotiwat and Adithya Venkatadri Hulagadri and Kok Wai Teng and Yeo Yeow Tong and Bryan Siow and Wei Yi Teo and Wayne Lau and Choon Meng Tan and Brandon Ong and Zhi Hao Ong and Jann Railey Montalan and Adwin Chan and Sajeban Antonyrex and Ren Lee and Esther Choa and David Ong Tat-Wee and Bing Jie Darius Liu and William Chandra Tjhi and Erik Cambria and Leslie Teo},
  year={2025},
  eprint={2504.05747},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.05747},
}

Acknowledgements

AI Singapore is a national programme supported by the National Research Foundation, Singapore and hosted by the National University of Singapore. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore, or the National University of Singapore.

We are also grateful for the support of the Infocomm Media Development Authority (IMDA) of Singapore.

SEA-LION would not be possible without a growing list of Singapore, regional, and international collaborators. Please see our website for more details.

Contact

If you have questions, comments, or issues, please open a GitHub issue or contact us via sealion@aisingapore.org.
