vLLM on Linux

This guide explains how to deploy a SEA-LION model on a Linux server using vLLM.

Prerequisites

  • OS: Linux

  • Python: 3.9-3.12

  • vLLM version: 0.10.1.1

  • uv 0.7.x installed

  • CUDA drivers version 12 installed

Environment Setup

To get started, you will need to create an environment and install vLLM. The steps below outline how to install and use uv as the package manager, then install vLLM.

  1. Navigate to the desired base directory. In this guide, the base folder is assumed to be /home/sealion-user.

cd /home/sealion-user

  2. Create a new working directory and enter it:

mkdir sealion_test && cd sealion_test

  3. Initialize uv:

uv init

  4. Add the vllm dependency:

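For example, pinning the vLLM version listed in the prerequisites (adjust the pin if you use a different version):

uv add vllm==0.10.1.1
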
  5. Clone the vLLM GitHub repository and rename the top-level directory to vllm_code. The renaming is necessary to prevent Python from importing modules from the vllm repository directory instead of the packages installed in the environment.

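One way to do this, assuming the upstream repository URL, is to clone directly into the renamed directory:

git clone https://github.com/vllm-project/vllm.git vllm_code
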
Model Deployment using vLLM

The steps below outline how to host a vLLM service using GPUs.

  1. Navigate to the working directory and activate the environment:

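Assuming the directory layout from the setup above and the virtual environment that uv creates in .venv:

cd /home/sealion-user/sealion_test
source .venv/bin/activate
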
  2. Set the relevant environment variables. Setting VLLM_CACHE_ROOT is necessary to prevent errors arising from insufficient disk space when the default vLLM cache directory is used:

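A minimal example; the cache path shown is only an assumption and should point to a disk with sufficient free space:

export VLLM_CACHE_ROOT=/home/sealion-user/sealion_test/vllm_cache
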
  3. Start the server with the desired model. The aisingapore/Gemma-SEA-LION-v4-27B-IT model is used in this example.

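A minimal form of the command, assuming the default port 8000; additional flags such as --tensor-parallel-size may be needed depending on the available GPUs:

vllm serve aisingapore/Gemma-SEA-LION-v4-27B-IT
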
Alternatively, the server can be started with the command below:

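A common equivalent is to invoke the OpenAI-compatible API server module directly:

python -m vllm.entrypoints.openai.api_server --model aisingapore/Gemma-SEA-LION-v4-27B-IT
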
  4. Create a Python script main.py similar to the following:

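A minimal sketch using the openai client package (install it with uv add openai if needed); the prompt, the placeholder API key, and the default port 8000 are assumptions:

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the default server port is 8000
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="aisingapore/Gemma-SEA-LION-v4-27B-IT",
    messages=[
        {"role": "user", "content": "Tell me about Singapore in one sentence."},
    ],
    max_tokens=128,
)

# Print only the generated reply text
print(response.choices[0].message.content)
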
  5. Run the Python script. The following assumes it is named main.py.

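With the environment from step 1 still active:

python main.py
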
You can also query the model with input prompts using curl:

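For example, against the OpenAI-compatible chat completions endpoint (the prompt is an assumption):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT",
    "messages": [{"role": "user", "content": "Tell me about Singapore in one sentence."}],
    "max_tokens": 128
  }'
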
The output obtained should be similar to the following:

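The generated text will vary from run to run; the response follows the OpenAI-compatible chat completion schema, roughly:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<model-generated answer>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": "...", "completion_tokens": "...", "total_tokens": "..."}
}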