Tool-Calling

Introduction to Tool Calling

Tool calling is a powerful feature that enables Large Language Models (LLMs) to interact with external functions and APIs, extending their capabilities beyond text generation. SEA-LION models support tool calling with different implementations depending on the model version.

Tool calling allows models to:

  • Access real-time information (weather, time, web search)

  • Perform calculations and data processing

  • Interact with external systems and APIs

  • Execute specific functions based on user requests

This guide covers tool calling implementation for the SEA-LION model variants hosted on SEA-LION API, each with distinct behaviors and requirements. For demonstration purposes, the tools suggested in the tool implementation page will be used in the sample code snippets.
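The snippets throughout this guide call a `build_tool_schema()` helper defined on the tool implementation page. For readers following along without that page, here is a minimal sketch of the expected shape — the `get_time` tool and its parameters below are illustrative, not the actual definitions:

```python
def build_tool_schema():
    """Return an OpenAI-style tool schema list (illustrative sketch).

    The real tools are defined on the tool implementation page; the
    get_time definition below only shows the expected structure.
    """
    return [
        {
            "type": "function",
            "function": {
                "name": "get_time",
                "description": "Get the current time in a given timezone.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "timezone": {
                            "type": "string",
                            "description": "IANA timezone, e.g. Asia/Singapore",
                        }
                    },
                    "required": ["timezone"],
                },
            },
        }
    ]
```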

Model-Specific Tool Calling Guides

Gemma-SEA-LION-v4-27B-IT

Key Characteristics:

  • Uses text-based tool calling format

  • Requires parsing tool calls from response content

  • Does not utilize standard tool_calls parameter

  • Follows system prompt instructions for tool call formatting

Following the Gemma 3 chat template, Gemma-SEA-LION-v4-27B-IT does not parse the tools parameter, so it is recommended to handle tool calling by parsing the model's message response, similar to this example by Google DeepMind engineer Philipp Schmid.

When tool_choice is configured to enforce a specific tool, the tool_calls parameter is returned in the response, but this removes the model's flexibility to decide whether a tool call is needed at all.

API Request Configuration

# For Gemma-SEA-LION-v4-27B-IT, DO NOT include tools in request
request_data = {
    "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT",
    "messages": messages,
    "temperature": 0,
    # Note: No tools or tool_choice parameters
}

If enforcing a tool call:

# To enforce a tool call on Gemma-SEA-LION-v4-27B-IT, include tools and tool_choice
request_data = {
    "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT",
    "messages": messages,
    "temperature": 0,
    "tools": build_tool_schema(),
    "tool_choice": {
        "type": "function",
        "function": {"name": "get_current_weather"}  # Force use of specific tool
    }
}

# Alternative: Force any tool call (not a specific one)
request_data_any_tool = {
    "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT", 
    "messages": messages,
    "temperature": 0,
    "tools": build_tool_schema(),
    "tool_choice": "required"  # Force model to use any available tool
}

Example Response (Tool-calling not enforced)

{
  "choices": [{
    "message": {
      "content": "```tool_code\nget_time(timezone=\"Asia/Singapore\")\n```",
      "role": "assistant"
    }
  }]
}
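Since the tool call arrives as plain text, it must be recovered from the message content. Below is one possible sketch of the `parse_tool_calls_from_text` helper used later in this guide — the exact format depends on your system prompt; this version assumes the tool_code fence shown above with string keyword arguments:

```python
import json
import re

# Fences are built from chr(96) (backtick) so this snippet can itself
# sit inside a fenced documentation block.
FENCE = chr(96) * 3
TOOL_BLOCK_RE = re.compile(FENCE + r"tool_code\s*(.*?)\s*" + FENCE, re.DOTALL)
CALL_RE = re.compile(r"(\w+)\((.*)\)", re.DOTALL)
KWARG_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_tool_calls_from_text(content):
    """Extract tool calls from tool_code fenced blocks in message text.

    Returns OpenAI-style tool call dicts so downstream handling stays
    uniform across models. Assumes calls look like
    get_time(timezone="Asia/Singapore") with string keyword arguments.
    """
    tool_calls = []
    for block in TOOL_BLOCK_RE.findall(content):
        match = CALL_RE.match(block.strip())
        if not match:
            continue
        name, arg_str = match.groups()
        # Collect simple key="value" keyword arguments into a dict
        args = dict(KWARG_RE.findall(arg_str))
        tool_calls.append({
            "id": f"text-tool-{len(tool_calls)}",
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)},
        })
    return tool_calls
```

Returning the OpenAI-style structure here lets the rest of the pipeline treat text-parsed and native tool calls identically.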

Llama-SEA-LION-v3-70B-IT

Key Characteristics:

  • Supports standard OpenAI-style function calling

  • Uses tool_calls parameter in responses

  • Requires tools configuration in API request

  • Works with tool_choice: "auto" setting

API Request Configuration

# For Llama-SEA-LION-v3-70B-IT, include tools and tool_choice
request_data = {
    "model": "aisingapore/Llama-SEA-LION-v3-70B-IT",
    "messages": messages,
    "temperature": 0,
    "tools": build_tool_schema(),
    "tool_choice": "auto"
}

Response Handling

def extract_tool_calls(data):
    """Extract tool calls from the response data."""
    choice = data.get("choices", [{}])[0] if data.get("choices") else {}
    return choice.get("message", {}).get("tool_calls")

# Usage
tool_calls = extract_tool_calls(response_data)
if tool_calls:
    # Execute tool calls directly
    tool_results = await execute_tool_calls(tool_calls, session)

Example Response

{
  "choices": [{
    "message": {
      "content": null,
      "role": "assistant",
      "tool_calls": [{
        "function": {
          "arguments": "{\"timezone\": \"Asia/Singapore\"}",
          "name": "get_time"
        },
        "id": "chatcmpl-tool-920019c71dd14d96a262ec798b778ccd",
        "type": "function"
      }]
    }
  }]
}
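A simplified, synchronous sketch of how these structured tool_calls can be executed and converted into tool messages. The `get_time` implementation and `TOOL_REGISTRY` below are illustrative stand-ins for the tools on the tool implementation page; the guide's `execute_tool_calls` is the async equivalent:

```python
import json
from datetime import datetime, timezone as tz

def get_time(timezone):
    """Stand-in tool implementation; replace with your real tool."""
    return datetime.now(tz.utc).isoformat() + f" ({timezone})"

# Map tool names from the schema to local callables
TOOL_REGISTRY = {"get_time": get_time}

def run_tool_calls(tool_calls):
    """Execute structured tool calls and return OpenAI-style tool messages."""
    results = []
    for call in tool_calls:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(fn(**args)),
        })
    return results
```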

Llama-SEA-LION-v3.5-70B-R

Key Characteristics:

  • Reasoning model without native tool calling support

  • Tool calls can be recovered by parsing the message response

  • Similar to Gemma-SEA-LION-v4-27B-IT, tool calls are carried in the message content

  • Do not add the tools or tool_choice parameters to the API call

API Request Configuration

# For reasoning models, do NOT include tools to avoid errors
def is_reasoning_model(model_name):
    return model_name.endswith('-R')

# Request configuration
if is_reasoning_model(api_config["model"]):
    request_data = {
        "model": "aisingapore/Llama-SEA-LION-v3.5-70B-R",
        "messages": messages,
        "temperature": 0,
        # No tools or tool_choice parameters
    }
else:
    request_data = {
        "model": api_config["model"],
        "messages": messages,
        "temperature": 0,
        "tools": tools,
        "tool_choice": "auto"
    }
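Since -R models emit their reasoning inside &lt;think&gt;…&lt;/think&gt; tags before the final answer, it helps to strip that segment before scanning for tool calls. A small sketch of that step:

```python
def strip_reasoning(content):
    """Return only the text after the closing </think> tag, if present.

    Scanning the reasoning segment for tool calls can trigger redundant
    calls, since the model may "think about" a tool without invoking it.
    """
    if "</think>" in content:
        return content.split("</think>", 1)[1].strip()
    return content.strip()
```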

Implementation Example

Here's an example that handles all three models, making use of the components provided in the tool implementation page:

async def process_user_message(user_message, messages, api_config, session):
    """Process a user message and handle tool calls for different model types."""
    messages.append({"role": "user", "content": user_message})
    tools = build_tool_schema()
    
    # Check model type
    is_reasoning_model = api_config["model"].endswith('-R')
    
    # Configure request based on model type
    request_data = {
        "model": api_config["model"],
        "messages": messages,
        "temperature": 0,
    }
    
    # Only add tools for non-reasoning models
    if not is_reasoning_model:
        request_data["tools"] = tools
        request_data["tool_choice"] = "auto"
    
    headers = {"Authorization": f"Bearer {api_config['api_key']}"}
    
    async with session.post(
        api_config["api_url"],
        json=request_data,
        headers=headers,
        timeout=30
    ) as response:
        data = await response.json()
    
    assistant_message = data.get("choices", [{}])[0].get("message")
    if not assistant_message:
        return
        
    messages.append(assistant_message)
    
    # Handle tool calls based on model type
    tool_calls = extract_tool_calls(data)
    if not tool_calls:
        # Tool call not found, parse tool call from message content
        message_content = assistant_message.get("content") or ""
        if is_reasoning_model:
            # Parse only the non-reasoning content to prevent redundant tool calls
            message_content = message_content.split("</think>", 1)[-1].strip()
        tool_calls = parse_tool_calls_from_text(message_content)
    
    if tool_calls:
        # Execute tools and get final response
        tool_results = await execute_tool_calls(tool_calls, session)
        messages.extend(tool_results)
        
        # Get final response with tool results
        final_response = await session.post(
            api_config["api_url"],
            json={"model": api_config["model"], "messages": messages, "temperature": 0},
            headers=headers,
            timeout=30
        )
        final_data = await final_response.json()
        
        final_message = final_data.get("choices", [{}])[0].get("message")
        if final_message:
            print(final_message["content"])
            messages.append(final_message)
    else:
        # No tool calls, show direct response
        print(assistant_message.get("content", ""))

Points to Take Note Of

Model-Specific Considerations

  1. Gemma-SEA-LION-v4-27B-IT:

    • Typically uses text parsing instead of standard tool calling

    • System prompt should explicitly define tool call format

    • tools parameter in API requests is only utilized when tool_choice is set to "required" or a specific tool is enforced

    • Tool calls are wrapped in ```tool_code blocks

    • Regex patterns needed for extraction

  2. Llama-SEA-LION-v3-70B-IT:

    • Fully supports OpenAI-style tool calling

    • Uses tools and tool_choice in API requests

    • Returns structured tool_calls in response

    • Reliable for production tool calling applications

  3. Llama-SEA-LION-v3.5-70B-R:

    • Reasoning model without native tool calling support

    • Tool calls can be recovered by parsing the message response

    • Can reason about tool usage

    • Parse only the message content after the reasoning segment, to prevent multiple redundant tool calls

    • Best used for complex reasoning tasks

General Best Practices

  1. Error Handling: Always implement proper error handling for tool execution failures and API timeouts.

  2. Model Detection: Use model name suffixes to determine the appropriate tool calling approach:

    is_reasoning_model = model_name.endswith('-R')
  3. Timeout Management: Set appropriate timeouts for both LLM API calls and tool execution

  4. Response Validation: Always validate tool call responses before processing:

    if not tool_calls or not isinstance(tool_calls, list):
        # Handle no tool calls case
  5. Conversation Flow: Maintain proper conversation history by adding all messages (user, assistant, tool results) to the messages array.

    • The Gemma 3 chat template enforces strict alternation of user and assistant roles in the message history; hence, in the example execute_tool_calls function, tool results are returned with the user role rather than the tool role

  6. Platform Considerations: Some models may behave differently on different platforms (e.g., Ollama vs cloud APIs). Test your implementation on your target platform.

  7. Token Efficiency: The text-based approach may use more tokens than standard function calling. Monitor usage accordingly.
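Point 5 above, on the Gemma 3 template's role alternation, can be sketched as a helper that picks the tool-result role per model. The `format_tool_result` name below is hypothetical, not one of the guide's components:

```python
def format_tool_result(model_name, tool_call_id, result):
    """Format a tool result message, working around the Gemma 3 template.

    Gemma 3's chat template enforces strict user/assistant alternation,
    so tool output is sent back as a user message instead of role "tool".
    """
    if "gemma" in model_name.lower():
        return {"role": "user", "content": f"Tool result: {result}"}
    return {"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}
```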

Security Considerations

  • Validate all tool parameters before execution

  • Implement rate limiting for external API calls

  • Sanitize user inputs that will be passed to tools

  • Consider implementing tool execution sandboxing for production environments

Performance Optimization

  • Cache tool results where appropriate (e.g., weather data for short periods)

  • Implement parallel tool execution when multiple tools are called

  • Use connection pooling for HTTP requests in tool implementations

  • Consider implementing tool call batching for efficiency
