Tool-Calling

Introduction to Tool Calling

Tool calling is a powerful feature that enables Large Language Models (LLMs) to interact with external functions and APIs, extending their capabilities beyond text generation. SEA-LION models support tool calling with different implementations depending on the model version.

Tool calling allows models to:

  • Access real-time information (weather, time, web search)

  • Perform calculations and data processing

  • Interact with external systems and APIs

  • Execute specific functions based on user requests

This guide covers tool calling implementation for the SEA-LION model variants hosted on the SEA-LION API, each with distinct behaviors and requirements. For demonstration purposes, the sample code snippets use the tools suggested in the tool implementation page.

Model-Specific Tool Calling Guides

Gemma-SEA-LION-v4-27B-IT

Key Characteristics:

  • Uses text-based tool calling format

  • Requires parsing tool calls from response content

  • Does not utilize standard tool_calls parameter

  • Follows system prompt instructions for tool call formatting

Following the Gemma 3 chat template, Gemma-SEA-LION-v4-27B-IT does not parse the tools parameter. It is therefore recommended to handle tool calling by parsing the model's message response, similar to this example by Google DeepMind engineer Philipp Schmid.

When tool_choice is configured to enforce usage of a specific tool, the tool_calls parameter will be returned in the response, but this removes the model's flexibility to decide whether a tool call is needed at all.

API Request Configuration

If enforcing a tool call:
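For example, a request body that forces a specific tool might look like the following. This is a minimal sketch: the model identifier string and the get_weather tool schema are illustrative assumptions, not part of the official API.

```python
# Sketch of a request body that forces a specific tool call.
# The model identifier and the get_weather tool are illustrative assumptions.
payload = {
    "model": "aisingapore/Gemma-SEA-LION-v4-27B-IT",
    "messages": [
        {"role": "user", "content": "What's the weather in Singapore?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Forcing this tool makes the response include a structured tool_calls
    # field, at the cost of the model deciding whether a tool is needed.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```

Sending this payload to the chat completions endpoint should return tool_calls directly, so no text parsing is needed in this mode.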

Example Response (Tool-calling not enforced)
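When tool calling is not enforced, the tool call arrives as text inside the assistant message. The sketch below assumes a JSON payload inside a tool_code fenced block; the exact in-block format depends on what your system prompt instructs the model to emit.

```python
import json
import re

FENCE = "`" * 3  # the literal ``` delimiter around tool_code blocks

# Illustrative assistant message content: the model emits the tool call as
# text in a fenced tool_code block rather than in a tool_calls field. The
# JSON shape shown here is an assumption set by your system prompt.
content = (
    "I'll look that up.\n"
    + FENCE + "tool_code\n"
    + '{"name": "get_weather", "arguments": {"city": "Singapore"}}\n'
    + FENCE
)

TOOL_CODE_RE = re.compile(r"`{3}tool_code\s*(.*?)\s*`{3}", re.DOTALL)

def extract_tool_calls(text):
    """Pull JSON tool calls out of fenced tool_code blocks."""
    return [json.loads(block) for block in TOOL_CODE_RE.findall(text)]

calls = extract_tool_calls(content)
```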

Llama-SEA-LION-v3-70B-IT

Key Characteristics:

  • Supports standard OpenAI-style function calling

  • Uses tool_calls parameter in responses

  • Requires tools configuration in API request

  • Works with tool_choice: "auto" setting

API Request Configuration
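A standard OpenAI-style request body might look like the following sketch; the model identifier string and the get_weather tool schema are illustrative assumptions.

```python
# Sketch of a standard OpenAI-style tool-calling request body.
# The model identifier and the get_weather tool are illustrative assumptions.
payload = {
    "model": "aisingapore/Llama-SEA-LION-v3-70B-IT",
    "messages": [
        {"role": "user", "content": "What's the weather in Singapore?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # "auto" lets the model decide whether a tool call is needed.
    "tool_choice": "auto",
}
```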

Response Handling
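A minimal handling sketch is shown below. The response dict is shaped like an OpenAI-style chat completion, and the get_time tool is a hypothetical stand-in for the tools on the tool implementation page.

```python
import json

# Hypothetical local tool; in practice, use the tools from the
# tool implementation page.
def get_time(timezone):
    return f"12:00 in {timezone}"

TOOLS = {"get_time": get_time}

# Response shaped like an OpenAI-style chat completion (illustrative).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_time",
                    "arguments": '{"timezone": "Asia/Singapore"}',
                },
            }],
        },
    }],
}

def handle_tool_calls(response):
    """Run each structured tool call and build `tool` result messages."""
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(func(**args)),
        })
    return results

tool_messages = handle_tool_calls(response)
```

The resulting tool messages are appended to the conversation history before the follow-up request to the model.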

Example Response

Llama-SEA-LION-v3.5-70B-R

Key Characteristics:

  • Reasoning model without native tool calling capability

  • Tool calls can instead be extracted by parsing the message response

  • Handles tool calling via message content, similarly to Gemma-SEA-LION-v4-27B-IT

  • Omitting tools and tool_choice from the API call is recommended

API Request Configuration
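A request for the reasoning model might be sketched as follows. Note that tools and tool_choice are deliberately omitted; the system prompt wording and model identifier are assumptions.

```python
# Sketch of a request for the reasoning model. `tools` and `tool_choice`
# are deliberately omitted; tools are instead described in the system
# prompt. The prompt wording and model identifier are assumptions.
payload = {
    "model": "aisingapore/Llama-SEA-LION-v3.5-70B-R",
    "messages": [
        {
            "role": "system",
            "content": (
                "You may call a tool by replying with a tool_code fenced "
                'block containing JSON like {"name": ..., "arguments": {...}}.'
            ),
        },
        {"role": "user", "content": "What's the weather in Singapore?"},
    ],
}
```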

Implementation Example

Here's an example that handles all three models, making use of the components provided in the tool implementation page:
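The core of such a handler can be condensed into a sketch like the one below. It assumes the tool_code JSON format and a `</think>` reasoning delimiter, which depend on your system prompt and the model's chat template; the model-name checks are assumptions based on the names in this guide.

```python
import json
import re

TOOL_CODE_RE = re.compile(r"`{3}tool_code\s*(.*?)\s*`{3}", re.DOTALL)

def extract_text_tool_calls(content):
    """Parse fenced tool_code blocks emitted as plain text (Gemma / v3.5-R)."""
    # For the reasoning model, keep only the text after the reasoning
    # segment so tool calls mentioned while "thinking" are not re-executed.
    content = content.split("</think>")[-1]
    return [json.loads(block) for block in TOOL_CODE_RE.findall(content)]

def get_tool_calls(model, message):
    """Normalise tool calls from any of the three models into
    {"name": ..., "arguments": ...} dicts."""
    if "Llama-SEA-LION-v3-" in model:
        # Native OpenAI-style structured tool calls.
        return [
            {
                "name": call["function"]["name"],
                "arguments": json.loads(call["function"]["arguments"]),
            }
            for call in message.get("tool_calls") or []
        ]
    # Gemma-SEA-LION-v4-27B-IT and Llama-SEA-LION-v3.5-70B-R:
    # tool calls live in the message content as text.
    return extract_text_tool_calls(message.get("content") or "")
```

Normalising to one shape lets the rest of the application (tool dispatch, history management) stay model-agnostic.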

Points to Take Note Of

Model-Specific Considerations

  1. Gemma-SEA-LION-v4-27B-IT:

    • Typically uses text parsing instead of standard tool calling

    • System prompt should explicitly define tool call format

    • tools parameter in API requests is only utilized when tool_choice is set to "required" or specific tool is enforced

    • Tool calls are wrapped in ```tool_code blocks

    • Regex patterns are needed for extraction

  2. Llama-SEA-LION-v3-70B-IT:

    • Fully supports OpenAI-style tool calling

    • Uses tools and tool_choice in API requests

    • Returns structured tool_calls in response

    • Reliable for production tool calling applications

  3. Llama-SEA-LION-v3.5-70B-R:

    • Reasoning model without native tool calling capability

    • Tool calls can instead be extracted by parsing the message response

    • Can reason about tool usage

    • Parse the message content after the reasoning segment to avoid executing redundant tool calls that appear during reasoning

    • Best used for complex reasoning tasks

General Best Practices

  1. Error Handling: Always implement proper error handling for tool execution failures and API timeouts.

  2. Model Detection: Use model name suffixes to determine the appropriate tool calling approach:

  3. Timeout Management: Set appropriate timeouts for both LLM API calls and tool execution

  4. Response Validation: Always validate tool call responses before processing:

  5. Conversation Flow: Maintain proper conversation history by adding all messages (user, assistant, tool results) to the messages array.

    • The Gemma 3 chat template enforces alternating user and assistant turns in the message history; hence, in the example function execute_tool_calls, the tool result is returned with the role user rather than tool

  6. Platform Considerations: Some models may behave differently on different platforms (e.g., Ollama vs cloud APIs). Test your implementation on your target platform.

  7. Token Efficiency: The text-based approach may use more tokens than standard function calling. Monitor usage accordingly.
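The model-detection practice above can be sketched as a small helper; the suffix and substring checks are assumptions based on the model names in this guide.

```python
def tool_calling_mode(model_name):
    """Pick a tool-calling approach from the model name.
    The naming conventions below are assumptions based on this guide."""
    if model_name.endswith("-R"):
        # Reasoning model: parse content after the reasoning segment.
        return "text-parsing-after-reasoning"
    if "Gemma" in model_name:
        # Gemma models: parse tool_code blocks from the message content.
        return "text-parsing"
    # Llama instruct models: native structured tool_calls support.
    return "native"
```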

Security Considerations

  • Validate all tool parameters before execution

  • Implement rate limiting for external API calls

  • Sanitize user inputs that will be passed to tools

  • Consider implementing tool execution sandboxing for production environments

Performance Optimization

  • Cache tool results where appropriate (e.g., weather data for short periods)

  • Implement parallel tool execution when multiple tools are called

  • Use connection pooling for HTTP requests in tool implementations

  • Consider implementing tool call batching for efficiency

Last updated