Building Function Calling for Open-Source LLMs: Inside Hermes-2-Pro's Tool Architecture

Hook

While developers pay per token for function calling APIs, the Hermes-2-Pro model runs entirely locally with tool-use capabilities. The approach relies on ChatML formatting and recursive prompt engineering.

Context

Function calling enables language models to invoke external tools, APIs, and databases, transforming them from text generators into agents capable of taking action. For developers working with sensitive data, running in air-gapped environments, or wanting to avoid ongoing API costs, locally run implementations offer an alternative to commercial APIs.

The Hermes-Function-Calling repository implements function calling capabilities for the Hermes-2-Pro open-source LLM. Built by NousResearch, this project demonstrates how to structure prompts, parse tool invocations, and execute a recursive reasoning loop—all with a model you can run on your own hardware. With 1,236 stars, the repository provides insight into the architecture patterns that make local function calling possible.

Technical Insight

At its core, the Hermes function calling system uses ChatML (Chat Markup Language) to create a structured protocol between the model and execution environment. ChatML wraps different message types in special tokens: <|im_start|>user, <|im_start|>assistant, and critically, <|im_start|>tool for function results. This formatting helps the model distinguish between conversational text and structured tool invocations.
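To make the framing concrete, here is a minimal sketch of how messages are wrapped in ChatML tokens. The token names follow the ChatML format described above; the chatml() helper itself is illustrative, not part of the repository.

```python
def chatml(role: str, content: str) -> str:
    """Wrap a single message in ChatML start/end tokens."""
    return f"<|im_start|>{role}\n{content}<|im_end|>\n"

# A short conversation: system instructions, a user turn, and a tool
# result injected under its own dedicated role.
prompt = (
    chatml("system", "You are a function-calling assistant.")
    + chatml("user", "What is NVDA trading at?")
    + chatml("tool", '{"price": 903.5}')  # tool results get the tool role
)
```

Because each role is delimited the same way, the model can treat a tool result as data to reason over rather than as text the user typed.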

Functions are registered and exposed to the model through the functions.py script. Every function uses a @tool decorator and includes detailed docstrings. The README shows this pattern:

# Imports implied by the README example: yfinance for market data, plus
# the @tool decorator (supplied by LangChain in the repo's functions.py).
import yfinance as yf
from langchain.tools import tool

@tool
def get_new_function(symbol: str) -> dict:
    """
    Description of the new function.

    Args:
        symbol (str): The stock symbol.
    Returns:
        dict: Dictionary containing the desired information.
    """
    try:
        stock = yf.Ticker(symbol)
        new_info = stock.new_method()  # placeholder method from the README template
        return new_info
    except Exception as e:
        print(f"Error fetching new information for {symbol}: {e}")
        return {}

These functions are converted to OpenAI-compatible tool schemas via convert_to_openai_tool(), creating a schema from Python type hints and docstrings. The model receives this schema in its system prompt.
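The schema the model sees has roughly the shape below. This is a simplified stdlib approximation of what a converter like convert_to_openai_tool() derives from type hints and docstrings — the real LangChain converter emits richer output (per-parameter descriptions, optionality), so treat this as illustrative only.

```python
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number",
              bool: "boolean", dict: "object"}

def to_openai_tool(fn) -> dict:
    """Rough approximation of an OpenAI-style tool schema, built from a
    function's type hints and the first line of its docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not part of the schema
    params = {name: {"type": PY_TO_JSON.get(tp, "string")}
              for name, tp in hints.items()}
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip().split("\n")[0],
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

def get_stock_price(symbol: str) -> dict:
    """Fetch the latest price for a stock symbol."""
    return {}

schema = to_openai_tool(get_stock_price)
```

Serializing a list of such schemas into the system prompt is what tells the model which tools exist and how to call them.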

The execution loop in functioncall.py implements recursive reasoning with configurable depth limits (default 5 iterations). When the model wants to invoke a tool, it generates a structured <tool_call> block containing JSON with the function name and arguments. The framework parses this output, executes the corresponding Python function, and injects the result back into the conversation as a <tool_response> message. The model can then use this data to answer the user’s question or make additional function calls.
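The parse-execute-inject cycle can be sketched as follows. The tag names match the article; the helper function and registry here are hypothetical stand-ins for the repo's own dispatch logic, which also tracks recursion depth.

```python
import json
import re

# Match a JSON payload wrapped in <tool_call> tags in the model output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def run_tool_calls(model_output: str, registry: dict) -> list:
    """Parse <tool_call> blocks, execute the named functions, and wrap
    each result as a <tool_response> message to feed back to the model."""
    responses = []
    for match in TOOL_CALL_RE.finditer(model_output):
        call = json.loads(match.group(1))
        result = registry[call["name"]](**call.get("arguments", {}))
        responses.append(f"<tool_response>{json.dumps(result)}</tool_response>")
    return responses

# Illustrative registry and model output.
registry = {"get_stock_price": lambda symbol: {"symbol": symbol, "price": 101.2}}
output = '<tool_call>{"name": "get_stock_price", "arguments": {"symbol": "NVDA"}}</tool_call>'
responses = run_tool_calls(output, registry)
```

In the real loop, the <tool_response> messages are appended to the conversation and the model is invoked again, up to the configured depth limit.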

For JSON schema adherence without function execution, jsonmode.py takes a different approach using Pydantic models. You define a structured data model—the README shows a Character class example with name, species, role, and optional fields—then serialize it to JSON schema with Character.schema_json(). The model receives this schema in its prompt and generates conforming JSON output.
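The schema handed to the model in JSON mode looks roughly like the dict below. The repo derives it with Pydantic's Character.schema_json(); this stdlib sketch only illustrates the shape of the result (real Pydantic output carries additional metadata) and how it lands in the prompt.

```python
import json

# Hand-written JSON schema approximating what Character.schema_json()
# would emit for the README's example model (fields illustrative).
character_schema = {
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "species": {"type": "string"},
        "role": {"type": "string"},
    },
    "required": ["name", "species", "role"],
}

# The serialized schema is embedded directly in the system prompt.
system_prompt = (
    "Answer with JSON conforming to this schema:\n"
    + json.dumps(character_schema, indent=2)
)
```

The model's output can then be validated back against the same Pydantic model, closing the loop on schema adherence.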

The prompter.py script manages prompt assembly. According to the README, it “reads the system prompt from a YAML file, formats it with the necessary variables (e.g., tools, examples, schema), and generates the final prompt for the model.” Few-shot examples can be controlled via the --num_fewshot command-line argument.
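A simplified view of that assembly step: a template with placeholders is filled with tool schemas and a capped number of few-shot examples. The real script loads the template from YAML; this string-based sketch and its names are hypothetical.

```python
# Illustrative stand-in for the YAML-sourced system prompt template.
TEMPLATE = (
    "You are a function-calling assistant.\n"
    "Available tools:\n{tools}\n"
    "Examples:\n{examples}\n"
)

def build_system_prompt(tools: str, examples: list, num_fewshot: int) -> str:
    """Fill the template with tool schemas and the first num_fewshot
    examples, mirroring the effect of the --num_fewshot argument."""
    shots = "\n".join(examples[:num_fewshot])
    return TEMPLATE.format(tools=tools, examples=shots)

prompt = build_system_prompt(
    '[{"name": "get_stock_price"}]',
    ["Example call A", "Example call B"],
    num_fewshot=1,
)
```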

Performance optimization comes through the --load_in_4bit flag, which uses bitsandbytes quantization to reduce memory footprint—making local deployment more feasible on consumer hardware.

Gotcha

The implementation appears designed specifically for Hermes-2-Pro and ChatML format. The default model path is “NousResearch/Hermes-2-Pro-Llama-3-8B” with ChatML as the default chat template. Using a different model architecture would likely require adapting the prompt engineering and parsing logic.

Function registration is manual and happens at startup. The README shows that adding a function requires editing functions.py, defining the decorated function, and manually adding it to the get_openai_tools() list. There’s no indication of a plugin system or dynamic tool discovery. For applications needing many tools or user-defined functions, this approach may become cumbersome.
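The registration pattern looks roughly like this. All function bodies and the converter below are stubs standing in for the repo's real code, purely to show the manual-list shape.

```python
def get_stock_fundamentals(symbol: str) -> dict:
    """Stub standing in for an existing tool in functions.py."""
    return {}

def get_new_function(symbol: str) -> dict:
    """Stub standing in for a newly added tool."""
    return {}

def convert_to_openai_tool(fn) -> dict:
    """Stub for the schema converter the repo imports."""
    return {"type": "function", "function": {"name": fn.__name__}}

def get_openai_tools() -> list:
    # Every new tool must be appended to this list by hand --
    # there is no dynamic discovery.
    functions = [get_stock_fundamentals, get_new_function]
    return [convert_to_openai_tool(f) for f in functions]

tools = get_openai_tools()
```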

Error handling in the README examples is limited to basic try/except blocks that return empty dictionaries on failure. The example prints an error message but shows no retry logic, rate limiting, or other production-oriented patterns. In production, you would likely want more robust handling of external API calls and better error propagation back to the model.

Verdict

Use if: you’re building AI agents that need tool access with open-source models, working in environments where external APIs are problematic, learning how function calling works architecturally, or prototyping with the Hermes-2-Pro model family specifically. The codebase provides a clear reference implementation for understanding function calling patterns with open-source LLMs.

Skip if: you need production-grade reliability with comprehensive error handling, require broad model compatibility beyond Hermes/ChatML formats, are already using commercial APIs where native function calling would suffice, or want a framework with extensive pre-built integrations. This repository serves as educational scaffolding and a starting point for custom implementations rather than a production-ready library.
