vllm.tool_parsers.gemma4_utils
Gemma4 tool call parsing utilities for offline inference.
Standalone functions that parse decoded model text to extract tool calls from Gemma4 models. These are pure-Python utilities with zero heavy dependencies — they work on raw decoded strings from any inference backend (vLLM, HuggingFace, TGI, etc.).
For the OpenAI-compatible API server tool parser (streaming + non-streaming), see vllm.tool_parsers.gemma4_tool_parser. For thinking/reasoning output parsing, see vllm.reasoning.gemma4_utils.
Usage with vLLM offline inference::
    from vllm import LLM, SamplingParams
    from vllm.tool_parsers.gemma4_utils import (
        parse_tool_calls,
        has_tool_response_tag,
    )

    llm = LLM(model="google/gemma-4-it")
    # The tokenizer is needed to decode token IDs with special tokens kept
    tokenizer = llm.get_tokenizer()
    outputs = llm.generate(prompt, SamplingParams(...))
    text = tokenizer.decode(outputs[0].outputs[0].token_ids, skip_special_tokens=False)

    # Extract tool calls
    tool_calls = parse_tool_calls(text)
    for tc in tool_calls:
        print(f"{tc['name']}({tc['arguments']})")
Ported from transformers.models.gemma4.utils_gemma4 so that vLLM users do not need a transformers dependency for output parsing.
Functions:
- has_tool_response_tag – Check if model output properly ends with a tool response tag.
- parse_tool_calls – Parse tool calls from decoded Gemma4 model output.
_parse_tool_arguments(args_str)
Parse tool call arguments from the Gemma4 compact format.
Delegates to the native <|"|>-aware parser from vllm.parser.gemma4, which handles internal quotes, nested objects, arrays, and all Gemma4 value types correctly.
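Parameters:
- args_str – The raw argument text of a tool call, in the Gemma4 compact format.
Returns:
- The parsed arguments, as a dict of argument name → value.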
Source code in vllm/tool_parsers/gemma4_utils.py
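To make the idea of a `<|"|>`-aware parser concrete, here is a minimal sketch for flat arguments only. It is not the vLLM implementation: the function names are hypothetical, the `key:value,key:value` layout and the `true`/`false` literals are assumptions, and nested objects and arrays (which the real parser handles) are deliberately out of scope.

```python
import re

# Assumed layout: key:<|"|>string<|"|> or key:bare_scalar, separated by commas.
# A delimited string may itself contain commas, so it is matched first.
_PAIR = re.compile(
    r'\s*(\w+)\s*:\s*(?:<\|"\|>(.*?)<\|"\|>|([^,]*))\s*(?:,|$)',
    re.DOTALL,
)


def _coerce(token: str):
    """Convert a bare scalar to bool, int, or float; fall back to the raw string."""
    if token == "true":
        return True
    if token == "false":
        return False
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token


def parse_flat_arguments(args_str: str) -> dict:
    """Hypothetical helper: parse flat compact-format arguments into a dict."""
    result = {}
    for match in _PAIR.finditer(args_str):
        key, quoted, bare = match.group(1), match.group(2), match.group(3)
        if quoted is not None:
            # Text between <|"|> delimiters is always kept as a string
            result[key] = quoted
        else:
            result[key] = _coerce(bare.strip())
    return result


print(parse_flat_arguments('location:<|"|>Paris, France<|"|>,days:3,metric:true'))
# {'location': 'Paris, France', 'days': 3, 'metric': True}
```

The point of the dedicated delimiter is visible in the example: the comma inside the string does not split the argument, which a naive `split(",")` would get wrong.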
has_tool_response_tag(text)
Check if model output properly ends with a tool response tag.
Some Gemma4 models occasionally emit <eos> instead of <|tool_response> after a tool call. This helper detects whether the model used the proper termination, so callers can decide whether to inject <|tool_response> into the next prompt.
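Parameters:
- text – The decoded model output text.
Returns:
- True if the output ends with <|tool_response>, False otherwise.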
Example::
    >>> from vllm.tool_parsers.gemma4_utils import has_tool_response_tag
    >>> if not has_tool_response_tag(model_output):
    ...     # Model used <eos> instead; inject <|tool_response> manually
    ...     next_prompt = "<|tool_response>" + tool_result
Source code in vllm/tool_parsers/gemma4_utils.py
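The injection decision described above can be sketched without vLLM installed. This is an illustration under stated assumptions: `ends_with_tool_response` is a hypothetical stand-in that assumes the check is a simple suffix test after trimming whitespace, and `next_turn_prefix` is a hypothetical helper, not part of the module.

```python
TOOL_RESPONSE_TAG = "<|tool_response>"


def ends_with_tool_response(text: str) -> bool:
    """Hypothetical stand-in for has_tool_response_tag (assumed suffix check)."""
    return text.rstrip().endswith(TOOL_RESPONSE_TAG)


def next_turn_prefix(model_output: str, tool_result: str) -> str:
    """Return the text to append after the model output for the next turn."""
    if ends_with_tool_response(model_output):
        # The model already opened the tool response; append the result only
        return tool_result
    # The model stopped without the tag (e.g. with <eos>); inject it manually
    return TOOL_RESPONSE_TAG + tool_result
```

With the real helper, the same branch would call `has_tool_response_tag(model_output)` in place of the stand-in.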
parse_tool_calls(text, *, strict=False)
Parse tool calls from decoded Gemma4 model output.
Uses a tiered parsing strategy to handle known output variations in Gemma4 models, which may emit non-standard tool call formats.
Parsing tiers
- Standard: <|tool_call>call:name{args}<tool_call|> (special token IDs 48/49 in decoded text)
- Fallback (when strict=False): bare call:name{args} patterns, including <call>name{args} (fragmented tokens from multimodal inputs)
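The two tiers can be illustrated with a pair of regular expressions. This is a rough sketch, not the patterns vLLM uses: the regexes and the `find_raw_calls` name are hypothetical, tool names are assumed to be plain identifiers, and the fallback is assumed to run only when the standard pattern finds nothing.

```python
import re

# Tier 1: the standard form, wrapped in the tool call special tokens
STANDARD = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)

# Tier 2: bare call:name{args} or <call>name{args}, without the wrapper.
# The lazy match stops at the first "}", so nested braces are not supported here.
FALLBACK = re.compile(r"(?:<call>|call:)(\w+)\{(.*?)\}", re.DOTALL)


def find_raw_calls(text: str, strict: bool = False) -> list:
    """Return (name, raw_args) pairs, trying the standard tier first."""
    matches = STANDARD.findall(text)
    if matches or strict:
        return matches
    return FALLBACK.findall(text)


standard_text = '<|tool_call>call:get_weather{location:<|"|>Paris<|"|>}<tool_call|>'
bare_text = 'call:get_time{zone:<|"|>UTC<|"|>}'

print(find_raw_calls(standard_text))
print(find_raw_calls(bare_text))
print(find_raw_calls(bare_text, strict=True))
```

In this sketch the bare form is recovered only when `strict` is left at `False`; with `strict=True` the last call returns an empty list.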
Parameters:
- text (str) – Decoded model output text (from tokenizer.decode(..., skip_special_tokens=False)).
- strict (bool, default: False) – If True, only match the standard <|tool_call> format. If False (default), also try fallback patterns for known Gemma4 output variations.
Returns:
- list[dict] – A list of dicts, each with keys:
  - "name": The tool function name (e.g. "get_weather").
  - "arguments": A dict of argument name → value.
Example::
    >>> from vllm.tool_parsers.gemma4_utils import parse_tool_calls
    >>> output = tokenizer.decode(outputs[0], skip_special_tokens=False)
    >>> tool_calls = parse_tool_calls(output)
    >>> for tc in tool_calls:
    ...     print(f"Call: {tc['name']}({tc['arguments']})")
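Because each returned item is a dict with "name" and "arguments" keys, the result can be dispatched to ordinary Python functions. The sketch below shows one possible way to do that; the registry, the `run_tool_calls` helper, and the canned `get_weather` tool are hypothetical and exist only for illustration.

```python
def get_weather(location: str, unit: str = "celsius") -> str:
    """Placeholder tool that returns a canned string instead of real data."""
    return f"Weather in {location}: 21 degrees {unit}"


# Hypothetical registry mapping tool names to callables
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list) -> list:
    """Execute each parsed call and collect a result or an error per call."""
    results = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        # "arguments" maps argument names to values, so it unpacks as kwargs
        results.append({"name": call["name"], "result": func(**call["arguments"])})
    return results


# Same shape as the list returned by parse_tool_calls
example_calls = [{"name": "get_weather", "arguments": {"location": "Paris"}}]
print(run_tool_calls(example_calls))
```

Reporting unknown tool names as errors rather than raising keeps one bad call from aborting the rest, which matters because the model chooses the names.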