Skip to content

vllm.tool_parsers.gemma4_engine_tool_parser

Classes:

Gemma4EngineToolParser

Bases: Gemma4ParserToolAdapter

Methods:

  • adjust_request

    Skip structured-output JSON for required/named tool choice.

Source code in vllm/tool_parsers/gemma4_engine_tool_parser.py
class Gemma4EngineToolParser(Gemma4ParserToolAdapter):  # type: ignore[valid-type, misc]
    supports_required_and_named = False

    def adjust_request(
        self, request: ChatCompletionRequest | ResponsesRequest
    ) -> ChatCompletionRequest | ResponsesRequest:
        """Skip structured-output JSON for required/named tool choice.

        Gemma4 emits its native ``<|tool_call>call:...`` syntax, which the
        parser extracts directly. The base ``ToolParser.adjust_request`` would
        set ``structured_outputs`` for required/named and force JSON via guided
        decoding, conflicting with that native syntax (it leaks as content and
        crashes EngineCore under speculative decoding). Skip it so the model
        emits its native format (mirrors the GLM4 parser).
        """
        if request.tools:
            tc = request.tool_choice
            if tc == "required" or isinstance(
                tc, (ChatCompletionNamedToolChoiceParam, ToolChoiceFunction)
            ):
                request.skip_special_tokens = False
                return request
        return super().adjust_request(request)

adjust_request(request)

Skip structured-output JSON for required/named tool choice.

Gemma4 emits its native <|tool_call>call:... syntax, which the parser extracts directly. The base ToolParser.adjust_request would set structured_outputs for required/named and force JSON via guided decoding, conflicting with that native syntax (it leaks as content and crashes EngineCore under speculative decoding). Skip it so the model emits its native format (mirrors the GLM4 parser).

Source code in vllm/tool_parsers/gemma4_engine_tool_parser.py
def adjust_request(
    self, request: ChatCompletionRequest | ResponsesRequest
) -> ChatCompletionRequest | ResponsesRequest:
    """Skip structured-output JSON for required/named tool choice.

    Gemma4 emits its native ``<|tool_call>call:...`` syntax, which the
    parser extracts directly. The base ``ToolParser.adjust_request`` would
    set ``structured_outputs`` for required/named and force JSON via guided
    decoding, conflicting with that native syntax (it leaks as content and
    crashes EngineCore under speculative decoding). Skip it so the model
    emits its native format (mirrors the GLM4 parser).
    """
    if request.tools:
        tc = request.tool_choice
        if tc == "required" or isinstance(
            tc, (ChatCompletionNamedToolChoiceParam, ToolChoiceFunction)
        ):
            request.skip_special_tokens = False
            return request
    return super().adjust_request(request)