vllm_gaudi.ops.ops_selector
Selector module that switches between the PyTorch and Triton implementations of Mamba operations based on an environment variable.
Set VLLM_MAMBA_USE_PYTORCH=1 to use the PyTorch implementations. The default (unset or 0) uses the optimized Triton implementations.
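A minimal sketch of the selection pattern described above. It assumes the flag is read with `os.environ.get` and compared against the string `"1"`. The functions `pytorch_impl`, `triton_impl`, and `select_impl` are hypothetical stand-ins, not part of vllm_gaudi.

```python
import os
from typing import Callable


def pytorch_impl(x: float) -> float:
    # Hypothetical stand-in for a pure-PyTorch reference kernel.
    return x * 2.0


def triton_impl(x: float) -> float:
    # Hypothetical stand-in for an optimized Triton kernel.
    return x * 2.0


def select_impl(env_var: str = "VLLM_MAMBA_USE_PYTORCH") -> Callable[[float], float]:
    # "1" selects PyTorch; unset or "0" falls back to Triton, per the module description.
    if os.environ.get(env_var, "0") == "1":
        return pytorch_impl
    return triton_impl
```

Returning the function object, rather than calling it, lets callers resolve the choice once and then invoke the selected kernel directly.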
_USE_SELECTIVE_STATE_UPDATE_REF (module attribute)
_USE_SELECTIVE_STATE_UPDATE_REF = (
get("VLLM_MAMBA_USE_SELECTIVE_STATE_UPDATE_REF_PT", "1")
== "1"
)
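The expression above enables the flag only when the value is exactly the string `"1"`. Because the default is also `"1"`, leaving the variable unset enables it too. The sketch below reproduces that comparison against a plain dictionary. It assumes `get` behaves like `os.environ.get`, and `flag_enabled` is a hypothetical helper.

```python
from typing import Mapping, Optional

NAME = "VLLM_MAMBA_USE_SELECTIVE_STATE_UPDATE_REF_PT"


def flag_enabled(env: Mapping[str, str], name: str, default: str = "1") -> bool:
    # Mirrors `get(name, default) == "1"`: exact string match, no case folding or trimming.
    return env.get(name, default) == "1"


values: list[Optional[str]] = [None, "1", "0", "true", " 1"]
for value in values:
    env = {} if value is None else {NAME: value}
    print(repr(value), flag_enabled(env, NAME))
```

Only an exact `1`, or leaving the variable unset, enables the reference path. Values such as `true` or ` 1` count as disabled.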
_use_pytorch_runtime
Checks at runtime whether to use the PyTorch implementation.
This allows torch.compile to respect the environment variable.
_wrap_selective_state_update_ref
Wrapper that adapts the PyTorch selective_state_update_ref to match the Triton API.
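This page does not spell out how the two APIs differ. The sketch assumes the Triton-side call may receive extra keyword arguments that the reference lacks, plus an optional `out` buffer for the result. The hypothetical `adapt_to_triton_api` handles this as follows:

- It forwards only the keyword arguments that the reference's signature accepts.
- It raises an error, rather than silently dropping the value, when an unsupported argument is given a non-default value.
- It copies the result into `out` when one is supplied.

```python
import inspect
from typing import Any, Callable


def adapt_to_triton_api(ref_fn: Callable[..., Any]) -> Callable[..., Any]:
    # Parameter names the reference function actually accepts.
    accepted = inspect.signature(ref_fn).parameters

    def wrapper(*args: Any, out: Any = None, **kwargs: Any) -> Any:
        forwarded = {k: v for k, v in kwargs.items() if k in accepted}
        # Ignoring a meaningful extra argument would change results, so reject it.
        unsupported = [k for k, v in kwargs.items() if k not in accepted and v is not None]
        if unsupported:
            raise NotImplementedError(f"reference does not support: {unsupported}")
        result = ref_fn(*args, **forwarded)
        if out is not None:
            # Sequence slice copy; with torch tensors this would be out.copy_(result).
            out[:] = result
            return out
        return result

    return wrapper
```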
get_selective_state_update_impl
Returns the selective state update implementation (the PyTorch reference or the Triton kernel).
PyTorch version signature:
selective_state_update_ref(state, x, dt, A, B, C, D=None, z=None, dt_bias=None, dt_softplus=False)
Returns: the output tensor.
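The following sketch assumes `selective_state_update_ref` follows the standard single-step Mamba recurrence, as the `mamba_ssm` reference function with the same signature does. It covers only the no-heads layout, where `state` has shape `(batch, dim, dstate)`, and uses plain Python loops to illustrate the math. It is not the vllm_gaudi code.

For each channel d and state index n, the recurrence computes:

- Δ = dt + dt_bias, passed through softplus when `dt_softplus` is set.
- h ← exp(Δ·A)·h + Δ·B·x.
- y = Σₙ C·h + D·x, optionally gated by silu(z).

`selective_state_update_sketch` is a hypothetical name.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def softplus(v: float) -> float:
    # Numerically stable log(1 + exp(v)).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def selective_state_update_sketch(
    state: List[Matrix],               # (batch, dim, dstate), updated in place
    x: Matrix,                         # (batch, dim)
    dt: Matrix,                        # (batch, dim)
    A: Matrix,                         # (dim, dstate)
    B: Matrix,                         # (batch, dstate)
    C: Matrix,                         # (batch, dstate)
    D: Optional[Vector] = None,        # (dim,)
    z: Optional[Matrix] = None,        # (batch, dim)
    dt_bias: Optional[Vector] = None,  # (dim,)
    dt_softplus: bool = False,
) -> Matrix:
    out: Matrix = []
    for b in range(len(state)):
        row: Vector = []
        for d in range(len(state[b])):
            delta = dt[b][d] + (dt_bias[d] if dt_bias is not None else 0.0)
            if dt_softplus:
                delta = softplus(delta)
            y = 0.0
            for n in range(len(state[b][d])):
                # h <- exp(delta * A) * h + delta * B * x
                h = math.exp(delta * A[d][n]) * state[b][d][n] + delta * B[b][n] * x[b][d]
                state[b][d][n] = h
                y += C[b][n] * h
            if D is not None:
                y += D[d] * x[b][d]  # skip connection
            if z is not None:
                y *= silu(z[b][d])  # output gate
            row.append(y)
        out.append(row)
    return out
```

Like the reference, the sketch mutates `state` in place and returns only the per-channel outputs.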