vllm_gaudi.utils
HPUCompileConfig
Configuration class that holds the arguments passed to torch.compile when compiling with the HPU backend.
Source code in vllm_gaudi/utils.py
dynamic (instance attribute)
fullgraph (instance attribute)
__init__
Allows overriding the environment variables in corner cases where individual functions are compiled with the torch.compile decorator. The environment variables should not be overridden when the whole model is compiled.
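The precedence described above (an explicit argument wins over the environment, which wins over a default) can be sketched as follows. resolve_compile_flag is a hypothetical helper, not part of vllm_gaudi; the environment variable name and the accepted "true" spellings are assumptions chosen only for illustration.

```python
import os
from typing import Optional

# Assumed spellings that count as "enabled"; illustrative only.
TRUTHY = {"1", "true", "yes"}


def resolve_compile_flag(override: Optional[bool], env_name: str, default: bool) -> bool:
    """Hypothetical helper: choose a torch.compile flag value.

    An explicit override (e.g. when compiling a single function) takes
    precedence; otherwise the environment variable is read, and if it is
    unset the default is used.
    """
    if override is not None:
        return override
    raw = os.environ.get(env_name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


# Hypothetical variable name, used only for this example.
os.environ["EXAMPLE_COMPILE_FULLGRAPH"] = "true"
print(resolve_compile_flag(None, "EXAMPLE_COMPILE_FULLGRAPH", False))   # from the environment
print(resolve_compile_flag(False, "EXAMPLE_COMPILE_FULLGRAPH", False))  # the override wins
```

For whole-model compilation, passing override=None leaves the environment in control, in line with the guidance above.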
get_compile_args
Returns a dictionary of compile arguments that can be passed to torch.compile, whether it is called as a function or used as a decorator.
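The following is a minimal sketch of how a configuration object like this can feed torch.compile. CompileConfigSketch is a hypothetical stand-in, not the real HPUCompileConfig. Only the fullgraph and dynamic attributes come from this page, and the backend name "hpu_backend" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CompileConfigSketch:
    """Hypothetical stand-in for HPUCompileConfig (not the library class)."""

    fullgraph: bool = False
    dynamic: Optional[bool] = None
    backend: str = "hpu_backend"  # assumed name of the HPU backend

    def get_compile_args(self) -> dict[str, Any]:
        # Keys match keyword arguments accepted by torch.compile.
        return {
            "backend": self.backend,
            "fullgraph": self.fullgraph,
            "dynamic": self.dynamic,
        }


cfg = CompileConfigSketch(fullgraph=True, dynamic=False)
print(cfg.get_compile_args())
```

The dictionary can be unpacked in either form of torch.compile: compiled_fn = torch.compile(fn, **cfg.get_compile_args()), or as a decorator, @torch.compile(**cfg.get_compile_args()).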
async_h2d_copy
Asynchronously transfer data from host to device.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source | | CPU tensor or raw data to transfer | required |
| dest_tensor | | Optional pre-allocated destination tensor | None |
| dtype | | Required if source is raw data | None |
| device | | Target device | 'hpu' |

Returns:

| Type | Description |
|---|---|
| torch.Tensor | Tensor on the target device |

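The following minimal sketch shows the host-to-device pattern the table describes, using standard PyTorch calls: Tensor.pin_memory, plus Tensor.to and Tensor.copy_ with non_blocking=True. The function name is hypothetical and this is not the vllm_gaudi implementation. It assumes pinning needs an accelerator-enabled PyTorch build, so it skips pinning when the target is the CPU and can therefore run anywhere.

```python
from typing import Optional, Sequence, Union

import torch


def async_h2d_copy_sketch(
    source: Union[torch.Tensor, Sequence],
    dest_tensor: Optional[torch.Tensor] = None,
    dtype: Optional[torch.dtype] = None,
    device: str = "hpu",
) -> torch.Tensor:
    """Hypothetical sketch of an asynchronous host-to-device copy."""
    if not isinstance(source, torch.Tensor):
        # Raw Python data needs an explicit dtype to build the host tensor.
        if dtype is None:
            raise ValueError("dtype is required when source is raw data")
        source = torch.tensor(source, dtype=dtype, device="cpu")
    # A pre-allocated destination decides the target device.
    target = dest_tensor.device if dest_tensor is not None else torch.device(device)
    # Page-locked (pinned) host memory lets the transfer run asynchronously.
    if target.type != "cpu" and not source.is_pinned():
        source = source.pin_memory()
    if dest_tensor is None:
        # Allocate on the target and enqueue the copy without blocking the host.
        return source.to(target, non_blocking=True)
    # Reuse the pre-allocated buffer.
    dest_tensor.copy_(source, non_blocking=True)
    return dest_tensor
```

The copy is only enqueued, so the host buffer must stay alive and unmodified until the device has consumed it.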
async_h2d_update
Asynchronously update specific rows of a device tensor from a CPU tensor.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source | Tensor | CPU tensor with data to copy | required |
| dest | Tensor | Device tensor to update | required |
| indices | list[int] | List of row indices in dest to update | required |
| device | | Target device | 'hpu' |

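One way to express such a row update in standard PyTorch is to move the row indices and the source rows to the device, then scatter them with Tensor.index_copy_. This hypothetical sketch is not the library code. It assumes that row i of source is written to row indices[i] of dest, so source has len(indices) rows. For brevity, it also takes the target device from dest instead of a separate device argument.

```python
from typing import List

import torch


def async_h2d_update_sketch(source: torch.Tensor, dest: torch.Tensor, indices: List[int]) -> None:
    """Hypothetical sketch: dest[indices[i]] = source[i], copied asynchronously."""
    if not indices:
        return  # nothing to update
    if len(indices) != source.shape[0]:
        raise ValueError("source must have one row per index")
    target = dest.device
    idx = torch.tensor(indices, dtype=torch.long)  # index_copy_ needs int64 indices
    if target.type != "cpu":
        # Pin host buffers so the non-blocking transfers can overlap host work.
        source, idx = source.pin_memory(), idx.pin_memory()
    src_dev = source.to(target, non_blocking=True)
    idx_dev = idx.to(target, non_blocking=True)
    # Scatter the rows in place along dimension 0 of dest.
    dest.index_copy_(0, idx_dev, src_dev)
```

Only the selected rows travel over the host-to-device link, which is the point of updating rows instead of copying the whole tensor.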
hpu_backend_string (cached)
hpu_device_string (cached)
make_mrope_positions_tensor_with_pad
make_mrope_positions_tensor_with_pad(
input_positions: list[list[int]],
input_mrope_positions: list[list[list[int]]],
max_prompt_len: int,
pad: int,
) -> list[list[int]]
make_ndarray_with_pad_align
make_ndarray_with_pad_align(
x: list[list[T]],
pad: T,
dtype: DTypeLike,
*,
max_len_align: int = 1024,
) -> NDArray
Make a padded array from 2D inputs.
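The padding is applied to the end of each inner list until it reaches
max_len_aligned, i.e. max_len (the length of the longest inner list)
rounded up to the nearest multiple of max_len_align.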
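The alignment rule can be sketched with NumPy. The function name is hypothetical. The sketch assumes max_len is the length of the longest inner list and that it is rounded up (never down) to a multiple of max_len_align, so no row is truncated.

```python
from typing import List, TypeVar

import numpy as np
from numpy.typing import DTypeLike, NDArray

T = TypeVar("T")


def make_ndarray_with_pad_align_sketch(
    x: List[List[T]],
    pad: T,
    dtype: DTypeLike,
    *,
    max_len_align: int = 1024,
) -> NDArray:
    """Hypothetical sketch: right-pad each row to an aligned common length."""
    max_len = max(map(len, x), default=0)
    # Ceiling division, scaled back: round max_len up to a multiple of max_len_align.
    max_len_aligned = -(-max_len // max_len_align) * max_len_align
    padded = np.full((len(x), max_len_aligned), pad, dtype=dtype)
    for i, row in enumerate(x):
        padded[i, : len(row)] = row  # copy the row; the tail keeps the pad value
    return padded


print(make_ndarray_with_pad_align_sketch([[1, 2, 3], [4]], 0, np.int32, max_len_align=4))
```

With the default max_len_align=1024, the same input would be padded to 1024 columns.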
make_tensor_with_pad_align
make_tensor_with_pad_align(
x: list[list[T]],
pad: T,
dtype: dtype,
*,
max_len_align: int = 1024,
device: Optional[Union[str, device]] = None,
pin_memory: bool = False,
) -> Tensor
Make a padded tensor from 2D inputs.
The padding is applied to the end of each inner list until it reaches
max_len_aligned, which is max_len rounded up to the nearest multiple of
max_len_align.
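The following PyTorch sketch applies the same rule and includes the optional pin_memory and device parameters. The function name is hypothetical and this is not the vllm_gaudi implementation. It assumes the rows are first written into a CPU tensor, which is then optionally pinned and moved to the requested device.

```python
from typing import List, Optional, TypeVar, Union

import torch

T = TypeVar("T")


def make_tensor_with_pad_align_sketch(
    x: List[List[T]],
    pad: T,
    dtype: torch.dtype,
    *,
    max_len_align: int = 1024,
    device: Optional[Union[str, torch.device]] = None,
    pin_memory: bool = False,
) -> torch.Tensor:
    """Hypothetical sketch: pad rows to max_len rounded up to a multiple of max_len_align."""
    max_len = max(map(len, x), default=0)
    max_len_aligned = -(-max_len // max_len_align) * max_len_align
    # Build on the host, pre-filled with the pad value.
    t = torch.full((len(x), max_len_aligned), pad, dtype=dtype)
    for i, row in enumerate(x):
        if row:
            t[i, : len(row)] = torch.tensor(row, dtype=dtype)
    if pin_memory:
        t = t.pin_memory()  # requires an accelerator-enabled PyTorch build
    return t if device is None else t.to(device)
```

For example, rows of lengths 3 and 1 with max_len_align=4 give a 2 x 4 tensor, and a single row of length 1025 with the default alignment gives 2048 columns.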