vllm_gaudi.extension.bucketing.exponential
ExponentialBucketingStrategy
Source code in vllm_gaudi/extension/bucketing/exponential.py
check_for_user_flags
get_decode_cfgs
get_prompt_cfgs
warmup_range_with_limit
NOTE(kzawora): We use exponential spacing for buckets: the scaled power returns bmin for the first bucket and bmax for the last, with the buckets in between determined by the exponent while the base stays unchanged. After padding to bstep, duplicates may occur; these are then removed.

Example (bmin=128, bstep=128, bmax=2048, num_buckets=10):

There are 16 possible buckets (2048/128), and we attempt to select 10 of them with exponential spacing.

base = (bmax/bmin) ** (1/(num_buckets-1)) = (2048/128) ** (1/9) = 1.36079
exponent = i
power = base ** exponent
scaled_power = bmin * power

For i == 0 (first bucket), power is 1.36079 ** 0 = 1; scaled_power is 1 * 128 = 128 (== bmin).
For i == 9 (last bucket), power is 1.36079 ** 9 = 16; scaled_power is 16 * 128 = 2048 (== bmax).

So, computing for all buckets:

scaled_powers_unpadded = [bmin*base^0 (== bmin), bmin*base^1, bmin*base^2, ..., bmin*base^9 (== bmax)]
scaled_powers_unpadded = [128.00, 174.18, 237.02, 322.54, 438.91, 597.26, 812.75, 1105.98, 1505.01, 2048.00]

Padding each value up to a multiple of bstep gives the list below, in which the second and third entries (256, 256) are duplicates:

scaled_powers_padded = [128, 256, 256, 384, 512, 640, 896, 1152, 1536, 2048]

We then remove duplicate buckets, which leaves 9 buckets instead of 10:

buckets = [128, 256, 384, 512, 640, 896, 1152, 1536, 2048]
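The worked example above can be reproduced with a short Python sketch. This is not the body of warmup_range_with_limit; the function name exponential_buckets and its signature are hypothetical, chosen only for illustration. The sketch assumes that padding means rounding up to the next multiple of bstep, and it adds a clamp to bmax as a guard against floating-point overshoot on the last bucket, which the docstring does not mention.

```python
import math


def exponential_buckets(bmin: int, bstep: int, bmax: int, num_buckets: int) -> list[int]:
    """Hypothetical helper: exponentially spaced buckets, padded to bstep, deduplicated."""
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")

    # base is chosen so that bmin * base**0 == bmin and bmin * base**(n-1) == bmax.
    base = (bmax / bmin) ** (1 / (num_buckets - 1))

    buckets: list[int] = []
    for i in range(num_buckets):
        scaled_power = bmin * base ** i
        # Pad (round up) to a multiple of bstep. The min() clamp is an added
        # guard: base ** (n-1) can land slightly above bmax / bmin in floats.
        padded = min(bmax, math.ceil(scaled_power / bstep) * bstep)
        # Values are non-decreasing, so a duplicate can only equal the
        # previously kept bucket.
        if not buckets or padded != buckets[-1]:
            buckets.append(padded)
    return buckets


print(exponential_buckets(bmin=128, bstep=128, bmax=2048, num_buckets=10))
# prints [128, 256, 384, 512, 640, 896, 1152, 1536, 2048]
```

Because duplicates are dropped rather than replaced, the result can contain fewer than num_buckets entries, as it does here (9 instead of 10).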