vllm_gaudi.extension.bucketing.exponential
ExponentialBucketingStrategy
Source code in vllm_gaudi/extension/bucketing/exponential.py
check_for_user_flags
get_decode_cfgs
get_prompt_cfgs
warmup_range_with_limit
NOTE(kzawora): We use exponential spacing for buckets. The scaled power returns bmin for the first bucket iteration and bmax for the last, with the elements in between determined by the exponent while the base stays unchanged. After padding to bstep, duplicates may occur; these are removed.

Example (bmin=128, bstep=128, bmax=2048, num_buckets=10):

There are 16 possible buckets (2048/128), and we attempt to select 10 of them with exponential spacing.

    base = (bmax/bmin) ** (1/(num_buckets-1)) = (2048/128) ** (1/9) = 1.36079
    exponent = i
    power = base ** exponent
    scaled_power = bmin * power

For i == 0 (first bucket), power is 1.36079 ** 0 = 1; scaled_power is 1 * 128 = 128 (== bmin).
For i == 9 (last bucket), power is 1.36079 ** 9 = 16; scaled_power is 16 * 128 = 2048 (== bmax).

So, computing for all buckets:

    scaled_powers_unpadded = [bmin*base^0 (== bmin), bmin*base^1, bmin*base^2, ..., bmin*base^9 (== bmax)]
    scaled_powers_unpadded = [128.00, 174.18, 237.02, 322.54, 438.91, 597.26, 812.75, 1105.98, 1505.01, 2048.00]

We then pad each value up to a multiple of bstep and remove duplicate buckets:

    scaled_powers_padded = [128, 256, 256, 384, 512, 640, 896, 1152, 1536, 2048]   (256 appears twice)
    buckets              = [128, 256, 384, 512, 640, 896, 1152, 1536, 2048]        (duplicate 256 removed)
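The arithmetic in the note above can be reproduced with a short standalone sketch. This is not the body of warmup_range_with_limit: the function name exponential_buckets, its signature, the handling of num_buckets == 1, and the clamp to bmax are assumptions made for illustration. It only mirrors the steps the docstring describes: exponential spacing, rounding up to a multiple of bstep, then removing duplicates.

```python
import math


def exponential_buckets(bmin: int, bstep: int, bmax: int, num_buckets: int) -> list[int]:
    """Hypothetical helper: exponentially spaced buckets padded to multiples of bstep."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be a positive integer")
    if num_buckets == 1:
        # Assumption: with a single bucket, fall back to the largest size.
        return [bmax]

    ratio = bmax / bmin
    padded = []
    for i in range(num_buckets):
        # Equivalent to bmin * base ** i with base = ratio ** (1 / (num_buckets - 1)).
        # Writing the exponent as i / (num_buckets - 1) makes the last step exactly bmax.
        scaled_power = bmin * ratio ** (i / (num_buckets - 1))
        # Pad: round up to the next multiple of bstep.
        bucket = math.ceil(scaled_power / bstep) * bstep
        # Assumption: never exceed bmax, even if padding would overshoot it.
        padded.append(min(bucket, bmax))

    # Remove duplicates introduced by padding and return in ascending order.
    return sorted(set(padded))


buckets = exponential_buckets(bmin=128, bstep=128, bmax=2048, num_buckets=10)
print(buckets)
```

With the example configuration, ten candidates are generated but only nine distinct buckets remain, because the second and third candidates both pad to 256.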