Usage

vastai update workergroup ID [OPTIONS]
Arguments

| Argument | Description |
|---|---|
| ID | id of the autoscale group (workergroup) to update |
Options

| Option | Description |
|---|---|
| --min_load MIN_LOAD | minimum floor load in perf units/s (tokens/s for LLMs) |
| --target_util TARGET_UTIL | target capacity utilization (fraction, max 1.0, default 0.9) |
| --cold_mult COLD_MULT | cold/stopped instance capacity target as a multiple of the hot capacity target (default 2.5) |
| --cold_workers COLD_WORKERS | min number of workers to keep 'cold' for this workergroup |
| --test_workers TEST_WORKERS | number of workers to create to get a performance estimate while initializing the workergroup (default 3) |
| --gpu_ram GPU_RAM | estimated GPU RAM requirement (independent of the search string) |
| --template_hash TEMPLATE_HASH | template hash (if you use this field, you can skip search_params, as they are automatically inferred from the template) |
| --template_id TEMPLATE_ID | template id |
| --search_params SEARCH_PARAMS | search param string for search offers, e.g. "gpu_ram>=23 num_gpus=2 gpu_name=RTX_4090 inet_down>200 direct_port_count>2 disk_space>=64" |
| --no-default | disable default search param query args |
| --launch_args LAUNCH_ARGS | launch args string for create instance, e.g. "--onstart onstart_wget.sh --env '-e ONSTART_PATH=https://s3.amazonaws.com/public.vast.ai/onstart_OOBA.sh' --image atinoda/text-generation-webui:default-nightly --disk 64" |
| --endpoint_name ENDPOINT_NAME | deployment endpoint name (allows multiple workergroups to share the same deployment endpoint) |
| --endpoint_id ENDPOINT_ID | deployment endpoint id (allows multiple workergroups to share the same deployment endpoint) |
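The three scaling options interact: --min_load sets a floor on measured load, --target_util adds headroom on top of it, and --cold_mult sizes the stopped-instance reserve relative to the hot target. A minimal sketch of that arithmetic (this is illustrative only, not Vast.ai's actual autoscaler code; it assumes hot target = load / target_util and cold target = hot target × cold_mult, as the option descriptions suggest):

```python
def capacity_targets(measured_load, min_load, target_util, cold_mult):
    """Return (hot, cold) capacity targets in perf units/s (tokens/s for LLMs).

    Illustrative sketch of the option semantics above -- not the real
    Vast.ai autoscaler implementation.
    """
    load = max(measured_load, min_load)  # --min_load acts as a floor
    hot = load / target_util             # headroom so utilization stays <= target
    cold = hot * cold_mult               # stopped/cold capacity held in reserve
    return hot, cold

# With the defaults from the table (--target_util 0.9, --cold_mult 2.5) and a
# measured load of 50 tokens/s under a --min_load floor of 100, the floor
# dominates: hot = 100 / 0.9 and cold = 2.5x that.
hot, cold = capacity_targets(50.0, 100.0, 0.9, 2.5)
```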
Description

Example: vastai update workergroup 4242 --min_load 100 --target_util 0.9 --cold_mult 2.0 --search_params "gpu_ram>=23 num_gpus=2 gpu_name=RTX_4090 inet_down>200 direct_port_count>2 disk_space>=64" --launch_args "--onstart onstart_wget.sh --env '-e ONSTART_PATH=https://s3.amazonaws.com/public.vast.ai/onstart_OOBA.sh' --image atinoda/text-generation-webui:default-nightly --disk 64" --gpu_ram 32.0 --endpoint_name "LLama" --endpoint_id 2

Global Options
The following options are available for all commands:

| Option | Description |
|---|---|
| --url URL | Server REST API URL |
| --retry N | Retry limit |
| --raw | Output machine-readable JSON |
| --explain | Verbose explanation of API calls |
| --api-key KEY | API key (defaults to ~/.config/vastai/vast_api_key) |