Usage

    vastai create endpoint [OPTIONS]
Options

| Option | Description |
|---|---|
| --min_load | Minimum floor load in perf units/s (token/s for LLMs) |
| --min_cold_load | Minimum floor load in perf units/s (token/s for LLMs), but allow handling with cold workers |
| --target_util | Target capacity utilization (fraction, max 1.0, default 0.9) |
| --cold_mult | Cold/stopped instance capacity target as a multiple of the hot capacity target (default 2.5) |
| --cold_workers | Min number of workers to keep 'cold' when you have no load (default 5) |
| --max_workers | Max number of workers your endpoint group can have (default 20) |
| --endpoint_name | Deployment endpoint name (allows multiple autoscale groups to share the same deployment endpoint) |
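To see how these settings interact, here is a rough sketch of the capacity arithmetic they imply. This is an illustration only: the function, its name, and the `perf_per_worker` parameter are hypothetical and not part of the CLI.

```python
import math

def worker_targets(load, perf_per_worker, target_util=0.9, cold_mult=2.5,
                   min_load=0.0, cold_workers=5, max_workers=20):
    """Estimate hot/cold worker counts for a given load (perf units/s)."""
    effective_load = max(load, min_load)          # min_load sets a load floor
    hot_capacity = effective_load / target_util   # run hot workers at target_util
    cold_capacity = hot_capacity * cold_mult      # cold target is a multiple of hot
    hot = math.ceil(hot_capacity / perf_per_worker)
    cold = max(math.ceil(cold_capacity / perf_per_worker), cold_workers)
    # max_workers caps the size of the endpoint group
    return min(hot, max_workers), min(cold, max_workers)
```

For example, with a load of 900 token/s and workers that each serve 100 token/s, the defaults give 10 hot workers (1000 capacity at 0.9 utilization) and a cold target of 25, capped to 20 by `max_workers`; with zero load, `cold_workers` keeps 5 workers cold.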
Description
Create a new endpoint group to manage many autoscaling groups.

Example:

    vastai create endpoint --target_util 0.9 --cold_mult 2.0 --endpoint_name "LLama"

Global Options
The following options are available for all commands:

| Option | Description |
|---|---|
| --url URL | Server REST API URL |
| --retry N | Retry limit |
| --raw | Output machine-readable JSON |
| --explain | Verbose explanation of API calls |
| --api-key KEY | API key (defaults to ~/.config/vastai/vast_api_key) |
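As a minimal sketch of the `--api-key` fallback described in the last row: the key file path comes from the table above, but the function name and the exact precedence order are assumptions, not the CLI's actual implementation.

```python
from pathlib import Path

def resolve_api_key(cli_value=None):
    """Resolve an API key: an explicit --api-key value wins, else read the default key file."""
    if cli_value:
        return cli_value
    key_file = Path.home() / ".config" / "vastai" / "vast_api_key"
    if key_file.exists():
        return key_file.read_text().strip()
    return None  # no key available; the server would reject authenticated calls
```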