Use Cases

From real-time inference to large-scale training, Velar runs the full range of GPU workloads: LLM serving, image and video generation, model fine-tuning, and batch processing.

LLM Inference

Deploy large language models like Llama, Mistral, and GPT-J for real-time text generation. Velar handles GPU allocation, scaling, and load balancing so you can focus on building your application.

  • Auto-scale based on request volume
  • Support for vLLM, TGI, and custom serving stacks
  • Sub-second cold starts with image caching
import velar

app = velar.App("llm-serving")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("vllm")

@app.function(gpu="A100", image=image)
def chat(prompt: str, max_tokens: int = 512):
    from vllm import LLM, SamplingParams
    llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")
    params = SamplingParams(max_tokens=max_tokens)
    return llm.generate([prompt], params)[0].outputs[0].text
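
Once deployed, the function is callable from your own code. Below is a minimal sketch of driving it from a local entrypoint, using the same local_entrypoint and .remote() pattern shown in the batch processing example later on; exact invocation semantics depend on your Velar setup.

# Invoke the deployed function; the prompt runs on a remote A100
@app.local_entrypoint()
def main():
    reply = chat.remote("Summarize the plot of Hamlet in two sentences.")
    print(reply)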

Image & Video Generation

Run Stable Diffusion, SDXL, and video generation models at scale. Generate images in seconds with warm GPU pools and automatic batching for high-throughput workloads.

  • Optimized for diffusion model architectures
  • Batch processing for bulk generation
  • Persistent model caching across invocations
import velar

app = velar.App("image-gen")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("diffusers", "transformers", "accelerate")

@app.function(gpu="A10", image=image)
def generate_image(prompt: str, steps: int = 30):
    from diffusers import StableDiffusionXLPipeline
    import torch
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt, num_inference_steps=steps).images[0]
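
The returned image can be consumed locally in the same way. A minimal sketch, assuming the PIL image returned by the remote call is serialized back to the client (the output filename is illustrative):

# Generate on a remote A10 and save the returned PIL image locally
@app.local_entrypoint()
def main():
    img = generate_image.remote("a watercolor lighthouse at dawn, soft light")
    img.save("output.png")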

Model Fine-Tuning

Fine-tune foundation models on your own data with managed training jobs. Velar provisions multi-GPU clusters, handles checkpointing, and streams training metrics in real time.

  • Multi-GPU distributed training
  • Automatic checkpointing and resumption
  • Integration with W&B and MLflow
import velar

app = velar.App("fine-tune")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "peft", "datasets", "trl")

@app.function(gpu="A100", timeout=3600, image=image)
def train(dataset_path: str, base_model: str):
    from transformers import AutoModelForCausalLM, TrainingArguments
    from trl import SFTTrainer
    from datasets import load_dataset

    # Load the base model and the JSON training data inside the container
    model = AutoModelForCausalLM.from_pretrained(base_model)
    dataset = load_dataset("json", data_files=dataset_path)

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset["train"],
        args=TrainingArguments(output_dir="./output", num_train_epochs=3),
    )
    trainer.train()
    trainer.save_model("./output")  # persist the fine-tuned weights
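
One way to get the W&B and MLflow integration mentioned above is through the standard transformers reporting hook. A minimal sketch, assuming the relevant credentials (e.g. a WANDB_API_KEY) are available inside the container:

from transformers import TrainingArguments

# Stream loss and learning-rate metrics to Weights & Biases during training
args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    report_to="wandb",   # "mlflow" works the same way
    logging_steps=10,    # emit metrics every 10 optimizer steps
)

Passing these args to the SFTTrainer above enables metric streaming with no other changes to the training function.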

Batch Processing

Process large datasets with GPU-accelerated batch jobs. Velar automatically parallelizes work across multiple GPUs and handles retries, making it ideal for embeddings, transcription, and data pipelines.

  • Automatic parallelization across GPU fleet
  • Built-in retry logic and error handling
  • Progress tracking and cost estimation
import velar

app = velar.App("batch-embed")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("sentence-transformers")

@app.function(gpu="L4", image=image)
def embed_batch(texts: list[str]):
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-large-en-v1.5")
    return model.encode(texts, batch_size=64).tolist()

# Process 100k documents in parallel
@app.local_entrypoint()
def main():
    documents = load_documents()  # your data
    # One GPU call per chunk of 100 documents
    chunks = [documents[i:i+100] for i in range(0, len(documents), 100)]
    results = [embed_batch.remote(chunk) for chunk in chunks]
    # Flatten per-chunk results back into one embedding per document
    embeddings = [vec for batch in results for vec in batch]

Ready to build?

Start with $10 in free GPU credits. No credit card required.