Now in Public Beta

Run AI workloads on GPUs.
Up to 20% cheaper than Modal.

Deploy containers, train models, and run inference. Multi-provider GPU cloud with per-second billing. No infrastructure to manage.

inference.py

import velar

app = velar.App("my-inference")

image = velar.Image.from_registry(
    "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
).pip_install("transformers", "accelerate")

@app.function(gpu="A100", image=image)
def generate(prompt: str):
    from transformers import pipeline
    pipe = pipeline("text-generation",
                    model="meta-llama/Llama-2-7b")
    return pipe(prompt, max_length=200)

# Deploy & run
app.deploy()
result = generate.remote("What is machine learning?")

Everything you need to run GPU workloads

A complete platform for deploying, scaling, and monitoring AI infrastructure.

Multi-Provider GPUs

Access A100, A10, and L4 GPUs from RunPod and Vast.ai. Best price guaranteed.
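
For illustration, a minimal sketch of pinning a function to a different GPU class, reusing the app and image objects from the example above. The gpu parameter appears in that example; the provider keyword is an assumption about how provider pinning might look, not a confirmed part of the API.

# Hypothetical sketch: gpu= matches the hero example; provider= is an
# assumed keyword for pinning RunPod or Vast.ai, not confirmed API.
@app.function(gpu="A10", provider="runpod", image=image)
def summarize(text: str):
    from transformers import pipeline
    pipe = pipeline("summarization")
    return pipe(text, max_length=60)[0]["summary_text"]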

Per-Second Billing

Pay only for what you use. No minimums, no commitments.
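
To make that concrete: at the A100 rate in the pricing table below, $3.20/hr works out to about $0.00089 per second, so a 45-second inference call costs roughly 45 × $0.00089 ≈ $0.04.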

Python-Native

Define everything in code. No YAML, no Docker, no Kubernetes.

Sub-Minute Deploys

From code to running GPU in under 60 seconds.

Scale to Zero

Pay nothing when idle. Scale up instantly on demand.
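
A sketch of how scale-to-zero might be expressed on the decorator from the example above; min_containers and max_containers are illustrative parameter names, not confirmed API.

# Hypothetical sketch: min_containers / max_containers are assumed names.
# min_containers=0 releases every GPU when traffic stops (pay nothing);
# max_containers caps how far the app scales up on demand.
@app.function(gpu="L4", image=image, min_containers=0, max_containers=10)
def transcribe(audio_url: str):
    ...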

Built-in Monitoring

Real-time logs, cost tracking, and deployment status.
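
Nothing on this page documents a monitoring API, so the following is purely illustrative; velar.logs and its follow argument are invented names for the sketch.

# Purely illustrative: velar.logs() and follow= are assumed, not a
# documented API. The idea: stream real-time logs for a deployed app.
for line in velar.logs("my-inference", follow=True):
    print(line)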

Transparent GPU pricing

Simple per-second billing. No hidden fees, no long-term commitments.

GPU         VRAM    Velar      Modal      Savings
A100 80GB   80 GB   $3.20/hr   $4.00/hr   20% cheaper
A10 24GB    24 GB   $1.40/hr   $1.10/hr   -
L4 24GB     24 GB   $0.85/hr   $0.80/hr   -

Start building with $10 free credits

No credit card required. Deploy your first GPU workload in minutes.
