DeepInfra serverless LLM and embeddings
DeepInfra is the serverless inference platform for open-source ML models — Llama variants, Mixtral, Qwen, Stable Diffusion, embedding models, transcription, all behind an OpenAI-compatible API. Tiny Command exposes three actions, no triggers: Chat Completion (against text-generation models with the OpenAI-compatible message-array shape — pick from meta-llama/Llama-3.3-70B-Instruct, mistralai/Mixtral-8x22B-Instruct, Qwen/QwQ-32B, etc.), Create Embeddings (sentence embeddings for vector workflows — sentence-transformers and BGE models), List Models. The connection uses a DeepInfra API key from deepinfra.com. The "OpenAI-compatible" claim is real — the URL prefix changes (https://api.deepinfra.com/v1/openai/chat/completions) and the model parameter takes the DeepInfra model ID; the rest of the request matches OpenAI. DeepInfra's edge is open-weight model pricing — Llama and Qwen variants at a fraction of OpenAI's per-token cost.
No credit card required · Set up in under 2 minutes
Every action accepts dynamic inputs from upstream nodes, whether that's an AI output, a form field, or a search result.
| Action | What it does | Open action |
|---|---|---|
| DeepInfra Chat Completion | Runs chat completion against DeepInfra-hosted open-source models (Llama, Mixtral, Qwen, DeepSeek) using OpenAI-compatible message-array shape. Competitive pricing for OSS inference at scale. | |
| DeepInfra Embeddings | Generates embeddings from DeepInfra-hosted models (BGE, sentence-transformers). For RAG pipeline vector generation on a budget-friendly OSS inference provider. | |
| List DeepInfra Models | Returns the current DeepInfra model catalog with pricing per model. Useful for model-selection workflows and for per-model cost calculations. |
Clone any recipe and customize it in one click. Every recipe is fully editable.
Tiny Command counts a run the moment a trigger fires. Filtering early means only matching events spend your usage budget.
Connect DeepInfra once and every workflow on your account can use its triggers and actions. You don't have to re-auth per workflow.
Every DeepInfra field shows up in the visual picker for downstream nodes. The raw payload is there for power users, optional for everyone else.
If we missed yours, ping support. We usually reply within an hour.
Same category as DeepInfra, ordered by how often teams pair them. Hover the carousel to pause.
Wire it to Slack, Notion, HubSpot, Stripe, or any of the other 438 apps in our catalog. Setup takes roughly two minutes. Free to try, no credit card.