Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

featured

text-to-speech

text-generation

automatic-speech-recognition

embeddings

text-to-image

text-to-video

zero-shot-image-classification

multimodal

Replaced

CompVis/

stable-diffusion-v1-4

text-to-image

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

fp16

$0.065 / Mtoken

Gryphe/

MythoMax-L2-13b

text-generation

fp8

Replaced

Gryphe/

MythoMax-L2-13b-turbo

text-generation

Faster version of Gryphe/MythoMax-L2-13b running on multiple H100 cards in fp8 precision. Up to 160 tokens per second (tps).

fp16

Replaced

KoboldAI/

LLaMA2-13B-Tiefighter

text-generation

LLaMA2-13B-Tiefighter is a highly creative and versatile language model, fine-tuned for storytelling, adventure, and conversational dialogue. It combines the strengths of multiple models and datasets, including retro-rodeo and choose-your-own-adventure, to generate engaging and imaginative content. With its ability to improvise and adapt to different styles and formats, Tiefighter is perfect for writers, creators, and anyone looking to spark their imagination.

fp8

128k

$0.70/$0.80 in/out Mtoken

NousResearch/

Hermes-3-Llama-3.1-405B

text-generation

Hermes 3 is a cutting-edge language model that offers advanced capabilities in roleplaying, reasoning, and conversation. It's a fine-tuned version of the Llama-3.1 405B foundation model, designed to align with user needs and provide powerful control. Key features include reliable function calling, structured output, generalist assistant capabilities, and improved code generation. Hermes 3 is competitive with Llama-3.1 Instruct models, with its own strengths and weaknesses.

fp16

32k

$0.12/$0.18 in/out Mtoken

NovaSky-AI/

Sky-T1-32B-Preview

text-generation

Sky-T1-32B-Preview is a 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with the o1-preview model on both math and coding.

fp16

Replaced

Phind/

Phind-CodeLlama-34B-v2

text-generation

Phind-CodeLlama-34B-v2 is an open-source language model that has been fine-tuned on 1.5B tokens of high-quality programming-related data and achieves a pass@1 rate of 73.8% on HumanEval. It is multilingual and proficient in Python, C/C++, TypeScript, Java, and more. It was trained on a proprietary dataset of instruction-answer pairs rather than code completion examples. The model is instruction-tuned on the Alpaca/Vicuna format to be steerable and easy to use, and it generates one completion for each prompt.

bfloat16

31k

Replaced

Qwen/

QVQ-72B-Preview

text-generation

QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. It has performed strongly on various benchmarks, scoring 70.3% on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark.

bfloat16

32k

Replaced

Qwen/

QwQ-32B-Preview

text-generation

QwQ is an experimental research model developed by the Qwen Team, designed to advance AI reasoning capabilities. This model embodies the spirit of philosophical inquiry, approaching problems with genuine wonder and doubt. QwQ demonstrates impressive analytical abilities, achieving scores of 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench, reflecting its contemplative approach and strong performance on complex problems.

bfloat16

32k

Replaced

Qwen/

Qwen2-72B-Instruct

text-generation

The 72 billion parameter Qwen2 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

bfloat16

32k

Replaced

Qwen/

Qwen2-7B-Instruct

text-generation

The 7 billion parameter Qwen2 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

fp8

32k

$0.12/$0.39 in/out Mtoken

Qwen/

Qwen2.5-72B-Instruct

text-generation

Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor Qwen2. The model also features enhanced capabilities in generating long texts, understanding structured data, and generating structured outputs, while supporting multilingual capabilities for over 29 languages.

bfloat16

32k

$0.04/$0.10 in/out Mtoken

Qwen/

Qwen2.5-7B-Instruct

text-generation

The 7 billion parameter Qwen2.5 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

fp8

32k

$0.06/$0.15 in/out Mtoken

Qwen/

Qwen2.5-Coder-32B-Instruct

text-generation

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It brings significant improvements in code generation, code reasoning, and code fixing, and provides a more comprehensive foundation for real-world applications such as Code Agents. It enhances coding capabilities while maintaining its strengths in mathematics and general competencies.

32k

Replaced

Qwen/

Qwen2.5-Coder-7B

text-generation

Qwen2.5-Coder-7B is a powerful code-specific large language model with 7.61 billion parameters. It's designed for code generation, reasoning, and fixing tasks. The model covers 92 programming languages and has been trained on 5.5 trillion tokens of data, including source code, text-code grounding, and synthetic data.

$0.002 / Mtoken

Qwen/

Qwen3-Embedding-0.6B

embeddings

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).

$0.005 / Mtoken

Qwen/

Qwen3-Embedding-4B

embeddings

$0.010 / Mtoken

Qwen/

Qwen3-Embedding-8B

embeddings

Latest Models

bigcode/

starcoder2-15b

openchat/

openchat_3.5

openai/

whisper-tiny

Phind/

Phind-CodeLlama-34B-v2

Gryphe/

MythoMax-L2-13b

Featured Models

microsoft/

phi-4-reasoning-plus

google/

gemma-3-27b-it

sesame/

csm-1b

mistralai/

Mistral-Small-3.1-24B-Instruct-2503

hexgrad/

Kokoro-82M

openai/

whisper-large-v3-turbo
