mistral.rs

Fast, flexible LLM inference.

| Documentation | Rust SDK | Python SDK | Discord |


Why mistral.rs?

Quick Start

Install

Linux/macOS:

curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh

Windows (PowerShell):

irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex

Manual installation & other platforms

Run Your First Model

# Interactive chat
mistralrs run -m Qwen/Qwen3-4B

# One-shot prompt (no interactive session)
mistralrs run -m Qwen/Qwen3-4B -i "What is the capital of France?"

# One-shot with an image
mistralrs run -m google/gemma-3n-E4B-it --image photo.jpg -i "Describe this image"

# Or start a server with web UI
mistralrs serve --ui -m google/gemma-3n-E4B-it

Then visit http://localhost:1234/ui for the web chat interface.
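The server also speaks an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it. A minimal sketch using only the Python standard library (the `/v1/chat/completions` path follows the OpenAI convention; the port assumes the default `mistralrs serve` setup shown above):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # default port from `mistralrs serve`

def build_chat_request(prompt: str, max_tokens: int = 256) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-style chat completion."""
    payload = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(payload).encode()

def chat(prompt: str) -> str:
    """POST the request and pull the assistant's reply out of the response."""
    url, body = build_chat_request(prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# e.g. print(chat("What is the capital of France?"))  # requires a running server
```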

The mistralrs CLI

The CLI is designed to be zero-config: just point it at a model and go.

# Auto-tune for your hardware and emit a config file
mistralrs tune -m Qwen/Qwen3-4B --emit-config config.toml

# Run using the generated config
mistralrs from-config -f config.toml

# Diagnose system issues (CUDA, Metal, HuggingFace connectivity)
mistralrs doctor

Full CLI documentation

Web Chat Demo

(Demo video: the web chat UI in action.)

What Makes It Fast

Performance

Quantization (full docs)

Flexibility

Agentic Features

Full feature documentation

Supported Models

Text Models
Multimodal Models
Speech Models
Image Generation Models
Embedding Models

Request a new model | Full compatibility tables

Python SDK

pip install mistralrs  # or mistralrs-cuda, mistralrs-metal, mistralrs-mkl, mistralrs-accelerate
from mistralrs import Runner, Which, ChatCompletionRequest

runner = Runner(
    which=Which.Plain(model_id="Qwen/Qwen3-4B"),
    in_situ_quant="4",
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=256,
    )
)
print(res.choices[0].message.content)

Python SDK | Installation | Examples | Cookbook

Rust SDK

cargo add mistralrs
use anyhow::Result;
use mistralrs::{IsqType, TextMessageRole, TextMessages, MultimodalModelBuilder};

#[tokio::main]
async fn main() -> Result<()> {
    let model = MultimodalModelBuilder::new("google/gemma-3n-E4B-it")
        .with_isq(IsqType::Q4K)
        .with_logging()
        .build()
        .await?;

    let messages = TextMessages::new().add_message(
        TextMessageRole::User,
        "Hello!",
    );

    let response = model.send_chat_request(messages).await?;

    println!("{:?}", response.choices[0].message.content);

    Ok(())
}

API Docs | Crate | Examples

Docker

For quick containerized deployment:

docker pull ghcr.io/ericlbuehler/mistral.rs:latest
docker run --gpus all -p 1234:1234 ghcr.io/ericlbuehler/mistral.rs:latest \
  serve -m Qwen/Qwen3-4B
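The same container can be managed with Docker Compose; a sketch assuming NVIDIA GPUs and Compose's standard device-reservation syntax (image, command, and port taken from the `docker run` line above):

```yaml
services:
  mistralrs:
    image: ghcr.io/ericlbuehler/mistral.rs:latest
    command: serve -m Qwen/Qwen3-4B
    ports:
      - "1234:1234"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```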

Docker images

For production use, we recommend installing the CLI directly for maximum flexibility.

Documentation

For complete documentation, see the Documentation link at the top of this README.

Contributing

Contributions are welcome! Please open an issue to discuss new features or report bugs. If you would like to add a new model, open an issue first so we can coordinate.

Credits

This project would not be possible without the excellent work at Candle. Thank you to all contributors!

mistral.rs is not affiliated with Mistral AI.
