Models
GET /v1/models
Lists all available models. Returns model IDs and their capabilities.
Chat models
Text generation and conversation. Use with /v1/chat/completions.
qwen
Qwen 3.6 35B (MoE, 3B active). Fast interactive model for conversations, analysis, and text generation.
- ~85 tok/s
- 128K context
- A40 GPU
Recommended for most use cases.
gemma
Gemma 4 31B (Dense). Deep analysis model with strong reasoning. Best for complex tasks.
- ~85 tok/s
- 128K context
- Blackwell GPU
Embedding models
Convert text to vectors for search, similarity, and RAG. Use with /v1/embeddings.
bge-m3
BGE-M3 multilingual embeddings. State-of-the-art for retrieval, supports 100+ languages including Dutch.
- 1024 dimensions
- 8K tokens max
Image generation
Generate images from text. Use with /v1/images/generations.
flux-schnell
FLUX Schnell. Fast generation (~2s per image) for rapid iteration and prototyping.
- 1024×1024
- ~2s
flux-dev
FLUX Dev. Higher quality output with more detail and better prompt adherence.
- 1024×1024
- ~8s
Speech to text
Transcription with speaker diarization. Use with /v1/audio/diarize.
whisperx
WhisperX with speaker diarization. Transcribes audio and identifies who said what. Supports Dutch and 90+ other languages.
- MP3, WAV, FLAC
- Speaker labels
Model aliases
Use short aliases or full model names interchangeably.
| Alias | Model |
|---|---|
qwen |
Qwen/Qwen3.6-35B-A3B-FP8 |
qwen-fast |
Qwen/Qwen3.6-35B-A3B-FP8 |
gemma |
RedHatAI/gemma-4-31B-it-FP8-block |
gemma-4 |
RedHatAI/gemma-4-31B-it-FP8-block |
flux-schnell |
schnell |
flux-dev |
dev |
diarize |
whisperx |
Need a different model? We can deploy additional models on request. Contact us at support@appelon.ai