# Appelon API Documentation

> Base URL: https://router.appelon.ai/v1
> OpenAI-compatible. Use any OpenAI SDK with a custom base_url.

---


# Quickstart

> **OpenAI-compatible API.** If you already have an OpenAI client, change the base URL to `https://router.appelon.ai/v1`, swap in your Appelon API key and a model name, and you're done.

## Create an API key

Sign up at [appelon.ai/signup](/signup) and create an API key in your dashboard. Store it as an environment variable. Never commit it to version control.

```bash
export APPELON_API_KEY="sk-your-api-key"
```

## Make your first request

Call the chat completions endpoint. The example below uses Qwen 3.6 running on GPUs in Groningen.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["APPELON_API_KEY"],
    base_url="https://router.appelon.ai/v1"
)

response = client.chat.completions.create(
    model="qwen",
    messages=[{"role": "user", "content": "Hallo!"}]
)

print(response.choices[0].message.content)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.APPELON_API_KEY,
  baseURL: 'https://router.appelon.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'qwen',
  messages: [{ role: 'user', content: 'Hallo!' }],
});

console.log(response.choices[0].message.content);
```

## Read the response

Responses follow the OpenAI chat completion schema. The model's output is in `choices[0].message.content`.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "Qwen/Qwen3.6-35B-A3B-FP8",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hallo! Hoe kan ik je helpen?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
```

## Using with AI coding assistants

For full API documentation in a single file: [appelon.ai/llms.txt](/llms.txt)

This works with Claude Code, Cursor, Copilot, and other AI coding tools.


---


# Authentication

## API keys

Your API key authenticates all requests. Include it in the `Authorization` header as a Bearer token.

```
Authorization: Bearer sk-your-api-key
```
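
For clients that don't use an SDK, the header can be attached by hand. The sketch below builds (but does not send) a request with Python's standard library. The helper name `build_request` is hypothetical; the `/models` path is the one listed in the Models section.

```python
import os
import urllib.request

BASE_URL = "https://router.appelon.ai/v1"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a GET request for `path` with the Bearer token attached."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Falls back to the placeholder key so the sketch runs without configuration.
api_key = os.environ.get("APPELON_API_KEY", "sk-your-api-key")
request = build_request("/models", api_key)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it.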

> **Keep your key secret.** Never commit API keys to version control or expose them in client-side code. Use environment variables instead.

## Using with SDKs

Appelon's API is OpenAI-compatible. Use the official OpenAI SDK with a custom base URL.

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://router.appelon.ai/v1"
)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://router.appelon.ai/v1',
});
```

## Environment variables

Store your key in an environment variable for security.

```bash
# Add to .bashrc, .zshrc, or .env
export APPELON_API_KEY="sk-your-api-key"
```

With the environment variable set, initialize the client without hardcoding the key:

```python
import os
from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
client = OpenAI(
    api_key=os.environ["APPELON_API_KEY"],
    base_url="https://router.appelon.ai/v1"
)
```

## Authentication errors

If authentication fails, you'll receive a `401` response:

| Code | Description |
|------|-------------|
| `401` | Missing, invalid, or revoked API key. Check the `Authorization` header. |
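
A script that calls the API over plain HTTP might translate this status into an actionable message. This is only a sketch: the helper `explain_http_error` is hypothetical, and it looks at the status code alone because the shape of the error body is not specified here.

```python
import urllib.error


def explain_http_error(error: urllib.error.HTTPError) -> str:
    """Turn an HTTP error into a short, actionable message."""
    if error.code == 401:
        return "401: missing, invalid, or revoked API key. Check the Authorization header."
    return f"{error.code}: request failed ({error.reason})"


# Simulated 401 so the sketch runs without a network call.
simulated = urllib.error.HTTPError(
    url="https://router.appelon.ai/v1/models",
    code=401,
    msg="Unauthorized",
    hdrs=None,
    fp=None,
)
print(explain_http_error(simulated))
```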


---


# Data residency

## Where your data is processed

All inference requests are processed in our Groningen datacenter. The request and response flow is entirely within the Netherlands.

| Component | Location |
|-----------|----------|
| API Gateway | Groningen, Netherlands |
| GPU Compute | Groningen, Netherlands (NVIDIA A40, Blackwell) |
| Model weights | Groningen, Netherlands |
| Usage logs | Groningen, Netherlands |

## What we store

We log usage metadata for billing and debugging. Your prompts and responses are not stored by default.

**Stored:**
- Timestamp, model used, token counts, account ID, latency
- Used for billing and service monitoring

**Not stored:**
- Prompt content, model responses, user messages
- Your data passes through and is not retained

## Compliance

With all processing in the Netherlands, Appelon simplifies compliance with European data protection requirements.

- **GDPR:** No data transfers outside the EU. No need for SCCs or adequacy decisions.
- **Dutch law:** Processing falls under Dutch and EU jurisdiction only.
- **No US CLOUD Act exposure:** Infrastructure is not operated by US hyperscalers.

## For your DPO

Need documentation for your data protection assessment? We can provide:

- Data processing agreement (DPA)
- Technical and organizational measures (TOMs)
- Subprocessor list
- Infrastructure documentation

Contact us at [privacy@appelon.ai](mailto:privacy@appelon.ai) for compliance documentation.


---


# Chat completions

```
POST /v1/chat/completions
```

## Request body

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Model to use. Try `qwen` or `gemma`. **Required** |
| `messages` | array | Array of message objects with `role` and `content`. **Required** |
| `stream` | boolean | Stream partial responses as server-sent events. Default `false`. |
| `temperature` | number | Sampling temperature, 0-2. Lower = more deterministic. Default `1`. |
| `max_tokens` | integer | Maximum tokens to generate. Model decides if not set. |
| `top_p` | number | Nucleus sampling threshold, 0-1. Default `1`. |
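
The parameters above can be assembled and range-checked before a request is sent. The following is a minimal sketch, not part of any SDK: `build_chat_payload` is a hypothetical helper, and the client-side checks are a convenience only, since the server remains the authority on what it accepts.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str,
    messages: list[dict[str, str]],
    *,
    stream: bool = False,
    temperature: float = 1.0,
    top_p: float = 1.0,
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a request body, rejecting values outside the documented ranges."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    payload: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "temperature": temperature,
        "top_p": top_p,
    }
    # Leave max_tokens out entirely so the model decides the length.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


payload = build_chat_payload(
    "qwen",
    [{"role": "user", "content": "Hallo!"}],
    temperature=0.2,
    max_tokens=64,
)
print(payload)
```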

## Message roles

Each message in the array has a `role` that determines how it's treated.

| Role | Description |
|------|-------------|
| `system` | Sets context for the conversation. Place it first in the `messages` array. |
| `user` | Messages from the user. The model generates a response to this. |
| `assistant` | Previous model responses. Use for multi-turn conversations. |
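
For multi-turn conversations, the three roles combine into a running history that is sent in full with each request. A small sketch of that bookkeeping follows; the helper names `start_conversation` and `add_turn` are hypothetical.

```python
def start_conversation(system_prompt: str) -> list[dict[str, str]]:
    """Begin a history with the system message in first position."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(history: list[dict[str, str]], user_text: str, assistant_text: str) -> None:
    """Record one completed exchange: the user's message and the model's reply."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


history = start_conversation("You are a helpful assistant.")
add_turn(
    history,
    "What is the capital of the Netherlands?",
    "The capital of the Netherlands is Amsterdam.",
)
# The next user message goes last; the model responds to it with the earlier turns as context.
history.append({"role": "user", "content": "How many people live there?"})

print([message["role"] for message in history])
```

The resulting list can be passed as `messages` in the next request.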

## Example request

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.appelon.ai/v1",
    api_key=os.environ["APPELON_API_KEY"]
)

response = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of the Netherlands?"}
    ]
)

print(response.choices[0].message.content)
# → "The capital of the Netherlands is Amsterdam."
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://router.appelon.ai/v1',
  apiKey: process.env.APPELON_API_KEY,
});

const response = await client.chat.completions.create({
  model: 'qwen',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of the Netherlands?' },
  ],
});

console.log(response.choices[0].message.content);
// → "The capital of the Netherlands is Amsterdam."
```

## Response

Returns a completion object with the generated message.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "Qwen/Qwen3.6-35B-A3B-FP8",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of the Netherlands is Amsterdam."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 12,
    "total_tokens": 40
  }
}
```
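
When working with the raw JSON rather than an SDK object, the fields of interest can be pulled out as sketched below. The sample is a trimmed copy of the response above, and `summarize` is a hypothetical helper.

```python
import json

SAMPLE = """
{
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "The capital of the Netherlands is Amsterdam."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 28, "completion_tokens": 12, "total_tokens": 40}
}
"""


def summarize(completion: dict) -> dict:
    """Pull out the reply text, the finish reason, and the total token count."""
    choice = completion["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": completion["usage"]["total_tokens"],
    }


summary = summarize(json.loads(SAMPLE))
print(summary["text"])
print(f"{summary['total_tokens']} tokens, finished with {summary['finish_reason']!r}")
```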

## Streaming

Set `stream: true` to receive tokens as they're generated.

```python
stream = client.chat.completions.create(
    model="qwen",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

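for chunk in stream:
    # Some chunks carry no text (for example, a role-only first chunk); skip them.
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)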
```
```javascript
const stream = await client.chat.completions.create({
  model: 'qwen',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
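
Without an SDK, the stream has to be parsed by hand. The sketch below assumes the wire format follows OpenAI's convention: each event is a `data: {json}` line and the stream ends with `data: [DONE]`. Neither detail is confirmed in this documentation, and `iter_deltas` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events; real chunks carry more fields.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Er was "}}]}',
    'data: {"choices": [{"delta": {"content": "eens..."}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample_stream)))
```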


---


# Embeddings

```
POST /v1/embeddings
```

## Request body

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Embedding model. Use `bge-m3`. **Required** |
| `input` | string or array | Text to embed. String or array of strings. **Required** |
| `encoding_format` | string | Output format: `float` (default) or `base64`. |
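
With `encoding_format` set to `base64`, the embedding arrives as a string that must be decoded. The sketch below assumes the payload is a packed array of little-endian 32-bit floats, which is the OpenAI convention; this documentation does not confirm the layout, and `decode_embedding` is a hypothetical helper.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string into floats (assumed little-endian float32)."""
    raw = base64.b64decode(b64)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip a tiny stand-in vector; a real bge-m3 vector has 1024 values.
sample = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
print(decode_embedding(sample))
```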

## Example request

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.appelon.ai/v1",
    api_key=os.environ["APPELON_API_KEY"]
)

response = client.embeddings.create(
    model="bge-m3",
    input="This is a sample text to embed."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
# → Dimensions: 1024
```

## Response

Returns an array of embedding objects, one for each input text.

```json
{
  "object": "list",
  "model": "bge-m3",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [-0.023, 0.017, 0.042, ...]
  }],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

## Batch embeddings

Embed multiple texts in a single request for better efficiency.

```python
response = client.embeddings.create(
    model="bge-m3",
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed"
    ]
)

# Returns 3 embeddings in response.data
```
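
For large collections, one option is to split the texts into fixed-size batches and send one request per batch. The sketch below shows only the splitting step. The `batched` helper is hypothetical, and the batch size is an arbitrary choice, since no request limit is documented here.

```python
from typing import Iterator, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


documents = [f"Document {i}" for i in range(5)]
for batch in batched(documents, batch_size=2):
    print(len(batch), batch)
```

Each `batch` can be passed as `input` to `client.embeddings.create`.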

## Common use cases

- **Semantic search:** Embed documents and queries, then find similar documents using cosine similarity.
- **RAG:** Retrieve relevant context before generating responses with chat models.
- **Clustering & classification:** Group similar documents or classify content based on embedding similarity.
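
The semantic search case can be sketched in a few lines of plain Python. Toy three-dimensional vectors stand in for real 1024-dimensional embeddings here, and the helpers `cosine_similarity` and `rank` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[float], documents: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vector)) for name, vector in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors; in practice these come from the embeddings endpoint.
documents = {
    "windmills": [1.0, 0.0, 0.0],
    "tulips": [0.6, 0.8, 0.0],
    "tax law": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

for name, score in rank(query, documents):
    print(f"{name}: {score:.2f}")
```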

> **BGE-M3** produces 1024-dimensional vectors optimized for multilingual retrieval. It supports 100+ languages including Dutch, English, German, and French.


---


# Image generation

```
POST /v1/images/generations
```

## Request body

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Image model. Use `schnell` (fast) or `dev` (quality). **Required** |
| `prompt` | string | Text description of the image to generate. **Required** |
| `size` | string | Image dimensions. Default `1024x1024`. |
| `n` | integer | Number of images to generate. Default `1`. |
| `response_format` | string | `url` (default) or `b64_json` for base64-encoded image data. |

## Example request

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.appelon.ai/v1",
    api_key=os.environ["APPELON_API_KEY"]
)

response = client.images.generate(
    model="schnell",
    prompt="A serene Dutch landscape with windmills at sunset",
    size="1024x1024"
)

image_url = response.data[0].url
print(image_url)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://router.appelon.ai/v1',
  apiKey: process.env.APPELON_API_KEY,
});

const response = await client.images.generate({
  model: 'schnell',
  prompt: 'A serene Dutch landscape with windmills at sunset',
  size: '1024x1024',
});

console.log(response.data[0].url);
```

## Response

Returns an array of image objects with URLs or base64 data.

```json
{
  "created": 1234567890,
  "data": [{
    "url": "https://..."
  }]
}
```

With `response_format: "b64_json"`:

```json
{
  "created": 1234567890,
  "data": [{
    "b64_json": "iVBORw0KGgoAAAANSUhEUgAA..."
  }]
}
```
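
Base64 image data has to be decoded before it can be written to disk. A minimal sketch follows; `save_image` is a hypothetical helper. The sample payload contains only the PNG file signature (the `iVBORw0KGgo` prefix shown above decodes to it), not a complete image, and PNG output is an assumption based on that prefix.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_image(b64_json: str, path: Path) -> int:
    """Decode base64 image data, write it to `path`, and return the byte count."""
    data = base64.b64decode(b64_json)
    path.write_bytes(data)
    return len(data)


# Stand-in payload: just the PNG signature, not a viewable image.
sample = base64.b64encode(PNG_SIGNATURE).decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "landscape.png"
    written = save_image(sample, target)
    looks_like_png = target.read_bytes().startswith(PNG_SIGNATURE)

print(written, looks_like_png)
```

With a real response, `response.data[0].b64_json` would take the place of `sample`.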

## Available models

| Model | Speed | Best for |
|-------|-------|----------|
| `schnell` | ~2s | Rapid iteration, prototyping |
| `dev` | ~8s | Higher quality, better prompt adherence |

## Supported sizes

FLUX supports flexible aspect ratios. Common sizes:

| Size | Aspect ratio |
|------|--------------|
| `1024x1024` | 1:1 (square) |
| `1024x768` | 4:3 (landscape) |
| `768x1024` | 3:4 (portrait) |
| `1280x720` | 16:9 (widescreen) |
| `720x1280` | 9:16 (mobile) |
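
The aspect ratio of any `size` string can be derived by dividing both dimensions by their greatest common divisor. The sketch below reproduces the table above; the helpers `parse_size` and `aspect_ratio` are hypothetical.

```python
import math


def parse_size(size: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integers."""
    width, height = size.lower().split("x")
    return int(width), int(height)


def aspect_ratio(size: str) -> str:
    """Reduce width:height to lowest terms."""
    width, height = parse_size(size)
    divisor = math.gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


for size in ["1024x1024", "1024x768", "768x1024", "1280x720", "720x1280"]:
    print(size, aspect_ratio(size))
```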


---


# Speech to text

> **Speaker diarization:** This endpoint transcribes audio and identifies who said what. It is also the endpoint to use for plain transcription; ignore the `speaker` field if you don't need speaker labels.

```
POST /v1/audio/diarize
```

## Request body

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Transcription model. Use `whisperx`. **Required** |
| `file` | file | Audio file to transcribe. See Supported formats below. **Required** |
| `language` | string | Language code (e.g., `nl`, `en`). Auto-detected if not specified. |

## Example request

The diarization endpoint is not part of the OpenAI SDK, so use a direct HTTP request.

```python
import os
import requests

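# Open the file in a context manager so the handle is closed after the upload.
with open("interview.mp3", "rb") as audio:
    response = requests.post(
        "https://router.appelon.ai/v1/audio/diarize",
        headers={"Authorization": f"Bearer {os.environ['APPELON_API_KEY']}"},
        files={"file": audio},
        data={"model": "whisperx", "language": "nl"},
    )

response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(result["text"])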
```
```bash
curl -X POST "https://router.appelon.ai/v1/audio/diarize" \
  -H "Authorization: Bearer $APPELON_API_KEY" \
  -F "file=@interview.mp3" \
  -F "model=whisperx" \
  -F "language=nl"
```

## Response with speaker labels

WhisperX provides speaker diarization: it identifies different speakers in the audio.

```json
{
  "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Welkom bij dit interview.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 2.8,
      "end": 5.1,
      "text": "Dank je wel voor de uitnodiging.",
      "speaker": "SPEAKER_01"
    }
  ]
}
```
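
The `segments` array lends itself to simple post-processing. The sketch below formats a speaker-labeled transcript and totals speaking time per speaker, using the sample response above; `format_transcript` and `speaking_time` are hypothetical helpers.

```python
segments = [
    {"start": 0.0, "end": 2.5, "text": "Welkom bij dit interview.", "speaker": "SPEAKER_00"},
    {"start": 2.8, "end": 5.1, "text": "Dank je wel voor de uitnodiging.", "speaker": "SPEAKER_01"},
]


def format_transcript(segments: list[dict]) -> list[str]:
    """One line per segment: time range, speaker label, text."""
    return [
        f"[{s['start']:.1f}-{s['end']:.1f}] {s['speaker']}: {s['text']}"
        for s in segments
    ]


def speaking_time(segments: list[dict]) -> dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s["speaker"]] = totals.get(s["speaker"], 0.0) + (s["end"] - s["start"])
    return totals


for line in format_transcript(segments):
    print(line)
for speaker, seconds in speaking_time(segments).items():
    print(f"{speaker}: {seconds:.1f}s")
```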

## Supported formats

- MP3
- WAV
- FLAC
- M4A
- OGG

## Language support

WhisperX supports 90+ languages including Dutch, English, German, French, Spanish, and more. Language is auto-detected, but specifying it improves accuracy.


---


# Models

```
GET /v1/models
```

Lists all available models. Returns model IDs and their capabilities.

## Chat models

Text generation and conversation. Use with `/v1/chat/completions`.

### qwen

Qwen 3.6 35B (MoE, 3B active). Fast interactive model for conversations, analysis, and text generation.

- ~85 tok/s
- 128K context
- A40 GPU

**Recommended** for most use cases.

### gemma

Gemma 4 31B (Dense). Deep analysis model with strong reasoning. Best for complex tasks.

- ~85 tok/s
- 128K context
- Blackwell GPU

## Embedding models

Convert text to vectors for search, similarity, and RAG. Use with `/v1/embeddings`.

### bge-m3

BGE-M3 multilingual embeddings. State-of-the-art for retrieval, supports 100+ languages including Dutch.

- 1024 dimensions
- 8K tokens max

## Image generation

Generate images from text. Use with `/v1/images/generations`.

### flux-schnell

FLUX Schnell. Fast generation (~2s per image) for rapid iteration and prototyping.

- 1024×1024
- ~2s

### flux-dev

FLUX Dev. Higher quality output with more detail and better prompt adherence.

- 1024×1024
- ~8s

## Speech to text

Transcription with speaker diarization. Use with `/v1/audio/diarize`.

### whisperx

WhisperX with speaker diarization. Transcribes audio and identifies who said what. Supports Dutch and 90+ other languages.

- MP3, WAV, FLAC, M4A, OGG
- Speaker labels

## Model aliases

Use short aliases or full model names interchangeably.

| Alias | Model |
|-------|-------|
| `qwen` | Qwen/Qwen3.6-35B-A3B-FP8 |
| `qwen-fast` | Qwen/Qwen3.6-35B-A3B-FP8 |
| `gemma` | RedHatAI/gemma-4-31B-it-FP8-block |
| `gemma-4` | RedHatAI/gemma-4-31B-it-FP8-block |
| `flux-schnell` | schnell |
| `flux-dev` | dev |
| `diarize` | whisperx |
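
The API accepts either form, so no client-side lookup is required. If you want logs or dashboards to show one canonical name, the table can be mirrored locally, as in this sketch. `resolve_model` is a hypothetical helper, and the mapping is copied from the table above.

```python
ALIASES = {
    "qwen": "Qwen/Qwen3.6-35B-A3B-FP8",
    "qwen-fast": "Qwen/Qwen3.6-35B-A3B-FP8",
    "gemma": "RedHatAI/gemma-4-31B-it-FP8-block",
    "gemma-4": "RedHatAI/gemma-4-31B-it-FP8-block",
    "flux-schnell": "schnell",
    "flux-dev": "dev",
    "diarize": "whisperx",
}


def resolve_model(name: str) -> str:
    """Map an alias to its model name; names that are not aliases pass through."""
    return ALIASES.get(name, name)


print(resolve_model("qwen-fast"))
print(resolve_model("bge-m3"))
```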

> **Need a different model?** We can deploy additional models on request. Contact us at [support@appelon.ai](mailto:support@appelon.ai)


---


# Migrating from OpenAI

> Appelon's API is OpenAI-compatible. Your existing SDKs and tools keep working: change the base URL and API key, then switch to Appelon model names. Features listed under "Not yet supported" are the exception.

## The one-line change

Add `base_url` to your OpenAI client initialization.

### Python

```python
# Before (OpenAI)
client = OpenAI(
    api_key="sk-openai-key"
)

# After (Appelon)
client = OpenAI(
    api_key="sk-appelon-key",
    base_url="https://router.appelon.ai/v1"  # ← add this
)
```

### Node.js

```javascript
// Before (OpenAI)
const client = new OpenAI({
  apiKey: 'sk-openai-key',
});

// After (Appelon)
const client = new OpenAI({
  apiKey: 'sk-appelon-key',
  baseURL: 'https://router.appelon.ai/v1',  // ← add this
});
```

## Model mapping

Update model names to use Appelon's models.

| OpenAI model | Appelon model | Use for |
|--------------|---------------|---------|
| gpt-4o | `qwen` | General chat, fast |
| gpt-4-turbo | `gemma` | Complex reasoning |
| text-embedding-3-small | `bge-m3` | Embeddings, search |
| dall-e-3 | `flux-schnell` | Image generation |
| whisper-1 | `whisperx` | Speech to text |
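
In a codebase with OpenAI model names scattered across call sites, one way to migrate gradually is a lookup that translates names at the point of the request. The sketch below mirrors the table above; `to_appelon_model` is a hypothetical helper, and raising on unknown names is a design choice rather than API behavior.

```python
MODEL_MAP = {
    "gpt-4o": "qwen",
    "gpt-4-turbo": "gemma",
    "text-embedding-3-small": "bge-m3",
    "dall-e-3": "flux-schnell",
    "whisper-1": "whisperx",
}


def to_appelon_model(openai_model: str) -> str:
    """Translate an OpenAI model name, failing loudly on names without a listed match."""
    try:
        return MODEL_MAP[openai_model]
    except KeyError:
        raise ValueError(f"No Appelon equivalent listed for {openai_model!r}") from None


print(to_appelon_model("gpt-4o"))
```

Failing on unknown names, rather than passing them through, keeps an unmapped model from reaching the API unnoticed.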

## Environment variables

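Set environment variables to switch providers without touching client initialization code. Model names in your requests still need updating (see Model mapping).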

```bash
# Add to .env or shell profile
export OPENAI_API_KEY="sk-appelon-key"
export OPENAI_BASE_URL="https://router.appelon.ai/v1"
```

With these variables set, code that uses `OpenAI()` without arguments will automatically use Appelon.

## What works

- **Chat completions:** streaming, system prompts, multi-turn
- **Embeddings:** single and batch
- **Image generation:** FLUX models via /v1/images/generations
- **Transcription:** WhisperX with speaker diarization
- **Models endpoint:** /v1/models lists available models

## Not yet supported

These OpenAI features are not available yet:

- Function calling / tool use
- Vision (image input)
- Assistants API
- Fine-tuning

## Using with LangChain

LangChain's OpenAI integration works out of the box.

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="qwen",
    openai_api_key="sk-appelon-key",
    openai_api_base="https://router.appelon.ai/v1"
)
```

> **Need help migrating?** Contact us at [support@appelon.ai](mailto:support@appelon.ai) and we'll help you switch.


---