Chat completions
POST /v1/chat/completions
Request body
| Parameter | Type | Description |
|---|---|---|
| model | string | Model to use, for example qwen or gemma. Required. |
| messages | array | Array of message objects, each with a role and content. Required. |
| stream | boolean | Stream partial responses as server-sent events. Default false. |
| temperature | number | Sampling temperature, from 0 to 2. Lower values give more deterministic output. Default 1. |
| max_tokens | integer | Maximum number of tokens to generate. If not set, the model decides when to stop. |
| top_p | number | Nucleus sampling threshold, from 0 to 1. Default 1. |
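The sketch below shows one way to assemble a request body from these parameters. `build_chat_request` is a hypothetical helper, not part of any SDK. Its range checks mirror the table (temperature 0 to 2, top_p 0 to 1), and the positive-integer check on max_tokens is an assumption. Optional parameters left as `None` are omitted so that the server applies its defaults.

```python
import json


def build_chat_request(model, messages, stream=False, temperature=None,
                       max_tokens=None, top_p=None):
    """Build a JSON body for POST /v1/chat/completions (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    # Ranges taken from the parameter table; the server may validate differently
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    # Assumption: max_tokens must be a positive integer
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        raise ValueError("max_tokens must be a positive integer")

    body = {"model": model, "messages": messages, "stream": stream}
    # Include only the optional settings that were explicitly given
    for key, value in (("temperature", temperature),
                       ("max_tokens", max_tokens),
                       ("top_p", top_p)):
        if value is not None:
            body[key] = value
    return body


body = build_chat_request(
    "qwen",
    [{"role": "user", "content": "Summarize the water cycle."}],
    temperature=0.2,
    max_tokens=200,
)
print(json.dumps(body))
```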
Message roles
Each message in the array has a role that determines how it’s treated.
| Role | Description |
|---|---|
| system | Sets context for the conversation. Place it first in the messages array. |
| user | Messages from the user. The model generates a response to these. |
| assistant | Previous model responses. Include them for multi-turn conversations. |
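For a multi-turn conversation, the earlier exchange is sent back with each request, with previous model replies as assistant messages. This is a minimal sketch; `check_messages` is a hypothetical helper that only enforces the rules in the table above (known roles, system message first), and the follow-up question is illustrative.

```python
VALID_ROLES = {"system", "user", "assistant"}


def check_messages(messages):
    """Validate a messages array against the role rules (hypothetical helper)."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"message {i}: unknown role {role!r}")
        if "content" not in msg:
            raise ValueError(f"message {i}: missing content")
        # A system message, if present, goes at the start of the array
        if role == "system" and i != 0:
            raise ValueError(f"message {i}: system message must come first")
    return messages


# Earlier model replies are sent back as "assistant" messages so the
# model sees the whole exchange when answering the newest user message.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of the Netherlands?"},
    {"role": "assistant", "content": "The capital of the Netherlands is Amsterdam."},
    {"role": "user", "content": "What is its population?"},
]
check_messages(history)
```

Passing `history` as `messages` in a request would then ask the model to answer the last question with the earlier turns as context.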
Example request
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://router.appelon.ai/v1",
    api_key=os.environ["APPELON_API_KEY"],
)
response = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of the Netherlands?"},
    ],
)
print(response.choices[0].message.content)
# → "The capital of the Netherlands is Amsterdam."
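The OpenAI client is not strictly required: the same call can be made as a plain HTTPS POST to the endpoint above. This standard-library sketch assumes the router accepts the same `Authorization: Bearer` header that the OpenAI SDK sends for `api_key`; `make_request` and `send` are hypothetical names. The request is only built here; nothing goes over the network until `send` is called.

```python
import json
import os
import urllib.request

BASE_URL = "https://router.appelon.ai/v1"


def make_request(api_key, body):
    """Build (but do not send) a POST /v1/chat/completions request."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: Bearer-token auth, as sent by OpenAI-compatible clients
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON completion object."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


req = make_request(
    os.environ.get("APPELON_API_KEY", ""),
    {
        "model": "qwen",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of the Netherlands?"},
        ],
    },
)
# completion = send(req)
# print(completion["choices"][0]["message"]["content"])
```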
Response
Returns a completion object with the generated message.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "Qwen/Qwen3.6-35B-A3B-FP8",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of the Netherlands is Amsterdam."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 12,
    "total_tokens": 40
  }
}
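A minimal sketch of reading the commonly used fields from this object, using the sample response above as input. In this sample the request asked for `qwen` and the `model` field reports `Qwen/Qwen3.6-35B-A3B-FP8`, so the field appears to reflect the resolved model rather than the alias. `extract_reply` is a hypothetical helper.

```python
import json


def extract_reply(completion):
    """Return (text, finish_reason) for the first choice (hypothetical helper)."""
    choice = completion["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


# The sample response from above, as the raw JSON text the endpoint returns
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "Qwen/Qwen3.6-35B-A3B-FP8",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant",
                "content": "The capital of the Netherlands is Amsterdam."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 28, "completion_tokens": 12, "total_tokens": 40}
}
"""
completion = json.loads(raw)
text, reason = extract_reply(completion)
usage = completion["usage"]
print(f"{completion['model']}: {text} (finish_reason={reason})")
# In this sample, total_tokens = prompt_tokens + completion_tokens (28 + 12 = 40)
print(f"{usage['prompt_tokens']} prompt + {usage['completion_tokens']} "
      f"completion = {usage['total_tokens']} tokens")
```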
Streaming
Set stream to true (stream=True in the Python SDK) to receive tokens as they're generated.
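stream = client.chat.completions.create(
    model="qwen",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks (such as the first and last) carry no text; skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()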
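Without the SDK, the server-sent events have to be parsed by hand. This sketch assumes OpenAI-style framing, inferred from the OpenAI-compatible client used above: one `data: <json>` line per chunk, text in `choices[0].delta.content`, and a final `data: [DONE]` line. Verify that framing against a real response. `collect_stream` and the sample events are hypothetical.

```python
import json


def collect_stream(lines):
    """Join the text deltas of choice 0 from SSE lines (hypothetical helper).

    Assumes OpenAI-style framing: one "data: <json>" line per chunk and a
    final "data: [DONE]" line.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # only follow the first choice
            text = choice.get("delta", {}).get("content")
            if text:  # role-only and final chunks carry no text
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once upon"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " a time"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(sample))
```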