Transcribe audio with speaker diarization. Identify who said what in interviews, meetings, and conversations.

Speech to text

Speaker diarization: This endpoint transcribes audio and identifies who said what. It is also the endpoint to use for plain transcription: if you do not need speaker labels, read only the top-level `text` field of the response.

POST /v1/audio/diarize

Request body

Parameter	Type	Required	Description
`model`	string	Yes	Transcription model. Use `whisperx`.
`file`	file	Yes	Audio file to transcribe. Supported formats: MP3, WAV, FLAC, M4A, OGG.
`language`	string	No	Language code (e.g., `nl`, `en`). Auto-detected if not specified.

Example request

The diarization endpoint is not part of the OpenAI SDK, so use a direct HTTP request.

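import os
import requests

# Open the audio file in a context manager so the handle is closed after the upload.
with open("interview.mp3", "rb") as audio:
    response = requests.post(
        "https://router.appelon.ai/v1/audio/diarize",
        headers={"Authorization": f"Bearer {os.environ['APPELON_API_KEY']}"},
        files={"file": audio},
        data={"model": "whisperx", "language": "nl"},  # language is optional
    )

# Raise an exception on 4xx/5xx responses instead of parsing an error body.
response.raise_for_status()

result = response.json()
print(result["text"])  # full transcript, without speaker labels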

curl -X POST "https://router.appelon.ai/v1/audio/diarize" \
  -H "Authorization: Bearer $APPELON_API_KEY" \
  -F "file=@interview.mp3" \
  -F "model=whisperx" \
  -F "language=nl"

Response with speaker labels

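WhisperX performs speaker diarization: each entry in `segments` carries `start` and `end` timestamps, the transcribed `text`, and a `speaker` label such as `SPEAKER_00` that identifies who was talking.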

{
  "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Welkom bij dit interview.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 2.8,
      "end": 5.1,
      "text": "Dank je wel voor de uitnodiging.",
      "speaker": "SPEAKER_01"
    }
  ]
}
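
To turn the response into a readable transcript, the segments can be grouped by speaker. The following is a minimal sketch, assuming the response has the shape of the sample above. `format_transcript` is a hypothetical helper, not part of the API, and merging consecutive segments from the same speaker is an illustrative choice.

```python
def format_transcript(result):
    """Render a diarization response as 'SPEAKER: text' lines.

    Consecutive segments from the same speaker are merged into one line.
    Assumes each segment has "speaker" and "text" keys, as in the sample
    response; this helper is illustrative and not part of the API.
    """
    turns = []  # list of [speaker, text] pairs, one per speaker turn
    for segment in result.get("segments", []):
        # Fall back to a placeholder if a segment has no speaker label.
        speaker = segment.get("speaker", "UNKNOWN")
        text = segment["text"].strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous segment: extend the current turn.
            turns[-1][1] += " " + text
        else:
            turns.append([speaker, text])
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


# Sample data copied from the response shown above.
sample = {
    "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Welkom bij dit interview.", "speaker": "SPEAKER_00"},
        {"start": 2.8, "end": 5.1, "text": "Dank je wel voor de uitnodiging.", "speaker": "SPEAKER_01"},
    ],
}

print(format_transcript(sample))
```

With the sample data, this prints one line per speaker turn, each prefixed with its speaker label.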

Supported formats

MP3
WAV
FLAC
M4A
OGG
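
A client can check a file against this list before uploading. This is a minimal sketch: `is_supported_audio` is a hypothetical helper, and it inspects only the filename extension, not the file contents. How the endpoint responds to an unsupported file is not described here, so treat this as a convenience check rather than a guarantee.

```python
from pathlib import Path

# Extensions derived from the supported formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}


def is_supported_audio(path):
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("interview.mp3"))
print(is_supported_audio("notes.txt"))
```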

Language support

WhisperX supports 90+ languages, including Dutch, English, German, French, and Spanish. If `language` is omitted, the language is auto-detected, but specifying it improves accuracy.
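
Because `language` is optional, a client can include it only when the language is known. A minimal sketch, where `build_form_fields` is a hypothetical helper whose result would be passed as the `data=` argument of `requests.post`:

```python
def build_form_fields(language=None):
    """Build the non-file form fields for a diarization request.

    `language` is included only when given, so the service falls back to
    auto-detection otherwise. Illustrative helper, not part of the API.
    """
    fields = {"model": "whisperx"}
    if language:
        fields["language"] = language
    return fields


print(build_form_fields("nl"))  # language specified
print(build_form_fields())      # language left to auto-detection
```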