Speech to text

GRONINGEN · NL

Transcribe audio with speaker diarization. Identify who said what in interviews, meetings, and conversations.

Speaker diarization: This endpoint transcribes audio and identifies who said what. Use it for plain transcription as well; in that case, read the top-level text field of the response and ignore the speaker labels.

POST /v1/audio/diarize

Request body

Parameter   Type     Required   Description
model       string   Yes        Transcription model. Use whisperx.
file        file     Yes        Audio file to transcribe. Supported formats: MP3, WAV, FLAC, M4A, OGG.
language    string   No         Language code (e.g., nl, en). Auto-detected if not specified.

Example request

The diarization endpoint is not part of the OpenAI SDK, so use a direct HTTP request.

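import os
import requests

# Open the audio file in a context manager so the handle is closed after the upload.
with open("interview.mp3", "rb") as audio:
    response = requests.post(
        "https://router.appelon.ai/v1/audio/diarize",
        headers={"Authorization": f"Bearer {os.environ['APPELON_API_KEY']}"},
        files={"file": audio},
        data={"model": "whisperx", "language": "nl"},
        timeout=300,  # assumed value; long recordings can take a while to process
    )

# Raise on HTTP errors instead of parsing an error body as a transcript.
response.raise_for_status()

result = response.json()
print(result["text"])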

Response with speaker labels

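WhisperX performs speaker diarization: it splits the transcript into segments and assigns each one a speaker label such as SPEAKER_00. The labels distinguish speakers within a recording; they do not identify people by name. In the example below, each segment carries its start and end time, its text, and its speaker label, and the top-level text field holds the full transcript.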

{
  "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Welkom bij dit interview.",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 2.8,
      "end": 5.1,
      "text": "Dank je wel voor de uitnodiging.",
      "speaker": "SPEAKER_01"
    }
  ]
}
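
The segments can be turned into a readable transcript by merging consecutive segments from the same speaker into one turn. The following is a minimal sketch that assumes the response shape shown above; format_transcript is a hypothetical helper, not part of the API, and the fallback label "UNKNOWN" for a segment without a speaker field is an assumption.

```python
def format_transcript(result):
    """Merge consecutive segments by the same speaker into labelled turns."""
    turns = []
    for seg in result.get("segments", []):
        # Assumption: a segment may lack a speaker label; fall back to "UNKNOWN".
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = seg["end"]
        else:
            turns.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return [
        f"[{t['start']:.1f}-{t['end']:.1f}] {t['speaker']}: {t['text']}" for t in turns
    ]


# The example response from this page.
example = {
    "text": "Welkom bij dit interview. Dank je wel voor de uitnodiging.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": "Welkom bij dit interview.", "speaker": "SPEAKER_00"},
        {"start": 2.8, "end": 5.1, "text": "Dank je wel voor de uitnodiging.", "speaker": "SPEAKER_01"},
    ],
}

for line in format_transcript(example):
    print(line)
# [0.0-2.5] SPEAKER_00: Welkom bij dit interview.
# [2.8-5.1] SPEAKER_01: Dank je wel voor de uitnodiging.
```

Merging by turn keeps long monologues on one line while preserving the time span each turn covers.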

Supported formats

  • MP3
  • WAV
  • FLAC
  • M4A
  • OGG
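
A quick client-side check can catch unsupported files before uploading. This sketch maps the formats listed above to their usual file extensions, which is an assumption; is_supported_audio is a hypothetical helper, and the check looks only at the file name, not at the audio content.

```python
from pathlib import Path

# Formats listed above, written as lowercase file extensions (assumed mapping).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}


def is_supported_audio(path):
    """Return True if the file name ends in one of the listed extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("interview.mp3"))  # True
print(is_supported_audio("notes.txt"))      # False
```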

Language support

WhisperX supports more than 90 languages, including Dutch, English, German, French, and Spanish. If the language parameter is omitted, the language is auto-detected; specifying it improves accuracy.
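
To switch between auto-detection and an explicit language, the form fields can be built so that language is sent only when it is known. This is a small sketch; build_form_data is a hypothetical helper, and its return value is meant to be passed as the data argument of requests.post in the example request.

```python
def build_form_data(language=None, model="whisperx"):
    """Build the form fields for the diarize request.

    The language field is left out when no language is given,
    so the service falls back to auto-detection.
    """
    data = {"model": model}
    if language:
        data["language"] = language
    return data


print(build_form_data())      # {'model': 'whisperx'}
print(build_form_data("nl"))  # {'model': 'whisperx', 'language': 'nl'}
```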