Diarize
Connect to Diarize to transcribe and diarize audio and video content from YouTube, X, Instagram, and TikTok. Submit transcription jobs and retrieve results in JSON, TXT, SRT, or VTT format.
Supports authentication: Bearer Token
Tool list
diarize_create_transcription_job
Submit a new transcription and diarization job for an audio or video URL (YouTube, X, Instagram, TikTok). Returns a job ID that can be used to check status and download results.
| Name | Type | Required | Description |
|---|---|---|---|
| language | string | No | Language code for transcription (e.g. `en`, `es`, `fr`). Defaults to auto-detection if not provided. |
| num_speakers | integer | No | Expected number of speakers in the audio. Helps improve diarization accuracy. |
| schema_version | string | No | Optional schema version to use for tool execution. |
| tool_version | string | No | Optional tool version to use for execution. |
| url | string | Yes | The URL of the audio or video content to transcribe (e.g. a YouTube, X, Instagram, or TikTok link). |
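The parameter table maps naturally onto a plain dictionary of tool inputs. The sketch below assumes your agent framework passes tool arguments that way; `build_create_job_args` is a hypothetical helper, not part of Diarize, and the http(s) check is an extra client-side guard rather than a documented requirement. Unset optional fields are left out so the tool's own defaults (such as language auto-detection) apply.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_create_job_args(
    url: str,
    language: Optional[str] = None,
    num_speakers: Optional[int] = None,
) -> dict[str, Any]:
    """Build the input for diarize_create_transcription_job (hypothetical helper).

    Only parameters that were actually supplied are included, so omitted
    optional fields fall back to the tool's defaults.
    """
    # Client-side sanity check (an assumption, not a documented rule).
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")

    args: dict[str, Any] = {"url": url}
    if language is not None:
        args["language"] = language
    if num_speakers is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(num_speakers, bool) or not isinstance(num_speakers, int) or num_speakers < 1:
            raise ValueError("num_speakers must be a positive integer")
        args["num_speakers"] = num_speakers
    return args
```

For a two-person interview in Spanish, `build_create_job_args(link, language="es", num_speakers=2)` yields all three keys; when the speaker count is unknown, simply leave `num_speakers` unset.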
diarize_download_transcript
Download the transcript output for a completed transcription job in JSON, TXT, SRT, or VTT format, including speaker diarization, segments, and word-level timestamps.
| Name | Type | Required | Description |
|---|---|---|---|
| format | string | No | Output format for the transcript. Supported formats: `json`, `txt`, `srt`, `vtt`. |
| job_id | string | Yes | The unique ID of the completed transcription job. |
| schema_version | string | No | Optional schema version to use for tool execution. |
| tool_version | string | No | Optional tool version to use for execution. |
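Because `format` takes a single value, fetching several outputs (for example SRT subtitles plus JSON with word-level timestamps) takes one call per format, made once the job's status is completed. A minimal sketch, assuming tool inputs are passed as a dictionary; `build_download_args` and `build_download_requests` are hypothetical helpers, and leaving `format` unset defers to whatever default the tool applies, which this page does not state.

```python
from typing import Any, Iterable, Optional

# Formats listed in the diarize_download_transcript parameter table.
SUPPORTED_FORMATS = frozenset({"json", "txt", "srt", "vtt"})


def build_download_args(job_id: str, fmt: Optional[str] = None) -> dict[str, Any]:
    """Build the input for one diarize_download_transcript call (hypothetical helper)."""
    if not job_id:
        raise ValueError("job_id is required")
    args: dict[str, Any] = {"job_id": job_id}
    if fmt is not None:
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(
                f"unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
            )
        args["format"] = fmt
    return args


def build_download_requests(job_id: str, formats: Iterable[str]) -> list[dict[str, Any]]:
    """Return one set of tool inputs per requested format."""
    return [build_download_args(job_id, fmt) for fmt in formats]
```

Validating the format before calling the tool surfaces typos such as `"vvt"` locally instead of as a failed tool call.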
diarize_get_job_status
Retrieve the current status of a transcription job by its job ID. Returns the job state (pending, processing, completed, or failed), job metadata, and an estimatedTime field giving the expected processing time in seconds. Use estimatedTime to choose the polling interval and the maximum wait. For example, a 49-minute episode may report an estimatedTime of about 891 s (about 15 minutes), so an agent should keep polling for at least that long before giving up.
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | string | Yes | The unique ID of the transcription job to check. |
| schema_version | string | No | Optional schema version to use for tool execution. |
| tool_version | string | No | Optional tool version to use for execution. |
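The polling guidance for this tool can be sketched as a small loop. This assumes `get_status` is a hypothetical callable you supply that wraps `diarize_get_job_status` and returns a mapping with the job state under a `status` key (that key name is an assumption; only `estimatedTime` is named on this page) and `estimatedTime` in seconds, read as total processing time. The interval bounds and `slack` factor are arbitrary defaults, not Diarize recommendations.

```python
import time
from typing import Any, Callable, Mapping

# Job states listed for diarize_get_job_status that end polling.
TERMINAL_STATES = frozenset({"completed", "failed"})


def poll_interval(
    estimated_time_s: float,
    min_interval_s: float = 5.0,
    max_interval_s: float = 60.0,
    polls_per_estimate: int = 20,
) -> float:
    """Spread about `polls_per_estimate` checks over the estimated processing time."""
    if estimated_time_s <= 0:
        return min_interval_s
    return max(min_interval_s, min(max_interval_s, estimated_time_s / polls_per_estimate))


def wait_for_job(
    get_status: Callable[[str], Mapping[str, Any]],
    job_id: str,
    *,
    slack: float = 1.5,
    fallback_estimate_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll until the job is completed or failed, or the deadline passes.

    The deadline is slack * estimatedTime seconds after polling started, so
    with slack >= 1 the loop always waits at least the service's own estimate.
    """
    if slack < 1:
        raise ValueError("slack must be >= 1 so the loop waits at least estimatedTime")
    start = clock()
    while True:
        status = get_status(job_id)
        # "status" is an assumed key name; adjust to the real response shape.
        if status.get("status") in TERMINAL_STATES:
            return status  # the caller should still check for "failed"
        # A missing or zero estimate falls back to a fixed default.
        estimate_s = float(status.get("estimatedTime") or fallback_estimate_s)
        deadline = start + slack * estimate_s
        now = clock()
        if now >= deadline:
            raise TimeoutError(
                f"job {job_id} not finished after {now - start:.0f}s "
                f"(estimatedTime={estimate_s:.0f}s)"
            )
        # Never sleep past the deadline.
        sleep(min(poll_interval(estimate_s), deadline - now))
```

With the 49-minute example (estimatedTime of 891 s), `poll_interval` returns 44.55 s, and with the default `slack=1.5` the loop gives up after about 1,336 s (roughly 22 minutes) rather than exactly at the estimate. Injecting `sleep` and `clock` keeps the loop testable without real waiting.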