# Selecting the Best AI Model for Transcription

CCV AI Services provides [multiple AI models](/ai-tools/services/transcribe/comparing-speech-to-text-models.md) for transcription, each with different strengths. This page explains which model is likely to work best for the type of audio or video files you have.

## TL;DR

As a general rule of thumb:

### Select **Gemini** for

* short audio/video files under approximately 30 minutes
* conversational audio/video without much background noise or long stretches of silence, both of which tend to cause the model to hallucinate
* better speaker diarization
* faster processing
* accurate transcript text when accurate timestamps are not required
* translation

{% hint style="warning" %}
Note

Although Gemini models technically support more than 9 hours of audio input thanks to their long context window, we do NOT recommend them for files longer than 1 hour.
{% endhint %}

### Select the **OpenAI Whisper** or **Cohere Transcribe** model for

* long audio/video files over 30 minutes
* more accurate timestamps
* word-level timestamps
* private transcription without a third-party API
* captions/subtitles

### Select the **Qwen3-ASR** model for

* verbatim transcription that keeps disfluencies and filler words (such as "you know" and "like")
* more accurate timestamps
* word-level timestamps
* private transcription without a third-party API
* captions/subtitles
* speed at least 1.5 times faster than Whisper
* better performance on audio with noisy backgrounds
* better performance on singing voices
* better performance with Chinese/Cantonese dialects
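
The rule of thumb above can be read as a small decision procedure. The sketch below is only an illustration: `AudioJob` and `suggest_model` are hypothetical names, not part of any CCV AI Services API, and the order in which the checks are applied is an assumption made for the example rather than something this page specifies.

```python
from dataclasses import dataclass


@dataclass
class AudioJob:
    """Hypothetical description of a transcription job (not a CCV API object)."""
    duration_minutes: float
    needs_accurate_timestamps: bool = False  # word-level timestamps, captions/subtitles
    needs_private_processing: bool = False   # no third-party API
    needs_verbatim: bool = False             # keep filler words and disfluencies
    noisy_or_singing: bool = False           # noisy background or singing voices
    chinese_or_cantonese: bool = False       # Chinese/Cantonese dialects


def suggest_model(job: AudioJob) -> str:
    """Map a job to a model family using the rule of thumb on this page.

    Assumption: Qwen3-ASR's specific strengths are checked first, then the
    Whisper/Cohere criteria, and Gemini is the fallback for short,
    conversational files.
    """
    if job.needs_verbatim or job.noisy_or_singing or job.chinese_or_cantonese:
        return "Qwen3-ASR"
    if (
        job.duration_minutes > 30
        or job.needs_accurate_timestamps
        or job.needs_private_processing
    ):
        return "OpenAI Whisper or Cohere Transcribe"
    return "Gemini"


# Example: a 12-minute interview with no special requirements.
print(suggest_model(AudioJob(duration_minutes=12)))  # Gemini
```

Treat the result as a starting point only. Several criteria overlap (for example, Qwen3-ASR also offers word-level timestamps and private transcription), so a real choice may weigh them differently.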

