Description
Evaluates transcription models on multilingual, multi-speaker audio with varying levels of background noise across business domains such as software development, finance, classifieds, food delivery, and healthcare. The dataset contains 150 unique audio samples; each is augmented to produce a low-noise and a high-noise version, giving 450 samples in total (original, low-noise, and high-noise).
Provider
Prosus
Language
English, Hindi, Portuguese, Polish, Afrikaans, and Dutch
Evaluation
Accuracy is reported as 1 - WER (Word Error Rate).
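To make the metric concrete, here is a minimal sketch of how 1 - WER could be computed for a single reference/hypothesis pair. It assumes whitespace tokenization and no text normalization (casing, punctuation, numerals); the benchmark's actual preprocessing and how scores are aggregated across samples are not stated here, so the function names `word_error_rate` and `accuracy` are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as described above: 1 - WER (assumed per-sample here)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"Accuracy: {accuracy(ref, hyp):.3f}")
```

Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference, so accuracy defined this way can be negative for very poor transcripts.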
Data Statistics
Number of Samples: 450
Collection Period: August 2024
Synthetic: Yes
Language
The language of the conversation.
Noise
The level of background noise in the audio sample.
Domain
The business domain of the conversation.


