Have a unique use case you’d like to test?
We want to evaluate how LLMs perform on your specific, real-world task. You might discover that a small, open-source model delivers the performance you need at a lower cost than proprietary models. We can also add custom filters to give you deeper insight into LLM capabilities. Each time a new model is released, we'll provide you with updated performance results.

An open-source model beating GPT-4 Turbo on our interactive leaderboard.