Image Understanding

Description

Evaluates the ability of models to interpret and understand food images from delivery applications through two distinct tasks: detailed caption generation and image quality assessment. Models must produce accurate and detailed captions describing dishes, ingredients, and presentation without hallucinating information. Additionally, they must identify relevant image-quality issues from a standardized set of labels, such as text overlays, human presence, unappealing presentation, or other similar factors.

Provider

iFood and Glovo

Language

English

Evaluation

Caption quality is measured by automatic evaluation with GPT-4o, which compares each generated caption against reference captions. The F1 score measures a model's ability to correctly label the issues present in an image.
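To make the F1 part of the evaluation concrete, the following is a minimal sketch of how an F1 score over image-issue labels could be computed. It assumes micro-averaging across all images, which this page does not specify; the benchmark may average differently. The function name `micro_f1` and the label strings are hypothetical and are not taken from the benchmark's actual label set.

```python
from typing import Iterable, Set


def micro_f1(predicted: Iterable[Set[str]], reference: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-image label sets (assumed averaging scheme)."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)  # labels the model got right
        fp += len(pred - ref)  # labels predicted but not in the reference
        fn += len(ref - pred)  # reference labels the model missed
    if tp == 0:
        # No correct labels at all: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical label names, for illustration only.
predicted = [{"text_overlay"}, {"human_presence", "unappealing_presentation"}, set()]
reference = [{"text_overlay"}, {"human_presence"}, {"text_overlay"}]

score = micro_f1(predicted, reference)
print(round(score, 3))
```

In this toy example there are two correct labels, one spurious label, and one missed label, so precision and recall are both 2/3. Micro-averaging weights every label decision equally; a macro average over labels would instead give rare issue types the same weight as common ones.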

Data Statistics

Number of Samples: 149

Collection Period: March 2025

Results based on 0 entries.

#	Model	Provider	Size	Caption Score	F1 Score
No results.
