Image Understanding
Description
Evaluates how well models interpret and understand food images from delivery applications through two distinct tasks: detailed caption generation and image quality assessment. For captioning, models must produce accurate, detailed captions describing dishes, ingredients, and presentation without hallucinating information. For quality assessment, they must identify the relevant image-quality issues from a standardized set of labels, such as text overlays, human presence, or unappealing presentation.
Provider
iFood and Glovo
Language
English
Evaluation
Caption quality is scored by auto-evaluation with GPT-4o against reference captions. The F1 score measures a model's ability to correctly label issues in the image.
Data Statistics
Number of Samples: 149
Collection Period: March 2025
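As an illustration of the F1-based scoring of image-quality labels described under Evaluation, here is a minimal sketch in Python. The benchmark description does not say how F1 is aggregated, so micro-averaging over all samples is an assumption here, as is the convention of returning 1.0 when neither the references nor the predictions contain any label. The function name `micro_f1` and the label strings are hypothetical and are not taken from the benchmark's label set.

```python
def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    gold[i] and pred[i] are the reference and predicted issue labels
    for image i. Micro-averaging is an assumption of this sketch.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of samples")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in the reference
        fp += len(p - g)  # labels predicted but absent from the reference
        fn += len(g - p)  # reference labels the model missed

    denom = 2 * tp + fp + fn
    # Assumed convention: no labels anywhere counts as perfect agreement.
    return 2 * tp / denom if denom else 1.0


# Hypothetical label names, for illustration only.
gold = [{"text_overlay", "human_presence"}, {"unappealing_presentation"}, set()]
pred = [{"text_overlay"}, {"unappealing_presentation", "human_presence"}, set()]
print(round(micro_f1(gold, pred), 3))
```

Micro-averaging weights every label decision equally, so frequent issue types dominate the score. A macro average over labels would instead give rare issue types the same weight as common ones.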
# | Model | Provider | Size | Caption Score | F1 Score |
---|---|---|---|---|---|
No results. |