Description
Evaluates an LLM's ability to disambiguate user requests for generating SQL queries based on the given business rules and database schema. A question can either be answered using the schema, a combination of the schema and the business rules, or requires additional information to be answered.
Provider
iFood
Language
Portuguese
Evaluation
Weighted accuracy score to evaluate the LLM's ability to accurately classify user requests into one of the three categories: schema-based, schema-and-rule-based, or additional-information-required. Ground truth was curated using domain expert labels.
Data Statistics
Number of Samples245
Collection PeriodApril 2024 - May 2024

Results based on 0 entries.

Last updated: Invalid Date

#
Model
Provider
Size
Accuracy
No results.

Rows per page

Page 1 of 0

Examples

User Question (Mock)

"Show me the average sale amount for the electronics category."

Database Schema

Tables:

  1. sales_transactions
  • transaction_id [STRING]: Unique identifier for each transaction.
  • transaction_date [DATE]: The date when the transaction occurred.
  • customer_id [STRING]: Unique identifier for the customer who made the purchase.
  • product_id [STRING]: Unique identifier for the product sold.
  • sale_amount [DECIMAL]: The total amount of money generated from the transaction.
  1. products
  • product_id [STRING]: Unique identifier for each product.
  • product_name [STRING]: Name of the product.
  • product_category [STRING]: The category to which the product belongs.
  1. customers
  • customer_id [STRING]: Unique identifier for each customer.
  • customer_name [STRING]: The name of the customer.
  • customer_region [STRING]: The region where the customer is located.

Business Rules

These are business-specific rules that can be used to resolve ambiguities in the user question:

  1. Category Definition: The product_category field in the products table contains categories such as 'Electronics', 'Furniture', 'Clothing', etc.

  2. Category Synonyms: The business rules define 'Electronics' as including products with categories labeled 'Electronics', 'Gadgets', and 'Devices'.

  3. Sale Amount Calculation: Average sale amount is calculated as the sum of sale_amount values for the specified category divided by the number of transactions in that category.

Output

{
  "reasoning": "The user question asks for the average sale amount for the 'electronics' category. However, there is an ambiguity regarding the interpretation of 'electronics' since it could include products labeled under different but related categories such as 'Gadgets' and 'Devices'. According to the business rules, these categories are considered synonymous with 'Electronics'. Therefore, the ambiguity can be resolved by including all these categories in the calculation, making this a resolvable ambiguity.",
  "predictedLabel": "resolvable ambiguity"
}

User Question (Mock)

"The brand 'TechCo' needs a report on the revenue and customer satisfaction metrics for their premium products during the Black Friday sales period."

Database Schema

Tables:

  1. sales_transactions
  • transaction_id [STRING]: Unique identifier for each transaction.
  • transaction_date [DATE]: The date when the transaction occurred.
  • customer_id [STRING]: Unique identifier for the customer who made the purchase.
  • product_id [STRING]: Unique identifier for the product sold.
  • sale_amount [DECIMAL]: The total amount of money generated from the transaction.
  1. products
  • product_id [STRING]: Unique identifier for each product.
  • product_name [STRING]: Name of the product.
  • product_category [STRING]: The category to which the product belongs.
  • product_tier [STRING]: The tier of the product, such as 'Standard', 'Premium', etc.
  1. customers
  • customer_id [STRING]: Unique identifier for each customer.
  • customer_name [STRING]: The name of the customer.
  • customer_satisfaction [INTEGER]: A rating from 1 to 5 given by customers based on their satisfaction with the product.

Business Rules

These are business-specific rules that can be used to resolve ambiguities in the user question:

  1. Product Tier Definition: Products are categorized into tiers like 'Standard', 'Premium', etc.

  2. Revenue Calculation: Revenue is calculated as the sum of sale_amount values from transactions.

  3. Customer Satisfaction: Customer satisfaction is gauged by the customer_satisfaction field, where 5 indicates the highest satisfaction.

Output

{
  "reasoning": "The user question asks for a report on revenue and customer satisfaction metrics for 'TechCo' during the Black Friday sales period, specifically focusing on 'premium' products. However, there are multiple ambiguities: 'TechCo' could refer to different product lines or brands under the same company name, and the timeframe 'Black Friday' is not clearly specified in terms of exact dates. These ambiguities cannot be fully resolved using the available schema and business rules. Therefore, clarification from the user is required to accurately generate the report.",
  "predictedLabel": "clarification required"
}

User Question (Mock)

"Show all the customers who purchased the product with ID '123e4567-e89b-12d3-a456-426614174000'."

Database Schema

Tables:

  1. sales_transactions
  • transaction_id [STRING]: Unique identifier for each transaction.
  • transaction_date [DATE]: The date when the transaction occurred.
  • customer_id [STRING]: Unique identifier for the customer who made the purchase.
  • product_id [STRING]: Unique identifier for the product sold.
  • sale_amount [DECIMAL]: The total amount of money generated from the transaction.
  1. customers
  • customer_id [STRING]: Unique identifier for each customer.
  • customer_name [STRING]: The name of the customer.
  • customer_email [STRING]: The email address of the customer.
  • customer_phone [STRING]: The phone number of the customer.

Business Rules

These are business-specific rules that can be used to understand the user question:

  1. Product Purchase Tracking: Customers who have purchased a specific product can be identified by filtering the sales_transactions table based on the product_id.

  2. Customer Identification: The customers table links each customer_id to their respective personal information such as name, email, and phone number.

Output

{
  "reasoning": "The user request is to list all customers who purchased a specific product identified by its `product_id`. This query is clear and unambiguous because the schema provides direct access to the necessary tables (`sales_transactions` and `customers`) and fields required to identify the customers. No additional business rules or assumptions are needed to answer the query.",
  "predictedLabel": "clear query"
}

Have a unique use-case you’d like to test?

We want to evaluate how LLMs perform on your specific, real world task. You might discover that a small, open-source model delivers the performance you need at a better cost than proprietary models. We can also add custom filters, enhancing your insights into LLM capabilities. Each time a new model is released, we'll provide you with updated performance results.

Leaderboard

An open-source model beating GPT-4 Turbo on our interactive leaderboard.

Don’t worry, we’ll never spam you.

Please, briefly describe your use case and motivation. We’ll get back to you with details on how we can add your benchmark.