Analytics Text-to-SQL Dataset

A community-driven, open dataset of real-world analytics use cases for training and benchmarking text-to-SQL systems.

Overview

Most text-to-SQL benchmarks use synthetic or academic data. Real-world analytics involves:

  • Complex joins across fact and dimension tables

  • Domain-specific terminology (slow-moving inventory, high-value customers, claims processing)

  • Nuanced logic (return rate benchmarking, cohort revenue, seasonality indexing)

  • Domain knowledge that generic datasets don’t capture
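
To make the list above concrete, here is a minimal sketch of what one question/SQL pair could look like for a "return rate by category" question that joins a fact table to a dimension table. The table names (`fact_sales`, `dim_product`), the column names, the sample rows, and the `question`/`sql` record layout are all hypothetical, chosen only for illustration; they are not the dataset's actual schema or format.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
# These names are illustrative assumptions, not the dataset's real schema.
SCHEMA = """
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    category   TEXT
);
CREATE TABLE fact_sales (
    sale_id        INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES dim_product(product_id),
    units_sold     INTEGER,
    units_returned INTEGER
);
"""

# Hypothetical record layout: a natural-language question and its SQL answer.
EXAMPLE = {
    "question": "What is the return rate for each product category?",
    "sql": """
        SELECT p.category,
               1.0 * SUM(f.units_returned) / SUM(f.units_sold) AS return_rate
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """,
}


def run_example():
    """Load made-up sample rows into an in-memory database and run the SQL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "apparel"), (2, "toys")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 2), (2, 1, 10, 3), (3, 2, 20, 1)],
    )
    rows = conn.execute(EXAMPLE["sql"]).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for category, return_rate in run_example():
        print(category, return_rate)
```

Even this small case needs a join, an aggregate ratio, and a definition of "return rate" (here assumed to be returned units divided by sold units) that a model cannot infer from the schema alone.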

This dataset aims to close that gap. Contributions are accepted across all domains.