Getting Started
===============
.. note::
📊 Open Dataset · SQL · Multi-Domain · Community Contributed
Help build the largest open Text-to-SQL dataset for real-world business analytics.
Every query you submit trains better AI for real business problems — across any domain.
----
Who Should Contribute
---------------------
- Data Analysts writing SQL against business data warehouses
- BI Engineers building dashboards and reports
- Data Scientists working on NLP and text-to-SQL research
- Domain Experts (Healthcare, Finance, SaaS, Manufacturing, Supply Chain)
- Anyone who has turned a business question into a SQL query
----
Supported Domains
-----------------
Submissions accepted across all domains:
.. list-table::
:header-rows: 1
:widths: 25 75
* - Domain
- Example Use Cases
* - **Retail**
- Sales revenue, inventory turnover, return rates, customer segmentation
* - **Healthcare**
- Readmission rates, claims denial, patient outcomes, utilization
* - **HighTech (SaaS)**
- Churn rate, feature adoption, ARR, funnel conversion, DAU/MAU
* - **Finance**
- Portfolio performance, risk exposure, transaction anomalies, alpha
* - **Manufacturing**
- OEE, downtime analysis, yield rate, defect tracking
* - **Supply Chain**
- Supplier lead time, SLA breach rate, stock-out frequency
* - **Other**
- Any domain with structured SQL data and real business questions
----
Supported Databases
-------------------
Submissions accepted for all major SQL engines:
``BigQuery`` · ``Snowflake`` · ``Redshift`` · ``PostgreSQL`` · ``MySQL`` · ``Oracle`` · ``Azure Synapse`` · ``Other``
Specify your ``db_type`` accurately — dialect differences (e.g. ``DATE_DIFF`` vs ``DATEDIFF``)
are expected and valuable for the dataset.
----
How to Submit *(~5 minutes per entry)*
---------------------------------------
All contributions go through the UI Form. The dataset repository is managed
privately by maintainers — contributors submit via the form and maintainers
handle ingestion.
Step 1 — Open the Form
~~~~~~~~~~~~~~~~~~~~~~~
Open the `Query Entry Form `_
in your browser. No login or account needed.
Step 2 — Select Your Domain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Choose your domain from the dropdown: Retail, Healthcare, HighTech (SaaS),
Finance, Manufacturing, Supply Chain, or Other.
Step 3 — Fill Required Fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Complete all required sections. The **Live JSON panel** on the right
updates in real time as you type. See :doc:`data_spec` for full field reference.
Required sections:
1. **Meta** — difficulty, db_type, domain
2. **Business Question** — how a real user would ask it
3. **Business Context** — who needs this and why
4. **Metrics & Aggregation** — KPI names + formulas
5. **Schema Tables** — fact and dimension tables used
6. **Data Model Layers** — hierarchies, aggregations, snapshots
Optional but strongly recommended:
7. **Chain of Thought** — step-by-step reasoning
8. **SQL Answer** — actual query (significantly improves quality)
Step 4 — Copy and Submit
~~~~~~~~~~~~~~~~~~~~~~~~~
Once all fields are complete, click the **Submit** button in the form.
Your entry will be sent directly to the maintainers for review.
.. tip::
Best submissions have clear business context, realistic KPIs,
and SQL that actually runs. See :doc:`examples` before writing your first entry.
----
What Makes a Good Submission
-----------------------------
**Strong submissions have:**
- A business question written the way a non-technical stakeholder would ask it
- Context that explains *who* needs this data and *what decision* it drives
- KPIs with clear, plain-English aggregation formulas (avoid SQL jargon in KPI names)
- SQL that runs cleanly against the standard schema or your specified db_type
- Chain of thought that walks through the reasoning step by step
**Weak submissions often have:**
- Vague instructions like "get sales data" with no business context
- KPI names that are just SQL expressions (``SUM(net_sales)``) instead of business terms
- Missing or mismatched metric/formula pairs
- SQL with syntax errors or non-standard table names without explanation
----
Difficulty Guide
----------------
Use this as a reference when selecting difficulty:
.. list-table::
:header-rows: 1
:widths: 15 25 60
* - Level
- Typical Pattern
- Examples
* - **Easy**
- Single table, simple filter + aggregate
- Total sales last month, top 10 products by revenue
* - **Medium**
- 2–3 table joins, date windows, basic window functions
- YoY revenue comparison, customer segment breakdown, return rate
* - **Hard**
- Multi-CTE, advanced window functions, complex conditions
- Seasonality index, OEE score, churn funnel, portfolio alpha
* - **Expert**
- Recursive CTEs, nested window functions, multi-step derivations
- Cohort retention curves, graph traversal, multi-period attribution
----
Examples
--------
Click any example below to jump to the full walkthrough:
**Retail**
- :ref:`example-category-revenue-yoy`
- :ref:`example-slow-moving-inventory`
**Healthcare**
- :ref:`example-hospital-readmission-rate`
- :ref:`example-claims-denial-rate`
**HighTech (SaaS)**
- :ref:`example-saas-churn-rate`
- :ref:`example-saas-feature-adoption`
**Finance**
- :ref:`example-finance-portfolio-performance`
**Manufacturing**
- :ref:`example-manufacturing-oee`
**Supply Chain**
- :ref:`example-supplychain-supplier-lead-time`
----
FAQ
---
**Do I need to know SQL to contribute?**
SQL is optional but strongly recommended. Entries with SQL are higher quality
and more useful for model training. See :doc:`examples` for reference before writing.
**Can I submit from non-retail domains?**
Yes — all domains are welcome. Use equivalent fact/dimension table naming conventions
(e.g. ``fact_claims``, ``dim_patient``) following the star schema pattern in :doc:`schema_reference`.
**How many submissions can I make?**
No limit. Each unique business question counts as one entry. Bulk submissions
with diverse domains and difficulty levels are especially valued.
**What if my SQL has dialect-specific syntax?**
Specify your ``db_type`` correctly (e.g. BigQuery, Snowflake). Dialect-specific
functions like ``DATE_DIFF``, ``DATEADD``, ``FORMAT_DATE`` are expected and kept as-is.
**Can I submit without the SQL?**
Yes — SQL is optional. Entries without SQL are still accepted if all other
required fields are complete and high quality.
**What if my schema differs from the standard one?**
Note the variation in your submission. Non-standard tables are accepted
as long as they follow fact/dimension naming conventions.
**How long does review take?**
Target turnaround is 7 days. Complex or ambiguous entries may take longer.
You'll receive feedback directly in your Discussion thread.
----
.. seealso::
:doc:`data_spec` · :doc:`schema_reference` · :doc:`examples` · `Submit Form `_