
text2sql-data

This repository contains data and code for building and evaluating systems that map sentences to SQL, developed as part of:

Improving Text-to-SQL Evaluation Methodology, Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang and Dragomir Radev, ACL 2018

For a range of domains, we provide:

Sentences with annotated variables
SQL queries
A database schema
A database

These are improved forms of prior datasets, plus one new dataset that we developed. We have separate files describing the datasets, systems, and tools.
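To make "sentences with annotated variables" concrete, here is a minimal Python sketch of how a question template and its SQL query could be instantiated with the annotated values. The record layout (the `sentences`, `text`, `variables`, and `sql` keys, and placeholder names such as `department0` and `number0`) is an assumption modelled loosely on the repository's JSON files, not a documented schema; check the dataset description files before relying on it. The helpers `fill` and `instantiate` are hypothetical.

```python
import re

# Hypothetical record, loosely modelled on one entry of a text2sql-data JSON file:
# a question template with annotated variables, plus SQL using the same placeholders.
RECORD = {
    "sentences": [
        {
            "text": "how many credits is department0 number0 worth ?",
            "variables": {"department0": "EECS", "number0": "280"},
        }
    ],
    "sql": [
        'SELECT DISTINCT COURSEalias0.CREDITS FROM COURSE AS COURSEalias0 '
        'WHERE COURSEalias0.DEPARTMENT = "department0" '
        'AND COURSEalias0.NUMBER = number0 ;'
    ],
}


def fill(template: str, variables: dict[str, str]) -> str:
    """Replace each whole-word placeholder in `template` with its annotated value."""
    if not variables:
        return template
    # Longest names first; \b boundaries stop "number1" from matching inside "number10".
    names = sorted(variables, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: variables[m.group(1)], template)


def instantiate(record: dict) -> list[tuple[str, str]]:
    """Pair every filled-in question with the record's first SQL query, also filled in."""
    pairs = []
    for sentence in record["sentences"]:
        values = sentence["variables"]
        question = fill(sentence["text"], values)
        query = fill(record["sql"][0], values)
        pairs.append((question, query))
    return pairs


if __name__ == "__main__":
    for question, query in instantiate(RECORD):
        print(question)
        print(query)
```

Running the sketch prints the filled-in question followed by the filled-in query. Matching placeholders on word boundaries, rather than with plain string replacement, keeps one variable name from corrupting another that shares a prefix.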

Version	Description
4	Data fixes
3	Data fixes and addition of data from Spider and WikiSQL
2	Data with fixes for incorrectly defined variables in questions
1	Data used in the ACL 2018 paper

Citing this work

If you use this data in your work, please cite our ACL paper and the appropriate original sources, and list the version number of the data. For instance, in your paper you could write (using the BibTeX below):

 In this work, we use version 4 of the modified SQL datasets from \citet{data-advising}, based on \citet{data-academic,data-atis-original,data-geography-original,data-atis-geography-scholar,data-imdb-yelp,data-restaurants-logic,data-restaurants-original,data-restaurants,data-spider,data-wikisql}

If you only use one dataset, here are example citation commands:

Data	Cite
Academic	`\citet{data-advising,data-academic}`
Advising	`\citet{data-advising}`
ATIS	`\citet{data-advising,data-atis-original,data-atis-geography-scholar}`
Geography	`\citet{data-advising,data-geography-original,data-atis-geography-scholar}`
Restaurants	`\citet{data-advising,data-restaurants-logic,data-restaurants-original,data-restaurants}`
Scholar	`\citet{data-advising,data-atis-geography-scholar}`
Spider	`\citet{data-advising,data-spider}`
IMDB	`\citet{data-advising,data-imdb-yelp}`
Yelp	`\citet{data-advising,data-imdb-yelp}`
WikiSQL	`\citet{data-advising,data-wikisql}`

@InProceedings{data-advising,
  dataset   = {Advising},
  author    = {Catherine Finegan-Dollak and Jonathan K. Kummerfeld and Li Zhang and Karthik Ramanathan and Sesh Sadasivam and Rui Zhang and Dragomir Radev},
  title     = {Improving Text-to-SQL Evaluation Methodology},
  booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {July},
  year      = {2018},
  location  = {Melbourne, Victoria, Australia},
  pages     = {351--360},
  url       = {http://aclweb.org/anthology/P18-1033},
}

@InProceedings{data-imdb-yelp,
  dataset   = {IMDB and Yelp},
  author    = {Navid Yaghmazadeh and Yuepeng Wang and Isil Dillig and Thomas Dillig},
  title     = {SQLizer: Query Synthesis from Natural Language},
  booktitle = {International Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM},
  month     = {October},
  year      = {2017},
  pages     = {63:1--63:26},
  url       = {http://doi.org/10.1145/3133887},
}

@article{data-academic,
  dataset   = {Academic},
  author    = {Fei Li and H. V. Jagadish},
  title     = {Constructing an Interactive Natural Language Interface for Relational Databases},
  journal   = {Proceedings of the VLDB Endowment},
  volume    = {8},
  number    = {1},
  month     = {September},
  year      = {2014},
  pages     = {73--84},
  url       = {http://dx.doi.org/10.14778/2735461.2735468},
} 

@InProceedings{data-atis-geography-scholar,
  dataset   = {Scholar, and Updated ATIS and Geography},
  author    = {Srinivasan Iyer and Ioannis Konstas and Alvin Cheung and Jayant Krishnamurthy and Luke Zettlemoyer},
  title     = {Learning a Neural Semantic Parser from User Feedback},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year      = {2017},
  pages     = {963--973},
  location  = {Vancouver, Canada},
  url       = {http://www.aclweb.org/anthology/P17-1089},
}

@article{data-atis-original,
  dataset   = {ATIS, original},
  author    = {Deborah A. Dahl and Madeleine Bates and Michael Brown and William Fisher and Kate Hunicke-Smith and David Pallett and Christine Pao and Alexander Rudnicky and Elizabeth Shriber},
  title     = {{Expanding the scope of the ATIS task: The ATIS-3 corpus}},
  journal   = {Proceedings of the workshop on Human Language Technology},
  year      = {1994},
  pages     = {43--48},
  url       = {http://dl.acm.org/citation.cfm?id=1075823},
}

@inproceedings{data-geography-original,
  dataset   = {Geography, original},
  author    = {John M. Zelle and Raymond J. Mooney},
  title     = {Learning to Parse Database Queries Using Inductive Logic Programming},
  booktitle = {Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2},
  year      = {1996},
  pages     = {1050--1055},
  location  = {Portland, Oregon},
  url       = {http://dl.acm.org/citation.cfm?id=1864519.1864543},
}

@inproceedings{data-restaurants-logic,
  author    = {Lappoon R. Tang and Raymond J. Mooney},
  title     = {Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing},
  booktitle = {2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora},
  year      = {2000},
  pages     = {133--141},
  location  = {Hong Kong, China},
  url       = {http://www.aclweb.org/anthology/W00-1317},
}

@inproceedings{data-restaurants-original,
  author    = {Ana-Maria Popescu and Oren Etzioni and Henry Kautz},
  title     = {Towards a Theory of Natural Language Interfaces to Databases},
  booktitle = {Proceedings of the 8th International Conference on Intelligent User Interfaces},
  year      = {2003},
  location  = {Miami, Florida, USA},
  pages     = {149--157},
  url       = {http://doi.acm.org/10.1145/604045.604070},
}

@inproceedings{data-restaurants,
  author    = {Alessandra Giordani and Alessandro Moschitti},
  title     = {Automatic Generation and Reranking of SQL-derived Answers to NL Questions},
  booktitle = {Proceedings of the Second International Conference on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge},
  year      = {2012},
  location  = {Montpellier, France},
  pages     = {59--76},
  url       = {https://doi.org/10.1007/978-3-642-45260-4_5},
}

@InProceedings{data-spider,
  author    = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev},
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year      = {2018},
  location  = {Brussels, Belgium},
  pages     = {3911--3921},
  url       = {http://aclweb.org/anthology/D18-1425},
}

@article{data-wikisql,
  author    = {Victor Zhong and Caiming Xiong and Richard Socher},
  title     = {Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning},
  year      = {2017},
  journal   = {CoRR},
  volume    = {abs/1709.00103},
}

Contributions

We have put substantial effort into fixing bugs in the datasets, but none of them is perfect. If you find a bug, please submit a pull request with a fix. We will merge fixes into a development branch and only periodically merge all of those changes into the master branch (at which point this page will be updated to note that it is a new version). This approach is intended to balance the need for clear comparisons between systems with the goal of improving the data.

For some ideas of issues to fix, see our list of known issues.

Acknowledgments

This material is based in part upon work supported by IBM under contract 4915012629. Any opinions, findings, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect the views of IBM.


Additional information

Version: v. 4.0
Type: Other source code
Last updated: 2025-04-18
Size: 31.02 MB
Source: GitHub
