This repository contains data and code for building and evaluating systems that map sentences to SQL, developed as part of:
For a range of domains, we provide:
These are improved forms of prior datasets and a new dataset we developed. We have separate files describing the datasets, systems, and tools.
| Version | Description |
|---|---|
| 4 | Data fixes |
| 3 | Data fixes and addition of data from Spider and WikiSQL |
| 2 | Data with fixes for variables incorrectly defined in questions |
| 1 | Data used in the ACL 2018 paper |
If you use this data in your work, please cite our ACL paper and the appropriate original sources, and list the version number of the data. For example, in your paper you could write (using the BibTeX below):
In this work, we use version 4 of the modified SQL datasets from \citet{data-sql-advising}, based on \citet{data-academic,data-atis-original,data-geography-original,data-atis-geography-scholar,data-sql-imdb-yelp,data-restaurants-logic,data-restaurants-original,data-restaurants,data-spider,data-wikisql}.
If you are only using one dataset, here are example citation commands:
| Data | Cite |
|---|---|
| Academic | \citet{data-sql-advising,data-academic} |
| Advising | \citet{data-sql-advising} |
| ATIS | \citet{data-sql-advising,data-atis-original,data-atis-geography-scholar} |
| Geography | \citet{data-sql-advising,data-geography-original,data-atis-geography-scholar} |
| Restaurants | \citet{data-sql-advising,data-restaurants-logic,data-restaurants-original,data-restaurants} |
| Scholar | \citet{data-sql-advising,data-atis-geography-scholar} |
| Spider | \citet{data-sql-advising,data-spider} |
| IMDB | \citet{data-sql-advising,data-sql-imdb-yelp} |
| Yelp | \citet{data-sql-advising,data-sql-imdb-yelp} |
| WikiSQL | \citet{data-sql-advising,data-wikisql} |
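As a minimal sketch of how one of these citation commands might be used in a paper, assuming the BibTeX entries from this README have been saved to a file named `text2sql.bib` (a hypothetical name) and that the `natbib` package supplies `\citet`:

```latex
\documentclass{article}
% natbib defines \citet; this choice of package is an assumption,
% any package that provides \citet would serve
\usepackage{natbib}

\begin{document}

In this work, we use version 4 of the Advising dataset
from \citet{data-sql-advising}.

% plainnat is one natbib-compatible style, picked only for illustration
\bibliographystyle{plainnat}
% text2sql.bib is an assumed file name holding the entries from this README
\bibliography{text2sql}

\end{document}
```

The citation keys must match the keys of the BibTeX entries exactly, for example `data-sql-advising` for the Advising paper.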
@InProceedings{data-sql-advising,
dataset = {Advising},
author = {Catherine Finegan-Dollak and Jonathan K. Kummerfeld and Li Zhang and Karthik Ramanathan and Sesh Sadasivam and Rui Zhang and Dragomir Radev},
title = {Improving Text-to-SQL Evaluation Methodology},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
year = {2018},
location = {Melbourne, Victoria, Australia},
pages = {351--360},
url = {http://aclweb.org/anthology/P18-1033},
}
@InProceedings{data-sql-imdb-yelp,
dataset = {IMDB and Yelp},
author = {Navid Yaghmazadeh and Yuepeng Wang and Isil Dillig and Thomas Dillig},
title = {SQLizer: Query Synthesis from Natural Language},
booktitle = {International Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM},
month = {October},
year = {2017},
pages = {63:1--63:26},
url = {http://doi.org/10.1145/3133887},
}
@article{data-academic,
dataset = {Academic},
author = {Fei Li and H. V. Jagadish},
title = {Constructing an Interactive Natural Language Interface for Relational Databases},
journal = {Proceedings of the VLDB Endowment},
volume = {8},
number = {1},
month = {September},
year = {2014},
pages = {73--84},
url = {http://dx.doi.org/10.14778/2735461.2735468},
}
@InProceedings{data-atis-geography-scholar,
dataset = {Scholar, and Updated ATIS and Geography},
author = {Srinivasan Iyer and Ioannis Konstas and Alvin Cheung and Jayant Krishnamurthy and Luke Zettlemoyer},
title = {Learning a Neural Semantic Parser from User Feedback},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year = {2017},
pages = {963--973},
location = {Vancouver, Canada},
url = {http://www.aclweb.org/anthology/P17-1089},
}
@article{data-atis-original,
dataset = {ATIS, original},
author = {Deborah A. Dahl and Madeleine Bates and Michael Brown and William Fisher and Kate Hunicke-Smith and David Pallett and Christine Pao and Alexander Rudnicky and Elizabeth Shriberg},
title = {{Expanding the scope of the ATIS task: The ATIS-3 corpus}},
journal = {Proceedings of the workshop on Human Language Technology},
year = {1994},
pages = {43--48},
url = {http://dl.acm.org/citation.cfm?id=1075823},
}
@inproceedings{data-geography-original,
dataset = {Geography, original},
author = {John M. Zelle and Raymond J. Mooney},
title = {Learning to Parse Database Queries Using Inductive Logic Programming},
booktitle = {Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2},
year = {1996},
pages = {1050--1055},
location = {Portland, Oregon},
url = {http://dl.acm.org/citation.cfm?id=1864519.1864543},
}
@inproceedings{data-restaurants-logic,
author = {Lappoon R. Tang and Raymond J. Mooney},
title = {Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing},
booktitle = {2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora},
year = {2000},
pages = {133--141},
location = {Hong Kong, China},
url = {http://www.aclweb.org/anthology/W00-1317},
}
@inproceedings{data-restaurants-original,
author = {Ana-Maria Popescu and Oren Etzioni and Henry Kautz},
title = {Towards a Theory of Natural Language Interfaces to Databases},
booktitle = {Proceedings of the 8th International Conference on Intelligent User Interfaces},
year = {2003},
location = {Miami, Florida, USA},
pages = {149--157},
url = {http://doi.acm.org/10.1145/604045.604070},
}
@inproceedings{data-restaurants,
author = {Alessandra Giordani and Alessandro Moschitti},
title = {Automatic Generation and Reranking of SQL-derived Answers to NL Questions},
booktitle = {Proceedings of the Second International Conference on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge},
year = {2012},
location = {Montpellier, France},
pages = {59--76},
url = {https://doi.org/10.1007/978-3-642-45260-4_5},
}
@InProceedings{data-spider,
author = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev},
title = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
year = {2018},
location = {Brussels, Belgium},
pages = {3911--3921},
url = {http://aclweb.org/anthology/D18-1425},
}
@article{data-wikisql,
author = {Victor Zhong and Caiming Xiong and Richard Socher},
title = {Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning},
year = {2017},
journal = {CoRR},
volume = {abs/1709.00103},
}

We put substantial effort into fixing bugs in the datasets, but none of them are perfect. If you find a bug, please submit a pull request with a fix. We will merge fixes into a development branch and only infrequently merge all of those changes into the master branch (at which point this page will be adjusted to note that it is a new release). This approach is intended to balance the need for clear comparisons between systems with the goal of improving the data.
For some ideas of issues to address, see our list of known issues.
This material is based in part upon work supported by IBM under contract 4915012629. Any opinions, findings, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect the views of IBM.