2016/12/01, Room B200.
Claire, Emilie, Yannick
Goal: exploring new directions in grammar engineering and use.
Discussion about issues arising with hand-crafted (meta)grammars, namely:
- high development cost (time consuming task)
- correctness issues (ensure quality while extending the resource)
- coverage issues (especially over-generation)
Discussion about grammar extraction:
- seminal work on English: "Automated Extraction of TAGs from the Penn Treebank" by John Chen and Vijay-Shanker, IWPT 2000
Directions worth exploring:
- Experiment with TAG grammar extraction for French
- Experiment with grammar extraction for various target formalisms
- Enhance state-of-the-art extraction (cf. the above paper) by using information coming from dependency structures
- Enhance TAG parsing by coupling grammar extraction with supertagging (e.g. computing probabilities)
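One possible reading of "computing probabilities" in the last direction is to estimate P(supertag | word) by relative frequency over the (word, elementary tree) pairs produced by extraction. The sketch below is only an illustration under that assumption: the function name, the supertag labels, and the toy data are hypothetical and do not come from the discussion or from any existing tool.

```python
from collections import Counter, defaultdict


def supertag_probabilities(pairs):
    """Estimate P(supertag | word) by relative frequency.

    `pairs` is an iterable of (word, supertag) tuples, assumed to be the
    output of a grammar-extraction step (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for word, supertag in pairs:
        counts[word][supertag] += 1

    probs = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        probs[word] = {tag: n / total for tag, n in tag_counts.items()}
    return probs


# Toy data: the supertag labels are invented for illustration only.
pairs = [
    ("mange", "alpha_nx0Vnx1"),  # transitive use
    ("mange", "alpha_nx0Vnx1"),
    ("mange", "alpha_nx0V"),     # intransitive use
    ("pomme", "alpha_N"),
]
probs = supertag_probabilities(pairs)
```

A real supertagger would also need smoothing for unseen words and contextual features; this only shows the counting step.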
A prerequisite for these directions is to get/create a French (constituency and dependency) treebank. Questions arising:
- availability (open source corpus, see e.g. Project Gutenberg)
- grammaticality and complexity (automatic adequacy computation)
- get parse structures (run the Stanford Parser on selected texts)
- French corpus frWaC from the WaCKy project: frWaC cannot be downloaded directly; it is available via web query interfaces or via download on request.
Next meeting: 2016/12/08 - 16h Room A217