Date: 11 May 2017
Speaker: Claire Gardent, CNRS/LORIA, Nancy
Place: Google, Zurich
Title: Creating Benchmarks for Natural Language Generation Micro-planners: The WebNLG Challenge
Joint work with Anastasia Shimorina (CNRS/LORIA, Nancy), Shashi Narayan (School of Informatics, University of Edinburgh) and Laura Perez-Beltrachini (School of Informatics, University of Edinburgh)
In Natural Language Generation, micro-planning focuses on modelling the complex interactions between lexicalisation, sentence segmentation, surface realisation, aggregation and referring expression generation.
In my talk, I will introduce a novel framework for semi-automatically creating, from RDF datasets, data-to-text corpora that can be used (i) to train RDF verbalisers and (ii) to benchmark micro-planners. This framework combines a content selection method, which automatically selects from existing RDF knowledge bases the data units that can serve as input for generation, with a crowdsourcing methodology for associating these data units with high-quality texts verbalising their content.
I will present a dataset created by applying this framework to DBpedia data, compare it with existing datasets, and argue that, because our framework pairs data of varying size and shape with verbalisations ranging from simple clauses to short texts, a dataset created using this framework provides a challenging benchmark for micro-planning.
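To make "data of varying size and shape" more concrete, here is a minimal Python sketch. It assumes that a data unit is a list of (subject, property, object) RDF triples, that size is the number of triples, and that shape can be roughly labelled as "sibling" (all triples share one subject), "chain" (the object of each triple is the subject of the next) or "mixed". The function names, the shape labels and the DBpedia-style triples are hypothetical illustrations, not the WebNLG implementation or data.

```python
# Hypothetical sketch: characterise an RDF data unit by size and shape.
# Assumption: a data unit is a list of (subject, property, object) triples.
Triple = tuple[str, str, str]


def input_size(data_unit: list[Triple]) -> int:
    """Size of a data unit = number of triples it contains."""
    return len(data_unit)


def input_shape(data_unit: list[Triple]) -> str:
    """Assumed shape labels: 'single', 'sibling', 'chain' or 'mixed'."""
    if not data_unit:
        raise ValueError("a data unit needs at least one triple")
    if len(data_unit) == 1:
        return "single"
    # Sibling: every triple describes the same subject entity.
    if len({s for s, _, _ in data_unit}) == 1:
        return "sibling"
    # Chain: each subject occurs once, and following subject -> object
    # from the unique start entity visits every triple exactly once.
    by_subject = {s: (s, p, o) for s, p, o in data_unit}
    if len(by_subject) == len(data_unit):
        objects = {o for _, _, o in data_unit}
        starts = [s for s in by_subject if s not in objects]
        if len(starts) == 1:
            current, visited = starts[0], set()
            while current in by_subject and current not in visited:
                visited.add(current)
                current = by_subject[current][2]
            # A true chain ends on an entity that is never a subject.
            if len(visited) == len(data_unit) and current not in by_subject:
                return "chain"
    return "mixed"


# Illustrative data units (hypothetical property names).
sibling = [("Alan_Bean", "nationality", "United_States"),
           ("Alan_Bean", "occupation", "Test_pilot")]
chain = [("Alan_Bean", "mission", "Apollo_12"),
         ("Apollo_12", "operator", "NASA")]
mixed = [("Alan_Bean", "mission", "Apollo_12"),
         ("Alan_Bean", "occupation", "Test_pilot"),
         ("Apollo_12", "operator", "NASA")]

for unit in (sibling, chain, mixed):
    print(input_size(unit), input_shape(unit))
# prints: 2 sibling / 2 chain / 3 mixed
```

Grouping inputs along these two axes is one simple way to check that a benchmark covers both small, flat inputs and larger, more structured ones.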
To encourage researchers to take up this challenge, we made available a dataset of 21,855 data/text pairs created using this framework in the context of the WebNLG shared task.