The challenge is over. Results and data to download are published online.
If you plan to participate in the WebNLG challenge, here is how it goes. All requests should be sent to firstname.lastname@example.org
- 1 July - 22 August 2017: Test data submission period
  - Fill in the form and retrieve data
  - Submit test data outputs at the latest 48 hours after download and no later than 22 August.
- 22 August 2017: Final deadline for submission of test data outputs
- 22 August - 4 September 2017: Evaluation period
- 5 September 2017: WebNLG meeting at INLG 2017 and presentation of the results of the automatic evaluation
- October 2017: Results of Human Evaluation published online
The test data will consist of around 1700 meaning representations (sets of DBpedia triples), equally distributed in terms of size (1 to 7 triples) and divided into two halves. The first half will contain inputs from DBpedia categories that have been seen in the training data (Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam, City, and WrittenWork). The second half will contain inputs extracted for entities belonging to 5 unseen categories.
The results must be submitted to the organisers by email (email@example.com) within 48 hours of the organisers sending the data. To allow for a fair comparison, late submissions will be rejected.
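The two timing constraints (48 hours after download, and no later than 22 August 2017) can be combined into a single effective deadline. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the cut-off at the end of 22 August is an assumption, since no time of day or time zone is stated here.

```python
from datetime import datetime, timedelta

# Assumption: the final deadline falls at the end of 22 August 2017.
# The text above gives the date only, not a time of day or time zone.
FINAL_DEADLINE = datetime(2017, 8, 22, 23, 59, 59)
SUBMISSION_WINDOW = timedelta(hours=48)


def submission_deadline(download_time: datetime) -> datetime:
    """Return the earlier of (download time + 48 hours) and the final deadline."""
    return min(download_time + SUBMISSION_WINDOW, FINAL_DEADLINE)


def is_on_time(download_time: datetime, submit_time: datetime) -> bool:
    """Check whether a submission arrives before its effective deadline."""
    return submit_time <= submission_deadline(download_time)
```

For a download late in the submission period, the final deadline takes over: data retrieved on 21 August at noon would be due by the end of 22 August under this assumption, not 48 hours later.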
In addition to system outputs, participants are requested to email the organisers (firstname.lastname@example.org) a two-page description of their system. This description will be made available on the WebNLG challenge portal.
Test data will be in the same format as training data (see documentation), but without <lex> sections. Each set of DBpedia triples has an ID.
An example of test data is here.
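As a rough illustration of reading such a file, the sketch below parses a small XML sample and collects the triples for each ID. The element and attribute names (`entry`, `eid`, `category`, `size`, `mtriple`) and the `subject | property | object` layout are assumptions based on the training data format and should be checked against the documentation; the sample content is a placeholder, not real challenge data.

```python
import xml.etree.ElementTree as ET

# Placeholder sample. Element and attribute names are assumptions;
# check them against the documentation of the training data format.
SAMPLE = """<benchmark>
  <entries>
    <entry category="Airport" eid="Id1" size="1">
      <modifiedtripleset>
        <mtriple>Entity_A | propertyOne | Value_B</mtriple>
      </modifiedtripleset>
    </entry>
    <entry category="Airport" eid="Id2" size="2">
      <modifiedtripleset>
        <mtriple>Entity_C | propertyTwo | Value_D</mtriple>
        <mtriple>Entity_C | propertyThree | Value_E</mtriple>
      </modifiedtripleset>
    </entry>
  </entries>
</benchmark>"""


def read_triple_sets(xml_text):
    """Return a list of (id, category, triples), one item per entry."""
    root = ET.fromstring(xml_text)
    triple_sets = []
    for entry in root.iter("entry"):
        triples = [
            tuple(part.strip() for part in mtriple.text.split("|"))
            for mtriple in entry.iter("mtriple")
        ]
        triple_sets.append((entry.get("eid"), entry.get("category"), triples))
    return triple_sets


for entry_id, category, triples in read_triple_sets(SAMPLE):
    print(entry_id, category, triples)
```

Splitting on `|` is a simplification that would break if a value itself contained that character.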
Your submission file must be in plain text, lowercased and tokenised. Multiple verbalisations per set of DBpedia triples are not allowed.
An example of a submission file is here.
Each line corresponds to a verbalisation of a DBpedia triple set. Line 1 must contain the verbalisation of the DBpedia triple set with ID=1, line 2 that of the triple set with ID=2, and so on.
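The submission requirements above (plain text, lowercased, tokenised, exactly one verbalisation per line in ID order) could be enforced with a small helper like the one below. This is a sketch only: the function names are hypothetical, and the regex tokeniser is a simple stand-in, since the tokenisation scheme the organisers use is not specified here.

```python
import re


def tokenise(text):
    """Lowercase the text and separate punctuation from words.

    Assumption: a simple regex tokeniser is good enough for illustration.
    """
    return " ".join(re.findall(r"\w+|[^\w\s]", text.lower()))


def build_submission(outputs):
    """Build the submission file content from a dict mapping ID -> text.

    IDs must run from 1 to N without gaps, so that line k holds the
    verbalisation of the triple set with ID=k.
    """
    expected_ids = list(range(1, len(outputs) + 1))
    if sorted(outputs) != expected_ids:
        raise ValueError("IDs must run from 1 to N with no gaps")
    lines = [tokenise(outputs[i]) for i in expected_ids]
    return "\n".join(lines) + "\n"


# Placeholder system outputs, given out of order on purpose.
content = build_submission({2: "It serves Aarhus.", 1: "Hello, World!"})
print(content, end="")
```

Because the tokeniser discards all whitespace, including line breaks inside an output, each verbalisation is guaranteed to occupy a single line.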
Evaluation will proceed in two steps.
First, the results of automatic metrics (BLEU, TER, METEOR) will be provided. We will provide global and detailed results (per DBpedia category, per input size, per category and input size, etc.). These results will be presented at the INLG conference in Santiago de Compostela, Spain, on September 5th.
Second, the results of a human evaluation will be provided. The human evaluation will assess criteria such as fluency, grammaticality and appropriateness (does the text correctly verbalise the input?).