The WebNLG Baseline System

For the WebNLG challenge, we provide a baseline system which can serve as a starting point for your experiments.

Scripts to reproduce our experiments are available on GitLab.

Preparing data

Linearisation, tokenisation, delexicalisation

  • Unpack the archive with the WebNLG dataset into a data-directory folder.
  • Run the preprocessing script.

    python3 webnlg_baseline_input.py -i <data-directory>

The script extracts tripleset-lexicalisation pairs, linearises the triples, tokenises the text, delexicalises entities via exact string matching, and writes source and target files.

After preprocessing, an original "tripleset-lexicalisation" pair [1] is converted into a pair of source and target sequences [2].

Original [1]

<modifiedtripleset>
    <mtriple>Indonesia | leaderName | Jusuf_Kalla</mtriple>
    <mtriple>Bakso | region | Indonesia</mtriple>
    <mtriple>Bakso | ingredient | Noodle</mtriple>
    <mtriple>Bakso | country | Indonesia</mtriple>
</modifiedtripleset>
<lex>
Bakso is a food containing noodles;it is found in Indonesia where Jusuf Kalla is the leader.
</lex>

Modified [2]

source files *.triple:

COUNTRY leaderName LEADERNAME FOOD region COUNTRY FOOD ingredient INGREDIENT FOOD country COUNTRY

target files *.lex:

FOOD is a food containing noodles ; it is found in COUNTRY where LEADERNAME is the leader .

The script writes training and validation files which are used as input to neural generation, as well as reference files for evaluation.
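
To make the transformation concrete, the sketch below shows roughly how linearisation and exact-match delexicalisation could be implemented. It is only an illustration, not the actual webnlg_baseline_input.py code; the delexicalise function and the categories mapping from entity strings to placeholder labels (e.g. Bakso -> FOOD) are assumptions.

    # Rough sketch of linearisation + exact-match delexicalisation.
    # This is NOT the actual webnlg_baseline_input.py: the function name
    # and the 'categories' mapping (entity string -> placeholder label,
    # e.g. 'Bakso' -> 'FOOD') are assumptions for illustration.

    def delexicalise(triples, lexicalisation, categories):
        """triples: list of (subject, property, object) tuples;
        lexicalisation: the target sentence(s) as a string."""
        source_tokens = []
        target = lexicalisation
        for subj, prop, obj in triples:
            # Linearise one triple after another, replacing entities
            # by their placeholder labels where a label is known.
            source_tokens += [categories.get(subj, subj.replace('_', ' ')),
                              prop,
                              categories.get(obj, obj.replace('_', ' '))]
            # Delexicalise the target side by exact string matching.
            for entity in (subj, obj):
                if entity in categories:
                    target = target.replace(entity.replace('_', ' '),
                                            categories[entity])
        return ' '.join(source_tokens), target

    # e.g. delexicalise([('Bakso', 'country', 'Indonesia')],
    #                   'Bakso is found in Indonesia.',
    #                   {'Bakso': 'FOOD', 'Indonesia': 'COUNTRY'})
    # returns ('FOOD country COUNTRY', 'FOOD is found in COUNTRY.')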

Training a model and generating verbalisations

A simple sequence-to-sequence model with attention was trained with the OpenNMT toolkit, using its default parameters for both training and translation.

  1. Install OpenNMT.

  2. Navigate to the OpenNMT directory.

  3. Process data files and convert them to the OpenNMT format.

    th preprocess.lua -train_src <data-directory>/train-webnlg-all-delex.triple -train_tgt <data-directory>/train-webnlg-all-delex.lex -valid_src <data-directory>/dev-webnlg-all-delex.triple -valid_tgt <data-directory>/dev-webnlg-all-delex.lex -src_seq_length 70 -tgt_seq_length 70 -save_data baseline

    A baseline-train.t7 file will be generated; it is used in the training phase.

  4. Train the model.

    th train.lua -data baseline-train.t7 -save_model baseline

    After training for 13 epochs, the script outputs the model file baseline_epoch13_*.t7. Training takes several hours on a GPU.

  5. Translate.

    th translate.lua -model baseline_epoch13_*.t7 -src <data-directory>/dev-webnlg-all-delex.triple -output baseline_predictions.txt

    The script generates the file baseline_predictions.txt.

Relexicalisation

  • Relexicalise data.

    python3 webnlg_relexicalise.py -i <data-directory> -f <OpenNMT-directory>/baseline_predictions.txt

    The script generates the file relexicalised_predictions.txt, in which the placeholders are replaced with the original RDF subjects and objects.
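
    Relexicalisation is essentially the reverse of the delexicalisation step above. The sketch below is only an illustration, not the actual webnlg_relexicalise.py code; the relexicalise function and the per-instance placeholder_map are assumptions.

    # Rough sketch of relexicalisation (the reverse of delexicalisation).
    # This is NOT the actual webnlg_relexicalise.py: the function name and
    # the per-instance 'placeholder_map' are assumptions for illustration.

    def relexicalise(prediction, placeholder_map):
        """prediction: one delexicalised line produced by the model;
        placeholder_map: e.g. {'FOOD': 'Bakso', 'COUNTRY': 'Indonesia'}."""
        tokens = []
        for token in prediction.split():
            # Placeholder tokens are replaced by the original entity
            # strings; all other tokens are copied through unchanged.
            tokens.append(placeholder_map.get(token, token))
        return ' '.join(tokens)

    # e.g. relexicalise('FOOD is found in COUNTRY .',
    #                   {'FOOD': 'Bakso', 'COUNTRY': 'Indonesia'})
    # returns 'Bakso is found in Indonesia .'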

Evaluating on a development set

  • BLEU-score

    Calculate BLEU on the development set. We use multi-bleu.pl from Moses SMT. (Note that the official script for MT evaluation is mteval-v13a.pl.)

    ./calculate_bleu_dev.sh

    BLEU = 54.03

    Additional note about BLEU scoring: multi-bleu.pl does not work properly when test instances have different numbers of references (e.g., one test instance has 3 references and another has 5), which is why the challenge evaluation was done with three references only. Consider using other scripts to calculate BLEU:

      • SacreBLEU (produces official WMT scores)
      • BLEU from NLTK (different smoothing methods available)
      • Maluuba metrics for NLG
      • metrics used for the E2E Challenge
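
    As an example, a corpus-level BLEU score over relexicalised_predictions.txt can be computed with the SacreBLEU Python API roughly as follows (a sketch; the reference file names and the use of exactly three aligned reference files are assumptions):

    # Sketch: corpus-level BLEU with SacreBLEU (pip install sacrebleu).
    # The reference file names are assumptions for illustration.
    import sacrebleu

    with open('relexicalised_predictions.txt') as f:
        hypotheses = [line.strip() for line in f]

    # One reference stream per file; each stream must contain one line
    # per hypothesis, in the same order.
    reference_streams = []
    for ref_file in ('refs0.txt', 'refs1.txt', 'refs2.txt'):
        with open(ref_file) as f:
            reference_streams.append([line.strip() for line in f])

    bleu = sacrebleu.corpus_bleu(hypotheses, reference_streams)
    print(round(bleu.score, 2))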

  • Prepare input files for other evaluation metrics.

    python3 metrics.py

  • METEOR

    Download and install METEOR.

    Navigate to the METEOR directory (cd meteor-1.5/).

    java -Xmx2G -jar meteor-1.5.jar <data-directory>/relexicalised_predictions.txt <data-directory>/all-notdelex-refs-meteor.txt -l en -norm -r 8

    METEOR = 0.39

  • TER

    Download and install TER.

    Navigate to the TER directory (cd tercom-0.7.25/).

    java -jar tercom.7.25.jar -h <data-directory>/relexicalised_predictions-ter.txt -r <data-directory>/all-notdelex-refs-ter.txt

    TER = 0.40