For the WebNLG challenge, we provide a baseline system which can serve as a starting point for your experiments.
Scripts to reproduce our experiments are available on GitLab.
Linearisation, tokenisation, delexicalisation
- Unpack the archive with the WebNLG dataset into a
Run a preprocessing script.
python3 webnlg_baseline_input.py -i <data-directory>
The script extracts tripleset-lexicalisation pairs, linearises the triples, tokenises and delexicalises the text using exact matching, and writes source and target files.
<modifiedtripleset> <mtriple>Indonesia | leaderName | Jusuf_Kalla</mtriple> <mtriple>Bakso | region | Indonesia</mtriple> <mtriple>Bakso | ingredient | Noodle</mtriple> <mtriple>Bakso | country | Indonesia</mtriple> </modifiedtripleset> <lex> Bakso is a food containing noodles;it is found in Indonesia where Jusuf Kalla is the leader. </lex>
source files *.triple:
COUNTRY leaderName LEADERNAME FOOD region COUNTRY FOOD ingredient INGREDIENT FOOD country COUNTRY
target files *.lex:
FOOD is a food containing noodles ; it is found in COUNTRY where LEADERNAME is the leader .
The script writes training and validation files which are used as input to neural generation, as well as reference files for evaluation.
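As a rough illustration of the three steps named above, the following minimal sketch reproduces the Bakso example. It is not the code of webnlg_baseline_input.py: the function names and the hand-written `categories` mapping are hypothetical, and the way the real script derives entity categories is not shown here.

```python
import re

def linearise(triples):
    """Flatten (subject, property, object) triples into one token sequence."""
    return " ".join(" ".join(triple) for triple in triples)

def tokenise(text):
    """Very rough tokeniser: separate punctuation from words."""
    return " ".join(re.findall(r"\w+|[^\w\s]", text))

def delexicalise(triples, text, categories):
    """Replace entities with category placeholders using exact matching.

    `categories` (hypothetical, written by hand) maps an entity as it
    appears in the triples to its placeholder.
    """
    delex_triples = [
        (categories.get(s, s), p, categories.get(o, o)) for s, p, o in triples
    ]
    # Longest entities first, so a short name cannot clobber a longer one.
    for entity in sorted(categories, key=len, reverse=True):
        surface = entity.replace("_", " ")
        # Exact, case-sensitive match: "Noodle" does not match "noodles".
        text = text.replace(surface, categories[entity])
    return linearise(delex_triples), tokenise(text)

triples = [
    ("Indonesia", "leaderName", "Jusuf_Kalla"),
    ("Bakso", "region", "Indonesia"),
    ("Bakso", "ingredient", "Noodle"),
    ("Bakso", "country", "Indonesia"),
]
categories = {
    "Indonesia": "COUNTRY",
    "Jusuf_Kalla": "LEADERNAME",
    "Bakso": "FOOD",
    "Noodle": "INGREDIENT",
}
lex = ("Bakso is a food containing noodles;it is found in Indonesia "
       "where Jusuf Kalla is the leader.")

source, target = delexicalise(triples, lex, categories)
print(source)
print(target)
```

Because the match is exact, "noodles" stays in the target text even though Noodle is delexicalised on the source side.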
Training a model and generating verbalisations
A simple sequence-to-sequence model with an attention mechanism was trained with the OpenNMT toolkit, using the default parameters for training and translating.
Navigate to the OpenNMT directory.
Process data files and convert them to the OpenNMT format.
th preprocess.lua -train_src <data-directory>/train-webnlg-all-delex.triple -train_tgt <data-directory>/train-webnlg-all-delex.lex \
  -valid_src <data-directory>/dev-webnlg-all-delex.triple -valid_tgt <data-directory>/dev-webnlg-all-delex.lex \
  -src_seq_length 70 -tgt_seq_length 70 -save_data baseline
This generates the file baseline-train.t7, which is used in the training phase.
Train the model.
th train.lua -data baseline-train.t7 -save_model baseline
After training for 13 epochs, the script outputs the model file baseline_epoch13_*.t7. Training takes several hours on a GPU.
Translate the development set with the trained model.
th translate.lua -model baseline_epoch13_*.t7 -src <data-directory>/dev-webnlg-all-delex.triple -output baseline_predictions.txt
The script generates the file baseline_predictions.txt.
Relexicalise the predictions.
python3 webnlg_relexicalise.py -i <data-directory> -f <OpenNMT-directory>/baseline_predictions.txt
The script generates the file relexicalised_predictions.txt with the initial RDF subjects and objects.
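Relexicalisation reverses the delexicalisation step: each placeholder in a prediction is replaced by the entity it stood for. The sketch below only illustrates the idea. The function name and the per-instance `placeholders` mapping are hypothetical, and webnlg_relexicalise.py may handle this differently.

```python
def relexicalise(prediction, placeholders):
    """Replace each placeholder token with its original entity.

    `placeholders` (hypothetical) maps a placeholder to the RDF subject or
    object it replaced in this instance. Underscores in entity names are
    turned into spaces.
    """
    tokens = [
        placeholders[tok].replace("_", " ") if tok in placeholders else tok
        for tok in prediction.split()
    ]
    return " ".join(tokens)

placeholders = {
    "FOOD": "Bakso",
    "COUNTRY": "Indonesia",
    "LEADERNAME": "Jusuf_Kalla",
    "INGREDIENT": "Noodle",
}
prediction = ("FOOD is a food containing noodles ; it is found in COUNTRY "
              "where LEADERNAME is the leader .")
relexicalised = relexicalise(prediction, placeholders)
print(relexicalised)
```

The output stays tokenised, since only placeholder tokens are touched.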
Evaluating on a development set
BLEU = 54.03
Additional note about BLEU scoring: multi-bleu.pl does not work properly when test instances have different numbers of references (e.g., one test instance has 3 references and another has 5), which is why the challenge evaluation was done with three references only. Consider using other scripts to calculate BLEU:
- SacreBLEU (produces official WMT scores)
- BLEU from NLTK (different smoothing methods available)
- Maluuba metrics for NLG
- metrics used for E2E Challenge
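To show why a varying number of references need not be a problem, here is a minimal, unsmoothed corpus-level BLEU sketch in plain Python that keeps the references as one list per instance. It is an illustration only, not a replacement for the tools listed above. The function names are hypothetical, the input is assumed to be tokenised already, and its scores are not guaranteed to match multi-bleu.pl.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU on a 0-100 scale.

    references[i] is the list of references for hypotheses[i];
    the lists may have different lengths.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        refs_tokens = [ref.split() for ref in refs]
        hyp_len += len(hyp_tokens)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min(
            (abs(len(r) - len(hyp_tokens)), len(r)) for r in refs_tokens
        )[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            max_ref_counts = Counter()
            for r in refs_tokens:
                max_ref_counts |= ngrams(r, n)  # element-wise maximum
            clipped = hyp_counts & max_ref_counts  # element-wise minimum
            matches[n - 1] += sum(clipped.values())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(
        math.log(m / t) for m, t in zip(matches, totals)
    ) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

hypotheses = ["the cat sat on the mat", "a b c d e"]
references = [
    ["the cat sat on the mat", "a cat was on the mat"],  # 2 references
    ["a b c d e", "a b c", "e d c b a"],                 # 3 references
]
score = corpus_bleu(hypotheses, references)
print(score)
```

Each instance is clipped against its own references, so nothing forces all instances to share one reference count.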
Prepare input files for other evaluation metrics.
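The file names used in the commands below suggest one input format per metric. The sketch below shows how such files could be laid out, under two assumptions that should be checked against the METEOR and TER documentation: that METEOR with `-r 8` expects exactly eight reference lines per instance (padded here with empty lines), and that tercom expects every line to end with a segment id in parentheses, shared by a hypothesis and all of its references. All function names are hypothetical and this is not the baseline's own script.

```python
def meteor_reference_lines(references, max_refs=8):
    """Lay out references as `max_refs` consecutive lines per instance.

    Assumption: instances with fewer references are padded with empty lines.
    """
    lines = []
    for refs in references:
        if len(refs) > max_refs:
            raise ValueError("instance has more than max_refs references")
        lines.extend(refs + [""] * (max_refs - len(refs)))
    return lines

def ter_reference_lines(references):
    """One line per reference, suffixed with the id of its instance."""
    return [
        f"{ref} (id{i})"
        for i, refs in enumerate(references, start=1)
        for ref in refs
    ]

def ter_hypothesis_lines(hypotheses):
    """One line per hypothesis, suffixed with the matching instance id."""
    return [f"{hyp} (id{i})" for i, hyp in enumerate(hypotheses, start=1)]

references = [
    ["first reference", "second reference", "third reference"],
    ["only reference"],
]
hypotheses = ["first prediction", "second prediction"]

meteor_refs = meteor_reference_lines(references, max_refs=3)
ter_refs = ter_reference_lines(references)
ter_hyps = ter_hypothesis_lines(hypotheses)
print("\n".join(ter_hyps))
```

Writing each list to a file with one element per line gives files of the shape assumed above.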
Download and install METEOR.
Navigate to the METEOR directory (
java -Xmx2G -jar meteor-1.5.jar <data-directory>/relexicalised_predictions.txt <data-directory>/all-notdelex-refs-meteor.txt -l en -norm -r 8
METEOR = 0.39
Download and install TER.
Navigate to the TER directory (
java -jar tercom.7.25.jar -h <data-directory>/relexicalised_predictions-ter.txt -r <data-directory>/all-notdelex-refs-ter.txt
TER = 0.40