Date: 1 September 2016
Speaker: Yassine Mrabet, Lister Hill National Center for Biomedical Communications, National Library of Medicine, USA
Unsupervised Ranking of Knowledge Bases for Named Entity Recognition
With the continuous growth of freely accessible knowledge bases and the heterogeneity of textual corpora, selecting the most adequate knowledge base for named entity recognition is becoming a challenge in itself. In this talk, we will present an unsupervised method to rank knowledge bases according to their adequacy for recognizing named entities in a given corpus. Building on a state-of-the-art unsupervised entity linking approach, we propose several evaluation metrics that measure the lexical and structural adequacy of a knowledge base for a given corpus. We study the correlation between these metrics and three standard performance measures: precision, recall and F1 score. Our multi-domain experiments on 9 different corpora with 6 knowledge bases show that three of the proposed metrics are strong performance predictors, with Pearson correlations of 0.62 to 0.76 with precision and 0.96 with both recall and F1 score.
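The evaluation described above (rank knowledge bases by a metric, then check how well that metric tracks precision, recall and F1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: it assumes each knowledge base already has a score from some adequacy metric, since the abstract does not define the lexical and structural metrics themselves. The function names `rank_knowledge_bases`, `pearson` and `f1_score`, the knowledge base names, and all numeric values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations (the 1/n factors cancel out).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank_knowledge_bases(metric_scores):
    """Order KB names by a precomputed adequacy score, highest first.

    Hypothetical helper: `metric_scores` maps a KB name to a score; how
    that score is computed (the talk's metrics) is not reproduced here.
    """
    return sorted(metric_scores, key=metric_scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy values for illustration only, NOT results from the talk.
    metric = {"KB_A": 0.42, "KB_B": 0.81, "KB_C": 0.15}
    observed_f1 = {"KB_A": 0.55, "KB_B": 0.70, "KB_C": 0.31}
    names = list(metric)
    print(rank_knowledge_bases(metric))  # ['KB_B', 'KB_A', 'KB_C']
    r = pearson([metric[k] for k in names], [observed_f1[k] for k in names])
    print(f"Pearson r between metric and F1: {r:.2f}")
```

The ranking step needs no gold annotations, which is what makes it unsupervised; the correlation step, by contrast, needs annotated corpora and serves only to validate that a metric is a good predictor of precision, recall or F1.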