One class per named entity

CREATED: 200612290843
Authors: Wong Yingchuan, Ng Hwee Tou

** Named entity recognition

  • treat as classification into 17 classes
  • use maximum entropy modeling
  • features
    • local: same sentence
    • global

** Observation in CoNLL 2003

  • 98% of NE types have exactly one class
  • 91% of NE tokens have exactly one class
  • Baseline algorithm based on the majority tag does quite well (~80%) on seen entities
  • Difficulty lies in unseen entities
  • Therefore majority tag provides useful information
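
A minimal sketch of the majority-tag baseline described above, to make the idea concrete. The function names, the tag strings (`PER`, `LOC`, `ORG`, `O`) and the toy data are assumptions for illustration; the note does not say how the baseline was implemented.

```python
from collections import Counter, defaultdict

def train_majority_tagger(labelled):
    """Map each entity string to the tag it most often carries.

    `labelled` is an iterable of (entity_string, tag) pairs (assumed format).
    """
    counts = defaultdict(Counter)
    for entity, tag in labelled:
        counts[entity][tag] += 1
    return {entity: tags.most_common(1)[0][0] for entity, tags in counts.items()}

def predict(majority, entity, default="O"):
    """Return the majority tag for a seen entity, else a fallback tag."""
    return majority.get(entity, default)

# Toy training data, invented for this example.
train = [("Bush", "PER"), ("Bush", "PER"), ("Bush", "LOC"), ("Reuters", "ORG")]
majority = train_majority_tagger(train)

print(predict(majority, "Bush"))       # seen entity: PER
print(predict(majority, "Singapore"))  # unseen entity: falls back to O
```

The second call shows where the baseline breaks down: an unseen entity has no majority tag, so the tagger can only fall back to a default.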

** Exploiting unlabelled text, U

train h1 using labelled data

label U using h1 to get U’

extract majority tags list L from U’

  • L = list of (NE, MajTag)
  • case sensitive (Bush vs bush); an NE appearing only once is pruned away

train h2 using labelled text and L

  • L is used as an additional feature

use h2 and L to evaluate test set

improves performance on unseen entities
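
The list-extraction step can be sketched roughly as follows. This is an illustration under assumptions, not the authors' code: `build_majority_list`, `majority_tag_feature`, the `maj_tag` feature name and the `NONE` placeholder are all hypothetical, and U’ is assumed to be available as (entity, tag) pairs produced by h1. The `min_count=2` default mirrors the note's rule that an NE appearing only once is pruned.

```python
from collections import Counter, defaultdict

def build_majority_list(auto_labelled, min_count=2):
    """Build L = {NE: MajTag} from the automatically labelled text U'.

    Entity strings are used as-is, so the lookup is case sensitive.
    Entities seen fewer than `min_count` times are pruned.
    """
    counts = defaultdict(Counter)
    for entity, tag in auto_labelled:
        counts[entity][tag] += 1
    majority_list = {}
    for entity, tags in counts.items():
        if sum(tags.values()) < min_count:
            continue  # appears only once: pruned away
        majority_list[entity] = tags.most_common(1)[0][0]
    return majority_list

def majority_tag_feature(majority_list, entity):
    """Extra feature for h2: the entity's majority tag, or a placeholder."""
    return {"maj_tag": majority_list.get(entity, "NONE")}

# Toy U', invented for this example: output of h1 on unlabelled text.
u_prime = [("Bush", "PER"), ("Bush", "PER"), ("Bush", "LOC"), ("Reuters", "ORG")]
L = build_majority_list(u_prime)

print(L)                                # {'Bush': 'PER'} ('Reuters' is pruned)
print(majority_tag_feature(L, "Bush"))  # {'maj_tag': 'PER'}
print(majority_tag_feature(L, "bush"))  # {'maj_tag': 'NONE'} (case sensitive)
```

In this sketch the feature dictionary would simply be merged into whatever local and global features h2 already uses; how the original system encoded it is not stated in the note.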

** Results

  • using unlabelled text improves performance up to a certain point; beyond that, performance drops slightly
  • the improvement is larger when less labelled data is available
