One class per named entity

CREATED: 200612290843
Authors: Wong Yingchuan, Ng Hwee Tou

** Named entity recognition

treat as classification into 17 classes
use maximum entropy modeling
features: local (same sentence) and global
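
For reference, the standard form of a maximum entropy classifier is sketched below. The symbols are assumptions for illustration, not taken from the note: $c$ is one of the 17 classes, $x$ is the token in context, $f_j$ are binary feature functions and $\lambda_j$ their learned weights.

```latex
p(c \mid x) = \frac{\exp\left(\sum_j \lambda_j f_j(x, c)\right)}{\sum_{c'} \exp\left(\sum_j \lambda_j f_j(x, c')\right)}
```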

** Observation in CoNLL 2003

98% of NE types have exactly one class
91% of NE tokens have exactly one class
A baseline algorithm based on the majority tag does quite well (~80%) on seen entities
Difficulty lies in unseen entities
Therefore the majority tag provides useful information
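
A minimal sketch of a majority-tag baseline as described above. The function names, the toy data, and the fallback tag `"O"` for unseen entities are assumptions for illustration, not details from the paper.

```python
from collections import Counter, defaultdict


def majority_tags(labelled):
    """Map each entity string to the class it is most often labelled with."""
    counts = defaultdict(Counter)
    for entity, tag in labelled:
        counts[entity][tag] += 1
    return {entity: c.most_common(1)[0][0] for entity, c in counts.items()}


def predict(entity, table, default="O"):
    """Return the majority tag for a seen entity, else a default (assumed) tag."""
    return table.get(entity, default)


# Toy labelled data (hypothetical): "Bush" is tagged PER twice and LOC once.
train = [("Bush", "PER"), ("Bush", "PER"), ("Bush", "LOC"), ("Reuters", "ORG")]
table = majority_tags(train)
```

The lookup has nothing to say about entities absent from `table`, which is where the note locates the difficulty.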

** Exploiting unlabelled text, U

train h1 using labelled data

label U using h1 to get U’

extract majority tags list L from U’

L = list of (NE, MajTag) pairs; case sensitive (Bush vs bush); an NE appearing only once is pruned away

train h2 using labelled text and L

L is used as an additional feature

use h2 and L to evaluate test set

improves performance on unseen entities
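
The list-extraction and feature steps above could be sketched roughly as follows. This assumes U' is available as (NE, tag) pairs produced by h1; the names `build_majority_list`, `add_majority_feature`, the feature key `maj_tag`, and the placeholder value `"NONE"` are hypothetical. Training h1 and h2 themselves is left out.

```python
from collections import Counter, defaultdict


def build_majority_list(auto_labelled, min_count=2):
    """Build L: a case-sensitive map from NE string to its majority tag in U'.

    Entities seen fewer than min_count times are pruned, so min_count=2
    drops every NE that appears only once.
    """
    counts = defaultdict(Counter)
    for entity, tag in auto_labelled:
        counts[entity][tag] += 1  # keys are not lowercased: Bush != bush
    majority = {}
    for entity, c in counts.items():
        if sum(c.values()) < min_count:
            continue
        majority[entity] = c.most_common(1)[0][0]
    return majority


def add_majority_feature(features, entity, majority_list):
    """Return a copy of the feature dict with the majority tag from L added."""
    out = dict(features)
    out["maj_tag"] = majority_list.get(entity, "NONE")
    return out


# Toy U' (hypothetical output of h1 on unlabelled text).
u_prime = [
    ("Bush", "PER"), ("Bush", "PER"), ("bush", "O"),
    ("Paris", "LOC"), ("Paris", "LOC"), ("Paris", "PER"),
]
L = build_majority_list(u_prime)
feats = add_majority_feature({"word": "Paris"}, "Paris", L)
```

The same `add_majority_feature` call would be applied both when training h2 and when labelling the test set, so that the feature is available to the classifier at both stages.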

** Results

using unlabelled text improves performance up to a certain point; beyond that, performance drops slightly
the improvement is larger when less labelled data is available
