Sense Disambiguation

From CSWiki
Revision as of 04:02, 26 August 2006 by Boo (talk | contribs)


Automatic Strategies for Sense Disambiguation

Similarity Algorithms

A broad category of disambiguation schemes starts with a similarity metric that assigns a score to every pair of synsets, then chooses one synset per word so that the total score across the chosen synsets is maximized.
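As a rough illustration of this family of schemes, the sketch below tries every combination of candidate senses and keeps the one with the highest summed pairwise similarity. The sense labels, the scores in `TOY_SIM`, and the function names are all made up for the example; a real system would take its scores from a WordNet-based measure.

```python
from itertools import combinations, product

# Hypothetical pairwise similarity scores between sense labels (toy data).
TOY_SIM = {
    frozenset(["bank.river", "water.liquid"]): 0.8,
    frozenset(["bank.money", "water.liquid"]): 0.1,
    frozenset(["bank.river", "fish.animal"]): 0.6,
    frozenset(["bank.money", "fish.animal"]): 0.05,
    frozenset(["bank.river", "fish.verb"]): 0.3,
    frozenset(["bank.money", "fish.verb"]): 0.05,
    frozenset(["water.liquid", "fish.animal"]): 0.7,
    frozenset(["water.liquid", "fish.verb"]): 0.2,
}

def toy_similarity(a, b):
    """Look up a symmetric score; unknown pairs score 0."""
    return TOY_SIM.get(frozenset([a, b]), 0.0)

def best_assignment(candidates, similarity):
    """Exhaustively pick one sense per word to maximize summed pairwise similarity.

    candidates: one list of candidate senses per word in the context.
    similarity: function (sense_a, sense_b) -> float.
    """
    best, best_score = None, float("-inf")
    for assignment in product(*candidates):
        score = sum(similarity(a, b) for a, b in combinations(assignment, 2))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# One candidate list per word: "bank", "water", "fish".
EXAMPLE_CANDIDATES = [
    ["bank.river", "bank.money"],
    ["water.liquid"],
    ["fish.animal", "fish.verb"],
]

if __name__ == "__main__":
    print(best_assignment(EXAMPLE_CANDIDATES, toy_similarity))
```

The exhaustive loop visits every combination, so it is only practical for short contexts.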


When inquisitive readers encounter a word they don't know, or a word used in a way that is unfamiliar to them, they look up the term in a dictionary. Confronted with multiple options for a term, they pick the definition (or gloss) that best fits the context in which the word appeared. Using this intuition, Mike Lesk developed an elegant scheme that chooses the sense of a word whose gloss has the largest number of word overlaps with the glosses of the other words in the context.
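The gloss-overlap idea can be sketched in a few lines. This is a simplified variant, not Lesk's original implementation: it counts shared content words between each candidate gloss and the pooled glosses of a context word. The glosses, sense labels, and stopword list are toy paraphrases invented for the example.

```python
# Small, hypothetical stopword list for the example.
STOPWORDS = {"a", "an", "the", "of", "or", "in", "to", "and", "from", "with"}

def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def lesk(target_glosses, context_glosses):
    """Pick the target sense whose gloss shares the most words with the context.

    target_glosses: dict mapping sense label -> gloss text.
    context_glosses: list of gloss texts for the surrounding words.
    """
    context = set()
    for gloss in context_glosses:
        context |= content_words(gloss)
    scores = {
        sense: len(content_words(gloss) & context)
        for sense, gloss in target_glosses.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy glosses for "pine" in the phrase "pine cone".
PINE_GLOSSES = {
    "pine.tree": "kind of evergreen tree with needle shaped leaves",
    "pine.yearn": "waste away through sorrow or illness",
}
CONE_GLOSSES = [
    "solid body which narrows to a point",
    "fruit of certain evergreen tree",
]

if __name__ == "__main__":
    print(lesk(PINE_GLOSSES, CONE_GLOSSES))
```

Here the tree sense of "pine" wins because its gloss shares "evergreen" and "tree" with a gloss of "cone".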

This work has been extended by Siddharth Patwardhan and Ted Pedersen, who build context vectors for the words in a WordNet gloss and in the glosses of connected synsets (reached via a number of other WordNet linkages) and use them as the basis for a new similarity measure.


Philip Resnik introduced the notion of information content for synsets, which provides a quantitative measure of distance between synsets that is separate from their topological separation within WordNet. Resnik took a corpus and, for every word occurrence in it, added one over the number of senses of that word to the count of each of its senses and of all of their hypernym ancestors.
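The counting step can be sketched as follows. The hierarchy, sense inventory, and corpus are toy data invented for the example, and the hierarchy is assumed to be a tree with a single root. Turning counts into information content as the negative log of a synset's relative frequency follows Resnik's usual definition, which the text above does not spell out.

```python
import math
from collections import defaultdict

# Hypothetical toy hierarchy: child synset -> hypernym parent.
HYPERNYM = {
    "bank.river": "landform",
    "river": "landform",
    "bank.money": "institution",
    "landform": "entity",
    "institution": "entity",
}
# Hypothetical sense inventory: word -> its candidate synsets.
SENSES = {"bank": ["bank.river", "bank.money"], "river": ["river"]}

def sense_counts(corpus):
    """Credit 1/(number of senses) to each sense of a word and to its ancestors."""
    counts = defaultdict(float)
    for word in corpus:
        senses = SENSES.get(word, [])
        for sense in senses:
            node = sense
            while node is not None:
                counts[node] += 1.0 / len(senses)
                node = HYPERNYM.get(node)
    return counts

def information_content(counts, root="entity"):
    """IC(c) = -log(count(c) / count(root)); the root has IC 0."""
    total = counts[root]
    return {node: -math.log(c / total) for node, c in counts.items()}

if __name__ == "__main__":
    counts = sense_counts(["bank", "river", "river"])
    print(dict(counts))
    print(information_content(counts))
```

In this toy corpus the ambiguous word "bank" splits its single occurrence evenly between its two senses, so "landform" accumulates 2.5 and the root accumulates 3.0.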

Computational Issues

As the number of words and the number of possible senses per word increase, it becomes more computationally expensive to find the ensemble of senses that maximizes the similarity measure under investigation, since the number of candidate ensembles is the product of the per-word sense counts. Cowie, Guthrie, and Guthrie created a disambiguation strategy that sampled from the possible ensembles using simulated annealing.
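A generic simulated-annealing search over sense ensembles might look like the sketch below. It is not a reconstruction of Cowie, Guthrie, and Guthrie's system: the objective (summed pairwise similarity), the cooling schedule, the parameter defaults, and the toy scores are all assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations

# Hypothetical pairwise similarity scores between sense labels (toy data).
TOY_SIM = {
    frozenset(["bank.river", "water.liquid"]): 0.8,
    frozenset(["bank.money", "water.liquid"]): 0.1,
    frozenset(["bank.river", "fish.animal"]): 0.6,
    frozenset(["bank.river", "fish.verb"]): 0.3,
    frozenset(["water.liquid", "fish.animal"]): 0.7,
    frozenset(["water.liquid", "fish.verb"]): 0.2,
}
CANDIDATES = [
    ["bank.river", "bank.money"],
    ["water.liquid"],
    ["fish.animal", "fish.verb"],
]

def toy_similarity(a, b):
    return TOY_SIM.get(frozenset([a, b]), 0.0)

def total_score(assignment, similarity):
    """Sum the similarity over every pair of chosen senses."""
    return sum(similarity(a, b) for a, b in combinations(assignment, 2))

def anneal(candidates, similarity, steps=500, start_temp=1.0, cooling=0.99, seed=0):
    """Sample sense ensembles, accepting worse ones with decreasing probability."""
    rng = random.Random(seed)
    current = [rng.choice(senses) for senses in candidates]
    current_score = total_score(current, similarity)
    best, best_score = list(current), current_score
    temp = start_temp
    for _ in range(steps):
        # Propose changing the sense of one randomly chosen word.
        i = rng.randrange(len(candidates))
        proposal = list(current)
        proposal[i] = rng.choice(candidates[i])
        proposal_score = total_score(proposal, similarity)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with prob exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cooling
    return tuple(best), best_score

if __name__ == "__main__":
    print(anneal(CANDIDATES, toy_similarity))
```

Each step scores only one proposed ensemble, so the cost grows with the number of steps rather than with the product of the sense counts, at the price of no guarantee that the best ensemble is found.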

Statistical Methods

Hiding a Semantic Hierarchy in a Markov Model

Although Abney and Light focused on the problem of disambiguation in the context of selectional restrictions, their approach is very similar to ours. For every part-of-speech relationship that links to a noun, they train a hidden Markov model that generates the word from a sense.

They were unable to improve upon Resnik's approach of using class-based measures to