Topic wsd

From CSWiki
Revision as of 11:56, 14 February 2007 by Ezubaric (talk | contribs) (Feb 14)


Feb 13

Created a class to extract documents from the BNC, splitting on one of three tags:

  • s - too small
  • p - right size, excludes speech
  • bncdoc - too big, variable

It then parses the paragraph with minipar.
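The tag-based extraction above can be sketched roughly as follows. This is only an illustration, not the actual class: it assumes the corpus file is well-formed XML, the `SAMPLE` snippet is a made-up stand-in for real BNC markup, `extract_units` is a hypothetical helper name, and the minipar parsing step is not shown.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document; real BNC markup is much richer than this.
SAMPLE = """<bncdoc>
  <p><s>The bank raised rates .</s> <s>Markets fell .</s></p>
  <p><s>We sat on the river bank .</s></p>
</bncdoc>"""


def extract_units(xml_text, tag):
    """Return the text of every element named `tag`, whitespace-normalized."""
    root = ET.fromstring(xml_text)
    units = []
    # iter() walks the whole tree, including the root element itself.
    for elem in root.iter(tag):
        text = " ".join("".join(elem.itertext()).split())
        if text:
            units.append(text)
    return units


sentences = extract_units(SAMPLE, "s")         # smallest unit
paragraphs = extract_units(SAMPLE, "p")        # mid-sized unit
documents = extract_units(SAMPLE, "bncdoc")    # whole document

print(len(sentences), len(paragraphs), len(documents))
```

Switching the tag changes only the granularity of the "documents" handed to later steps, which is the trade-off listed above.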

Tried getting topic WSD to work with JCN (Jiang-Conrath) similarity, using Pedersen's information content (IC) file and the WordNet 3.0 structure, but it gave horrible results.

Feb 14

Power went out, so the parser died. Thought that I should also be doing stemming and writing out LDA counts. Stemming will be:

  • See if morphy or the Porter stemmer have suggestions
  • If it's in WordNet, keep it and count it
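A minimal sketch of the stemming plan above, under several assumptions: `LEXICON` is a tiny hypothetical stand-in for a WordNet lookup, `toy_suffix_candidates` is a crude placeholder for whatever morphy or the Porter stemmer would suggest, and dropping words with no match is a guess, since the plan does not say what happens to them.

```python
from collections import Counter

# Hypothetical stand-in for "is this form in WordNet?".
LEXICON = {"bank", "rate", "raise", "river", "market", "fall"}

# (suffix, replacement) pairs tried by the toy candidate generator.
SUFFIX_RULES = (("s", ""), ("es", ""), ("ed", ""), ("ed", "e"),
                ("ing", ""), ("ing", "e"))


def toy_suffix_candidates(token):
    """Placeholder for morphy / Porter: propose forms by stripping suffixes."""
    candidates = [token]
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            candidates.append(token[:-len(suffix)] + replacement)
    return candidates


def normalize(token, candidate_fns, lexicon):
    """Return the first suggested form found in the lexicon, else None."""
    token = token.lower()
    for suggest in candidate_fns:
        for candidate in suggest(token):
            if candidate in lexicon:
                return candidate
    return None


def count_terms(tokens, candidate_fns, lexicon):
    """Count only the tokens that normalize to a lexicon entry."""
    counts = Counter()
    for token in tokens:
        lemma = normalize(token, candidate_fns, lexicon)
        if lemma is not None:  # assumption: unmatched words are dropped
            counts[lemma] += 1
    return counts


tokens = "The banks raised rates near the river bank".split()
counts = count_terms(tokens, [toy_suffix_candidates], LEXICON)
print(counts)
```

Passing the suggestion functions in as a list keeps the order of preference explicit, so a real morphy wrapper and a real Porter wrapper could be swapped in without changing the counting logic.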