To Do

New features:

  • Lin's grammatical similarity
  • Log conditional probability (as provided by Wortschatz)
  • Wikipedia links
  • Disambiguated HowNet
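
Note on the log conditional probability feature: Wortschatz is expected to supply the value directly. As a sanity check only, and assuming raw co-occurrence counts were available, it could be recomputed roughly as below. The function name and its arguments are hypothetical, not part of any Wortschatz interface.

```python
import math


def log_conditional_probability(pair_count, context_count):
    """Return log P(b | a) = log(count(a, b) / count(a)).

    pair_count and context_count are hypothetical raw counts; they are
    an assumption of this sketch, not a format provided by Wortschatz.
    """
    if pair_count <= 0 or context_count <= 0:
        raise ValueError("counts must be positive")
    if pair_count > context_count:
        raise ValueError("pair_count cannot exceed context_count")
    # Subtract logs rather than dividing first, to stay stable for large counts.
    return math.log(pair_count) - math.log(context_count)
```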

Learning ideas:

  • Try the lasso (L1-regularized regression)
  • High-recall learning technique: we don't really need good accuracy. If we can winnow the candidates down to the samples of interest and then hand those to a human for review, that should be good enough.
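
Rough sketch of the high-recall idea: pick the score cutoff that keeps at least a target fraction of the known positives, then pass everything above it to a human. This assumes some classifier already produces a score per sample and that a small labelled set exists. The function names and the 0.95 default are illustrative assumptions.

```python
import math


def threshold_for_recall(scores, labels, target_recall=0.95):
    """Return a score cutoff whose recall on the labelled data is at least target_recall."""
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    # Scores of the labelled positives, best first.
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # How many positives must survive the cut to reach the target recall.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def winnow(candidates, scores, threshold):
    """Keep the candidates scoring at or above the cutoff, for human review."""
    return [c for c, s in zip(candidates, scores) if s >= threshold]
```

Precision may be poor at such a cutoff, which is acceptable here because the human reviewer discards the false positives.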

Programming ideas:

  • Wrap everything in a single derived Python class, so queries can be formed in code instead of by juggling a bunch of ever-changing text files
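
One possible shape for that class, as a minimal sketch: it derives from dict (an assumption, since the base class is not decided yet) and maps a word pair to its feature values. The class name, method names, and feature names are all hypothetical.

```python
class PairFeatures(dict):
    """Maps (word_a, word_b) -> {feature_name: value}. Hypothetical design."""

    def add(self, pair, name, value):
        """Record one feature value for a word pair."""
        self.setdefault(pair, {})[name] = value

    def query(self, **conditions):
        """Yield pairs whose features satisfy every predicate.

        Each keyword is a feature name and each value is a one-argument
        predicate; pairs missing a named feature are skipped.
        """
        for pair, feats in self.items():
            if all(name in feats and pred(feats[name])
                   for name, pred in conditions.items()):
                yield pair
```

Usage, with made-up values for illustration:

```python
store = PairFeatures()
store.add(("cat", "dog"), "lin_similarity", 0.7)
store.add(("cat", "car"), "lin_similarity", 0.1)
similar = list(store.query(lin_similarity=lambda v: v > 0.5))
```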