April 28

Began writing code, prepared presentation.

April 29

Tested the Multinomial sampler and Dirichlet prior classes; conditioned sampling still needs testing.
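The classes themselves aren't recorded here. This is a minimal sketch of one possible reading of "conditioned sampling": drawing from a multinomial whose Dirichlet prior has been updated with observed counts. In that collapsed, posterior-predictive form, outcome k has probability (n_k + alpha_k) / (N + sum_j alpha_j). That interpretation is an assumption. The class `DirichletMultinomial` and its methods are hypothetical names.

```python
import random
from typing import List, Sequence


class DirichletMultinomial:
    """Hypothetical sketch: multinomial with a Dirichlet(alpha) prior, sampled
    from the posterior predictive given the counts observed so far."""

    def __init__(self, alpha: Sequence[float]) -> None:
        if any(a <= 0 for a in alpha):
            raise ValueError("Dirichlet concentration parameters must be positive")
        self.alpha: List[float] = list(alpha)
        self.counts: List[int] = [0] * len(self.alpha)

    def observe(self, k: int, n: int = 1) -> None:
        # Condition on n more observations of outcome k.
        self.counts[k] += n

    def predictive(self) -> List[float]:
        # P(k | counts, alpha) = (n_k + alpha_k) / (N + sum(alpha))
        total = sum(self.counts) + sum(self.alpha)
        return [(c + a) / total for c, a in zip(self.counts, self.alpha)]

    def sample(self, rng: random.Random = random.Random()) -> int:
        # Draw one outcome index from the posterior predictive.
        weights = self.predictive()
        return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

A simple way to test it is to call `sample` many times with a seeded `random.Random` and check that the empirical frequencies approach `predictive()`.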

April 30

Started processing the parsed files. Presented to the WN group and got the following suggestions:

  • For parsing using some order, switch to a tree parser
  • In a tree parser, the parent node can be used to define the distribution
  • Embed whole thing as graphical model (Jonathan's idea)

May 8

Things seem to be working. If speed is an issue, stop sampling corroborators for monosemous words.
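The notes don't describe how corroborators are sampled. This is a minimal sketch of the speed-up under one assumption: each word has a list of candidate senses, and for a monosemous word the per-word sampling step can only produce one result, so it can be skipped. `resample_all`, `senses_of` and `sample_sense` are hypothetical names, not the project's code.

```python
from typing import Callable, Dict, List, Sequence


def resample_all(
    tokens: Sequence[str],
    senses_of: Callable[[str], List[str]],
    sample_sense: Callable[[str, List[str]], str],
) -> Dict[int, str]:
    """Hypothetical sketch: run one sampling sweep over tokens.

    Words with exactly one candidate sense are assigned it directly,
    and the (possibly expensive) sampler is never called for them.
    """
    assignments: Dict[int, str] = {}
    for i, word in enumerate(tokens):
        candidates = senses_of(word)
        if len(candidates) == 1:
            # Monosemous: the outcome is fixed, so skip sampling entirely.
            assignments[i] = candidates[0]
        else:
            assignments[i] = sample_sense(word, candidates)
    return assignments
```

The saving scales with the share of monosemous tokens in the corpus, because those tokens never reach `sample_sense`.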

Need to vary hyperparameters ...

May 14

Normalizer: starting its accumulator at safe_log(0) causes problems when the values being normalized are all zero. Perhaps starting at NaN would be better.
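The definition of safe_log isn't in these notes. If it maps 0 to a large finite negative number (an assumption), an all-zero input normalizes to values that look valid but aren't. This minimal sketch of a log-space normalizer takes a different route. It uses a true `-math.inf` as log(0) and returns NaN explicitly in the all-zero case, in the spirit of the NaN suggestion. `log_add` and `normalize_log` are hypothetical names.

```python
import math
from typing import List, Sequence


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow; -inf stands for log(0)."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def normalize_log(log_values: Sequence[float]) -> List[float]:
    """Normalize log-space weights so that exp() of the result sums to 1.

    If every weight is zero (all -inf), return all NaN so the degenerate
    case is visible instead of looking like a valid distribution.
    """
    total = -math.inf  # true log(0), used here instead of safe_log(0)
    for v in log_values:
        total = log_add(total, v)
    if total == -math.inf:
        return [math.nan] * len(log_values)
    return [v - total for v in log_values]
```

A caller can then check `math.isnan` on the result and decide how to handle an empty distribution.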

May 15

To generate the input files:

  • run (note: you may have to modify the EXTENSION variable to match the input file names, which end in either ".bnc.parsed" or ".parsed")
  • This generates "role-count.dat" and "word-count.dat". To change the vocab files, sort them and rename "word-count.dat" to "sorted-word.dat" and "role-count.dat" to "sorted-role.dat"
  • Then run (again, modifying EXTENSION as needed)
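The notes don't say which key "sort these" uses or what the count files look like. This minimal sketch of the sort-and-rename step makes two assumptions: each line is "<token> <count>", and the vocabulary is ordered by descending count with ties broken alphabetically. `sort_count_lines` and `sort_count_file` are hypothetical helpers, not the project's scripts.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def sort_count_lines(lines: Iterable[str]) -> List[str]:
    """Sort "<token> <count>" lines by descending count, then token.

    ASSUMPTION: the count is the last whitespace-separated field.
    Blank lines are dropped.
    """
    rows: List[Tuple[str, int]] = []
    for line in lines:
        if not line.strip():
            continue
        token, count = line.rsplit(None, 1)
        rows.append((token, int(count)))
    rows.sort(key=lambda r: (-r[1], r[0]))
    return [f"{token} {count}" for token, count in rows]


def sort_count_file(src: str, dst: str) -> None:
    """Read a count file, sort it, and write the result under the new name."""
    sorted_lines = sort_count_lines(Path(src).read_text().splitlines())
    Path(dst).write_text("".join(line + "\n" for line in sorted_lines))
```

For example, `sort_count_file("word-count.dat", "sorted-word.dat")` followed by `sort_count_file("role-count.dat", "sorted-role.dat")`. If the vocab files turn out to need alphabetical order, only the `key` in `rows.sort` has to change.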