Began writing code, prepared presentation.
Tested the Multinomial sampler and Dirichlet prior classes; still need to test conditioned sampling.
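The notes don't show the actual classes, but a minimal sketch of what a Dirichlet prior paired with a Multinomial sampler might look like (class and parameter names are hypothetical; assumes numpy):

```python
import numpy as np

class DirichletPrior:
    """Symmetric Dirichlet prior over K outcomes (hypothetical sketch)."""
    def __init__(self, alpha, K, rng=None):
        self.alpha = np.full(K, alpha, dtype=float)
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def sample(self):
        # Draw one probability vector theta ~ Dirichlet(alpha).
        return self.rng.dirichlet(self.alpha)

class MultinomialSampler:
    """Draws counts from a Multinomial given a probability vector."""
    def __init__(self, theta, rng=None):
        self.theta = np.asarray(theta, dtype=float)
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def sample(self, n):
        # Distribute n draws over the outcomes according to theta.
        return self.rng.multinomial(n, self.theta)

prior = DirichletPrior(alpha=0.5, K=4)
theta = prior.sample()
counts = MultinomialSampler(theta).sample(100)
```

Conditioned sampling (the untested part) would sit on top of these, e.g. by updating the Dirichlet pseudo-counts with observed counts before sampling.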
Started processing parsed files. Presented to the WN group; got the following suggestions:
- For parsing in some order, switch to a tree parser
- In a tree parser, you can use the parent node to define the distribution
- Embed the whole thing as a graphical model (Jonathan's idea)
Things seem to be working. If speed is an issue, stop sampling corroborators for monosemous words.
Need to vary hyperparameters ...
Normalizer -- there are problems initializing it at safe_log(0) when the values being normalized are all zero. Using NaN instead might be better, since it propagates and makes the failure visible.
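A sketch of the issue and one possible fix (function names are assumptions, not the actual code): `safe_log` maps zero to -inf, and a log-sum-exp normalizer can then return NaN when every input is zero instead of quietly emitting safe_log(0) sentinels.

```python
import math

NEG_INF = float("-inf")

def safe_log(x):
    # Map zero to -inf rather than raising a ValueError.
    return math.log(x) if x > 0 else NEG_INF

def log_normalize(log_vals):
    # Log-sum-exp normalization that tolerates -inf entries.
    m = max(log_vals)
    if m == NEG_INF:
        # Everything being normalized is zero: return NaN so callers
        # notice, instead of starting out at safe_log(0).
        return [float("nan")] * len(log_vals)
    z = m + math.log(sum(math.exp(v - m) for v in log_vals))
    return [v - z for v in log_vals]
```

For example, `log_normalize([safe_log(1), safe_log(3)])` exponentiates back to the proper distribution [0.25, 0.75], while an all-zero input yields NaNs rather than a bogus uniform-looking result.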
To generate input files:
- run makeVocabDat.py (note: you may have to modify the EXTENSION variable depending on what the files look like ... it could be either ".bnc.parsed" or ".parsed")
- That generates "role-count.dat" and "word-count.dat" ... if you want to change the vocab files, you'll need to sort these and rename them to "sorted-word.dat" and "sorted-role.dat"
- Then run mergeFile.py (again, modifying EXTENSION)
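The sort-and-rename step between the two scripts could be sketched like this (filenames come from the notes; a plain lexicographic line sort is an assumption, since the notes don't say which sort order mergeFile.py expects):

```python
import os

def sort_counts(src, dst):
    # Read a raw count file, sort its lines, and write the vocab file.
    with open(src) as f:
        lines = f.readlines()
    with open(dst, "w") as f:
        f.writelines(sorted(lines))

# Rename the makeVocabDat.py outputs to the names mergeFile.py expects.
for src, dst in [("word-count.dat", "sorted-word.dat"),
                 ("role-count.dat", "sorted-role.dat")]:
    if os.path.exists(src):
        sort_counts(src, dst)
```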