Syntop wsd
April 28
Began writing code, prepared presentation.
April 29
Tested Multinomial sampler and Dirichlet prior classes; still need to test conditioned sampling.
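A minimal sketch (assuming numpy; the names below are illustrative, not the project's actual Multinomial/Dirichlet classes) of the kind of sanity check involved: draw multinomial parameters from the prior, sample counts, and confirm the empirical distribution recovers them. The last line shows the conjugate update that conditioned sampling would exercise.

 import numpy as np
 
 rng = np.random.default_rng(0)
 
 alpha = np.full(5, 0.5)            # symmetric Dirichlet hyperparameter
 theta = rng.dirichlet(alpha)       # one draw from the prior
 counts = rng.multinomial(100_000, theta)
 
 # The empirical distribution should be close to the drawn parameters.
 empirical = counts / counts.sum()
 assert np.allclose(empirical, theta, atol=0.01)
 
 # Conditioned sampling: the Dirichlet posterior given observed
 # multinomial counts is Dirichlet(alpha + counts).
 posterior_theta = rng.dirichlet(alpha + counts)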
April 30
Started processing the parsed files. Presented to the WN group and got the following suggestions:
- Instead of parsing in some fixed order, switch to a tree parser
- In a tree parser, you can use the parent node to define the distribution (see the sketch after this list)
- Embed the whole thing as a graphical model (Jonathan's idea)
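The parent-node suggestion is easy to picture with a toy sketch. Everything here is hypothetical (a hand-built tree, K latent states, a per-parent-state multinomial over each child's state), not the model's actual structure.

 import numpy as np
 
 rng = np.random.default_rng(0)
 K = 4                                  # number of latent states
 ROOT_STATE = 0                         # fixed state for the (virtual) root
 
 # transition[p] is the distribution over a child's state given parent state p
 transition = rng.dirichlet(np.full(K, 0.5), size=K)
 
 def sample_tree(children, node=0, parent_state=ROOT_STATE, states=None):
     """Assign a state to `node` conditioned on its parent, then recurse.
     `children` maps a node id to the list of its child node ids."""
     if states is None:
         states = {}
     states[node] = rng.choice(K, p=transition[parent_state])
     for child in children.get(node, []):
         sample_tree(children, child, states[node], states)
     return states
 
 # Toy parse tree: 0 -> (1, 2), 1 -> (3,)
 print(sample_tree({0: [1, 2], 1: [3]}))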
May 8
Things seem to be working. If speed is an issue, stop sampling corroborators for monosemous words.
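A minimal sketch of that shortcut: during each Gibbs sweep, skip the resampling work for monosemous words, whose assignment can never change. `tokens`, `num_senses`, and `resample` are hypothetical stand-ins for the model's own structures.

 def gibbs_sweep(tokens, num_senses, resample):
     """One pass over the corpus; resample(i) draws a new value for token i."""
     for i, word in enumerate(tokens):
         if num_senses[word] == 1:   # monosemous: nothing to gain by sampling
             continue
         resample(i)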
Need to vary hyperparameters ...
May 14
Normalizer: there are problems with starting it out at safe_log(0) when all of the values being normalized are zero. Perhaps using NaN would be better.
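A sketch of the failure mode, assuming safe_log(0) returns a large negative sentinel rather than -inf (the actual normalizer may differ). With an all-zero input, the sentinel-seeded accumulator yields plausible-looking log probabilities and no error; a NaN, by contrast, poisons everything downstream and makes the degenerate case impossible to miss.

 import math
 
 LOG_ZERO = -1e10                    # hypothetical sentinel for safe_log(0)
 
 def safe_log(x):
     return math.log(x) if x > 0 else LOG_ZERO
 
 def log_add(a, b):
     """Numerically stable log(exp(a) + exp(b))."""
     if a < b:
         a, b = b, a
     return a + math.log1p(math.exp(b - a))
 
 def log_normalize(values):
     norm = safe_log(0)              # the problematic starting point
     for v in values:
         norm = log_add(norm, safe_log(v))
     return [safe_log(v) - norm for v in values]
 
 # All zeros: both outputs come back around -1.1 (about 1/3 each after
 # exponentiating), which looks like valid log probabilities but is
 # meaningless, and no error is ever raised.
 print(log_normalize([0.0, 0.0]))
 
 # A NaN, once introduced, propagates through log_add and flags the bug.
 print(log_add(float("nan"), safe_log(0)))   # nan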
May 15
To generate input files:
- Run makeVocabDat.py (note: you may have to modify the EXTENSION variable to match the input file names; it could be either ".bnc.parsed" or ".parsed")
- That generates "role-count.dat" and "word-count.dat". If you want to change the vocab files, you'll need to sort these and rename them to "sorted-word.dat" and "sorted-role.dat" (see the sketch after this list)
- Then run mergeFile.py (again, modifying EXTENSION if necessary)
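A minimal sketch of the sort-and-rename step, assuming each line of the count files is a tab-separated "item count" pair (the actual output format of makeVocabDat.py is an assumption here); it orders entries by descending count.

 def sort_counts(src, dst):
     """Read "item<TAB>count" lines from src and write them to dst sorted
     by descending count. The file format is an assumption."""
     with open(src) as f:
         rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
     rows.sort(key=lambda r: int(r[1]), reverse=True)
     with open(dst, "w") as f:
         for item, count in rows:
             f.write(f"{item}\t{count}\n")
 
 sort_counts("word-count.dat", "sorted-word.dat")
 sort_counts("role-count.dat", "sorted-role.dat")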