Syntop wsd
Latest revision as of 23:41, 15 May 2007

April 28

Began writing code, prepared presentation.

April 29

Tested Multinomial sampler and Dirichlet prior classes; still need to test conditioned sampling.
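The sampler and prior classes themselves aren't shown in these notes; a minimal sketch of what such a test might look like, assuming a multinomial sampler whose weights are Dirichlet posterior pseudo-counts (all function names here are hypothetical, not the actual class API):

```python
import random
from collections import Counter

def sample_multinomial(weights, rng):
    """Draw an index proportionally to the (unnormalized) weights."""
    total = sum(weights)
    r = rng.random() * total
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1  # guard against floating-point round-off

def posterior_weights(counts, alpha):
    """Dirichlet-multinomial posterior: observed counts plus prior pseudo-counts."""
    return [c + alpha for c in counts]

# Smoke test: with heavy counts on index 2, most draws should land there.
rng = random.Random(0)
counts = [1, 1, 50]
weights = posterior_weights(counts, alpha=0.5)
draws = Counter(sample_multinomial(weights, rng) for _ in range(10000))
print(draws.most_common(1)[0][0])  # expect index 2 to dominate
```

A check like this catches sign and off-by-one errors in the sampler before it is wired into conditioned sampling.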

April 30

Started processing parsed files. Presented to the WN group, got the following suggestions:

  • For parsing using some order, switch to a tree parser
  • In a tree parser, you can use the parent node to define the distribution
  • Embed the whole thing as a graphical model (Jonathan's idea)

May 8

Things seem to be working. If speed is an issue, stop sampling corroborators for monosemous words.
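A hypothetical sketch of that speed optimization: during a sampling pass, a word with exactly one sense has a fixed assignment, so there is nothing to resample for it (the names below are illustrative, not the actual code):

```python
# Sketch: skip the resampling step for monosemous words, since their
# sense assignment is deterministic.
def gibbs_pass(tokens, senses_of, resample):
    """Assign a sense to each token, resampling only ambiguous words."""
    assignments = {}
    for tok in tokens:
        senses = senses_of(tok)
        if len(senses) == 1:
            assignments[tok] = senses[0]  # monosemous: nothing to sample
        else:
            assignments[tok] = resample(tok, senses)
    return assignments

# Toy usage: "bank" is ambiguous, "the" is monosemous.
lexicon = {"bank": ["bank#1", "bank#2"], "the": ["the#1"]}
result = gibbs_pass(["the", "bank"], lexicon.get,
                    lambda tok, senses: senses[0])
print(result)  # {'the': 'the#1', 'bank': 'bank#1'}
```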

Need to vary hyperparameters ...

May 14

Normalizer -- there are problems initializing the accumulator at safe_log(0) when the values being normalized are zero. Perhaps using NaN would be better.
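The normalizer code itself isn't in these notes; a minimal sketch of the issue, assuming safe_log maps zero to a large negative sentinel (names hypothetical): seeding a log-space accumulator with safe_log(0) makes "nothing accumulated yet" indistinguishable from a genuine zero-mass input, whereas float('nan') marks the empty state unambiguously.

```python
import math

LOG_ZERO = -1e30  # sentinel standing in for log(0)

def safe_log(x):
    return math.log(x) if x > 0 else LOG_ZERO

def log_add(a, b):
    """log(exp(a) + exp(b)), treating NaN as 'no value accumulated yet'."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def log_normalizer(values):
    """Log of the sum of values, accumulated in log space."""
    acc = float('nan')  # start empty rather than at safe_log(0)
    for v in values:
        acc = log_add(acc, safe_log(v))
    return acc

print(log_normalizer([0.0, 0.0]))           # stays at the log-zero floor
print(round(log_normalizer([0.5, 0.5]), 6)) # log(1.0) == 0.0
```

With the NaN start, the first real value simply replaces the accumulator, so the sentinel never contaminates the sum.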

May 15

To generate input files:

  • Run makeVocabDat.py (note: you may have to modify the EXTENSION variable depending on what the files look like ... it could be either ".bnc.parsed" or ".parsed")
  • That generates "role-count.dat" and "word-count.dat" ... if you want to change the vocab files, you'll need to sort these and rename them to "sorted-word.dat" and "sorted-role.dat"
  • Then run mergeFile.py (again, modifying EXTENSION)
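The actual makeVocabDat.py isn't reproduced here; the steps above can be sketched as follows, assuming the parsed files hold whitespace-separated word/role pairs (an assumption -- only the EXTENSION variable and the output file names come from the notes). This sketch folds the counting and the sort-and-rename steps into one pass:

```python
# Hypothetical sketch of the vocab-building step described above.
import glob
from collections import Counter

EXTENSION = ".parsed"  # may need to be ".bnc.parsed" depending on the files

def make_vocab(paths):
    """Count words and roles, assuming 'word role ...' lines (an assumption)."""
    words, roles = Counter(), Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 2:
                    words[fields[0]] += 1
                    roles[fields[1]] += 1
    return words, roles

def write_counts(counter, path):
    """Write 'item count' lines, most frequent first (the 'sorted-*' form)."""
    with open(path, "w") as out:
        for item, count in counter.most_common():
            out.write(f"{item} {count}\n")

words, roles = make_vocab(glob.glob("*" + EXTENSION))
write_counts(words, "sorted-word.dat")
write_counts(roles, "sorted-role.dat")
```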