LDAWN

From CSWiki
Revision as of 11:35, 19 June 2006 by Ezubaric

CVS Access

The files are in the wnp repository, under ldawn. To access it, follow the instructions here. You'll need to be approved by the repository owner, JBG.

You'll need GSL installed. If you don't have root on a machine and can't add to the normal include directory, look at the "make jbg" entry in the Makefile to see how to point to a different directory. In MSVC, see http://www.sourceware.org/ml/gsl-discuss/2004-q2/msg00000.html to get GSL linked.

Files

You'll also need some data files, which can be found at http://www.cs.princeton.edu/~jbg/wn/ldawn/. The code also requires some libraries from the py-evo-feat directory in the wnp archive; add that directory to the Python path to make them available.
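One way to put py-evo-feat on the Python path from inside a script is sketched below. The checkout location ~/wnp/py-evo-feat and the helper name add_to_python_path are assumptions for illustration; substitute wherever the wnp archive lives on your machine. Setting the PYTHONPATH environment variable before starting Python works just as well.

```python
import os
import sys


def add_to_python_path(directory):
    """Prepend a directory to sys.path unless it is already listed."""
    resolved = os.path.abspath(os.path.expanduser(directory))
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical checkout location; adjust to your own copy of wnp.
PY_EVO_FEAT = add_to_python_path("~/wnp/py-evo-feat")
```

After this runs, import statements in the same process can find modules stored in that directory.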

Program Files

  • mixture.cpp
    Creates a mixture model of topic walks; still not working completely
  • generateReport.py
    Given the stem of inference synsets (e.g. "inf-synset."), writes an accuracy report, report.out.
  • LDAWN.cpp
    The main file, from which all other functions are called
  • WN.cpp
    Reads in the WordNet information and serves as the basis for the topic walks
  • TopicWalk.cpp
    The topic walk parameters that exist on top of the WN class
  • Path.cpp
    An individual path through WN that ends in a synset

Data Files

  • bnc-par.dat
    The BNC corpus split into paragraphs. Words occurring fewer than 10 times were excluded, as were paragraphs with fewer than five terms. The terms in excluded paragraphs were still counted toward word frequency; the short paragraphs were dropped because some headers had been counted as paragraphs. Uses bnc-par
  • semcor-par.dat
    The SemCor corpus split into paragraphs. Uses the same vocab and word files as bnc-par.dat.
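The filtering described for bnc-par.dat can be sketched roughly as follows. This is not the script that produced the data files: the function name and the list-of-token-lists input are assumptions, and the sketch also assumes that paragraph length is measured after rare words have been removed, which the description above does not specify.

```python
from collections import Counter

MIN_WORD_COUNT = 10      # words seen fewer times than this are dropped
MIN_PARAGRAPH_TERMS = 5  # paragraphs with fewer terms than this are dropped


def filter_paragraphs(paragraphs,
                      min_count=MIN_WORD_COUNT,
                      min_terms=MIN_PARAGRAPH_TERMS):
    """Drop rare words, then drop paragraphs that end up too short.

    `paragraphs` is a list of token lists. Word frequencies are taken over
    every paragraph, including those that are later discarded.
    """
    counts = Counter(word for par in paragraphs for word in par)
    kept = []
    for par in paragraphs:
        terms = [word for word in par if counts[word] >= min_count]
        # Assumption: length is checked after rare-word removal.
        if len(terms) >= min_terms:
            kept.append(terms)
    return kept
```

With small thresholds for illustration, filter_paragraphs([["a", "b", "c"], ["a"], ["b", "a", "d"]], min_count=2, min_terms=2) keeps the first and third paragraphs, minus "c" and "d". The one-word paragraph is dropped, but its "a" still counts toward the frequency of "a".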

Output Files

  • name.entropyN
    The entropy after the Nth round
  • name.alpha
    The alpha parameter of the model
  • name.beta
    The beta parameter of the model
  • name.walkN
    The Nth topic parameters of the TopicWalk
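When a run produces many of these files, it can help to sort them by kind and index. The sketch below only interprets file names; it assumes that N is written as plain decimal digits directly after "entropy" or "walk", as the names above suggest, and it makes no claim about the contents of the files. The function name parse_output_name is hypothetical.

```python
import re

# model name, a dot, the kind of output, then an optional decimal index
_OUTPUT_PATTERN = re.compile(
    r"^(?P<model>.+)\.(?P<kind>entropy|walk|alpha|beta)(?P<index>\d*)$"
)


def parse_output_name(filename):
    """Split an output file name into (model, kind, index).

    Returns None when the name does not match one of the four patterns.
    The index is an int for entropy and walk files and None for alpha
    and beta files.
    """
    match = _OUTPUT_PATTERN.match(filename)
    if match is None:
        return None
    model, kind, index = match.group("model", "kind", "index")
    if kind in ("entropy", "walk"):
        # These kinds must carry an index.
        return (model, kind, int(index)) if index else None
    # alpha and beta files must not carry an index.
    return None if index else (model, kind, None)
```

For a model named "five", parse_output_name("five.entropy3") gives ("five", "entropy", 3) and parse_output_name("five.alpha") gives ("five", "alpha", None).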

How to Run

I'll add more soon. Until then, after compiling with make, run ./ldawn -help to see all the options.

It's easier just to show examples.

  • ./ldawn -modelName five -numTopics 5
    Runs the LDA topic walk with five topics and writes the output to "five".
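To script several runs, the same invocation can be built from Python. This is a minimal sketch that uses only the two options shown above (-modelName and -numTopics); the helper names are hypothetical, and it assumes the compiled ldawn binary sits in the current directory. Run ./ldawn -help for the full option list.

```python
import subprocess


def build_ldawn_command(model_name, num_topics, executable="./ldawn"):
    """Return the argument list for one ldawn run."""
    if num_topics < 1:
        raise ValueError("num_topics must be at least 1")
    return [executable,
            "-modelName", str(model_name),
            "-numTopics", str(num_topics)]


def run_ldawn(model_name, num_topics, executable="./ldawn"):
    """Run ldawn and raise CalledProcessError if it exits with an error."""
    command = build_ldawn_command(model_name, num_topics, executable)
    subprocess.run(command, check=True)


# Show the command line for the five-topic example without running it.
print(" ".join(build_ldawn_command("five", 5)))
```

Calling run_ldawn("five", 5) from the directory that contains the binary would then start the same run as the example above.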


    A part of Wordnet_plus