LDAWN

From CSWiki
Revision as of 10:53, 28 June 2006 by Ezubaric (talk | contribs) (Conditional Probability on SemCor paragraphs)


CVS Access

The files are in the wnp repository, under ldawn. To access it, follow the instructions here. You'll need to be approved by the repository owner, JBG. You'll also need GSL installed. If you don't have root on a machine and can't add to the normal include directory, look at the "make jbg" entry in the Makefile to see how to point to a different directory. In MSVC, see [this http://www.sourceware.org/ml/gsl-discuss/2004-q2/msg00000.html] to get GSL linked up.

Files

You'll also need some data files, which can be found [here http://www.cs.princeton.edu/~jbg/wn/ldawn/]. They also require some libraries from the py-evo-feat directory in the wnp archive, which can be accessed by adding it to the python path.

Program Files

  • mixture.cpp
    Creates a mixture model of topic walks; still not working completely
  • generateReport.py
    Given the stem of inference synsets (e.g. "inf-synset."), creates a report on the accuracy, report.out.
  • LDAWN.cpp
    The main file, from which all other functions are called
  • WN.cpp
    Reads in the WordNet information and serves as the basis for the topic walks
  • TopicWalk.cpp
    The topic walk parameters that exist on top of the WN class
  • Path.cpp
    An individual path through WN that ends in a synset

Data Files

  • bnc-par.dat
    The BNC corpus split into paragraphs. Words occurring fewer than 10 times were excluded, as were paragraphs with fewer than five terms (although those terms were counted toward the frequency ... this was done because some headers were counted as paragraphs). Uses bnc-par
  • semcor-par.dat
    The SemCor corpus split into paragraphs. Uses the same vocab and word files as bnc-par.dat.
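
The filtering described for bnc-par.dat can be sketched roughly as follows. This is only an illustration, not the script that produced the data files: the function name, the token-list input format, and the choice to measure paragraph length before rare words are removed are all assumptions.

```python
from collections import Counter

MIN_WORD_COUNT = 10      # words seen fewer times than this are dropped
MIN_PARAGRAPH_TERMS = 5  # paragraphs with fewer terms than this are dropped


def filter_corpus(paragraphs, min_word_count=MIN_WORD_COUNT,
                  min_paragraph_terms=MIN_PARAGRAPH_TERMS):
    """Return (kept_paragraphs, vocab) for a list of token lists.

    Hypothetical helper: the real preprocessing code is not shown on this page.
    """
    # Count every term, including terms in paragraphs that are dropped later.
    counts = Counter(term for par in paragraphs for term in par)
    vocab = {term for term, n in counts.items() if n >= min_word_count}

    kept = []
    for par in paragraphs:
        # Assumption: length is measured before rare words are removed.
        if len(par) < min_paragraph_terms:
            continue
        kept.append([term for term in par if term in vocab])
    return kept, vocab
```

Counting before dropping short paragraphs means a term that shows up mostly in header-like fragments can still reach the frequency threshold.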

Output Files

  • name.entropyN
    The entropy after the Nth round
  • name.alpha
    The alpha parameter of the model
  • name.beta
    The beta parameter of the model
  • name.walkN
    The Nth topic parameters of the TopicWalk

How to Run

I'll add more soon. Until then, after compiling with make, run ./ldawn -help to see all the options.

It's easier just to show examples.

  • ./ldawn -modelName five -numTopics 5

Runs the LDA topic walk with five topics and writes the output to "five".
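
To try several topic counts in a row, a small wrapper along these lines might help. It is a sketch that uses only the two options shown above (-modelName and -numTopics); the function names and the dry_run switch are made up for illustration, and it assumes the compiled ./ldawn binary is in the current directory.

```python
import subprocess


def build_command(model_name, num_topics, binary="./ldawn"):
    """Build the argument list for one run, using only the flags shown above."""
    return [binary, "-modelName", model_name, "-numTopics", str(num_topics)]


def run_sweep(settings, dry_run=True):
    """settings is a list of (model_name, num_topics) pairs.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [build_command(name, k) for name, k in settings]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if ldawn exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_sweep([("five", 5), ("ten", 10)], dry_run=False) would run the model twice, writing output under "five" and "ten".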

Experiments

Conditional Probability on SemCor paragraphs

This took longer than expected to get working. Apart from the usual bugs/missteps, there was one particular problem that took me forever to root out. There apparently is some inconsistency with how the STL handles queries to empty vectors. I was using MSVC for debugging, and everything worked fine. But when I used gcc, I was getting some odd assertion breaks.

A SemCor paragraph that caused problems in the testing phase.

Apparently, the following conditions were causing a problem:

  • If a word appeared in a synset that was the parent of another synset containing the same word
  • If the first link of the parent duplicate synset goes in the direction of the child duplicate synset
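
A check for these two conditions could look something like the sketch below. It is not taken from WN.cpp or Path.cpp: the dictionaries, the synset ids, and the reading of "first link goes in the direction of" as "the first child link is the child duplicate itself" are all assumptions made for illustration.

```python
def find_risky_words(word_synsets, children):
    """Find words that appear in both a synset and one of its direct children.

    word_synsets: word -> set of synset ids containing that word (hypothetical)
    children:     synset id -> ordered list of child synset ids (hypothetical)

    Returns sorted (word, parent, child, first_link_hits) tuples, where
    first_link_hits is True when the parent's first link is that child.
    """
    risky = []
    for word, synsets in word_synsets.items():
        for parent in synsets:
            links = children.get(parent, [])
            for child in links:
                if child in synsets:
                    # Assumption: "first link" means the first entry in the list.
                    first_link_hits = links[0] == child
                    risky.append((word, parent, child, first_link_hits))
    return sorted(risky)
```

Words reported with first_link_hits set to True would be the ones to inspect first.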

And of course, this only occurred in a word with tons of possible paths, so it took forever to figure out what was happening with parallel debugging on two platforms. Such a situation happened in this document:

Paragraphs that Worked Well

2010
  • After completing the payments prescribed by paragraphs 2-3 of this subsection, to make payments, from time to time in ratable proportions, on account of the unpaid principal of all awards in the principal amount of more than $1000, according to the proportion which the unpaid principal of such awards bear to the total amount in the fund available for distribution at the time such payments are made; and after payment has been made of the principal amounts of all such awards, to make pro_rata payments on account of accrued interest on such awards bear interest.
+ 2	payment	01105532	[1105532, 0.66674199999999995]
+ 1	paragraph	06314305	[6314305, 1.0]
+ 1	subsection	04297199	[4297199, 1.0]
+ 2	payment	01105532	[1105532, 0.66674199999999995]
+ 4	proportion	13634395	[13634395, 0.50875599999999999]
+ 5	principal	13227919	[13227919, 0.47344199999999997]
+ 3	award	00085363	[85363, 0.40011600000000003]
+ 5	principal	13227919	[13227919, 0.47344199999999997]
- 4	amount[5047581, 0.52172799999999997]
   13159702	6
   C(6,134):	a quantity of money; "he borrowed a large sum"; "the amount he had in cash was insufficient"
   W(6,99):	how much of something is available; "an adequate amount of food for four people"
+ 4	proportion	13634395	[13634395, 0.50875599999999999]
+ 5	principal	13227919	[13227919, 0.47344199999999997]
+ 3	award	00085363	[85363, 0.40011600000000003]
- 4	amount[5047581, 0.52172799999999997]
   13159702	6
   C(6,134):	a quantity of money; "he borrowed a large sum"; "the amount he had in cash was insufficient"
   W(6,99):	how much of something is available; "an adequate amount of food for four people"
+ 3	fund	13186788	[13186788, 0.63194099999999997]
- 4	distribution[1098151, 0.444635]
   01068743	6
   C(6,55):	the act of distributing or spreading or apportioning
   W(1,1):	the commercial activity of transporting and selling goods from a producer to a consumer
- 10	time[26997, 0.13068099999999999]
   07209466	7
   C(7,1):	an instance or single occasion for some event; "this time he succeeded"; "he called four times"; "he could do ten at a clip"
   W(4,197):	the continuum of experience in which events pass from the future through the present to the past
+ 2	payment	01105532	[1105532, 0.66674199999999995]
+ 2	payment	01105532	[1105532, 0.66674199999999995]
+ 5	principal	13227919	[13227919, 0.47344199999999997]
- 4	amount[5047581, 0.52172799999999997]
   13159702	6
   C(6,134):	a quantity of money; "he borrowed a large sum"; "the amount he had in cash was insufficient"
   W(6,99):	how much of something is available; "an adequate amount of food for four people"
+ 3	award	00085363	[85363, 0.40011600000000003]
+ 2	payment	01105532	[1105532, 0.66674199999999995]
+ 7	interest	13147070	[13147070, 0.23111999999999999]
+ 3	award	00085363	[85363, 0.40011600000000003]
+ 7	interest	13147070	[13147070, 0.23111999999999999]
687
  • Man was created with the capacity for immortality, but the devil's promise of immortality in exchange for disobedience cost Adam immortality. He was, in the words of Irenaeus, "beguiled by another under the pretext of immortality." The true way to immortality lay through obedience, but man did not believe this.


- 9	capacity[901650, 0.260459]
  05141907	7
  C(7,1):	ability to perform or produce
  W(1,1):	the maximum production possible; "the plant is working at 80 per cent capacity"
+ 2	immortality	04997158	[4997158, 0.50053700000000001]
- 5	devil[10017885, 0.45038]
  09406617	8
  C(8,1):	(Judeo-Christian and Islamic religions) chief spirit of evil and adversary of God; tempter of mankind; master of Hell
  W(6,1):	a rowdy or mischievous person (usually a young man); "he chased the young hellions out of his yard"
+ 2	promise	07127600	[7127600, 0.79994299999999996]
+ 2	immortality	04997158	[4997158, 0.50053700000000001]
+ 11	exchange	01151051	[1151051, 0.19575400000000001]
+ 2	disobedience	01164483	[1164483, 0.50047699999999995]
+ 3	adam	09450366	[9450366, 0.62506799999999996]
+ 2	immortality	04997158	[4997158, 0.50053700000000001]
+ 2	pretext	06669029	[6669029, 0.75048599999999999]
+ 2	immortality	04997158	[4997158, 0.50053700000000001]
- 12	way[5720477, 0.12898100000000001]
   00169254	3
   C(3,34):	how a result is obtained or an end is achieved; "a means of control"; "an example is the best agency of instruction"; "the true way to success"
   W(9,1):	doing as one pleases or chooses; "if I had my way"
+ 2	immortality	04997158	[4997158, 0.50053700000000001]
+ 3	obedience	01151922	[1151922, 0.333762]


2074
  • Determine if the particular State's unadjusted allotment result obtained in item 11 above is less_than its minimum base allotment, and if so raise its unadjusted allotment to its minimum allotment. Regardless of its unadjusted allotment, State is guaranteed by law a minimum allotment each_year equal to the allotment which it received in fiscal_year 1954 - increased by a uniform percentage of 5.49 which brings total 1954 allotments to all States up to $230000000.
- 8	state[8065574, 0.29231099999999999]
   08533584	1
   C(1,266):	the territory occupied by one of the constituent administrative districts of a nation; "his state is in the deep south"
   W(8,4):	the group of people comprising the government of a sovereign state; "the state has lowered its  income tax"
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
- 4	result[6248523, 0.68911500000000003]
  06652837	6
  C(6,2):	a statement that solves a problem or explains how to solve the problem; "they were trying to find a peaceful solution"; "the answers were in the back of the book"; "he computed the result to four decimal places"
  W(2,1):	the semantic role of the noun phrase whose referent exists only by virtue of the activity denoted  by the verb in the clause
+ 5	item	06396434	[6396434, 0.299485]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
- 8	state[8065574, 0.29231099999999999]
   08533584	1
   C(1,266):	the territory occupied by one of the constituent administrative districts of a nation; "his state is in the deep south"
   W(8,4):	the group of people comprising the government of a sovereign state; "the state has lowered its income tax"
+ 7	law	06445723	[6445723, 0.24918599999999999]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
+ 1	fiscal_year	15004020	[15004020, 1.0]
+ 2	percentage	13636179	[13636179, 0.57021599999999995]
+ 2	allotment	13118358	[13118358, 0.66663799999999995]
- 8	state[8065574, 0.29231099999999999]
  08533584	1
  C(1,266):	the territory occupied by one of the constituent administrative districts of a nation; "his state is in the deep south"
  W(8,4):	the group of people comprising the government of a sovereign state; "the state has lowered its income tax"
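
The listings above can be tallied with a few lines of Python. This page does not define the line markers, so the sketch assumes that a line starting with "+" is a word whose chosen synset matched the SemCor tag and a line starting with "-" is a mismatch. The function name is made up, and this is not the code in generateReport.py.

```python
def tally_marks(lines):
    """Count listing lines that start with '+ ' and '- '.

    Assumption: '+' marks a match and '-' a mismatch; indented detail lines,
    paragraph ids, and paragraph text are ignored.
    """
    plus = minus = 0
    for line in lines:
        if line.startswith("+ "):
            plus += 1
        elif line.startswith("- "):
            minus += 1
    return plus, minus
```

Under that assumption, plus / (plus + minus) gives a rough per-paragraph accuracy.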
    A part of Wordnet_plus