From CSWiki
Revision as of 17:40, 27 June 2006 by Ezubaric (talk | contribs) (Establishing Ground Truth)



  • General
    • Reading!
  • Synset scoring
    • Link weighting based on frequency.
    • Transitive meronyms
    • Word competition
  • Image Assignments
    • Homogeneity penalty.
    • Smarter k-means pruning.
    • Max-flow approach to assignment.

Establishing Ground Truth

In order to establish what methods work well, we need to have a "ground truth" for comparison, which must be derived from human data. We could ask any one of the following questions of human subjects:

  • How useful would this image be as a supplement to a dictionary definition of synset S? Use 100 to signify that the image would be great to include. Use 0 to signify that (for whatever reason) including the image would be a bad idea.
  • To what extent does this image illustrate --- or provide the "gist" of --- the synset S?
  • What fraction of this image could be removed without altering its quality as an illustration of synset S?

What are the advantages or disadvantages of these techniques?

Chandra could perhaps add links or her own notes here.


A website with the current results can be found at http://psy-build2.princeton.edu/wnimage. There are results for 1000 images using a random seed of 8675309.

The files are named topNsynsets_PARAMS.html, where N is the number of synsets ranked and PARAMS is a string of letters specifying the parameters (see below). For example, a file named top5synsets_AY.html would have the top 5 synsets for each image with the parameters alpha=0.5, gamma=1.0, rho=0.0, upsilon=1.0, mu=1.0, nu=1.0, kappa=0.0.

These results are then used to assign images to synsets (illustrate synsets). These assignments are in files named assignments_PARAMS_k.html (similar to above). k is the parameter used in the k-means assignment.


Here are the various parameters:

  • alpha
    Determines how the score for a synset within an Xsynset decays as a function of the path length from the generator synset to the given synset.
  • gamma
    Words which are potentially colors are weighted by this factor.
  • rho
    Determines how the score for a synset within an Xsynset decays as a function of its shallowness from the root of the wordnet database.
  • mu
    If a word and its hypernym are both found within the caption set, no matter how distant they may be in wordnet, then the senses of the hypernym and the hyponym that form the pair are multiplied by mu.
  • nu
    In the previous scenario, the other senses (i.e. the ones which did not form the hypernym/hyponym pair) of both the hypernym and hyponym are multiplied by nu.
  • upsilon
    In the previous scenario, the hyponym of the pair is given an extra boost by this amount.
  • kappa
    Word frequency constant. In order to give more weight to infrequent words, each word is weighted by -log(f*kappa) where f is its frequency in the corpus.
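
The kappa weighting above can be sketched as follows. This is only a minimal sketch: the function name frequency_weight is hypothetical, and it assumes that f is a relative frequency in (0, 1] and that kappa = 0 switches the weighting off. The notes state neither, but log(0) is undefined, so kappa = 0 needs some special handling.

```python
import math

def frequency_weight(f, kappa):
    """Weight a word by -log(f * kappa), as described for kappa above.

    Assumptions (not stated in the notes): f is a relative frequency in
    (0, 1], and kappa == 0 disables the weighting (weight 1.0), because
    log(0) is undefined.
    """
    if kappa == 0:
        return 1.0
    return -math.log(f * kappa)

# Rarer words receive a larger weight than common ones.
rare = frequency_weight(0.001, 0.05)
common = frequency_weight(0.1, 0.05)
print(rare > common)  # True
```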

In order to keep things manageable, sets of parameters will be named according to the following shorthand:

  • A
    • alpha = 0.5
    • gamma = 1.0
    • rho = 0.0
  • B
    • alpha = 0.5
    • gamma = 0.2
    • rho = 0.90
  • C
    • alpha = 0.6667
    • gamma = 0.2
    • rho = 0.90
  • Y
    • upsilon = 1.0
    • mu = 1.0
    • nu = 1.0
    • kappa = 0.0
  • Z
    • upsilon = 1.5
    • mu = 1.5
    • nu = 0.67
    • kappa = 0.05
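
As an illustration of how the shorthand combines with the file naming scheme, here is a small sketch that expands a PARAMS string and parses a results file name. The parameter values are copied from the list above; the function names expand_params and parse_results_name are hypothetical and are not part of the WNImage tools.

```python
import re

# Shorthand letters and values copied from the list above.
PARAM_SETS = {
    "A": {"alpha": 0.5, "gamma": 1.0, "rho": 0.0},
    "B": {"alpha": 0.5, "gamma": 0.2, "rho": 0.90},
    "C": {"alpha": 0.6667, "gamma": 0.2, "rho": 0.90},
    "Y": {"upsilon": 1.0, "mu": 1.0, "nu": 1.0, "kappa": 0.0},
    "Z": {"upsilon": 1.5, "mu": 1.5, "nu": 0.67, "kappa": 0.05},
}

def expand_params(shorthand):
    """Merge the parameter sets named by each letter, e.g. "AY"."""
    params = {}
    for letter in shorthand:
        params.update(PARAM_SETS[letter])
    return params

def parse_results_name(filename):
    """Split a name like top5synsets_AY.html into (N, parameters)."""
    match = re.fullmatch(r"top(\d+)synsets_([A-Z]+)(?:\.html)?", filename)
    if match is None:
        raise ValueError("not a topNsynsets_PARAMS name: %r" % filename)
    return int(match.group(1)), expand_params(match.group(2))

n, params = parse_results_name("top5synsets_AY.html")
print(n)                # 5
print(params["alpha"])  # 0.5
```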

CVS Access

The WNImage tools are in the repository under wnimage. The repository is named wnp. To access it, follow the instructions at [1].


CVS Files

  • gen_Xsynsets.py - generates the Xsynset database file for a given list of words on stdin. The db gets pickled to Xsynsetdb.pkl. The max_depth parameter specifies how many links to follow. It currently only follows hypernyms and it crawls all senses of a word.
  • gen_weighted_Xsynsets.py - generates the wXsynset database file for all the Xsynsets in Xsynsetdb.pkl and outputs the new pickled db as wXsynsetdb.pkl. Using alpha=0.5, it computes a weighted Xsynset, i.e. one which simply has a numerical value assigned to each word. The value is currently alpha^path_length from source to target.
  • gen_caption_vects.py - generates a db of the weighted (alpha=0.5) Xsynset for each image caption. Pass in the captions file as the first parameter. You need to run this after gen_Xsynsets.py, and it expects the result of that to be named Xsynsetdb.pkl. Note: this generates a pretty large file and doesn't save much time, so it might be scrapped.
  • rank_caption_synsets.py - ranks the top N (second parameter) synsets for all the captions. The captions file is the first parameter.
  • assign_images.py - assigns images to synsets. Takes in a results file (output of rank_caption_synsets.py) on stdin, and prints out the assignments on stdout. Currently just picks the top one.
  • gen_captions_cache.py - reads in a list of words and their counts (i.e. captionwords) and generates a pickle file for a hash_table with this data. The indices are normalized by morphy.
  • extract_caption_words.sh - extracts and uniquifies all the words in the captions of a captions file. Input on stdin and output on stdout.
  • Xsynsettools.py - a library for utility functions relating to Xsynsets. Currently just has a function to generate Xsynsets.
  • similarity.py - a library for similarity computations. Currently just has cosine similarity.
  • captionstools.py - a library for utility functions relating to image caption manipulation (i.e. reading, vector extraction, etc.)
  • cluster.py - a library with clustering routines. It currently just implements k-means.
  • chi2.py - a library for computing chi^2 statistics.
  • results2html.sh - takes the output of rank_caption_synsets.py and generates a decent looking webpage.
  • assignments2html.sh - takes the output of assign_images.py and generates a decent looking webpage.
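
To make the alpha^path_length weighting and the cosine similarity mentioned in the list above concrete, here is a minimal sketch. It does not reproduce the actual code in gen_weighted_Xsynsets.py or similarity.py: the function names, the input format (a dict of path lengths), and the example words are all assumptions made for illustration.

```python
import math

def weight_xsynset(path_lengths, alpha=0.5):
    """Turn {target: path_length} into {target: alpha ** path_length}."""
    return {target: alpha ** length for target, length in path_lengths.items()}

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical path lengths from two generator words to their hypernyms.
dog = weight_xsynset({"dog": 0, "canine": 1, "carnivore": 2})
cat = weight_xsynset({"cat": 0, "feline": 1, "carnivore": 2})
print(dog["carnivore"])  # 0.25
print(cosine_similarity(dog, cat))
```

The two vectors only overlap on the shared hypernym, so their similarity is small but nonzero.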

Experimental Files

These are large data files (too large and/or time consuming to put in everybody's CVS). All files are on psy-build2 at /wnimage.

  • captions - This is the captions file that has been dos2unix-ified.
  • captionwords - A sorted and uniquified list of all the words that occur in the captions file. This file also has the counts of each of the words.
  • Xsynsetdb.pkl - This is the database of Xsynsets generated using gen_Xsynsets.py. You should create a symlink from this file to your working dir.
  • wXsynsetdb.pkl - This is the database of weighted Xsynsets generated using gen_weighted_Xsynsets.py. You should create a symlink from this file to your working dir.
  • word_counts.pkl - A database of wordcounts for all the nouns in the captions.

The results subdirectory contains working results for experiments.

  • topNsynsets_PARAMS - The top N synsets for each image caption. Raw results named analogously to topNsynsets_PARAMS.html (see above). These should be fed into results2html.sh or assign_images.py.
  • assignments_PARAMS_k - The raw results fed into assignments2html.sh.



  • XM will concentrate on assigning images to synsets while JC will focus on scoring the most relevant synsets for any given image.
  • Ideas discussed:
    • Words can compete. If two words have the same parent, then their score should be attenuated. By fighting with each other, they split the vote. Another way of implementing this would be by looking for words that are at a similar depth in the hierarchy. This is problematic because of the arbitrary density of the hierarchy.
    • This might be solved with some normalization based on how often words appear in the corpus. Infrequent synsets can be elided from the wordnet database. This might obviate the need for other ad hoc solutions.
  • Ways of conducting experiments:
    • Directly testing the technique by showing images and having subjects judge the correctness.
    • Another way of testing is to just show the captions instead of the images and have subjects rate the relation of the captions to the synset. Since we only operate using the captions, this is a better test of the technique itself (or at least a good point of comparison).
    • Yet another way of testing caption quality is to have them rate the captions versus the images.
  • Long term goals:
    • By mid-June
      • Have many (4-ish) techniques/parameter sets that we think are pretty good and ready for testing.
      • Proposal. $$$.
      • Interns will develop the web interface for the testing.
    • Late Summer
      • Show off something to google. $$$.
      • let the humans at 'em!
    • Late fall
      • Publication goal? (ask Christiane)
  • To get there:
    • A good point of comparison would be to use google images. We could create webpages for the esp data set and let google loose on it and see how we fare in comparison (just type the synset forms into google).
    • There's probably a lot of related work out there. We should get up to speed.
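
The word competition idea discussed above (words with the same parent split the vote) could look something like the following. This is a sketch under a strong assumption: the notes only say that sibling scores "should be attenuated", so the rule used here, dividing each score by the number of scored words sharing the parent, is one arbitrary choice. The function name and the example data are hypothetical.

```python
from collections import Counter

def attenuate_siblings(scores, parent_of):
    """Divide each word's score by the number of scored words sharing its parent.

    The division rule is an assumption for illustration, not the
    project's actual method.
    """
    siblings = Counter(parent_of[word] for word in scores)
    return {word: score / siblings[parent_of[word]]
            for word, score in scores.items()}

scores = {"poodle": 0.8, "beagle": 0.6, "oak": 0.5}
parent_of = {"poodle": "dog", "beagle": "dog", "oak": "tree"}
print(attenuate_siblings(scores, parent_of))
# {'poodle': 0.4, 'beagle': 0.3, 'oak': 0.5}
```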


  • The goal of the project is to use Xsynsets to associate images with synsets (read: illustrate synsets).
  • The first step of this is to do the opposite, namely to rank Xsynsets for each image.
    • Once we have a scored ranking, we can adjust competition parameters to determine how to assign it to a synset.
      • Homogeneity penalty, which devalues the assignment if the image could be associated with many synsets (because many synsets have similarly high scores).
    • We will also explore prior bias parameters.
      • Lower the weight given to colors.
      • Lower the weight the further up the hypernym hierarchy a word is (in principle, giving more weight to more specific terms).
    • Last, we can explore different ways of computing and using extended synsets.
      • Change the function between path length and score (alpha^length vs. alpha/(alpha+length)).
      • Weight different relations differently.
      • Boosting words that are synset kings.
      • Normalization via out degree.
  • I will set up a large (~1000) set of images on the website using a randomized seed with the results of different approaches so that we can judge each of the techniques.
  • In order to test our technique eventually, we will give users a synset (+gloss) and ask them to judge how representative the picture is of the synset (summer intern).
  • XM will implement basic similarity between captions and synset glosses as a baseline technique.
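
The two candidate functions from path length to score mentioned above, alpha^length and alpha/(alpha+length), can be compared with a short sketch. The function names are hypothetical, and alpha=0.5 is used only because it is the value the current scripts are described as using.

```python
def exponential_decay(length, alpha=0.5):
    """Score alpha ** length: halves with every extra link when alpha=0.5."""
    return alpha ** length

def hyperbolic_decay(length, alpha=0.5):
    """Score alpha / (alpha + length): a slower decay for long paths."""
    return alpha / (alpha + length)

for length in range(5):
    print(length, exponential_decay(length), hyperbolic_decay(length))
```

Both functions give a score of 1 at length 0. With alpha=0.5 the second form drops faster over the first two links but stays higher from length 3 onward, so it gives distant synsets relatively more weight.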


  • The principal goal (or first milestone) of this project is to use Xsynsets to rank the synsets associated with the given image.
  • Each Xsynset will be implemented in python as a dictionary. In summary, an Xsynset uses the following structures:
    • synset 
      the synset number within wordnet is recorded.
    • path 
      a path is a list of synsets (starting node to ending node).
    • typed_path 
      a tuple where the first element is a wordnet connection type and the second is a path. This represents a path through wordnet where all the traversed edges are of the type specified in the first element of the tuple.
    • entry 
      is a dictionary entry where the key is a synset and the value is a list of typed_paths. The list of typed_paths are all those paths which go from the Xsynset's generator synset to the given target synset, while only traversing one type of connection.
    • Xsynset 
      is a list of entries. If a synset does not appear as any key in the Xsynset, then it cannot be reached from the generator synset within the threshold number of steps.
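
The structures above might look like this in Python. This is a sketch only: the synset numbers and the relation name "hypernym" are made up, and the helper functions reachable and shortest_path_length are hypothetical, not part of Xsynsettools.py.

```python
# All synset numbers below are made up for illustration.
generator = 1001                  # synset: a wordnet synset number
path = [1001, 1002, 1003]         # path: synsets from start node to end node
typed_path = ("hypernym", path)   # typed_path: (connection type, path)

# Xsynset: maps each reachable target synset to its list of typed_paths.
xsynset = {
    1002: [("hypernym", [1001, 1002])],
    1003: [typed_path],
}

def reachable(xsynset, synset):
    """A synset is reachable from the generator only if it appears as a key."""
    return synset in xsynset

def shortest_path_length(xsynset, synset):
    """Number of edges on the shortest recorded path to synset, or None."""
    if synset not in xsynset:
        return None
    return min(len(p) - 1 for _, p in xsynset[synset])

print(reachable(xsynset, 1003))             # True
print(shortest_path_length(xsynset, 1003))  # 2
```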


JBG has done some basic disambiguation using Lesk. There are pages for both synsets and each individual image. The results are pretty shoddy.

Personal Work Notes

    A part of Wordnet_plus