WNImage

From CSWiki
Revision as of 00:00, 6 May 2006

Todo

  • Update the Xsynset structure to incorporate the latest ideas.
  • Add support for links other than hypernym.
  • Fix the way in which synsets are ranked for an image.
  • Figure out some way to normalize to prevent "over-generalization."

CVS Access

The WNImage tools are in the repository under wnimage. The repository is named wnp. To access it, follow the instructions at http://cvs.cs.princeton.edu.

Files

CVS Files

  • gen_Xsynsets.py - reads a list of words on stdin and writes the Xsynset database file to stdout. The max_depth parameter specifies how many links to follow. It currently follows only hypernyms, and it crawls all senses of each word.
  • gen_caption_vects.py - generates a db of the weighted (alpha=0.5) Xsynset for each image caption. Pass in the captions file as the first parameter. Run this after gen_Xsynsets.py; it expects that script's output to be named Xsynsetdb.py. Note: this generates a fairly large file and doesn't save much time, so it might be scrapped.
  • rank_caption_synsets.py - ranks synsets for every caption and outputs the top N for each, where N is the second parameter. The captions file is the first parameter.
  • extract_caption_words.sh - extracts and uniquifies all the words in the captions of a captions file. Input on stdin and output on stdout.
  • Xsynsettools.py - a library for utility functions relating to Xsynsets. Currently just has a function to generate Xsynsets.
  • similarity.py - a library for similarity computations. Currently just has cosine similarity.
  • captionstools.py - a library for utility functions relating to image caption manipulation (e.g. reading, vector extraction, etc.)
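
A minimal sketch of how an Xsynset might be built, for anyone new to the code. It assumes an Xsynset is a sparse map from synset to weight, where a synset reached after d hypernym links gets weight alpha**d (alpha=0.5, the value mentioned for gen_caption_vects.py), and that contributions from different senses are summed. The SENSES and HYPERNYMS tables are made-up stand-ins for WordNet, and gen_xsynset and caption_vector are hypothetical names, not the actual functions in Xsynsettools.py.

```python
from collections import defaultdict

# Hypothetical toy data standing in for WordNet (illustrative, not real WordNet
# output): word -> its senses, and synset -> its hypernym synsets.
SENSES = {
    "dog": ["dog.n.01", "frump.n.01"],
    "cat": ["cat.n.01"],
}
HYPERNYMS = {
    "dog.n.01": ["canine.n.02"],
    "canine.n.02": ["carnivore.n.01"],
    "cat.n.01": ["feline.n.01"],
    "feline.n.01": ["carnivore.n.01"],
    "carnivore.n.01": ["placental.n.01"],
    "frump.n.01": ["unpleasant_woman.n.01"],
}


def gen_xsynset(word, max_depth, alpha=0.5):
    """Expand every sense of `word` through up to max_depth hypernym links.

    Assumption: a synset reached after d links gets weight alpha**d, and
    weights arriving from different senses or paths are summed.
    """
    xsynset = defaultdict(float)
    for sense in SENSES.get(word, []):
        frontier = [sense]
        for depth in range(max_depth + 1):  # depth 0 is the sense itself
            next_frontier = []
            for syn in frontier:
                xsynset[syn] += alpha ** depth
                next_frontier.extend(HYPERNYMS.get(syn, []))
            frontier = next_frontier
    return dict(xsynset)


def caption_vector(words, max_depth, alpha=0.5):
    """Sum the Xsynsets of all words in a caption into one sparse vector."""
    vec = defaultdict(float)
    for word in words:
        for syn, weight in gen_xsynset(word, max_depth, alpha).items():
            vec[syn] += weight
    return dict(vec)


if __name__ == "__main__":
    print(gen_xsynset("dog", max_depth=2))
    print(caption_vector(["dog", "cat"], max_depth=2))
```

Because both "dog" and "cat" reach carnivore.n.01 in two links, that shared ancestor gets weight from both words in the caption vector, which is one way the "over-generalization" issue in the Todo list can show up.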
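
A sketch of the ranking step behind rank_caption_synsets.py and similarity.py. It assumes vectors are stored as sparse {synset: weight} dicts and that each candidate synset is scored by the cosine similarity between its Xsynset and the caption vector. cosine_similarity and rank_synsets are hypothetical names; the real similarity.py may be organized differently.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller dict for the dot product
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank_synsets(caption_vec, xsynset_db, n):
    """Return the n (synset, score) pairs whose Xsynsets best match the caption."""
    scores = ((syn, cosine_similarity(caption_vec, xs))
              for syn, xs in xsynset_db.items())
    return heapq.nlargest(n, scores, key=lambda pair: pair[1])
```

heapq.nlargest keeps only the current top n while scanning, so it avoids sorting the whole Xsynset database when only the top 5 are wanted.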
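
For comparison, a Python sketch of what extract_caption_words.sh does. It assumes one caption per line of plain text and treats a word as a lowercase run of letters, allowing internal apostrophes; the actual script's tokenization may differ. extract_caption_words is a hypothetical name.

```python
import re

# Assumption: words are letter runs, optionally with internal apostrophes ("dog's").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def extract_caption_words(lines):
    """Return the sorted, de-duplicated words appearing in the caption lines."""
    words = set()
    for line in lines:
        words.update(WORD_RE.findall(line.lower()))
    return sorted(words)
```

To mimic the script's stdin/stdout behavior, call it as extract_caption_words(sys.stdin) and print one word per line.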

Experimental Files

These are large data files (too large and/or time-consuming to put in everybody's CVS). All of them are on psy-build2 in /wnimage.

  • captions - This is the captions file that has been dos2unix-ified.
  • captionwords - A sorted and uniquified list of all the words that occur in the captions file.
  • Xsynsetdb.py - This is the database of Xsynsets generated using gen_Xsynsets.py. You should create a symlink to this file in your working dir.

The results subdirectory contains working results for experiments.

  • top5synsets - The top 5 synsets for each image caption using Xsynsetdb and cosine similarity.

Notes

JBG has done some basic disambiguation using the Lesk algorithm (which picks the sense that maximizes definition overlap).
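
For reference, a sketch of the common "simplified Lesk" variant: choose the sense whose gloss shares the most content words with the surrounding context. (The original Lesk method compares a word's sense definitions against the definitions of neighboring words; this simplified form compares them against the context words directly.) The glosses and stopword list are toy data, and this is not necessarily what JBG implemented.

```python
import re

# Hypothetical toy glosses; real use would pull definitions from WordNet.
GLOSSES = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits and lends money",
}
# Tiny illustrative stopword list so function words don't count as overlap.
STOPWORDS = {"a", "an", "and", "at", "beside", "of", "on", "that", "the", "to"}


def content_words(text):
    """Lowercased alphabetic tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(senses, context, glosses):
    """Return the sense whose gloss shares the most words with the context.

    Ties go to the sense listed first in `senses`.
    """
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense in senses:
        overlap = len(content_words(glosses[sense]) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

For example, "she deposits money at the bank" overlaps the second gloss on "deposits" and "money", so bank.n.02 wins.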