WNImage
From CSWiki
Revision as of 16:51, 6 May 2006
Todo
- Update the Xsynset structure to incorporate the latest ideas.
- Add support for links other than hypernyms.
- Fix the way in which synsets are ranked for an image.
- Figure out some way to normalize to prevent "over-generalization."
CVS Access
The WNImage tools are in the repository under wnimage. The repository is named wnp. To access it, follow the instructions at [1].
Files
CVS Files
- gen_Xsynsets.py - reads a list of words on stdin and writes the Xsynset database to stdout. The max_depth parameter specifies how many links to follow. It currently follows only hypernyms, and it crawls all senses of each word.
- gen_caption_vects.py - generates a db of the weighted (alpha=0.5) Xsynset for each image caption. Pass the captions file as the first parameter. Run this after gen_Xsynsets.py; it expects that script's output to be named Xsynsetdb.py. Note: this generates a fairly large file and doesn't save much time, so it might be scrapped.
- rank_caption_synsets.py - ranks the synsets for every caption and outputs the top N. The captions file is the first parameter and N is the second.
- extract_caption_words.sh - extracts and uniquifies all the words in the captions of a captions file. Input on stdin and output on stdout.
- Xsynsettools.py - a library for utility functions relating to Xsynsets. Currently just has a function to generate Xsynsets.
- similarity.py - a library for similarity computations. Currently just has cosine similarity.
- captionstools.py - a library of utility functions for manipulating image captions (e.g. reading them, extracting vectors).
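similarity.py and rank_caption_synsets.py are only described at a high level above, so the following is a minimal sketch, not the code in CVS, of how cosine similarity over sparse synset vectors could rank synsets for a caption. It assumes synsets are integer IDs and that Xsynsets use the dictionary layout from the 5/5/06 notes. As a guess at what "weighted (alpha=0.5)" means, each reachable synset gets weight alpha ** (number of edges on its shortest path); that weighting, and the names xsynset_vector, caption_vector, cosine and top_n, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed layout (from the 5/5/06 notes): synsets are ints and a
# typed_path is (connection_type, [synset, ..., synset]).
TypedPath = Tuple[str, List[int]]
Xsynset = Dict[int, List[TypedPath]]
Vector = Dict[int, float]  # sparse vector keyed by synset


def xsynset_vector(xs: Xsynset, alpha: float = 0.5) -> Vector:
    """Weight each synset by alpha ** (edges on its shortest typed_path).

    ASSUMPTION: one guess at what "weighted (alpha=0.5)" means.
    """
    vec: Vector = {}
    for synset, typed_paths in xs.items():
        if not typed_paths:
            continue
        steps = min(len(path) - 1 for _, path in typed_paths)
        vec[synset] = alpha ** steps
    return vec


def caption_vector(word_xsynsets: List[Xsynset], alpha: float = 0.5) -> Vector:
    """Sum the weighted Xsynset vectors of all words in one caption."""
    total: Vector = {}
    for xs in word_xsynsets:
        for synset, weight in xsynset_vector(xs, alpha).items():
            total[synset] = total.get(synset, 0.0) + weight
    return total


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v[k] for k, weight in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n(caption_vec: Vector, candidates: Dict[int, Xsynset],
          n: int, alpha: float = 0.5) -> List[Tuple[int, float]]:
    """Rank candidate synsets by cosine similarity to the caption vector."""
    scored = [(synset, cosine(caption_vec, xsynset_vector(xs, alpha)))
              for synset, xs in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

Because vectors are stored as dicts keyed by synset, the dot product only touches synsets present in both vectors, which keeps it cheap for the sparse vectors that Xsynsets produce.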
Experimental Files
These are large data files (too large and/or too time-consuming to put in everybody's CVS). All files are on psy-build2 in /wnimage.
- captions - This is the captions file that has been dos2unix-ified.
- captionwords - A sorted and uniquified list of all the words that occur in the captions file.
- Xsynsetdb.py - This is the database of Xsynsets generated using gen_Xsynsets.py. You should create a symlink to this file in your working directory.
The results subdirectory contains working results for experiments.
- top5synsets - The top 5 synsets for each image caption using Xsynsetdb and cosine similarity.
Notes
5/5/06
- The principal goal (or first milestone) of this project is to use Xsynsets to rank the synsets associated with a given image.
- Each Xsynset will be implemented in Python as a dictionary. In summary, an Xsynset uses the following structures:
- synset: the synset's number within WordNet.
- path: a list of synsets, from the starting node to the ending node.
- typed_path: a tuple whose first element is a WordNet connection type and whose second element is a path. It represents a path through WordNet in which every traversed edge is of the type given by the first element.
- entry: a dictionary entry whose key is a synset and whose value is a list of typed_paths. These are all the paths from the Xsynset's generator synset to the given target synset that traverse only one type of connection.
- Xsynset: the collection of all such entries (i.e. the dictionary itself). If a synset does not appear as a key in the Xsynset, it cannot be reached from the generator synset within the threshold number of steps.
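A minimal sketch of building one Xsynset from the structures above. It assumes a toy graph stored as a dict from connection type to adjacency lists; real code would read these links from WordNet. The function name gen_xsynset, the Graph layout, and the choice to give the generator itself no entry are assumptions of this sketch, not necessarily how gen_Xsynsets.py works.

```python
from typing import Dict, List, Tuple

TypedPath = Tuple[str, List[int]]        # (connection type, path of synsets)
Xsynset = Dict[int, List[TypedPath]]     # target synset -> its typed_paths
Graph = Dict[str, Dict[int, List[int]]]  # HYPOTHETICAL: type -> adjacency


def gen_xsynset(generator: int, graph: Graph, max_depth: int) -> Xsynset:
    """Collect every single-type, cycle-free path of 1..max_depth edges."""
    xsynset: Xsynset = {}
    for conn_type, edges in graph.items():
        # Depth-first walk that only follows edges of conn_type.
        stack: List[List[int]] = [[generator]]
        while stack:
            path = stack.pop()
            if len(path) > 1:  # the generator itself gets no entry here
                xsynset.setdefault(path[-1], []).append((conn_type, path))
            if len(path) - 1 >= max_depth:
                continue  # threshold reached; do not extend this path
            for nxt in edges.get(path[-1], []):
                if nxt not in path:  # skip cycles
                    stack.append(path + [nxt])
    return xsynset


# Toy data: 1 -hypernym-> 2 -> 3 -> 4, and 1 -meronym-> 5 -> 3.
TOY_GRAPH: Graph = {
    "hypernym": {1: [2], 2: [3], 3: [4]},
    "meronym": {1: [5], 5: [3]},
}
```

With max_depth=2, gen_xsynset(1, TOY_GRAPH, 2) has entries for synsets 2, 3 and 5. Synset 3 holds two typed_paths, ("hypernym", [1, 2, 3]) and ("meronym", [1, 5, 3]), while synset 4 is absent because it is three hypernym steps away. Enumerating every path grows quickly on a graph the size of WordNet, which is one reason the depth threshold matters.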
Personal Work Notes
- Jonathan's Lab Notes
5/3/06
- JBG has done some basic disambiguation using the Lesk algorithm (maximizing definition overlap).
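The note does not record how JBG's Lesk code works, so here is a sketch of the simplified Lesk variant: pick the sense whose dictionary gloss shares the most words with the surrounding context. The sense labels and glosses are hypothetical toy data rather than WordNet output, and the stopword list is an arbitrary assumption.

```python
from typing import Dict, Iterable, Optional, Set

# Small, arbitrary stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "on", "that", "into"}


def content_words(text: str) -> Set[str]:
    """Lower-case, split on whitespace and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context: Iterable[str], glosses: Dict[str, str]) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the context."""
    ctx = content_words(" ".join(context))
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:  # strict '>' so ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical toy glosses for two senses of "bank".
BANK_GLOSSES = {
    "bank#1": "sloping land beside a body of water",
    "bank#2": "a financial institution that accepts deposits and lends money",
}
```

For a context such as "they fished from the bank of the river near the water", only bank#1's gloss shares a word ("water"), so that sense wins. Ties keep whichever sense is listed first.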