WNImage:Jonathan's Lab Notes

From CSWiki
Revision as of 02:19, 18 May 2006 by Jcone (talk | contribs)

5/17/06

  • Collect some interesting statistics in the assignment: the percentage of assignments for which a caption word is a synset form and the average distance from a caption word to a synset form.
    • It turns out that only about 6% of images are assigned to a synset which is not one of the caption words. In the cases where it is not, it is only ever 1 away. This suggests that we could possibly do nearly as well using a simple keyword search. Another way of looking at it is that we're identifying which of the keywords is most important.
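A minimal sketch of how these two statistics could be computed, assuming each assignment has already been reduced to a link distance from the assigned synset to the nearest caption word (0 meaning the synset is itself a sense of a caption word). The function name and the input list are hypothetical, and the numbers are made up for illustration.

```python
def assignment_stats(distances):
    """Summarize link distances from each assigned synset to its nearest caption word.

    A distance of 0 means the assigned synset is itself a sense of a caption word.
    """
    total = len(distances)
    off_caption = sum(1 for d in distances if d > 0)
    return {
        "pct_not_caption": 100.0 * off_caption / total,
        "mean_distance": sum(distances) / total,
        "max_distance": max(distances),
    }


# Made-up distances for four images (not the real results).
stats = assignment_stats([0, 0, 0, 1])
```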

5/16/06

  • Meeting
  • Read some related work
  • Just for my edification, also compute the number of times we assign an image to a synset which is not in the caption words.

5/15/06

  • Clean up/update results and webpage.
  • Some more ideas:
    • Transitive meronyms - meronyms are often not expressed for all the hyponyms. For example, [leg] may be a meronym of [insect], but it may not be for [horsefly]. This suggests that the algorithm should manually insert transitive meronyms: that is, if a synset has a meronym link to another synset, then it also has implicit meronym links to all hyponyms of that synset.
    • Learning the strength of connections - if a word, say [ungulate], never appears in the corpus, then it should effectively be nonexistent in wordnet. That is, nodes which would have connected to [ungulate], say [horse], should now connect directly to the parent, say [animal]. The idea is to elide nodes which are "useless." A refinement of this would be to assign a weight to each node to help describe the decay function of the Xsynset as it crosses that node. So if a word never appears, that node is elided, but if it occurs many times, crossing the node incurs a heavy decay.
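The transitive-meronym idea above can be sketched as follows. This is only an illustration on a toy graph stored in plain dictionaries; `has_part` and `hyponyms` are hypothetical structures, not the project's database or the pywordnet API.

```python
def all_hyponyms(synset, hyponyms):
    """Return every synset below `synset`, following hyponym links transitively."""
    seen = set()
    stack = [synset]
    while stack:
        node = stack.pop()
        for child in hyponyms.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def add_transitive_meronyms(has_part, hyponyms):
    """Copy each whole's parts down to all of its hyponyms.

    has_part maps a whole to its parts (meronyms); hyponyms maps a synset
    to its direct hyponyms. The inputs are left unmodified.
    """
    expanded = {whole: set(parts) for whole, parts in has_part.items()}
    for whole, parts in has_part.items():
        for hypo in all_hyponyms(whole, hyponyms):
            expanded.setdefault(hypo, set()).update(parts)
    return expanded


# Toy data: [leg] is recorded as a part of [insect] only.
hyponyms = {"insect": ["fly"], "fly": ["horsefly"]}
has_part = {"insect": ["leg"]}
expanded = add_transitive_meronyms(has_part, hyponyms)
```

After the expansion, [horsefly] carries the implicit [leg] meronym that was only stated for [insect].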

5/14/06

  • Here are some thoughts rolling around in my head:
    • We should try to reverse-engineer the labelling process. What happens when someone is asked to label?
      1. They decompose the image into separate components. For example, a picture of a horse standing in a field with trees might immediately suggest [horse,field,sky,trees].
      2. For each of these components, they will try to think of similar words. This will most likely involve using more general terms for the things they described. For example, [animal] for horse, [plant] for tree. Thus, our keywords are [horse,animal,field,sky,trees,plants].
      3. For each of the components of the image, they can also further elaborate on each of the components. For example, seeing the horse, they might suggest [mane]. Seeing the tree, they might suggest [leaves]. Now the list looks like [horse,animal,mane,tree,plant,leaves,sky,field].
    • This process seems to suggest that we should look for hypernym/hyponym pairs and meronym/holonym pairs. In the case of hypernyms, we should look for the most specific captions as being the actual object. We should not get any more specific than that. In the case of meronyms, we should look for the largest thing, i.e. the thing that contains as many of the other captions as possible.
    • Rare words are probably very important. If [game hen] only occurs once in the corpus, then an image with the captions [game hen, tree] should probably give much more weight to [game hen].
  • Gather some word-count statistics. Add a rare-word booster.
  • Based on the discussion above, add special handling for meronyms akin to that for hypernyms. It simply will not traverse meronym links. It does not yet do the more sophisticated n^2 matching of mero/holonyms.
  • The results look better.
  • Something else to note is that it is often helpful to traverse hyponym->meronym->hyponym. The meronym->hyponym step should often be traversed because a meronym of the hypernym is effectively a meronym of the hyponym. For example, if leaf is a meronym of tree, then leaf is effectively a meronym of poplar.
  • Implement a meronym booster more similar to the hypernym booster. The results look about the same (maybe slightly better).
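One possible shape for the rare-word booster mentioned above. The notes do not record the actual weighting formula, so the inverse-frequency weight used here (log of corpus size over word count, plus one) is an assumption, as are the function name and the toy captions.

```python
import math
from collections import Counter


def rare_word_weights(captions):
    """Give each caption word a weight that grows as the word gets rarer.

    captions is a list of keyword lists, one per image. The formula is an
    assumed IDF-style weight, not necessarily the one used in the project.
    """
    # Count each word at most once per image.
    counts = Counter(word for caption in captions for word in set(caption))
    n_images = len(captions)
    return {word: math.log(n_images / count) + 1.0 for word, count in counts.items()}


# Toy corpus: [game hen] occurs once, [tree] occurs three times.
captions = [["game hen", "tree"], ["tree", "sky"], ["tree"], ["sky", "field"]]
weights = rare_word_weights(captions)
```

With these toy captions, [game hen] ends up with a larger weight than [tree], which is the behavior the booster is after.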

5/13/06

  • Investigate the results of the chi^2 scores. They give some interesting results but they are somewhat skewed since chi^2 does not deal well with distributions that have many rare events. Investigate other options, but end up settling on just calculating raw binomial probability.
  • Notice something else: many of the results have words that should intuitively be connected, but which are not. An example is [hippo,animal]. It should be pretty clear what this refers to; however, to get from hippo to animal, we must traverse hippo->even-toed ungulate->ungulate->placental mammal->mammal->vertebrate->chordate->animal.
  • Add an ad hoc analysis that boosts synsets if they form a possible hypernym/hyponym pair. Also reduce other synsets with the same word form.
  • Remove color words altogether. Potential color words are almost always semantically color words in this context, and since they are already devalued by 0.2, they are practically insignificant. By removing them completely, we speed things up by reducing the number of senses we have to scan through.
  • A cursory look at the results is promising. One potential problem is that the high weight given to words we are confident about tends to spew many hyponyms. For example, if we know that this is a picture of a tree, synsets like poplar and willow will be strongly excited and move very high in the rankings. Perhaps the right approach is to disallow traversing of hypernyms when scoring a synset.
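A rough sketch of the hypernym/hyponym pair boost, using the hippo chain from above as a toy graph. The boost factor of 2.0, the dictionary layout, and the function names are assumptions for illustration; the real analysis also reduces competing senses, which is not shown here.

```python
def is_ancestor(ancestor, node, hypernym):
    """True if `ancestor` is reachable from `node` by following hypernym links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in hypernym.get(current, ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def boost_hypernym_pairs(scores, hypernym, boost=2.0):
    """Multiply the scores of both members of every hypernym/hyponym pair."""
    boosted = dict(scores)
    for upper in scores:
        for lower in scores:
            if upper != lower and is_ancestor(upper, lower, hypernym):
                boosted[upper] *= boost
                boosted[lower] *= boost
    return boosted


# The chain from the notes: hippo -> ... -> animal.
chain = ["hippo", "even-toed ungulate", "ungulate", "placental mammal",
         "mammal", "vertebrate", "chordate", "animal"]
hypernym = {child: [parent] for child, parent in zip(chain, chain[1:])}
boosted = boost_hypernym_pairs({"hippo": 1.0, "animal": 1.0, "sky": 1.0}, hypernym)
```

Here [hippo] and [animal] are boosted even though they are seven links apart, while [sky] is left alone. A synset that takes part in several pairs is boosted once per pair in this sketch.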

5/12/06

  • Fix the results webpages to incorporate different k-means parameters.
  • Implement a chi^2 scorer for each of the illustrated synsets. The results look fairly promising, with many erroneously assigned images getting low scores.
  • Sort a number of things to ensure that the results can be more easily compared.
  • Fix a bug in the k-means sorter.

5/11/06

  • Finish the k-means sorter. It now only assigns an image to a synset if it is the only member of its cluster. Make a webpage with the results.
  • Meeting
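A sketch of the "only member of its cluster" rule, assuming the sorter clusters the candidate scores for one image in one dimension. The k-means variant (deterministic initial centers, k=2 by default, fixed iteration count) and all names are assumptions, not the project's actual parameters.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar values with plain k-means (k >= 2); return one label per value."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the extremes (deterministic).
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def confident_top_synset(scored, k=2):
    """Return the top-scoring synset only if it sits alone in its cluster."""
    names = sorted(scored, key=scored.get, reverse=True)
    labels = kmeans_1d([scored[name] for name in names], k)
    if labels.count(labels[0]) == 1:
        return names[0]
    return None
```

For example, scores of {horse: 9.0, animal: 3.0, field: 2.5} would yield horse, while {horse: 9.0, pony: 8.8, field: 2.0} would yield no assignment because the top two candidates share a cluster.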

5/10/06

  • Made webpages/documented the latest results. They're starting to shape up.
  • Wrote the script to parse the results and based on those results, assign images to synsets. For now, just pick the top synset.
  • Make webpages for this set of results.
  • Start work on a k-means method for grouping candidate synsets. This should make it easier (in principle) to determine which synsets are in the "top tier" for a given image.

5/9/06

  • Remove duplicates from morphy-ized captions, otherwise words which appear in both singular and plural forms have excessive weight.
  • Update the webpage with the meeting notes.
  • Do not follow ANTONYM links. Also don't give scores to ATTRIBUTE links since they point to adjectives!
  • Generate the webpage with randomized (using seed 8675309) images.
  • Implement weighting colors less and weighting based on depth in the hierarchy.
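The duplicate removal from the first item of this entry can be sketched like this. The `base_form` argument stands in for morphy; the toy version below is a crude placeholder that only strips a trailing "s" and is not how morphy works.

```python
def dedupe_captions(words, base_form):
    """Reduce caption words to base forms, keeping the first occurrence of each."""
    seen = set()
    result = []
    for word in words:
        base = base_form(word)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result


def toy_base_form(word):
    """Placeholder for morphy: strip a trailing 's' from longer words."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


captions = dedupe_captions(["horse", "horses", "tree", "trees", "sky"], toy_base_form)
```

Without this step, [horse] and [horses] would each contribute to the score and the word would count twice.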

5/8/06

  • Take another look over the results. They look slightly better but are still crummy. One reason for this is that the cache only has those words that appear in the captions. A good match might not be in any of the captions. However, any possible match has to be within 3 steps of one of the caption words. Examining only those Xsynsets speeds things up immensely.
  • Still not much better. Many results are multiples of 17.0. The reason for this is that each word is reachable via any of the 17 link types. Therefore, we are putting an inordinately high weight on words that appear in the captions (since they automatically get a score of 17!).
  • pywordnet also seems to be giving errors when trying to access certain synsets by offset. I will need to debug this later.
  • The results are better now, but still not great. Take a break and write a script to create webpages with the results.
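The "within 3 steps" restriction from the first item of this entry amounts to a bounded breadth-first search from the caption synsets. A minimal sketch over a toy adjacency dictionary follows; the names and data layout are hypothetical.

```python
from collections import deque


def within_steps(start_nodes, neighbors, max_steps=3):
    """Return {synset: distance} for everything within max_steps links of a start node."""
    dist = {node: 0 for node in start_nodes}
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand past the step limit
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy chain a -> b -> c -> d -> e: only a..d are candidates for caption word a.
neighbors = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
candidates = within_steps(["a"], neighbors)
```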

5/7/06

  • Looked at the initial results. Not so good. There are a couple of problems:
    • There are many unrelated synsets. When this occurs, the top synsets are simply equal to one of the caption keywords (and have a score of 1.0, meaning they match only that keyword). Hopefully this will be mitigated with more relations.
    • The other problem is disambiguation of polysemous words. For example, consider a caption with the words [white,red]. While you might expect these to naturally fire the {color} synset, it turns out that {person} gets a much higher score. For example, this could be a picture of E.B. White and Lenin. Because words like white which can also be proper names belong to many synsets, you get an explosion of weight associated to that word. In other words, because our scheme simply adds together the score of each sense of a word, this is essentially equivalent to expanding our bag of words to include each possible sense. So for [white], you end up with a crowd of people named White in the bag of words. Normalization might be one solution, but it greatly dilutes the value of [white], that is, normalization doesn't realize that white here refers to the color so much as it attenuates the magnitude of each sense of white out of existence.
    • A better solution for the aforementioned problem might be to select the sense of white which gives the highest score within the Xsynset we are testing. Implement this. Looks a bit better.
    • Start implementing the traversal of links other than hypernym. Regenerate results.
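To make the sum-versus-best-sense difference concrete, here is a small sketch. The per-sense contributions are invented numbers chosen only to show the effect, and the sense labels and function names are hypothetical.

```python
def score_sum(sense_scores):
    """Original scheme: add up the contribution of every sense of every word."""
    return sum(sum(senses.values()) for senses in sense_scores.values())


def score_best_sense(sense_scores):
    """Revised scheme: each word contributes only its best-scoring sense."""
    return sum(max(senses.values(), default=0.0) for senses in sense_scores.values())


# Contributions of the caption [white, red] to two candidate Xsynsets (made-up values).
person = {
    "white": {"White (surname) 1": 0.5, "White (surname) 2": 0.5, "White (surname) 3": 0.5},
    "red": {"Red (person)": 0.5},
}
color = {
    "white": {"white (color)": 0.8},
    "red": {"red (color)": 0.8},
}
```

With these numbers, summing gives {person} 2.0 against 1.6 for {color}, so the crowd of surname senses wins. Taking the best sense per word gives {person} 1.0 against 1.6, and {color} wins.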

5/6/06

  • Updated the code to generate a weighted Xsynset.
  • Pickled everything to make it more compact and robust.
  • Wrote a brute force synset ranker.
  • It might be extremely slow. Think about using something like an HMM to get approximate rankings.
  • Read up on HMM. Also consider a simple gradient descent algorithm. Since the optimal synset must be connected to one of the keywords, we just start walking from each of the keywords.
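The "start walking from each keyword" idea could look roughly like the greedy hill climb below (on a discrete graph this is the ascent analogue of what the entry calls gradient descent). The graph layout, the `score` callable, and the function names are assumptions for illustration, and a greedy walk can stop at a local maximum.

```python
def hill_climb(start, neighbors, score):
    """Walk from `start`, always moving to the best-scoring neighbor until none improves."""
    current = start
    while True:
        best = max(neighbors.get(current, ()), key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best


def best_synset(keywords, neighbors, score):
    """Run one walk per keyword and keep the highest-scoring end point."""
    return max((hill_climb(word, neighbors, score) for word in keywords), key=score)


# Toy graph with made-up scores.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "x": ["y"], "y": ["x"]}
scores = {"a": 1.0, "b": 2.0, "c": 3.0, "x": 1.0, "y": 2.5}
winner = best_synset(["a", "x"], neighbors, scores.get)
```

Each walk stops after finitely many steps because the score strictly increases at every move.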

5/5/06

  • Updated the database to use the new structures.
  • Regenerated the database.
  • Updated the web documentation.