Difference between revisions of "WNImage:Jonathan's Lab Notes"

From CSWiki
Revision as of 18:33, 7 May 2006

5/7/06

  • Looked at the initial results. Not so good. There are a couple of problems:
    • There are many unrelated synsets. When this occurs, the top synsets are simply equal to one of the caption keywords (and have a score of 1.0, meaning they match only that keyword). Hopefully this will be mitigated with more relations.
    • The other problem is disambiguation of polysemous words. For example, consider a caption with the words [white,red]. While you might expect these to naturally fire the {color} synset, it turns out that {person} gets a much higher score; for example, this could be a picture of E.B. White and Lenin. Because words like white, which can also be proper names, belong to many synsets, you get an explosion of weight associated with that word. In other words, because our scheme simply adds together the score of each sense of a word, it is essentially equivalent to expanding our bag of words to include each possible sense. So for [white], you end up with a crowd of people named White in the bag of words. Normalization might be one solution, but it greatly dilutes the value of [white]; that is, normalization doesn't realize that white here refers to the color so much as it attenuates the magnitude of each sense of white out of existence.
    • A better solution for the aforementioned problem might be to select the sense of white that gives the highest score within the Xsynset we are testing. Implement this. Looks a bit better.
    • Start implementing the traversal of links other than hypernym.
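
The max-over-senses fix above can be sketched as follows. All the numbers and data structures here are illustrative, not the actual lab code: each keyword is assumed to map to a list of per-sense relatedness scores against a candidate synset.

```python
# Comparing the original sum-over-senses scoring with the proposed
# max-over-senses scoring for candidate synsets {color} and {person}.

def score_sum(senses_per_keyword):
    # Original scheme: add up every sense of every keyword. A
    # polysemous word like "white" contributes once per sense, so a
    # synset covering many weak proper-name senses gets inflated.
    return sum(sum(scores) for scores in senses_per_keyword.values())

def score_max(senses_per_keyword):
    # Proposed fix: keep only the best-matching sense of each keyword,
    # implicitly disambiguating it toward the candidate synset.
    return sum(max(scores) for scores in senses_per_keyword.values())

# Hypothetical per-sense scores of the caption [white,red] against two
# candidate synsets: one strong color sense each vs. many weak
# person-name senses.
candidates = {
    "{color}":  {"white": [0.9, 0.0, 0.0], "red": [0.9, 0.0]},
    "{person}": {"white": [0.3, 0.3, 0.3, 0.3, 0.3], "red": [0.3, 0.3]},
}

for name, senses in candidates.items():
    print(name, score_sum(senses), score_max(senses))
```

Under `score_sum`, {person} wins (2.1 vs. 1.8) purely by sense count; under `score_max`, {color} wins (1.8 vs. 0.6), which matches the intuition that white and red are colors here.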

5/6/06

  • Updated the code to generate a weighted Xsynset.
  • Pickled everything to make it more compact and robust.
  • Wrote a brute force synset ranker.
  • It might be extremely slow. Think about using something like an HMM to get approximate rankings.
  • Read up on HMMs. Also consider a simple gradient-descent algorithm: since the optimal synset must be connected to one of the keywords, we just start walking from each of the keywords.
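
The keyword-seeded walk can be sketched as a greedy hill climb. Everything here is a toy stand-in (the graph, the coverage-based score, and all names are hypothetical), just to show the control flow of starting from each keyword and climbing while the score improves.

```python
# Greedy hill climbing from each caption keyword over a toy hypernym
# graph, as an alternative to brute-force ranking of all synsets.

# Toy graph: color terms share the "color" hypernym.
hypernyms = {"white": ["color"], "red": ["color"], "color": ["attribute"]}
keywords = ["white", "red"]

def covers(syn, kw):
    # A synset covers a keyword if it lies on the keyword's hypernym
    # path (following only the first parent in this toy graph).
    node = kw
    while node is not None:
        if node == syn:
            return True
        node = (hypernyms.get(node) or [None])[0]
    return False

def score(syn):
    # Illustrative score: how many caption keywords this synset covers.
    return sum(covers(syn, kw) for kw in keywords)

def hill_climb(start):
    # Walk upward from `start`, moving to the best-scoring hypernym
    # while doing so strictly improves the score.
    current = start
    while True:
        parents = hypernyms.get(current, [])
        if not parents:
            return current
        best = max(parents, key=score)
        if score(best) <= score(current):
            return current
        current = best

results = {kw: hill_climb(kw) for kw in keywords}
print(results)  # both keywords climb to 'color'
```

Each keyword only ever examines its own ancestor chain, so the work is proportional to path length times number of keywords rather than to the size of the whole synset inventory.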

5/5/06

  • Updated the database to use the new structures.
  • Regenerated the database.
  • Updated the web documentation.