full set of word senses missing in dictionary files? #2

Open
yakazimir opened this issue Sep 20, 2019 · 1 comment


yakazimir commented Sep 20, 2019

I'm trying to rebuild your data and noticed that ALL.dict.xml (which, as I understand it, contains all of the lemmas, glosses and word senses used across the SemEval data) has entries such as the following:

<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
 <sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
 </sense>
</lexelt>

Here climate#n is the lemma and POS. The entry declares sence_count_wn="2", yet there is only one sense inside the lexelt. Shouldn't both senses be listed? My assumption is that each lexelt should contain every WordNet sense and gloss of the lemma given in item.
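To see how widespread this is, the declared count can be compared with the number of sense children for every lexelt. This is only a minimal sketch: it assumes the file parses as ordinary XML with lexelt elements shaped like the one above. The attribute spelling sence_count_wn is copied from the file, while the function name and the wrapping `<dictionary>` root are made up for the example.

```python
import xml.etree.ElementTree as ET

# The entry quoted above, wrapped in a hypothetical root element so it parses alone.
SAMPLE = """<dictionary>
<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
 <sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
 </sense>
</lexelt>
</dictionary>"""


def find_incomplete_lexelts(root):
    """Return (item, declared WN count, listed sense count) for every mismatch."""
    mismatches = []
    for lexelt in root.iter("lexelt"):
        # "sence_count_wn" is the attribute name as spelled in the file.
        declared = int(lexelt.get("sence_count_wn", "0"))
        listed = len(lexelt.findall("sense"))
        if declared != listed:
            mismatches.append((lexelt.get("item"), declared, listed))
    return mismatches


print(find_incomplete_lexelts(ET.fromstring(SAMPLE)))
# prints [('climate#n', 2, 1)]
```

For the real file, `ET.parse("ALL.dict.xml").getroot()` could be passed in instead of the sample, provided the path and layout match.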

I also noticed that when I look up this sense key in nltk's WordNet interface (which I see you also use), I get a different definition for climate%1:26:00:: than the gloss in your file:

In [1]: from nltk.corpus import wordnet as wn
In [2]: wn.synset_from_sense_key('climate%1:26:00::').definition()
Out[2]: 'the prevailing psychological state'

## whereas your sense gloss seems to correspond to climate%1:26:01::
In [11]: wn.synset_from_sense_key('climate%1:26:01::').definition()
Out[11]: 'the weather in some location averaged over some long period of time'

In [13]: wn.get_version()
Out[13]: '3.0'
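For reference, the two keys differ in a single field. The sketch below splits a sense key into its parts, assuming the standard WordNet layout lemma%ss_type:lex_filenum:lex_id:head_word:head_id. The class and function names are hypothetical and not part of nltk.

```python
from typing import NamedTuple


class SenseKey(NamedTuple):
    lemma: str
    ss_type: int      # 1 = noun, 2 = verb, 3 = adjective, 4 = adverb, 5 = adjective satellite
    lex_filenum: int  # lexicographer file number
    lex_id: int       # distinguishes senses of a lemma within one lexicographer file
    head_word: str    # only filled in for adjective satellites
    head_id: str


def parse_sense_key(key: str) -> SenseKey:
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    lemma, _, lex_sense = key.partition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(lemma, int(ss_type), int(lex_filenum), int(lex_id), head_word, head_id)


a = parse_sense_key("climate%1:26:00::")
b = parse_sense_key("climate%1:26:01::")
# List the field names on which the two keys disagree.
print([f for f in SenseKey._fields if getattr(a, f) != getattr(b, f)])
# prints ['lex_id']
```

So both keys are noun senses from the same lexicographer file, and only lex_id (00 vs 01) tells them apart, which is exactly the field the two lookups above seem to swap.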


yakazimir commented Sep 20, 2019

Just an update: something seems to be wrong with nltk's WordNet mapping via synset_from_sense_key, rather than with your data.

Your gloss/id pair is consistent with WordNet when I search here: http://wordnetweb.princeton.edu/perl/webwn?s=climate&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=1&o3=&o4=&h=0000

This issue is mentioned here: nltk/nltk#1934
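One possible way to sidestep synset_from_sense_key in the meantime is to scan the lemma's own Lemma objects and match on the key each one reports. This is a sketch under assumptions: synset_for_key and lookup_with_nltk are hypothetical helper names, the only nltk calls used are wn.lemmas, Lemma.key() and Lemma.synset(), and I have not checked whether this is the fix proposed in the linked thread.

```python
def synset_for_key(sense_key, lemmas):
    """Return the synset of the first lemma whose own key equals sense_key.

    `lemmas` is any iterable of objects exposing .key() and .synset(),
    as nltk's Lemma objects do. Returns None if nothing matches.
    """
    for lemma in lemmas:
        if lemma.key() == sense_key:
            return lemma.synset()
    return None


def lookup_with_nltk(sense_key, pos="n"):
    """Hypothetical wrapper: needs nltk and its WordNet corpus installed."""
    # Imported lazily so synset_for_key stays usable without nltk.
    from nltk.corpus import wordnet as wn

    name = sense_key.split("%", 1)[0]
    return synset_for_key(sense_key, wn.lemmas(name, pos=pos))
```

With the corpus available, `lookup_with_nltk('climate%1:26:00::').definition()` could then be compared against the gloss in ALL.dict.xml.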
