.. Copyright (C) 2001-2020 NLTK Project
.. For license information, see LICENSE.TXT

=================
WordNet Interface
=================

WordNet is just another NLTK corpus reader, and can be imported like this:

>>> from nltk.corpus import wordnet

For more compact code, we recommend:

>>> from nltk.corpus import wordnet as wn

-----
Words
-----

Look up a word using ``synsets()``; this function has an optional ``pos`` argument
which lets you constrain the part of speech of the word:

>>> wn.synsets('dog') # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),
Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog', pos=wn.VERB)
[Synset('chase.v.01')]

The other parts of speech are ``NOUN``, ``ADJ`` and ``ADV``.

A synset is identified with a 3-part name of the form ``word.pos.nn``:

>>> wn.synset('dog.n.01')
Synset('dog.n.01')
>>> print(wn.synset('dog.n.01').definition())
a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds
>>> len(wn.synset('dog.n.01').examples())
1
>>> print(wn.synset('dog.n.01').examples()[0])
the dog barked all night
>>> wn.synset('dog.n.01').lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> [str(lemma.name()) for lemma in wn.synset('dog.n.01').lemmas()]
['dog', 'domestic_dog', 'Canis_familiaris']
>>> wn.lemma('dog.n.01.dog').synset()
Synset('dog.n.01')

The WordNet corpus reader gives access to the Open Multilingual
WordNet, using ISO-639 language codes.

>>> sorted(wn.langs()) # doctest: +NORMALIZE_WHITESPACE
['als', 'arb', 'bul', 'cat', 'cmn', 'dan', 'ell', 'eng', 'eus', 'fas',
'fin', 'fra', 'glg', 'heb', 'hrv', 'ind', 'ita', 'jpn', 'nld', 'nno',
'nob', 'pol', 'por', 'qcn', 'slv', 'spa', 'swe', 'tha', 'zsm']
>>> wn.synsets(b'\xe7\x8a\xac'.decode('utf-8'), lang='jpn')
[Synset('dog.n.01'), Synset('spy.n.01')]

wn.synset('spy.n.01').lemma_names('jpn') # doctest: +NORMALIZE_WHITESPACE
['\u3044\u306c', '\u307e\u308f\u3057\u8005', '\u30b9\u30d1\u30a4', '\u56de\u3057\u8005',
'\u56de\u8005', '\u5bc6\u5075', '\u5de5\u4f5c\u54e1', '\u5efb\u3057\u8005',
'\u5efb\u8005', '\u63a2', '\u63a2\u308a', '\u72ac', '\u79d8\u5bc6\u635c\u67fb\u54e1',
'\u8adc\u5831\u54e1', '\u8adc\u8005', '\u9593\u8005', '\u9593\u8adc', '\u96a0\u5bc6']

>>> wn.synset('dog.n.01').lemma_names('ita')
['cane', 'Canis_familiaris']
>>> wn.lemmas('cane', lang='ita') # doctest: +NORMALIZE_WHITESPACE
[Lemma('dog.n.01.cane'), Lemma('cramp.n.02.cane'), Lemma('hammer.n.01.cane'), Lemma('bad_person.n.01.cane'),
Lemma('incompetent.n.01.cane')]
>>> sorted(wn.synset('dog.n.01').lemmas('dan')) # doctest: +NORMALIZE_WHITESPACE
[Lemma('dog.n.01.hund'), Lemma('dog.n.01.k\xf8ter'),
Lemma('dog.n.01.vovhund'), Lemma('dog.n.01.vovse')]

sorted(wn.synset('dog.n.01').lemmas('por'))
[Lemma('dog.n.01.cachorra'), Lemma('dog.n.01.cachorro'), Lemma('dog.n.01.cadela'), Lemma('dog.n.01.c\xe3o')]

>>> dog_lemma = wn.lemma(b'dog.n.01.c\xc3\xa3o'.decode('utf-8'), lang='por')
>>> dog_lemma
Lemma('dog.n.01.c\xe3o')
>>> dog_lemma.lang()
'por'
>>> len(list(wordnet.all_lemma_names(pos='n', lang='jpn')))
64797

-------
Synsets
-------

``Synset``: a set of synonyms that share a common meaning.

>>> dog = wn.synset('dog.n.01')
>>> dog.hypernyms()
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
>>> dog.hyponyms() # doctest: +ELLIPSIS
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), ...]
>>> dog.member_holonyms()
[Synset('canis.n.01'), Synset('pack.n.06')]
>>> dog.root_hypernyms()
[Synset('entity.n.01')]
>>> wn.synset('dog.n.01').lowest_common_hypernyms(wn.synset('cat.n.01'))
[Synset('carnivore.n.01')]

Each synset contains one or more lemmas, which represent a specific
sense of a specific word.

Note that some relations are defined by WordNet only over Lemmas:

>>> good = wn.synset('good.a.01')
>>> good.antonyms()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Synset' object has no attribute 'antonyms'
>>> good.lemmas()[0].antonyms()
[Lemma('bad.a.01.bad')]

The relations that are currently defined in this way are ``antonyms``,
``derivationally_related_forms`` and ``pertainyms``.

If you know the byte offset used to identify a synset in the original
Princeton WordNet data file, you can use that to instantiate the synset
in NLTK:

>>> wn.synset_from_pos_and_offset('n', 4543158)
Synset('wagon.n.01')
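
The reverse mapping is available as well: ``Synset.offset()`` returns the
synset's byte offset in the Princeton data file:

>>> wn.synset('wagon.n.01').offset()
4543158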

------
Lemmas
------

>>> eat = wn.lemma('eat.v.03.eat')
>>> eat
Lemma('feed.v.06.eat')
>>> print(eat.key())
eat%2:34:02::
>>> eat.count()
4
>>> wn.lemma_from_key(eat.key())
Lemma('feed.v.06.eat')
>>> wn.lemma_from_key(eat.key()).synset()
Synset('feed.v.06')
>>> wn.lemma_from_key('feebleminded%5:00:00:retarded:00')
Lemma('backward.s.03.feebleminded')
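
Sense keys use the Princeton WordNet format
``lemma%ss_type:lex_filenum:lex_id:head_word:head_id``; splitting on ``%``
separates the lemma from the sense information (a small illustrative check,
not an NLTK API):

>>> eat.key().split('%')
['eat', '2:34:02::']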

>>> for lemma in wn.synset('eat.v.03').lemmas():
...     print(lemma, lemma.count())
...
Lemma('feed.v.06.feed') 3
Lemma('feed.v.06.eat') 4
>>> for lemma in wn.lemmas('eat', 'v'):
...     print(lemma, lemma.count())
...
Lemma('eat.v.01.eat') 61
Lemma('eat.v.02.eat') 13
Lemma('feed.v.06.eat') 4
Lemma('eat.v.04.eat') 0
Lemma('consume.v.05.eat') 0
Lemma('corrode.v.01.eat') 0
>>> wn.lemma('jump.v.11.jump')
Lemma('jump.v.11.jump')

Lemmas can also have relations between them:

>>> vocal = wn.lemma('vocal.a.01.vocal')
>>> vocal.derivationally_related_forms()
[Lemma('vocalize.v.02.vocalize')]
>>> vocal.pertainyms()
[Lemma('voice.n.02.voice')]
>>> vocal.antonyms()
[Lemma('instrumental.a.01.instrumental')]

The three relations above exist only on lemmas, not on synsets.

-----------
Verb Frames
-----------

>>> wn.synset('think.v.01').frame_ids()
[5, 9]
>>> for lemma in wn.synset('think.v.01').lemmas():
...     print(lemma, lemma.frame_ids())
...     print(" | ".join(lemma.frame_strings()))
...
Lemma('think.v.01.think') [5, 9]
Something think something Adjective/Noun | Somebody think somebody
Lemma('think.v.01.believe') [5, 9]
Something believe something Adjective/Noun | Somebody believe somebody
Lemma('think.v.01.consider') [5, 9]
Something consider something Adjective/Noun | Somebody consider somebody
Lemma('think.v.01.conceive') [5, 9]
Something conceive something Adjective/Noun | Somebody conceive somebody
>>> wn.synset('stretch.v.02').frame_ids()
[8]
>>> for lemma in wn.synset('stretch.v.02').lemmas():
...     print(lemma, lemma.frame_ids())
...     print(" | ".join(lemma.frame_strings()))
...
Lemma('stretch.v.02.stretch') [8, 2]
Somebody stretch something | Somebody stretch
Lemma('stretch.v.02.extend') [8]
Somebody extend something

----------
Similarity
----------

>>> dog = wn.synset('dog.n.01')
>>> cat = wn.synset('cat.n.01')
>>> hit = wn.synset('hit.v.01')
>>> slap = wn.synset('slap.v.01')

``synset1.path_similarity(synset2):``
Return a score denoting how similar two word senses are, based on the
shortest path that connects the senses in the is-a (hypernym/hyponym)
taxonomy. The score is in the range 0 to 1. By default, a fake root node
is added to verbs, so in cases where previously no path could be
found---and None was returned---a value is now returned. The old
behavior can be restored by setting ``simulate_root`` to False.
A score of 1 represents identity, i.e. comparing a sense with itself
will return 1.

>>> dog.path_similarity(cat) # doctest: +ELLIPSIS
0.2...
>>> hit.path_similarity(slap) # doctest: +ELLIPSIS
0.142...
>>> wn.path_similarity(hit, slap) # doctest: +ELLIPSIS
0.142...
>>> print(hit.path_similarity(slap, simulate_root=False))
None
>>> print(wn.path_similarity(hit, slap, simulate_root=False))
None
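
The score is simply the inverse of the node-counting path length. As a
sanity check (mirroring NLTK's own computation), it can be recomputed from
``shortest_path_distance()``, which counts edges:

>>> p = dog.shortest_path_distance(cat)
>>> p
4
>>> dog.path_similarity(cat) == 1 / (p + 1)
True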

``synset1.lch_similarity(synset2):``
Leacock-Chodorow Similarity:
Return a score denoting how similar two word senses are, based on the
shortest path that connects the senses (as above) and the maximum depth
of the taxonomy in which the senses occur. The relationship is given
as -log(p/2d) where p is the shortest path length and d the taxonomy
depth.

>>> dog.lch_similarity(cat) # doctest: +ELLIPSIS
2.028...
>>> hit.lch_similarity(slap) # doctest: +ELLIPSIS
1.312...
>>> wn.lch_similarity(hit, slap) # doctest: +ELLIPSIS
1.312...
>>> print(hit.lch_similarity(slap, simulate_root=False))
None
>>> print(wn.lch_similarity(hit, slap, simulate_root=False))
None
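
NLTK counts the path length p in nodes (edges + 1). For the dog/cat score
above, with the WordNet 3.0 noun taxonomy depth of 19 (a constant NLTK
looks up internally; hard-coded here purely for illustration), the formula
gives -log(5/38):

>>> import math
>>> p = dog.shortest_path_distance(cat)
>>> abs(dog.lch_similarity(cat) + math.log((p + 1) / (2.0 * 19))) < 1e-9
True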

``synset1.wup_similarity(synset2):``
Wu-Palmer Similarity:
Return a score denoting how similar two word senses are, based on the
depth of the two senses in the taxonomy and that of their Least Common
Subsumer (most specific ancestor node). Note that at this time the
scores given do _not_ always agree with those given by Pedersen's Perl
implementation, WordNet::Similarity.

The LCS does not necessarily feature in the shortest path connecting the
two senses, as it is by definition the common ancestor deepest in the
taxonomy, not closest to the two senses. Typically, however, it will so
feature. Where multiple candidates for the LCS exist, the one whose
shortest path to the root node is the longest will be selected. Where
the LCS has multiple paths to the root, the longer path is used for
the purposes of the calculation.

>>> dog.wup_similarity(cat) # doctest: +ELLIPSIS
0.857...
>>> hit.wup_similarity(slap)
0.25
>>> wn.wup_similarity(hit, slap)
0.25
>>> print(hit.wup_similarity(slap, simulate_root=False))
None
>>> print(wn.wup_similarity(hit, slap, simulate_root=False))
None
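
The noun score can be reproduced from the pieces NLTK exposes: the depth of
the LCS counted in nodes from the root, and the two path lengths through it
(a minimal sketch of the calculation for the noun case, not the general
implementation):

>>> lcs = dog.lowest_common_hypernyms(cat)[0]
>>> depth = lcs.max_depth() + 1
>>> len1 = dog.shortest_path_distance(lcs) + depth
>>> len2 = cat.shortest_path_distance(lcs) + depth
>>> abs(dog.wup_similarity(cat) - 2.0 * depth / (len1 + len2)) < 1e-9
True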

``wordnet_ic``
Information Content:
Load an information content file from the wordnet_ic corpus.

>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')

Or you can create an information content dictionary from a corpus (or
anything that has a ``words()`` method):

>>> from nltk.corpus import genesis
>>> genesis_ic = wn.ic(genesis, False, 0.0)  # positional args: weight_senses_equally, smoothing

``synset1.res_similarity(synset2, ic):``
Resnik Similarity:
Return a score denoting how similar two word senses are, based on the
Information Content (IC) of the Least Common Subsumer (most specific
ancestor node). Note that for any similarity measure that uses
information content, the result is dependent on the corpus used to
generate the information content and the specifics of how the
information content was created.

>>> dog.res_similarity(cat, brown_ic) # doctest: +ELLIPSIS
7.911...
>>> dog.res_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
7.204...

``synset1.jcn_similarity(synset2, ic):``
Jiang-Conrath Similarity:
Return a score denoting how similar two word senses are, based on the
Information Content (IC) of the Least Common Subsumer (most specific
ancestor node) and that of the two input Synsets. The relationship is
given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).

>>> dog.jcn_similarity(cat, brown_ic) # doctest: +ELLIPSIS
0.449...
>>> dog.jcn_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
0.285...
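
Since ``res_similarity`` is precisely the IC of the most informative
subsumer, the Jiang-Conrath score can be recomputed from it together with
``information_content`` (an illustrative check of the equation above):

>>> from nltk.corpus.reader.wordnet import information_content
>>> ic1, ic2 = information_content(dog, brown_ic), information_content(cat, brown_ic)
>>> lcs_ic = dog.res_similarity(cat, brown_ic)
>>> abs(dog.jcn_similarity(cat, brown_ic) - 1 / (ic1 + ic2 - 2 * lcs_ic)) < 1e-9
True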

``synset1.lin_similarity(synset2, ic):``
Lin Similarity:
Return a score denoting how similar two word senses are, based on the
Information Content (IC) of the Least Common Subsumer (most specific
ancestor node) and that of the two input Synsets. The relationship is
given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).

>>> dog.lin_similarity(cat, semcor_ic) # doctest: +ELLIPSIS
0.886...
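
The same identity holds here (again an illustrative check, with the IC
values taken from the SemCor counts):

>>> from nltk.corpus.reader.wordnet import information_content
>>> ic1, ic2 = information_content(dog, semcor_ic), information_content(cat, semcor_ic)
>>> lcs_ic = dog.res_similarity(cat, semcor_ic)
>>> abs(dog.lin_similarity(cat, semcor_ic) - 2 * lcs_ic / (ic1 + ic2)) < 1e-9
True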

---------------------
Access to all Synsets
---------------------

Iterate over all the noun synsets:

>>> for synset in list(wn.all_synsets('n'))[:10]:
...     print(synset)
...
Synset('entity.n.01')
Synset('physical_entity.n.01')
Synset('abstraction.n.06')
Synset('thing.n.12')
Synset('object.n.01')
Synset('whole.n.02')
Synset('congener.n.03')
Synset('living_thing.n.01')
Synset('organism.n.01')
Synset('benthos.n.02')

Get all synsets for this word, possibly restricted by POS:

>>> wn.synsets('dog') # doctest: +ELLIPSIS
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), ...]
>>> wn.synsets('dog', pos='v')
[Synset('chase.v.01')]

Walk through the noun synsets looking at their hypernyms:

>>> from itertools import islice
>>> for synset in islice(wn.all_synsets('n'), 5):
...     print(synset, synset.hypernyms())
...
Synset('entity.n.01') []
Synset('physical_entity.n.01') [Synset('entity.n.01')]
Synset('abstraction.n.06') [Synset('entity.n.01')]
Synset('thing.n.12') [Synset('physical_entity.n.01')]
Synset('object.n.01') [Synset('physical_entity.n.01')]

------
Morphy
------

Look up forms not in WordNet, with the help of Morphy:

>>> wn.morphy('denied', wn.NOUN)
>>> print(wn.morphy('denied', wn.VERB))
deny
>>> wn.synsets('denied', wn.NOUN)
[]
>>> wn.synsets('denied', wn.VERB) # doctest: +NORMALIZE_WHITESPACE
[Synset('deny.v.01'), Synset('deny.v.02'), Synset('deny.v.03'), Synset('deny.v.04'),
Synset('deny.v.05'), Synset('traverse.v.03'), Synset('deny.v.07')]

Morphy uses a combination of inflectional ending rules and exception
lists to handle a variety of different possibilities:

>>> print(wn.morphy('dogs'))
dog
>>> print(wn.morphy('churches'))
church
>>> print(wn.morphy('aardwolves'))
aardwolf
>>> print(wn.morphy('abaci'))
abacus
>>> print(wn.morphy('book', wn.NOUN))
book
>>> wn.morphy('hardrock', wn.ADV)
>>> wn.morphy('book', wn.ADJ)
>>> wn.morphy('his', wn.NOUN)
>>>
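
The last three calls find no base form for the requested POS and return
None, which is why they print nothing:

>>> wn.morphy('book', wn.ADJ) is None
True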

---------------
Synset Closures
---------------

Compute transitive closures of synsets:

>>> dog = wn.synset('dog.n.01')
>>> hypo = lambda s: s.hyponyms()
>>> hyper = lambda s: s.hypernyms()
>>> list(dog.closure(hypo, depth=1)) == dog.hyponyms()
True
>>> list(dog.closure(hyper, depth=1)) == dog.hypernyms()
True
>>> list(dog.closure(hypo)) # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'),
Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'),
Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'),
Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'),
Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), ...]
>>> list(dog.closure(hyper)) # doctest: +NORMALIZE_WHITESPACE
[Synset('canine.n.02'), Synset('domestic_animal.n.01'), Synset('carnivore.n.01'), Synset('animal.n.01'),
Synset('placental.n.01'), Synset('organism.n.01'), Synset('mammal.n.01'), Synset('living_thing.n.01'),
Synset('vertebrate.n.01'), Synset('whole.n.02'), Synset('chordate.n.01'), Synset('object.n.01'),
Synset('physical_entity.n.01'), Synset('entity.n.01')]
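
``closure()`` is a breadth-first traversal that yields each related synset
once; a minimal equivalent sketch (``transitive`` is our own helper, not
part of NLTK):

>>> def transitive(s, rel):
...     seen, queue = set(), list(rel(s))
...     while queue:
...         t = queue.pop(0)
...         if t not in seen:
...             seen.add(t)
...             queue.extend(rel(t))
...     return seen
>>> transitive(dog, hyper) == set(dog.closure(hyper))
True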

----------------
Regression Tests
----------------

Bug 85: morphy returns the base form of a word, if its input is given
as a base form for a POS for which that word is not defined:

>>> wn.synsets('book', wn.NOUN)
[Synset('book.n.01'), Synset('book.n.02'), Synset('record.n.05'), Synset('script.n.01'), Synset('ledger.n.01'), Synset('book.n.06'), Synset('book.n.07'), Synset('koran.n.01'), Synset('bible.n.01'), Synset('book.n.10'), Synset('book.n.11')]
>>> wn.synsets('book', wn.ADJ)
[]
>>> wn.morphy('book', wn.NOUN)
'book'
>>> wn.morphy('book', wn.ADJ)

Bug 160: wup_similarity breaks when the two synsets have no common hypernym:

>>> t = wn.synsets('picasso')[0]
>>> m = wn.synsets('male')[1]
>>> t.wup_similarity(m) # doctest: +ELLIPSIS
0.631...
>>> t = wn.synsets('titan')[1]
>>> s = wn.synsets('say', wn.VERB)[0]
>>> print(t.wup_similarity(s))
None

Bug 21: "instance of" not included in LCS (very similar to bug 160):

>>> a = wn.synsets("writings")[0]
>>> b = wn.synsets("scripture")[0]
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> a.jcn_similarity(b, brown_ic) # doctest: +ELLIPSIS
0.175...

Bug 221: Verb root IC is zero:

>>> from nltk.corpus.reader.wordnet import information_content
>>> s = wn.synsets('say', wn.VERB)[0]
>>> information_content(s, brown_ic) # doctest: +ELLIPSIS
4.623...

Bug 161: Comparison between WN keys/lemmas should not be case sensitive:

>>> k = wn.synsets("jefferson")[0].lemmas()[0].key()
>>> wn.lemma_from_key(k)
Lemma('jefferson.n.01.Jefferson')
>>> wn.lemma_from_key(k.upper())
Lemma('jefferson.n.01.Jefferson')

Bug 99: WordNet root_hypernyms gives incorrect results:

>>> from nltk.corpus import wordnet as wn
>>> for s in wn.all_synsets(wn.NOUN):
...     if s.root_hypernyms()[0] != wn.synset('entity.n.01'):
...         print(s, s.root_hypernyms())
...
>>>

Bug 382: JCN Division by zero error:

>>> tow = wn.synset('tow.v.01')
>>> shlep = wn.synset('shlep.v.02')
>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> tow.jcn_similarity(shlep, brown_ic) # doctest: +ELLIPSIS
1...e+300

Bug 428: Depth is zero for instance nouns:

>>> s = wn.synset("lincoln.n.01")
>>> s.max_depth() > 0
True

Bug 429: Information content smoothing used old reference to all_synsets:

>>> genesis_ic = wn.ic(genesis, True, 1.0)

Bug 430: all_synsets used wrong pos lookup when synsets were cached:

>>> for ii in wn.all_synsets(): pass
>>> for ii in wn.all_synsets(): pass

Bug 470: shortest_path_distance ignored instance hypernyms:

>>> google = wordnet.synsets("google")[0]
>>> earth = wordnet.synsets("earth")[0]
>>> google.wup_similarity(earth) # doctest: +ELLIPSIS
0.1...

Bug 484: similarity metrics returned -1 instead of None for no LCS:

>>> t = wn.synsets('fly', wn.VERB)[0]
>>> s = wn.synsets('say', wn.VERB)[0]
>>> print(s.shortest_path_distance(t))
None
>>> print(s.path_similarity(t, simulate_root=False))
None
>>> print(s.lch_similarity(t, simulate_root=False))
None
>>> print(s.wup_similarity(t, simulate_root=False))
None

Bug 427: "pants" does not return all the senses it should:

>>> from nltk.corpus import wordnet
>>> wordnet.synsets("pants", 'n')
[Synset('bloomers.n.01'), Synset('pant.n.01'), Synset('trouser.n.01'), Synset('gasp.n.01')]

Bug 482: Some nouns not being lemmatised by WordNetLemmatizer().lemmatize:

>>> from nltk.stem.wordnet import WordNetLemmatizer
>>> WordNetLemmatizer().lemmatize("eggs", pos="n")
'egg'
>>> WordNetLemmatizer().lemmatize("legs", pos="n")
'leg'

Bug 284: instance hypernyms not used in similarity calculations:

>>> wn.synset('john.n.02').lch_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
1.335...
>>> wn.synset('john.n.02').wup_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
0.571...
>>> wn.synset('john.n.02').res_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
2.224...
>>> wn.synset('john.n.02').jcn_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
0.075...
>>> wn.synset('john.n.02').lin_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
0.252...
>>> wn.synset('john.n.02').hypernym_paths() # doctest: +ELLIPSIS
[[Synset('entity.n.01'), ..., Synset('john.n.02')]]

Issue 541: add domains to wordnet:

>>> wn.synset('code.n.03').topic_domains()
[Synset('computer_science.n.01')]
>>> wn.synset('pukka.a.01').region_domains()
[Synset('india.n.01')]
>>> wn.synset('freaky.a.01').usage_domains()
[Synset('slang.n.02')]

Issue 629: wordnet failures when python run with -O optimizations:

>>> # Run the test suite with python -O to check this
>>> wn.synsets("brunch")
[Synset('brunch.n.01'), Synset('brunch.v.01')]

Issue 395: wordnet returns incorrect result for lowest_common_hypernyms of
chef and policeman:

>>> wn.synset('policeman.n.01').lowest_common_hypernyms(wn.synset('chef.n.01'))
[Synset('person.n.01')]

Bug https://github.com/nltk/nltk/issues/1641: Non-English lemmas containing
capital letters cannot be looked up using wordnet.lemmas() or
wordnet.synsets():

>>> wn.lemmas('Londres', lang='fra')
[Lemma('united_kingdom.n.01.Londres'), Lemma('london.n.01.Londres'), Lemma('london.n.02.Londres')]
>>> wn.lemmas('londres', lang='fra')
[Lemma('united_kingdom.n.01.Londres'), Lemma('london.n.01.Londres'), Lemma('london.n.02.Londres')]

Patch-1 https://github.com/nltk/nltk/pull/2065: Adding 3 functions
(relations) to the WordNet class:

>>> wn.synsets("computer_science")[0].in_topic_domains()[2]
Synset('access_time.n.01')
>>> wn.synsets("France")[0].in_region_domains()[18]
Synset('french.n.01')
>>> wn.synsets("slang")[1].in_usage_domains()[18]
Synset('can-do.s.01')