.. Copyright (C) 2001-2020 NLTK Project
.. For license information, see LICENSE.TXT

==========
 Stemmers
==========

Overview
~~~~~~~~

Stemmers remove morphological affixes from words, leaving only the
word stem.

    >>> from nltk.stem import *

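
As a rough illustration of what removing affixes means, the sketch below
strips a few English suffixes using a fixed rule list. It is a toy, not the
Porter or Snowball algorithm; ``toy_stem``, ``SUFFIXES`` and ``min_stem`` are
hypothetical names introduced here for illustration only.

```python
# Toy suffix stripper: illustrative only, NOT the Porter or Snowball algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, tried in order


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no rule applied, or the remaining stem would be too short


for w in ["meeting", "owned", "mules", "caresses", "is"]:
    print(w, "->", toy_stem(w))
```

The toy rules turn 'mules' into 'mul', whereas the Porter stemmer tested in
this file yields 'mule'. Real stemmers apply context-sensitive rules to avoid
this kind of over-stripping.
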
Unit tests for the Porter stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    >>> from nltk.stem.porter import *

Create a new Porter stemmer.

    >>> stemmer = PorterStemmer()

Test the stemmer on various pluralised words.

    >>> plurals = ['caresses', 'flies', 'dies', 'mules', 'denied',
    ...            'died', 'agreed', 'owned', 'humbled', 'sized',
    ...            'meeting', 'stating', 'siezing', 'itemization',
    ...            'sensational', 'traditional', 'reference', 'colonizer',
    ...            'plotted']

    >>> singles = [stemmer.stem(plural) for plural in plurals]

    >>> print(' '.join(singles))  # doctest: +NORMALIZE_WHITESPACE
    caress fli die mule deni die agre own humbl size meet
    state siez item sensat tradit refer colon plot

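
One common use of stems is to collect inflected forms under a single key.
The sketch below assumes a hypothetical helper, ``group_by_stem``, that
accepts any callable mapping a word to its stem. To stay self-contained it
uses a lookup table, ``KNOWN_STEMS``, copied from the expected output above
instead of calling NLTK; passing ``PorterStemmer().stem`` as the ``stem``
argument should behave the same way for these words.

```python
from collections import defaultdict

# Stems copied from the expected doctest output above. This table is a
# stand-in so the sketch runs without NLTK installed.
KNOWN_STEMS = {"dies": "die", "died": "die", "denied": "deni", "meeting": "meet"}


def lookup_stem(word):
    """Hypothetical stand-in for PorterStemmer().stem, limited to KNOWN_STEMS."""
    return KNOWN_STEMS[word]


def group_by_stem(words, stem):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["dies", "died", "denied", "meeting"], lookup_stem)
print(groups)
```

Taking the stemmer as a parameter keeps the grouping logic independent of any
particular stemming algorithm.
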
Unit tests for Snowball stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    >>> from nltk.stem.snowball import SnowballStemmer

See which languages are supported.

    >>> print(" ".join(SnowballStemmer.languages))  # doctest: +NORMALIZE_WHITESPACE
    arabic danish dutch english finnish french german hungarian italian
    norwegian porter portuguese romanian russian spanish swedish

Create a new instance of a language-specific subclass.

    >>> stemmer = SnowballStemmer("english")

Stem a word.

    >>> print(stemmer.stem("running"))
    run

Decide not to stem stopwords.

    >>> stemmer2 = SnowballStemmer("english", ignore_stopwords=True)
    >>> print(stemmer.stem("having"))
    have
    >>> print(stemmer2.stem("having"))
    having

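
The effect of ``ignore_stopwords=True`` can be pictured as a guard in front of
the stemmer: words found in a stopword set are returned unchanged, and
everything else is stemmed. The sketch below illustrates that idea only and is
not NLTK's implementation; ``skip_stopwords``, ``strip_ing`` and the two-word
stopword set are hypothetical stand-ins.

```python
def strip_ing(word):
    """Crude stand-in stemmer: drops a trailing 'ing' from longer words.

    Much simpler than Snowball, so its stems differ from the real ones.
    """
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


def skip_stopwords(stem, stopwords):
    """Wrap a stemmer so that stopwords pass through unchanged."""
    stopwords = frozenset(stopwords)

    def wrapped(word):
        return word if word in stopwords else stem(word)

    return wrapped


guarded = skip_stopwords(strip_ing, {"having", "being"})  # hypothetical stopword set
print(strip_ing("having"))  # stemmed by the stand-in
print(guarded("having"))    # left alone because it is a stopword
print(guarded("running"))   # not a stopword, so still stemmed
```
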
The 'english' stemmer improves on the original 'porter' stemmer; for
example, it reduces 'generously' to 'generous' rather than 'gener'.

    >>> print(SnowballStemmer("english").stem("generously"))
    generous
    >>> print(SnowballStemmer("porter").stem("generously"))
    gener

.. note::

    Extra stemmer tests can be found in `nltk.test.unit.test_stem`.