
.. Copyright (C) 2001-2020 NLTK Project
.. For license information, see LICENSE.TXT

=========================
 Feature Grammar Parsing
=========================

.. include:: ../../../nltk_book/definitions.rst

Grammars can be parsed from strings:

>>> import nltk
>>> from nltk import grammar, parse
>>> g = """
... % start DP
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
... D[AGR=[NUM='pl', PERS=1]] -> 'we'
... D[AGR=[PERS=2]] -> 'you'
... N[AGR=[NUM='sg', GND='m']] -> 'boy'
... N[AGR=[NUM='pl', GND='m']] -> 'boys'
... N[AGR=[NUM='sg', GND='f']] -> 'girl'
... N[AGR=[NUM='pl', GND='f']] -> 'girls'
... N[AGR=[NUM='sg']] -> 'student'
... N[AGR=[NUM='pl']] -> 'students'
... """
>>> grammar = grammar.FeatureGrammar.fromstring(g)
>>> tokens = 'these girls'.split()
>>> parser = parse.FeatureEarleyChartParser(grammar)
>>> trees = parser.parse(tokens)
>>> for tree in trees: print(tree)
(DP[AGR=[GND='f', NUM='pl', PERS=3]]
  (D[AGR=[NUM='pl', PERS=3]] these)
  (N[AGR=[GND='f', NUM='pl']] girls))
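
If the features fail to unify, there is no parse: 'this' is singular
while 'girls' is plural, so the DP production cannot combine them. A
quick check (a minimal sketch; the empty result assumes only the toy
grammar defined above):

>>> list(parser.parse('this girls'.split()))
[]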

In general, when we are trying to develop even a very small grammar,
it is convenient to put the rules in a file where they can be edited,
tested and revised. Let's assume that we have saved feat0cfg_ as a file named
``'feat0.fcfg'`` and placed it in the NLTK ``data`` directory. We can
inspect it as follows:

.. _feat0cfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/feat0.fcfg

>>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
% start S
# ###################
# Grammar Productions
# ###################
# S expansion productions
S -> NP[NUM=?n] VP[NUM=?n]
# NP expansion productions
NP[NUM=?n] -> N[NUM=?n]
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
NP[NUM=pl] -> N[NUM=pl]
# VP expansion productions
VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
# ###################
# Lexical Productions
# ###################
Det[NUM=sg] -> 'this' | 'every'
Det[NUM=pl] -> 'these' | 'all'
Det -> 'the' | 'some' | 'several'
PropN[NUM=sg]-> 'Kim' | 'Jody'
N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'

Assuming we have saved feat0cfg_ as a file named
``'feat0.fcfg'``, the function ``parse.load_parser`` allows us to
read the grammar into NLTK, ready for use in parsing.

>>> cp = parse.load_parser('grammars/book_grammars/feat0.fcfg', trace=1)
>>> sent = 'Kim likes children'
>>> tokens = sent.split()
>>> tokens
['Kim', 'likes', 'children']
>>> trees = cp.parse(tokens)
|.Kim .like.chil.|
|[----]    .    .| [0:1] 'Kim'
|.    [----]    .| [1:2] 'likes'
|.    .    [----]| [2:3] 'children'
|[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *
|[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *
|[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
|.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *
|.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
|.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *
|.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *
|.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
|.    [---------]| [1:3] VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
|[==============]| [0:3] S[] -> NP[NUM='sg'] VP[NUM='sg'] *
>>> for tree in trees: print(tree)
(S[]
  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
  (VP[NUM='sg', TENSE='pres']
    (TV[NUM='sg', TENSE='pres'] likes)
    (NP[NUM='pl'] (N[NUM='pl'] children))))
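
A sentence that violates the NUM agreement enforced by the grammar
yields no trees. A quick check (a sketch; it loads the same grammar
without tracing, and the empty result assumes feat0.fcfg as shown above):

>>> cp0 = parse.load_parser('grammars/book_grammars/feat0.fcfg')
>>> list(cp0.parse('Kim like children'.split()))
[]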

The parser works directly with
the underspecified productions given by the grammar. That is, the
Predictor rule does not attempt to compile out all admissible feature
combinations before trying to expand the non-terminals on the left hand
side of a production. However, when the Scanner matches an input word
against a lexical production that has been predicted, the new edge will
typically contain fully specified features; e.g., the edge
[PropN[`num`:feat: = `sg`:fval:] |rarr| 'Kim', (0, 1)]. Recall from
Chapter 8 that the Fundamental (or Completer) Rule in
standard CFGs is used to combine an incomplete edge that's expecting a
nonterminal *B* with a following, complete edge whose left hand side
matches *B*. In our current setting, rather than checking for a
complete match, we test whether the expected category *B* will
`unify`:dt: with the left hand side *B'* of a following complete
edge. We will explain in more detail in Section 9.2 how
unification works; for the moment, it is enough to know that as a
result of unification, any variable values of features in *B* will be
instantiated by constant values in the corresponding feature structure
in *B'*, and these instantiated values will be used in the new edge
added by the Completer. This instantiation can be seen, for example,
in the edge
[NP [`num`:feat:\ =\ `sg`:fval:] |rarr| PropN[`num`:feat:\ =\ `sg`:fval:] |dot|, (0, 1)]
in Example 9.2, where the feature `num`:feat: has been assigned the value `sg`:fval:.
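
We can preview this instantiation directly with the feature structure
API introduced below: unifying a structure containing the variable ?n
with one containing a constant binds the variable to that constant (a
minimal sketch, assuming only ``nltk.FeatStruct`` and ``nltk.unify``):

>>> fs_b = nltk.FeatStruct("[NUM = ?n]")
>>> fs_bprime = nltk.FeatStruct("[NUM = sg]")
>>> print(nltk.unify(fs_b, fs_bprime))
[ NUM = 'sg' ]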

Feature structures in NLTK are ... Atomic feature values can be strings or
integers.

>>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
>>> print(fs1)
[ NUM   = 'sg'   ]
[ TENSE = 'past' ]

We can think of a feature structure as being like a Python dictionary,
and access its values by indexing in the usual way.

>>> fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
>>> print(fs1['GND'])
fem
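
Since feature structures are dictionary-like, the usual dictionary
idioms also apply (a sketch, assuming ``FeatStruct``'s dict-style
interface):

>>> 'GND' in fs1
True
>>> sorted(fs1.keys())
['GND', 'NUM', 'PER']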

We can also define feature structures which have complex values, as
discussed earlier.

>>> fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
>>> print(fs2)
[       [ GND = 'fem' ] ]
[ AGR = [ NUM = 'pl'  ] ]
[       [ PER = 3     ] ]
[                       ]
[ POS = 'N'             ]
>>> print(fs2['AGR'])
[ GND = 'fem' ]
[ NUM = 'pl'  ]
[ PER = 3     ]
>>> print(fs2['AGR']['PER'])
3
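
The same value can be reached in one step by giving the whole feature
path as a tuple when indexing (a small sketch of the equivalent lookup):

>>> print(fs2['AGR', 'PER'])
3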

Feature structures can also be constructed by parsing a string, as in
the examples below, where we pass a bracketed string to the
``nltk.FeatStruct`` constructor. Note that in this case, atomic
feature values do not need to be enclosed in quotes.

>>> f1 = nltk.FeatStruct("[NUMBER = sg]")
>>> f2 = nltk.FeatStruct("[PERSON = 3]")
>>> print(nltk.unify(f1, f2))
[ NUMBER = 'sg' ]
[ PERSON = 3    ]

>>> f1 = nltk.FeatStruct("[A = [B = b, D = d]]")
>>> f2 = nltk.FeatStruct("[A = [C = c, D = d]]")
>>> print(nltk.unify(f1, f2))
[     [ B = 'b' ] ]
[ A = [ C = 'c' ] ]
[     [ D = 'd' ] ]

Feature Structures as Graphs
----------------------------

Feature structures are not inherently tied to linguistic objects; they are
general-purpose structures for representing knowledge. For example, we
could encode information about a person in a feature structure:

>>> person01 = nltk.FeatStruct("[NAME=Lee, TELNO='01 27 86 42 96', AGE=33]")
>>> print(person01)
[ AGE   = 33               ]
[ NAME  = 'Lee'            ]
[ TELNO = '01 27 86 42 96' ]

There are a number of notations for representing reentrancy in
matrix-style representations of feature structures. In NLTK, we adopt
the following convention: the first occurrence of a shared feature structure
is prefixed with an integer in parentheses, such as ``(1)``, and any
subsequent reference to that structure uses the notation
``->(1)``, as shown below.

>>> fs = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
...                          SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
>>> print(fs)
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]

There can be any number of tags within a single feature structure.

>>> fs3 = nltk.FeatStruct("[A=(1)[B=b], C=(2)[], D->(1), E->(2)]")
>>> print(fs3)
[ A = (1) [ B = 'b' ] ]
[                     ]
[ C = (2) []          ]
[                     ]
[ D -> (1)            ]
[ E -> (2)            ]

Unification merges the information from two feature structures into a
single result:

>>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
>>> fs2 = nltk.FeatStruct(CITY='Paris')
>>> print(nltk.unify(fs1, fs2))
[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]

Unification is symmetric:

>>> nltk.unify(fs1, fs2) == nltk.unify(fs2, fs1)
True

Unification is associative:

>>> fs3 = nltk.FeatStruct(TELNO='01 27 86 42 96')
>>> nltk.unify(nltk.unify(fs1, fs2), fs3) == nltk.unify(fs1, nltk.unify(fs2, fs3))
True

Unification between `FS`:math:\ :subscript:`0` and `FS`:math:\
:subscript:`1` will fail if the two feature structures share a path |pi|,
but the value of |pi| in `FS`:math:\ :subscript:`0` is a distinct
atom from the value of |pi| in `FS`:math:\ :subscript:`1`. In NLTK,
this is implemented by setting the result of unification to be
``None``.

>>> fs0 = nltk.FeatStruct(A='a')
>>> fs1 = nltk.FeatStruct(A='b')
>>> print(nltk.unify(fs0, fs1))
None
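
By contrast, unifying with the empty feature structure always succeeds
and changes nothing; the empty structure is the identity element for
unification (a minimal sketch):

>>> print(nltk.unify(fs0, nltk.FeatStruct()))
[ A = 'a' ]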

Now, if we look at how unification interacts with structure-sharing,
things become really interesting.

>>> fs0 = nltk.FeatStruct("""[NAME=Lee,
...                           ADDRESS=[NUMBER=74,
...                                    STREET='rue Pascal'],
...                           SPOUSE= [NAME=Kim,
...                                    ADDRESS=[NUMBER=74,
...                                             STREET='rue Pascal']]]""")
>>> print(fs0)
[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]

Since the two addresses in ``fs0`` are distinct copies, unifying with a
structure that adds a CITY to the spouse's address changes only that copy:

>>> fs1 = nltk.FeatStruct("[SPOUSE=[ADDRESS=[CITY=Paris]]]")
>>> print(nltk.unify(fs0, fs1))
[ ADDRESS = [ NUMBER = 74           ]                 ]
[           [ STREET = 'rue Pascal' ]                 ]
[                                                     ]
[ NAME    = 'Lee'                                     ]
[                                                     ]
[           [           [ CITY   = 'Paris'      ] ] ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]

By contrast, when ``fs1`` is unified with the structure-sharing version
``fs2``, the new CITY information shows up in both addresses at once:

>>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
...                           SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
>>> print(fs2)
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]
>>> print(nltk.unify(fs2, fs1))
[               [ CITY   = 'Paris'      ] ]
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]

Structure sharing can also be stated by means of variables such as ``?x``:

>>> fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
>>> fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
>>> print(fs2)
[ ADDRESS1 = ?x ]
[ ADDRESS2 = ?x ]
>>> print(nltk.unify(fs1, fs2))
[ ADDRESS1 = (1) [ NUMBER = 74           ] ]
[                [ STREET = 'rue Pascal' ] ]
[                                          ]
[ ADDRESS2 -> (1)                          ]
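
Because both features are bound to the same variable ``?x``, their
values must in turn unify with one another; if they cannot, the whole
unification fails (a sketch, reusing ``fs2`` from above):

>>> fs3 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74], ADDRESS2=[NUMBER=75]]")
>>> print(nltk.unify(fs3, fs2))
None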

Feature grammars can also analyse "gap" constructions such as
*wh*-questions, using the slash notation that appears in the trace
below: a category like ``S[-INV]/NP[]`` is a sentence missing an NP
somewhere inside it. The grammar ``feat1.fcfg`` implements this:

>>> sent = 'who do you claim that you like'
>>> tokens = sent.split()
>>> cp = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1)
>>> trees = cp.parse(tokens)
|.w.d.y.c.t.y.l.|
|[-] . . . . . .| [0:1] 'who'
|. [-] . . . . .| [1:2] 'do'
|. . [-] . . . .| [2:3] 'you'
|. . . [-] . . .| [3:4] 'claim'
|. . . . [-] . .| [4:5] 'that'
|. . . . . [-] .| [5:6] 'you'
|. . . . . . [-]| [6:7] 'like'
|# . . . . . . .| [0:0] NP[]/NP[] -> *
|. # . . . . . .| [1:1] NP[]/NP[] -> *
|. . # . . . . .| [2:2] NP[]/NP[] -> *
|. . . # . . . .| [3:3] NP[]/NP[] -> *
|. . . . # . . .| [4:4] NP[]/NP[] -> *
|. . . . . # . .| [5:5] NP[]/NP[] -> *
|. . . . . . # .| [6:6] NP[]/NP[] -> *
|. . . . . . . #| [7:7] NP[]/NP[] -> *
|[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
|[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
|. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
|. [-> . . . . .| [1:2] S[+INV] -> V[+AUX] * NP[] VP[] {}
|. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
|. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
|. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
|. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
|. . [-> . . . .| [2:3] S[-INV] -> NP[] * VP[] {}
|. . [-> . . . .| [2:3] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|. . [-> . . . .| [2:3] S[-INV] -> NP[] * S[]/NP[] {}
|. [---> . . . .| [1:3] S[+INV] -> V[+AUX] NP[] * VP[] {}
|. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
|. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
|. . . [-> . . .| [3:4] VP[] -> V[-AUX, SUBCAT='clause'] * SBar[] {}
|. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
|. . . . [-] . .| [4:5] Comp[] -> 'that' *
|. . . . [-> . .| [4:5] SBar[] -> Comp[] * S[-INV] {}
|. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
|. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
|. . . . . [-> .| [5:6] S[-INV] -> NP[] * VP[] {}
|. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|. . . . . [-> .| [5:6] S[-INV] -> NP[] * S[]/NP[] {}
|. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
|. . . . . . [->| [6:7] VP[] -> V[-AUX, SUBCAT='trans'] * NP[] {}
|. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
|. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
|. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
|. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
|. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
|. . [---------]| [2:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
|. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
|[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *
>>> trees = list(trees)
>>> for tree in trees: print(tree)
(S[-INV]
  (NP[+WH] who)
  (S[+INV]/NP[]
    (V[+AUX] do)
    (NP[-WH] you)
    (VP[]/NP[]
      (V[-AUX, SUBCAT='clause'] claim)
      (SBar[]/NP[]
        (Comp[] that)
        (S[-INV]/NP[]
          (NP[-WH] you)
          (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))

A different parser should give the same parse trees, but perhaps in a different order:

>>> cp2 = parse.load_parser('grammars/book_grammars/feat1.fcfg', trace=1,
...                         parser=parse.FeatureEarleyChartParser)
>>> trees2 = cp2.parse(tokens)
|.w.d.y.c.t.y.l.|
|[-] . . . . . .| [0:1] 'who'
|. [-] . . . . .| [1:2] 'do'
|. . [-] . . . .| [2:3] 'you'
|. . . [-] . . .| [3:4] 'claim'
|. . . . [-] . .| [4:5] 'that'
|. . . . . [-] .| [5:6] 'you'
|. . . . . . [-]| [6:7] 'like'
|> . . . . . . .| [0:0] S[-INV] -> * NP[] VP[] {}
|> . . . . . . .| [0:0] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
|> . . . . . . .| [0:0] S[-INV] -> * NP[] S[]/NP[] {}
|> . . . . . . .| [0:0] S[-INV] -> * Adv[+NEG] S[+INV] {}
|> . . . . . . .| [0:0] S[+INV] -> * V[+AUX] NP[] VP[] {}
|> . . . . . . .| [0:0] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
|> . . . . . . .| [0:0] NP[+WH] -> * 'who' {}
|[-] . . . . . .| [0:1] NP[+WH] -> 'who' *
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * VP[] {}
|[-> . . . . . .| [0:1] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|[-> . . . . . .| [0:1] S[-INV] -> NP[] * S[]/NP[] {}
|. > . . . . . .| [1:1] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
|. > . . . . . .| [1:1] S[+INV]/?x[] -> * V[+AUX] NP[] VP[]/?x[] {}
|. > . . . . . .| [1:1] V[+AUX] -> * 'do' {}
|. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. > . . . . . .| [1:1] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. > . . . . . .| [1:1] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
|. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
|. > . . . . . .| [1:1] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
|. > . . . . . .| [1:1] VP[] -> * V[+AUX] VP[] {}
|. [-] . . . . .| [1:2] V[+AUX] -> 'do' *
|. [-> . . . . .| [1:2] S[+INV]/?x[] -> V[+AUX] * NP[] VP[]/?x[] {}
|. [-> . . . . .| [1:2] VP[]/?x[] -> V[+AUX] * VP[]/?x[] {}
|. [-> . . . . .| [1:2] VP[] -> V[+AUX] * VP[] {}
|. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='intrans'] {}
|. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='trans'] NP[] {}
|. . > . . . . .| [2:2] VP[] -> * V[-AUX, SUBCAT='clause'] SBar[] {}
|. . > . . . . .| [2:2] VP[] -> * V[+AUX] VP[] {}
|. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. . > . . . . .| [2:2] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. . > . . . . .| [2:2] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. . > . . . . .| [2:2] NP[-WH] -> * 'you' {}
|. . [-] . . . .| [2:3] NP[-WH] -> 'you' *
|. [---> . . . .| [1:3] S[+INV]/?x[] -> V[+AUX] NP[] * VP[]/?x[] {}
|. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. . . > . . . .| [3:3] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. . . > . . . .| [3:3] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. . . > . . . .| [3:3] V[-AUX, SUBCAT='clause'] -> * 'claim' {}
|. . . [-] . . .| [3:4] V[-AUX, SUBCAT='clause'] -> 'claim' *
|. . . [-> . . .| [3:4] VP[]/?x[] -> V[-AUX, SUBCAT='clause'] * SBar[]/?x[] {}
|. . . . > . . .| [4:4] SBar[]/?x[] -> * Comp[] S[-INV]/?x[] {}
|. . . . > . . .| [4:4] Comp[] -> * 'that' {}
|. . . . [-] . .| [4:5] Comp[] -> 'that' *
|. . . . [-> . .| [4:5] SBar[]/?x[] -> Comp[] * S[-INV]/?x[] {}
|. . . . . > . .| [5:5] S[-INV]/?x[] -> * NP[] VP[]/?x[] {}
|. . . . . > . .| [5:5] NP[-WH] -> * 'you' {}
|. . . . . [-] .| [5:6] NP[-WH] -> 'you' *
|. . . . . [-> .| [5:6] S[-INV]/?x[] -> NP[] * VP[]/?x[] {}
|. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='trans'] NP[]/?x[] {}
|. . . . . . > .| [6:6] VP[]/?x[] -> * V[-AUX, SUBCAT='clause'] SBar[]/?x[] {}
|. . . . . . > .| [6:6] VP[]/?x[] -> * V[+AUX] VP[]/?x[] {}
|. . . . . . > .| [6:6] V[-AUX, SUBCAT='trans'] -> * 'like' {}
|. . . . . . [-]| [6:7] V[-AUX, SUBCAT='trans'] -> 'like' *
|. . . . . . [->| [6:7] VP[]/?x[] -> V[-AUX, SUBCAT='trans'] * NP[]/?x[] {}
|. . . . . . . #| [7:7] NP[]/NP[] -> *
|. . . . . . [-]| [6:7] VP[]/NP[] -> V[-AUX, SUBCAT='trans'] NP[]/NP[] *
|. . . . . [---]| [5:7] S[-INV]/NP[] -> NP[] VP[]/NP[] *
|. . . . [-----]| [4:7] SBar[]/NP[] -> Comp[] S[-INV]/NP[] *
|. . . [-------]| [3:7] VP[]/NP[] -> V[-AUX, SUBCAT='clause'] SBar[]/NP[] *
|. [-----------]| [1:7] S[+INV]/NP[] -> V[+AUX] NP[] VP[]/NP[] *
|[=============]| [0:7] S[-INV] -> NP[] S[]/NP[] *

>>> sorted(trees) == sorted(trees2)
True

Let's load a German grammar:

>>> cp = parse.load_parser('grammars/book_grammars/german.fcfg', trace=0)
>>> sent = 'die Katze sieht den Hund'
>>> tokens = sent.split()
>>> trees = cp.parse(tokens)
>>> for tree in trees: print(tree)
(S[]
  (NP[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom']
    (Det[AGR=[GND='fem', NUM='sg', PER=3], CASE='nom'] die)
    (N[AGR=[GND='fem', NUM='sg', PER=3]] Katze))
  (VP[AGR=[NUM='sg', PER=3]]
    (TV[AGR=[NUM='sg', PER=3], OBJCASE='acc'] sieht)
    (NP[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc']
      (Det[AGR=[GND='masc', NUM='sg', PER=3], CASE='acc'] den)
      (N[AGR=[GND='masc', NUM='sg', PER=3]] Hund))))
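
Gender agreement inside the AGR bundle rules out mismatched
determiner-noun pairs. A quick check (a hedged sketch: it assumes the
german.fcfg lexicon marks 'Hund' as masculine and 'Katze' as feminine,
as in the tree above):

>>> list(cp.parse('die Hund sieht den Katze'.split()))
[]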

Grammar with Binding Operators
------------------------------

The `bindop.fcfg`_ grammar is a semantic grammar that uses lambda
calculus. Each element has a core semantics, which is a single lambda
calculus expression, and a set of binding operators, which bind
variables.

.. _bindop.fcfg: http://nltk.svn.sourceforge.net/svnroot/nltk/trunk/nltk/data/grammars/bindop.fcfg

In order to make the binding operators work correctly, they need to
instantiate their bound variable every time they are added to the
chart. To do this, we use a special subclass of `Chart`, called
`InstantiateVarsChart`.

>>> from nltk.parse.featurechart import InstantiateVarsChart
>>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=1,
...                        chart_class=InstantiateVarsChart)
>>> print(cp.grammar())
Grammar with 15 productions (start state = S[])
    S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] VP[SEM=[BO=?b2, CORE=?vp]]
    VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] NP[SEM=[BO=?b2, CORE=?obj]]
    VP[SEM=?s] -> IV[SEM=?s]
    NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] N[SEM=[BO=?b2, CORE=?n]]
    Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a'
    N[SEM=[BO={/}, CORE=<dog>]] -> 'dog'
    N[SEM=[BO={/}, CORE=<dog>]] -> 'cat'
    N[SEM=[BO={/}, CORE=<dog>]] -> 'mouse'
    IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks'
    IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'eats'
    IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'walks'
    TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds'
    TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'walks'
    NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'john'
    NP[SEM=[BO={bo(\P.P(John),@x)}, CORE=<@x>]] -> 'alex'

A simple intransitive sentence:

>>> from nltk.sem import logic
>>> logic._counter._value = 100

>>> trees = cp.parse('john barks'.split())
|. john.barks.|
|[-----]     .| [0:1] 'john'
|.     [-----]| [1:2] 'barks'
|[-----]     .| [0:1] NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] -> 'john' *
|[----->     .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
|.     [-----]| [1:2] IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> 'barks' *
|.     [-----]| [1:2] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] -> IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
|[===========]| [0:2] S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={/}, CORE=<\x.bark(x)>]] *
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.P(John),z2)}, CORE=<bark(z2)>]]
  (NP[SEM=[BO={bo(\P.P(John),z101)}, CORE=<z101>]] john)
  (VP[SEM=[BO={/}, CORE=<\x.bark(x)>]]
    (IV[SEM=[BO={/}, CORE=<\x.bark(x)>]] barks)))

A transitive sentence:

>>> trees = cp.parse('john feeds a dog'.split())
|.joh.fee. a .dog.|
|[---]   .   .   .| [0:1] 'john'
|.   [---]   .   .| [1:2] 'feeds'
|.   .   [---]   .| [2:3] 'a'
|.   .   .   [---]| [3:4] 'dog'
|[---]   .   .   .| [0:1] NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] -> 'john' *
|[--->   .   .   .| [0:1] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.P(John),z2)}, ?subj: <IndividualVariableExpression z2>}
|.   [---]   .   .| [1:2] TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] -> 'feeds' *
|.   [--->   .   .| [1:2] VP[SEM=[BO={?b1+?b2}, CORE=<?v(?obj)>]] -> TV[SEM=[BO=?b1, CORE=?v]] * NP[SEM=[BO=?b2, CORE=?obj]] {?b1: {/}, ?v: <LambdaExpression \x y.feed(y,x)>}
|.   .   [---]   .| [2:3] Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] -> 'a' *
|.   .   [--->   .| [2:3] NP[SEM=[BO={?b1+?b2+{bo(?det(?n),@x)}}, CORE=<@x>]] -> Det[SEM=[BO=?b1, CORE=?det]] * N[SEM=[BO=?b2, CORE=?n]] {?b1: {/}, ?det: <LambdaExpression \Q P.exists x.(Q(x) & P(x))>}
|.   .   .   [---]| [3:4] N[SEM=[BO={/}, CORE=<dog>]] -> 'dog' *
|.   .   [-------]| [2:4] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]] -> Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] N[SEM=[BO={/}, CORE=<dog>]] *
|.   .   [------->| [2:4] S[SEM=[BO={?b1+?b2}, CORE=<?vp(?subj)>]] -> NP[SEM=[BO=?b1, CORE=?subj]] * VP[SEM=[BO=?b2, CORE=?vp]] {?b1: {bo(\P.exists x.(dog(x) & P(x)),z2)}, ?subj: <IndividualVariableExpression z2>}
|.   [-----------]| [1:4] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]] -> TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<z2>]] *
|[===============]| [0:4] S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]] -> NP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<z2>]] VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<\y.feed(y,z3)>]] *
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.P(John),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
  (NP[SEM=[BO={bo(\P.P(John),z102)}, CORE=<z102>]] john)
  (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
    (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
    (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z103)}, CORE=<z103>]]
      (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
      (N[SEM=[BO={/}, CORE=<dog>]] dog))))

Turn down the verbosity:

>>> cp = parse.load_parser('grammars/sample_grammars/bindop.fcfg', trace=0,
...                        chart_class=InstantiateVarsChart)

Reuse the same lexical item twice:

>>> trees = cp.parse('john feeds john'.split())
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.P(John),z2), bo(\P.P(John),z3)}, CORE=<feed(z2,z3)>]]
  (NP[SEM=[BO={bo(\P.P(John),z104)}, CORE=<z104>]] john)
  (VP[SEM=[BO={bo(\P.P(John),z2)}, CORE=<\y.feed(y,z2)>]]
    (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
    (NP[SEM=[BO={bo(\P.P(John),z105)}, CORE=<z105>]] john)))

>>> trees = cp.parse('a dog feeds a dog'.split())
>>> for tree in trees: print(tree)
(S[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2), bo(\P.exists x.(dog(x) & P(x)),z3)}, CORE=<feed(z2,z3)>]]
  (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z106)}, CORE=<z106>]]
    (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
    (N[SEM=[BO={/}, CORE=<dog>]] dog))
  (VP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z2)}, CORE=<\y.feed(y,z2)>]]
    (TV[SEM=[BO={/}, CORE=<\x y.feed(y,x)>]] feeds)
    (NP[SEM=[BO={bo(\P.exists x.(dog(x) & P(x)),z107)}, CORE=<z107>]]
      (Det[SEM=[BO={/}, CORE=<\Q P.exists x.(Q(x) & P(x))>]] a)
      (N[SEM=[BO={/}, CORE=<dog>]] dog))))