|
1 |
|
2 :mod:`parser` --- Access Python parse trees |
|
3 =========================================== |
|
4 |
|
5 .. module:: parser |
|
6 :synopsis: Access parse trees for Python source code. |
|
7 .. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org> |
|
8 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> |
|
9 |
|
10 |
|
11 .. Copyright 1995 Virginia Polytechnic Institute and State University and Fred |
|
12 L. Drake, Jr. This copyright notice must be distributed on all copies, but |
|
13 this document otherwise may be distributed as part of the Python |
|
14 distribution. No fee may be charged for this document in any representation, |
|
15 either on paper or electronically. This restriction does not affect other |
|
16 elements in a distributed package in any way. |
|
17 |
|
18 .. index:: single: parsing; Python source code |
|
19 |
|
20 The :mod:`parser` module provides an interface to Python's internal parser and |
|
21 byte-code compiler. The primary purpose for this interface is to allow Python |
|
22 code to edit the parse tree of a Python expression and create executable code |
|
23 from this. This is better than trying to parse and modify an arbitrary Python |
|
24 code fragment as a string because parsing is performed in a manner identical to |
|
25 the code forming the application. It is also faster. |
|
26 |
|
27 .. note:: |
|
28 |
|
29 From Python 2.5 onward, it's much more convenient to cut in at the Abstract |
|
30 Syntax Tree (AST) generation and compilation stage, using the :mod:`ast` |
|
31 module. |
|
32 |
|
33 The :mod:`parser` module exports the names documented here also with "st" |
|
34 replaced by "ast"; this is a legacy from the time when there was no other |
|
35 AST and has nothing to do with the AST found in Python 2.5. This is also the |
|
36 reason for the functions' keyword arguments being called *ast*, not *st*. |
|
37 The "ast" functions will be removed in Python 3.0. |
|
38 |
|
39 There are a few things to note about this module which are important to making |
|
40 use of the data structures created. This is not a tutorial on editing the parse |
|
41 trees for Python code, but some examples of using the :mod:`parser` module are |
|
42 presented. |
|
43 |
|
44 Most importantly, a good understanding of the Python grammar processed by the |
|
45 internal parser is required. For full information on the language syntax, refer |
|
46 to :ref:`reference-index`. The parser |
|
47 itself is created from a grammar specification defined in the file |
|
48 :file:`Grammar/Grammar` in the standard Python distribution. The parse trees |
|
49 stored in the ST objects created by this module are the actual output from the |
|
50 internal parser when created by the :func:`expr` or :func:`suite` functions, |
|
51 described below. The ST objects created by :func:`sequence2st` faithfully |
|
52 simulate those structures. Be aware that the values of the sequences which are |
|
53 considered "correct" will vary from one version of Python to another as the |
|
54 formal grammar for the language is revised. However, transporting code from one |
|
55 Python version to another as source text will always allow correct parse trees |
|
56 to be created in the target version, with the only restriction being that |
|
57 migrating to an older version of the interpreter will not support more recent |
|
58 language constructs. The parse trees are not typically compatible from one |
|
59 version to another, whereas source code has always been forward-compatible. |
|
60 |
|
61 Each element of the sequences returned by :func:`st2list` or :func:`st2tuple` |
|
62 has a simple form. Sequences representing non-terminal elements in the grammar |
|
63 always have a length greater than one. The first element is an integer which |
|
64 identifies a production in the grammar. These integers are given symbolic names |
|
65 in the C header file :file:`Include/graminit.h` and the Python module |
|
66 :mod:`symbol`. Each additional element of the sequence represents a component |
|
67 of the production as recognized in the input string: these are always sequences |
|
68 which have the same form as the parent. An important aspect of this structure |
|
69 which should be noted is that keywords used to identify the parent node type, |
|
70 such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the |
|
71 node tree without any special treatment. For example, the :keyword:`if` keyword |
|
72 is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value |
|
73 associated with all :const:`NAME` tokens, including variable and function names |
|
74 defined by the user. In an alternate form returned when line number information |
|
75 is requested, the same token might be represented as ``(1, 'if', 12)``, where |
|
76 the ``12`` represents the line number at which the terminal symbol was found. |
|
77 |
|
78 Terminal elements are represented in much the same way, but without any child |
|
79 elements and the addition of the source text which was identified. The example |
|
80 of the :keyword:`if` keyword above is representative. The various types of |
|
81 terminal symbols are defined in the C header file :file:`Include/token.h` and |
|
82 the Python module :mod:`token`. |
|
83 |
|
84 The ST objects are not required to support the functionality of this module, |
|
85 but are provided for three purposes: to allow an application to amortize the |
|
86 cost of processing complex parse trees, to provide a parse tree representation |
|
87 which conserves memory space when compared to the Python list or tuple |
|
88 representation, and to ease the creation of additional modules in C which |
|
89 manipulate parse trees. A simple "wrapper" class may be created in Python to |
|
90 hide the use of ST objects. |
|
91 |
|
92 The :mod:`parser` module defines functions for a few distinct purposes. The |
|
93 most important purposes are to create ST objects and to convert ST objects to |
|
94 other representations such as parse trees and compiled code objects, but there |
|
95 are also functions which serve to query the type of parse tree represented by an |
|
96 ST object. |
|
97 |
|
98 |
|
99 .. seealso:: |
|
100 |
|
101 Module :mod:`symbol` |
|
102 Useful constants representing internal nodes of the parse tree. |
|
103 |
|
104 Module :mod:`token` |
|
105 Useful constants representing leaf nodes of the parse tree and functions for |
|
106 testing node values. |
|
107 |
|
108 |
|
109 .. _creating-sts: |
|
110 |
|
111 Creating ST Objects |
|
112 ------------------- |
|
113 |
|
114 ST objects may be created from source code or from a parse tree. When creating |
|
115 an ST object from source, different functions are used to create the ``'eval'`` |
|
116 and ``'exec'`` forms. |
|
117 |
|
118 |
|
119 .. function:: expr(source) |
|
120 |
|
121 The :func:`expr` function parses the parameter *source* as if it were an input |
|
122 to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an ST object |
|
123 is created to hold the internal parse tree representation, otherwise an |
|
124 appropriate exception is thrown. |
|
125 |
|
126 |
|
127 .. function:: suite(source) |
|
128 |
|
129 The :func:`suite` function parses the parameter *source* as if it were an input |
|
130 to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an ST object |
|
131 is created to hold the internal parse tree representation, otherwise an |
|
132 appropriate exception is thrown. |
|
133 |
|
134 |
|
135 .. function:: sequence2st(sequence) |
|
136 |
|
137 This function accepts a parse tree represented as a sequence and builds an |
|
138 internal representation if possible. If it can validate that the tree conforms |
|
139 to the Python grammar and all nodes are valid node types in the host version of |
|
140 Python, an ST object is created from the internal representation and returned |
|
141 to the called. If there is a problem creating the internal representation, or |
|
142 if the tree cannot be validated, a :exc:`ParserError` exception is thrown. An |
|
143 ST object created this way should not be assumed to compile correctly; normal |
|
144 exceptions thrown by compilation may still be initiated when the ST object is |
|
145 passed to :func:`compilest`. This may indicate problems not related to syntax |
|
146 (such as a :exc:`MemoryError` exception), but may also be due to constructs such |
|
147 as the result of parsing ``del f(0)``, which escapes the Python parser but is |
|
148 checked by the bytecode compiler. |
|
149 |
|
150 Sequences representing terminal tokens may be represented as either two-element |
|
151 lists of the form ``(1, 'name')`` or as three-element lists of the form ``(1, |
|
152 'name', 56)``. If the third element is present, it is assumed to be a valid |
|
153 line number. The line number may be specified for any subset of the terminal |
|
154 symbols in the input tree. |
|
155 |
|
156 |
|
157 .. function:: tuple2st(sequence) |
|
158 |
|
159 This is the same function as :func:`sequence2st`. This entry point is |
|
160 maintained for backward compatibility. |
|
161 |
|
162 |
|
163 .. _converting-sts: |
|
164 |
|
165 Converting ST Objects |
|
166 --------------------- |
|
167 |
|
168 ST objects, regardless of the input used to create them, may be converted to |
|
169 parse trees represented as list- or tuple- trees, or may be compiled into |
|
170 executable code objects. Parse trees may be extracted with or without line |
|
171 numbering information. |
|
172 |
|
173 |
|
174 .. function:: st2list(ast[, line_info]) |
|
175 |
|
176 This function accepts an ST object from the caller in *ast* and returns a |
|
177 Python list representing the equivalent parse tree. The resulting list |
|
178 representation can be used for inspection or the creation of a new parse tree in |
|
179 list form. This function does not fail so long as memory is available to build |
|
180 the list representation. If the parse tree will only be used for inspection, |
|
181 :func:`st2tuple` should be used instead to reduce memory consumption and |
|
182 fragmentation. When the list representation is required, this function is |
|
183 significantly faster than retrieving a tuple representation and converting that |
|
184 to nested lists. |
|
185 |
|
186 If *line_info* is true, line number information will be included for all |
|
187 terminal tokens as a third element of the list representing the token. Note |
|
188 that the line number provided specifies the line on which the token *ends*. |
|
189 This information is omitted if the flag is false or omitted. |
|
190 |
|
191 |
|
192 .. function:: st2tuple(ast[, line_info]) |
|
193 |
|
194 This function accepts an ST object from the caller in *ast* and returns a |
|
195 Python tuple representing the equivalent parse tree. Other than returning a |
|
196 tuple instead of a list, this function is identical to :func:`st2list`. |
|
197 |
|
198 If *line_info* is true, line number information will be included for all |
|
199 terminal tokens as a third element of the list representing the token. This |
|
200 information is omitted if the flag is false or omitted. |
|
201 |
|
202 |
|
203 .. function:: compilest(ast[, filename='<syntax-tree>']) |
|
204 |
|
205 .. index:: builtin: eval |
|
206 |
|
207 The Python byte compiler can be invoked on an ST object to produce code objects |
|
208 which can be used as part of an :keyword:`exec` statement or a call to the |
|
209 built-in :func:`eval` function. This function provides the interface to the |
|
210 compiler, passing the internal parse tree from *ast* to the parser, using the |
|
211 source file name specified by the *filename* parameter. The default value |
|
212 supplied for *filename* indicates that the source was an ST object. |
|
213 |
|
214 Compiling an ST object may result in exceptions related to compilation; an |
|
215 example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``: |
|
216 this statement is considered legal within the formal grammar for Python but is |
|
217 not a legal language construct. The :exc:`SyntaxError` raised for this |
|
218 condition is actually generated by the Python byte-compiler normally, which is |
|
219 why it can be raised at this point by the :mod:`parser` module. Most causes of |
|
220 compilation failure can be diagnosed programmatically by inspection of the parse |
|
221 tree. |
|
222 |
|
223 |
|
224 .. _querying-sts: |
|
225 |
|
226 Queries on ST Objects |
|
227 --------------------- |
|
228 |
|
229 Two functions are provided which allow an application to determine if an ST was |
|
230 created as an expression or a suite. Neither of these functions can be used to |
|
231 determine if an ST was created from source code via :func:`expr` or |
|
232 :func:`suite` or from a parse tree via :func:`sequence2st`. |
|
233 |
|
234 |
|
235 .. function:: isexpr(ast) |
|
236 |
|
237 .. index:: builtin: compile |
|
238 |
|
239 When *ast* represents an ``'eval'`` form, this function returns true, otherwise |
|
240 it returns false. This is useful, since code objects normally cannot be queried |
|
241 for this information using existing built-in functions. Note that the code |
|
242 objects created by :func:`compilest` cannot be queried like this either, and |
|
243 are identical to those created by the built-in :func:`compile` function. |
|
244 |
|
245 |
|
246 .. function:: issuite(ast) |
|
247 |
|
248 This function mirrors :func:`isexpr` in that it reports whether an ST object |
|
249 represents an ``'exec'`` form, commonly known as a "suite." It is not safe to |
|
250 assume that this function is equivalent to ``not isexpr(ast)``, as additional |
|
251 syntactic fragments may be supported in the future. |
|
252 |
|
253 |
|
254 .. _st-errors: |
|
255 |
|
256 Exceptions and Error Handling |
|
257 ----------------------------- |
|
258 |
|
259 The parser module defines a single exception, but may also pass other built-in |
|
260 exceptions from other portions of the Python runtime environment. See each |
|
261 function for information about the exceptions it can raise. |
|
262 |
|
263 |
|
264 .. exception:: ParserError |
|
265 |
|
266 Exception raised when a failure occurs within the parser module. This is |
|
267 generally produced for validation failures rather than the built in |
|
268 :exc:`SyntaxError` thrown during normal parsing. The exception argument is |
|
269 either a string describing the reason of the failure or a tuple containing a |
|
270 sequence causing the failure from a parse tree passed to :func:`sequence2st` |
|
271 and an explanatory string. Calls to :func:`sequence2st` need to be able to |
|
272 handle either type of exception, while calls to other functions in the module |
|
273 will only need to be aware of the simple string values. |
|
274 |
|
275 Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may |
|
276 throw exceptions which are normally thrown by the parsing and compilation |
|
277 process. These include the built in exceptions :exc:`MemoryError`, |
|
278 :exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these |
|
279 cases, these exceptions carry all the meaning normally associated with them. |
|
280 Refer to the descriptions of each function for detailed information. |
|
281 |
|
282 |
|
283 .. _st-objects: |
|
284 |
|
285 ST Objects |
|
286 ---------- |
|
287 |
|
288 Ordered and equality comparisons are supported between ST objects. Pickling of |
|
289 ST objects (using the :mod:`pickle` module) is also supported. |
|
290 |
|
291 |
|
292 .. data:: STType |
|
293 |
|
294 The type of the objects returned by :func:`expr`, :func:`suite` and |
|
295 :func:`sequence2st`. |
|
296 |
|
297 ST objects have the following methods: |
|
298 |
|
299 |
|
300 .. method:: ST.compile([filename]) |
|
301 |
|
302 Same as ``compilest(st, filename)``. |
|
303 |
|
304 |
|
305 .. method:: ST.isexpr() |
|
306 |
|
307 Same as ``isexpr(st)``. |
|
308 |
|
309 |
|
310 .. method:: ST.issuite() |
|
311 |
|
312 Same as ``issuite(st)``. |
|
313 |
|
314 |
|
315 .. method:: ST.tolist([line_info]) |
|
316 |
|
317 Same as ``st2list(st, line_info)``. |
|
318 |
|
319 |
|
320 .. method:: ST.totuple([line_info]) |
|
321 |
|
322 Same as ``st2tuple(st, line_info)``. |
|
323 |
|
324 |
|
325 .. _st-examples: |
|
326 |
|
327 Examples |
|
328 -------- |
|
329 |
|
330 .. index:: builtin: compile |
|
331 |
|
332 The parser modules allows operations to be performed on the parse tree of Python |
|
333 source code before the :term:`bytecode` is generated, and provides for inspection of the |
|
334 parse tree for information gathering purposes. Two examples are presented. The |
|
335 simple example demonstrates emulation of the :func:`compile` built-in function |
|
336 and the complex example shows the use of a parse tree for information discovery. |
|
337 |
|
338 |
|
339 Emulation of :func:`compile` |
|
340 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
341 |
|
342 While many useful operations may take place between parsing and bytecode |
|
343 generation, the simplest operation is to do nothing. For this purpose, using |
|
344 the :mod:`parser` module to produce an intermediate data structure is equivalent |
|
345 to the code :: |
|
346 |
|
347 >>> code = compile('a + 5', 'file.py', 'eval') |
|
348 >>> a = 5 |
|
349 >>> eval(code) |
|
350 10 |
|
351 |
|
352 The equivalent operation using the :mod:`parser` module is somewhat longer, and |
|
353 allows the intermediate internal parse tree to be retained as an ST object:: |
|
354 |
|
355 >>> import parser |
|
356 >>> st = parser.expr('a + 5') |
|
357 >>> code = st.compile('file.py') |
|
358 >>> a = 5 |
|
359 >>> eval(code) |
|
360 10 |
|
361 |
|
362 An application which needs both ST and code objects can package this code into |
|
363 readily available functions:: |
|
364 |
|
365 import parser |
|
366 |
|
367 def load_suite(source_string): |
|
368 st = parser.suite(source_string) |
|
369 return st, st.compile() |
|
370 |
|
371 def load_expression(source_string): |
|
372 st = parser.expr(source_string) |
|
373 return st, st.compile() |
|
374 |
|
375 |
|
376 Information Discovery |
|
377 ^^^^^^^^^^^^^^^^^^^^^ |
|
378 |
|
379 .. index:: |
|
380 single: string; documentation |
|
381 single: docstrings |
|
382 |
|
383 Some applications benefit from direct access to the parse tree. The remainder |
|
384 of this section demonstrates how the parse tree provides access to module |
|
385 documentation defined in docstrings without requiring that the code being |
|
386 examined be loaded into a running interpreter via :keyword:`import`. This can |
|
387 be very useful for performing analyses of untrusted code. |
|
388 |
|
389 Generally, the example will demonstrate how the parse tree may be traversed to |
|
390 distill interesting information. Two functions and a set of classes are |
|
391 developed which provide programmatic access to high level function and class |
|
392 definitions provided by a module. The classes extract information from the |
|
393 parse tree and provide access to the information at a useful semantic level, one |
|
394 function provides a simple low-level pattern matching capability, and the other |
|
395 function defines a high-level interface to the classes by handling file |
|
396 operations on behalf of the caller. All source files mentioned here which are |
|
397 not part of the Python installation are located in the :file:`Demo/parser/` |
|
398 directory of the distribution. |
|
399 |
|
400 The dynamic nature of Python allows the programmer a great deal of flexibility, |
|
401 but most modules need only a limited measure of this when defining classes, |
|
402 functions, and methods. In this example, the only definitions that will be |
|
403 considered are those which are defined in the top level of their context, e.g., |
|
404 a function defined by a :keyword:`def` statement at column zero of a module, but |
|
405 not a function defined within a branch of an :keyword:`if` ... :keyword:`else` |
|
406 construct, though there are some good reasons for doing so in some situations. |
|
407 Nesting of definitions will be handled by the code developed in the example. |
|
408 |
|
409 To construct the upper-level extraction methods, we need to know what the parse |
|
410 tree structure looks like and how much of it we actually need to be concerned |
|
411 about. Python uses a moderately deep parse tree so there are a large number of |
|
412 intermediate nodes. It is important to read and understand the formal grammar |
|
413 used by Python. This is specified in the file :file:`Grammar/Grammar` in the |
|
414 distribution. Consider the simplest case of interest when searching for |
|
415 docstrings: a module consisting of a docstring and nothing else. (See file |
|
416 :file:`docstring.py`.) :: |
|
417 |
|
418 """Some documentation. |
|
419 """ |
|
420 |
|
421 Using the interpreter to take a look at the parse tree, we find a bewildering |
|
422 mass of numbers and parentheses, with the documentation buried deep in nested |
|
423 tuples. :: |
|
424 |
|
425 >>> import parser |
|
426 >>> import pprint |
|
427 >>> st = parser.suite(open('docstring.py').read()) |
|
428 >>> tup = st.totuple() |
|
429 >>> pprint.pprint(tup) |
|
430 (257, |
|
431 (264, |
|
432 (265, |
|
433 (266, |
|
434 (267, |
|
435 (307, |
|
436 (287, |
|
437 (288, |
|
438 (289, |
|
439 (290, |
|
440 (292, |
|
441 (293, |
|
442 (294, |
|
443 (295, |
|
444 (296, |
|
445 (297, |
|
446 (298, |
|
447 (299, |
|
448 (300, (3, '"""Some documentation.\n"""'))))))))))))))))), |
|
449 (4, ''))), |
|
450 (4, ''), |
|
451 (0, '')) |
|
452 |
|
453 The numbers at the first element of each node in the tree are the node types; |
|
454 they map directly to terminal and non-terminal symbols in the grammar. |
|
455 Unfortunately, they are represented as integers in the internal representation, |
|
456 and the Python structures generated do not change that. However, the |
|
457 :mod:`symbol` and :mod:`token` modules provide symbolic names for the node types |
|
458 and dictionaries which map from the integers to the symbolic names for the node |
|
459 types. |
|
460 |
|
461 In the output presented above, the outermost tuple contains four elements: the |
|
462 integer ``257`` and three additional tuples. Node type ``257`` has the symbolic |
|
463 name :const:`file_input`. Each of these inner tuples contains an integer as the |
|
464 first element; these integers, ``264``, ``4``, and ``0``, represent the node |
|
465 types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively. |
|
466 Note that these values may change depending on the version of Python you are |
|
467 using; consult :file:`symbol.py` and :file:`token.py` for details of the |
|
468 mapping. It should be fairly clear that the outermost node is related primarily |
|
469 to the input source rather than the contents of the file, and may be disregarded |
|
470 for the moment. The :const:`stmt` node is much more interesting. In |
|
471 particular, all docstrings are found in subtrees which are formed exactly as |
|
472 this node is formed, with the only difference being the string itself. The |
|
473 association between the docstring in a similar tree and the defined entity |
|
474 (class, function, or module) which it describes is given by the position of the |
|
475 docstring subtree within the tree defining the described structure. |
|
476 |
|
477 By replacing the actual docstring with something to signify a variable component |
|
478 of the tree, we allow a simple pattern matching approach to check any given |
|
479 subtree for equivalence to the general pattern for docstrings. Since the |
|
480 example demonstrates information extraction, we can safely require that the tree |
|
481 be in tuple form rather than list form, allowing a simple variable |
|
482 representation to be ``['variable_name']``. A simple recursive function can |
|
483 implement the pattern matching, returning a Boolean and a dictionary of variable |
|
484 name to value mappings. (See file :file:`example.py`.) :: |
|
485 |
|
486 from types import ListType, TupleType |
|
487 |
|
488 def match(pattern, data, vars=None): |
|
489 if vars is None: |
|
490 vars = {} |
|
491 if type(pattern) is ListType: |
|
492 vars[pattern[0]] = data |
|
493 return 1, vars |
|
494 if type(pattern) is not TupleType: |
|
495 return (pattern == data), vars |
|
496 if len(data) != len(pattern): |
|
497 return 0, vars |
|
498 for pattern, data in map(None, pattern, data): |
|
499 same, vars = match(pattern, data, vars) |
|
500 if not same: |
|
501 break |
|
502 return same, vars |
|
503 |
|
504 Using this simple representation for syntactic variables and the symbolic node |
|
505 types, the pattern for the candidate docstring subtrees becomes fairly readable. |
|
506 (See file :file:`example.py`.) :: |
|
507 |
|
508 import symbol |
|
509 import token |
|
510 |
|
511 DOCSTRING_STMT_PATTERN = ( |
|
512 symbol.stmt, |
|
513 (symbol.simple_stmt, |
|
514 (symbol.small_stmt, |
|
515 (symbol.expr_stmt, |
|
516 (symbol.testlist, |
|
517 (symbol.test, |
|
518 (symbol.and_test, |
|
519 (symbol.not_test, |
|
520 (symbol.comparison, |
|
521 (symbol.expr, |
|
522 (symbol.xor_expr, |
|
523 (symbol.and_expr, |
|
524 (symbol.shift_expr, |
|
525 (symbol.arith_expr, |
|
526 (symbol.term, |
|
527 (symbol.factor, |
|
528 (symbol.power, |
|
529 (symbol.atom, |
|
530 (token.STRING, ['docstring']) |
|
531 )))))))))))))))), |
|
532 (token.NEWLINE, '') |
|
533 )) |
|
534 |
|
535 Using the :func:`match` function with this pattern, extracting the module |
|
536 docstring from the parse tree created previously is easy:: |
|
537 |
|
538 >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1]) |
|
539 >>> found |
|
540 1 |
|
541 >>> vars |
|
542 {'docstring': '"""Some documentation.\n"""'} |
|
543 |
|
544 Once specific data can be extracted from a location where it is expected, the |
|
545 question of where information can be expected needs to be answered. When |
|
546 dealing with docstrings, the answer is fairly simple: the docstring is the first |
|
547 :const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node |
|
548 types). A module consists of a single :const:`file_input` node, and class and |
|
549 function definitions each contain exactly one :const:`suite` node. Classes and |
|
550 functions are readily identified as subtrees of code block nodes which start |
|
551 with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt, |
|
552 (funcdef, ...``. Note that these subtrees cannot be matched by :func:`match` |
|
553 since it does not support multiple sibling nodes to match without regard to |
|
554 number. A more elaborate matching function could be used to overcome this |
|
555 limitation, but this is sufficient for the example. |
|
556 |
|
557 Given the ability to determine whether a statement might be a docstring and |
|
558 extract the actual string from the statement, some work needs to be performed to |
|
559 walk the parse tree for an entire module and extract information about the names |
|
560 defined in each context of the module and associate any docstrings with the |
|
561 names. The code to perform this work is not complicated, but bears some |
|
562 explanation. |
|
563 |
|
564 The public interface to the classes is straightforward and should probably be |
|
565 somewhat more flexible. Each "major" block of the module is described by an |
|
566 object providing several methods for inquiry and a constructor which accepts at |
|
567 least the subtree of the complete parse tree which it represents. The |
|
568 :class:`ModuleInfo` constructor accepts an optional *name* parameter since it |
|
569 cannot otherwise determine the name of the module. |
|
570 |
|
571 The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and |
|
572 :class:`ModuleInfo`. All objects provide the methods :meth:`get_name`, |
|
573 :meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`. The |
|
574 :class:`ClassInfo` objects support :meth:`get_method_names` and |
|
575 :meth:`get_method_info` while the other classes provide |
|
576 :meth:`get_function_names` and :meth:`get_function_info`. |
|
577 |
|
578 Within each of the forms of code block that the public classes represent, most |
|
579 of the required information is in the same form and is accessed in the same way, |
|
580 with classes having the distinction that functions defined at the top level are |
|
581 referred to as "methods." Since the difference in nomenclature reflects a real |
|
582 semantic distinction from functions defined outside of a class, the |
|
583 implementation needs to maintain the distinction. Hence, most of the |
|
584 functionality of the public classes can be implemented in a common base class, |
|
585 :class:`SuiteInfoBase`, with the accessors for function and method information |
|
586 provided elsewhere. Note that there is only one class which represents function |
|
587 and method information; this parallels the use of the :keyword:`def` statement |
|
588 to define both types of elements. |
|
589 |
|
590 Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not |
|
591 need to be overridden by subclasses. More importantly, the extraction of most |
|
592 information from a parse tree is handled through a method called by the |
|
593 :class:`SuiteInfoBase` constructor. The example code for most of the classes is |
|
594 clear when read alongside the formal grammar, but the method which recursively |
|
595 creates new information objects requires further examination. Here is the |
|
596 relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`:: |
|
597 |
|
598 class SuiteInfoBase: |
|
599 _docstring = '' |
|
600 _name = '' |
|
601 |
|
602 def __init__(self, tree = None): |
|
603 self._class_info = {} |
|
604 self._function_info = {} |
|
605 if tree: |
|
606 self._extract_info(tree) |
|
607 |
|
608 def _extract_info(self, tree): |
|
609 # extract docstring |
|
610 if len(tree) == 2: |
|
611 found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1]) |
|
612 else: |
|
613 found, vars = match(DOCSTRING_STMT_PATTERN, tree[3]) |
|
614 if found: |
|
615 self._docstring = eval(vars['docstring']) |
|
616 # discover inner definitions |
|
617 for node in tree[1:]: |
|
618 found, vars = match(COMPOUND_STMT_PATTERN, node) |
|
619 if found: |
|
620 cstmt = vars['compound'] |
|
621 if cstmt[0] == symbol.funcdef: |
|
622 name = cstmt[2][1] |
|
623 self._function_info[name] = FunctionInfo(cstmt) |
|
624 elif cstmt[0] == symbol.classdef: |
|
625 name = cstmt[2][1] |
|
626 self._class_info[name] = ClassInfo(cstmt) |
|
627 |
|
628 After initializing some internal state, the constructor calls the |
|
629 :meth:`_extract_info` method. This method performs the bulk of the information |
|
630 extraction which takes place in the entire example. The extraction has two |
|
631 distinct phases: the location of the docstring for the parse tree passed in, and |
|
632 the discovery of additional definitions within the code block represented by the |
|
633 parse tree. |
|
634 |
|
635 The initial :keyword:`if` test determines whether the nested suite is of the |
|
636 "short form" or the "long form." The short form is used when the code block is |
|
637 on the same line as the definition of the code block, as in :: |
|
638 |
|
639 def square(x): "Square an argument."; return x ** 2 |
|
640 |
|
641 while the long form uses an indented block and allows nested definitions:: |
|
642 |
|
643 def make_power(exp): |
|
644 "Make a function that raises an argument to the exponent `exp'." |
|
645 def raiser(x, y=exp): |
|
646 return x ** y |
|
647 return raiser |
|
648 |
|
649 When the short form is used, the code block may contain a docstring as the |
|
650 first, and possibly only, :const:`small_stmt` element. The extraction of such a |
|
651 docstring is slightly different and requires only a portion of the complete |
|
652 pattern used in the more common case. As implemented, the docstring will only |
|
653 be found if there is only one :const:`small_stmt` node in the |
|
654 :const:`simple_stmt` node. Since most functions and methods which use the short |
|
655 form do not provide a docstring, this may be considered sufficient. The |
|
656 extraction of the docstring proceeds using the :func:`match` function as |
|
657 described above, and the value of the docstring is stored as an attribute of the |
|
658 :class:`SuiteInfoBase` object. |
|
659 |
|
660 After docstring extraction, a simple definition discovery algorithm operates on |
|
661 the :const:`stmt` nodes of the :const:`suite` node. The special case of the |
|
662 short form is not tested; since there are no :const:`stmt` nodes in the short |
|
663 form, the algorithm will silently skip the single :const:`simple_stmt` node and |
|
664 correctly not discover any nested definitions. |
|
665 |
|
666 Each statement in the code block is categorized as a class definition, function |
|
667 or method definition, or something else. For the definition statements, the |
|
668 name of the element defined is extracted and a representation object appropriate |
|
669 to the definition is created with the defining subtree passed as an argument to |
|
670 the constructor. The representation objects are stored in instance variables |
|
671 and may be retrieved by name using the appropriate accessor methods. |
|
672 |
|
673 The public classes provide any accessors required which are more specific than |
|
674 those provided by the :class:`SuiteInfoBase` class, but the real extraction |
|
675 algorithm remains common to all forms of code blocks. A high-level function can |
|
676 be used to extract the complete set of information from a source file. (See |
|
677 file :file:`example.py`.) :: |
|
678 |
|
679 def get_docs(fileName): |
|
680 import os |
|
681 import parser |
|
682 |
|
683 source = open(fileName).read() |
|
684 basename = os.path.basename(os.path.splitext(fileName)[0]) |
|
685 st = parser.suite(source) |
|
686 return ModuleInfo(st.totuple(), basename) |
|
687 |
|
688 This provides an easy-to-use interface to the documentation of a module. If |
|
689 information is required which is not extracted by the code of this example, the |
|
690 code may be extended at clearly defined points to provide additional |
|
691 capabilities. |
|
692 |