symbian-qemu-0.9.1-12/python-2.6.1/Doc/library/parser.rst
changeset 1 2fb8b9db1c86
equal deleted inserted replaced
0:ffa851df0825 1:2fb8b9db1c86
       
     1 
       
     2 :mod:`parser` --- Access Python parse trees
       
     3 ===========================================
       
     4 
       
     5 .. module:: parser
       
     6    :synopsis: Access parse trees for Python source code.
       
     7 .. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
       
     8 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
       
     9 
       
    10 
       
    11 .. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
       
    12    L. Drake, Jr.  This copyright notice must be distributed on all copies, but
       
    13    this document otherwise may be distributed as part of the Python
       
    14    distribution.  No fee may be charged for this document in any representation,
       
    15    either on paper or electronically.  This restriction does not affect other
       
    16    elements in a distributed package in any way.
       
    17 
       
    18 .. index:: single: parsing; Python source code
       
    19 
       
    20 The :mod:`parser` module provides an interface to Python's internal parser and
       
    21 byte-code compiler.  The primary purpose for this interface is to allow Python
       
    22 code to edit the parse tree of a Python expression and create executable code
       
    23 from this.  This is better than trying to parse and modify an arbitrary Python
       
    24 code fragment as a string because parsing is performed in a manner identical to
       
    25 the code forming the application.  It is also faster.
       
    26 
       
    27 .. note::
       
    28 
       
    29    From Python 2.5 onward, it's much more convenient to cut in at the Abstract
       
    30    Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
       
    31    module.
       
    32 
       
    33    The :mod:`parser` module exports the names documented here also with "st"
       
    34    replaced by "ast"; this is a legacy from the time when there was no other
       
    35    AST and has nothing to do with the AST found in Python 2.5.  This is also the
       
    36    reason for the functions' keyword arguments being called *ast*, not *st*.
       
    37    The "ast" functions will be removed in Python 3.0.
       
    38 
       
    39 There are a few things to note about this module which are important to making
       
    40 use of the data structures created.  This is not a tutorial on editing the parse
       
    41 trees for Python code, but some examples of using the :mod:`parser` module are
       
    42 presented.
       
    43 
       
    44 Most importantly, a good understanding of the Python grammar processed by the
       
    45 internal parser is required.  For full information on the language syntax, refer
       
    46 to :ref:`reference-index`.  The parser
       
    47 itself is created from a grammar specification defined in the file
       
    48 :file:`Grammar/Grammar` in the standard Python distribution.  The parse trees
       
    49 stored in the ST objects created by this module are the actual output from the
       
    50 internal parser when created by the :func:`expr` or :func:`suite` functions,
       
    51 described below.  The ST objects created by :func:`sequence2st` faithfully
       
    52 simulate those structures.  Be aware that the values of the sequences which are
       
    53 considered "correct" will vary from one version of Python to another as the
       
    54 formal grammar for the language is revised.  However, transporting code from one
       
    55 Python version to another as source text will always allow correct parse trees
       
    56 to be created in the target version, with the only restriction being that
       
    57 migrating to an older version of the interpreter will not support more recent
       
    58 language constructs.  The parse trees are not typically compatible from one
       
    59 version to another, whereas source code has always been forward-compatible.
       
    60 
       
    61 Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
       
    62 has a simple form.  Sequences representing non-terminal elements in the grammar
       
    63 always have a length greater than one.  The first element is an integer which
       
    64 identifies a production in the grammar.  These integers are given symbolic names
       
    65 in the C header file :file:`Include/graminit.h` and the Python module
       
    66 :mod:`symbol`.  Each additional element of the sequence represents a component
       
    67 of the production as recognized in the input string: these are always sequences
       
    68 which have the same form as the parent.  An important aspect of this structure
       
    69 which should be noted is that keywords used to identify the parent node type,
       
    70 such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
       
    71 node tree without any special treatment.  For example, the :keyword:`if` keyword
       
    72 is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
       
    73 associated with all :const:`NAME` tokens, including variable and function names
       
    74 defined by the user.  In an alternate form returned when line number information
       
    75 is requested, the same token might be represented as ``(1, 'if', 12)``, where
       
    76 the ``12`` represents the line number at which the terminal symbol was found.
       
    77 
       
    78 Terminal elements are represented in much the same way, but without any child
       
    79 elements and the addition of the source text which was identified.  The example
       
    80 of the :keyword:`if` keyword above is representative.  The various types of
       
    81 terminal symbols are defined in the C header file :file:`Include/token.h` and
       
    82 the Python module :mod:`token`.
       
    83 
       
    84 The ST objects are not required to support the functionality of this module,
       
    85 but are provided for three purposes: to allow an application to amortize the
       
    86 cost of processing complex parse trees, to provide a parse tree representation
       
    87 which conserves memory space when compared to the Python list or tuple
       
    88 representation, and to ease the creation of additional modules in C which
       
    89 manipulate parse trees.  A simple "wrapper" class may be created in Python to
       
    90 hide the use of ST objects.
       
    91 
       
    92 The :mod:`parser` module defines functions for a few distinct purposes.  The
       
    93 most important purposes are to create ST objects and to convert ST objects to
       
    94 other representations such as parse trees and compiled code objects, but there
       
    95 are also functions which serve to query the type of parse tree represented by an
       
    96 ST object.
       
    97 
       
    98 
       
    99 .. seealso::
       
   100 
       
   101    Module :mod:`symbol`
       
   102       Useful constants representing internal nodes of the parse tree.
       
   103 
       
   104    Module :mod:`token`
       
   105       Useful constants representing leaf nodes of the parse tree and functions for
       
   106       testing node values.
       
   107 
       
   108 
       
   109 .. _creating-sts:
       
   110 
       
   111 Creating ST Objects
       
   112 -------------------
       
   113 
       
   114 ST objects may be created from source code or from a parse tree. When creating
       
   115 an ST object from source, different functions are used to create the ``'eval'``
       
   116 and ``'exec'`` forms.
       
   117 
       
   118 
       
   119 .. function:: expr(source)
       
   120 
       
   121    The :func:`expr` function parses the parameter *source* as if it were an input
       
   122    to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
       
   123    is created to hold the internal parse tree representation, otherwise an
       
   124    appropriate exception is thrown.
       
   125 
       
   126 
       
   127 .. function:: suite(source)
       
   128 
       
   129    The :func:`suite` function parses the parameter *source* as if it were an input
       
   130    to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
       
   131    is created to hold the internal parse tree representation, otherwise an
       
   132    appropriate exception is thrown.
       
   133 
       
   134 
       
   135 .. function:: sequence2st(sequence)
       
   136 
       
   137    This function accepts a parse tree represented as a sequence and builds an
       
   138    internal representation if possible.  If it can validate that the tree conforms
       
   139    to the Python grammar and all nodes are valid node types in the host version of
       
   140    Python, an ST object is created from the internal representation and returned
       
   141    to the called.  If there is a problem creating the internal representation, or
       
   142    if the tree cannot be validated, a :exc:`ParserError` exception is thrown.  An
       
   143    ST object created this way should not be assumed to compile correctly; normal
       
   144    exceptions thrown by compilation may still be initiated when the ST object is
       
   145    passed to :func:`compilest`.  This may indicate problems not related to syntax
       
   146    (such as a :exc:`MemoryError` exception), but may also be due to constructs such
       
   147    as the result of parsing ``del f(0)``, which escapes the Python parser but is
       
   148    checked by the bytecode compiler.
       
   149 
       
   150    Sequences representing terminal tokens may be represented as either two-element
       
   151    lists of the form ``(1, 'name')`` or as three-element lists of the form ``(1,
       
   152    'name', 56)``.  If the third element is present, it is assumed to be a valid
       
   153    line number.  The line number may be specified for any subset of the terminal
       
   154    symbols in the input tree.
       
   155 
       
   156 
       
   157 .. function:: tuple2st(sequence)
       
   158 
       
   159    This is the same function as :func:`sequence2st`.  This entry point is
       
   160    maintained for backward compatibility.
       
   161 
       
   162 
       
   163 .. _converting-sts:
       
   164 
       
   165 Converting ST Objects
       
   166 ---------------------
       
   167 
       
   168 ST objects, regardless of the input used to create them, may be converted to
       
   169 parse trees represented as list- or tuple- trees, or may be compiled into
       
   170 executable code objects.  Parse trees may be extracted with or without line
       
   171 numbering information.
       
   172 
       
   173 
       
   174 .. function:: st2list(ast[, line_info])
       
   175 
       
   176    This function accepts an ST object from the caller in *ast* and returns a
       
   177    Python list representing the equivalent parse tree.  The resulting list
       
   178    representation can be used for inspection or the creation of a new parse tree in
       
   179    list form.  This function does not fail so long as memory is available to build
       
   180    the list representation.  If the parse tree will only be used for inspection,
       
   181    :func:`st2tuple` should be used instead to reduce memory consumption and
       
   182    fragmentation.  When the list representation is required, this function is
       
   183    significantly faster than retrieving a tuple representation and converting that
       
   184    to nested lists.
       
   185 
       
   186    If *line_info* is true, line number information will be included for all
       
   187    terminal tokens as a third element of the list representing the token.  Note
       
   188    that the line number provided specifies the line on which the token *ends*.
       
   189    This information is omitted if the flag is false or omitted.
       
   190 
       
   191 
       
   192 .. function:: st2tuple(ast[, line_info])
       
   193 
       
   194    This function accepts an ST object from the caller in *ast* and returns a
       
   195    Python tuple representing the equivalent parse tree.  Other than returning a
       
   196    tuple instead of a list, this function is identical to :func:`st2list`.
       
   197 
       
   198    If *line_info* is true, line number information will be included for all
       
   199    terminal tokens as a third element of the list representing the token.  This
       
   200    information is omitted if the flag is false or omitted.
       
   201 
       
   202 
       
   203 .. function:: compilest(ast[, filename='<syntax-tree>'])
       
   204 
       
   205    .. index:: builtin: eval
       
   206 
       
   207    The Python byte compiler can be invoked on an ST object to produce code objects
       
   208    which can be used as part of an :keyword:`exec` statement or a call to the
       
   209    built-in :func:`eval` function. This function provides the interface to the
       
   210    compiler, passing the internal parse tree from *ast* to the parser, using the
       
   211    source file name specified by the *filename* parameter. The default value
       
   212    supplied for *filename* indicates that the source was an ST object.
       
   213 
       
   214    Compiling an ST object may result in exceptions related to compilation; an
       
   215    example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
       
   216    this statement is considered legal within the formal grammar for Python but is
       
   217    not a legal language construct.  The :exc:`SyntaxError` raised for this
       
   218    condition is actually generated by the Python byte-compiler normally, which is
       
   219    why it can be raised at this point by the :mod:`parser` module.  Most causes of
       
   220    compilation failure can be diagnosed programmatically by inspection of the parse
       
   221    tree.
       
   222 
       
   223 
       
   224 .. _querying-sts:
       
   225 
       
   226 Queries on ST Objects
       
   227 ---------------------
       
   228 
       
   229 Two functions are provided which allow an application to determine if an ST was
       
   230 created as an expression or a suite.  Neither of these functions can be used to
       
   231 determine if an ST was created from source code via :func:`expr` or
       
   232 :func:`suite` or from a parse tree via :func:`sequence2st`.
       
   233 
       
   234 
       
   235 .. function:: isexpr(ast)
       
   236 
       
   237    .. index:: builtin: compile
       
   238 
       
   239    When *ast* represents an ``'eval'`` form, this function returns true, otherwise
       
   240    it returns false.  This is useful, since code objects normally cannot be queried
       
   241    for this information using existing built-in functions.  Note that the code
       
   242    objects created by :func:`compilest` cannot be queried like this either, and
       
   243    are identical to those created by the built-in :func:`compile` function.
       
   244 
       
   245 
       
   246 .. function:: issuite(ast)
       
   247 
       
   248    This function mirrors :func:`isexpr` in that it reports whether an ST object
       
   249    represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
       
   250    assume that this function is equivalent to ``not isexpr(ast)``, as additional
       
   251    syntactic fragments may be supported in the future.
       
   252 
       
   253 
       
   254 .. _st-errors:
       
   255 
       
   256 Exceptions and Error Handling
       
   257 -----------------------------
       
   258 
       
   259 The parser module defines a single exception, but may also pass other built-in
       
   260 exceptions from other portions of the Python runtime environment.  See each
       
   261 function for information about the exceptions it can raise.
       
   262 
       
   263 
       
   264 .. exception:: ParserError
       
   265 
       
   266    Exception raised when a failure occurs within the parser module.  This is
       
   267    generally produced for validation failures rather than the built in
       
   268    :exc:`SyntaxError` thrown during normal parsing. The exception argument is
       
   269    either a string describing the reason of the failure or a tuple containing a
       
   270    sequence causing the failure from a parse tree passed to :func:`sequence2st`
       
   271    and an explanatory string.  Calls to :func:`sequence2st` need to be able to
       
   272    handle either type of exception, while calls to other functions in the module
       
   273    will only need to be aware of the simple string values.
       
   274 
       
   275 Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
       
   276 throw exceptions which are normally thrown by the parsing and compilation
       
   277 process.  These include the built in exceptions :exc:`MemoryError`,
       
   278 :exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
       
   279 cases, these exceptions carry all the meaning normally associated with them.
       
   280 Refer to the descriptions of each function for detailed information.
       
   281 
       
   282 
       
   283 .. _st-objects:
       
   284 
       
   285 ST Objects
       
   286 ----------
       
   287 
       
   288 Ordered and equality comparisons are supported between ST objects. Pickling of
       
   289 ST objects (using the :mod:`pickle` module) is also supported.
       
   290 
       
   291 
       
   292 .. data:: STType
       
   293 
       
   294    The type of the objects returned by :func:`expr`, :func:`suite` and
       
   295    :func:`sequence2st`.
       
   296 
       
   297 ST objects have the following methods:
       
   298 
       
   299 
       
   300 .. method:: ST.compile([filename])
       
   301 
       
   302    Same as ``compilest(st, filename)``.
       
   303 
       
   304 
       
   305 .. method:: ST.isexpr()
       
   306 
       
   307    Same as ``isexpr(st)``.
       
   308 
       
   309 
       
   310 .. method:: ST.issuite()
       
   311 
       
   312    Same as ``issuite(st)``.
       
   313 
       
   314 
       
   315 .. method:: ST.tolist([line_info])
       
   316 
       
   317    Same as ``st2list(st, line_info)``.
       
   318 
       
   319 
       
   320 .. method:: ST.totuple([line_info])
       
   321 
       
   322    Same as ``st2tuple(st, line_info)``.
       
   323 
       
   324 
       
   325 .. _st-examples:
       
   326 
       
   327 Examples
       
   328 --------
       
   329 
       
   330 .. index:: builtin: compile
       
   331 
       
   332 The parser modules allows operations to be performed on the parse tree of Python
       
   333 source code before the :term:`bytecode` is generated, and provides for inspection of the
       
   334 parse tree for information gathering purposes. Two examples are presented.  The
       
   335 simple example demonstrates emulation of the :func:`compile` built-in function
       
   336 and the complex example shows the use of a parse tree for information discovery.
       
   337 
       
   338 
       
   339 Emulation of :func:`compile`
       
   340 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       
   341 
       
   342 While many useful operations may take place between parsing and bytecode
       
   343 generation, the simplest operation is to do nothing.  For this purpose, using
       
   344 the :mod:`parser` module to produce an intermediate data structure is equivalent
       
   345 to the code ::
       
   346 
       
   347    >>> code = compile('a + 5', 'file.py', 'eval')
       
   348    >>> a = 5
       
   349    >>> eval(code)
       
   350    10
       
   351 
       
   352 The equivalent operation using the :mod:`parser` module is somewhat longer, and
       
   353 allows the intermediate internal parse tree to be retained as an ST object::
       
   354 
       
   355    >>> import parser
       
   356    >>> st = parser.expr('a + 5')
       
   357    >>> code = st.compile('file.py')
       
   358    >>> a = 5
       
   359    >>> eval(code)
       
   360    10
       
   361 
       
   362 An application which needs both ST and code objects can package this code into
       
   363 readily available functions::
       
   364 
       
   365    import parser
       
   366 
       
   367    def load_suite(source_string):
       
   368        st = parser.suite(source_string)
       
   369        return st, st.compile()
       
   370 
       
   371    def load_expression(source_string):
       
   372        st = parser.expr(source_string)
       
   373        return st, st.compile()
       
   374 
       
   375 
       
   376 Information Discovery
       
   377 ^^^^^^^^^^^^^^^^^^^^^
       
   378 
       
   379 .. index::
       
   380    single: string; documentation
       
   381    single: docstrings
       
   382 
       
   383 Some applications benefit from direct access to the parse tree.  The remainder
       
   384 of this section demonstrates how the parse tree provides access to module
       
   385 documentation defined in docstrings without requiring that the code being
       
   386 examined be loaded into a running interpreter via :keyword:`import`.  This can
       
   387 be very useful for performing analyses of untrusted code.
       
   388 
       
   389 Generally, the example will demonstrate how the parse tree may be traversed to
       
   390 distill interesting information.  Two functions and a set of classes are
       
   391 developed which provide programmatic access to high level function and class
       
   392 definitions provided by a module.  The classes extract information from the
       
   393 parse tree and provide access to the information at a useful semantic level, one
       
   394 function provides a simple low-level pattern matching capability, and the other
       
   395 function defines a high-level interface to the classes by handling file
       
   396 operations on behalf of the caller.  All source files mentioned here which are
       
   397 not part of the Python installation are located in the :file:`Demo/parser/`
       
   398 directory of the distribution.
       
   399 
       
   400 The dynamic nature of Python allows the programmer a great deal of flexibility,
       
   401 but most modules need only a limited measure of this when defining classes,
       
   402 functions, and methods.  In this example, the only definitions that will be
       
   403 considered are those which are defined in the top level of their context, e.g.,
       
   404 a function defined by a :keyword:`def` statement at column zero of a module, but
       
   405 not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
       
   406 construct, though there are some good reasons for doing so in some situations.
       
   407 Nesting of definitions will be handled by the code developed in the example.
       
   408 
       
   409 To construct the upper-level extraction methods, we need to know what the parse
       
   410 tree structure looks like and how much of it we actually need to be concerned
       
   411 about.  Python uses a moderately deep parse tree so there are a large number of
       
   412 intermediate nodes.  It is important to read and understand the formal grammar
       
   413 used by Python.  This is specified in the file :file:`Grammar/Grammar` in the
       
   414 distribution. Consider the simplest case of interest when searching for
       
   415 docstrings: a module consisting of a docstring and nothing else.  (See file
       
   416 :file:`docstring.py`.) ::
       
   417 
       
   418    """Some documentation.
       
   419    """
       
   420 
       
   421 Using the interpreter to take a look at the parse tree, we find a bewildering
       
   422 mass of numbers and parentheses, with the documentation buried deep in nested
       
   423 tuples. ::
       
   424 
       
   425    >>> import parser
       
   426    >>> import pprint
       
   427    >>> st = parser.suite(open('docstring.py').read())
       
   428    >>> tup = st.totuple()
       
   429    >>> pprint.pprint(tup)
       
   430    (257,
       
   431     (264,
       
   432      (265,
       
   433       (266,
       
   434        (267,
       
   435         (307,
       
   436          (287,
       
   437           (288,
       
   438            (289,
       
   439             (290,
       
   440              (292,
       
   441               (293,
       
   442                (294,
       
   443                 (295,
       
   444                  (296,
       
   445                   (297,
       
   446                    (298,
       
   447                     (299,
       
   448                      (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
       
   449       (4, ''))),
       
   450     (4, ''),
       
   451     (0, ''))
       
   452 
       
   453 The numbers at the first element of each node in the tree are the node types;
       
   454 they map directly to terminal and non-terminal symbols in the grammar.
       
   455 Unfortunately, they are represented as integers in the internal representation,
       
   456 and the Python structures generated do not change that.  However, the
       
   457 :mod:`symbol` and :mod:`token` modules provide symbolic names for the node types
       
   458 and dictionaries which map from the integers to the symbolic names for the node
       
   459 types.
       
   460 
       
   461 In the output presented above, the outermost tuple contains four elements: the
       
   462 integer ``257`` and three additional tuples.  Node type ``257`` has the symbolic
       
   463 name :const:`file_input`.  Each of these inner tuples contains an integer as the
       
   464 first element; these integers, ``264``, ``4``, and ``0``, represent the node
       
   465 types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
       
   466 Note that these values may change depending on the version of Python you are
       
   467 using; consult :file:`symbol.py` and :file:`token.py` for details of the
       
   468 mapping.  It should be fairly clear that the outermost node is related primarily
       
   469 to the input source rather than the contents of the file, and may be disregarded
       
   470 for the moment.  The :const:`stmt` node is much more interesting.  In
       
   471 particular, all docstrings are found in subtrees which are formed exactly as
       
   472 this node is formed, with the only difference being the string itself.  The
       
   473 association between the docstring in a similar tree and the defined entity
       
   474 (class, function, or module) which it describes is given by the position of the
       
   475 docstring subtree within the tree defining the described structure.
       
   476 
       
   477 By replacing the actual docstring with something to signify a variable component
       
   478 of the tree, we allow a simple pattern matching approach to check any given
       
   479 subtree for equivalence to the general pattern for docstrings.  Since the
       
   480 example demonstrates information extraction, we can safely require that the tree
       
   481 be in tuple form rather than list form, allowing a simple variable
       
   482 representation to be ``['variable_name']``.  A simple recursive function can
       
   483 implement the pattern matching, returning a Boolean and a dictionary of variable
       
   484 name to value mappings.  (See file :file:`example.py`.) ::
       
   485 
       
   486    from types import ListType, TupleType
       
   487 
       
   488    def match(pattern, data, vars=None):
       
   489        if vars is None:
       
   490            vars = {}
       
   491        if type(pattern) is ListType:
       
   492            vars[pattern[0]] = data
       
   493            return 1, vars
       
   494        if type(pattern) is not TupleType:
       
   495            return (pattern == data), vars
       
   496        if len(data) != len(pattern):
       
   497            return 0, vars
       
   498        for pattern, data in map(None, pattern, data):
       
   499            same, vars = match(pattern, data, vars)
       
   500            if not same:
       
   501                break
       
   502        return same, vars
       
   503 
       
   504 Using this simple representation for syntactic variables and the symbolic node
       
   505 types, the pattern for the candidate docstring subtrees becomes fairly readable.
       
   506 (See file :file:`example.py`.) ::
       
   507 
       
   508    import symbol
       
   509    import token
       
   510 
       
   511    DOCSTRING_STMT_PATTERN = (
       
   512        symbol.stmt,
       
   513        (symbol.simple_stmt,
       
   514         (symbol.small_stmt,
       
   515          (symbol.expr_stmt,
       
   516           (symbol.testlist,
       
   517            (symbol.test,
       
   518             (symbol.and_test,
       
   519              (symbol.not_test,
       
   520               (symbol.comparison,
       
   521                (symbol.expr,
       
   522                 (symbol.xor_expr,
       
   523                  (symbol.and_expr,
       
   524                   (symbol.shift_expr,
       
   525                    (symbol.arith_expr,
       
   526                     (symbol.term,
       
   527                      (symbol.factor,
       
   528                       (symbol.power,
       
   529                        (symbol.atom,
       
   530                         (token.STRING, ['docstring'])
       
   531                         )))))))))))))))),
       
   532         (token.NEWLINE, '')
       
   533         ))
       
   534 
       
   535 Using the :func:`match` function with this pattern, extracting the module
       
   536 docstring from the parse tree created previously is easy::
       
   537 
       
   538    >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
       
   539    >>> found
       
   540    1
       
   541    >>> vars
       
   542    {'docstring': '"""Some documentation.\n"""'}
       
   543 
       
   544 Once specific data can be extracted from a location where it is expected, the
       
   545 question of where information can be expected needs to be answered.  When
       
   546 dealing with docstrings, the answer is fairly simple: the docstring is the first
       
   547 :const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
       
   548 types).  A module consists of a single :const:`file_input` node, and class and
       
   549 function definitions each contain exactly one :const:`suite` node.  Classes and
       
   550 functions are readily identified as subtrees of code block nodes which start
       
   551 with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt,
       
   552 (funcdef, ...``.  Note that these subtrees cannot be matched by :func:`match`
       
   553 since it does not support multiple sibling nodes to match without regard to
       
   554 number.  A more elaborate matching function could be used to overcome this
       
   555 limitation, but this is sufficient for the example.
       
   556 
       
   557 Given the ability to determine whether a statement might be a docstring and
       
   558 extract the actual string from the statement, some work needs to be performed to
       
   559 walk the parse tree for an entire module and extract information about the names
       
   560 defined in each context of the module and associate any docstrings with the
       
   561 names.  The code to perform this work is not complicated, but bears some
       
   562 explanation.
       
   563 
       
   564 The public interface to the classes is straightforward and should probably be
       
   565 somewhat more flexible.  Each "major" block of the module is described by an
       
   566 object providing several methods for inquiry and a constructor which accepts at
       
   567 least the subtree of the complete parse tree which it represents.  The
       
   568 :class:`ModuleInfo` constructor accepts an optional *name* parameter since it
       
   569 cannot otherwise determine the name of the module.
       
   570 
       
   571 The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
       
   572 :class:`ModuleInfo`.  All objects provide the methods :meth:`get_name`,
       
   573 :meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`.  The
       
   574 :class:`ClassInfo` objects support :meth:`get_method_names` and
       
   575 :meth:`get_method_info` while the other classes provide
       
   576 :meth:`get_function_names` and :meth:`get_function_info`.
       
   577 
       
   578 Within each of the forms of code block that the public classes represent, most
       
   579 of the required information is in the same form and is accessed in the same way,
       
   580 with classes having the distinction that functions defined at the top level are
       
   581 referred to as "methods." Since the difference in nomenclature reflects a real
       
   582 semantic distinction from functions defined outside of a class, the
       
   583 implementation needs to maintain the distinction. Hence, most of the
       
   584 functionality of the public classes can be implemented in a common base class,
       
   585 :class:`SuiteInfoBase`, with the accessors for function and method information
       
   586 provided elsewhere. Note that there is only one class which represents function
       
   587 and method information; this parallels the use of the :keyword:`def` statement
       
   588 to define both types of elements.
       
   589 
       
   590 Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
       
   591 need to be overridden by subclasses.  More importantly, the extraction of most
       
   592 information from a parse tree is handled through a method called by the
       
   593 :class:`SuiteInfoBase` constructor.  The example code for most of the classes is
       
   594 clear when read alongside the formal grammar, but the method which recursively
       
   595 creates new information objects requires further examination.  Here is the
       
   596 relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::
       
   597 
       
   598    class SuiteInfoBase:
       
   599        _docstring = ''
       
   600        _name = ''
       
   601 
       
   602        def __init__(self, tree = None):
       
   603            self._class_info = {}
       
   604            self._function_info = {}
       
   605            if tree:
       
   606                self._extract_info(tree)
       
   607 
       
   608        def _extract_info(self, tree):
       
   609            # extract docstring
       
   610            if len(tree) == 2:
       
   611                found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
       
   612            else:
       
   613                found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
       
   614            if found:
       
   615                self._docstring = eval(vars['docstring'])
       
   616            # discover inner definitions
       
   617            for node in tree[1:]:
       
   618                found, vars = match(COMPOUND_STMT_PATTERN, node)
       
   619                if found:
       
   620                    cstmt = vars['compound']
       
   621                    if cstmt[0] == symbol.funcdef:
       
   622                        name = cstmt[2][1]
       
   623                        self._function_info[name] = FunctionInfo(cstmt)
       
   624                    elif cstmt[0] == symbol.classdef:
       
   625                        name = cstmt[2][1]
       
   626                        self._class_info[name] = ClassInfo(cstmt)
       
   627 
       
   628 After initializing some internal state, the constructor calls the
       
   629 :meth:`_extract_info` method.  This method performs the bulk of the information
       
   630 extraction which takes place in the entire example.  The extraction has two
       
   631 distinct phases: the location of the docstring for the parse tree passed in, and
       
   632 the discovery of additional definitions within the code block represented by the
       
   633 parse tree.
       
   634 
       
   635 The initial :keyword:`if` test determines whether the nested suite is of the
       
   636 "short form" or the "long form."  The short form is used when the code block is
       
   637 on the same line as the definition of the code block, as in ::
       
   638 
       
   639    def square(x): "Square an argument."; return x ** 2
       
   640 
       
   641 while the long form uses an indented block and allows nested definitions::
       
   642 
       
   643    def make_power(exp):
       
   644        "Make a function that raises an argument to the exponent `exp'."
       
   645        def raiser(x, y=exp):
       
   646            return x ** y
       
   647        return raiser
       
   648 
       
   649 When the short form is used, the code block may contain a docstring as the
       
   650 first, and possibly only, :const:`small_stmt` element.  The extraction of such a
       
   651 docstring is slightly different and requires only a portion of the complete
       
   652 pattern used in the more common case.  As implemented, the docstring will only
       
   653 be found if there is only one :const:`small_stmt` node in the
       
   654 :const:`simple_stmt` node. Since most functions and methods which use the short
       
   655 form do not provide a docstring, this may be considered sufficient.  The
       
   656 extraction of the docstring proceeds using the :func:`match` function as
       
   657 described above, and the value of the docstring is stored as an attribute of the
       
   658 :class:`SuiteInfoBase` object.
       
   659 
       
   660 After docstring extraction, a simple definition discovery algorithm operates on
       
   661 the :const:`stmt` nodes of the :const:`suite` node.  The special case of the
       
   662 short form is not tested; since there are no :const:`stmt` nodes in the short
       
   663 form, the algorithm will silently skip the single :const:`simple_stmt` node and
       
   664 correctly not discover any nested definitions.
       
   665 
       
   666 Each statement in the code block is categorized as a class definition, function
       
   667 or method definition, or something else.  For the definition statements, the
       
   668 name of the element defined is extracted and a representation object appropriate
       
   669 to the definition is created with the defining subtree passed as an argument to
       
   670 the constructor.  The representation objects are stored in instance variables
       
   671 and may be retrieved by name using the appropriate accessor methods.
       
   672 
       
   673 The public classes provide any accessors required which are more specific than
       
   674 those provided by the :class:`SuiteInfoBase` class, but the real extraction
       
   675 algorithm remains common to all forms of code blocks.  A high-level function can
       
   676 be used to extract the complete set of information from a source file.  (See
       
   677 file :file:`example.py`.) ::
       
   678 
       
   679    def get_docs(fileName):
       
   680        import os
       
   681        import parser
       
   682 
       
   683        source = open(fileName).read()
       
   684        basename = os.path.basename(os.path.splitext(fileName)[0])
       
   685        st = parser.suite(source)
       
   686        return ModuleInfo(st.totuple(), basename)
       
   687 
       
   688 This provides an easy-to-use interface to the documentation of a module.  If
       
   689 information is required which is not extracted by the code of this example, the
       
   690 code may be extended at clearly defined points to provide additional
       
   691 capabilities.
       
   692