symbian-qemu-0.9.1-12/python-2.6.1/Doc/library/re.rst
author johnathan.white@2718R8BGH51.accenture.com
Mon, 08 Mar 2010 18:45:03 +0000
changeset 46 b6935a90ca64
parent 1 2fb8b9db1c86
permissions -rw-r--r--
Modify framebuffer and NGA framebuffer to read screen size from board model dtb file. Optimise memory usuage of frame buffer Add example minigui application with hooks to profiler (which writes results to S:\). Modified NGA framebuffer to run its own dfc queue at high priority
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     2
:mod:`re` --- Regular expression operations
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     3
===========================================
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     4
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     5
.. module:: re
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     6
   :synopsis: Regular expression operations.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     7
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     8
.. sectionauthor:: Andrew M. Kuchling <amk@amk.ca>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     9
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    10
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    11
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    12
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    13
This module provides regular expression matching operations similar to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    14
those found in Perl. Both patterns and strings to be searched can be
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    15
Unicode strings as well as 8-bit strings.  The :mod:`re` module is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    16
always available.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    17
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    18
Regular expressions use the backslash character (``'\'``) to indicate
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    19
special forms or to allow special characters to be used without invoking
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    20
their special meaning.  This collides with Python's usage of the same
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    21
character for the same purpose in string literals; for example, to match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    22
a literal backslash, one might have to write ``'\\\\'`` as the pattern
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    23
string, because the regular expression must be ``\\``, and each
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    24
backslash must be expressed as ``\\`` inside a regular Python string
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    25
literal.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    26
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    27
The solution is to use Python's raw string notation for regular expression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    28
patterns; backslashes are not handled in any special way in a string literal
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    29
prefixed with ``'r'``.  So ``r"\n"`` is a two-character string containing
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    30
``'\'`` and ``'n'``, while ``"\n"`` is a one-character string containing a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    31
newline.  Usually patterns will be expressed in Python code using this raw
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    32
string notation.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    33
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    34
It is important to note that most regular expression operations are available as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    35
module-level functions and :class:`RegexObject` methods.  The functions are
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    36
shortcuts that don't require you to compile a regex object first, but miss some
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    37
fine-tuning parameters.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    38
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    39
.. seealso::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    40
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    41
   Mastering Regular Expressions
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    42
      Book on regular expressions by Jeffrey Friedl, published by O'Reilly.  The
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    43
      second edition of the book no longer covers Python at all, but the first
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    44
      edition covered writing good regular expression patterns in great detail.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    45
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    46
   `Kodos <http://kodos.sf.net/>`_
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    47
      is a graphical regular expression debugger written in Python.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    48
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    49
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    50
.. _re-syntax:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    51
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    52
Regular Expression Syntax
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    53
-------------------------
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    54
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    55
A regular expression (or RE) specifies a set of strings that matches it; the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    56
functions in this module let you check if a particular string matches a given
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    57
regular expression (or if a given regular expression matches a particular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    58
string, which comes down to the same thing).
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    59
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    60
Regular expressions can be concatenated to form new regular expressions; if *A*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    61
and *B* are both regular expressions, then *AB* is also a regular expression.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    62
In general, if a string *p* matches *A* and another string *q* matches *B*, the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    63
string *pq* will match AB.  This holds unless *A* or *B* contain low precedence
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    64
operations; boundary conditions between *A* and *B*; or have numbered group
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    65
references.  Thus, complex expressions can easily be constructed from simpler
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    66
primitive expressions like the ones described here.  For details of the theory
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    67
and implementation of regular expressions, consult the Friedl book referenced
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    68
above, or almost any textbook about compiler construction.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    69
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    70
A brief explanation of the format of regular expressions follows.  For further
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    71
information and a gentler presentation, consult the :ref:`regex-howto`.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    72
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    73
Regular expressions can contain both special and ordinary characters. Most
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    74
ordinary characters, like ``'A'``, ``'a'``, or ``'0'``, are the simplest regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    75
expressions; they simply match themselves.  You can concatenate ordinary
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    76
characters, so ``last`` matches the string ``'last'``.  (In the rest of this
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    77
section, we'll write RE's in ``this special style``, usually without quotes, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    78
strings to be matched ``'in single quotes'``.)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    79
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    80
Some characters, like ``'|'`` or ``'('``, are special. Special
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    81
characters either stand for classes of ordinary characters, or affect
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    82
how the regular expressions around them are interpreted. Regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    83
expression pattern strings may not contain null bytes, but can specify
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    84
the null byte using the ``\number`` notation, e.g., ``'\x00'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    85
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    86
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    87
The special characters are:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    88
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    89
``'.'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    90
   (Dot.)  In the default mode, this matches any character except a newline.  If
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    91
   the :const:`DOTALL` flag has been specified, this matches any character
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    92
   including a newline.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    93
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    94
``'^'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    95
   (Caret.)  Matches the start of the string, and in :const:`MULTILINE` mode also
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    96
   matches immediately after each newline.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    97
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    98
``'$'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    99
   Matches the end of the string or just before the newline at the end of the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   100
   string, and in :const:`MULTILINE` mode also matches before a newline.  ``foo``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   101
   matches both 'foo' and 'foobar', while the regular expression ``foo$`` matches
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   102
   only 'foo'.  More interestingly, searching for ``foo.$`` in ``'foo1\nfoo2\n'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   103
   matches 'foo2' normally, but 'foo1' in :const:`MULTILINE` mode; searching for
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   104
   a single ``$`` in ``'foo\n'`` will find two (empty) matches: one just before
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   105
   the newline, and one at the end of the string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   106
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   107
``'*'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   108
   Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   109
   many repetitions as are possible.  ``ab*`` will match 'a', 'ab', or 'a' followed
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   110
   by any number of 'b's.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   111
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   112
``'+'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   113
   Causes the resulting RE to match 1 or more repetitions of the preceding RE.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   114
   ``ab+`` will match 'a' followed by any non-zero number of 'b's; it will not
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   115
   match just 'a'.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   116
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   117
``'?'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   118
   Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   119
   ``ab?`` will match either 'a' or 'ab'.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   120
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   121
``*?``, ``+?``, ``??``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   122
   The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   123
   as much text as possible.  Sometimes this behaviour isn't desired; if the RE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   124
   ``<.*>`` is matched against ``'<H1>title</H1>'``, it will match the entire
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   125
   string, and not just ``'<H1>'``.  Adding ``'?'`` after the qualifier makes it
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   126
   perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as *few*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   127
   characters as possible will be matched.  Using ``.*?`` in the previous
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   128
   expression will match only ``'<H1>'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   129
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   130
``{m}``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   131
   Specifies that exactly *m* copies of the previous RE should be matched; fewer
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   132
   matches cause the entire RE not to match.  For example, ``a{6}`` will match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   133
   exactly six ``'a'`` characters, but not five.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   134
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   135
``{m,n}``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   136
   Causes the resulting RE to match from *m* to *n* repetitions of the preceding
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   137
   RE, attempting to match as many repetitions as possible.  For example,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   138
   ``a{3,5}`` will match from 3 to 5 ``'a'`` characters.  Omitting *m* specifies a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   139
   lower bound of zero,  and omitting *n* specifies an infinite upper bound.  As an
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   140
   example, ``a{4,}b`` will match ``aaaab`` or a thousand ``'a'`` characters
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   141
   followed by a ``b``, but not ``aaab``. The comma may not be omitted or the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   142
   modifier would be confused with the previously described form.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   143
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   144
``{m,n}?``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   145
   Causes the resulting RE to match from *m* to *n* repetitions of the preceding
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   146
   RE, attempting to match as *few* repetitions as possible.  This is the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   147
   non-greedy version of the previous qualifier.  For example, on the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   148
   6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   149
   while ``a{3,5}?`` will only match 3 characters.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   150
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   151
``'\'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   152
   Either escapes special characters (permitting you to match characters like
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   153
   ``'*'``, ``'?'``, and so forth), or signals a special sequence; special
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   154
   sequences are discussed below.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   155
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   156
   If you're not using a raw string to express the pattern, remember that Python
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   157
   also uses the backslash as an escape sequence in string literals; if the escape
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   158
   sequence isn't recognized by Python's parser, the backslash and subsequent
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   159
   character are included in the resulting string.  However, if Python would
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   160
   recognize the resulting sequence, the backslash should be repeated twice.  This
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   161
   is complicated and hard to understand, so it's highly recommended that you use
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   162
   raw strings for all but the simplest expressions.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   163
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   164
``[]``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   165
   Used to indicate a set of characters.  Characters can be listed individually, or
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   166
   a range of characters can be indicated by giving two characters and separating
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   167
   them by a ``'-'``.  Special characters are not active inside sets.  For example,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   168
   ``[akm$]`` will match any of the characters ``'a'``, ``'k'``,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   169
   ``'m'``, or ``'$'``; ``[a-z]`` will match any lowercase letter, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   170
   ``[a-zA-Z0-9]`` matches any letter or digit.  Character classes such
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   171
   as ``\w`` or ``\S`` (defined below) are also acceptable inside a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   172
   range, although the characters they match depends on whether :const:`LOCALE`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   173
   or  :const:`UNICODE` mode is in force.  If you want to include a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   174
   ``']'`` or a ``'-'`` inside a set, precede it with a backslash, or
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   175
   place it as the first character.  The pattern ``[]]`` will match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   176
   ``']'``, for example.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   177
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   178
   You can match the characters not within a range by :dfn:`complementing` the set.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   179
   This is indicated by including a ``'^'`` as the first character of the set;
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   180
   ``'^'`` elsewhere will simply match the ``'^'`` character.  For example,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   181
   ``[^5]`` will match any character except ``'5'``, and ``[^^]`` will match any
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   182
   character except ``'^'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   183
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   184
   Note that inside ``[]`` the special forms and special characters lose
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   185
   their meanings and only the syntaxes described here are valid. For
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   186
   example, ``+``, ``*``, ``(``, ``)``, and so on are treated as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   187
   literals inside ``[]``, and backreferences cannot be used inside
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   188
   ``[]``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   189
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   190
``'|'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   191
   ``A|B``, where A and B can be arbitrary REs, creates a regular expression that
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   192
   will match either A or B.  An arbitrary number of REs can be separated by the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   193
   ``'|'`` in this way.  This can be used inside groups (see below) as well.  As
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   194
   the target string is scanned, REs separated by ``'|'`` are tried from left to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   195
   right. When one pattern completely matches, that branch is accepted. This means
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   196
   that once ``A`` matches, ``B`` will not be tested further, even if it would
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   197
   produce a longer overall match.  In other words, the ``'|'`` operator is never
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   198
   greedy.  To match a literal ``'|'``, use ``\|``, or enclose it inside a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   199
   character class, as in ``[|]``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   200
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   201
``(...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   202
   Matches whatever regular expression is inside the parentheses, and indicates the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   203
   start and end of a group; the contents of a group can be retrieved after a match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   204
   has been performed, and can be matched later in the string with the ``\number``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   205
   special sequence, described below.  To match the literals ``'('`` or ``')'``,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   206
   use ``\(`` or ``\)``, or enclose them inside a character class: ``[(] [)]``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   207
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   208
``(?...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   209
   This is an extension notation (a ``'?'`` following a ``'('`` is not meaningful
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   210
   otherwise).  The first character after the ``'?'`` determines what the meaning
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   211
   and further syntax of the construct is. Extensions usually do not create a new
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   212
   group; ``(?P<name>...)`` is the only exception to this rule. Following are the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   213
   currently supported extensions.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   214
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   215
``(?iLmsux)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   216
   (One or more letters from the set ``'i'``, ``'L'``, ``'m'``, ``'s'``,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   217
   ``'u'``, ``'x'``.)  The group matches the empty string; the letters
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   218
   set the corresponding flags: :const:`re.I` (ignore case),
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   219
   :const:`re.L` (locale dependent), :const:`re.M` (multi-line),
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   220
   :const:`re.S` (dot matches all), :const:`re.U` (Unicode dependent),
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   221
   and :const:`re.X` (verbose), for the entire regular expression. (The
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   222
   flags are described in :ref:`contents-of-module-re`.) This
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   223
   is useful if you wish to include the flags as part of the regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   224
   expression, instead of passing a *flag* argument to the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   225
   :func:`compile` function.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   226
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   227
   Note that the ``(?x)`` flag changes how the expression is parsed. It should be
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   228
   used first in the expression string, or after one or more whitespace characters.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   229
   If there are non-whitespace characters before the flag, the results are
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   230
   undefined.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   231
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   232
``(?:...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   233
   A non-grouping version of regular parentheses. Matches whatever regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   234
   expression is inside the parentheses, but the substring matched by the group
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   235
   *cannot* be retrieved after performing a match or referenced later in the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   236
   pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   237
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   238
``(?P<name>...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   239
   Similar to regular parentheses, but the substring matched by the group is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   240
   accessible via the symbolic group name *name*.  Group names must be valid Python
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   241
   identifiers, and each group name must be defined only once within a regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   242
   expression.  A symbolic group is also a numbered group, just as if the group
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   243
   were not named.  So the group named 'id' in the example below can also be
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   244
   referenced as the numbered group 1.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   245
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   246
   For example, if the pattern is ``(?P<id>[a-zA-Z_]\w*)``, the group can be
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   247
   referenced by its name in arguments to methods of match objects, such as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   248
   ``m.group('id')`` or ``m.end('id')``, and also by name in pattern text (for
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   249
   example, ``(?P=id)``) and replacement text (such as ``\g<id>``).
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   250
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   251
``(?P=name)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   252
   Matches whatever text was matched by the earlier group named *name*.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   253
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   254
``(?#...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   255
   A comment; the contents of the parentheses are simply ignored.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   256
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   257
``(?=...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   258
   Matches if ``...`` matches next, but doesn't consume any of the string.  This is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   259
   called a lookahead assertion.  For example, ``Isaac (?=Asimov)`` will match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   260
   ``'Isaac '`` only if it's followed by ``'Asimov'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   261
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   262
``(?!...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   263
   Matches if ``...`` doesn't match next.  This is a negative lookahead assertion.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   264
   For example, ``Isaac (?!Asimov)`` will match ``'Isaac '`` only if it's *not*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   265
   followed by ``'Asimov'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   266
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   267
``(?<=...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   268
   Matches if the current position in the string is preceded by a match for ``...``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   269
   that ends at the current position.  This is called a :dfn:`positive lookbehind
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   270
   assertion`. ``(?<=abc)def`` will find a match in ``abcdef``, since the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   271
   lookbehind will back up 3 characters and check if the contained pattern matches.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   272
   The contained pattern must only match strings of some fixed length, meaning that
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   273
   ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not.  Note that
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   274
   patterns which start with positive lookbehind assertions will never match at the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   275
   beginning of the string being searched; you will most likely want to use the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   276
   :func:`search` function rather than the :func:`match` function:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   277
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   278
      >>> import re
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   279
      >>> m = re.search('(?<=abc)def', 'abcdef')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   280
      >>> m.group(0)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   281
      'def'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   282
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   283
   This example looks for a word following a hyphen:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   284
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   285
      >>> m = re.search('(?<=-)\w+', 'spam-egg')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   286
      >>> m.group(0)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   287
      'egg'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   288
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   289
``(?<!...)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   290
   Matches if the current position in the string is not preceded by a match for
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   291
   ``...``.  This is called a :dfn:`negative lookbehind assertion`.  Similar to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   292
   positive lookbehind assertions, the contained pattern must only match strings of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   293
   some fixed length.  Patterns which start with negative lookbehind assertions may
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   294
   match at the beginning of the string being searched.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   295
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   296
``(?(id/name)yes-pattern|no-pattern)``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   297
   Will try to match with ``yes-pattern`` if the group with given *id* or *name*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   298
   exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is optional and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   299
   can be omitted. For example,  ``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)`` is a poor email
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   300
   matching pattern, which will match with ``'<user@host.com>'`` as well as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   301
   ``'user@host.com'``, but not with ``'<user@host.com'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   302
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   303
   .. versionadded:: 2.4
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   304
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   305
The special sequences consist of ``'\'`` and a character from the list below.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   306
If the ordinary character is not on the list, then the resulting RE will match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   307
the second character.  For example, ``\$`` matches the character ``'$'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   308
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   309
``\number``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   310
   Matches the contents of the group of the same number.  Groups are numbered
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   311
   starting from 1.  For example, ``(.+) \1`` matches ``'the the'`` or ``'55 55'``,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   312
   but not ``'the end'`` (note the space after the group).  This special sequence
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   313
   can only be used to match one of the first 99 groups.  If the first digit of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   314
   *number* is 0, or *number* is 3 octal digits long, it will not be interpreted as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   315
   a group match, but as the character with octal value *number*. Inside the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   316
   ``'['`` and ``']'`` of a character class, all numeric escapes are treated as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   317
   characters.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   318
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   319
``\A``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   320
   Matches only at the start of the string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   321
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   322
``\b``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   323
   Matches the empty string, but only at the beginning or end of a word.  A word is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   324
   defined as a sequence of alphanumeric or underscore characters, so the end of a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   325
   word is indicated by whitespace or a non-alphanumeric, non-underscore character.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   326
   Note that  ``\b`` is defined as the boundary between ``\w`` and ``\ W``, so the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   327
   precise set of characters deemed to be alphanumeric depends on the values of the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   328
   ``UNICODE`` and ``LOCALE`` flags.  Inside a character range, ``\b`` represents
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   329
   the backspace character, for compatibility with Python's string literals.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   330
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   331
``\B``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   332
   Matches the empty string, but only when it is *not* at the beginning or end of a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   333
   word.  This is just the opposite of ``\b``, so is also subject to the settings
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   334
   of ``LOCALE`` and ``UNICODE``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   335
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   336
``\d``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   337
   When the :const:`UNICODE` flag is not specified, matches any decimal digit; this
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   338
   is equivalent to the set ``[0-9]``.  With :const:`UNICODE`, it will match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   339
   whatever is classified as a digit in the Unicode character properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   340
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   341
``\D``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   342
   When the :const:`UNICODE` flag is not specified, matches any non-digit
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   343
   character; this is equivalent to the set  ``[^0-9]``.  With :const:`UNICODE`, it
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   344
   will match  anything other than character marked as digits in the Unicode
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   345
   character  properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   346
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   347
``\s``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   348
   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   349
   any whitespace character; this is equivalent to the set ``[ \t\n\r\f\v]``. With
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   350
   :const:`LOCALE`, it will match this set plus whatever characters are defined as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   351
   space for the current locale. If :const:`UNICODE` is set, this will match the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   352
   characters ``[ \t\n\r\f\v]`` plus whatever is classified as space in the Unicode
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   353
   character properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   354
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   355
``\S``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   356
   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   357
   any non-whitespace character; this is equivalent to the set ``[^ \t\n\r\f\v]``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   358
   With :const:`LOCALE`, it will match any character not in this set, and not
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   359
   defined as space in the current locale. If :const:`UNICODE` is set, this will
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   360
   match anything other than ``[ \t\n\r\f\v]`` and characters marked as space in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   361
   the Unicode character properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   362
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   363
``\w``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   364
   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   365
   any alphanumeric character and the underscore; this is equivalent to the set
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   366
   ``[a-zA-Z0-9_]``.  With :const:`LOCALE`, it will match the set ``[0-9_]`` plus
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   367
   whatever characters are defined as alphanumeric for the current locale.  If
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   368
   :const:`UNICODE` is set, this will match the characters ``[0-9_]`` plus whatever
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   369
   is classified as alphanumeric in the Unicode character properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   370
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   371
``\W``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   372
   When the :const:`LOCALE` and :const:`UNICODE` flags are not specified, matches
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   373
   any non-alphanumeric character; this is equivalent to the set ``[^a-zA-Z0-9_]``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   374
   With :const:`LOCALE`, it will match any character not in the set ``[0-9_]``, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   375
   not defined as alphanumeric for the current locale. If :const:`UNICODE` is set,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   376
   this will match anything other than ``[0-9_]`` and characters marked as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   377
   alphanumeric in the Unicode character properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   378
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   379
``\Z``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   380
   Matches only at the end of the string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   381
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   382
Most of the standard escapes supported by Python string literals are also
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   383
accepted by the regular expression parser::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   384
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   385
   \a      \b      \f      \n
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   386
   \r      \t      \v      \x
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   387
   \\
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   388
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   389
Octal escapes are included in a limited form: If the first digit is a 0, or if
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   390
there are three octal digits, it is considered an octal escape. Otherwise, it is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   391
a group reference.  As for string literals, octal escapes are always at most
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   392
three digits in length.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   393
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   394
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   395
.. _matching-searching:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   396
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   397
Matching vs Searching
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   398
---------------------
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   399
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   400
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   401
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   402
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   403
Python offers two different primitive operations based on regular expressions:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   404
**match** checks for a match only at the beginning of the string, while
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   405
**search** checks for a match anywhere in the string (this is what Perl does
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   406
by default).
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   407
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   408
Note that match may differ from search even when using a regular expression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   409
beginning with ``'^'``: ``'^'`` matches only at the start of the string, or in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   410
:const:`MULTILINE` mode also immediately following a newline.  The "match"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   411
operation succeeds only if the pattern matches at the start of the string
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   412
regardless of mode, or at the starting position given by the optional *pos*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   413
argument regardless of whether a newline precedes it.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   414
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   415
   >>> re.match("c", "abcdef")  # No match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   416
   >>> re.search("c", "abcdef") # Match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   417
   <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   418
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   419
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   420
.. _contents-of-module-re:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   421
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   422
Module Contents
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   423
---------------
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   424
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   425
The module defines several functions, constants, and an exception. Some of the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   426
functions are simplified versions of the full featured methods for compiled
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   427
regular expressions.  Most non-trivial applications always use the compiled
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   428
form.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   429
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   430
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   431
.. function:: compile(pattern[, flags])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   432
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   433
   Compile a regular expression pattern into a regular expression object, which
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   434
   can be used for matching using its :func:`match` and :func:`search` methods,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   435
   described below.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   436
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   437
   The expression's behaviour can be modified by specifying a *flags* value.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   438
   Values can be any of the following variables, combined using bitwise OR (the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   439
   ``|`` operator).
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   440
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   441
   The sequence ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   442
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   443
      prog = re.compile(pat)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   444
      result = prog.match(str)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   445
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   446
   is equivalent to ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   447
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   448
      result = re.match(pat, str)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   449
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   450
   but the version using :func:`compile` is more efficient when the expression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   451
   will be used several times in a single program.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   452
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   453
   .. (The compiled version of the last pattern passed to :func:`re.match` or
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   454
      :func:`re.search` is cached, so programs that use only a single regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   455
      expression at a time needn't worry about compiling regular expressions.)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   456
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   457
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   458
.. data:: I
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   459
          IGNORECASE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   460
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   461
   Perform case-insensitive matching; expressions like ``[A-Z]`` will match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   462
   lowercase letters, too.  This is not affected by the current locale.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   463
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   464
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   465
.. data:: L
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   466
          LOCALE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   467
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   468
   Make ``\w``, ``\W``, ``\b``, ``\B``, ``\s`` and ``\S`` dependent on the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   469
   current locale.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   470
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   471
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   472
.. data:: M
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   473
          MULTILINE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   474
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   475
   When specified, the pattern character ``'^'`` matches at the beginning of the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   476
   string and at the beginning of each line (immediately following each newline);
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   477
   and the pattern character ``'$'`` matches at the end of the string and at the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   478
   end of each line (immediately preceding each newline).  By default, ``'^'``
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   479
   matches only at the beginning of the string, and ``'$'`` only at the end of the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   480
   string and immediately before the newline (if any) at the end of the string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   481
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   482
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   483
.. data:: S
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   484
          DOTALL
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   485
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   486
   Make the ``'.'`` special character match any character at all, including a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   487
   newline; without this flag, ``'.'`` will match anything *except* a newline.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   488
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   489
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   490
.. data:: U
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   491
          UNICODE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   492
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   493
   Make ``\w``, ``\W``, ``\b``, ``\B``, ``\d``, ``\D``, ``\s`` and ``\S`` dependent
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   494
   on the Unicode character properties database.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   495
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   496
   .. versionadded:: 2.0
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   497
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   498
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   499
.. data:: X
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   500
          VERBOSE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   501
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   502
   This flag allows you to write regular expressions that look nicer. Whitespace
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   503
   within the pattern is ignored, except when in a character class or preceded by
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   504
   an unescaped backslash, and, when a line contains a ``'#'`` neither in a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   505
   character class or preceded by an unescaped backslash, all characters from the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   506
   leftmost such ``'#'`` through the end of the line are ignored.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   507
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   508
   That means that the two following regular expression objects that match a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   509
   decimal number are functionally equal::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   510
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   511
      a = re.compile(r"""\d +  # the integral part
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   512
                         \.    # the decimal point
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   513
                         \d *  # some fractional digits""", re.X)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   514
      b = re.compile(r"\d+\.\d*")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   515
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   516
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   517
.. function:: search(pattern, string[, flags])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   518
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   519
   Scan through *string* looking for a location where the regular expression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   520
   *pattern* produces a match, and return a corresponding :class:`MatchObject`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   521
   instance. Return ``None`` if no position in the string matches the pattern; note
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   522
   that this is different from finding a zero-length match at some point in the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   523
   string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   524
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   525
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   526
.. function:: match(pattern, string[, flags])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   527
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   528
   If zero or more characters at the beginning of *string* match the regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   529
   expression *pattern*, return a corresponding :class:`MatchObject` instance.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   530
   Return ``None`` if the string does not match the pattern; note that this is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   531
   different from a zero-length match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   532
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   533
   .. note::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   534
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   535
      If you want to locate a match anywhere in *string*, use :meth:`search`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   536
      instead.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   537
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   538
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   539
.. function:: split(pattern, string[, maxsplit=0])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   540
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   541
   Split *string* by the occurrences of *pattern*.  If capturing parentheses are
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   542
   used in *pattern*, then the text of all groups in the pattern are also returned
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   543
   as part of the resulting list. If *maxsplit* is nonzero, at most *maxsplit*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   544
   splits occur, and the remainder of the string is returned as the final element
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   545
   of the list.  (Incompatibility note: in the original Python 1.5 release,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   546
   *maxsplit* was ignored.  This has been fixed in later releases.)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   547
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   548
      >>> re.split('\W+', 'Words, words, words.')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   549
      ['Words', 'words', 'words', '']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   550
      >>> re.split('(\W+)', 'Words, words, words.')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   551
      ['Words', ', ', 'words', ', ', 'words', '.', '']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   552
      >>> re.split('\W+', 'Words, words, words.', 1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   553
      ['Words', 'words, words.']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   554
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   555
   If there are capturing groups in the separator and it matches at the start of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   556
   the string, the result will start with an empty string.  The same holds for
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   557
   the end of the string:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   558
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   559
      >>> re.split('(\W+)', '...words, words...')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   560
      ['', '...', 'words', ', ', 'words', '...', '']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   561
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   562
   That way, separator components are always found at the same relative
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   563
   indices within the result list (e.g., if there's one capturing group
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   564
   in the separator, the 0th, the 2nd and so forth).
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   565
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   566
   Note that *split* will never split a string on an empty pattern match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   567
   For example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   568
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   569
      >>> re.split('x*', 'foo')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   570
      ['foo']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   571
      >>> re.split("(?m)^$", "foo\n\nbar\n")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   572
      ['foo\n\nbar\n']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   573
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   574
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   575
.. function:: findall(pattern, string[, flags])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   576
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   577
   Return all non-overlapping matches of *pattern* in *string*, as a list of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   578
   strings.  The *string* is scanned left-to-right, and matches are returned in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   579
   the order found.  If one or more groups are present in the pattern, return a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   580
   list of groups; this will be a list of tuples if the pattern has more than
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   581
   one group.  Empty matches are included in the result unless they touch the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   582
   beginning of another match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   583
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   584
   .. versionadded:: 1.5.2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   585
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   586
   .. versionchanged:: 2.4
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   587
      Added the optional flags argument.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   588
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   589
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   590
.. function:: finditer(pattern, string[, flags])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   591
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   592
   Return an :term:`iterator` yielding :class:`MatchObject` instances over all
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   593
   non-overlapping matches for the RE *pattern* in *string*.  The *string* is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   594
   scanned left-to-right, and matches are returned in the order found.  Empty
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   595
   matches are included in the result unless they touch the beginning of another
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   596
   match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   597
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   598
   .. versionadded:: 2.2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   599
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   600
   .. versionchanged:: 2.4
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   601
      Added the optional flags argument.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   602
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   603
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   604
.. function:: sub(pattern, repl, string[, count])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   605
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   606
   Return the string obtained by replacing the leftmost non-overlapping occurrences
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   607
   of *pattern* in *string* by the replacement *repl*.  If the pattern isn't found,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   608
   *string* is returned unchanged.  *repl* can be a string or a function; if it is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   609
   a string, any backslash escapes in it are processed.  That is, ``\n`` is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   610
   converted to a single newline character, ``\r`` is converted to a linefeed, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   611
   so forth.  Unknown escapes such as ``\j`` are left alone.  Backreferences, such
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   612
   as ``\6``, are replaced with the substring matched by group 6 in the pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   613
   For example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   614
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   615
      >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   616
      ...        r'static PyObject*\npy_\1(void)\n{',
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   617
      ...        'def myfunc():')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   618
      'static PyObject*\npy_myfunc(void)\n{'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   619
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   620
   If *repl* is a function, it is called for every non-overlapping occurrence of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   621
   *pattern*.  The function takes a single match object argument, and returns the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   622
   replacement string.  For example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   623
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   624
      >>> def dashrepl(matchobj):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   625
      ...     if matchobj.group(0) == '-': return ' '
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   626
      ...     else: return '-'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   627
      >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   628
      'pro--gram files'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   629
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   630
   The pattern may be a string or an RE object; if you need to specify regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   631
   expression flags, you must use a RE object, or use embedded modifiers in a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   632
   pattern; for example, ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   633
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   634
   The optional argument *count* is the maximum number of pattern occurrences to be
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   635
   replaced; *count* must be a non-negative integer.  If omitted or zero, all
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   636
   occurrences will be replaced. Empty matches for the pattern are replaced only
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   637
   when not adjacent to a previous match, so ``sub('x*', '-', 'abc')`` returns
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   638
   ``'-a-b-c-'``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   639
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   640
   In addition to character escapes and backreferences as described above,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   641
   ``\g<name>`` will use the substring matched by the group named ``name``, as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   642
   defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   643
   group number; ``\g<2>`` is therefore equivalent to ``\2``, but isn't ambiguous
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   644
   in a replacement such as ``\g<2>0``.  ``\20`` would be interpreted as a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   645
   reference to group 20, not a reference to group 2 followed by the literal
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   646
   character ``'0'``.  The backreference ``\g<0>`` substitutes in the entire
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   647
   substring matched by the RE.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   648
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   649
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   650
.. function:: subn(pattern, repl, string[, count])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   651
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   652
   Perform the same operation as :func:`sub`, but return a tuple ``(new_string,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   653
   number_of_subs_made)``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   654
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   655
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   656
.. function:: escape(string)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   657
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   658
   Return *string* with all non-alphanumerics backslashed; this is useful if you
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   659
   want to match an arbitrary literal string that may have regular expression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   660
   metacharacters in it.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   661
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   662
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   663
.. exception:: error
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   664
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   665
   Exception raised when a string passed to one of the functions here is not a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   666
   valid regular expression (for example, it might contain unmatched parentheses)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   667
   or when some other error occurs during compilation or matching.  It is never an
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   668
   error if a string contains no match for a pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   669
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   670
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   671
.. _re-objects:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   672
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   673
Regular Expression Objects
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   674
--------------------------
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   675
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   676
Compiled regular expression objects support the following methods and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   677
attributes:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   678
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   679
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   680
.. method:: RegexObject.match(string[, pos[, endpos]])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   681
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   682
   If zero or more characters at the beginning of *string* match this regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   683
   expression, return a corresponding :class:`MatchObject` instance.  Return
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   684
   ``None`` if the string does not match the pattern; note that this is different
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   685
   from a zero-length match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   686
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   687
   .. note::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   688
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   689
      If you want to locate a match anywhere in *string*, use :meth:`search`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   690
      instead.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   691
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   692
   The optional second parameter *pos* gives an index in the string where the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   693
   search is to start; it defaults to ``0``.  This is not completely equivalent to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   694
   slicing the string; the ``'^'`` pattern character matches at the real beginning
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   695
   of the string and at positions just after a newline, but not necessarily at the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   696
   index where the search is to start.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   697
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   698
   The optional parameter *endpos* limits how far the string will be searched; it
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   699
   will be as if the string is *endpos* characters long, so only the characters
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   700
   from *pos* to ``endpos - 1`` will be searched for a match.  If *endpos* is less
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   701
   than *pos*, no match will be found, otherwise, if *rx* is a compiled regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   702
   expression object, ``rx.match(string, 0, 50)`` is equivalent to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   703
   ``rx.match(string[:50], 0)``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   704
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   705
      >>> pattern = re.compile("o")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   706
      >>> pattern.match("dog")      # No match as "o" is not at the start of "dog."
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   707
      >>> pattern.match("dog", 1)   # Match as "o" is the 2nd character of "dog".
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   708
      <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   709
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   710
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   711
.. method:: RegexObject.search(string[, pos[, endpos]])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   712
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   713
   Scan through *string* looking for a location where this regular expression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   714
   produces a match, and return a corresponding :class:`MatchObject` instance.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   715
   Return ``None`` if no position in the string matches the pattern; note that this
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   716
   is different from finding a zero-length match at some point in the string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   717
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   718
   The optional *pos* and *endpos* parameters have the same meaning as for the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   719
   :meth:`match` method.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   720
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   721
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   722
.. method:: RegexObject.split(string[, maxsplit=0])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   723
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   724
   Identical to the :func:`split` function, using the compiled pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   725
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   726
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   727
.. method:: RegexObject.findall(string[, pos[, endpos]])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   728
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   729
   Identical to the :func:`findall` function, using the compiled pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   730
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   731
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   732
.. method:: RegexObject.finditer(string[, pos[, endpos]])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   733
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   734
   Identical to the :func:`finditer` function, using the compiled pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   735
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   736
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   737
.. method:: RegexObject.sub(repl, string[, count=0])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   738
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   739
   Identical to the :func:`sub` function, using the compiled pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   740
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   741
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   742
.. method:: RegexObject.subn(repl, string[, count=0])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   743
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   744
   Identical to the :func:`subn` function, using the compiled pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   745
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   746
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   747
.. attribute:: RegexObject.flags
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   748
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   749
   The flags argument used when the RE object was compiled, or ``0`` if no flags
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   750
   were provided.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   751
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   752
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   753
.. attribute:: RegexObject.groupindex
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   754
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   755
   A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   756
   numbers.  The dictionary is empty if no symbolic groups were used in the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   757
   pattern.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   758
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   759
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   760
.. attribute:: RegexObject.pattern
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   761
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   762
   The pattern string from which the RE object was compiled.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   763
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   764
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   765
.. _match-objects:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   766
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   767
Match Objects
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   768
-------------
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   769
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   770
Match objects always have a boolean value of :const:`True`, so that you can test
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   771
whether e.g. :func:`match` resulted in a match with a simple if statement.  They
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   772
support the following methods and attributes:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   773
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   774
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   775
.. method:: MatchObject.expand(template)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   776
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   777
   Return the string obtained by doing backslash substitution on the template
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   778
   string *template*, as done by the :meth:`sub` method. Escapes such as ``\n`` are
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   779
   converted to the appropriate characters, and numeric backreferences (``\1``,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   780
   ``\2``) and named backreferences (``\g<1>``, ``\g<name>``) are replaced by the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   781
   contents of the corresponding group.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   782
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   783
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   784
.. method:: MatchObject.group([group1, ...])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   785
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   786
   Returns one or more subgroups of the match.  If there is a single argument, the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   787
   result is a single string; if there are multiple arguments, the result is a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   788
   tuple with one item per argument. Without arguments, *group1* defaults to zero
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   789
   (the whole match is returned). If a *groupN* argument is zero, the corresponding
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   790
   return value is the entire matching string; if it is in the inclusive range
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   791
   [1..99], it is the string matching the corresponding parenthesized group.  If a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   792
   group number is negative or larger than the number of groups defined in the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   793
   pattern, an :exc:`IndexError` exception is raised. If a group is contained in a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   794
   part of the pattern that did not match, the corresponding result is ``None``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   795
   If a group is contained in a part of the pattern that matched multiple times,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   796
   the last match is returned.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   797
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   798
      >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   799
      >>> m.group(0)       # The entire match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   800
      'Isaac Newton'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   801
      >>> m.group(1)       # The first parenthesized subgroup.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   802
      'Isaac'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   803
      >>> m.group(2)       # The second parenthesized subgroup.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   804
      'Newton'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   805
      >>> m.group(1, 2)    # Multiple arguments give us a tuple.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   806
      ('Isaac', 'Newton')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   807
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   808
   If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   809
   arguments may also be strings identifying groups by their group name.  If a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   810
   string argument is not used as a group name in the pattern, an :exc:`IndexError`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   811
   exception is raised.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   812
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   813
   A moderately complicated example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   814
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   815
      >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcom Reynolds")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   816
      >>> m.group('first_name')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   817
      'Malcom'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   818
      >>> m.group('last_name')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   819
      'Reynolds'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   820
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   821
   Named groups can also be referred to by their index:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   822
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   823
      >>> m.group(1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   824
      'Malcom'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   825
      >>> m.group(2)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   826
      'Reynolds'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   827
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   828
   If a group matches multiple times, only the last match is accessible:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   829
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   830
      >>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   831
      >>> m.group(1)                        # Returns only the last match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   832
      'c3'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   833
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   834
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   835
.. method:: MatchObject.groups([default])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   836
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   837
   Return a tuple containing all the subgroups of the match, from 1 up to however
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   838
   many groups are in the pattern.  The *default* argument is used for groups that
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   839
   did not participate in the match; it defaults to ``None``.  (Incompatibility
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   840
   note: in the original Python 1.5 release, if the tuple was one element long, a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   841
   string would be returned instead.  In later versions (from 1.5.1 on), a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   842
   singleton tuple is returned in such cases.)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   843
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   844
   For example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   845
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   846
      >>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   847
      >>> m.groups()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   848
      ('24', '1632')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   849
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   850
   If we make the decimal place and everything after it optional, not all groups
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   851
   might participate in the match.  These groups will default to ``None`` unless
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   852
   the *default* argument is given:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   853
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   854
      >>> m = re.match(r"(\d+)\.?(\d+)?", "24")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   855
      >>> m.groups()      # Second group defaults to None.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   856
      ('24', None)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   857
      >>> m.groups('0')   # Now, the second group defaults to '0'.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   858
      ('24', '0')
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   859
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   860
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   861
.. method:: MatchObject.groupdict([default])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   862
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   863
   Return a dictionary containing all the *named* subgroups of the match, keyed by
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   864
   the subgroup name.  The *default* argument is used for groups that did not
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   865
   participate in the match; it defaults to ``None``.  For example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   866
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   867
      >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcom Reynolds")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   868
      >>> m.groupdict()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   869
      {'first_name': 'Malcom', 'last_name': 'Reynolds'}
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   870
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   871
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   872
.. method:: MatchObject.start([group])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   873
            MatchObject.end([group])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   874
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   875
   Return the indices of the start and end of the substring matched by *group*;
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   876
   *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   877
   *group* exists but did not contribute to the match.  For a match object *m*, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   878
   a group *g* that did contribute to the match, the substring matched by group *g*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   879
   (equivalent to ``m.group(g)``) is ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   880
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   881
      m.string[m.start(g):m.end(g)]
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   882
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   883
   Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   884
   null string.  For example, after ``m = re.search('b(c?)', 'cba')``,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   885
   ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   886
   2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   887
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   888
   An example that will remove *remove_this* from email addresses:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   889
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   890
      >>> email = "tony@tiremove_thisger.net"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   891
      >>> m = re.search("remove_this", email)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   892
      >>> email[:m.start()] + email[m.end():]
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   893
      'tony@tiger.net'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   894
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   895
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   896
.. method:: MatchObject.span([group])
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   897
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   898
   For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group),
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   899
   m.end(group))``. Note that if *group* did not contribute to the match, this is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   900
   ``(-1, -1)``.  *group* defaults to zero, the entire match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   901
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   902
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   903
.. attribute:: MatchObject.pos
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   904
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   905
   The value of *pos* which was passed to the :func:`search` or :func:`match`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   906
   method of the :class:`RegexObject`.  This is the index into the string at which
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   907
   the RE engine started looking for a match.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   908
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   909
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   910
.. attribute:: MatchObject.endpos
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   911
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   912
   The value of *endpos* which was passed to the :func:`search` or :func:`match`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   913
   method of the :class:`RegexObject`.  This is the index into the string beyond
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   914
   which the RE engine will not go.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   915
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   916
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   917
.. attribute:: MatchObject.lastindex
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   918
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   919
   The integer index of the last matched capturing group, or ``None`` if no group
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   920
   was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   921
   ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   922
   the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   923
   string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   924
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   925
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   926
.. attribute:: MatchObject.lastgroup
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   927
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   928
   The name of the last matched capturing group, or ``None`` if the group didn't
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   929
   have a name, or if no group was matched at all.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   930
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   931
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   932
.. attribute:: MatchObject.re
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   933
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   934
   The regular expression object whose :meth:`match` or :meth:`search` method
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   935
   produced this :class:`MatchObject` instance.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   936
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   937
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   938
.. attribute:: MatchObject.string
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   939
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   940
   The string passed to :func:`match` or :func:`search`.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   941
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   942
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   943
Examples
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   944
--------
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   945
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   946
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   947
Checking For a Pair
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   948
^^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   949
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   950
In this example, we'll use the following helper function to display match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   951
objects a little more gracefully:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   952
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   953
.. testcode::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   954
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   955
   def displaymatch(match):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   956
       if match is None:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   957
           return None
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   958
       return '<Match: %r, groups=%r>' % (match.group(), match.groups())
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   959
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   960
Suppose you are writing a poker program where a player's hand is represented as
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   961
a 5-character string with each character representing a card, "a" for ace, "k"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   962
for king, "q" for queen, j for jack, "0" for 10, and "1" through "9"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   963
representing the card with that value.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   964
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   965
To see if a given string is a valid hand, one could do the following:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   966
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   967
   >>> valid = re.compile(r"[0-9akqj]{5}$")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   968
   >>> displaymatch(valid.match("ak05q"))  # Valid.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   969
   "<Match: 'ak05q', groups=()>"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   970
   >>> displaymatch(valid.match("ak05e"))  # Invalid.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   971
   >>> displaymatch(valid.match("ak0"))    # Invalid.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   972
   >>> displaymatch(valid.match("727ak"))  # Valid.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   973
   "<Match: '727ak', groups=()>"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   974
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   975
That last hand, ``"727ak"``, contained a pair, or two of the same valued cards.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   976
To match this with a regular expression, one could use backreferences as such:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   977
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   978
   >>> pair = re.compile(r".*(.).*\1")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   979
   >>> displaymatch(pair.match("717ak"))     # Pair of 7s.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   980
   "<Match: '717', groups=('7',)>"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   981
   >>> displaymatch(pair.match("718ak"))     # No pairs.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   982
   >>> displaymatch(pair.match("354aa"))     # Pair of aces.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   983
   "<Match: '354aa', groups=('a',)>"
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   984
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   985
To find out what card the pair consists of, one could use the :func:`group`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   986
method of :class:`MatchObject` in the following manner:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   987
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   988
.. doctest::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   989
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   990
   >>> pair.match("717ak").group(1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   991
   '7'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   992
   
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   993
   # Error because re.match() returns None, which doesn't have a group() method:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   994
   >>> pair.match("718ak").group(1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   995
   Traceback (most recent call last):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   996
     File "<pyshell#23>", line 1, in <module>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   997
       re.match(r".*(.).*\1", "718ak").group(1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   998
   AttributeError: 'NoneType' object has no attribute 'group'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   999
   
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1000
   >>> pair.match("354aa").group(1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1001
   'a'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1002
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1003
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1004
Simulating scanf()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1005
^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1006
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1007
.. index:: single: scanf()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1008
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1009
Python does not currently have an equivalent to :cfunc:`scanf`.  Regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1010
expressions are generally more powerful, though also more verbose, than
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1011
:cfunc:`scanf` format strings.  The table below offers some more-or-less
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1012
equivalent mappings between :cfunc:`scanf` format tokens and regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1013
expressions.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1014
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1015
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1016
| :cfunc:`scanf` Token           | Regular Expression                          |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1017
+================================+=============================================+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1018
| ``%c``                         | ``.``                                       |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1019
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1020
| ``%5c``                        | ``.{5}``                                    |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1021
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1022
| ``%d``                         | ``[-+]?\d+``                                |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1023
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1024
| ``%e``, ``%E``, ``%f``, ``%g`` | ``[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?`` |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1025
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1026
| ``%i``                         | ``[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)``     |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1027
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1028
| ``%o``                         | ``0[0-7]*``                                 |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1029
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1030
| ``%s``                         | ``\S+``                                     |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1031
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1032
| ``%u``                         | ``\d+``                                     |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1033
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1034
| ``%x``, ``%X``                 | ``0[xX][\dA-Fa-f]+``                        |
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1035
+--------------------------------+---------------------------------------------+
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1036
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1037
To extract the filename and numbers from a string like ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1038
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1039
   /usr/sbin/sendmail - 0 errors, 4 warnings
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1040
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1041
you would use a :cfunc:`scanf` format like ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1042
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1043
   %s - %d errors, %d warnings
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1044
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1045
The equivalent regular expression would be ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1046
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1047
   (\S+) - (\d+) errors, (\d+) warnings
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1048
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1049
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1050
Avoiding recursion
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1051
^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1052
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1053
If you create regular expressions that require the engine to perform a lot of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1054
recursion, you may encounter a :exc:`RuntimeError` exception with the message
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1055
``maximum recursion limit`` exceeded. For example, ::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1056
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1057
   >>> s = 'Begin ' + 1000*'a very long string ' + 'end'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1058
   >>> re.match('Begin (\w| )*? end', s).end()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1059
   Traceback (most recent call last):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1060
     File "<stdin>", line 1, in ?
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1061
     File "/usr/local/lib/python2.5/re.py", line 132, in match
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1062
       return _compile(pattern, flags).match(string)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1063
   RuntimeError: maximum recursion limit exceeded
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1064
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1065
You can often restructure your regular expression to avoid recursion.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1066
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1067
Starting with Python 2.3, simple uses of the ``*?`` pattern are special-cased to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1068
avoid recursion.  Thus, the above regular expression can avoid recursion by
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1069
being recast as ``Begin [a-zA-Z0-9_ ]*?end``.  As a further benefit, such
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1070
regular expressions will run faster than their recursive equivalents.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1071
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1072
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1073
search() vs. match()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1074
^^^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1075
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1076
In a nutshell, :func:`match` only attempts to match a pattern at the beginning
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1077
of a string where :func:`search` will match a pattern anywhere in a string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1078
For example:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1079
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1080
   >>> re.match("o", "dog")  # No match as "o" is not the first letter of "dog".
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1081
   >>> re.search("o", "dog") # Match as search() looks everywhere in the string.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1082
   <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1083
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1084
.. note::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1085
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1086
   The following applies only to regular expression objects like those created
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1087
   with ``re.compile("pattern")``, not the primitives ``re.match(pattern,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1088
   string)`` or ``re.search(pattern, string)``.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1089
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1090
:func:`match` has an optional second parameter that gives an index in the string
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1091
where the search is to start:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1092
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1093
   >>> pattern = re.compile("o")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1094
   >>> pattern.match("dog")      # No match as "o" is not at the start of "dog."
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1095
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1096
   # Equivalent to the above expression as 0 is the default starting index:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1097
   >>> pattern.match("dog", 0)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1098
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1099
   # Match as "o" is the 2nd character of "dog" (index 0 is the first):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1100
   >>> pattern.match("dog", 1)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1101
   <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1102
   >>> pattern.match("dog", 2)   # No match as "o" is not the 3rd character of "dog."
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1103
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1104
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1105
Making a Phonebook
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1106
^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1107
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1108
:func:`split` splits a string into a list delimited by the passed pattern.  The 
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1109
method is invaluable for converting textual data into data structures that can be
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1110
easily read and modified by Python as demonstrated in the following example that
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1111
creates a phonebook.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1112
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1113
First, here is the input.  Normally it may come from a file, here we are using
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1114
triple-quoted string syntax:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1115
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1116
   >>> input = """Ross McFluff: 834.345.1254 155 Elm Street
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1117
   ... 
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1118
   ... Ronald Heathmore: 892.345.3428 436 Finley Avenue
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1119
   ... Frank Burger: 925.541.7625 662 South Dogwood Way
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1120
   ...
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1121
   ...
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1122
   ... Heather Albrecht: 548.326.4584 919 Park Place"""
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1123
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1124
The entries are separated by one or more newlines. Now we convert the string
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1125
into a list with each nonempty line having its own entry:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1126
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1127
.. doctest::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1128
   :options: +NORMALIZE_WHITESPACE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1129
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1130
   >>> entries = re.split("\n+", input)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1131
   >>> entries
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1132
   ['Ross McFluff: 834.345.1254 155 Elm Street',
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1133
   'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1134
   'Frank Burger: 925.541.7625 662 South Dogwood Way',
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1135
   'Heather Albrecht: 548.326.4584 919 Park Place']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1136
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1137
Finally, split each entry into a list with first name, last name, telephone
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1138
number, and address.  We use the ``maxsplit`` parameter of :func:`split`
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1139
because the address has spaces, our splitting pattern, in it:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1140
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1141
.. doctest::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1142
   :options: +NORMALIZE_WHITESPACE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1143
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1144
   >>> [re.split(":? ", entry, 3) for entry in entries]
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1145
   [['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1146
   ['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1147
   ['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1148
   ['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1149
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1150
The ``:?`` pattern matches the colon after the last name, so that it does not
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1151
occur in the result list.  With a ``maxsplit`` of ``4``, we could separate the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1152
house number from the street name:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1153
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1154
.. doctest::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1155
   :options: +NORMALIZE_WHITESPACE
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1156
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1157
   >>> [re.split(":? ", entry, 4) for entry in entries]
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1158
   [['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1159
   ['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1160
   ['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1161
   ['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1162
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1163
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1164
Text Munging
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1165
^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1166
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1167
:func:`sub` replaces every occurrence of a pattern with a string or the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1168
result of a function.  This example demonstrates using :func:`sub` with
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1169
a function to "munge" text, or randomize the order of all the characters
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1170
in each word of a sentence except for the first and last characters::
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1171
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1172
   >>> def repl(m):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1173
   ...   inner_word = list(m.group(2))
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1174
   ...   random.shuffle(inner_word)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1175
   ...   return m.group(1) + "".join(inner_word) + m.group(3)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1176
   >>> text = "Professor Abdolmalek, please report your absences promptly."
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1177
   >>> re.sub("(\w)(\w+)(\w)", repl, text)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1178
   'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1179
   >>> re.sub("(\w)(\w+)(\w)", repl, text)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1180
   'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1181
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1182
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1183
Finding all Adverbs
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1184
^^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1185
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1186
:func:`findall` matches *all* occurrences of a pattern, not just the first
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1187
one as :func:`search` does.  For example, if one was a writer and wanted to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1188
find all of the adverbs in some text, he or she might use :func:`findall` in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1189
the following manner:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1190
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1191
   >>> text = "He was carefully disguised but captured quickly by police."
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1192
   >>> re.findall(r"\w+ly", text)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1193
   ['carefully', 'quickly']
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1194
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1195
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1196
Finding all Adverbs and their Positions
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1197
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1198
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1199
If one wants more information about all matches of a pattern than the matched
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1200
text, :func:`finditer` is useful as it provides instances of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1201
:class:`MatchObject` instead of strings.  Continuing with the previous example,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1202
if one was a writer who wanted to find all of the adverbs *and their positions*
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1203
in some text, he or she would use :func:`finditer` in the following manner:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1204
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1205
   >>> text = "He was carefully disguised but captured quickly by police."
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1206
   >>> for m in re.finditer(r"\w+ly", text):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1207
   ...     print '%02d-%02d: %s' % (m.start(), m.end(), m.group(0))
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1208
   07-16: carefully
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1209
   40-47: quickly
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1210
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1211
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1212
Raw String Notation
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1213
^^^^^^^^^^^^^^^^^^^
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1214
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1215
Raw string notation (``r"text"``) keeps regular expressions sane.  Without it,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1216
every backslash (``'\'``) in a regular expression would have to be prefixed with
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1217
another one to escape it.  For example, the two following lines of code are
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1218
functionally identical:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1219
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1220
   >>> re.match(r"\W(.)\1\W", " ff ")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1221
   <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1222
   >>> re.match("\\W(.)\\1\\W", " ff ")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1223
   <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1224
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1225
When one wants to match a literal backslash, it must be escaped in the regular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1226
expression.  With raw string notation, this means ``r"\\"``.  Without raw string
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1227
notation, one must use ``"\\\\"``, making the following lines of code
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1228
functionally identical:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1229
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1230
   >>> re.match(r"\\", r"\\")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1231
   <_sre.SRE_Match object at ...>
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1232
   >>> re.match("\\\\", r"\\")
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
  1233
   <_sre.SRE_Match object at ...>