symbian-qemu-0.9.1-12/python-2.6.1/Doc/library/difflib.rst
changeset 1 2fb8b9db1c86
equal deleted inserted replaced
0:ffa851df0825 1:2fb8b9db1c86
       
     1 :mod:`difflib` --- Helpers for computing deltas
       
     2 ===============================================
       
     3 
       
     4 .. module:: difflib
       
     5    :synopsis: Helpers for computing differences between objects.
       
     6 .. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
       
     7 .. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
       
     8 .. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
       
     9 
       
    10 .. testsetup::
       
    11 
       
    12    import sys
       
    13    from difflib import *
       
    14 
       
    15 .. versionadded:: 2.1
       
    16 
       
    17 This module provides classes and functions for comparing sequences. It
       
    18 can be used for example, for comparing files, and can produce difference
       
    19 information in various formats, including HTML and context and unified
       
    20 diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
       
    21 
       
    22 .. class:: SequenceMatcher
       
    23 
       
    24    This is a flexible class for comparing pairs of sequences of any type, so long
       
    25    as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
       
    26    little fancier than, an algorithm published in the late 1980's by Ratcliff and
       
    27    Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
       
    28    find the longest contiguous matching subsequence that contains no "junk"
       
    29    elements (the Ratcliff and Obershelp algorithm doesn't address junk).  The same
       
    30    idea is then applied recursively to the pieces of the sequences to the left and
       
    31    to the right of the matching subsequence.  This does not yield minimal edit
       
    32    sequences, but does tend to yield matches that "look right" to people.
       
    33 
       
    34    **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
       
    35    case and quadratic time in the expected case. :class:`SequenceMatcher` is
       
    36    quadratic time for the worst case and has expected-case behavior dependent in a
       
    37    complicated way on how many elements the sequences have in common; best case
       
    38    time is linear.
       
    39 
       
    40 
       
    41 .. class:: Differ
       
    42 
       
    43    This is a class for comparing sequences of lines of text, and producing
       
    44    human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
       
    45    both to compare sequences of lines, and to compare sequences of characters
       
    46    within similar (near-matching) lines.
       
    47 
       
    48    Each line of a :class:`Differ` delta begins with a two-letter code:
       
    49 
       
    50    +----------+-------------------------------------------+
       
    51    | Code     | Meaning                                   |
       
    52    +==========+===========================================+
       
    53    | ``'- '`` | line unique to sequence 1                 |
       
    54    +----------+-------------------------------------------+
       
    55    | ``'+ '`` | line unique to sequence 2                 |
       
    56    +----------+-------------------------------------------+
       
    57    | ``'  '`` | line common to both sequences             |
       
    58    +----------+-------------------------------------------+
       
    59    | ``'? '`` | line not present in either input sequence |
       
    60    +----------+-------------------------------------------+
       
    61 
       
    62    Lines beginning with '``?``' attempt to guide the eye to intraline differences,
       
    63    and were not present in either input sequence. These lines can be confusing if
       
    64    the sequences contain tab characters.
       
    65 
       
    66 
       
    67 .. class:: HtmlDiff
       
    68 
       
    69    This class can be used to create an HTML table (or a complete HTML file
       
    70    containing the table) showing a side by side, line by line comparison of text
       
    71    with inter-line and intra-line change highlights.  The table can be generated in
       
    72    either full or contextual difference mode.
       
    73 
       
    74    The constructor for this class is:
       
    75 
       
    76 
       
    77    .. function:: __init__([tabsize][, wrapcolumn][, linejunk][, charjunk])
       
    78 
       
    79       Initializes instance of :class:`HtmlDiff`.
       
    80 
       
    81       *tabsize* is an optional keyword argument to specify tab stop spacing and
       
    82       defaults to ``8``.
       
    83 
       
    84       *wrapcolumn* is an optional keyword to specify column number where lines are
       
    85       broken and wrapped, defaults to ``None`` where lines are not wrapped.
       
    86 
       
    87       *linejunk* and *charjunk* are optional keyword arguments passed into ``ndiff()``
       
    88       (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
       
    89       ``ndiff()`` documentation for argument default values and descriptions.
       
    90 
       
    91    The following methods are public:
       
    92 
       
    93 
       
    94    .. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
       
    95 
       
    96       Compares *fromlines* and *tolines* (lists of strings) and returns a string which
       
    97       is a complete HTML file containing a table showing line by line differences with
       
    98       inter-line and intra-line changes highlighted.
       
    99 
       
   100       *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
       
   101       column header strings (both default to an empty string).
       
   102 
       
   103       *context* and *numlines* are both optional keyword arguments. Set *context* to
       
   104       ``True`` when contextual differences are to be shown, else the default is
       
   105       ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
       
   106       is ``True`` *numlines* controls the number of context lines which surround the
       
   107       difference highlights.  When *context* is ``False`` *numlines* controls the
       
   108       number of lines which are shown before a difference highlight when using the
       
   109       "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
       
   110       the next difference highlight at the top of the browser without any leading
       
   111       context).
       
   112 
       
   113 
       
   114    .. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
       
   115 
       
   116       Compares *fromlines* and *tolines* (lists of strings) and returns a string which
       
   117       is a complete HTML table showing line by line differences with inter-line and
       
   118       intra-line changes highlighted.
       
   119 
       
   120       The arguments for this method are the same as those for the :meth:`make_file`
       
   121       method.
       
   122 
       
   123    :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
       
   124    contains a good example of its use.
       
   125 
       
   126    .. versionadded:: 2.4
       
   127 
       
   128 
       
   129 .. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
       
   130 
       
   131    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
       
   132    generating the delta lines) in context diff format.
       
   133 
       
   134    Context diffs are a compact way of showing just the lines that have changed plus
       
   135    a few lines of context.  The changes are shown in a before/after style.  The
       
   136    number of context lines is set by *n* which defaults to three.
       
   137 
       
   138    By default, the diff control lines (those with ``***`` or ``---``) are created
       
   139    with a trailing newline.  This is helpful so that inputs created from
       
   140    :func:`file.readlines` result in diffs that are suitable for use with
       
   141    :func:`file.writelines` since both the inputs and outputs have trailing
       
   142    newlines.
       
   143 
       
   144    For inputs that do not have trailing newlines, set the *lineterm* argument to
       
   145    ``""`` so that the output will be uniformly newline free.
       
   146 
       
   147    The context diff format normally has a header for filenames and modification
       
   148    times.  Any or all of these may be specified using strings for *fromfile*,
       
   149    *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
       
   150    expressed in the format returned by :func:`time.ctime`.  If not specified, the
       
   151    strings default to blanks.
       
   152 
       
   153       >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
       
   154       >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
       
   155       >>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
       
   156       ...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
       
   157       *** before.py
       
   158       --- after.py
       
   159       ***************
       
   160       *** 1,4 ****
       
   161       ! bacon
       
   162       ! eggs
       
   163       ! ham
       
   164         guido
       
   165       --- 1,4 ----
       
   166       ! python
       
   167       ! eggy
       
   168       ! hamster
       
   169         guido
       
   170 
       
   171    See :ref:`difflib-interface` for a more detailed example.
       
   172 
       
   173    .. versionadded:: 2.3
       
   174 
       
   175 
       
   176 .. function:: get_close_matches(word, possibilities[, n][, cutoff])
       
   177 
       
   178    Return a list of the best "good enough" matches.  *word* is a sequence for which
       
   179    close matches are desired (typically a string), and *possibilities* is a list of
       
   180    sequences against which to match *word* (typically a list of strings).
       
   181 
       
   182    Optional argument *n* (default ``3``) is the maximum number of close matches to
       
   183    return; *n* must be greater than ``0``.
       
   184 
       
   185    Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
       
   186    Possibilities that don't score at least that similar to *word* are ignored.
       
   187 
       
   188    The best (no more than *n*) matches among the possibilities are returned in a
       
   189    list, sorted by similarity score, most similar first.
       
   190 
       
   191       >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
       
   192       ['apple', 'ape']
       
   193       >>> import keyword
       
   194       >>> get_close_matches('wheel', keyword.kwlist)
       
   195       ['while']
       
   196       >>> get_close_matches('apple', keyword.kwlist)
       
   197       []
       
   198       >>> get_close_matches('accept', keyword.kwlist)
       
   199       ['except']
       
   200 
       
   201 
       
   202 .. function:: ndiff(a, b[, linejunk][, charjunk])
       
   203 
       
   204    Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
       
   205    delta (a :term:`generator` generating the delta lines).
       
   206 
       
   207    Optional keyword parameters *linejunk* and *charjunk* are for filter functions
       
   208    (or ``None``):
       
   209 
       
   210    *linejunk*: A function that accepts a single string argument, and returns true
       
   211    if the string is junk, or false if not. The default is (``None``), starting with
       
   212    Python 2.3.  Before then, the default was the module-level function
       
   213    :func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
       
   214    for at most one pound character (``'#'``). As of Python 2.3, the underlying
       
   215    :class:`SequenceMatcher` class does a dynamic analysis of which lines are so
       
   216    frequent as to constitute noise, and this usually works better than the pre-2.3
       
   217    default.
       
   218 
       
   219    *charjunk*: A function that accepts a character (a string of length 1), and
       
   220    returns if the character is junk, or false if not. The default is module-level
       
   221    function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
       
   222    blank or tab; note: bad idea to include newline in this!).
       
   223 
       
   224    :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
       
   225 
       
   226       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
       
   227       ...              'ore\ntree\nemu\n'.splitlines(1))
       
   228       >>> print ''.join(diff),
       
   229       - one
       
   230       ?  ^
       
   231       + ore
       
   232       ?  ^
       
   233       - two
       
   234       - three
       
   235       ?  -
       
   236       + tree
       
   237       + emu
       
   238 
       
   239 
       
   240 .. function:: restore(sequence, which)
       
   241 
       
   242    Return one of the two sequences that generated a delta.
       
   243 
       
   244    Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
       
   245    lines originating from file 1 or 2 (parameter *which*), stripping off line
       
   246    prefixes.
       
   247 
       
   248    Example:
       
   249 
       
   250       >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
       
   251       ...              'ore\ntree\nemu\n'.splitlines(1))
       
   252       >>> diff = list(diff) # materialize the generated delta into a list
       
   253       >>> print ''.join(restore(diff, 1)),
       
   254       one
       
   255       two
       
   256       three
       
   257       >>> print ''.join(restore(diff, 2)),
       
   258       ore
       
   259       tree
       
   260       emu
       
   261 
       
   262 
       
   263 .. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
       
   264 
       
   265    Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
       
   266    generating the delta lines) in unified diff format.
       
   267 
       
   268    Unified diffs are a compact way of showing just the lines that have changed plus
       
   269    a few lines of context.  The changes are shown in a inline style (instead of
       
   270    separate before/after blocks).  The number of context lines is set by *n* which
       
   271    defaults to three.
       
   272 
       
   273    By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
       
   274    created with a trailing newline.  This is helpful so that inputs created from
       
   275    :func:`file.readlines` result in diffs that are suitable for use with
       
   276    :func:`file.writelines` since both the inputs and outputs have trailing
       
   277    newlines.
       
   278 
       
   279    For inputs that do not have trailing newlines, set the *lineterm* argument to
       
   280    ``""`` so that the output will be uniformly newline free.
       
   281 
       
   282    The context diff format normally has a header for filenames and modification
       
   283    times.  Any or all of these may be specified using strings for *fromfile*,
       
   284    *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
       
   285    expressed in the format returned by :func:`time.ctime`.  If not specified, the
       
   286    strings default to blanks.
       
   287 
       
   288       >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
       
   289       >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
       
   290       >>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
       
   291       ...     sys.stdout.write(line)   # doctest: +NORMALIZE_WHITESPACE
       
   292       --- before.py
       
   293       +++ after.py
       
   294       @@ -1,4 +1,4 @@
       
   295       -bacon
       
   296       -eggs
       
   297       -ham
       
   298       +python
       
   299       +eggy
       
   300       +hamster
       
   301        guido
       
   302 
       
   303    See :ref:`difflib-interface` for a more detailed example.
       
   304 
       
   305    .. versionadded:: 2.3
       
   306 
       
   307 
       
   308 .. function:: IS_LINE_JUNK(line)
       
   309 
       
   310    Return true for ignorable lines.  The line *line* is ignorable if *line* is
       
   311    blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
       
   312    default for parameter *linejunk* in :func:`ndiff` before Python 2.3.
       
   313 
       
   314 
       
   315 .. function:: IS_CHARACTER_JUNK(ch)
       
   316 
       
   317    Return true for ignorable characters.  The character *ch* is ignorable if *ch*
       
   318    is a space or tab, otherwise it is not ignorable.  Used as a default for
       
   319    parameter *charjunk* in :func:`ndiff`.
       
   320 
       
   321 
       
   322 .. seealso::
       
   323 
       
   324    `Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
       
   325       Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
       
   326       was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.
       
   327 
       
   328 
       
   329 .. _sequence-matcher:
       
   330 
       
   331 SequenceMatcher Objects
       
   332 -----------------------
       
   333 
       
   334 The :class:`SequenceMatcher` class has this constructor:
       
   335 
       
   336 
       
   337 .. class:: SequenceMatcher([isjunk[, a[, b]]])
       
   338 
       
   339    Optional argument *isjunk* must be ``None`` (the default) or a one-argument
       
   340    function that takes a sequence element and returns true if and only if the
       
   341    element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
       
   342    equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
       
   343    For example, pass::
       
   344 
       
   345       lambda x: x in " \t"
       
   346 
       
   347    if you're comparing lines as sequences of characters, and don't want to synch up
       
   348    on blanks or hard tabs.
       
   349 
       
   350    The optional arguments *a* and *b* are sequences to be compared; both default to
       
   351    empty strings.  The elements of both sequences must be :term:`hashable`.
       
   352 
       
   353    :class:`SequenceMatcher` objects have the following methods:
       
   354 
       
   355 
       
   356    .. method:: set_seqs(a, b)
       
   357 
       
   358       Set the two sequences to be compared.
       
   359 
       
   360    :class:`SequenceMatcher` computes and caches detailed information about the
       
   361    second sequence, so if you want to compare one sequence against many
       
   362    sequences, use :meth:`set_seq2` to set the commonly used sequence once and
       
   363    call :meth:`set_seq1` repeatedly, once for each of the other sequences.
       
   364 
       
   365 
       
   366    .. method:: set_seq1(a)
       
   367 
       
   368       Set the first sequence to be compared.  The second sequence to be compared
       
   369       is not changed.
       
   370 
       
   371 
       
   372    .. method:: set_seq2(b)
       
   373 
       
   374       Set the second sequence to be compared.  The first sequence to be compared
       
   375       is not changed.
       
   376 
       
   377 
       
   378    .. method:: find_longest_match(alo, ahi, blo, bhi)
       
   379 
       
   380       Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
       
   381 
       
   382       If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
       
   383       ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
       
   384       <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
       
   385       k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
       
   386       <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
       
   387       all maximal matching blocks, return one that starts earliest in *a*, and
       
   388       of all those maximal matching blocks that start earliest in *a*, return
       
   389       the one that starts earliest in *b*.
       
   390 
       
   391          >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
       
   392          >>> s.find_longest_match(0, 5, 0, 9)
       
   393          Match(a=0, b=4, size=5)
       
   394 
       
   395       If *isjunk* was provided, first the longest matching block is determined
       
   396       as above, but with the additional restriction that no junk element appears
       
   397       in the block.  Then that block is extended as far as possible by matching
       
   398       (only) junk elements on both sides. So the resulting block never matches
       
   399       on junk except as identical junk happens to be adjacent to an interesting
       
   400       match.
       
   401 
       
   402       Here's the same example as before, but considering blanks to be junk. That
       
   403       prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
       
   404       second sequence directly.  Instead only the ``'abcd'`` can match, and
       
   405       matches the leftmost ``'abcd'`` in the second sequence:
       
   406 
       
   407          >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
       
   408          >>> s.find_longest_match(0, 5, 0, 9)
       
   409          Match(a=1, b=0, size=4)
       
   410 
       
   411       If no blocks match, this returns ``(alo, blo, 0)``.
       
   412 
       
   413       .. versionchanged:: 2.6
       
   414          This method returns a :term:`named tuple` ``Match(a, b, size)``.
       
   415 
       
   416 
       
   417    .. method:: get_matching_blocks()
       
   418 
       
   419       Return list of triples describing matching subsequences. Each triple is of
       
   420       the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``.  The
       
   421       triples are monotonically increasing in *i* and *j*.
       
   422 
       
   423       The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
       
   424       is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
       
   425       are adjacent triples in the list, and the second is not the last triple in
       
   426       the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
       
   427       triples always describe non-adjacent equal blocks.
       
   428 
       
   429       .. XXX Explain why a dummy is used!
       
   430 
       
   431       .. versionchanged:: 2.5 
       
   432          The guarantee that adjacent triples always describe non-adjacent blocks
       
   433          was implemented.
       
   434 
       
   435       .. doctest::
       
   436 
       
   437          >>> s = SequenceMatcher(None, "abxcd", "abcd")
       
   438          >>> s.get_matching_blocks()
       
   439          [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
       
   440 
       
   441 
       
   442    .. method:: get_opcodes()
       
   443 
       
   444       Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
       
   445       of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
       
   446       0``, and remaining tuples have *i1* equal to the *i2* from the preceding
       
   447       tuple, and, likewise, *j1* equal to the previous *j2*.
       
   448 
       
   449       The *tag* values are strings, with these meanings:
       
   450 
       
   451       +---------------+---------------------------------------------+
       
   452       | Value         | Meaning                                     |
       
   453       +===============+=============================================+
       
   454       | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
       
   455       |               | ``b[j1:j2]``.                               |
       
   456       +---------------+---------------------------------------------+
       
   457       | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
       
   458       |               | ``j1 == j2`` in this case.                  |
       
   459       +---------------+---------------------------------------------+
       
   460       | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
       
   461       |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
       
   462       |               | this case.                                  |
       
   463       +---------------+---------------------------------------------+
       
   464       | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
       
   465       |               | are equal).                                 |
       
   466       +---------------+---------------------------------------------+
       
   467 
       
   468       For example:
       
   469 
       
   470          >>> a = "qabxcd"
       
   471          >>> b = "abycdf"
       
   472          >>> s = SequenceMatcher(None, a, b)
       
   473          >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
       
   474          ...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
       
   475          ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
       
   476           delete a[0:1] (q) b[0:0] ()
       
   477            equal a[1:3] (ab) b[0:2] (ab)
       
   478          replace a[3:4] (x) b[2:3] (y)
       
   479            equal a[4:6] (cd) b[3:5] (cd)
       
   480           insert a[6:6] () b[5:6] (f)
       
   481 
       
   482 
       
   483    .. method:: get_grouped_opcodes([n])
       
   484 
       
   485       Return a :term:`generator` of groups with up to *n* lines of context.
       
   486 
       
   487       Starting with the groups returned by :meth:`get_opcodes`, this method
       
   488       splits out smaller change clusters and eliminates intervening ranges which
       
   489       have no changes.
       
   490 
       
   491       The groups are returned in the same format as :meth:`get_opcodes`.
       
   492 
       
   493       .. versionadded:: 2.3
       
   494 
       
   495 
       
   496    .. method:: ratio()
       
   497 
       
   498       Return a measure of the sequences' similarity as a float in the range [0,
       
   499       1].
       
   500 
       
   501       Where T is the total number of elements in both sequences, and M is the
       
   502       number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
       
   503       sequences are identical, and ``0.0`` if they have nothing in common.
       
   504 
       
   505       This is expensive to compute if :meth:`get_matching_blocks` or
       
   506       :meth:`get_opcodes` hasn't already been called, in which case you may want
       
   507       to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
       
   508       upper bound.
       
   509 
       
   510 
       
   511    .. method:: quick_ratio()
       
   512 
       
   513       Return an upper bound on :meth:`ratio` relatively quickly.
       
   514 
       
   515       This isn't defined beyond that it is an upper bound on :meth:`ratio`, and
       
   516       is faster to compute.
       
   517 
       
   518 
       
   519    .. method:: real_quick_ratio()
       
   520 
       
   521       Return an upper bound on :meth:`ratio` very quickly.
       
   522 
       
   523       This isn't defined beyond that it is an upper bound on :meth:`ratio`, and
       
   524       is faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.
       
   525 
       
   526 The three methods that return the ratio of matching to total characters can give
       
   527 different results due to differing levels of approximation, although
       
   528 :meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
       
   529 :meth:`ratio`:
       
   530 
       
   531    >>> s = SequenceMatcher(None, "abcd", "bcde")
       
   532    >>> s.ratio()
       
   533    0.75
       
   534    >>> s.quick_ratio()
       
   535    0.75
       
   536    >>> s.real_quick_ratio()
       
   537    1.0
       
   538 
       
   539 
       
   540 .. _sequencematcher-examples:
       
   541 
       
   542 SequenceMatcher Examples
       
   543 ------------------------
       
   544 
       
   545 This example compares two strings, considering blanks to be "junk:"
       
   546 
       
   547    >>> s = SequenceMatcher(lambda x: x == " ",
       
   548    ...                     "private Thread currentThread;",
       
   549    ...                     "private volatile Thread currentThread;")
       
   550 
       
   551 :meth:`ratio` returns a float in [0, 1], measuring the similarity of the
       
   552 sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
       
   553 sequences are close matches:
       
   554 
       
   555    >>> print round(s.ratio(), 3)
       
   556    0.866
       
   557 
       
   558 If you're only interested in where the sequences match,
       
   559 :meth:`get_matching_blocks` is handy:
       
   560 
       
   561    >>> for block in s.get_matching_blocks():
       
   562    ...     print "a[%d] and b[%d] match for %d elements" % block
       
   563    a[0] and b[0] match for 8 elements
       
   564    a[8] and b[17] match for 21 elements
       
   565    a[29] and b[38] match for 0 elements
       
   566 
       
   567 Note that the last tuple returned by :meth:`get_matching_blocks` is always a
       
   568 dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
       
   569 tuple element (number of elements matched) is ``0``.
       
   570 
       
   571 If you want to know how to change the first sequence into the second, use
       
   572 :meth:`get_opcodes`:
       
   573 
       
   574    >>> for opcode in s.get_opcodes():
       
   575    ...     print "%6s a[%d:%d] b[%d:%d]" % opcode
       
   576     equal a[0:8] b[0:8]
       
   577    insert a[8:8] b[8:17]
       
   578     equal a[8:29] b[17:38]
       
   579 
       
   580 See also the function :func:`get_close_matches` in this module, which shows how
       
   581 simple code building on :class:`SequenceMatcher` can be used to do useful work.
       
   582 
       
   583 
       
   584 .. _differ-objects:
       
   585 
       
   586 Differ Objects
       
   587 --------------
       
   588 
       
   589 Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
       
   590 diffs. To the contrary, minimal diffs are often counter-intuitive, because they
       
   591 synch up anywhere possible, sometimes accidental matches 100 pages apart.
       
   592 Restricting synch points to contiguous matches preserves some notion of
       
   593 locality, at the occasional cost of producing a longer diff.
       
   594 
       
   595 The :class:`Differ` class has this constructor:
       
   596 
       
   597 
       
   598 .. class:: Differ([linejunk[, charjunk]])
       
   599 
       
   600    Optional keyword parameters *linejunk* and *charjunk* are for filter functions
       
   601    (or ``None``):
       
   602 
       
   603    *linejunk*: A function that accepts a single string argument, and returns true
       
   604    if the string is junk.  The default is ``None``, meaning that no line is
       
   605    considered junk.
       
   606 
       
   607    *charjunk*: A function that accepts a single character argument (a string of
       
   608    length 1), and returns true if the character is junk. The default is ``None``,
       
   609    meaning that no character is considered junk.
       
   610 
       
   611    :class:`Differ` objects are used (deltas generated) via a single method:
       
   612 
       
   613 
       
   614    .. method:: Differ.compare(a, b)
       
   615 
       
   616       Compare two sequences of lines, and generate the delta (a sequence of lines).
       
   617 
       
   618       Each sequence must contain individual single-line strings ending with newlines.
       
   619       Such sequences can be obtained from the :meth:`readlines` method of file-like
       
   620       objects.  The delta generated also consists of newline-terminated strings, ready
       
   621       to be printed as-is via the :meth:`writelines` method of a file-like object.
       
   622 
       
   623 
       
   624 .. _differ-examples:
       
   625 
       
   626 Differ Example
       
   627 --------------
       
   628 
       
   629 This example compares two texts. First we set up the texts, sequences of
       
   630 individual single-line strings ending with newlines (such sequences can also be
       
   631 obtained from the :meth:`readlines` method of file-like objects):
       
   632 
       
   633    >>> text1 = '''  1. Beautiful is better than ugly.
       
   634    ...   2. Explicit is better than implicit.
       
   635    ...   3. Simple is better than complex.
       
   636    ...   4. Complex is better than complicated.
       
   637    ... '''.splitlines(1)
       
   638    >>> len(text1)
       
   639    4
       
   640    >>> text1[0][-1]
       
   641    '\n'
       
   642    >>> text2 = '''  1. Beautiful is better than ugly.
       
   643    ...   3.   Simple is better than complex.
       
   644    ...   4. Complicated is better than complex.
       
   645    ...   5. Flat is better than nested.
       
   646    ... '''.splitlines(1)
       
   647 
       
   648 Next we instantiate a Differ object:
       
   649 
       
   650    >>> d = Differ()
       
   651 
       
   652 Note that when instantiating a :class:`Differ` object we may pass functions to
       
   653 filter out line and character "junk."  See the :meth:`Differ` constructor for
       
   654 details.
       
   655 
       
   656 Finally, we compare the two:
       
   657 
       
   658    >>> result = list(d.compare(text1, text2))
       
   659 
       
   660 ``result`` is a list of strings, so let's pretty-print it:
       
   661 
       
   662    >>> from pprint import pprint
       
   663    >>> pprint(result)
       
   664    ['    1. Beautiful is better than ugly.\n',
       
   665     '-   2. Explicit is better than implicit.\n',
       
   666     '-   3. Simple is better than complex.\n',
       
   667     '+   3.   Simple is better than complex.\n',
       
   668     '?     ++\n',
       
   669     '-   4. Complex is better than complicated.\n',
       
   670     '?            ^                     ---- ^\n',
       
   671     '+   4. Complicated is better than complex.\n',
       
   672     '?           ++++ ^                      ^\n',
       
   673     '+   5. Flat is better than nested.\n']
       
   674 
       
   675 As a single multi-line string it looks like this:
       
   676 
       
   677    >>> import sys
       
   678    >>> sys.stdout.writelines(result)
       
   679        1. Beautiful is better than ugly.
       
   680    -   2. Explicit is better than implicit.
       
   681    -   3. Simple is better than complex.
       
   682    +   3.   Simple is better than complex.
       
   683    ?     ++
       
   684    -   4. Complex is better than complicated.
       
   685    ?            ^                     ---- ^
       
   686    +   4. Complicated is better than complex.
       
   687    ?           ++++ ^                      ^
       
   688    +   5. Flat is better than nested.
       
   689 
       
   690 
       
   691 .. _difflib-interface:
       
   692 
       
   693 A command-line interface to difflib
       
   694 -----------------------------------
       
   695 
       
   696 This example shows how to use difflib to create a ``diff``-like utility.
       
   697 It is also contained in the Python source distribution, as
       
   698 :file:`Tools/scripts/diff.py`.
       
   699 
       
   700 .. testcode::
       
   701 
       
   702    """ Command line interface to difflib.py providing diffs in four formats:
       
   703 
       
   704    * ndiff:    lists every line and highlights interline changes.
       
   705    * context:  highlights clusters of changes in a before/after format.
       
   706    * unified:  highlights clusters of changes in an inline format.
       
   707    * html:     generates side by side comparison with change highlights.
       
   708 
       
   709    """
       
   710 
       
   711    import sys, os, time, difflib, optparse
       
   712 
       
   713    def main():
       
   714         # Configure the option parser
       
   715        usage = "usage: %prog [options] fromfile tofile"
       
   716        parser = optparse.OptionParser(usage)
       
   717        parser.add_option("-c", action="store_true", default=False,
       
   718                          help='Produce a context format diff (default)')
       
   719        parser.add_option("-u", action="store_true", default=False,
       
   720                          help='Produce a unified format diff')
       
   721        hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
       
   722        parser.add_option("-m", action="store_true", default=False, help=hlp)
       
   723        parser.add_option("-n", action="store_true", default=False,
       
   724                          help='Produce a ndiff format diff')
       
   725        parser.add_option("-l", "--lines", type="int", default=3,
       
   726                          help='Set number of context lines (default 3)')
       
   727        (options, args) = parser.parse_args()
       
   728 
       
   729        if len(args) == 0:
       
   730            parser.print_help()
       
   731            sys.exit(1)
       
   732        if len(args) != 2:
       
   733            parser.error("need to specify both a fromfile and tofile")
       
   734 
       
   735        n = options.lines
       
   736        fromfile, tofile = args # as specified in the usage string
       
   737 
       
   738        # we're passing these as arguments to the diff function
       
   739        fromdate = time.ctime(os.stat(fromfile).st_mtime)
       
   740        todate = time.ctime(os.stat(tofile).st_mtime)
       
   741        fromlines = open(fromfile, 'U').readlines()
       
   742        tolines = open(tofile, 'U').readlines()
       
   743 
       
   744        if options.u:
       
   745            diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
       
   746                                        fromdate, todate, n=n)
       
   747        elif options.n:
       
   748            diff = difflib.ndiff(fromlines, tolines)
       
   749        elif options.m:
       
   750            diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
       
   751                                                tofile, context=options.c,
       
   752                                                numlines=n)
       
   753        else:
       
   754            diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
       
   755                                        fromdate, todate, n=n)
       
   756 
       
   757        # we're using writelines because diff is a generator
       
   758        sys.stdout.writelines(diff)
       
   759 
       
   760    if __name__ == '__main__':
       
   761        main()