:mod:`tokenize` --- Tokenizer for Python source
===============================================

.. module:: tokenize
   :synopsis: Lexical scanner for Python source code.
.. moduleauthor:: Ka Ping Yee
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


The :mod:`tokenize` module provides a lexical scanner for Python source code,
implemented in Python.  The scanner in this module returns comments as tokens
as well, making it useful for implementing "pretty-printers," including
colorizers for on-screen displays.

The primary entry point is a :term:`generator`:

.. function:: generate_tokens(readline)

   The :func:`generate_tokens` generator requires one argument, *readline*,
   which must be a callable object which provides the same interface as the
   :meth:`readline` method of built-in file objects (see section
   :ref:`bltin-file-objects`).  Each call to the function should return one
   line of input as a string.

   The generator produces 5-tuples with these members: the token type; the
   token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
   column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
   ints specifying the row and column where the token ends in the source; and
   the line on which the token was found.  The line passed (the last tuple
   item) is the *logical* line; continuation lines are included.

   .. versionadded:: 2.2
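
   A minimal sketch of driving the generator, assuming the source is supplied
   as a string through ``StringIO.StringIO`` (the sample source line is
   illustrative)::

      import token
      from StringIO import StringIO
      from tokenize import generate_tokens

      g = generate_tokens(StringIO('x = 1 + 2\n').readline)
      for toknum, tokval, start, end, line in g:
          # token.tok_name maps a numeric token type to its printable name
          print token.tok_name[toknum], repr(tokval), start, end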

An older entry point is retained for backward compatibility:


.. function:: tokenize(readline[, tokeneater])

   The :func:`tokenize` function accepts two parameters: one representing the
   input stream, and one providing an output mechanism for :func:`tokenize`.

   The first parameter, *readline*, must be a callable object which provides
   the same interface as the :meth:`readline` method of built-in file objects
   (see section :ref:`bltin-file-objects`).  Each call to the function should
   return one line of input as a string.  Alternately, *readline* may be a
   callable object that signals completion by raising :exc:`StopIteration`.

   .. versionchanged:: 2.5
      Added :exc:`StopIteration` support.

   The second parameter, *tokeneater*, must also be a callable object.  It is
   called once for each token, with five arguments, corresponding to the
   tuples generated by :func:`generate_tokens`.
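
   For instance, a *tokeneater* that prints the type name and string of each
   token might look like this (a sketch; ``print_token`` is a hypothetical
   helper, not part of the module)::

      import token
      from StringIO import StringIO
      import tokenize

      def print_token(toknum, tokval, start, end, line):
          # Receives the same five values as the tuples from generate_tokens()
          print token.tok_name[toknum], repr(tokval)

      tokenize.tokenize(StringIO('x = 1\n').readline, print_token)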

All constants from the :mod:`token` module are also exported from
:mod:`tokenize`, as are two additional token type values that might be passed
to the *tokeneater* function by :func:`tokenize`:


.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The NEWLINE token
   indicates the end of a logical line of Python code; NL tokens are generated
   when a logical line of code is continued over multiple physical lines.
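
   For example, in the sketch below (the source string is illustrative) the
   newline inside the parentheses produces an NL token, while the newline that
   ends the statement produces a NEWLINE token::

      from StringIO import StringIO
      from tokenize import generate_tokens, NEWLINE, NL

      source = 'x = (1 +\n     2)\n'
      for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline):
          if toknum == NL:
              print 'NL', repr(tokval)
          elif toknum == NEWLINE:
              print 'NEWLINE', repr(tokval)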

Another function is provided to reverse the tokenization process.  This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.


.. function:: untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must return
   sequences with at least two elements, the token type and the token string.
   Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The result is
   guaranteed to tokenize back to match the input so that the conversion is
   lossless and round-trips are assured.  The guarantee applies only to the
   token type and token string as the spacing between tokens (column
   positions) may change.

   .. versionadded:: 2.5
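
   A minimal round-trip sketch, assuming the source arrives as a string::

      from StringIO import StringIO
      from tokenize import generate_tokens, untokenize

      source = 'x = 3.14  # pi\n'
      tokens = [(toknum, tokval) for toknum, tokval, _, _, _
                in generate_tokens(StringIO(source).readline)]
      # The result tokenizes back to the same token types and strings,
      # though the spacing between tokens may differ from the original.
      print untokenize(tokens)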

Example of a script re-writer that transforms float literals into Decimal
objects::

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, STRING, NAME, OP

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print +21.3e-5*-.1234/81.7'
       >>> decistmt(s)
       "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _ in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)