:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


.. versionadded:: 1.5.2

The :class:`shlex` class makes it easy to write lexical analyzers for simple
syntaxes resembling that of the Unix shell. This will often be useful for
writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

.. note::

   The :mod:`shlex` module currently does not support Unicode input.

The :mod:`shlex` module defines the following function:


.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`commenters` member of the :class:`shlex` instance to the
   empty string). This function operates in POSIX mode by default, but uses
   non-POSIX mode if the *posix* argument is false.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      Added the *posix* parameter.

   .. note::

      Since the :func:`split` function instantiates a :class:`shlex` instance, passing
      ``None`` for *s* will read the string to split from standard input.
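
As a rough illustration of the :func:`split` behavior described above, the
sketch below splits one sample command line three ways. The command string and
the variable names are invented for the example.

```python
import shlex

# Hypothetical sample input; any shell-like string works.
command = 'cp "my file.txt" backup/ # nightly copy'

# Default call: POSIX mode, comment parsing disabled, so '#' is an ordinary token.
posix_tokens = shlex.split(command)
# ['cp', 'my file.txt', 'backup/', '#', 'nightly', 'copy']

# comments=True: everything from '#' to the end of the line is ignored.
without_comment = shlex.split(command, comments=True)
# ['cp', 'my file.txt', 'backup/']

# posix=False: the quote characters stay part of the token.
compat_tokens = shlex.split(command, posix=False)
# ['cp', '"my file.txt"', 'backup/', '#', 'nightly', 'copy']
```

Enabling *comments* is mainly useful when the input comes from a file in which
``#`` marks remarks rather than data.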

The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`shlex` instance or subclass instance is a lexical analyzer object.
   The first initialization argument, *instream*, if present, specifies where to
   read characters from. It must be a file-/stream-like object with :meth:`read` and
   :meth:`readline` methods, or a string (strings are accepted since Python 2.3).
   If no argument is given, input will be taken from ``sys.stdin``. The second
   optional argument is a filename string, which sets the initial value of the
   :attr:`infile` member. If the *instream* argument is omitted or equal to
   ``sys.stdin``, this second argument defaults to "stdin". The *posix* argument
   was introduced in Python 2.3, and defines the operational mode. When *posix* is
   false (the default), the :class:`shlex` instance will operate in compatibility
   mode. When operating in POSIX mode, :class:`shlex` will try to be as close as
   possible to the POSIX shell parsing rules.
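
The following is a minimal sketch of constructing a lexer from a string in each
mode. The helper ``all_tokens``, the sample text and the file name ``demo.rc``
are assumptions made for the example and are not part of the module.

```python
import shlex


def all_tokens(lexer):
    """Hypothetical helper: collect tokens until the lexer reports end-of-file."""
    tokens = []
    token = lexer.get_token()
    while token != lexer.eof:
        tokens.append(token)
        token = lexer.get_token()
    return tokens


text = 'name = "Ada Lovelace"'

# Compatibility (non-POSIX) mode is the default for the class.
compat = shlex.shlex(text, infile="demo.rc")
compat_tokens = all_tokens(compat)   # ['name', '=', '"Ada Lovelace"']

# POSIX mode strips the quotes from the quoted string.
posix = shlex.shlex(text, infile="demo.rc", posix=True)
posix_tokens = all_tokens(posix)     # ['name', '=', 'Ada Lovelace']
```

Note that the class defaults to compatibility mode while :func:`split` defaults
to POSIX mode, so the two entry points can tokenize the same text differently.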


.. seealso::

   Module :mod:`ConfigParser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`self.eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.
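
One common use of the token stack is one-token lookahead. The sketch below,
using a made-up input string, reads a token, pushes it back, and reads it
again before continuing to end-of-file.

```python
import shlex

lexer = shlex.shlex("alpha beta")    # non-POSIX mode by default

first = lexer.get_token()            # 'alpha' is read from the stream
lexer.push_token(first)              # put it back on the token stack
again = lexer.get_token()            # 'alpha' again, this time from the stack
second = lexer.get_token()           # 'beta'
end = lexer.get_token()              # '' signals end-of-file in non-POSIX mode
```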


.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`shlex` detects a source request (see :attr:`source` below) this
   method is given the following token as argument, and is expected to return a
   tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument. If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone. Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component. (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a :class:`shlex` instance will call the
   :meth:`close` method of the sourced input stream when it returns EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack. If the *filename* argument is
   specified, it will later be available for use in error messages. This is the
   same method used internally by the :meth:`sourcehook` method.

   .. versionadded:: 2.1


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

   .. versionadded:: 2.1
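
To make the stacking behavior concrete, here is a small sketch that pushes an
in-memory stream in the middle of tokenizing another one. It assumes a Python
version where ``io.StringIO`` accepts ordinary strings. The stream contents and
the file names ``main.rc`` and ``included.rc`` are invented labels.

```python
import io
import shlex

lexer = shlex.shlex("outer1 outer2", infile="main.rc")
tokens = [lexer.get_token()]             # 'outer1' comes from the original stream

# Stack a second stream; the filename is only a label for error messages.
lexer.push_source(io.StringIO("inner1 inner2"), "included.rc")
tokens.append(lexer.get_token())         # 'inner1'
name_while_stacked = lexer.infile        # 'included.rc'
tokens.append(lexer.get_token())         # 'inner2'

# The stacked stream is now exhausted, so the lexer pops it and resumes.
tokens.append(lexer.get_token())         # 'outer2'
name_after_pop = lexer.infile            # 'main.rc'
```

In this sketch :meth:`pop_source` is never called directly. The lexer pops the
stacked stream itself when that stream reaches EOF.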


.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.
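
A short sketch of building an error message with :meth:`error_leader` follows.
The input text, the file names and the message wording are placeholders.

```python
import shlex

lexer = shlex.shlex("verbose = maybe", infile="app.rc")

# With no arguments, the current file name and line number are used.
leader = lexer.error_leader()                # '"app.rc", line 1: '

# Both values can be overridden explicitly.
custom = lexer.error_leader("other.rc", 42)  # '"other.rc", line 42: '

# Prepend the leader to an application-specific message.
message = leader + "unexpected token"
```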

Instances of :class:`shlex` subclasses have some public instance variables which
either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to end of line are ignored. Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.


.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.
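
The effect of :attr:`wordchars` and :attr:`whitespace` can be sketched as
follows. The input strings are made up, and the example relies on the fact that
iterating over a :class:`shlex` instance yields tokens until end-of-file.

```python
import shlex

text = "log-level=debug"

# By default '-' and '=' are not word characters, so each becomes its own token.
default_tokens = list(shlex.shlex(text))
# ['log', '-', 'level', '=', 'debug']

# Adding '-' to wordchars keeps hyphenated names together.
hyphen_lexer = shlex.shlex(text)
hyphen_lexer.wordchars += "-"
hyphen_tokens = list(hyphen_lexer)
# ['log-level', '=', 'debug']

# Adding ',' to whitespace makes commas act as token separators.
comma_lexer = shlex.shlex("a,b c")
comma_lexer.whitespace += ","
comma_tokens = list(comma_lexer)
# ['a', 'b', 'c']
```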


.. attribute:: shlex.escape

   Characters that will be treated as escape characters. This is only used in
   POSIX mode, and includes just ``'\'`` by default.

   .. versionadded:: 2.3


.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.

   .. versionadded:: 2.3


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`shlex`, getting tokens in a
   similar way to shell arguments.

   .. versionadded:: 2.3
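
As a sketch of what :attr:`whitespace_split` changes, the example below
tokenizes the same made-up command line with and without it. Iterating over
the lexer is used here as a shorthand for calling :meth:`get_token` until
end-of-file.

```python
import shlex

command = "ls -l /tmp/*.txt"

# Without whitespace_split, characters such as '-', '/' and '*' end a word
# and are returned as single-character tokens.
default_lexer = shlex.shlex(command, posix=True)
default_tokens = list(default_lexer)

# With whitespace_split, only whitespace separates tokens.
args_lexer = shlex.shlex(command, posix=True)
args_lexer.whitespace_split = True
arg_tokens = list(args_lexer)          # ['ls', '-l', '/tmp/*.txt']
```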


.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests. It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`shlex` instance is reading characters.


.. attribute:: shlex.source

   This member is ``None`` by default. If you assign a string to it, that string
   will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input taken from that stream until EOF, at
   which point the :meth:`close` method of that stream will be called and the input
   source will again become the original input stream. Source requests may be
   stacked any number of levels deep.
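
The sketch below shows one way :attr:`source` and :meth:`sourcehook` can work
together without touching the file system. The subclass ``MemoryLexer``, the
``FILES`` dictionary and the name ``extra.rc`` are all hypothetical. The hook
is overridden to serve included "files" from memory, assuming ``io.StringIO``
accepts ordinary strings.

```python
import io
import shlex

# Hypothetical in-memory stand-in for files on disk.
FILES = {"extra.rc": "b c"}


class MemoryLexer(shlex.shlex):
    """Hypothetical subclass that resolves source requests from FILES."""

    def sourcehook(self, filename):
        # In non-POSIX mode the token still carries its quotes; strip them.
        name = filename.strip('"')
        # Return (filename, open file-like object), as the hook is expected to.
        return name, io.StringIO(FILES[name])


lexer = MemoryLexer('a source "extra.rc" d')
lexer.source = "source"      # treat the word 'source' as an inclusion request

tokens = list(lexer)         # ['a', 'b', 'c', 'd']
```

The tokens ``b`` and ``c`` come from the included stream. Once it is exhausted
the lexer continues with ``d`` from the original input.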


.. attribute:: shlex.debug

   If this member is numeric and ``1`` or more, a :class:`shlex` instance will
   print verbose progress output on its behavior. If you need to use this, you can
   read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

   .. versionadded:: 2.3


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`shlex` tries to obey the following
rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`whitespace_split` is ``False``, any character not declared to be a
  word character, whitespace, or a quote will be returned as a single-character
  token. If it is ``True``, :class:`shlex` will only split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.
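
The non-POSIX rules above can be checked with a few short calls. This sketch
uses :func:`split` with ``posix=False``, which in the standard implementation
splits on whitespace only. The input strings are the ones from the list plus
two invented ones.

```python
import shlex

# Quotes inside a word do not split it, and the quote characters are kept.
inside = shlex.split('Do"Not"Separate', posix=False)   # ['Do"Not"Separate']

# A closing quote ends the token, so the text after it becomes a new word.
closing = shlex.split('"Do"Separate', posix=False)     # ['"Do"', 'Separate']

# Backslash is not an escape character: it is kept, and the space still splits.
escaped = shlex.split(r'a\ b', posix=False)            # ['a\\', 'b']

# A quoted empty string comes back as the two quote characters, not as ''.
empty = shlex.split('a "" b', posix=False)             # ['a', '""', 'b']
```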

When operating in POSIX mode, :class:`shlex` tries to obey the following
parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of :attr:`escapedquotes`
  (e.g. ``"'"``) preserve the literal value of all characters within the quotes;

* Enclosing characters in quotes which are part of :attr:`escapedquotes` (e.g.
  ``'"'``) preserve the literal value of all characters within the quotes, with
  the exception of the characters mentioned in :attr:`escape`. The escape
  characters retain their special meaning only when followed by the quote in use,
  or the escape character itself. Otherwise the escape character will be
  considered a normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.
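
The POSIX rules can be illustrated in the same way. The sketch below relies on
:func:`split`, which operates in POSIX mode by default. Apart from the first
string, the inputs are invented for the example.

```python
import shlex

# Quotes are stripped and do not separate words.
joined = shlex.split('"Do"Not"Separate"')      # ['DoNotSeparate']

# An unquoted backslash preserves the next character, here a space.
escaped = shlex.split(r'a\ b')                 # ['a b']

# Single quotes are not in escapedquotes: the backslash stays literal.
single = shlex.split(r"'a\b'")                 # ['a\\b']

# Inside double quotes, backslash is special only before '"' or '\'.
# Before any other character (here 'n') it is kept as a normal character.
double = shlex.split(r'"say \"hi\" \n"')       # ['say "hi" \\n']

# A quoted empty string yields an empty token.
empty = shlex.split("a '' b")                  # ['a', '', 'b']
```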