
:mod:`cgi` --- Common Gateway Interface support.
================================================

.. module:: cgi
   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.


.. index::
   pair: WWW; server
   pair: CGI; protocol
   pair: HTTP; protocol
   pair: MIME; headers
   single: URL
   single: Common Gateway Interface

Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in
Python.


Introduction
------------

.. _cgi-intro:

A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.

Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
The HTTP server places all sorts of information about the request (such as the
client's hostname, the requested URL, the query string, and lots of other
goodies) in the script's shell environment, executes the script, and sends the
script's output back to the client.

The script's input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the "query string"
part of the URL.  This module is intended to take care of the different cases
and provide a simpler interface to the Python script.  It also provides a number
of utilities that help in debugging scripts, and the latest addition is support
for file uploads from a form (if your browser supports it).

The output of a CGI script should consist of two sections, separated by a blank
line.  The first section contains a number of headers, telling the client what
kind of data is following.  Python code to generate a minimal header section
looks like this::

   print "Content-Type: text/html"     # HTML is following
   print                               # blank line, end of headers

The second section is usually HTML, which allows the client software to display
nicely formatted text with header, in-line images, etc. Here's Python code that
prints a simple piece of HTML::

   print "<TITLE>CGI script output</TITLE>"
   print "<H1>This is my first CGI script</H1>"
   print "Hello, world!"

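The two sections described above can be assembled in one place.  The following
is a minimal sketch, not part of the :mod:`cgi` module: ``build_response`` is a
hypothetical helper name, and it writes with ``sys.stdout.write`` only so that
the same code runs on interpreters with or without the ``print`` statement.

```python
import sys


def build_response(body, content_type="text/html"):
    """Return a complete CGI response: headers, a blank line, then the body."""
    headers = "Content-Type: %s\n" % content_type
    # The empty line after the headers marks the end of the header section.
    return headers + "\n" + body


if __name__ == "__main__":
    page = (
        "<TITLE>CGI script output</TITLE>\n"
        "<H1>This is my first CGI script</H1>\n"
        "Hello, world!\n"
    )
    sys.stdout.write(build_response(page))
```

Building the response as a single string makes it easy to check that the blank
line separating the headers from the body is really there.
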


.. _using-the-cgi-module:

Using the cgi module
--------------------

Begin by writing ``import cgi``.  Do not use ``from cgi import *`` --- the
module defines all sorts of names for its own use or for backward compatibility
that you don't want in your namespace.

When you write a new script, consider adding the line::

   import cgitb; cgitb.enable()

This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur.  If you'd rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with a line like this::

   import cgitb; cgitb.enable(display=0, logdir="/tmp")

It's very helpful to use this feature during script development.  The reports
produced by :mod:`cgitb` provide information that can save you a lot of time in
tracking down bugs.  You can always remove the ``cgitb`` line later when you
have tested your script and are confident that it works correctly.

To get at submitted form data, it's best to use the :class:`FieldStorage` class.
The other classes defined in this module are provided mostly for backward
compatibility.  Instantiate :class:`FieldStorage` exactly once, without
arguments.  This reads the form contents from standard input or the environment
(depending on the value of various environment variables set according to the
CGI standard).  Since it may consume standard input, it should be instantiated
only once.

The :class:`FieldStorage` instance can be indexed like a Python dictionary, and
also supports the standard dictionary methods :meth:`has_key` and :meth:`keys`.
The built-in :func:`len` is also supported.  Form fields containing empty
strings are ignored and do not appear in the dictionary; to keep such values,
provide a true value for the optional *keep_blank_values* keyword parameter when
creating the :class:`FieldStorage` instance.

For instance, the following code (which assumes that the
:mailheader:`Content-Type` header and blank line have already been printed)
checks that the fields ``name`` and ``addr`` are both set to a non-empty
string::

   form = cgi.FieldStorage()
   if not (form.has_key("name") and form.has_key("addr")):
       print "<H1>Error</H1>"
       print "Please fill in the name and addr fields."
       return
   print "<p>name:", form["name"].value
   print "<p>addr:", form["addr"].value
   ...further form processing here...

Here the fields, accessed through ``form[key]``, are themselves instances of
:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
encoding).  The :attr:`value` attribute of the instance yields the string value
of the field.  The :meth:`getvalue` method returns this string value directly;
it also accepts an optional second argument as a default to return if the
requested key is not present.

If the submitted form data contains more than one field with the same name, the
object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
:class:`MiniFieldStorage` instance but a list of such instances.  Similarly, in
this situation, ``form.getvalue(key)`` would return a list of strings.  If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the :meth:`getlist` method, which always returns a list of
values (so that you do not need to special-case the single item case).  For
example, this code concatenates any number of username fields, separated by
commas::

   value = form.getlist("username")
   usernames = ",".join(value)

If a field represents an uploaded file, accessing the value via the
:attr:`value` attribute or the :meth:`getvalue` method reads the entire file in
memory as a string.  This may not be what you want.  You can test for an uploaded
file by testing either the :attr:`filename` attribute or the :attr:`file`
attribute.  You can then read the data at leisure from the :attr:`file`
attribute::

   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count lines
       linecount = 0
       while 1:
           line = fileitem.file.readline()
           if not line: break
           linecount = linecount + 1

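The same line count can be written by iterating over the file object, which
also holds only one line in memory at a time.  This is a sketch under the
assumption that the object behaves like an ordinary file; ``count_lines`` is a
hypothetical helper, and an ``io.BytesIO`` object stands in for the uploaded
file so the example runs without a web server.

```python
import io


def count_lines(fileobj):
    """Count lines by iterating, so only one line is held in memory at a time."""
    linecount = 0
    for _line in fileobj:
        linecount += 1
    return linecount


# Stand-in for an uploaded file: three lines of bytes, the last one unterminated.
fake_upload = io.BytesIO(b"first\nsecond\nthird")
print(count_lines(fake_upload))
```
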

If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button) the :attr:`done` attribute of the object for the
field will be set to the value -1.

The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive :mimetype:`multipart/\*` encoding).
When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
This can be determined by testing its :attr:`type` attribute, which should be
:mimetype:`multipart/form-data` (or perhaps another MIME type matching
:mimetype:`multipart/\*`).  In this case, it can be iterated over recursively
just like the top-level form object.

When a form is submitted in the "old" format (as the query string or as a single
data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
actually be instances of the class :class:`MiniFieldStorage`.  In this case, the
:attr:`list`, :attr:`file`, and :attr:`filename` attributes are always ``None``.

A form submitted via POST that also has a query string will contain both
:class:`FieldStorage` and :class:`MiniFieldStorage` items.

Higher Level Interface
----------------------

.. versionadded:: 2.2

The previous section explains how to read CGI form data using the
:class:`FieldStorage` class.  This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way.  The interface doesn't make the techniques described in previous
sections obsolete --- they are still useful to process file uploads efficiently,
for example.

.. XXX: Is this true ?

The interface consists of two simple methods.  Using the methods you can process
form data in a generic way, without the need to worry whether only one or more
values were posted under one name.

In the previous section, you learned to write the following code any time you
expected a user to post more than one value under one name::

   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       pass
   else:
       # The user is requesting only one item.
       pass

This situation is common for example when a form contains a group of multiple
checkboxes with the same name::

   <input type="checkbox" name="item" value="1" />
   <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular
name in a form and then you expect and need only one value associated with this
name.  So you write a script containing for example this code::

   user = form.getvalue("user").upper()

The problem with the code is that you should never expect that a client will
provide valid input to your scripts.  For example, if a curious user appends
another ``user=foo`` pair to the query string, then the script would crash,
because in this situation the ``getvalue("user")`` method call returns a list
instead of a string.  Calling the :meth:`upper` method on a list is not valid
(since lists do not have a method of this name) and results in an
:exc:`AttributeError` exception.

Therefore, the appropriate way to read form data values was to always use the
code which checks whether the obtained value is a single value or a list of
values.  That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods :meth:`getfirst` and
:meth:`getlist` provided by this higher level interface.


.. method:: FieldStorage.getfirst(name[, default])

   This method always returns only one value associated with form field *name*.
   If several values were posted under that name, the method returns only the
   first one.  Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_  If no such
   form field or value exists then the method returns the value specified by the
   optional parameter *default*.  This parameter defaults to ``None`` if not
   specified.


.. method:: FieldStorage.getlist(name)

   This method always returns a list of values associated with form field *name*.
   The method returns an empty list if no such form field or value exists for
   *name*.  It returns a list consisting of one item if only one such value exists.

Using these methods you can write nice compact code::

   import cgi
   form = cgi.FieldStorage()
   user = form.getfirst("user", "").upper()    # This way it's safe.
   for item in form.getlist("item"):
       do_something(item)

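The behaviour of the two methods can be tried out without a web server.  The
sketch below is not the module's implementation: it mirrors the documented
semantics with two hypothetical helper functions, ``getfirst`` and ``getlist``,
that work on the plain dictionary returned by ``parse_qs`` (found in
:mod:`urlparse`, or in ``urllib.parse`` on Python 3).

```python
try:
    from urllib.parse import parse_qs   # Python 3
except ImportError:
    from urlparse import parse_qs       # Python 2.6+


def getlist(fields, name):
    """Always return a list, empty if the field is absent."""
    return fields.get(name, [])


def getfirst(fields, name, default=None):
    """Return the first value for name, or default if there is none."""
    values = getlist(fields, name)
    return values[0] if values else default


# A "curious user" has appended a second user=foo pair to the query string.
fields = parse_qs("user=joe&user=foo&item=1&item=2")

user = getfirst(fields, "user", "").upper()   # a string, so .upper() is safe
items = getlist(fields, "item")
missing = getlist(fields, "nothing")
```

Because ``getfirst`` always yields a single value or the default, the duplicate
``user`` pair no longer changes the type that the rest of the script sees.
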


Old classes
-----------

.. deprecated:: 2.6

   These classes, present in earlier versions of the :mod:`cgi` module, are
   still supported for backward compatibility.  New applications should use the
   :class:`FieldStorage` class.

:class:`SvFormContentDict` stores single value form content as a dictionary; it
assumes each field name occurs in the form only once.

:class:`FormContentDict` stores multiple value form content as a dictionary (the
form items are lists of values).  Useful if your form contains multiple fields
with the same name.

Other classes (:class:`FormContent`, :class:`InterpFormContentDict`) are present
for backward compatibility with really old applications only.


.. _functions-in-cgi-module:

Functions
---------

These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp[, keep_blank_values[, strict_parsing]])

   Parse a query in the environment or from a file (the file defaults to
   ``sys.stdin``).  The *keep_blank_values* and *strict_parsing* parameters are
   passed to :func:`urlparse.parse_qs` unchanged.


.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]])

   This function is deprecated in this module.  Use :func:`urlparse.parse_qs`
   instead.  It is maintained here only for backward compatibility.

.. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]])

   This function is deprecated in this module.  Use :func:`urlparse.parse_qsl`
   instead.  It is maintained here only for backward compatibility.

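As a short illustration of the replacement functions and of the
*keep_blank_values* flag, the following sketch parses one query string three
ways.  The query string is made up for the example, and the import falls back
to ``urllib.parse``, where the same functions live on Python 3.

```python
try:
    from urllib.parse import parse_qs, parse_qsl   # Python 3
except ImportError:
    from urlparse import parse_qs, parse_qsl       # Python 2.6+

query = "name=Joe+Blow&addr=&item=1&item=2"

# By default, fields with empty values are dropped.
default = parse_qs(query)

# keep_blank_values keeps "addr" with an empty string as its value.
with_blanks = parse_qs(query, keep_blank_values=True)

# parse_qsl returns (name, value) pairs in the order they appear.
pairs = parse_qsl(query, keep_blank_values=True)
```
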

.. function:: parse_multipart(fp, pdict)

   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file and *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header.

   Returns a dictionary just like :func:`urlparse.parse_qs`: keys are the field
   names, each value is a list of values for that field.  This is easy to use but
   not much good if you are expecting megabytes to be uploaded --- in that case,
   use the :class:`FieldStorage` class instead, which is much more flexible.

   Note that this does not parse nested multipart parts --- use
   :class:`FieldStorage` for that.


.. function:: parse_header(string)

   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
   dictionary of parameters.


.. function:: test()

   Robust test CGI script, usable as main program.  Writes minimal HTTP headers and
   formats all information provided to the script in HTML form.


.. function:: print_environ()

   Format the shell environment in HTML.


.. function:: print_form(form)

   Format a form in HTML.


.. function:: print_directory()

   Format the current directory in HTML.


.. function:: print_environ_usage()

   Print a list of useful (used by CGI) environment variables in HTML.


.. function:: escape(s[, quote])

   Convert the characters ``'&'``, ``'<'`` and ``'>'`` in string *s* to HTML-safe
   sequences.  Use this if you need to display text that might contain such
   characters in HTML.  If the optional flag *quote* is true, the quotation mark
   character (``'"'``) is also translated; this helps for inclusion in an HTML
   attribute value, as in ``<A HREF="...">``.  If the value to be quoted might
   include single- or double-quote characters, or both, consider using the
   :func:`quoteattr` function in the :mod:`xml.sax.saxutils` module instead.


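The :func:`quoteattr` alternative mentioned above can be tried as follows.
Note that the ``escape`` used in this sketch is the one from
:mod:`xml.sax.saxutils`, not this module's :func:`escape`; it is used here only
because it covers the same three characters.  The text and the URL are
placeholders.

```python
from xml.sax.saxutils import escape, quoteattr

text = 'Tom & Jerry <say "hi">'

# escape() handles '&', '<' and '>' for use in element content.
body = escape(text)

# quoteattr() also adds the surrounding quotes and picks a quoting style
# that is safe for the value, so it can be dropped straight into a tag.
link = "<A HREF=%s>home</A>" % quoteattr("/cgi-bin/show.py?a=1&b=2")
```

Since :func:`quoteattr` supplies its own quotes, the format string must not add
another pair around ``%s``.
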

.. _cgi-security:

Caring about security
---------------------

.. index:: pair: CGI; security

There's one important rule: if you invoke an external program (via the
:func:`os.system` or :func:`os.popen` functions, or others with similar
functionality), make very sure you don't pass arbitrary strings received from
the client to the shell.  This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands.  Even parts of the URL or field names cannot be
trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string obtained from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.

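One way to apply the rule above is a whitelist check with a regular expression.
This is a minimal sketch, assuming that "alphanumeric" means ASCII letters and
digits; ``is_shell_safe`` and ``_SAFE`` are hypothetical names.

```python
import re

# Only ASCII letters, digits, periods, underscores, and dashes; at least one.
# \A and \Z anchor the whole string ("$" would also accept a trailing newline).
_SAFE = re.compile(r"\A[A-Za-z0-9._-]+\Z")


def is_shell_safe(value):
    """Return True if value contains only the whitelisted characters."""
    return bool(_SAFE.match(value))
```

A value that passes this check can still mean something to the command that
receives it (for example a leading dash, or ``..``), so the check is a first
filter rather than a complete defence.
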


Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory :file:`cgi-bin` in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file
mode should be ``0755`` octal (use ``chmod 0755 filename``).  Make sure that the
first line of the script contains ``#!`` starting in column 1 followed by the
pathname of the Python interpreter, for instance::

   #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or
writable, respectively, by "others" --- their mode should be ``0644`` for
readable and ``0666`` for writable.  This is because, for security reasons, the
HTTP server executes your script as user "nobody", without any special
privileges.  It can only read (write, execute) files that everybody can read
(write, execute).  The current directory at execution time is also different (it
is usually the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in.  In particular, don't count
on the shell's search path for executables (:envvar:`PATH`) or the Python module
search path (:envvar:`PYTHONPATH`) to be set to anything interesting.

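The permission bits for "others" can be inspected with the constants in the
:mod:`stat` module.  The helpers below are a sketch with hypothetical names;
they take a numeric mode such as the ``st_mode`` value returned by
:func:`os.stat`.

```python
import stat


def others_can_run(mode):
    """True if the "others" class has both read and execute permission."""
    needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed


def others_can_write(mode):
    """True if the "others" class has write permission."""
    return bool(mode & stat.S_IWOTH)
```

For a real file, pass ``os.stat(path).st_mode``; modes ``0755`` and ``0666``
from the text above satisfy ``others_can_run`` and ``others_can_write``
respectively.
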

If you need to load modules from a directory which is not on Python's default
module search path, you can change the path in your script, before importing
other modules.  For example::

   import sys
   sys.path.insert(0, "/usr/home/joe/lib/python")
   sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)

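The resulting order can be seen on an ordinary list.  In this sketch
``search_path`` is a stand-in for ``sys.path``, and its initial entry is a
placeholder, so the interpreter's real search path is left untouched.

```python
# A stand-in for sys.path, so the real search path is left untouched.
search_path = ["/usr/lib/python"]

search_path.insert(0, "/usr/home/joe/lib/python")
search_path.insert(0, "/usr/local/lib/python")

# Python searches the entries from left to right.
print(search_path)
```
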

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server.  There's one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won't execute it at all, and the HTTP server will most likely
send a cryptic error to the client.

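A syntax check does not require running the script.  As a sketch, the built-in
:func:`compile` parses source text without executing it; ``syntax_error_in`` is
a hypothetical helper name.

```python
def syntax_error_in(source, filename="<script>"):
    """Return a description of the first syntax error, or None if there is none."""
    try:
        # compile() parses the source without executing it.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return "%s, line %s: %s" % (filename, exc.lineno, exc.msg)
    return None
```

To check a file, read its contents and pass them in together with the file
name.  From the command line, ``python -m py_compile script.py`` performs a
similar check.
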

Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.


Debugging CGI scripts
---------------------

.. index:: pair: CGI; debugging

First of all, check for trivial installation errors --- reading the section
above on installing your CGI script carefully can save you a lot of time.  If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script.  When
invoked as a script, the file will dump its environment and the contents of the
form in HTML form.  Give it the right mode, etc., and send it a request.  If it's
installed in the standard :file:`cgi-bin` directory, it should be possible to
send it a request by entering a URL into your browser of the form::

   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script --- perhaps
you need to install it in a different directory.  If it gives another error,
there's an installation problem that you should fix before trying to go any
further.  If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as "addr" with value "At
Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been
installed correctly.  If you follow the same procedure for your own script, you
should now be able to debug it.

The next step could be to call the :mod:`cgi` module's :func:`test` function
from your script: replace its main code with the single statement ::

   cgi.test()

This should produce the same results as those obtained from installing the
:file:`cgi.py` file itself.

When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
Python interpreter prints a nice traceback and exits.  While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server's log files, or be
discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code,
you can easily send tracebacks to the Web browser using the :mod:`cgitb` module.
If you haven't done so already, just add the line::

   import cgitb; cgitb.enable()

to the top of your script.  Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.

If you suspect that there may be a problem in importing the :mod:`cgitb` module,
you can use an even more robust approach (which only uses built-in modules)::

   import sys
   sys.stderr = sys.stdout
   print "Content-Type: text/plain"
   print
   ...your code here...

This relies on the Python interpreter to print the traceback.  The content type
of the output is set to plain text, which disables all HTML processing.  If your
script works, the raw HTML will be displayed by your client.  If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed.  Because no HTML interpretation is going on, the traceback
will be readable.

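A variation on the plain-text approach is to catch the exception yourself and
put the traceback into the response body.  This is a sketch of that variant,
not something the :mod:`cgi` module provides: ``run_as_plain_text`` and
``failing_main`` are hypothetical names, and it uses the standard
:mod:`traceback` module rather than redirecting ``sys.stderr``.

```python
import traceback


def run_as_plain_text(main):
    """Call main() and return a text/plain response, with a traceback on failure."""
    try:
        body = main()
    except Exception:
        # format_exc() returns the traceback the interpreter would have printed.
        body = traceback.format_exc()
    return "Content-Type: text/plain\n\n" + body


def failing_main():   # hypothetical script body that raises
    raise ValueError("bad form data")
```

Calling ``run_as_plain_text(failing_main)`` yields the header, a blank line,
and then the traceback text, which the browser shows unmodified.
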


Common problems and solutions
-----------------------------

* Most HTTP servers buffer the output from CGI scripts until the script is
  completed.  This means that it is not possible to display a progress report on
  the client's display while the script is running.

* Check the installation instructions above.

* Check the HTTP server's log files.  (``tail -f logfile`` in a separate window
  may be useful!)

* Always check a script for syntax errors first, by doing something like
  ``python script.py``.

* If your script does not have any syntax errors, try adding ``import cgitb;
  cgitb.enable()`` to the top of the script.

* When invoking external programs, make sure they can be found.  Usually, this
  means using absolute path names --- :envvar:`PATH` is usually not set to a very
  useful value in a CGI script.

* When reading or writing external files, make sure they can be read or written
  by the userid under which your CGI script will be running: this is typically the
  userid under which the web server is running, or some explicitly specified
  userid for a web server's ``suexec`` feature.

* Don't try to give a CGI script a set-uid mode.  This doesn't work on most
  systems, and is a security liability as well.

.. rubric:: Footnotes

.. [#] Note that some recent versions of the HTML specification do state what order the
   field values should be supplied in, but knowing whether a request was
   received from a conforming browser, or even from a browser at all, is tedious
   and error-prone.