:mod:`urllib2` --- extensible library for opening URLs
======================================================

.. module:: urllib2
   :synopsis: Next generation URL opening library.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>


.. note::
   The :mod:`urllib2` module has been split across several modules in
   Python 3.0 named :mod:`urllib.request` and :mod:`urllib.error`.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to 3.0.


The :mod:`urllib2` module defines functions and classes which help in opening
URLs (mostly HTTP) in a complex world --- basic and digest authentication,
redirections, cookies and more.

The :mod:`urllib2` module defines the following functions:


.. function:: urlopen(url[, data][, timeout])

   Open the URL *url*, which can be either a string or a :class:`Request` object.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed.  Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided.  *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   The optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used).  This actually only works for HTTP, HTTPS,
   FTP and FTPS connections.

   This function returns a file-like object with two additional methods:

   * :meth:`geturl` --- return the URL of the resource retrieved, commonly used to
     determine if a redirect was followed

   * :meth:`info` --- return the meta-information of the page, such as headers, in
     the form of an ``httplib.HTTPMessage`` instance
     (see `Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_)

   Raises :exc:`URLError` on errors.

   Note that ``None`` may be returned if no handler handles the request (though the
   default installed global :class:`OpenerDirector` uses :class:`UnknownHandler` to
   ensure this never happens).

   .. versionchanged:: 2.6
      *timeout* was added.

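As a rough illustration of the object returned by :func:`urlopen`, the sketch below opens a temporary local file through a ``file:`` URL so that no network access is needed. The fallback import exists only so the sketch also runs on Python 3, where the same API lives in :mod:`urllib.request`. Building the URL as ``"file://" + path`` assumes a POSIX-style absolute path.

```python
import os
import tempfile

try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

# Create a small local file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello from urlopen")
os.close(fd)

# Assumption: POSIX-style absolute path, so "file://" + path is a valid URL.
response = urllib2.urlopen("file://" + path)
try:
    body = response.read()          # ordinary file-like interface
    final_url = response.geturl()   # URL of the resource actually retrieved
    headers = response.info()       # header-style meta-information
finally:
    response.close()
    os.remove(path)
```

For HTTP URLs the same three calls apply; only the handler that produced the response differs.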

.. function:: install_opener(opener)

   Install an :class:`OpenerDirector` instance as the default global opener.
   Installing an opener is only necessary if you want urlopen to use that opener;
   otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`.
   The code does not check for a real :class:`OpenerDirector`, and any class with
   the appropriate interface will work.


.. function:: build_opener([handler, ...])

   Return an :class:`OpenerDirector` instance, which chains the handlers in the
   order given. *handler*\s can be either instances of :class:`BaseHandler`, or
   subclasses of :class:`BaseHandler` (in which case it must be possible to call
   the constructor without any parameters).  Instances of the following classes
   will be in front of the *handler*\s, unless the *handler*\s contain them,
   instances of them or subclasses of them: :class:`ProxyHandler`,
   :class:`UnknownHandler`, :class:`HTTPHandler`, :class:`HTTPDefaultErrorHandler`,
   :class:`HTTPRedirectHandler`, :class:`FTPHandler`, :class:`FileHandler`,
   :class:`HTTPErrorProcessor`.

   If the Python installation has SSL support (i.e., if the :mod:`ssl` module can be imported),
   :class:`HTTPSHandler` will also be added.

   Beginning in Python 2.3, a :class:`BaseHandler` subclass may also change its
   :attr:`handler_order` member variable to modify its position in the handlers
   list.

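The relationship between :func:`build_opener`, :func:`install_opener` and :func:`urlopen` can be sketched with a handler for a made-up ``dummy`` scheme. ``DummyHandler`` and the URL are hypothetical names used only for illustration, and the Python 3 fallback import is an assumption for portability.

```python
import io

try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class DummyHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up ``dummy`` scheme."""

    def dummy_open(self, req):
        # Any non-None return value counts as the response.
        return io.BytesIO(b"handled " + req.get_full_url().encode("ascii"))


# A class is accepted because its constructor takes no arguments;
# passing an instance (DummyHandler()) would work equally well.
opener = urllib2.build_opener(DummyHandler)
direct = opener.open("dummy://example/resource").read()

# After install_opener(), the module-level urlopen() uses the same opener.
urllib2.install_opener(opener)
via_urlopen = urllib2.urlopen("dummy://example/resource").read()
```

If only one part of a program needs the custom handler, calling ``opener.open()`` directly avoids changing the global opener.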
The following exceptions are raised as appropriate:


.. exception:: URLError

   The handlers raise this exception (or derived exceptions) when they run into a
   problem.  It is a subclass of :exc:`IOError`.

   .. attribute:: reason

      The reason for this error.  It can be a message string or another exception
      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
      URLs).


.. exception:: HTTPError

   Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
   can also function as a non-exceptional file-like return value (the same thing
   that :func:`urlopen` returns).  This is useful when handling exotic HTTP
   errors, such as requests for authentication.

   .. attribute:: code

      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
      This numeric value corresponds to a value found in the dictionary of
      codes as found in :attr:`BaseHTTPServer.BaseHTTPRequestHandler.responses`.

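The dual nature of :exc:`HTTPError` (an exception that can also be read like a response) might look as follows. To keep the sketch offline, a hypothetical ``NotFoundHandler`` for a made-up ``dummy`` scheme raises the error itself. The Python 3 fallback imports are an assumption for portability.

```python
import io

try:
    import urllib2                      # Python 2, the module documented here
    from urllib2 import HTTPError, URLError
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback
    from urllib.error import HTTPError, URLError


class NotFoundHandler(urllib2.BaseHandler):
    """Hypothetical handler that always fails with an HTTP-style error."""

    def dummy_open(self, req):
        raise HTTPError(req.get_full_url(), 404, "Not Found", {},
                        io.BytesIO(b"no such page"))


opener = urllib2.OpenerDirector()
opener.add_handler(NotFoundHandler())

try:
    opener.open("dummy://example/missing")
except HTTPError as err:
    status = err.code                          # numeric HTTP status
    error_body = err.read()                    # the error page, read like a file
    is_url_error = isinstance(err, URLError)   # HTTPError subclasses URLError
```

Because :exc:`HTTPError` derives from :exc:`URLError`, an ``except URLError`` clause placed first would catch it as well, so the more specific clause should come first.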


The following classes are provided:


.. class:: Request(url[, data][, headers][, origin_req_host][, unverifiable])

   This class is an abstraction of a URL request.

   *url* should be a string containing a valid URL.

   *data* may be a string specifying additional data to send to the server, or
   ``None`` if no such data is needed.  Currently HTTP requests are the only ones
   that use *data*; the HTTP request will be a POST instead of a GET when the
   *data* parameter is provided.  *data* should be a buffer in the standard
   :mimetype:`application/x-www-form-urlencoded` format.  The
   :func:`urllib.urlencode` function takes a mapping or sequence of 2-tuples and
   returns a string in this format.

   *headers* should be a dictionary, and will be treated as if :meth:`add_header`
   was called with each key and value as arguments.  This is often used to "spoof"
   the ``User-Agent`` header, which is used by a browser to identify itself --
   some HTTP servers only allow requests coming from common browsers as opposed
   to scripts.  For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
   default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).

   The final two arguments are only of interest for correct handling of third-party
   HTTP cookies:

   *origin_req_host* should be the request-host of the origin transaction, as
   defined by :rfc:`2965`.  It defaults to ``cookielib.request_host(self)``.  This
   is the host name or IP address of the original request that was initiated by the
   user.  For example, if the request is for an image in an HTML document, this
   should be the request-host of the request for the page containing the image.

   *unverifiable* should indicate whether the request is unverifiable, as defined
   by RFC 2965.  It defaults to ``False``.  An unverifiable request is one whose URL
   the user did not have the option to approve.  For example, if the request is for
   an image in an HTML document, and the user had no option to approve the
   automatic fetching of the image, this should be true.

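A minimal sketch of how the *data* and *headers* arguments influence a :class:`Request` is shown below. Constructing a request does not contact the server. The URL and the agent string are placeholders, and the Python 3 fallback imports are an assumption for portability.

```python
try:
    import urllib2                      # Python 2, the module documented here
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback
    from urllib.parse import urlencode

# urlencode() accepts a mapping or a sequence of 2-tuples.
form = urlencode([("q", "python docs"), ("page", "2")]).encode("ascii")

plain = urllib2.Request("http://www.example.com/search")
post = urllib2.Request(
    "http://www.example.com/search",
    data=form,
    headers={"User-Agent": "ExampleClient/1.0"},  # hypothetical agent string
)

# Supplying data switches the request method from GET to POST.
methods = (plain.get_method(), post.get_method())
```

Passing ``post`` to :func:`urlopen` would then send the encoded form as the request body.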

.. class:: OpenerDirector()

   The :class:`OpenerDirector` class opens URLs via :class:`BaseHandler`\ s chained
   together. It manages the chaining of handlers, and recovery from errors.


.. class:: BaseHandler()

   This is the base class for all registered handlers --- and handles only the
   simple mechanics of registration.


.. class:: HTTPDefaultErrorHandler()

   A class which defines a default handler for HTTP error responses; all responses
   are turned into :exc:`HTTPError` exceptions.


.. class:: HTTPRedirectHandler()

   A class to handle redirections.


.. class:: HTTPCookieProcessor([cookiejar])

   A class to handle HTTP Cookies.


.. class:: ProxyHandler([proxies])

   Cause requests to go through a proxy. If *proxies* is given, it must be a
   dictionary mapping protocol names to URLs of proxies. The default is to read the
   list of proxies from the environment variables :envvar:`<protocol>_proxy`.
   To disable autodetected proxy pass an empty dictionary.


.. class:: HTTPPasswordMgr()

   Keep a database of  ``(realm, uri) -> (user, password)`` mappings.


.. class:: HTTPPasswordMgrWithDefaultRealm()

   Keep a database of  ``(realm, uri) -> (user, password)`` mappings. A realm of
   ``None`` is considered a catch-all realm, which is searched if no other realm
   fits.


.. class:: AbstractBasicAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPBasicAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyBasicAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: AbstractDigestAuthHandler([password_mgr])

   This is a mixin class that helps with HTTP authentication, both to the remote
   host and to a proxy. *password_mgr*, if given, should be something that is
   compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPDigestAuthHandler([password_mgr])

   Handle authentication with the remote host. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: ProxyDigestAuthHandler([password_mgr])

   Handle authentication with the proxy. *password_mgr*, if given, should be
   something that is compatible with :class:`HTTPPasswordMgr`; refer to section
   :ref:`http-password-mgr` for information on the interface that must be
   supported.


.. class:: HTTPHandler()

   A class to handle opening of HTTP URLs.


.. class:: HTTPSHandler()

   A class to handle opening of HTTPS URLs.


.. class:: FileHandler()

   Open local files.


.. class:: FTPHandler()

   Open FTP URLs.


.. class:: CacheFTPHandler()

   Open FTP URLs, keeping a cache of open FTP connections to minimize delays.


.. class:: UnknownHandler()

   A catch-all class to handle unknown URLs.


.. _request-objects:

Request Objects
---------------

The following methods describe all of :class:`Request`'s public interface, and
so all must be overridden in subclasses.


.. method:: Request.add_data(data)

   Set the :class:`Request` data to *data*.  This is ignored by all handlers except
   HTTP handlers --- and there it should be a byte string, and will change the
   request to be ``POST`` rather than ``GET``.


.. method:: Request.get_method()

   Return a string indicating the HTTP request method.  This is only meaningful for
   HTTP requests, and currently always returns ``'GET'`` or ``'POST'``.


.. method:: Request.has_data()

   Return whether the instance has a non-\ ``None`` data.


.. method:: Request.get_data()

   Return the instance's data.


.. method:: Request.add_header(key, val)

   Add another header to the request.  Headers are currently ignored by all
   handlers except HTTP handlers, where they are added to the list of headers sent
   to the server.  Note that there cannot be more than one header with the same
   name, and later calls will overwrite previous calls in case the *key* collides.
   Currently, this is no loss of HTTP functionality, since all headers which have
   meaning when used more than once have a (header-specific) way of gaining the
   same functionality using only one header.


.. method:: Request.add_unredirected_header(key, header)

   Add a header that will not be added to a redirected request.

   .. versionadded:: 2.4


.. method:: Request.has_header(header)

   Return whether the instance has the named header (checks both regular and
   unredirected).

   .. versionadded:: 2.4

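The header methods above can be exercised without sending anything, as in the sketch below. The header names ``X-Token`` and ``X-Once`` are hypothetical. The sketch also relies on two details of the CPython implementation rather than of this section: header names are stored in ``str.capitalize()`` form, and ``Request.get_header()`` is used to read a value back. The Python 3 fallback import is an assumption for portability.

```python
try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

req = urllib2.Request("http://www.example.com/")

req.add_header("X-Token", "first")
req.add_header("X-Token", "second")          # same key: overwrites "first"
req.add_unredirected_header("X-Once", "1")   # not copied to a redirected request

# Names are stored capitalized ("X-token"), so look them up in that form.
token = req.get_header("X-token")
has_once = req.has_header("X-once")          # checks regular and unredirected headers
```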

.. method:: Request.get_full_url()

   Return the URL given in the constructor.


.. method:: Request.get_type()

   Return the type of the URL --- also known as the scheme.


.. method:: Request.get_host()

   Return the host to which a connection will be made.


.. method:: Request.get_selector()

   Return the selector --- the part of the URL that is sent to the server.


.. method:: Request.set_proxy(host, type)

   Prepare the request by connecting to a proxy server. The *host* and *type* will
   replace those of the instance, and the instance's selector will be the original
   URL given in the constructor.


.. method:: Request.get_origin_req_host()

   Return the request-host of the origin transaction, as defined by :rfc:`2965`.
   See the documentation for the :class:`Request` constructor.


.. method:: Request.is_unverifiable()

   Return whether the request is unverifiable, as defined by RFC 2965. See the
   documentation for the :class:`Request` constructor.


.. _opener-director-objects:

OpenerDirector Objects
----------------------

:class:`OpenerDirector` instances have the following methods:


.. method:: OpenerDirector.add_handler(handler)

   *handler* should be an instance of :class:`BaseHandler`.  The following methods
   are searched, and added to the possible chains (note that HTTP errors are a
   special case).

   * :meth:`protocol_open` --- signal that the handler knows how to open *protocol*
     URLs.

   * :meth:`http_error_type` --- signal that the handler knows how to handle HTTP
     errors with HTTP error code *type*.

   * :meth:`protocol_error` --- signal that the handler knows how to handle errors
     from (non-\ ``http``) *protocol*.

   * :meth:`protocol_request` --- signal that the handler knows how to pre-process
     *protocol* requests.

   * :meth:`protocol_response` --- signal that the handler knows how to
     post-process *protocol* responses.


.. method:: OpenerDirector.open(url[, data][, timeout])

   Open the given *url* (which can be a request object or a string), optionally
   passing the given *data*. Arguments, return values and exceptions raised are
   the same as those of :func:`urlopen` (which simply calls the :meth:`open`
   method on the currently installed global :class:`OpenerDirector`).  The
   optional *timeout* parameter specifies a timeout in seconds for blocking
   operations like the connection attempt (if not specified, the global default
   timeout setting will be used). The timeout feature actually works only for
   HTTP, HTTPS, FTP and FTPS connections.

   .. versionchanged:: 2.6
      *timeout* was added.


.. method:: OpenerDirector.error(proto[, arg[, ...]])

   Handle an error of the given protocol.  This will call the registered error
   handlers for the given protocol with the given arguments (which are protocol
   specific).  The HTTP protocol is a special case which uses the HTTP response
   code to determine the specific error handler; refer to the :meth:`http_error_\*`
   methods of the handler classes.

   Return values and exceptions raised are the same as those of :func:`urlopen`.

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by
sorting the handler instances.

#. Every handler with a method named like :meth:`protocol_request` has that
   method called to pre-process the request.

#. Handlers with a method named like :meth:`protocol_open` are called to handle
   the request. This stage ends when a handler either returns a non-\ :const:`None`
   value (i.e. a response), or raises an exception (usually :exc:`URLError`).
   Exceptions are allowed to propagate.

   In fact, the above algorithm is first tried for methods named
   :meth:`default_open`.  If all such methods return :const:`None`, the algorithm
   is repeated for methods named like :meth:`protocol_open`.  If all such methods
   return :const:`None`, the algorithm is repeated for methods named
   :meth:`unknown_open`.

   Note that the implementation of these methods may involve calls of the parent
   :class:`OpenerDirector` instance's :meth:`.open` and :meth:`.error` methods.

#. Every handler with a method named like :meth:`protocol_response` has that
   method called to post-process the response.

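The three stages can be observed with a small tracing handler, sketched below for a made-up ``dummy`` scheme. ``TracingHandler`` and ``calls`` are hypothetical names, and the Python 3 fallback import is an assumption for portability.

```python
import io

try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

calls = []  # records the order in which the opener invokes the methods


class TracingHandler(urllib2.BaseHandler):
    """Hypothetical handler implementing all three stages for ``dummy`` URLs."""

    def dummy_request(self, req):
        calls.append("request")
        return req                      # stage 1: must return a Request

    def dummy_open(self, req):
        calls.append("open")
        return io.BytesIO(b"payload")   # stage 2: a non-None value ends the stage

    def dummy_response(self, req, response):
        calls.append("response")
        return response                 # stage 3: return the (possibly wrapped) response


opener = urllib2.OpenerDirector()
opener.add_handler(TracingHandler())
result = opener.open("dummy://example/").read()
```

A bare :class:`OpenerDirector` is used here instead of :func:`build_opener` so that only the tracing handler takes part.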

.. _base-handler-objects:

BaseHandler Objects
-------------------

:class:`BaseHandler` objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes.  These are
intended for direct use:


.. method:: BaseHandler.add_parent(director)

   Add a director as parent.


.. method:: BaseHandler.close()

   Remove any parents.

The following members and methods should only be used by classes derived from
:class:`BaseHandler`.

.. note::

   The convention has been adopted that subclasses defining
   :meth:`protocol_request` or :meth:`protocol_response` methods are named
   :class:`\*Processor`; all others are named :class:`\*Handler`.


.. attribute:: BaseHandler.parent

   A valid :class:`OpenerDirector`, which can be used to open using a different
   protocol, or handle errors.


.. method:: BaseHandler.default_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs.

   This method, if implemented, will be called by the parent
   :class:`OpenerDirector`.  It should return a file-like object as described in
   the return value of the :meth:`open` of :class:`OpenerDirector`, or ``None``.
   It should raise :exc:`URLError`, unless a truly exceptional thing happens (for
   example, :exc:`MemoryError` should not be mapped to :exc:`URLError`).

   This method will be called before any protocol-specific open method.

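One possible use of :meth:`default_open` is a handler that answers some URLs itself and declines the rest by returning ``None``. The sketch below assumes a made-up ``dummy`` scheme. ``CannedHandler`` and ``FallbackHandler`` are hypothetical, and the Python 3 fallback import is an assumption for portability.

```python
import io

try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class CannedHandler(urllib2.BaseHandler):
    """Hypothetical catch-all handler serving canned bodies for known URLs."""

    def __init__(self, canned):
        self.canned = canned

    def default_open(self, req):
        body = self.canned.get(req.get_full_url())
        if body is None:
            return None                 # decline: protocol-specific handlers run next
        return io.BytesIO(body)


class FallbackHandler(urllib2.BaseHandler):
    """Hypothetical protocol-specific handler for ``dummy`` URLs."""

    def dummy_open(self, req):
        return io.BytesIO(b"from dummy_open")


opener = urllib2.OpenerDirector()
opener.add_handler(CannedHandler({"dummy://example/a": b"canned a"}))
opener.add_handler(FallbackHandler())

hit = opener.open("dummy://example/a").read()    # answered by default_open
miss = opener.open("dummy://example/b").read()   # falls through to dummy_open
```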

.. method:: BaseHandler.protocol_open(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to handle URLs with the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   Return values should be the same as for  :meth:`default_open`.


.. method:: BaseHandler.unknown_open(req)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to catch all URLs with no specific registered handler to
   open it.

   This method, if implemented, will be called by the :attr:`parent`
   :class:`OpenerDirector`.  Return values should be the same as for
   :meth:`default_open`.


.. method:: BaseHandler.http_error_default(req, fp, code, msg, hdrs)

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   override it if they intend to provide a catch-all for otherwise unhandled HTTP
   errors.  It will be called automatically by the  :class:`OpenerDirector` getting
   the error, and should not normally be called in other circumstances.

   *req* will be a :class:`Request` object, *fp* will be a file-like object with
   the HTTP error body, *code* will be the three-digit code of the error, *msg*
   will be the user-visible explanation of the code and *hdrs* will be a mapping
   object with the headers of the error.

   Return values and exceptions raised should be the same as those of
   :func:`urlopen`.


.. method:: BaseHandler.http_error_nnn(req, fp, code, msg, hdrs)

   *nnn* should be a three-digit HTTP error code.  This method is also not defined
   in :class:`BaseHandler`, but will be called, if it exists, on an instance of a
   subclass, when an HTTP error with code *nnn* occurs.

   Subclasses should override this method to handle specific HTTP errors.

   Arguments, return values and exceptions raised should be the same as for
   :meth:`http_error_default`.

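The dispatch between :meth:`http_error_nnn` and :meth:`http_error_default` can be sketched offline by calling :meth:`OpenerDirector.error` directly, which is roughly what happens when an HTTP error response arrives. ``PlaceholderHandler`` and the URL are hypothetical, and the Python 3 fallback imports are an assumption for portability.

```python
import io

try:
    import urllib2                      # Python 2, the module documented here
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback
    from urllib.error import HTTPError


class PlaceholderHandler(urllib2.BaseHandler):
    """Hypothetical handler that turns a 404 into a placeholder response."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        return io.BytesIO(b"placeholder for " + req.get_full_url().encode("ascii"))


opener = urllib2.OpenerDirector()
opener.add_handler(PlaceholderHandler())
opener.add_handler(urllib2.HTTPDefaultErrorHandler())

req = urllib2.Request("http://www.example.com/missing")

# Simulated error responses; no network traffic is involved.
handled = opener.error("http", req, io.BytesIO(b""), 404, "Not Found", {}).read()

try:
    opener.error("http", req, io.BytesIO(b""), 500, "Server Error", {})
except HTTPError as err:
    unhandled_code = err.code           # no http_error_500: the default handler raises
```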

.. method:: BaseHandler.protocol_request(req)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to pre-process requests of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. The return value should be a
   :class:`Request` object.


.. method:: BaseHandler.protocol_response(req, response)
   :noindex:

   This method is *not* defined in :class:`BaseHandler`, but subclasses should
   define it if they want to post-process responses of the given protocol.

   This method, if defined, will be called by the parent :class:`OpenerDirector`.
   *req* will be a :class:`Request` object. *response* will be an object
   implementing the same interface as the return value of :func:`urlopen`.  The
   return value should implement the same interface as the return value of
   :func:`urlopen`.


.. _http-redirect-handler:

HTTPRedirectHandler Objects
---------------------------

.. note::

   Some HTTP redirections require action from this module's client code.  If this
   is the case, :exc:`HTTPError` is raised.  See :rfc:`2616` for details of the
   precise meanings of the various redirection codes.


.. method:: HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

   Return a :class:`Request` or ``None`` in response to a redirect. This is called
   by the default implementations of the :meth:`http_error_30\*` methods when a
   redirection is received from the server.  If a redirection should take place,
   return a new :class:`Request` to allow :meth:`http_error_30\*` to perform the
   redirect to *newurl*.  Otherwise, raise :exc:`HTTPError` if no other handler should try to
   handle this URL, or return ``None`` if you can't but another handler might.

   .. note::

      The default implementation of this method does not strictly follow :rfc:`2616`,
      which says that 301 and 302 responses to ``POST`` requests must not be
      automatically redirected without confirmation by the user.  In reality, browsers
      do allow automatic redirection of these responses, changing the POST to a
      ``GET``, and the default implementation reproduces this behavior.

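A redirect policy can be customized by overriding :meth:`redirect_request`, as in the sketch below. In the CPython implementation the method receives the redirect target as a final *newurl* argument, which the sketch relies on. ``GetOnlyRedirectHandler`` and the URLs are hypothetical, the method is called directly so that no network access is needed, and the Python 3 fallback import is an assumption for portability.

```python
try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class GetOnlyRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical policy: follow redirects for GET, decline all others."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if req.get_method() != "GET":
            return None                 # decline; another handler might deal with it
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


target = "http://www.example.com/new"
post = urllib2.Request("http://www.example.com/old", data=b"a=1")

# Default behaviour: a 302 answer to a POST is redirected as a GET.
default_result = urllib2.HTTPRedirectHandler().redirect_request(
    post, None, 302, "Found", {}, target)
default_method = default_result.get_method()

# Custom policy: the same redirect is declined.
custom_result = GetOnlyRedirectHandler().redirect_request(
    post, None, 302, "Found", {}, target)
```

Passing the subclass to :func:`build_opener` would make it replace the default :class:`HTTPRedirectHandler`.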

.. method:: HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

   Redirect to the ``Location:`` URL.  This method is called by the parent
   :class:`OpenerDirector` when getting an HTTP 'moved permanently' response.


.. method:: HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'found' response.


.. method:: HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'see other' response.


.. method:: HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

   The same as :meth:`http_error_301`, but called for the 'temporary redirect'
   response.


.. _http-cookie-processor:

HTTPCookieProcessor Objects
---------------------------

.. versionadded:: 2.4

:class:`HTTPCookieProcessor` instances have one attribute:


.. attribute:: HTTPCookieProcessor.cookiejar

   The :class:`cookielib.CookieJar` in which cookies are stored.

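A short sketch of wiring a cookie jar into an opener follows. Nothing is fetched here, so the jar stays empty. The Python 3 fallback imports (where :mod:`cookielib` is named ``http.cookiejar``) are an assumption for portability.

```python
try:
    import urllib2                      # Python 2, the module documented here
    import cookielib
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()
processor = urllib2.HTTPCookieProcessor(jar)

# Cookies set by responses opened through this opener are stored in ``jar``.
opener = urllib2.build_opener(processor)

# When no jar is passed, the processor creates one of its own.
own_jar = urllib2.HTTPCookieProcessor().cookiejar
```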

.. _proxy-handler:

ProxyHandler Objects
--------------------


.. method:: ProxyHandler.protocol_open(request)
   :noindex:

   The :class:`ProxyHandler` will have a method :meth:`protocol_open` for every
   *protocol* which has a proxy in the *proxies* dictionary given in the
   constructor.  The method will modify requests to go through the proxy, by
   calling ``request.set_proxy()``, and call the next handler in the chain to
   actually execute the protocol.

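The per-protocol methods described above can be checked without contacting any proxy, as in this sketch. The proxy URL is a placeholder, and the Python 3 fallback import is an assumption for portability.

```python
try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

# Hypothetical proxy address; only "http" gets a proxy here.
explicit = urllib2.ProxyHandler({"http": "http://proxy.example.com:3128/"})

# An empty dictionary disables proxies, including autodetected ones.
disabled = urllib2.ProxyHandler({})

has_http = hasattr(explicit, "http_open")          # created for the "http" entry
has_ftp = hasattr(explicit, "ftp_open")            # no "ftp" entry, no method
disabled_has_http = hasattr(disabled, "http_open")

# Passing a ProxyHandler to build_opener() replaces the default one.
opener = urllib2.build_opener(disabled)
```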

.. _http-password-mgr:

HTTPPasswordMgr Objects
-----------------------

These methods are available on :class:`HTTPPasswordMgr` and
:class:`HTTPPasswordMgrWithDefaultRealm` objects.


.. method:: HTTPPasswordMgr.add_password(realm, uri, user, passwd)

   *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and
   *passwd* must be strings. This causes ``(user, passwd)`` to be used as
   authentication tokens when authentication for *realm* and a super-URI of any of
   the given URIs is given.


.. method:: HTTPPasswordMgr.find_user_password(realm, authuri)

   Get user/password for given realm and URI, if any.  This method will return
   ``(None, None)`` if there is no matching user/password.

   For :class:`HTTPPasswordMgrWithDefaultRealm` objects, the realm ``None`` will be
   searched if the given *realm* has no matching user/password.

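The lookup rules can be tried out directly on a password manager, as sketched below. The credentials, realm name and URLs are placeholders, and the Python 3 fallback import is an assumption for portability.

```python
try:
    import urllib2                      # Python 2, the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# realm=None registers the credentials for any realm below this URI.
mgr.add_password(None, "http://www.example.com/private/", "joe", "secret")

# A URI below the registered one matches, whatever the realm is called.
match = mgr.find_user_password(
    "Some Realm", "http://www.example.com/private/report.html")

# A URI on another host does not match.
no_match = mgr.find_user_password("Some Realm", "http://other.example.org/")

# The manager is typically handed to an authentication handler.
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(mgr))
```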
695 |
|
696 .. _abstract-basic-auth-handler: |
|
697 |
|
698 AbstractBasicAuthHandler Objects |
|
699 -------------------------------- |
|
700 |
|
701 |
|
702 .. method:: AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers) |
|
703 |
|
704 Handle an authentication request by getting a user/password pair, and re-trying |
|
705 the request. *authreq* should be the name of the header where the information |
|
706 about the realm is included in the request, *host* specifies the URL and path to |
|
707 authenticate for, *req* should be the (failed) :class:`Request` object, and |
|
708 *headers* should be the error headers. |
|
709 |
|
710 *host* is either an authority (e.g. ``"python.org"``) or a URL containing an |
|
711 authority component (e.g. ``"http://python.org/"``). In either case, the |
|
712 authority must not contain a userinfo component (so, ``"python.org"`` and |
|
713 ``"python.org:80"`` are fine, ``"joe:password@python.org"`` is not). |
|
714 |
|
715 |
|
716 .. _http-basic-auth-handler: |
|
717 |
|
718 HTTPBasicAuthHandler Objects |
|
719 ---------------------------- |
|
720 |
|
721 |
|
722 .. method:: HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs) |
|
723 |
|
724 Retry the request with authentication information, if available. |
|
725 |
|
726 |
|
727 .. _proxy-basic-auth-handler: |
|
728 |
|
729 ProxyBasicAuthHandler Objects |
|
730 ----------------------------- |
|
731 |
|
732 |
|
733 .. method:: ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs) |
|
734 |
|
735 Retry the request with authentication information, if available. |
|
736 |
|
737 |
|
738 .. _abstract-digest-auth-handler: |
|
739 |
|
740 AbstractDigestAuthHandler Objects |
|
741 --------------------------------- |
|
742 |
|
743 |
|
744 .. method:: AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers) |
|
745 |
|
746 *authreq* should be the name of the header where the information about the realm |
|
747 is included in the request, *host* should be the host to authenticate to, *req* |
|
748 should be the (failed) :class:`Request` object, and *headers* should be the |
|
749 error headers. |
|
750 |
|
751 |
|
752 .. _http-digest-auth-handler: |
|
753 |
|
754 HTTPDigestAuthHandler Objects |
|
755 ----------------------------- |
|
756 |
|
757 |
|
758 .. method:: HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs) |
|
759 |
|
760 Retry the request with authentication information, if available. |
|
761 |
|
762 |
|
763 .. _proxy-digest-auth-handler: |
|
764 |
|
765 ProxyDigestAuthHandler Objects |
|
766 ------------------------------ |
|
767 |
|
768 |
|
769 .. method:: ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs) |
|
770 |
|
771 Retry the request with authentication information, if available. |
|
772 |
|
773 |
|
.. _http-handler-objects:

HTTPHandler Objects
-------------------


.. method:: HTTPHandler.http_open(req)

   Send an HTTP request, which can be either GET or POST, depending on
   ``req.has_data()``.


.. _https-handler-objects:

HTTPSHandler Objects
--------------------


.. method:: HTTPSHandler.https_open(req)

   Send an HTTPS request, which can be either GET or POST, depending on
   ``req.has_data()``.

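As a small illustration of how the request method follows from the presence of data, the sketch below builds two :class:`Request` objects and inspects them with :meth:`Request.get_method`, without opening any connection. The ``try``/``except`` import is an assumption added only so the snippet also runs on Python 3, where the module is named :mod:`urllib.request`; the URL and the payload are placeholders.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name (assumed fallback)

# No data attached: the handler would send a GET request.
plain = urllib2.Request('http://www.example.com/')

# Data attached: the handler would send a POST request instead.
form = urllib2.Request('http://www.example.com/', data=b'spam=1&eggs=2')

methods = (plain.get_method(), form.get_method())
print(methods)
```

The same rule applies to both :class:`HTTPHandler` and :class:`HTTPSHandler`, since only the :class:`Request` object is consulted.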


.. _file-handler-objects:

FileHandler Objects
-------------------


.. method:: FileHandler.file_open(req)

   Open the file locally, if there is no host name, or the host name is
   ``'localhost'``. Change the protocol to ``ftp`` otherwise, and retry opening it
   using :attr:`parent`.


.. _ftp-handler-objects:

FTPHandler Objects
------------------


.. method:: FTPHandler.ftp_open(req)

   Open the FTP file indicated by *req*. The login is always done with empty
   username and password.


.. _cacheftp-handler-objects:

CacheFTPHandler Objects
-----------------------

:class:`CacheFTPHandler` objects are :class:`FTPHandler` objects with the
following additional methods:


.. method:: CacheFTPHandler.setTimeout(t)

   Set timeout of connections to *t* seconds.


.. method:: CacheFTPHandler.setMaxConns(m)

   Set maximum number of cached connections to *m*.

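A minimal sketch of configuring a :class:`CacheFTPHandler` and handing it to :func:`build_opener`. The values 30 and 4 are arbitrary, and the ``try``/``except`` import is an assumption added so the snippet also runs on Python 3 (:mod:`urllib.request`). Nothing is fetched here.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name (assumed fallback)

ftp_handler = urllib2.CacheFTPHandler()
ftp_handler.setTimeout(30)   # timeout of connections, in seconds
ftp_handler.setMaxConns(4)   # keep at most four cached connections

# Passing the instance to build_opener makes it the opener's FTP handler;
# the default FTPHandler is left out because a subclass instance was given.
opener = urllib2.build_opener(ftp_handler)
```

The resulting ``opener`` can then be used with ``opener.open()`` or installed globally with :func:`install_opener`.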


.. _unknown-handler-objects:

UnknownHandler Objects
----------------------


.. method:: UnknownHandler.unknown_open()

   Raise a :exc:`URLError` exception.


.. _http-error-processor-objects:

HTTPErrorProcessor Objects
--------------------------

.. versionadded:: 2.4


.. method:: HTTPErrorProcessor.http_response(request, response)

   Process HTTP error responses.

   For a 200 status code, the response object is returned immediately.

   For non-200 status codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
   :exc:`HTTPError` if no other handler handles the error.

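The dispatch described above can be sketched without any network access by calling :meth:`OpenerDirector.error` by hand. ``NotFoundHandler`` is a hypothetical handler written for this illustration, the URL is a placeholder, and the ``try``/``except`` import is an assumption so the snippet also runs on Python 3 (:mod:`urllib.request`).

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name (assumed fallback)


class NotFoundHandler(urllib2.BaseHandler):
    """Hypothetical handler that claims 404 responses."""

    def http_error_404(self, req, fp, code, msg, hdrs):
        # A non-None return value stops the chain and becomes the result.
        return 'handled %d' % code


opener = urllib2.build_opener(NotFoundHandler())
req = urllib2.Request('http://www.example.com/missing')

# Dispatch by hand what the processor would dispatch for a real response:
# arguments are (req, fp, code, msg, hdrs) after the protocol name.
handled = opener.error('http', req, None, 404, 'Not Found', {})

# No handler claims 500, so HTTPDefaultErrorHandler raises HTTPError.
fallback_code = None
try:
    opener.error('http', req, None, 500, 'Server Error', {})
except urllib2.HTTPError as exc:
    fallback_code = exc.code
```

In normal use these calls are made for you; the sketch only shows which handler method ends up being chosen for a given status code.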


.. _urllib2-examples:

Examples
--------

This example gets the python.org main page and displays the first 100 bytes of
it::

   >>> import urllib2
   >>> f = urllib2.urlopen('http://www.python.org/')
   >>> print f.read(100)
   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   <?xml-stylesheet href="./css/ht2html

This example sends a data stream to the stdin of a CGI script and reads the
data it returns. Note that this example will only work when the Python
installation supports SSL. ::

   >>> import urllib2
   >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
   ...                       data='This data is passed to stdin of the CGI')
   >>> f = urllib2.urlopen(req)
   >>> print f.read()
   Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is::

   #!/usr/bin/env python
   import sys
   data = sys.stdin.read()
   print 'Content-type: text/plain\n\nGot Data: "%s"' % data

Use of Basic HTTP Authentication::

   import urllib2
   # Create an OpenerDirector with support for Basic HTTP Authentication...
   auth_handler = urllib2.HTTPBasicAuthHandler()
   auth_handler.add_password(realm='PDQ Application',
                             uri='https://mahler:8092/site-updates.py',
                             user='klem',
                             passwd='kadidd!ehopper')
   opener = urllib2.build_opener(auth_handler)
   # ...and install it globally so it can be used with urlopen.
   urllib2.install_opener(opener)
   urllib2.urlopen('http://www.example.com/login.html')

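Credentials registered with ``add_password`` are looked up by realm and URI. The following sketch, which performs no network access, creates the password manager explicitly and queries it with :meth:`HTTPPasswordMgr.find_user_password` to show which URLs the stored pair would apply to. It reuses the realm, URI and credentials from the example above; the ``try``/``except`` import is an assumption so the snippet also runs on Python 3 (:mod:`urllib.request`).

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name (assumed fallback)

password_mgr = urllib2.HTTPPasswordMgr()
password_mgr.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# Same realm and URI as registered: the stored pair is returned.
match = password_mgr.find_user_password(
    'PDQ Application', 'https://mahler:8092/site-updates.py')

# A different host has no entry, so the lookup yields (None, None).
miss = password_mgr.find_user_password(
    'PDQ Application', 'http://www.example.com/login.html')
```

A handler built this way only has credentials to offer when the server's realm and the requested URI match an entry in the manager.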

:func:`build_opener` provides many handlers by default, including a
:class:`ProxyHandler`. By default, :class:`ProxyHandler` uses the environment
variables named ``<scheme>_proxy``, where ``<scheme>`` is the URL scheme
involved. For example, the :envvar:`http_proxy` environment variable is read to
obtain the HTTP proxy's URL.

This example replaces the default :class:`ProxyHandler` with one that uses
programmatically supplied proxy URLs, and adds proxy authorization support with
:class:`ProxyBasicAuthHandler`. ::

   import urllib2
   proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
   proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
   proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

   opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
   # This time, rather than install the OpenerDirector, we use it directly:
   opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the *headers* argument to the :class:`Request` constructor, or::

   import urllib2
   req = urllib2.Request('http://www.example.com/')
   req.add_header('Referer', 'http://www.python.org/')
   r = urllib2.urlopen(req)

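For comparison, a minimal sketch of the first option, passing the header through the *headers* argument of the :class:`Request` constructor. The request is only constructed, not sent; the ``try``/``except`` import is an assumption so the snippet also runs on Python 3 (:mod:`urllib.request`).

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 name (assumed fallback)

# Supply the header up front instead of calling add_header() afterwards.
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/'})

# The header is stored on the request and can be read back.
referer = req.get_header('Referer')
```

Both forms leave the request in the same state, so the choice is mostly a matter of whether the headers are known when the :class:`Request` is created.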

:class:`OpenerDirector` automatically adds a :mailheader:`User-Agent` header to
every :class:`Request`. To change this::

   import urllib2
   opener = urllib2.build_opener()
   opener.addheaders = [('User-agent', 'Mozilla/5.0')]
   opener.open('http://www.example.com/')

Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`).
