symbian-qemu-0.9.1-12/python-2.6.1/Doc/library/urlparse.rst
changeset 1 2fb8b9db1c86
0:ffa851df0825 1:2fb8b9db1c86
       
     1 :mod:`urlparse` --- Parse URLs into components
       
     2 ==============================================
       
     3 
       
     4 .. module:: urlparse
       
     5    :synopsis: Parse URLs into or assemble them from components.
       
     6 
       
     7 
       
     8 .. index::
       
     9    single: WWW
       
    10    single: World Wide Web
       
    11    single: URL
       
    12    pair: URL; parsing
       
    13    pair: relative; URL
       
    14 
       
    15 .. note::
       
    16    The :mod:`urlparse` module is renamed to :mod:`urllib.parse` in Python 3.0.
       
    17    The :term:`2to3` tool will automatically adapt imports when converting
       
    18    your sources to 3.0.
       
    19 
       
    20 
       
    21 This module defines a standard interface to break Uniform Resource Locator (URL)
       
     22 strings up into components (addressing scheme, network location, path etc.), to
       
    23 combine the components back into a URL string, and to convert a "relative URL"
       
    24 to an absolute URL given a "base URL."
       
    25 
       
    26 The module has been designed to match the Internet RFC on Relative Uniform
       
    27 Resource Locators (and discovered a bug in an earlier draft!). It supports the
       
    28 following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
       
     29 ``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
       
     30 ``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
       
     31 ``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
       
    32 
       
    33 .. versionadded:: 2.5
       
    34    Support for the ``sftp`` and ``sips`` schemes.
       
    35 
       
    36 The :mod:`urlparse` module defines the following functions:
       
    37 
       
    38 
       
    39 .. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
       
    40 
       
    41    Parse a URL into six components, returning a 6-tuple.  This corresponds to the
       
    42    general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
       
     43    Each tuple item is a string, possibly empty. The components are not broken up into
       
    44    smaller parts (for example, the network location is a single string), and %
       
    45    escapes are not expanded. The delimiters as shown above are not part of the
       
    46    result, except for a leading slash in the *path* component, which is retained if
       
    47    present.  For example:
       
    48 
       
    49       >>> from urlparse import urlparse
       
    50       >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
       
    51       >>> o   # doctest: +NORMALIZE_WHITESPACE
       
    52       ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
       
    53                   params='', query='', fragment='')
       
    54       >>> o.scheme
       
    55       'http'
       
    56       >>> o.port
       
    57       80
       
    58       >>> o.geturl()
       
    59       'http://www.cwi.nl:80/%7Eguido/Python.html'
       
    60 
       
    61    If the *default_scheme* argument is specified, it gives the default addressing
       
    62    scheme, to be used only if the URL does not specify one.  The default value for
       
    63    this argument is the empty string.
       
    64 
       
    65    If the *allow_fragments* argument is false, fragment identifiers are not
       
    66    allowed, even if the URL's addressing scheme normally does support them.  The
       
    67    default value for this argument is :const:`True`.
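
The effect of the two optional arguments can be sketched as follows. This is a minimal illustration, not part of the original text; the import fallback to ``urllib.parse`` is an assumption added so that the snippet also runs on Python 3, and the URLs are made up for the example.

```python
try:
    from urlparse import urlparse        # Python 2 module name
except ImportError:
    from urllib.parse import urlparse    # name of the module after the rename

# The URL carries no scheme, so the default_scheme argument is used.
defaulted = urlparse('//www.cwi.nl/%7Eguido/Python.html', 'http')

# The URL names its own scheme, so the default is ignored.
explicit = urlparse('ftp://www.cwi.nl/%7Eguido/Python.html', 'http')

# With allow_fragments false, '#' is not treated as a delimiter and
# stays in the preceding component (here, the path).
unsplit = urlparse('http://www.cwi.nl/doc#intro', '', False)

print(defaulted.scheme)        # http
print(explicit.scheme)         # ftp
print(repr(unsplit.path))      # '/doc#intro'
print(repr(unsplit.fragment))  # ''
```
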
       
    68 
       
    69    The return value is actually an instance of a subclass of :class:`tuple`.  This
       
    70    class has the following additional read-only convenience attributes:
       
    71 
       
    72    +------------------+-------+--------------------------+----------------------+
       
    73    | Attribute        | Index | Value                    | Value if not present |
       
    74    +==================+=======+==========================+======================+
       
    75    | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
       
    76    +------------------+-------+--------------------------+----------------------+
       
    77    | :attr:`netloc`   | 1     | Network location part    | empty string         |
       
    78    +------------------+-------+--------------------------+----------------------+
       
    79    | :attr:`path`     | 2     | Hierarchical path        | empty string         |
       
    80    +------------------+-------+--------------------------+----------------------+
       
    81    | :attr:`params`   | 3     | Parameters for last path | empty string         |
       
    82    |                  |       | element                  |                      |
       
    83    +------------------+-------+--------------------------+----------------------+
       
    84    | :attr:`query`    | 4     | Query component          | empty string         |
       
    85    +------------------+-------+--------------------------+----------------------+
       
    86    | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
       
    87    +------------------+-------+--------------------------+----------------------+
       
    88    | :attr:`username` |       | User name                | :const:`None`        |
       
    89    +------------------+-------+--------------------------+----------------------+
       
    90    | :attr:`password` |       | Password                 | :const:`None`        |
       
    91    +------------------+-------+--------------------------+----------------------+
       
    92    | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
       
    93    +------------------+-------+--------------------------+----------------------+
       
    94    | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
       
    95    |                  |       | if present               |                      |
       
    96    +------------------+-------+--------------------------+----------------------+
       
    97 
       
    98    See section :ref:`urlparse-result-object` for more information on the result
       
    99    object.
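
As an illustration of the attribute table above, the sketch below reads the convenience attributes from a URL that includes credentials and a port, and from one that does not. The URLs and credentials are invented for the example, and the ``urllib.parse`` fallback is an assumption for running under Python 3.

```python
try:
    from urlparse import urlparse        # Python 2 module name
except ImportError:
    from urllib.parse import urlparse    # name of the module after the rename

full = urlparse('http://guido:secret@WWW.CWI.NL:8080/%7Eguido/Python.html')

# Indexed items and named attributes refer to the same strings.
same_item = full[1] == full.netloc

# The netloc is kept as written; hostname is the lower-cased host only.
print(full.netloc)     # guido:secret@WWW.CWI.NL:8080
print(full.username)   # guido
print(full.password)   # secret
print(full.hostname)   # www.cwi.nl
print(full.port)       # 8080

# Attributes outside the tuple are None when the part is absent.
bare = urlparse('http://www.cwi.nl/')
print(bare.username, bare.port)
```
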
       
   100 
       
   101    .. versionchanged:: 2.5
       
   102       Added attributes to return value.
       
   103 
       
   104 .. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]])
       
   105 
       
   106    Parse a query string given as a string argument (data of type
       
   107    :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
       
   108    dictionary.  The dictionary keys are the unique query variable names and the
       
   109    values are lists of values for each name.
       
   110 
       
   111    The optional argument *keep_blank_values* is a flag indicating whether blank
       
    112    values in URL encoded queries should be treated as blank strings.  A true value
       
    113    indicates that blanks should be retained as blank strings.  The default false
       
   114    value indicates that blank values are to be ignored and treated as if they were
       
   115    not included.
       
   116 
       
   117    The optional argument *strict_parsing* is a flag indicating what to do with
       
   118    parsing errors.  If false (the default), errors are silently ignored.  If true,
       
   119    errors raise a :exc:`ValueError` exception.
       
   120 
       
   121    Use the :func:`urllib.urlencode` function to convert such dictionaries into
       
   122    query strings.
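
A short sketch of how the two flags change the result, and of the round trip through ``urlencode``. The query strings are invented for the example; the fallback import from ``urllib.parse`` is an assumption for running under Python 3, where ``urlencode`` lives in the same module.

```python
try:
    from urlparse import parse_qs        # Python 2 module name
    from urllib import urlencode
except ImportError:
    from urllib.parse import parse_qs, urlencode

query = 'name=guido&lang=python&lang=c&empty='

# Default: repeated names are collected into one list, blank values dropped.
default = parse_qs(query)

# keep_blank_values true: the blank value is kept as an empty string.
kept = parse_qs(query, keep_blank_values=True)

# Default (lenient) parsing silently skips a field that has no '='.
lenient = parse_qs('name=guido&broken')

# strict_parsing true: the same input raises ValueError.
try:
    parse_qs('name=guido&broken', strict_parsing=True)
    strict_error = None
except ValueError as exc:
    strict_error = exc

# doseq=True makes urlencode() emit one pair per list element.
encoded = urlencode({'lang': ['python', 'c']}, doseq=True)
print(encoded)   # lang=python&lang=c
```
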
       
   123 
       
   124 
       
   125 .. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]])
       
   126 
       
   127    Parse a query string given as a string argument (data of type
       
   128    :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
       
   129    name, value pairs.
       
   130 
       
   131    The optional argument *keep_blank_values* is a flag indicating whether blank
       
    132    values in URL encoded queries should be treated as blank strings.  A true value
       
    133    indicates that blanks should be retained as blank strings.  The default false
       
   134    value indicates that blank values are to be ignored and treated as if they were
       
   135    not included.
       
   136 
       
   137    The optional argument *strict_parsing* is a flag indicating what to do with
       
   138    parsing errors.  If false (the default), errors are silently ignored.  If true,
       
   139    errors raise a :exc:`ValueError` exception.
       
   140 
       
   141    Use the :func:`urllib.urlencode` function to convert such lists of pairs into
       
   142    query strings.
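
Because the result is a list of pairs, the order of the fields and any repeated names are preserved, which a dictionary cannot guarantee. A minimal sketch with an invented query string (the ``urllib.parse`` fallback is an assumption for Python 3):

```python
try:
    from urlparse import parse_qsl       # Python 2 module name
    from urllib import urlencode
except ImportError:
    from urllib.parse import parse_qsl, urlencode

# Pairs come back in the order they appear in the query string.
pairs = parse_qsl('x=1&y=2&x=3')
print(pairs)            # [('x', '1'), ('y', '2'), ('x', '3')]

# Blank values are kept only when asked for.
with_blank = parse_qsl('x=1&y=', keep_blank_values=True)

# A list of pairs can be passed straight back to urlencode().
rebuilt = urlencode(pairs)
print(rebuilt)          # x=1&y=2&x=3
```
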
       
   143 
       
   144 .. function:: urlunparse(parts)
       
   145 
       
    146    Construct a URL from a tuple as returned by :func:`urlparse`. The *parts* argument
       
   147    can be any six-item iterable. This may result in a slightly different, but
       
   148    equivalent URL, if the URL that was parsed originally had unnecessary delimiters
       
    149    (for example, a ``?`` with an empty query; the RFC states that these are
       
   150    equivalent).
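
The "slightly different, but equivalent" case can be seen in a small sketch: a trailing ``?`` with an empty query does not survive a parse/unparse round trip. The second call shows that a plain six-item list is accepted. The URLs are illustrative, and the ``urllib.parse`` fallback is an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlunparse        # Python 2 module name
except ImportError:
    from urllib.parse import urlparse, urlunparse

# The empty query delimiter is dropped when the URL is reassembled.
original = 'http://www.cwi.nl/%7Eguido/Python.html?'
rebuilt = urlunparse(urlparse(original))
print(rebuilt)   # http://www.cwi.nl/%7Eguido/Python.html

# Any six-item iterable works: scheme, netloc, path, params, query, fragment.
built = urlunparse(['http', 'www.cwi.nl', '/%7Eguido/Python.html',
                    '', 'q=1', 'top'])
print(built)     # http://www.cwi.nl/%7Eguido/Python.html?q=1#top
```
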
       
   151 
       
   152 
       
   153 .. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]])
       
   154 
       
   155    This is similar to :func:`urlparse`, but does not split the params from the URL.
       
   156    This should generally be used instead of :func:`urlparse` if the more recent URL
       
   157    syntax allowing parameters to be applied to each segment of the *path* portion
       
   158    of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
       
   159    separate the path segments and parameters.  This function returns a 5-tuple:
       
   160    (addressing scheme, network location, path, query, fragment identifier).
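
The difference from :func:`urlparse` shows up when path segments carry parameters. The sketch below parses the same invented URL with both functions. ``split_segment_params`` is a hypothetical helper, not part of the module, showing one possible way to write the "separate function" mentioned above. The ``urllib.parse`` fallback is an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlsplit          # Python 2 module name
except ImportError:
    from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#top'

six = urlparse(url)    # splits the params off the last path segment only
five = urlsplit(url)   # leaves every segment's parameters in the path

print(six.path, six.params)   # /a;x=1/b y=2
print(five.path)              # /a;x=1/b;y=2


def split_segment_params(path):
    """Hypothetical helper: return (segment, params) pairs per path segment."""
    pairs = []
    for segment in path.split('/'):
        if not segment:
            continue                      # skip the empty leading segment
        name, _, params = segment.partition(';')
        pairs.append((name, params))
    return pairs


segments = split_segment_params(five.path)
print(segments)               # [('a', 'x=1'), ('b', 'y=2')]
```
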
       
   161 
       
   162    The return value is actually an instance of a subclass of :class:`tuple`.  This
       
   163    class has the following additional read-only convenience attributes:
       
   164 
       
   165    +------------------+-------+-------------------------+----------------------+
       
   166    | Attribute        | Index | Value                   | Value if not present |
       
   167    +==================+=======+=========================+======================+
       
   168    | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
       
   169    +------------------+-------+-------------------------+----------------------+
       
   170    | :attr:`netloc`   | 1     | Network location part   | empty string         |
       
   171    +------------------+-------+-------------------------+----------------------+
       
   172    | :attr:`path`     | 2     | Hierarchical path       | empty string         |
       
   173    +------------------+-------+-------------------------+----------------------+
       
   174    | :attr:`query`    | 3     | Query component         | empty string         |
       
   175    +------------------+-------+-------------------------+----------------------+
       
   176    | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
       
   177    +------------------+-------+-------------------------+----------------------+
       
   178    | :attr:`username` |       | User name               | :const:`None`        |
       
   179    +------------------+-------+-------------------------+----------------------+
       
   180    | :attr:`password` |       | Password                | :const:`None`        |
       
   181    +------------------+-------+-------------------------+----------------------+
       
   182    | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
       
   183    +------------------+-------+-------------------------+----------------------+
       
   184    | :attr:`port`     |       | Port number as integer, | :const:`None`        |
       
   185    |                  |       | if present              |                      |
       
   186    +------------------+-------+-------------------------+----------------------+
       
   187 
       
   188    See section :ref:`urlparse-result-object` for more information on the result
       
   189    object.
       
   190 
       
   191    .. versionadded:: 2.2
       
   192 
       
   193    .. versionchanged:: 2.5
       
   194       Added attributes to return value.
       
   195 
       
   196 
       
   197 .. function:: urlunsplit(parts)
       
   198 
       
   199    Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
       
   200    URL as a string. The *parts* argument can be any five-item iterable. This may
       
   201    result in a slightly different, but equivalent URL, if the URL that was parsed
       
    202    originally had unnecessary delimiters (for example, a ``?`` with an empty query; the
       
   203    RFC states that these are equivalent).
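
Since *parts* can be any five-item iterable, a common pattern is to copy the split result into a list, change one item, and reassemble. A minimal sketch with an invented URL (the ``urllib.parse`` fallback is an assumption for Python 3):

```python
try:
    from urlparse import urlsplit, urlunsplit        # Python 2 module name
except ImportError:
    from urllib.parse import urlsplit, urlunsplit

# Copy the five components into a mutable list.
parts = list(urlsplit('http://www.python.org/doc/?'))

# Item 0 is the scheme (see the index column of the table above).
parts[0] = 'https'

# The empty query and its '?' delimiter are not reproduced.
secure = urlunsplit(parts)
print(secure)   # https://www.python.org/doc/
```
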
       
   204 
       
   205    .. versionadded:: 2.2
       
   206 
       
   207 
       
   208 .. function:: urljoin(base, url[, allow_fragments])
       
   209 
       
   210    Construct a full ("absolute") URL by combining a "base URL" (*base*) with
       
   211    another URL (*url*).  Informally, this uses components of the base URL, in
       
   212    particular the addressing scheme, the network location and (part of) the path,
       
   213    to provide missing components in the relative URL.  For example:
       
   214 
       
   215       >>> from urlparse import urljoin
       
   216       >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
       
   217       'http://www.cwi.nl/%7Eguido/FAQ.html'
       
   218 
       
   219    The *allow_fragments* argument has the same meaning and default as for
       
   220    :func:`urlparse`.
       
   221 
       
   222    .. note::
       
   223 
       
   224       If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
       
   225       the *url*'s host name and/or scheme will be present in the result.  For example:
       
   226 
       
   227    .. doctest::
       
   228 
       
   229       >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
       
   230       ...         '//www.python.org/%7Eguido')
       
   231       'http://www.python.org/%7Eguido'
       
   232 
       
   233    If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
       
   234    :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
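
One way to do the preprocessing described above is sketched here. ``join_same_host`` is a hypothetical helper name, not part of the module, and the sketch makes no attempt to handle every unusual input. The ``urllib.parse`` fallback is an assumption for Python 3.

```python
try:
    from urlparse import urljoin, urlsplit, urlunsplit   # Python 2 module name
except ImportError:
    from urllib.parse import urljoin, urlsplit, urlunsplit


def join_same_host(base, url):
    """Hypothetical helper: join *url* to *base*, ignoring url's scheme and host."""
    parts = urlsplit(url)
    # Rebuild the URL with empty scheme and netloc, keeping the rest.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


base = 'http://www.cwi.nl/%7Eguido/Python.html'

# The foreign host is discarded; only the path is applied to the base.
print(join_same_host(base, '//www.python.org/%7Eguido'))
# http://www.cwi.nl/%7Eguido

# Plain relative URLs behave as with urljoin() itself.
print(join_same_host(base, 'FAQ.html'))
# http://www.cwi.nl/%7Eguido/FAQ.html
```
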
       
   235 
       
   236 
       
   237 .. function:: urldefrag(url)
       
   238 
       
   239    If *url* contains a fragment identifier, returns a modified version of *url*
       
   240    with no fragment identifier, and the fragment identifier as a separate string.
       
   241    If there is no fragment identifier in *url*, returns *url* unmodified and an
       
   242    empty string.
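
Both cases can be sketched briefly. The result is unpacked into two names, which works whether the return value is a plain tuple or a tuple subclass. The URLs are illustrative, and the ``urllib.parse`` fallback is an assumption for Python 3.

```python
try:
    from urlparse import urldefrag       # Python 2 module name
except ImportError:
    from urllib.parse import urldefrag

# A fragment is present: it is split off and returned separately.
page, fragment = urldefrag('http://www.python.org/doc/#intro')
print(page)       # http://www.python.org/doc/
print(fragment)   # intro

# No fragment: the URL comes back unchanged with an empty string.
same, nothing = urldefrag('http://www.python.org/doc/')
```
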
       
   243 
       
   244 
       
   245 .. seealso::
       
   246 
       
   247    :rfc:`1738` - Uniform Resource Locators (URL)
       
   248       This specifies the formal syntax and semantics of absolute URLs.
       
   249 
       
   250    :rfc:`1808` - Relative Uniform Resource Locators
       
   251       This Request For Comments includes the rules for joining an absolute and a
       
   252       relative URL, including a fair number of "Abnormal Examples" which govern the
       
   253       treatment of border cases.
       
   254 
       
   255    :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
       
   256       Document describing the generic syntactic requirements for both Uniform Resource
       
   257       Names (URNs) and Uniform Resource Locators (URLs).
       
   258 
       
   259 
       
   260 .. _urlparse-result-object:
       
   261 
       
   262 Results of :func:`urlparse` and :func:`urlsplit`
       
   263 ------------------------------------------------
       
   264 
       
   265 The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
       
   266 subclasses of the :class:`tuple` type.  These subclasses add the attributes
       
    267 described for those functions and provide an additional method:
       
   268 
       
   269 
       
   270 .. method:: ParseResult.geturl()
       
   271 
       
   272    Return the re-combined version of the original URL as a string. This may differ
       
   273    from the original URL in that the scheme will always be normalized to lower case
       
   274    and empty components may be dropped. Specifically, empty parameters, queries,
       
   275    and fragment identifiers will be removed.
       
   276 
       
   277    The result of this method is a fixpoint if passed back through the original
       
   278    parsing function:
       
   279 
       
   280       >>> import urlparse
       
   281       >>> url = 'HTTP://www.Python.org/doc/#'
       
   282 
       
   283       >>> r1 = urlparse.urlsplit(url)
       
   284       >>> r1.geturl()
       
   285       'http://www.Python.org/doc/'
       
   286 
       
   287       >>> r2 = urlparse.urlsplit(r1.geturl())
       
   288       >>> r2.geturl()
       
   289       'http://www.Python.org/doc/'
       
   290 
       
   291    .. versionadded:: 2.5
       
   292 
       
    293 The following classes provide the implementations of the parse results:
       
   294 
       
   295 
       
   296 .. class:: BaseResult
       
   297 
       
   298    Base class for the concrete result classes.  This provides most of the attribute
       
   299    definitions.  It does not provide a :meth:`geturl` method.  It is derived from
       
   300    :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
       
   301    methods.
       
   302 
       
   303 
       
   304 .. class:: ParseResult(scheme, netloc, path, params, query, fragment)
       
   305 
       
   306    Concrete class for :func:`urlparse` results.  The :meth:`__new__` method is
       
   307    overridden to support checking that the right number of arguments are passed.
       
   308 
       
   309 
       
   310 .. class:: SplitResult(scheme, netloc, path, query, fragment)
       
   311 
       
   312    Concrete class for :func:`urlsplit` results.  The :meth:`__new__` method is
       
   313    overridden to support checking that the right number of arguments are passed.
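
The concrete classes can also be instantiated directly, as in the sketch below. The component values are invented. The snippet assumes that a wrong number of arguments is reported as :exc:`TypeError`, which is what tuple-based implementations of these classes are expected to raise. The ``urllib.parse`` fallback is likewise an assumption for Python 3.

```python
try:
    from urlparse import ParseResult, SplitResult        # Python 2 module name
except ImportError:
    from urllib.parse import ParseResult, SplitResult

# Five components for SplitResult, six for ParseResult.
split = SplitResult('http', 'www.python.org', '/doc/', '', '')
parsed = ParseResult('http', 'www.python.org', '/doc/', '', '', '')

# Both are tuples and both provide geturl().
print(isinstance(split, tuple))   # True
print(split.geturl())             # http://www.python.org/doc/
print(parsed.geturl())            # http://www.python.org/doc/

# Assumption: passing too few components raises TypeError.
try:
    SplitResult('http', 'www.python.org')
    arity_checked = False
except TypeError:
    arity_checked = True
```
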
       
   314