MCL/sf/adapt/qemu: comparison symbian-qemu-0.9.1-12/python-2.6.1/Doc/library/htmlparser.rst


-:ffa851df0825
+:2fb8b9db1c86
+:mod:`HTMLParser` --- Simple HTML and XHTML parser
+==================================================
+.. module:: HTMLParser
+:synopsis: A simple parser that can handle HTML and XHTML.
+.. note::
+The :mod:`HTMLParser` module has been renamed to :mod:`html.parser` in Python
+3.0.  The :term:`2to3` tool will automatically adapt imports when converting
+your sources to 3.0.
+.. versionadded:: 2.2
+.. index::
+single: HTML
+single: XHTML
+This module defines a class :class:`HTMLParser` which serves as the basis for
+parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
+Unlike the parser in :mod:`htmllib`, this parser is not based on the SGML parser
+in :mod:`sgmllib`.
+.. class:: HTMLParser()
+The :class:`HTMLParser` class is instantiated without arguments.
+An :class:`HTMLParser` instance is fed HTML data and calls handler methods when tags
+begin and end.  The :class:`HTMLParser` class is meant to be subclassed by the
+user, who overrides these handlers to provide the desired behavior.
+Unlike the parser in :mod:`htmllib`, this parser does not check that end tags
+match start tags or call the end-tag handler for elements which are closed
+implicitly by closing an outer element.
+An exception is defined as well:
+.. exception:: HTMLParseError
+Exception raised by the :class:`HTMLParser` class when it encounters an error
+while parsing.  This exception provides three attributes: :attr:`msg` is a brief
+message explaining the error, :attr:`lineno` is the number of the line on which
+the broken construct was detected, and :attr:`offset` is the number of
+characters into the line at which the construct starts.
+:class:`HTMLParser` instances have the following methods:
+.. method:: HTMLParser.reset()
+Reset the instance.  Loses all unprocessed data.  This is called implicitly at
+instantiation time.
+.. method:: HTMLParser.feed(data)
+Feed some text to the parser.  It is processed insofar as it consists of
+complete elements; incomplete data is buffered until more data is fed or
+:meth:`close` is called.
+.. method:: HTMLParser.close()
+Force processing of all buffered data as if it were followed by an end-of-file
+mark.  This method may be redefined by a derived class to define additional
+processing at the end of the input, but the redefined version should always call
+the :class:`HTMLParser` base class method :meth:`close`.
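The buffering described for :meth:`feed` and :meth:`close` can be sketched as follows. This is a minimal illustration, not part of the module: the ``EventCollector`` class and its ``events`` attribute are hypothetical names, and the fallback import is an assumption added so the sketch runs under either module name mentioned in the note at the top.

```python
try:
    from HTMLParser import HTMLParser   # module name documented here
except ImportError:
    from html.parser import HTMLParser  # renamed module (assumption: Python 3)


class EventCollector(HTMLParser):
    """Hypothetical subclass that records start tags instead of printing."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append((tag, attrs))


parser = EventCollector()
parser.feed('<a hr')       # incomplete start tag: buffered, no handler call yet
events_after_first_feed = list(parser.events)
parser.feed('ef="x">')     # the tag is now complete and gets processed
parser.close()             # flush anything still buffered
events_after_second_feed = list(parser.events)
```

The first call leaves ``events`` empty because the start tag is not yet complete; the handler only fires once the rest of the tag arrives.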
+.. method:: HTMLParser.getpos()
+Return the current line number and offset as a ``(lineno, offset)`` tuple.
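As a rough sketch of how :meth:`getpos` might be used, the hypothetical ``PositionRecorder`` subclass below stores the position reported while each start tag is being handled. The class name, the ``positions`` attribute, and the fallback import are assumptions made for illustration; the sketch assumes line numbers start at 1 and offsets at 0.

```python
try:
    from HTMLParser import HTMLParser   # module name documented here
except ImportError:
    from html.parser import HTMLParser  # renamed module (assumption: Python 3)


class PositionRecorder(HTMLParser):
    """Hypothetical subclass that records where each start tag was seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() is queried while the tag is being handled
        self.positions.append((tag, self.getpos()))


recorder = PositionRecorder()
recorder.feed('<p>\n<b>')
recorder.close()
positions = recorder.positions
```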
+.. method:: HTMLParser.get_starttag_text()
+Return the text of the most recently opened start tag.  This should not normally
+be needed for structured processing, but may be useful in dealing with HTML "as
+deployed" or for re-generating input with minimal changes (whitespace between
+attributes can be preserved, etc.).
+.. method:: HTMLParser.handle_starttag(tag, attrs)
+This method is called to handle the start of a tag.  It is intended to be
+overridden by a derived class; the base class implementation does nothing.
+The *tag* argument is the name of the tag converted to lower case. The *attrs*
+argument is a list of ``(name, value)`` pairs containing the attributes found
+inside the tag's ``<>`` brackets.  The *name* is translated to lower case,
+quotes around the *value* are removed, and character and entity references
+are replaced.  For instance, for the tag ``<A
+HREF="http://www.cwi.nl/">``, this method would be called as
+``handle_starttag('a', [('href', 'http://www.cwi.nl/')])``.
+.. versionchanged:: 2.6
+All entity references from :mod:`htmlentitydefs` are now replaced in the attribute
+values.
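The call described above can be reproduced with a small subclass. This is only a sketch: ``StartTagRecorder`` and ``calls`` are hypothetical names, the second input string is an invented example of an entity reference inside an attribute value, and the fallback import is an assumption so the code runs under either module name.

```python
try:
    from HTMLParser import HTMLParser   # module name documented here
except ImportError:
    from html.parser import HTMLParser  # renamed module (assumption: Python 3)


class StartTagRecorder(HTMLParser):
    """Hypothetical subclass that records handle_starttag() arguments."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.calls = []

    def handle_starttag(self, tag, attrs):
        self.calls.append((tag, attrs))


recorder = StartTagRecorder()
# Tag and attribute names are lowered; quotes are stripped from the value.
recorder.feed('<A HREF="http://www.cwi.nl/">')
# Entity references inside attribute values are replaced.
recorder.feed('<IMG ALT="a &amp; b">')
recorder.close()
calls = recorder.calls
```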
+.. method:: HTMLParser.handle_startendtag(tag, attrs)
+Similar to :meth:`handle_starttag`, but called when the parser encounters an
+XHTML-style empty tag (``<a .../>``).  This method may be overridden by
+subclasses which require this particular lexical information; the default
+implementation simply calls :meth:`handle_starttag` and :meth:`handle_endtag`.
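The difference between the default behavior and an override can be sketched with two hypothetical subclasses. ``DefaultRecorder``, ``EmptyTagRecorder``, and the ``events`` attribute are illustrative names only, and the fallback import is an assumption so the code runs under either module name.

```python
try:
    from HTMLParser import HTMLParser   # module name documented here
except ImportError:
    from html.parser import HTMLParser  # renamed module (assumption: Python 3)


class DefaultRecorder(HTMLParser):
    """Hypothetical subclass relying on the default handle_startendtag()."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))


class EmptyTagRecorder(DefaultRecorder):
    """Hypothetical subclass that treats XHTML-style empty tags specially."""

    def handle_startendtag(self, tag, attrs):
        self.events.append(('startend', tag))


default = DefaultRecorder()
default.feed('<br/>')
default.close()

custom = EmptyTagRecorder()
custom.feed('<br/>')
custom.close()
```

With the default implementation the empty tag is reported as a start followed by an end; the override sees it as a single event.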
+.. method:: HTMLParser.handle_endtag(tag)
+This method is called to handle the end tag of an element.  It is intended to be
+overridden by a derived class; the base class implementation does nothing.  The
+*tag* argument is the name of the tag converted to lower case.
+.. method:: HTMLParser.handle_data(data)
+This method is called to process arbitrary data.  It is intended to be
+overridden by a derived class; the base class implementation does nothing.
+.. method:: HTMLParser.handle_charref(name)
+This method is called to process a character reference of the form ``&#ref;``.
+It is intended to be overridden by a derived class; the base class
+implementation does nothing.
+.. method:: HTMLParser.handle_entityref(name)
+This method is called to process a general entity reference of the form
+``&name;`` where *name* is the name of a general entity.  It is intended to be
+overridden by a derived class; the base class implementation does nothing.
+.. method:: HTMLParser.handle_comment(data)
+This method is called when a comment is encountered.  The *data* argument is
+a string containing the text between the ``<!--`` and ``-->`` delimiters, but not
+the delimiters themselves.  For example, the comment ``<!--text-->`` will cause
+this method to be called with the argument ``'text'``.  It is intended to be
+overridden by a derived class; the base class implementation does nothing.
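A short sketch of the comment handler, using the ``<!--text-->`` example from the description. The ``CommentRecorder`` class, its ``comments`` and ``text`` attributes, and the surrounding ``<p>`` markup are hypothetical; the fallback import is an assumption so the code runs under either module name.

```python
try:
    from HTMLParser import HTMLParser   # module name documented here
except ImportError:
    from html.parser import HTMLParser  # renamed module (assumption: Python 3)


class CommentRecorder(HTMLParser):
    """Hypothetical subclass that keeps comments and text data apart."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.comments = []
        self.text = []

    def handle_comment(self, data):
        # data excludes the comment delimiters
        self.comments.append(data)

    def handle_data(self, data):
        self.text.append(data)


recorder = CommentRecorder()
recorder.feed('<p>hello<!--text--></p>')
recorder.close()
```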
+.. method:: HTMLParser.handle_decl(decl)
+Method called when an SGML declaration is read by the parser.  The *decl*
+parameter will be the entire contents of the declaration inside the ``<!``...\
+``>`` markup.  It is intended to be overridden by a derived class; the base
+class implementation does nothing.
+.. method:: HTMLParser.handle_pi(data)
+Method called when a processing instruction is encountered.  The *data*
+parameter will contain the entire processing instruction. For example, for the
+processing instruction ``<?proc color='red'>``, this method would be called as
+``handle_pi("proc color='red'")``.  It is intended to be overridden by a derived
+class; the base class implementation does nothing.
+.. note::
+The :class:`HTMLParser` class uses the SGML syntactic rules for processing
+instructions.  An XHTML processing instruction using the trailing ``'?'`` will
+cause the ``'?'`` to be included in *data*.
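The note about the trailing ``'?'`` can be illustrated with a small sketch. ``PIRecorder`` and ``instructions`` are hypothetical names, the second input is an invented XHTML-style variant of the documented example, and the fallback import is an assumption so the code runs under either module name.

```python
try:
    from HTMLParser import HTMLParser   # module name documented here
except ImportError:
    from html.parser import HTMLParser  # renamed module (assumption: Python 3)


class PIRecorder(HTMLParser):
    """Hypothetical subclass that records processing instructions."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.instructions = []

    def handle_pi(self, data):
        self.instructions.append(data)


recorder = PIRecorder()
recorder.feed("<?proc color='red'>")    # SGML-style: closed by '>'
recorder.feed("<?proc color='red'?>")   # XHTML-style: the '?' stays in data
recorder.close()
instructions = recorder.instructions
```

A subclass that needs XHTML-style instructions could strip a trailing ``'?'`` from *data* itself.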
+.. _htmlparser-example:
+Example HTML Parser Application
+-------------------------------
+As a basic example, below is a simple HTML parser that uses the
+:class:`HTMLParser` class to print out tags as they are encountered::
+
+   from HTMLParser import HTMLParser
+
+   class MyHTMLParser(HTMLParser):
+
+       def handle_starttag(self, tag, attrs):
+           print "Encountered the beginning of a %s tag" % tag
+
+       def handle_endtag(self, tag):
+           print "Encountered the end of a %s tag" % tag