:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
:mod:`xml.sax.handler`, so that all methods get default implementations.

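The sketch below shows a minimal handler. It assumes the reader that :func:`xml.sax.parseString` obtains from :func:`xml.sax.make_parser`, which by default is the bundled expat-based reader. The hypothetical ``ElementCounter`` overrides only :meth:`ContentHandler.startElement`. Every other event is handled by the inherited default implementations, which do nothing.

.. code-block:: python

   import xml.sax
   from xml.sax.handler import ContentHandler


   class ElementCounter(ContentHandler):
       """Hypothetical handler: counts start tags and ignores all other events."""

       def __init__(self):
           ContentHandler.__init__(self)
           self.count = 0

       def startElement(self, name, attrs):
           self.count += 1


   counter = ElementCounter()
   xml.sax.parseString(b"<a><b/><c>text</c></a>", counter)

The sketch calls ``ContentHandler.__init__`` directly instead of using :func:`super`. This keeps it working on Python 2, where these base classes are classic classes.
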
.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.

.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).

.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call the
   method in your object to resolve all external entities.

.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application. The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.

.. data:: feature_namespaces

   | Value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write

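   The sketch below toggles this feature on a reader from :func:`xml.sax.make_parser` and records which start-element callback fires. The ``NameRecorder`` class and the ``element_names`` helper are hypothetical names. When the feature is off, the reader reports elements through :meth:`~ContentHandler.startElement`. When it is on, the reader uses :meth:`~ContentHandler.startElementNS` and passes ``(uri, localname)`` tuples.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import ContentHandler, feature_namespaces

      DOC = b'<r xmlns="urn:example"><c/></r>'


      class NameRecorder(ContentHandler):
          """Hypothetical handler recording which start-element callback fired."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.events = []

          def startElement(self, name, attrs):
              self.events.append(("startElement", name))

          def startElementNS(self, name, qname, attrs):
              self.events.append(("startElementNS", name))


      def element_names(namespaces):
          # Features may only be changed before parsing starts, so use a fresh reader.
          reader = xml.sax.make_parser()
          reader.setFeature(feature_namespaces, namespaces)
          recorder = NameRecorder()
          reader.setContentHandler(recorder)
          reader.parse(io.BytesIO(DOC))
          return recorder.events


      default_state = xml.sax.make_parser().getFeature(feature_namespaces)
      plain = element_names(False)
      namespaced = element_names(True)
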
.. data:: feature_namespace_prefixes

   | Value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write

.. data:: feature_string_interning

   | Value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in :func:`intern` function.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write

.. data:: feature_validation

   | Value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write

.. data:: feature_external_ges

   | Value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write

.. data:: feature_external_pes

   | Value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write

.. data:: all_features

   List of all features.

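   The sketch below checks which of these features a particular reader recognizes. It queries each name and catches the two exceptions that :mod:`xml.sax` defines for unrecognized and unsupported features. The reported values depend on the reader implementation and the Python version, so treat the output as informational only.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import all_features

      reader = xml.sax.make_parser()
      support = {}
      for name in all_features:
          try:
              support[name] = reader.getFeature(name)
          except xml.sax.SAXNotRecognizedException:
              support[name] = "not recognized"
          except xml.sax.SAXNotSupportedException:
              support[name] = "not supported"

      for name in sorted(support):
          print("%s: %r" % (name, support[name]))
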
.. data:: property_lexical_handler

   | Value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write

.. data:: property_declaration_handler

   | Value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write

.. data:: property_dom_node

   | Value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write

.. data:: property_xml_string

   | Value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: String
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only

.. data:: all_properties

   List of all known property names.

.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application. The following methods are called by the parser on the appropriate
events in the input document:

.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator. A parser that does so must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.

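   The sketch below shows a handler that keeps the locator and records the line on which each element starts. ``PositionRecorder`` is a hypothetical name. As required above, the sketch consults the locator only from inside an event callback.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class PositionRecorder(ContentHandler):
          """Hypothetical handler recording (element name, line number) pairs."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.locator = None
              self.positions = []

          def setDocumentLocator(self, locator):
              # Store the locator; it is only meaningful while events are delivered.
              self.locator = locator

          def startElement(self, name, attrs):
              if self.locator is not None:
                  self.positions.append((name, self.locator.getLineNumber()))


      recorder = PositionRecorder()
      xml.sax.parseString(b"<root>\n  <child/>\n</root>", recorder)
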
.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in :class:`DTDHandler` (except for :meth:`setDocumentLocator`).

.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.

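   The sketch below uses a small logging handler to show the ordering guarantees of :meth:`startDocument` and :meth:`endDocument`. ``EventLog`` is a hypothetical name.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class EventLog(ContentHandler):
          """Hypothetical handler recording document and element events in order."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.events = []

          def startDocument(self):
              self.events.append("startDocument")

          def endDocument(self):
              self.events.append("endDocument")

          def startElement(self, name, attrs):
              self.events.append("start " + name)

          def endElement(self, name):
              self.events.append("end " + name)


      log = EventLog()
      xml.sax.parseString(b"<a><b/></a>", log)
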
.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled (it is
   disabled by default).

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.

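   The sketch below enables ``feature_namespaces`` and logs the prefix-mapping events around the elements that declare and use a prefix. ``PrefixLog`` is a hypothetical name. The exact sequence shown is what the bundled expat reader produces for this input. The interface itself only guarantees that each mapping starts before its start-element event and ends after its end-element event.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import ContentHandler, feature_namespaces


      class PrefixLog(ContentHandler):
          """Hypothetical handler logging prefix mappings and namespace-mode elements."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.events = []

          def startPrefixMapping(self, prefix, uri):
              self.events.append(("start-mapping", prefix, uri))

          def endPrefixMapping(self, prefix):
              self.events.append(("end-mapping", prefix))

          def startElementNS(self, name, qname, attrs):
              self.events.append(("start", name))

          def endElementNS(self, name, qname):
              self.events.append(("end", name))


      reader = xml.sax.make_parser()
      reader.setFeature(feature_namespaces, True)
      log = PrefixLog()
      reader.setContentHandler(log)
      reader.parse(io.BytesIO(b'<r xmlns:x="urn:example"><x:item/></r>'))
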
.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.

.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the :class:`Attributes`
   interface (see :ref:`attributes-objects`) containing the attributes of
   the element. The object passed as *attrs* may be re-used by the parser; holding
   on to a reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.

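   The sketch below keeps attributes beyond the callback by storing ``attrs.copy()`` instead of *attrs* itself. ``AttributeSnapshot`` is a hypothetical name.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class AttributeSnapshot(ContentHandler):
          """Hypothetical handler keeping a copy of each element's attributes."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.snapshots = []

          def startElement(self, name, attrs):
              # The parser may reuse attrs, so store a copy rather than the object.
              self.snapshots.append((name, attrs.copy()))


      snap = AttributeSnapshot()
      xml.sax.parseString(b'<doc id="1"><item kind="x"/></doc>', snap)
      saved = [(name, dict(attrs.items())) for name, attrs in snap.snapshots]
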
.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.

.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`AttributesNS` interface (see :ref:`attributes-ns-objects`)
   containing the attributes of the element. If no namespace is associated with
   the element, the *uri* component of *name* will be ``None``. The object passed
   as *attrs* may be re-used by the parser; holding on to a reference to it is not
   a reliable way to keep a copy of the attributes. To keep a copy of the
   attributes, use the :meth:`copy` method of the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.

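   The sketch below shows the shape of *name* and *attrs* in namespace mode, including the ``None`` URI for unqualified names. It assumes a reader from :func:`xml.sax.make_parser` with ``feature_namespaces`` enabled. ``NSRecorder`` is a hypothetical name.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import ContentHandler, feature_namespaces


      class NSRecorder(ContentHandler):
          """Hypothetical handler recording namespace-mode names and attributes."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.elements = []

          def startElementNS(self, name, qname, attrs):
              # Attribute keys are (uri, localname) tuples as well.
              self.elements.append((name, dict(attrs.items())))


      reader = xml.sax.make_parser()
      reader.setFeature(feature_namespaces, True)
      recorder = NSRecorder()
      reader.setContentHandler(recorder)
      reader.parse(io.BytesIO(b'<r xmlns:x="urn:x"><x:c/><d a="1" x:b="2"/></r>'))
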
.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* and *qname* parameters have the same meaning as in the
   :meth:`startElementNS` event.

.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the Locator provides useful
   information.

   *content* may be a Unicode string or a byte string; the ``expat`` reader module
   always produces Unicode strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method. Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it. To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.

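   A parser may split one run of text across several calls, so a handler that needs the whole text should buffer the chunks and join them at the end tag. The sketch below does this. ``TextCollector`` is a hypothetical name.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class TextCollector(ContentHandler):
          """Hypothetical handler joining the text chunks of leaf elements."""

          def __init__(self):
              ContentHandler.__init__(self)
              self._chunks = []
              self.texts = {}

          def startElement(self, name, attrs):
              self._chunks = []

          def characters(self, content):
              # One run of text may arrive as several calls; buffer every chunk.
              self._chunks.append(content)

          def endElement(self, name):
              self.texts[name] = "".join(self._chunks)
              self._chunks = []


      collector = TextCollector()
      xml.sax.parseString(b"<p>fish &amp; chips</p>", collector)

   The buffer is reset at every start tag, so this sketch only suits elements without mixed content.
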
.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10); non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the Locator provides useful
   information.

.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The parser will invoke this method once for each processing instruction found;
   note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.

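   The sketch below records processing instructions both before and after the document element. With the bundled expat reader, the XML declaration at the start of the input does not reach this method. ``PIRecorder`` is a hypothetical name.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler

      DOC = (b'<?xml version="1.0"?>'
             b'<?xml-stylesheet type="text/css" href="style.css"?>'
             b'<root/>'
             b'<?done now?>')


      class PIRecorder(ContentHandler):
          """Hypothetical handler recording (target, data) pairs."""

          def __init__(self):
              ContentHandler.__init__(self)
              self.instructions = []

          def processingInstruction(self, target, data):
              self.instructions.append((target, data))


      recorder = PIRecorder()
      xml.sax.parseString(DOC, recorder)
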
.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and ``feature_external_pes`` features.

.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:

.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.

.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.

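The sketch below shows a :class:`DTDHandler` receiving both events from a document's internal DTD subset. The handler is registered with :meth:`~xml.sax.xmlreader.XMLReader.setDTDHandler` on a reader from :func:`xml.sax.make_parser`. ``DeclRecorder`` is a hypothetical name.

.. code-block:: python

   import io
   import xml.sax
   from xml.sax.handler import DTDHandler

   DOC = b"""<!DOCTYPE doc [
   <!NOTATION gif SYSTEM "image/gif">
   <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
   ]>
   <doc/>"""


   class DeclRecorder(DTDHandler):
       """Hypothetical handler recording notation and unparsed entity declarations."""

       def __init__(self):
           self.notations = []
           self.entities = []

       def notationDecl(self, name, publicId, systemId):
           self.notations.append((name, publicId, systemId))

       def unparsedEntityDecl(self, name, publicId, systemId, ndata):
           self.entities.append((name, publicId, systemId, ndata))


   reader = xml.sax.make_parser()
   recorder = DeclRecorder()
   reader.setDTDHandler(recorder)
   reader.parse(io.BytesIO(DOC))
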
.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an
   :class:`~xml.sax.xmlreader.InputSource` to read from. The default
   implementation returns *systemId*.

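   The sketch below shows a resolver that serves an external entity from memory instead of fetching it. It assumes the bundled expat reader, on which external general entities have to be enabled explicitly through ``feature_external_ges``, because newer Python versions disable them by default. ``InMemoryResolver``, ``ElementNames``, and the ``example.invalid`` URL are hypothetical.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
      from xml.sax.xmlreader import InputSource

      DOC = b"""<!DOCTYPE doc [
      <!ENTITY greeting SYSTEM "http://example.invalid/greeting.xml">
      ]>
      <doc>&greeting;</doc>"""


      class InMemoryResolver(EntityResolver):
          """Hypothetical resolver returning canned content for every entity."""

          def __init__(self):
              self.requests = []

          def resolveEntity(self, publicId, systemId):
              self.requests.append((publicId, systemId))
              # Return an InputSource with a byte stream so nothing is fetched.
              source = InputSource()
              source.setByteStream(io.BytesIO(b"<greeting/>"))
              return source


      class ElementNames(ContentHandler):
          def __init__(self):
              ContentHandler.__init__(self)
              self.names = []

          def startElement(self, name, attrs):
              self.names.append(name)


      reader = xml.sax.make_parser()
      reader.setFeature(feature_external_ges, True)
      resolver = InMemoryResolver()
      names = ElementNames()
      reader.setEntityResolver(resolver)
      reader.setContentHandler(names)
      reader.parse(io.BytesIO(DOC))
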
.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`XMLReader`. If you create an object that implements this
interface and register it with your :class:`XMLReader`, the parser will call
the methods in your object to report all warnings and errors. There are three
levels of errors available: warnings, (possibly) recoverable errors, and
unrecoverable errors. All methods take a :exc:`SAXParseException` as the only
parameter. Errors and warnings may be converted to an exception by raising the
passed-in exception object.

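The sketch below shows an :class:`ErrorHandler` that collects everything it receives and converts only fatal errors into exceptions, by re-raising the passed-in object. ``CollectingErrorHandler`` is a hypothetical name. With the bundled expat reader, the malformed input used here triggers :meth:`~ErrorHandler.fatalError`.

.. code-block:: python

   import xml.sax
   from xml.sax import SAXParseException
   from xml.sax.handler import ContentHandler, ErrorHandler


   class CollectingErrorHandler(ErrorHandler):
       """Hypothetical handler: record warnings and errors, raise on fatal ones."""

       def __init__(self):
           self.warnings = []
           self.errors = []

       def warning(self, exception):
           self.warnings.append(exception)

       def error(self, exception):
           self.errors.append(exception)

       def fatalError(self, exception):
           self.errors.append(exception)
           raise exception


   errors = CollectingErrorHandler()
   caught = None
   try:
       xml.sax.parseString(b"<a><b></a>", ContentHandler(), errors)
   except SAXParseException as exc:
       caught = exc
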
.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error. If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application. Allowing the parser to continue may
   allow additional errors to be discovered in the input document.

.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.

.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.