:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.

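As a minimal sketch of this relationship, assuming the bundled
``xml.sax.expatreader`` driver is importable (it depends on the ``pyexpat``
extension), the following compares a parser obtained through
:func:`xml.sax.make_parser` with one created by calling the driver's
:func:`create_parser` directly. The variable names are illustrative only.

```python
import xml.sax
from xml.sax import expatreader, xmlreader

# The usual entry point: let xml.sax pick a driver module and call its
# create_parser() function with no arguments.
parser = xml.sax.make_parser()

# Calling the driver's factory directly (assumes the expat driver is present).
direct = expatreader.create_parser()

# Both objects are expected to implement the XMLReader interface; the expat
# driver additionally derives from IncrementalParser.
is_reader = isinstance(parser, xmlreader.XMLReader)
is_incremental = isinstance(direct, xmlreader.IncrementalParser)
```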

.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but
   to feed chunks of the document as they become available. Note that the reader
   will normally not read the entire file, but read it in chunks as well; still,
   :meth:`parse` won't return until the entire document is processed. So these
   interfaces should be used if the blocking behaviour of :meth:`parse` is not
   desirable.

   When the parser is instantiated, it is ready to begin accepting data from the
   :meth:`feed` method immediately. After parsing has been finished with a call
   to :meth:`close`, the :meth:`reset` method must be called to make the parser
   ready to accept new data, either from :meth:`feed` or using the :meth:`parse`
   method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface as a
   convenience to SAX 2.0 driver writers.


.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator
   object will return valid results only during calls to DocumentHandler methods;
   at any other time, the results are unpredictable. If information is not
   available, methods may return ``None``.


.. class:: InputSource([systemId])

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). This is a dictionary-like object which
   represents the element attributes in a :meth:`startElement` call. In addition
   to the most useful dictionary operations, it supports a number of other
   methods as described by the interface. Objects of this class should be
   instantiated by readers; *attrs* must be a dictionary-like object containing
   a mapping from attribute names to attribute values.


.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*. In addition, it provides a number of methods expecting qualified
   names as they appear in the original document. This class implements the
   :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).

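The following sketch shows how a handler might receive an
:class:`AttributesNSImpl` object in :meth:`startElementNS`. It assumes the
default expat-based reader and that namespace processing is switched on with
``feature_namespaces``; the handler class ``NSAttributeRecorder``, the
``urn:example:x`` namespace and the sample document are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NSAttributeRecorder(ContentHandler):
    """Hypothetical handler mapping each attribute's qualified name to its
    (namespaceURI, localname) pair and value."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.seen = {}

    def startElementNS(self, name, qname, attrs):
        # attrs is keyed by (namespaceURI, localname) two-tuples.
        for attr_name in attrs.getNames():
            self.seen[attrs.getQNameByName(attr_name)] = (
                attr_name,
                attrs.getValue(attr_name),
            )


recorder = NSAttributeRecorder()
reader = xml.sax.make_parser()
reader.setFeature(feature_namespaces, True)  # deliver startElementNS events
reader.setContentHandler(recorder)
reader.parse(io.BytesIO(b'<doc xmlns:x="urn:example:x" x:kind="demo" id="d1"/>'))
```

Attributes without a namespace are reported with ``None`` as the namespace
URI in this setup.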

.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a file-like object, or an :class:`InputSource` object. When
   :meth:`parse` returns, the input is completely processed, and the parser object
   can be discarded or reset. As a limitation, the current implementation only
   accepts byte streams; processing of character streams is for further study.

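A minimal sketch of a blocking parse, assuming the default reader returned by
:func:`xml.sax.make_parser` and an in-memory byte stream standing in for a
file. ``TextCollector`` is a hypothetical handler written for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # May be called several times for one run of text.
        self.chunks.append(content)


handler = TextCollector()
reader = xml.sax.make_parser()
reader.setContentHandler(handler)

# parse() accepts a file-like object; it returns only after the whole
# document has been processed.
reader.parse(io.BytesIO(b"<greeting>hello</greeting>"))
text = "".join(handler.chunks)
```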

.. method:: XMLReader.getContentHandler()

   Return the current :class:`ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`ContentHandler`. If no :class:`ContentHandler` is set,
   content events will be discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`DTDHandler`. If no :class:`DTDHandler` is set, DTD
   events will be discarded.


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`EntityResolver`. If no :class:`EntityResolver` is set,
   attempts to resolve an external entity will result in opening the system
   identifier for the entity, and fail if it is not available.


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current :class:`ErrorHandler`. If no :class:`ErrorHandler` is set,
   errors will be raised as exceptions, and warnings will be printed.

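The sketch below installs a custom error handler that records fatal errors
before re-raising them. It assumes the default expat-based reader;
``CollectingErrorHandler`` and the deliberately malformed sample document are
hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Hypothetical handler that keeps a list of fatal errors."""

    def __init__(self):
        self.fatal = []

    def fatalError(self, exception):
        self.fatal.append(exception)
        # Re-raise so that parsing stops at the first fatal error.
        raise exception


errors = CollectingErrorHandler()
reader = xml.sax.make_parser()
reader.setErrorHandler(errors)

try:
    reader.parse(io.BytesIO(b"<a><b></a>"))  # mismatched closing tag
except xml.sax.SAXParseException:
    pass  # already recorded by the handler
```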

.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and warnings;
   if they cannot support the requested locale, however, they must raise a SAX
   exception. Applications may request a locale change in the middle of a parse.


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*. If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.

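A short sketch of querying and changing a feature, assuming the default
expat-based reader. ``feature_namespaces`` comes from :mod:`xml.sax.handler`;
the ``example.org`` URI is a made-up name used only to provoke the
"not recognized" case.

```python
import xml.sax
from xml.sax.handler import feature_namespaces

reader = xml.sax.make_parser()

# Read the setting before and after switching namespace processing on.
default_setting = reader.getFeature(feature_namespaces)
reader.setFeature(feature_namespaces, True)
enabled = reader.getFeature(feature_namespaces)

# An unknown feature name is reported with SAXNotRecognizedException.
try:
    reader.getFeature("http://example.org/sax/features/no-such-feature")
except xml.sax.SAXNotRecognizedException:
    recognized = False
else:
    recognized = True
```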

.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is not
   recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.

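Because parsers differ in which properties they support, an application may
want to probe for a property rather than assume it. The helper below is one
possible sketch; ``try_set_property`` and the ``example.org`` property URI are
hypothetical and not part of the SAX API.

```python
import xml.sax


def try_set_property(reader, name, value):
    """Return True if the reader accepted the property, False otherwise."""
    try:
        reader.setProperty(name, value)
    except (xml.sax.SAXNotRecognizedException,
            xml.sax.SAXNotSupportedException):
        return False
    return True


reader = xml.sax.make_parser()
# A made-up property URI that the reader is not expected to know about.
accepted = try_set_property(
    reader, "http://example.org/sax/properties/demo", 1
)
```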

.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This checks well-formedness conditions that
   can be checked only at the end, invokes handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the parser
   so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling
   :meth:`reset` are undefined.

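The feed/close/reset cycle can be sketched as follows, assuming that
:func:`xml.sax.make_parser` returns an incremental parser (as the expat-based
reader does). ``ElementCollector`` and the sample chunks are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCollector(ContentHandler):
    """Hypothetical handler that records the names of started elements."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


collector = ElementCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(collector)

# Feed the first document in arbitrary chunks; chunk boundaries do not
# have to coincide with element boundaries.
for chunk in (b"<root><it", b"em/><item/>", b"</root>"):
    parser.feed(chunk)
parser.close()  # signal the end of the document
first_document = list(collector.names)

# reset() makes the parser ready for a new document.
parser.reset()
parser.feed(b"<other/>")
parser.close()
```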

.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event ends.


.. method:: Locator.getLineNumber()

   Return the line number where the current event ends.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.

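A handler typically obtains its locator through ``setDocumentLocator`` and
queries it only from inside event callbacks. The sketch below assumes the
default expat-based reader; ``PositionRecorder`` and the sample document are
hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Hypothetical handler recording where each element is reported."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called by the reader before any other event is delivered.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is queried only while a callback is running.
        self.positions.append(
            (name, self.locator.getLineNumber(), self.locator.getColumnNumber())
        )


recorder = PositionRecorder()
reader = xml.sax.make_parser()
reader.setContentHandler(recorder)
reader.parse(io.BytesIO(b"<doc>\n  <item/>\n</doc>"))
```

The exact column values, and whether a position refers to the start or the end
of an event, depend on the underlying parser, so applications should treat
them as approximate.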

.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Sets the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Returns the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Sets the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Returns the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Sets the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a Python file-like object which does not perform
   byte-to-character conversion) for this input source.

   The SAX parser will ignore this if there is also a character stream specified,
   but it will use a byte stream in preference to opening a URI connection itself.

   If the application knows the character encoding of the byte stream, it should
   set it with the :meth:`setEncoding` method.


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this
   byte stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream for this input source. (The stream must be a Python
   1.6 Unicode-wrapped file-like object that performs conversion to Unicode
   strings.)

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.

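The sketch below builds an :class:`InputSource` around an in-memory byte
stream with a known encoding and hands it to a reader. It assumes the default
expat-based reader; the system identifier ``demo.xml``, the ``TextCollector``
handler and the sample payload are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# Latin-1 encoded bytes without an XML encoding declaration.
payload = u"<name>L\u00f6wis</name>".encode("iso-8859-1")

source = InputSource("demo.xml")  # illustrative identifier, not opened here
source.setByteStream(io.BytesIO(payload))
source.setEncoding("iso-8859-1")  # tell the parser how to decode the bytes

collector = TextCollector()
reader = xml.sax.make_parser()
reader.setContentHandler(collector)
reader.parse(source)
text = u"".join(collector.chunks)
```

Since a byte stream is supplied, the reader uses it in preference to opening
the system identifier.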

.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the mapping protocol,
including the methods :meth:`copy`, :meth:`get`, :meth:`has_key`, :meth:`items`,
:meth:`keys`, and :meth:`values`. The following methods are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.

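Although such objects are normally created by readers, constructing an
:class:`AttributesImpl` by hand is a convenient way to see the interface at
work. This is only a sketch; the attribute names and values are made up.

```python
from xml.sax.xmlreader import AttributesImpl

# A reader would build this mapping from the attributes of a start tag.
attrs = AttributesImpl({"id": "42", "lang": "en"})

count = attrs.getLength()
names = sorted(attrs.getNames())
value = attrs.getValue("id")
attr_type = attrs.getType("id")

# Part of the mapping protocol is available as well.
fallback = attrs.get("missing", "n/a")
pairs = sorted(attrs.items())
```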

.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`). All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for a qualified name.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.

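The qualified-name lookups can be tried out on a hand-built
:class:`AttributesNSImpl`. This is a sketch with made-up attribute data; the
two dictionaries mirror what a namespace-aware reader would pass to the
constructor.

```python
from xml.sax.xmlreader import AttributesNSImpl

XLINK = "http://www.w3.org/1999/xlink"

attrs = AttributesNSImpl(
    # (namespaceURI, localname) -> value
    {(XLINK, "href"): "chapter1.xml", (None, "id"): "c1"},
    # (namespaceURI, localname) -> qualified name as written in the document
    {(XLINK, "href"): "xlink:href", (None, "id"): "id"},
)

value = attrs.getValueByQName("xlink:href")
name = attrs.getNameByQName("xlink:href")
qname = attrs.getQNameByName((XLINK, "href"))
qnames = sorted(attrs.getQNames())
```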