|
1 |
|
2 :mod:`xml.parsers.expat` --- Fast XML parsing using Expat |
|
3 ========================================================= |
|
4 |
|
5 .. module:: xml.parsers.expat |
|
6 :synopsis: An interface to the Expat non-validating XML parser. |
|
7 .. moduleauthor:: Paul Prescod <paul@prescod.net> |
|
8 |
|
9 |
|
10 .. Markup notes: |
|
11 |
|
12 Many of the attributes of the XMLParser objects are callbacks. Since |
|
13 signature information must be presented, these are described using the method |
|
14 directive. Since they are attributes which are set by client code, in-text |
|
15 references to these attributes should be marked using the :member: role. |
|
16 |
|
17 .. versionadded:: 2.0 |
|
18 |
|
19 .. index:: single: Expat |
|
20 |
|
21 The :mod:`xml.parsers.expat` module is a Python interface to the Expat |
|
22 non-validating XML parser. The module provides a single extension type, |
|
23 :class:`xmlparser`, that represents the current state of an XML parser. After |
|
24 an :class:`xmlparser` object has been created, various attributes of the object |
|
25 can be set to handler functions. When an XML document is then fed to the |
|
26 parser, the handler functions are called for the character data and markup in |
|
27 the XML document. |
|
28 |
|
29 .. index:: module: pyexpat |
|
30 |
|
31 This module uses the :mod:`pyexpat` module to provide access to the Expat |
|
32 parser. Direct use of the :mod:`pyexpat` module is deprecated. |
|
33 |
|
34 This module provides one exception and one type object: |
|
35 |
|
36 |
|
37 .. exception:: ExpatError |
|
38 |
|
39 The exception raised when Expat reports an error. See section |
|
40 :ref:`expaterror-objects` for more information on interpreting Expat errors. |
|
41 |
|
42 |
|
43 .. exception:: error |
|
44 |
|
45 Alias for :exc:`ExpatError`. |
|
46 |
|
47 |
|
48 .. data:: XMLParserType |
|
49 |
|
50 The type of the return values from the :func:`ParserCreate` function. |
|
51 |
|
52 The :mod:`xml.parsers.expat` module contains two functions: |
|
53 |
|
54 |
|
55 .. function:: ErrorString(errno) |
|
56 |
|
57 Returns an explanatory string for a given error number *errno*. |
|
58 |
|
59 |
|
60 .. function:: ParserCreate([encoding[, namespace_separator]]) |
|
61 |
|
62 Creates and returns a new :class:`xmlparser` object. *encoding*, if specified, |
|
63 must be a string naming the encoding used by the XML data. Expat doesn't |
|
64 support as many encodings as Python does, and its repertoire of encodings can't |
|
65 be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If |
|
66 *encoding* [1]_ is given it will override the implicit or explicit encoding of the |
|
67 document. |
|
68 |
|
69 Expat can optionally do XML namespace processing for you, enabled by providing a |
|
70 value for *namespace_separator*. The value must be a one-character string; a |
|
71 :exc:`ValueError` will be raised if the string has an illegal length (``None`` |
|
72 is considered the same as omission). When namespace processing is enabled, |
|
73 element type names and attribute names that belong to a namespace will be |
|
74 expanded. The element name passed to the element handlers |
|
75 :attr:`StartElementHandler` and :attr:`EndElementHandler` will be the |
|
76 concatenation of the namespace URI, the namespace separator character, and the |
|
77 local part of the name. If the namespace separator is a zero byte (``chr(0)``) |
|
78 then the namespace URI and the local part will be concatenated without any |
|
79 separator. |
|
80 |
|
81 For example, if *namespace_separator* is set to a space character (``' '``) and |
|
82 the following document is parsed:: |
|
83 |
|
84 <?xml version="1.0"?> |
|
85 <root xmlns = "http://default-namespace.org/" |
|
86 xmlns:py = "http://www.python.org/ns/"> |
|
87 <py:elem1 /> |
|
88 <elem2 xmlns="" /> |
|
89 </root> |
|
90 |
|
91 :attr:`StartElementHandler` will receive the following strings for each |
|
92 element:: |
|
93 |
|
94 http://default-namespace.org/ root |
|
95 http://www.python.org/ns/ elem1 |
|
96 elem2 |
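
The namespace example above can be reproduced with a short script. This is a minimal sketch: the names ``DOCUMENT``, ``received`` and ``start_element`` are illustrative and not part of the module, and the code avoids ``print`` so that it should also run unchanged on newer Python versions.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0"?>
<root xmlns = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""

received = []  # element names exactly as passed to the handler

def start_element(name, attrs):
    received.append(name)

# A one-character separator switches namespace processing on.
parser = xml.parsers.expat.ParserCreate(namespace_separator=' ')
parser.StartElementHandler = start_element
parser.Parse(DOCUMENT, True)
```

After the call, ``received`` holds the three expanded names listed above, in document order.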
|
97 |
|
98 |
|
99 .. seealso:: |
|
100 |
|
101 `The Expat XML Parser <http://www.libexpat.org/>`_ |
|
102 Home page of the Expat project. |
|
103 |
|
104 |
|
105 .. _xmlparser-objects: |
|
106 |
|
107 XMLParser Objects |
|
108 ----------------- |
|
109 |
|
110 :class:`xmlparser` objects have the following methods: |
|
111 |
|
112 |
|
113 .. method:: xmlparser.Parse(data[, isfinal]) |
|
114 |
|
115 Parses the contents of the string *data*, calling the appropriate handler |
|
116 functions to process the parsed data. *isfinal* must be true on the final call |
|
117 to this method. *data* can be the empty string at any time. |
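
As a hedged illustration of incremental parsing, the sketch below feeds a document in arbitrary pieces and passes a true *isfinal* only on the last call. The chunk boundaries and the names ``chunks`` and ``seen`` are made up for the example.

```python
import xml.parsers.expat

# The input may be split anywhere, even in the middle of a tag.
chunks = ['<doc><it', 'em>one</item>', '<item>two</item></doc>']

seen = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)

for chunk in chunks:
    parser.Parse(chunk, False)   # more data will follow
parser.Parse('', True)           # empty string, final call
```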
|
118 |
|
119 |
|
120 .. method:: xmlparser.ParseFile(file) |
|
121 |
|
122 Parse XML data reading from the object *file*. *file* only needs to provide |
|
123 the ``read(nbytes)`` method, returning the empty string when there's no more |
|
124 data. |
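
Any object with a suitable ``read()`` method will do. The sketch below assumes an in-memory ``io.BytesIO`` object as a stand-in for a real file; the variable names are illustrative.

```python
import io
import xml.parsers.expat

# An in-memory byte stream standing in for an open file.
stream = io.BytesIO(b'<log><entry/><entry/><note/></log>')

tags = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: tags.append(name)

# ParseFile() keeps calling stream.read() until it returns no data.
parser.ParseFile(stream)
```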
|
125 |
|
126 |
|
127 .. method:: xmlparser.SetBase(base) |
|
128 |
|
129 Sets the base to be used for resolving relative URIs in system identifiers in |
|
130 declarations. Resolving relative identifiers is left to the application: this |
|
131 value will be passed through as the *base* argument to the |
|
132 :attr:`ExternalEntityRefHandler`, :attr:`NotationDeclHandler`, and |
|
133 :attr:`UnparsedEntityDeclHandler` handlers. |
|
134 |
|
135 |
|
136 .. method:: xmlparser.GetBase() |
|
137 |
|
138 Returns a string containing the base set by a previous call to :meth:`SetBase`, |
|
139 or ``None`` if :meth:`SetBase` hasn't been called. |
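
A small sketch of the round trip between the two methods; the URL is a placeholder chosen for the example.

```python
import xml.parsers.expat

parser = xml.parsers.expat.ParserCreate()

before = parser.GetBase()                 # nothing set yet
parser.SetBase('http://example.org/dtds/')
after = parser.GetBase()                  # the value is stored, not resolved
```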
|
140 |
|
141 |
|
142 .. method:: xmlparser.GetInputContext() |
|
143 |
|
144 Returns the input data that generated the current event as a string. The data is |
|
145 in the encoding of the entity which contains the text. When called while an |
|
146 event handler is not active, the return value is ``None``. |
|
147 |
|
148 .. versionadded:: 2.1 |
|
149 |
|
150 |
|
151 .. method:: xmlparser.ExternalEntityParserCreate(context[, encoding]) |
|
152 |
|
153 Create a "child" parser which can be used to parse an external parsed entity |
|
154 referred to by content parsed by the parent parser. The *context* parameter |
|
155 should be the string passed to the :meth:`ExternalEntityRefHandler` handler |
|
156 function, described below. The child parser is created with the |
|
157 :attr:`ordered_attributes`, :attr:`returns_unicode` and |
|
158 :attr:`specified_attributes` set to the values of this parser. |
|
159 |
|
160 |
|
161 .. method:: xmlparser.UseForeignDTD([flag]) |
|
162 |
|
163 Calling this with a true value for *flag* (the default) will cause Expat to call |
|
164 the :attr:`ExternalEntityRefHandler` with :const:`None` for all arguments to |
|
165 allow an alternate DTD to be loaded. If the document does not contain a |
|
166 document type declaration, the :attr:`ExternalEntityRefHandler` will still be |
|
167 called, but the :attr:`StartDoctypeDeclHandler` and |
|
168 :attr:`EndDoctypeDeclHandler` will not be called. |
|
169 |
|
170 Passing a false value for *flag* will cancel a previous call that passed a true |
|
171 value, but otherwise has no effect. |
|
172 |
|
173 This method can only be called before the :meth:`Parse` or :meth:`ParseFile` |
|
174 methods are called; calling it after either of those has been called causes |
|
175 :exc:`ExpatError` to be raised with the :attr:`code` attribute set to |
|
176 :const:`errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING`. |
|
177 |
|
178 .. versionadded:: 2.3 |
|
179 |
|
180 :class:`xmlparser` objects have the following attributes: |
|
181 |
|
182 |
|
183 .. attribute:: xmlparser.buffer_size |
|
184 |
|
185 The size of the buffer used when :attr:`buffer_text` is true. |
|
186 A new buffer size can be set by assigning a new integer value |
|
187 to this attribute. |
|
188 When the size is changed, the buffer will be flushed. |
|
189 |
|
190 .. versionadded:: 2.3 |
|
191 |
|
192 .. versionchanged:: 2.6 |
|
193 The buffer size can now be changed. |
|
194 |
|
195 .. attribute:: xmlparser.buffer_text |
|
196 |
|
197 Setting this to true causes the :class:`xmlparser` object to buffer textual |
|
198 content returned by Expat to avoid multiple calls to the |
|
199 :meth:`CharacterDataHandler` callback whenever possible. This can improve |
|
200 performance substantially since Expat normally breaks character data into chunks |
|
201 at every line ending. This attribute is false by default, and may be changed at |
|
202 any time. |
|
203 |
|
204 .. versionadded:: 2.3 |
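
The effect of buffering can be observed by counting handler calls. The helper ``collect_chunks`` below is a hypothetical name introduced for this sketch; the exact number of unbuffered chunks depends on Expat, so the example only relies on the buffered case delivering the text in one piece.

```python
import xml.parsers.expat

TEXT = 'first line\nsecond line\nthird line'
DOCUMENT = '<doc>' + TEXT + '</doc>'

def collect_chunks(buffer_text):
    """Parse DOCUMENT and return the strings given to CharacterDataHandler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = chunks.append
    parser.Parse(DOCUMENT, True)
    return chunks

unbuffered = collect_chunks(False)  # typically several chunks
buffered = collect_chunks(True)     # the same text, coalesced
```

Both lists join to the same text; only the number of callbacks differs.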
|
205 |
|
206 |
|
207 .. attribute:: xmlparser.buffer_used |
|
208 |
|
209 If :attr:`buffer_text` is enabled, the number of bytes stored in the buffer. |
|
210 These bytes represent UTF-8 encoded text. This attribute has no meaningful |
|
211 interpretation when :attr:`buffer_text` is false. |
|
212 |
|
213 .. versionadded:: 2.3 |
|
214 |
|
215 |
|
216 .. attribute:: xmlparser.ordered_attributes |
|
217 |
|
218 Setting this attribute to a non-zero integer causes the attributes to be |
|
219 reported as a list rather than a dictionary. The attributes are presented in |
|
220 the order found in the document text. For each attribute, two list entries are |
|
221 presented: the attribute name and the attribute value. (Older versions of this |
|
222 module also used this format.) By default, this attribute is false; it may be |
|
223 changed at any time. |
|
224 |
|
225 .. versionadded:: 2.1 |
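
A minimal sketch of the list format, using a made-up ``point`` element:

```python
import xml.parsers.expat

captured = []
parser = xml.parsers.expat.ParserCreate()
parser.ordered_attributes = True
parser.StartElementHandler = (
    lambda name, attrs: captured.append((name, attrs)))

# Attributes arrive as [name, value, name, value, ...] in document order.
parser.Parse('<point y="2" x="1"/>', True)
```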
|
226 |
|
227 |
|
228 .. attribute:: xmlparser.returns_unicode |
|
229 |
|
230 If this attribute is set to a non-zero integer, the handler functions will be |
|
231 passed Unicode strings. If :attr:`returns_unicode` is :const:`False`, 8-bit |
|
232 strings containing UTF-8 encoded data will be passed to the handlers. This is |
|
233 :const:`True` by default when Python is built with Unicode support. |
|
234 |
|
235 .. versionchanged:: 1.6 |
|
236 Can be changed at any time to affect the result type. |
|
237 |
|
238 |
|
239 .. attribute:: xmlparser.specified_attributes |
|
240 |
|
241 If set to a non-zero integer, the parser will report only those attributes which |
|
242 were specified in the document instance and not those which were derived from |
|
243 attribute declarations. Applications which set this need to be especially |
|
244 careful to use what additional information is available from the declarations as |
|
245 needed to comply with the standards for the behavior of XML processors. By |
|
246 default, this attribute is false; it may be changed at any time. |
|
247 |
|
248 .. versionadded:: 2.1 |
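
The difference shows up when a declaration supplies a default value. The sketch below assumes a tiny internal DTD subset; ``attributes_seen`` is a hypothetical helper name.

```python
import xml.parsers.expat

# The DTD declares a default for "kind"; the instance only gives "id".
DOCUMENT = ('<!DOCTYPE doc [<!ATTLIST doc kind CDATA "plain">]>'
            '<doc id="1"/>')

def attributes_seen(specified_only):
    """Return the attributes reported for the <doc> start tag."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    parser.specified_attributes = specified_only
    parser.StartElementHandler = lambda name, attrs: seen.update(attrs)
    parser.Parse(DOCUMENT, True)
    return seen

with_defaults = attributes_seen(False)
specified_only = attributes_seen(True)
```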
|
249 |
|
250 The following attributes contain values relating to the most recent error |
|
251 encountered by an :class:`xmlparser` object, and will only have correct values |
|
252 once a call to :meth:`Parse` or :meth:`ParseFile` has raised a |
|
253 :exc:`xml.parsers.expat.ExpatError` exception. |
|
254 |
|
255 |
|
256 .. attribute:: xmlparser.ErrorByteIndex |
|
257 |
|
258 Byte index at which an error occurred. |
|
259 |
|
260 |
|
261 .. attribute:: xmlparser.ErrorCode |
|
262 |
|
263 Numeric code specifying the problem. This value can be passed to the |
|
264 :func:`ErrorString` function, or compared to one of the constants defined in the |
|
265 ``errors`` object. |
|
266 |
|
267 |
|
268 .. attribute:: xmlparser.ErrorColumnNumber |
|
269 |
|
270 Column number at which an error occurred. |
|
271 |
|
272 |
|
273 .. attribute:: xmlparser.ErrorLineNumber |
|
274 |
|
275 Line number at which an error occurred. |
|
276 |
|
277 The following attributes contain values relating to the current parse location |
|
278 in an :class:`xmlparser` object. During a callback reporting a parse event they |
|
279 indicate the location of the first of the sequence of characters that generated |
|
280 the event. When called outside of a callback, the position indicated will be |
|
281 just past the last parse event (regardless of whether there was an associated |
|
282 callback). |
|
283 |
|
284 .. versionadded:: 2.4 |
|
285 |
|
286 |
|
287 .. attribute:: xmlparser.CurrentByteIndex |
|
288 |
|
289 Current byte index in the parser input. |
|
290 |
|
291 |
|
292 .. attribute:: xmlparser.CurrentColumnNumber |
|
293 |
|
294 Current column number in the parser input. |
|
295 |
|
296 |
|
297 .. attribute:: xmlparser.CurrentLineNumber |
|
298 |
|
299 Current line number in the parser input. |
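
As a sketch of how the location attributes might be used, the handler below records the position of each start tag while the event is being reported. The document and variable names are illustrative.

```python
import xml.parsers.expat

DOCUMENT = '<a>\n  <b/>\n  <c/>\n</a>'

positions = []
parser = xml.parsers.expat.ParserCreate()

def start_element(name, attrs):
    # Inside a callback these refer to the start of the current event.
    positions.append((name,
                      parser.CurrentLineNumber,
                      parser.CurrentColumnNumber))

parser.StartElementHandler = start_element
parser.Parse(DOCUMENT, True)

lines = [(name, line) for name, line, column in positions]
```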
|
300 |
|
301 Here is the list of handlers that can be set. To set a handler on an |
|
302 :class:`xmlparser` object *o*, use ``o.handlername = func``. *handlername* must |
|
303 be taken from the following list, and *func* must be a callable object accepting |
|
304 the correct number of arguments. The arguments are all strings, unless |
|
305 otherwise stated. |
|
306 |
|
307 |
|
308 .. method:: xmlparser.XmlDeclHandler(version, encoding, standalone) |
|
309 |
|
310 Called when the XML declaration is parsed. The XML declaration is the |
|
311 (optional) declaration of the applicable version of the XML recommendation, the |
|
312 encoding of the document text, and an optional "standalone" declaration. |
|
313 *version* and *encoding* will be strings of the type dictated by the |
|
314 :attr:`returns_unicode` attribute, and *standalone* will be ``1`` if the |
|
315 document is declared standalone, ``0`` if it is declared not to be standalone, |
|
316 or ``-1`` if the standalone clause was omitted. This is only available with |
|
317 Expat version 1.95.0 or newer. |
|
318 |
|
319 .. versionadded:: 2.1 |
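
A short sketch showing the three values for a declaration that spells out every clause; the list ``declaration`` is just a place to keep the result.

```python
import xml.parsers.expat

declaration = []
parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = (
    lambda version, encoding, standalone:
        declaration.append((version, encoding, standalone)))

parser.Parse(
    b'<?xml version="1.0" encoding="utf-8" standalone="yes"?><doc/>',
    True)
```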
|
320 |
|
321 |
|
322 .. method:: xmlparser.StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset) |
|
323 |
|
324 Called when Expat begins parsing the document type declaration (``<!DOCTYPE |
|
325 ...``). The *doctypeName* is provided exactly as presented. The *systemId* and |
|
326 *publicId* parameters give the system and public identifiers if specified, or |
|
327 ``None`` if omitted. *has_internal_subset* will be true if the document |
|
328 contains an internal document declaration subset. This requires Expat version |
|
329 1.2 or newer. |
|
330 |
|
331 |
|
332 .. method:: xmlparser.EndDoctypeDeclHandler() |
|
333 |
|
334 Called when Expat is done parsing the document type declaration. This requires |
|
335 Expat version 1.2 or newer. |
|
336 |
|
337 |
|
338 .. method:: xmlparser.ElementDeclHandler(name, model) |
|
339 |
|
340 Called once for each element type declaration. *name* is the name of the |
|
341 element type, and *model* is a representation of the content model. |
|
342 |
|
343 |
|
344 .. method:: xmlparser.AttlistDeclHandler(elname, attname, type, default, required) |
|
345 |
|
346 Called for each declared attribute for an element type. If an attribute list |
|
347 declaration declares three attributes, this handler is called three times, once |
|
348 for each attribute. *elname* is the name of the element to which the |
|
349 declaration applies and *attname* is the name of the attribute declared. The |
|
350 attribute type is a string passed as *type*; the possible values are |
|
351 ``'CDATA'``, ``'ID'``, ``'IDREF'``, ... *default* gives the default value for |
|
352 the attribute used when the attribute is not specified by the document instance, |
|
353 or ``None`` if there is no default value (``#IMPLIED`` values). If the |
|
354 attribute is required to be given in the document instance, *required* will be |
|
355 true. This requires Expat version 1.95.0 or newer. |
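
To make the per-attribute calls concrete, the sketch below declares two attributes in one ``<!ATTLIST>`` and records each callback. The DTD is invented for the example, and *required* is converted with ``bool()`` only to make the result easier to read.

```python
import xml.parsers.expat

DOCUMENT = ('<!DOCTYPE doc [\n'
            '<!ATTLIST doc kind CDATA #REQUIRED\n'
            '              lang CDATA "en">\n'
            ']>\n'
            '<doc kind="memo"/>')

declared = []

def attlist_decl(elname, attname, att_type, default, required):
    declared.append((elname, attname, att_type, default, bool(required)))

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = attlist_decl
parser.Parse(DOCUMENT, True)
```

One declaration with two attributes produces two handler calls.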
|
356 |
|
357 |
|
358 .. method:: xmlparser.StartElementHandler(name, attributes) |
|
359 |
|
360 Called for the start of every element. *name* is a string containing the |
|
361 element name, and *attributes* is a dictionary mapping attribute names to their |
|
362 values. |
|
363 |
|
364 |
|
365 .. method:: xmlparser.EndElementHandler(name) |
|
366 |
|
367 Called for the end of every element. |
|
368 |
|
369 |
|
370 .. method:: xmlparser.ProcessingInstructionHandler(target, data) |
|
371 |
|
372 Called for every processing instruction. |
|
373 |
|
374 |
|
375 .. method:: xmlparser.CharacterDataHandler(data) |
|
376 |
|
377 Called for character data. This will be called for normal character data, CDATA |
|
378 marked content, and ignorable whitespace. Applications which must distinguish |
|
379 these cases can use the :attr:`StartCdataSectionHandler`, |
|
380 :attr:`EndCdataSectionHandler`, and :attr:`ElementDeclHandler` callbacks to |
|
381 collect the required information. |
|
382 |
|
383 |
|
384 .. method:: xmlparser.UnparsedEntityDeclHandler(entityName, base, systemId, publicId, notationName) |
|
385 |
|
386 Called for unparsed (NDATA) entity declarations. This is only present for |
|
387 version 1.2 of the Expat library; for more recent versions, use |
|
388 :attr:`EntityDeclHandler` instead. (The underlying function in the Expat |
|
389 library has been declared obsolete.) |
|
390 |
|
391 |
|
392 .. method:: xmlparser.EntityDeclHandler(entityName, is_parameter_entity, value, base, systemId, publicId, notationName) |
|
393 |
|
394 Called for all entity declarations. For parameter and internal entities, |
|
395 *value* will be a string giving the declared contents of the entity; this will |
|
396 be ``None`` for external entities. The *notationName* parameter will be |
|
397 ``None`` for parsed entities, and the name of the notation for unparsed |
|
398 entities. *is_parameter_entity* will be true if the entity is a parameter entity |
|
399 or false for general entities (most applications only need to be concerned with |
|
400 general entities). This is only available starting with version 1.95.0 of the |
|
401 Expat library. |
|
402 |
|
403 .. versionadded:: 2.1 |
|
404 |
|
405 |
|
406 .. method:: xmlparser.NotationDeclHandler(notationName, base, systemId, publicId) |
|
407 |
|
408 Called for notation declarations. *notationName*, *base*, *systemId*, and |
|
409 *publicId* are strings if given. If the public identifier is omitted, |
|
410 *publicId* will be ``None``. |
|
411 |
|
412 |
|
413 .. method:: xmlparser.StartNamespaceDeclHandler(prefix, uri) |
|
414 |
|
415 Called when an element contains a namespace declaration. Namespace declarations |
|
416 are processed before the :attr:`StartElementHandler` is called for the element |
|
417 on which declarations are placed. |
|
418 |
|
419 |
|
420 .. method:: xmlparser.EndNamespaceDeclHandler(prefix) |
|
421 |
|
422 Called when the closing tag is reached for an element that contained a |
|
423 namespace declaration. This is called once for each namespace declaration on |
|
424 the element in the reverse of the order for which the |
|
425 :attr:`StartNamespaceDeclHandler` was called to indicate the start of each |
|
426 namespace declaration's scope. Calls to this handler are made after the |
|
427 corresponding :attr:`EndElementHandler` for the end of the element. |
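
The ordering of namespace and element events can be checked with a sketch like the following, which assumes namespace processing is enabled through a space separator; the event labels are arbitrary strings chosen for the example.

```python
import xml.parsers.expat

events = []
parser = xml.parsers.expat.ParserCreate(namespace_separator=' ')
parser.StartNamespaceDeclHandler = (
    lambda prefix, uri: events.append(('start-ns', prefix, uri)))
parser.EndNamespaceDeclHandler = (
    lambda prefix: events.append(('end-ns', prefix)))
parser.StartElementHandler = (
    lambda name, attrs: events.append(('start', name)))
parser.EndElementHandler = (
    lambda name: events.append(('end', name)))

parser.Parse('<root xmlns:a="urn:a"><a:item/></root>', True)
```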
|
428 |
|
429 |
|
430 .. method:: xmlparser.CommentHandler(data) |
|
431 |
|
432 Called for comments. *data* is the text of the comment, excluding the leading |
|
433 '``<!-``\ ``-``' and trailing '``-``\ ``->``'. |
|
434 |
|
435 |
|
436 .. method:: xmlparser.StartCdataSectionHandler() |
|
437 |
|
438 Called at the start of a CDATA section. This and :attr:`EndCdataSectionHandler` |
|
439 are needed to be able to identify the syntactical start and end for CDATA |
|
440 sections. |
|
441 |
|
442 |
|
443 .. method:: xmlparser.EndCdataSectionHandler() |
|
444 |
|
445 Called at the end of a CDATA section. |
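
One way to tell CDATA content apart from ordinary character data is to keep a flag that the two section handlers toggle. This is only a sketch; ``state`` and ``pieces`` are illustrative names, and character data may arrive in more than one chunk.

```python
import xml.parsers.expat

state = {'in_cdata': False}
pieces = []   # (inside_cdata, text) pairs

def start_cdata():
    state['in_cdata'] = True

def end_cdata():
    state['in_cdata'] = False

def char_data(data):
    pieces.append((state['in_cdata'], data))

parser = xml.parsers.expat.ParserCreate()
parser.StartCdataSectionHandler = start_cdata
parser.EndCdataSectionHandler = end_cdata
parser.CharacterDataHandler = char_data
parser.Parse('<doc>plain<![CDATA[<raw> & unescaped]]>tail</doc>', True)

cdata_text = ''.join(text for inside, text in pieces if inside)
other_text = ''.join(text for inside, text in pieces if not inside)
```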
|
446 |
|
447 |
|
448 .. method:: xmlparser.DefaultHandler(data) |
|
449 |
|
450 Called for any characters in the XML document for which no applicable handler |
|
451 has been specified. This means characters that are part of a construct which |
|
452 could be reported, but for which no handler has been supplied. |
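
In the sketch below, element handlers are installed but nothing handles comments, processing instructions or character data, so those constructs fall through to the default handler. The document is made up for the example.

```python
import xml.parsers.expat

DOCUMENT = '<doc><!-- a comment --><?pi data?>text</doc>'

unhandled = []
parser = xml.parsers.expat.ParserCreate()
# Start and end tags are claimed by these (do-nothing) handlers...
parser.StartElementHandler = lambda name, attrs: None
parser.EndElementHandler = lambda name: None
# ...everything else ends up here as raw document text.
parser.DefaultHandler = unhandled.append
parser.Parse(DOCUMENT, True)

leftover = ''.join(unhandled)
```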
|
453 |
|
454 |
|
455 .. method:: xmlparser.DefaultHandlerExpand(data) |
|
456 |
|
457 This is the same as the :func:`DefaultHandler`, but doesn't inhibit expansion |
|
458 of internal entities. The entity reference will not be passed to the default |
|
459 handler. |
|
460 |
|
461 |
|
462 .. method:: xmlparser.NotStandaloneHandler() |
|
463 |
|
464 Called if the XML document hasn't been declared as being a standalone document. |
|
465 This happens when there is an external subset or a reference to a parameter |
|
466 entity, but the XML declaration does not set standalone to ``yes``. If this |
|
467 handler returns ``0``, then the parser will raise an |
|
468 :const:`XML_ERROR_NOT_STANDALONE` error. If this handler is not set, no |
|
469 exception is raised by the parser for this condition. |
|
470 |
|
471 |
|
472 .. method:: xmlparser.ExternalEntityRefHandler(context, base, systemId, publicId) |
|
473 |
|
474 Called for references to external entities. *base* is the current base, as set |
|
475 by a previous call to :meth:`SetBase`. The public and system identifiers, |
|
476 *systemId* and *publicId*, are strings if given; if the public identifier is not |
|
477 given, *publicId* will be ``None``. The *context* value is opaque and should |
|
478 only be used as described below. |
|
479 |
|
480 For external entities to be parsed, this handler must be implemented. It is |
|
481 responsible for creating the sub-parser using |
|
482 ``ExternalEntityParserCreate(context)``, initializing it with the appropriate |
|
483 callbacks, and parsing the entity. This handler should return an integer; if it |
|
484 returns ``0``, the parser will raise an |
|
485 :const:`XML_ERROR_EXTERNAL_ENTITY_HANDLING` error, otherwise parsing will |
|
486 continue. |
|
487 |
|
488 If this handler is not provided, external entities are reported by the |
|
489 :attr:`DefaultHandler` callback, if provided. |
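
The steps described above can be sketched as follows. ``ENTITY_SOURCES`` is a hypothetical in-memory table standing in for whatever storage an application uses; a real handler would have to decide how (and whether) to resolve *systemId* safely.

```python
import xml.parsers.expat

# Hypothetical lookup table: system identifier -> entity content (bytes).
ENTITY_SOURCES = {'greeting.xml': b'<greeting>hello</greeting>'}

DOCUMENT = ('<!DOCTYPE doc [<!ENTITY greeting SYSTEM "greeting.xml">]>'
            '<doc>&greeting;</doc>')

elements = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: elements.append(name)

def external_entity_ref(context, base, system_id, public_id):
    source = ENTITY_SOURCES.get(system_id)
    if source is None:
        return 0                      # report an entity handling error
    # Create the sub-parser from the opaque context and parse the entity.
    child = parser.ExternalEntityParserCreate(context)
    child.StartElementHandler = parser.StartElementHandler
    child.Parse(source, True)
    return 1                          # non-zero: continue parsing

parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(DOCUMENT, True)
```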
|
490 |
|
491 |
|
492 .. _expaterror-objects: |
|
493 |
|
494 ExpatError Exceptions |
|
495 --------------------- |
|
496 |
|
497 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> |
|
498 |
|
499 |
|
500 :exc:`ExpatError` exceptions have a number of interesting attributes: |
|
501 |
|
502 |
|
503 .. attribute:: ExpatError.code |
|
504 |
|
505 Expat's internal error number for the specific error. This will match one of |
|
506 the constants defined in the ``errors`` object from this module. |
|
507 |
|
508 .. versionadded:: 2.1 |
|
509 |
|
510 |
|
511 .. attribute:: ExpatError.lineno |
|
512 |
|
513 Line number on which the error was detected. The first line is numbered ``1``. |
|
514 |
|
515 .. versionadded:: 2.1 |
|
516 |
|
517 |
|
518 .. attribute:: ExpatError.offset |
|
519 |
|
520 Character offset into the line where the error occurred. The first column is |
|
521 numbered ``0``. |
|
522 |
|
523 .. versionadded:: 2.1 |
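
A hedged sketch of inspecting these attributes after a failed parse. It turns the numeric code into text with :func:`ErrorString` rather than assuming anything about the values stored in ``errors``; the malformed document and the name ``report`` are made up for the example.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString

parser = xml.parsers.expat.ParserCreate()
report = None
try:
    # The end tag on line 2 does not match the open <b> element.
    parser.Parse('<a>\n<b></a>', True)
except ExpatError as err:
    report = (err.lineno, err.offset, ErrorString(err.code))
```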
|
524 |
|
525 |
|
526 .. _expat-example: |
|
527 |
|
528 Example |
|
529 ------- |
|
530 |
|
531 The following program defines three handlers that just print out their |
|
532 arguments. :: |
|
533 |
|
   import xml.parsers.expat

   # 3 handler functions
   def start_element(name, attrs):
       print 'Start element:', name, attrs
   def end_element(name):
       print 'End element:', name
   def char_data(data):
       print 'Character data:', repr(data)

   p = xml.parsers.expat.ParserCreate()

   p.StartElementHandler = start_element
   p.EndElementHandler = end_element
   p.CharacterDataHandler = char_data

   p.Parse("""<?xml version="1.0"?>
   <parent id="top"><child1 name="paul">Text goes here</child1>
   <child2 name="fred">More text</child2>
   </parent>""", 1)
|
554 |
|
555 The output from this program is:: |
|
556 |
|
557 Start element: parent {'id': 'top'} |
|
558 Start element: child1 {'name': 'paul'} |
|
559 Character data: 'Text goes here' |
|
560 End element: child1 |
|
561 Character data: '\n' |
|
562 Start element: child2 {'name': 'fred'} |
|
563 Character data: 'More text' |
|
564 End element: child2 |
|
565 Character data: '\n' |
|
566 End element: parent |
|
567 |
|
568 |
|
569 .. _expat-content-models: |
|
570 |
|
571 Content Model Descriptions |
|
572 -------------------------- |
|
573 |
|
574 .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> |
|
575 |
|
576 |
|
577 Content models are described using nested tuples. Each tuple contains four |
|
578 values: the type, the quantifier, the name, and a tuple of children. Children |
|
579 are simply additional content model descriptions. |
|
580 |
|
581 The values of the first two fields are constants defined in the ``model`` object |
|
582 of the :mod:`xml.parsers.expat` module. These constants can be collected in two |
|
583 groups: the model type group and the quantifier group. |
|
584 |
|
585 The constants in the model type group are: |
|
586 |
|
587 |
|
588 .. data:: XML_CTYPE_ANY |
|
589 :noindex: |
|
590 |
|
591 The element named by the model name was declared to have a content model of |
|
592 ``ANY``. |
|
593 |
|
594 |
|
595 .. data:: XML_CTYPE_CHOICE |
|
596 :noindex: |
|
597 |
|
598 The named element allows a choice from a number of options; this is used for |
|
599 content models such as ``(A | B | C)``. |
|
600 |
|
601 |
|
602 .. data:: XML_CTYPE_EMPTY |
|
603 :noindex: |
|
604 |
|
605 Elements which are declared to be ``EMPTY`` have this model type. |
|
606 |
|
607 |
|
608 .. data:: XML_CTYPE_MIXED |
|
609 :noindex: |
|
610 |
|
611 |
|
612 .. data:: XML_CTYPE_NAME |
|
613 :noindex: |
|
614 |
|
615 |
|
616 .. data:: XML_CTYPE_SEQ |
|
617 :noindex: |
|
618 |
|
619 A sequence of models that follow one after the other is indicated with this |
|
620 model type. This is used for models such as ``(A, B, C)``. |
|
621 |
|
622 The constants in the quantifier group are: |
|
623 |
|
624 |
|
625 .. data:: XML_CQUANT_NONE |
|
626 :noindex: |
|
627 |
|
628 No modifier is given, so it can appear exactly once, as for ``A``. |
|
629 |
|
630 |
|
631 .. data:: XML_CQUANT_OPT |
|
632 :noindex: |
|
633 |
|
634 The model is optional: it can appear once or not at all, as for ``A?``. |
|
635 |
|
636 |
|
637 .. data:: XML_CQUANT_PLUS |
|
638 :noindex: |
|
639 |
|
640 The model must occur one or more times (like ``A+``). |
|
641 |
|
642 |
|
643 .. data:: XML_CQUANT_REP |
|
644 :noindex: |
|
645 |
|
646 The model must occur zero or more times, as for ``A*``. |
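
To show how the nested tuples and the ``model`` constants fit together, the sketch below captures the model of one element declaration through :attr:`ElementDeclHandler`. The DTD and the names ``models`` and ``element_decl`` are assumptions made for the example.

```python
import xml.parsers.expat

DOCUMENT = ('<!DOCTYPE doc [<!ELEMENT doc (title, para*)>]>'
            '<doc><title/></doc>')

models = {}

def element_decl(name, content_model):
    models[name] = content_model

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = element_decl
parser.Parse(DOCUMENT, True)

# Unpack the four fields of the top-level description.
kind, quantifier, name, children = models['doc']
child_names = [child[2] for child in children]
```

Here ``kind`` is the sequence type, the top-level ``name`` is ``None``, and each child is itself a four-field tuple carrying its own quantifier.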
|
647 |
|
648 |
|
649 .. _expat-errors: |
|
650 |
|
651 Expat error constants |
|
652 --------------------- |
|
653 |
|
654 The following constants are provided in the ``errors`` object of the |
|
655 :mod:`xml.parsers.expat` module. These constants are useful in interpreting |
|
656 some of the attributes of the :exc:`ExpatError` exception objects raised when an |
|
657 error has occurred. |
|
658 |
|
659 The ``errors`` object has the following attributes: |
|
660 |
|
661 |
|
662 .. data:: XML_ERROR_ASYNC_ENTITY |
|
663 :noindex: |
|
664 |
|
665 |
|
666 .. data:: XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF |
|
667 :noindex: |
|
668 |
|
669 An entity reference in an attribute value referred to an external entity instead |
|
670 of an internal entity. |
|
671 |
|
672 |
|
673 .. data:: XML_ERROR_BAD_CHAR_REF |
|
674 :noindex: |
|
675 |
|
676 A character reference referred to a character which is illegal in XML (for |
|
677 example, character ``0``, or '``&#0;``'). |
|
678 |
|
679 |
|
680 .. data:: XML_ERROR_BINARY_ENTITY_REF |
|
681 :noindex: |
|
682 |
|
683 An entity reference referred to an entity which was declared with a notation, so |
|
684 cannot be parsed. |
|
685 |
|
686 |
|
687 .. data:: XML_ERROR_DUPLICATE_ATTRIBUTE |
|
688 :noindex: |
|
689 |
|
690 An attribute was used more than once in a start tag. |
|
691 |
|
692 |
|
693 .. data:: XML_ERROR_INCORRECT_ENCODING |
|
694 :noindex: |
|
695 |
|
696 |
|
697 .. data:: XML_ERROR_INVALID_TOKEN |
|
698 :noindex: |
|
699 |
|
700 Raised when an input byte could not properly be assigned to a character; for |
|
701 example, a NUL byte (value ``0``) in a UTF-8 input stream. |
|
702 |
|
703 |
|
704 .. data:: XML_ERROR_JUNK_AFTER_DOC_ELEMENT |
|
705 :noindex: |
|
706 |
|
707 Something other than whitespace occurred after the document element. |
|
708 |
|
709 |
|
710 .. data:: XML_ERROR_MISPLACED_XML_PI |
|
711 :noindex: |
|
712 |
|
713 An XML declaration was found somewhere other than the start of the input data. |
|
714 |
|
715 |
|
716 .. data:: XML_ERROR_NO_ELEMENTS |
|
717 :noindex: |
|
718 |
|
719 The document contains no elements (XML requires all documents to contain exactly |
|
720 one top-level element). |
|
721 |
|
722 |
|
723 .. data:: XML_ERROR_NO_MEMORY |
|
724 :noindex: |
|
725 |
|
726 Expat was not able to allocate memory internally. |
|
727 |
|
728 |
|
729 .. data:: XML_ERROR_PARAM_ENTITY_REF |
|
730 :noindex: |
|
731 |
|
732 A parameter entity reference was found where it was not allowed. |
|
733 |
|
734 |
|
735 .. data:: XML_ERROR_PARTIAL_CHAR |
|
736 :noindex: |
|
737 |
|
738 An incomplete character was found in the input. |
|
739 |
|
740 |
|
741 .. data:: XML_ERROR_RECURSIVE_ENTITY_REF |
|
742 :noindex: |
|
743 |
|
744 An entity reference contained another reference to the same entity; possibly via |
|
745 a different name, and possibly indirectly. |
|
746 |
|
747 |
|
748 .. data:: XML_ERROR_SYNTAX |
|
749 :noindex: |
|
750 |
|
751 Some unspecified syntax error was encountered. |
|
752 |
|
753 |
|
754 .. data:: XML_ERROR_TAG_MISMATCH |
|
755 :noindex: |
|
756 |
|
757 An end tag did not match the innermost open start tag. |
|
758 |
|
759 |
|
760 .. data:: XML_ERROR_UNCLOSED_TOKEN |
|
761 :noindex: |
|
762 |
|
763 Some token (such as a start tag) was not closed before the end of the stream or |
|
764 the next token was encountered. |
|
765 |
|
766 |
|
767 .. data:: XML_ERROR_UNDEFINED_ENTITY |
|
768 :noindex: |
|
769 |
|
770 A reference was made to an entity which was not defined. |
|
771 |
|
772 |
|
773 .. data:: XML_ERROR_UNKNOWN_ENCODING |
|
774 :noindex: |
|
775 |
|
776 The document encoding is not supported by Expat. |
|
777 |
|
778 |
|
779 .. data:: XML_ERROR_UNCLOSED_CDATA_SECTION |
|
780 :noindex: |
|
781 |
|
782 A CDATA marked section was not closed. |
|
783 |
|
784 |
|
785 .. data:: XML_ERROR_EXTERNAL_ENTITY_HANDLING |
|
786 :noindex: |
|
787 |
|
788 |
|
789 .. data:: XML_ERROR_NOT_STANDALONE |
|
790 :noindex: |
|
791 |
|
792 The parser determined that the document was not "standalone" though it declared |
|
793 itself to be in the XML declaration, and the :attr:`NotStandaloneHandler` was |
|
794 set and returned ``0``. |
|
795 |
|
796 |
|
797 .. data:: XML_ERROR_UNEXPECTED_STATE |
|
798 :noindex: |
|
799 |
|
800 |
|
801 .. data:: XML_ERROR_ENTITY_DECLARED_IN_PE |
|
802 :noindex: |
|
803 |
|
804 |
|
805 .. data:: XML_ERROR_FEATURE_REQUIRES_XML_DTD |
|
806 :noindex: |
|
807 |
|
808 An operation was requested that requires DTD support to be compiled in, but |
|
809 Expat was configured without DTD support. This should never be reported by a |
|
810 standard build of the :mod:`xml.parsers.expat` module. |
|
811 |
|
812 |
|
813 .. data:: XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING |
|
814 :noindex: |
|
815 |
|
816 A behavioral change was requested after parsing started that can only be changed |
|
817 before parsing has started. This is (currently) only raised by |
|
818 :meth:`UseForeignDTD`. |
|
819 |
|
820 |
|
821 .. data:: XML_ERROR_UNBOUND_PREFIX |
|
822 :noindex: |
|
823 |
|
824 An undeclared prefix was found when namespace processing was enabled. |
|
825 |
|
826 |
|
827 .. data:: XML_ERROR_UNDECLARING_PREFIX |
|
828 :noindex: |
|
829 |
|
830 The document attempted to remove the namespace declaration associated with a |
|
831 prefix. |
|
832 |
|
833 |
|
834 .. data:: XML_ERROR_INCOMPLETE_PE |
|
835 :noindex: |
|
836 |
|
837 A parameter entity contained incomplete markup. |
|
838 |
|
839 |
|
840 .. data:: XML_ERROR_XML_DECL |
|
841 :noindex: |
|
842 |
|
843 The XML declaration was not well-formed. |
|
844 |
|
845 |
|
846 .. data:: XML_ERROR_TEXT_DECL |
|
847 :noindex: |
|
848 |
|
849 There was an error parsing a text declaration in an external entity. |
|
850 |
|
851 |
|
852 .. data:: XML_ERROR_PUBLICID |
|
853 :noindex: |
|
854 |
|
855 Characters were found in the public id that are not allowed. |
|
856 |
|
857 |
|
858 .. data:: XML_ERROR_SUSPENDED |
|
859 :noindex: |
|
860 |
|
861 The requested operation was made on a suspended parser, but isn't allowed. This |
|
862 includes attempts to provide additional input or to stop the parser. |
|
863 |
|
864 |
|
865 .. data:: XML_ERROR_NOT_SUSPENDED |
|
866 :noindex: |
|
867 |
|
868 An attempt to resume the parser was made when the parser had not been suspended. |
|
869 |
|
870 |
|
871 .. data:: XML_ERROR_ABORTED |
|
872 :noindex: |
|
873 |
|
874 This should not be reported to Python applications. |
|
875 |
|
876 |
|
877 .. data:: XML_ERROR_FINISHED |
|
878 :noindex: |
|
879 |
|
880 The requested operation was made on a parser which was finished parsing input, |
|
881 but isn't allowed. This includes attempts to provide additional input or to |
|
882 stop the parser. |
|
883 |
|
884 |
|
885 .. data:: XML_ERROR_SUSPEND_PE |
|
886 :noindex: |
|
887 |
|
888 |
|
889 .. rubric:: Footnotes |
|
890 |
|
891 .. [#] The encoding string included in XML output should conform to the |
|
892 appropriate standards. For example, "UTF-8" is valid, but "UTF8" is |
|
893 not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl |
|
894 and http://www.iana.org/assignments/character-sets . |
|
895 |