:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

The Element type is a flexible container object, designed to store hierarchical
data structures in memory. The type can be described as a cross between a list
and a dictionary.

Each element has a number of properties associated with it:

* a tag, which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :func:`Element` or :func:`SubElement`
factory functions.

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.
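
As a minimal sketch of these properties, the following builds a two-element
structure and wraps it in an :class:`ElementTree`. The tag names ``catalog`` and
``book`` and the attribute ``id`` are invented for illustration::

   from xml.etree.ElementTree import Element, SubElement, ElementTree, tostring

   root = Element("catalog")                      # the tag
   book = SubElement(root, "book", {"id": "b1"})  # a child with one attribute
   book.text = "Dive Into XML"                    # text inside <book>...</book>
   book.tail = "\n"                               # text following </book>

   tree = ElementTree(root)                       # wrapper for reading and writing XML
   serialized = tostring(tree.getroot())          # encoded XML for the whole structure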

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
:mod:`xml.etree.ElementTree`.

.. _elementtree-functions:

Functions
---------


.. function:: Comment([text])

   Comment element factory.  This factory function creates a special element that
   will be serialized as an XML comment. The comment string can be either an 8-bit
   ASCII string or a Unicode string. *text* is a string containing the comment
   string. Returns an element instance representing a comment.
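
   A short sketch of attaching a comment to an element. The tag ``config`` and
   the comment text are made up for illustration, and the exact whitespace
   around the comment text in the output may vary between versions::

      from xml.etree.ElementTree import Element, Comment, tostring

      root = Element("config")
      root.append(Comment("generated file, do not edit"))  # comments are appended like children
      xml_data = tostring(root)                             # the comment is written as <!-- ... -->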


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``.  This function
   should be used for debugging only.

   The exact output format is implementation dependent.  In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: Element(tag[, attrib][, **extra])

   Element factory.  This function returns an object implementing the standard
   Element interface.  The exact class or type of that object is implementation
   dependent, but it will always be compatible with the :class:`_ElementInterface`
   class in this module.

   The element name, attribute names, and attribute values can be either 8-bit
   ASCII strings or Unicode strings. *tag* is the element name. *attrib* is an
   optional dictionary, containing element attributes. *extra* contains additional
   attributes, given as keyword arguments. Returns an element instance.
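
   For example, attributes can be supplied through the dictionary, through
   keyword arguments, or both. This is only a sketch; the tag and attribute
   values are illustrative::

      from xml.etree.ElementTree import Element, iselement

      # "href" comes from the attrib dictionary, "target" from a keyword argument.
      link = Element("a", {"href": "http://example.org/"}, target="blank")

      is_elem = iselement(link)      # true for objects created by the factory
      is_not_elem = iselement("a")   # a plain string is not an element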


.. function:: fromstring(text)

   Parses an XML section from a string constant.  Same as :func:`XML`. *text* is
   a string containing XML data. Returns an Element instance.
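
   A small sketch of parsing a string; the XML content is invented for
   illustration. :func:`XML` is called on the same text to show that both
   functions produce an equivalent element::

      from xml.etree.ElementTree import fromstring, XML

      text = "<greeting lang='en'>hello</greeting>"
      elem = fromstring(text)   # root element of the parsed data
      same = XML(text)          # equivalent call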


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source[, events])

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML data.
   *events* is a list of events to report back.  If omitted, only "end" events are
   reported. Returns an :term:`iterator` providing ``(event, elem)`` pairs.
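
   The sketch below records the events reported for a tiny document. An
   in-memory :class:`io.BytesIO` object stands in for a real file, and the tag
   names are illustrative::

      import io
      from xml.etree.ElementTree import iterparse

      source = io.BytesIO(b"<root><item>1</item><item>2</item></root>")

      seen = []
      for event, elem in iterparse(source, events=["start", "end"]):
          # "start" is reported when a tag opens, "end" once it is complete.
          seen.append((event, elem.tag))

   Each element is reported twice here, once per requested event, and the root
   element's "end" event arrives last.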


.. function:: parse(source[, parser])

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance.  If not
   given, the standard :class:`XMLTreeBuilder` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target[, text])

   PI element factory.  This factory function creates a special element that will
   be serialized as an XML processing instruction. *target* is a string containing
   the PI target. *text* is a string containing the PI contents, if given. Returns
   an element instance, representing a processing instruction.
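
   A brief sketch, assuming a hypothetical stylesheet reference as the PI
   contents::

      from xml.etree.ElementTree import Element, ProcessingInstruction, tostring

      root = Element("doc")
      pi = ProcessingInstruction("xml-stylesheet", 'href="style.xsl"')
      root.append(pi)            # processing instructions are appended like children
      xml_data = tostring(root)  # the PI is written as <?xml-stylesheet ...?>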


.. function:: SubElement(parent, tag[, attrib[, **extra]])

   Subelement factory.  This function creates an element instance, and appends it
   to an existing element.

   The element name, attribute names, and attribute values can be either 8-bit
   ASCII strings or Unicode strings. *parent* is the parent element. *tag* is the
   subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword arguments.
   Returns an element instance.
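
   As an illustration (the table markup is made up), each call both creates the
   new element and attaches it to *parent*::

      from xml.etree.ElementTree import Element, SubElement

      table = Element("table")
      row = SubElement(table, "tr")                        # appended to table
      SubElement(row, "td", {"class": "num"}).text = "1"   # attributes from a dictionary
      SubElement(row, "td", align="right").text = "2"      # attributes from keywords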


.. function:: tostring(element[, encoding])

   Generates a string representation of an XML element, including all subelements.
   *element* is an Element instance. *encoding* is the output encoding (default is
   US-ASCII). Returns an encoded string containing the XML data.
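
   The following sketch shows the effect of *encoding* on a non-ASCII
   character. The element name and text are illustrative::

      from xml.etree.ElementTree import Element, tostring

      elem = Element("name")
      elem.text = u"J\u00fcrgen"

      # With the default US-ASCII encoding, the non-ASCII character is
      # written as a numeric character reference.
      ascii_xml = tostring(elem)

      # With UTF-8, the character is written as encoded bytes instead.
      utf8_xml = tostring(elem, "utf-8")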


.. function:: XML(text)

   Parses an XML section from a string constant.  This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML data.
   Returns an Element instance.


.. function:: XMLID(text)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element ids to elements. *text* is a string containing XML
   data. Returns a tuple containing an Element instance and a dictionary.
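
   A minimal sketch with an invented document; elements without an ``id``
   attribute do not appear in the dictionary::

      from xml.etree.ElementTree import XMLID

      root, ids = XMLID('<doc><p id="intro">Hi</p><p id="end">Bye</p><p>no id</p></doc>')

      intro = ids["intro"]       # look up an element directly by its id
      known = sorted(ids)        # the ids that were found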


.. _elementtree-element-interface:

The Element Interface
---------------------

Element objects returned by Element or SubElement have the following methods
and attributes.


.. attribute:: Element.tag

   A string identifying what kind of data this element represents (the element
   type, in other words).


.. attribute:: Element.text

   The *text* attribute can be used to hold additional data associated with the
   element. As the name implies this attribute is usually a string but may be any
   application-specific object. If the element is created from an XML file the
   attribute will contain any text found between the element tags.


.. attribute:: Element.tail

   The *tail* attribute can be used to hold additional data associated with the
   element. This attribute is usually a string but may be any application-specific
   object. If the element is created from an XML file the attribute will contain
   any text found after the element's end tag and before the next tag.
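
   The difference between *text* and *tail* is easiest to see on mixed
   content. The paragraph below is an illustrative example::

      from xml.etree.ElementTree import XML

      p = XML("<p>Moved to <a>example.org</a> or elsewhere.</p>")
      a = p[0]

      before = p.text     # "Moved to ": text before the first child
      inside = a.text     # "example.org": text between <a> and </a>
      after = a.tail      # " or elsewhere.": text after </a>, stored on <a> itself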


.. attribute:: Element.attrib

   A dictionary containing the element's attributes. Note that while the *attrib*
   value is always a real mutable Python dictionary, an ElementTree implementation
   may choose to use another internal representation, and create the dictionary
   only if someone asks for it. To take advantage of such implementations, use the
   dictionary methods below whenever possible.

The following dictionary-like methods work on the element attributes.


.. method:: Element.clear()

   Resets an element.  This function removes all subelements, clears all
   attributes, and sets the text and tail attributes to ``None``.


.. method:: Element.get(key[, default=None])

   Gets the element attribute named *key*.

   Returns the attribute value, or *default* if the attribute was not found.


.. method:: Element.items()

   Returns the element attributes as a sequence of (name, value) pairs. The
   attributes are returned in an arbitrary order.


.. method:: Element.keys()

   Returns the element's attribute names as a list. The names are returned in an
   arbitrary order.


.. method:: Element.set(key, value)

   Set the attribute *key* on the element to *value*.
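
The attribute methods can be combined as in the following sketch. The element
and attribute names are illustrative, and the results are sorted only because
the order returned by ``keys()`` and ``items()`` is arbitrary::

   from xml.etree.ElementTree import Element

   img = Element("img", src="logo.png")
   img.set("alt", "Logo")                 # add or overwrite an attribute

   width = img.get("width", "auto")       # attribute is absent, so the default is used
   names = sorted(img.keys())
   pairs = sorted(img.items())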

The following methods work on the element's children (subelements).


.. method:: Element.append(subelement)

   Adds the element *subelement* to the end of this element's internal list of
   subelements.


.. method:: Element.find(match)

   Finds the first subelement matching *match*.  *match* may be a tag name or path.
   Returns an element instance or ``None``.


.. method:: Element.findall(match)

   Finds all subelements matching *match*.  *match* may be a tag name or path.
   Returns an iterable yielding all matching elements in document order.


.. method:: Element.findtext(condition[, default=None])

   Finds text for the first subelement matching *condition*.  *condition* may be a
   tag name or path. Returns the text content of the first matching element, or
   *default* if no element was found.  Note that if the matching element has no
   text content an empty string is returned.
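
The three lookup methods differ mainly in what they return. A sketch on an
invented fragment, using a plain tag name, a child path and a ``.//``
descendant path::

   from xml.etree.ElementTree import XML

   doc = XML("<body><p>one</p><div><p>two</p></div><hr/></body>")

   first = doc.find("p")                         # a tag name matches direct children only
   nested = doc.findall("div/p")                 # path: <p> children of <div> children
   deep = [p.text for p in doc.findall(".//p")]  # every <p> below doc, in document order

   absent = doc.find("h1")                       # None: nothing matched
   fallback = doc.findtext("h1", "n/a")          # default, because nothing matched
   empty = doc.findtext("hr")                    # "": matched, but no text content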


.. method:: Element.getchildren()

   Returns all subelements.  The elements are returned in document order.


.. method:: Element.getiterator([tag=None])

   Creates a tree iterator with the current element as the root.   The iterator
   iterates over this element and all elements below it that match the given tag.
   If tag is ``None`` or ``'*'`` then all elements are iterated over. Returns an
   iterable that provides element objects in document (depth first) order.


.. method:: Element.insert(index, element)

   Inserts a subelement at the given position in this element.


.. method:: Element.makeelement(tag, attrib)

   Creates a new element object of the same type as this element. Do not call this
   method; use the :func:`SubElement` factory function instead.


.. method:: Element.remove(subelement)

   Removes *subelement* from the element.  Unlike the ``find*`` methods, this
   method compares elements based on instance identity, not on tag value or
   contents.

Element objects also support the following sequence type methods for working
with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
:meth:`__len__`.
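
A sketch of the child-handling methods together with the sequence protocol;
the list markup is illustrative::

   from xml.etree.ElementTree import Element, SubElement

   ul = Element("ul")
   for label in ("a", "b", "c"):
       SubElement(ul, "li").text = label

   ul.insert(0, Element("li"))   # children are now: (new), a, b, c
   ul[0].text = "start"          # __getitem__ returns the child at that index
   ul.remove(ul[2])              # removes the "b" item, matched by identity
   del ul[2]                     # __delitem__ drops the "c" item

   remaining = [li.text for li in ul]
   count = len(ul)               # __len__ counts direct children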

Caution: Because Element objects do not define a :meth:`__nonzero__` method,
elements with no subelements will test as ``False``. ::

   element = root.find('foo')

   if not element: # careful!
       print "element not found, or element has no subelements"

   if element is None:
       print "element not found"


.. _elementtree-elementtree-objects:

ElementTree Objects
-------------------


.. class:: ElementTree([element,] [file])

   ElementTree wrapper class.  This class represents an entire element hierarchy,
   and adds some extra support for serialization to and from standard XML.

   *element* is the root element. The tree is initialized with the contents of the
   XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care. *element* is an element instance.


   .. method:: find(path)

      Finds the first toplevel element with the given tag. Same as
      ``getroot().find(path)``.  *path* is the element to look for. Returns the
      first matching element, or ``None`` if no element was found.


   .. method:: findall(path)

      Finds all toplevel elements with the given tag. Same as
      ``getroot().findall(path)``.  *path* is the element to look for. Returns a
      list or :term:`iterator` containing all matching elements, in document
      order.


   .. method:: findtext(path[, default])

      Finds the element text for the first toplevel element with the given tag.
      Same as ``getroot().findtext(path)``. *path* is the toplevel element to
      look for. *default* is the value to return if the element was not
      found. Returns the text content of the first matching element, or the
      default value if no element was found.  Note that if the element is
      found, but has no text content, this method returns an empty string.


   .. method:: getiterator([tag])

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: parse(source[, parser])

      Loads an external XML section into this element tree. *source* is a file
      name or file object. *parser* is an optional parser instance.  If not
      given, the standard :class:`XMLTreeBuilder` parser is used. Returns the
      section root element.


   .. method:: write(file[, encoding])

      Writes the element tree to a file, as XML. *file* is a file name, or a
      file object opened for writing. *encoding* [#]_ is the output encoding
      (default is US-ASCII).
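
      A round-trip sketch: the tree is written with an explicit encoding and
      read back with the module-level :func:`parse` function. An in-memory
      :class:`io.BytesIO` object stands in for a file opened for writing, and
      the element name and text are illustrative::

         import io
         from xml.etree.ElementTree import Element, ElementTree, parse

         root = Element("menu")
         root.text = u"caf\u00e9"                 # contains one non-ASCII character

         buf = io.BytesIO()                       # stand-in for a real file object
         ElementTree(root).write(buf, "utf-8")    # encode the output as UTF-8

         buf.seek(0)                              # rewind before reading back
         reloaded = parse(buf).getroot()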

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element html at b7d3f1ec>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element p at 8416e0c>
    >>> links = p.getiterator("a")  # Returns list of all links
    >>> links
    [<Element a at b7d4f9ec>, <Element a at b7d4fb0c>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
-------------


.. class:: QName(text_or_uri[, tag])

   QName wrapper.  This can be used to wrap a QName attribute value, in order to
   get proper namespace handling on output. *text_or_uri* is a string containing
   the QName value, in the form ``{uri}local``, or, if the *tag* argument is
   given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.
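
   A sketch of wrapping an attribute value so that it is written with a
   namespace prefix. The element name, the attribute name and the choice of the
   XML Schema namespace are illustrative, and the prefix chosen on output is
   left to the serializer::

      from xml.etree.ElementTree import Element, QName, tostring

      xsd = "http://www.w3.org/2001/XMLSchema"

      elem = Element("value")
      elem.set("type", QName(xsd, "string"))   # URI and local name given separately
      xml_data = tostring(elem)                # value is written as prefix:string

   The single-argument form ``QName("{http://www.w3.org/2001/XMLSchema}string")``
   describes the same name.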


.. _elementtree-treebuilder-objects:

TreeBuilder Objects
-------------------


.. class:: TreeBuilder([element_factory])

   Generic element structure builder.  This builder converts a sequence of start,
   data, and end method calls to a well-formed element structure. You can use this
   class to build an element structure using a custom XML parser, or a parser for
   some other XML-like format. The *element_factory* is called to create new
   Element instances when given.


   .. method:: close()

      Flushes the parser buffers, and returns the toplevel document
      element. Returns an Element instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string.  This should be
      either an 8-bit string containing ASCII text, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the closed
      element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.
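
   A minimal sketch of driving a builder by hand, as a custom parser might. The
   tag, attribute and text are invented for illustration::

      from xml.etree.ElementTree import TreeBuilder

      builder = TreeBuilder()
      builder.start("greeting", {"lang": "en"})   # opens <greeting lang="en">
      builder.data("hello, ")                     # text may arrive in several pieces
      builder.data("world")
      builder.end("greeting")                     # closes the element
      root = builder.close()                      # the toplevel element

   Consecutive calls to ``data()`` are joined into a single text string on the
   resulting element.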


.. _elementtree-xmltreebuilder-objects:

XMLTreeBuilder Objects
----------------------


.. class:: XMLTreeBuilder([html,] [target])

   Element structure builder for XML source data, based on the expat parser. *html*
   is a flag for predefined HTML entities; it is not supported by the current
   implementation. *target* is the target object.  If omitted, the builder uses an
   instance of the standard :class:`TreeBuilder` class.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is the
      public identifier. *system* is the system identifier.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

:meth:`XMLTreeBuilder.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and its :meth:`data` method for character data. :meth:`XMLTreeBuilder.close`
calls *target*\'s :meth:`close` method.
:class:`XMLTreeBuilder` is not limited to building tree structures.
The following example computes the maximum depth of an XML file::

    >>> from xml.etree.ElementTree import XMLTreeBuilder
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLTreeBuilder(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4


.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets .