|
1 **************************** |
|
2 What's New in Python 2.0 |
|
3 **************************** |
|
4 |
|
5 :Author: A.M. Kuchling and Moshe Zadka |
|
6 |
|
7 .. |release| replace:: 1.02 |
|
8 |
|
9 .. $Id: whatsnew20.tex 50964 2006-07-30 03:03:43Z fred.drake $ |
|
10 |
|
11 |
|
12 Introduction |
|
13 ============ |
|
14 |
|
Python 2.0 was released on October 16, 2000. This
article covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require rewriting
code.
|
19 |
|
Python's development never completely stops between releases, and a steady flow
of bug fixes and improvements is always being submitted. A host of minor fixes,
a few optimizations, additional docstrings, and better error messages went into
2.0; to list them all would be impossible, but they're certainly significant.
Consult the publicly available CVS logs if you want to see the full list. This
progress is due to the five developers working for PythonLabs, who are now
getting paid to spend their days fixing bugs, and also to the improved
communication resulting from moving to SourceForge.
|
28 |
|
29 .. ====================================================================== |
|
30 |
|
31 |
|
32 What About Python 1.6? |
|
33 ====================== |
|
34 |
|
35 Python 1.6 can be thought of as the Contractual Obligations Python release. |
|
36 After the core development team left CNRI in May 2000, CNRI requested that a 1.6 |
|
37 release be created, containing all the work on Python that had been performed at |
|
38 CNRI. Python 1.6 therefore represents the state of the CVS tree as of May 2000, |
|
39 with the most significant new feature being Unicode support. Development |
|
40 continued after May, of course, so the 1.6 tree received a few fixes to ensure |
|
41 that it's forward-compatible with Python 2.0. 1.6 is therefore part of Python's |
|
42 evolution, and not a side branch. |
|
43 |
|
44 So, should you take much interest in Python 1.6? Probably not. The 1.6final |
|
45 and 2.0beta1 releases were made on the same day (September 5, 2000), the plan |
|
46 being to finalize Python 2.0 within a month or so. If you have applications to |
|
47 maintain, there seems little point in breaking things by moving to 1.6, fixing |
|
48 them, and then having another round of breakage within a month by moving to 2.0; |
|
49 you're better off just going straight to 2.0. Most of the really interesting |
|
50 features described in this document are only in 2.0, because a lot of work was |
|
51 done between May and September. |
|
52 |
|
53 .. ====================================================================== |
|
54 |
|
55 |
|
56 New Development Process |
|
57 ======================= |
|
58 |
|
59 The most important change in Python 2.0 may not be to the code at all, but to |
|
60 how Python is developed: in May 2000 the Python developers began using the tools |
|
61 made available by SourceForge for storing source code, tracking bug reports, |
|
62 and managing the queue of patch submissions. To report bugs or submit patches |
|
63 for Python 2.0, use the bug tracking and patch manager tools available from |
|
64 Python's project page, located at http://sourceforge.net/projects/python/. |
|
65 |
|
The most important of the services now hosted at SourceForge is the Python CVS
tree, the version-controlled repository containing the source code for Python.
Previously, roughly 7 people had write access to the CVS
tree, and all patches had to be inspected and checked in by one of the people on
this short list. Obviously, this wasn't very scalable. By moving the CVS tree
to SourceForge, it became possible to grant write access to more people; as of
September 2000 there were 27 people able to check in changes, a fourfold
increase. This makes possible large-scale changes that wouldn't be attempted if
they had to be filtered through the small group of core developers. For
example, one day Peter Schneider-Kamp took it into his head to drop K&R C
compatibility and convert the C source for Python to ANSI C. After getting
approval on the python-dev mailing list, he launched into a flurry of checkins
that lasted about a week; other developers joined in to help, and the job was
done. If there were only 5 people with write access, that task would probably
have been viewed as "nice, but not worth the time and effort needed" and it
would never have gotten done.
|
82 |
|
83 The shift to using SourceForge's services has resulted in a remarkable increase |
|
84 in the speed of development. Patches now get submitted, commented on, revised |
|
85 by people other than the original submitter, and bounced back and forth between |
|
86 people until the patch is deemed worth checking in. Bugs are tracked in one |
|
87 central location and can be assigned to a specific person for fixing, and we can |
|
88 count the number of open bugs to measure progress. This didn't come without a |
|
89 cost: developers now have more e-mail to deal with, more mailing lists to |
|
90 follow, and special tools had to be written for the new environment. For |
|
91 example, SourceForge sends default patch and bug notification e-mail messages |
|
92 that are completely unhelpful, so Ka-Ping Yee wrote an HTML screen-scraper that |
|
93 sends more useful messages. |
|
94 |
|
The ease of adding code caused a few initial growing pains, such as code being
checked in before it was ready or without clear agreement from the
developer group. The approval process that has emerged is somewhat similar to
that used by the Apache group. Developers can vote +1, +0, -0, or -1 on a patch;
+1 and -1 denote acceptance or rejection, while +0 and -0 mean the developer is
mostly indifferent to the change, though with a slight positive or negative
slant. The most significant change from the Apache model is that the voting is
essentially advisory, letting Guido van Rossum, who has Benevolent Dictator For
Life status, know what the general opinion is. He can still ignore the result of
a vote, and approve or reject a change even if the community disagrees with him.
|
105 |
|
106 Producing an actual patch is the last step in adding a new feature, and is |
|
107 usually easy compared to the earlier task of coming up with a good design. |
|
108 Discussions of new features can often explode into lengthy mailing list threads, |
|
109 making the discussion hard to follow, and no one can read every posting to |
|
110 python-dev. Therefore, a relatively formal process has been set up to write |
|
111 Python Enhancement Proposals (PEPs), modelled on the Internet RFC process. PEPs |
|
112 are draft documents that describe a proposed new feature, and are continually |
|
113 revised until the community reaches a consensus, either accepting or rejecting |
|
114 the proposal. Quoting from the introduction to PEP 1, "PEP Purpose and |
|
115 Guidelines": |
|
116 |
|
117 |
|
118 .. epigraph:: |
|
119 |
|
120 PEP stands for Python Enhancement Proposal. A PEP is a design document |
|
121 providing information to the Python community, or describing a new feature for |
|
122 Python. The PEP should provide a concise technical specification of the feature |
|
123 and a rationale for the feature. |
|
124 |
|
125 We intend PEPs to be the primary mechanisms for proposing new features, for |
|
126 collecting community input on an issue, and for documenting the design decisions |
|
127 that have gone into Python. The PEP author is responsible for building |
|
128 consensus within the community and documenting dissenting opinions. |
|
129 |
|
Read the rest of PEP 1 for the details of the PEP editorial process, style, and
format. PEPs are kept in the Python CVS tree on SourceForge, though they're not
part of the Python 2.0 distribution, and are also available in HTML form from
http://www.python.org/peps/. As of September 2000, there are 25 PEPs, ranging
from PEP 201, "Lockstep Iteration", to PEP 225, "Elementwise/Objectwise
Operators".
|
136 |
|
137 .. ====================================================================== |
|
138 |
|
139 |
|
140 Unicode |
|
141 ======= |
|
142 |
|
143 The largest new feature in Python 2.0 is a new fundamental data type: Unicode |
|
144 strings. Unicode uses 16-bit numbers to represent characters instead of the |
|
145 8-bit number used by ASCII, meaning that 65,536 distinct characters can be |
|
146 supported. |
|
147 |
|
The final interface for Unicode support was arrived at through countless
often-stormy discussions on the python-dev mailing list, and was mostly
implemented by Marc-André Lemburg, based on a Unicode string type
implementation by Fredrik Lundh. A detailed explanation of the interface was
written up as :pep:`100`, "Python Unicode Integration". This article will
simply cover the most significant points about the Unicode interfaces.
|
154 |
|
155 In Python source code, Unicode strings are written as ``u"string"``. Arbitrary |
|
156 Unicode characters can be written using a new escape sequence, ``\uHHHH``, where |
|
157 *HHHH* is a 4-digit hexadecimal number from 0000 to FFFF. The existing |
|
158 ``\xHHHH`` escape sequence can also be used, and octal escapes can be used for |
|
159 characters up to U+01FF, which is represented by ``\777``. |
|
160 |
|
161 Unicode strings, just like regular strings, are an immutable sequence type. |
|
162 They can be indexed and sliced, but not modified in place. Unicode strings have |
|
163 an ``encode( [encoding] )`` method that returns an 8-bit string in the desired |
|
164 encoding. Encodings are named by strings, such as ``'ascii'``, ``'utf-8'``, |
|
165 ``'iso-8859-1'``, or whatever. A codec API is defined for implementing and |
|
166 registering new encodings that are then available throughout a Python program. |
|
167 If an encoding isn't specified, the default encoding is usually 7-bit ASCII, |
|
168 though it can be changed for your Python installation by calling the |
|
169 :func:`sys.setdefaultencoding(encoding)` function in a customised version of |
|
170 :file:`site.py`. |
|
171 |
|
172 Combining 8-bit and Unicode strings always coerces to Unicode, using the default |
|
173 ASCII encoding; the result of ``'a' + u'bc'`` is ``u'abc'``. |
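
As a small illustration of the points above, the sketch below encodes a Unicode
string and mixes it with a regular string. It is written without ``print`` so
that it should also run on a current Python 3 interpreter, which is an
assumption outside the scope of 2.0; the variable names are made up for the
example::

    # Hypothetical names; only the string operations come from the text above.
    arabic_zero = u'\u0660'

    # UTF-8 needs two bytes for U+0660, so the encoded string has length 2.
    utf8_form = arabic_zero.encode('utf-8')
    utf8_length = len(utf8_form)

    # Mixing a regular string with a Unicode string gives a Unicode string.
    combined = 'a' + u'bc'

    # ASCII cannot represent U+0660, so a strict encode raises UnicodeError.
    try:
        arabic_zero.encode('ascii')
        ascii_failed = 0
    except UnicodeError:
        ascii_failed = 1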
|
174 |
|
175 New built-in functions have been added, and existing built-ins modified to |
|
176 support Unicode: |
|
177 |
|
178 * ``unichr(ch)`` returns a Unicode string 1 character long, containing the |
|
179 character *ch*. |
|
180 |
|
181 * ``ord(u)``, where *u* is a 1-character regular or Unicode string, returns the |
|
182 number of the character as an integer. |
|
183 |
|
184 * ``unicode(string [, encoding] [, errors] )`` creates a Unicode string |
|
185 from an 8-bit string. ``encoding`` is a string naming the encoding to use. The |
|
186 ``errors`` parameter specifies the treatment of characters that are invalid for |
|
187 the current encoding; passing ``'strict'`` as the value causes an exception to |
|
188 be raised on any encoding error, while ``'ignore'`` causes errors to be silently |
|
189 ignored and ``'replace'`` uses U+FFFD, the official replacement character, in |
|
190 case of any problems. |
|
191 |
|
192 * The :keyword:`exec` statement, and various built-ins such as ``eval()``, |
|
193 ``getattr()``, and ``setattr()`` will also accept Unicode strings as well as |
|
194 regular strings. (It's possible that the process of fixing this missed some |
|
195 built-ins; if you find a built-in function that accepts strings but doesn't |
|
196 accept Unicode strings at all, please report it as a bug.) |
|
197 |
|
198 A new module, :mod:`unicodedata`, provides an interface to Unicode character |
|
199 properties. For example, ``unicodedata.category(u'A')`` returns the 2-character |
|
200 string 'Lu', the 'L' denoting it's a letter, and 'u' meaning that it's |
|
201 uppercase. ``unicodedata.bidirectional(u'\u0660')`` returns 'AN', meaning that |
|
202 U+0660 is an Arabic number. |
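
The two lookups just described can be tried directly. This is a minimal sketch;
the variable names are illustrative, and the ``u''`` literals are also accepted
by current Python 3 interpreters::

    import unicodedata

    # 'Lu' means "Letter, uppercase".
    category_of_A = unicodedata.category(u'A')

    # 'AN' marks U+0660 (ARABIC-INDIC DIGIT ZERO) as an Arabic number.
    bidi_of_zero = unicodedata.bidirectional(u'\u0660')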
|
203 |
|
204 The :mod:`codecs` module contains functions to look up existing encodings and |
|
205 register new ones. Unless you want to implement a new encoding, you'll most |
|
206 often use the :func:`codecs.lookup(encoding)` function, which returns a |
|
207 4-element tuple: ``(encode_func, decode_func, stream_reader, stream_writer)``. |
|
208 |
|
209 * *encode_func* is a function that takes a Unicode string, and returns a 2-tuple |
|
210 ``(string, length)``. *string* is an 8-bit string containing a portion (perhaps |
|
211 all) of the Unicode string converted into the given encoding, and *length* tells |
|
212 you how much of the Unicode string was converted. |
|
213 |
|
214 * *decode_func* is the opposite of *encode_func*, taking an 8-bit string and |
|
215 returning a 2-tuple ``(ustring, length)``, consisting of the resulting Unicode |
|
216 string *ustring* and the integer *length* telling how much of the 8-bit string |
|
217 was consumed. |
|
218 |
|
219 * *stream_reader* is a class that supports decoding input from a stream. |
|
220 *stream_reader(file_obj)* returns an object that supports the :meth:`read`, |
|
221 :meth:`readline`, and :meth:`readlines` methods. These methods will all |
|
222 translate from the given encoding and return Unicode strings. |
|
223 |
|
224 * *stream_writer*, similarly, is a class that supports encoding output to a |
|
225 stream. *stream_writer(file_obj)* returns an object that supports the |
|
226 :meth:`write` and :meth:`writelines` methods. These methods expect Unicode |
|
227 strings, translating them to the given encoding on output. |
|
228 |
|
229 For example, the following code writes a Unicode string into a file, encoding |
|
230 it as UTF-8:: |
|
231 |
|
    import codecs

    unistr = u'\u0660\u2000ab ...'

    (UTF8_encode, UTF8_decode,
     UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

    output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
    output.write( unistr )
    output.close()
|
242 |
|
243 The following code would then read UTF-8 input from the file:: |
|
244 |
|
245 input = UTF8_streamreader( open( '/tmp/output', 'rb') ) |
|
246 print repr(input.read()) |
|
247 input.close() |
|
248 |
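The stream classes are the usual entry points, but the first two elements of
the tuple can also be called directly. The following sketch, which assumes the
UTF-8 codec and uses illustrative variable names, shows the ``(string, length)``
and ``(ustring, length)`` return values described above::

    import codecs

    # Unpack the 4-element result of the lookup.
    (encode_func, decode_func,
     stream_reader, stream_writer) = codecs.lookup('utf-8')

    unistr = u'\u0660ab'

    # encoded is the 8-bit form; consumed counts the Unicode characters
    # that were converted (3 here).
    encoded, consumed = encode_func(unistr)

    # decoded is the Unicode string; used counts the 8-bit characters
    # that were read (4 here: U+0660 takes two bytes in UTF-8).
    decoded, used = decode_func(encoded)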
|
249 Unicode-aware regular expressions are available through the :mod:`re` module, |
|
250 which has a new underlying implementation called SRE written by Fredrik Lundh of |
|
251 Secret Labs AB. |
|
252 |
|
253 A ``-U`` command line option was added which causes the Python compiler to |
|
254 interpret all string literals as Unicode string literals. This is intended to be |
|
255 used in testing and future-proofing your Python code, since some future version |
|
256 of Python may drop support for 8-bit strings and provide only Unicode strings. |
|
257 |
|
258 .. ====================================================================== |
|
259 |
|
260 |
|
261 List Comprehensions |
|
262 =================== |
|
263 |
|
264 Lists are a workhorse data type in Python, and many programs manipulate a list |
|
265 at some point. Two common operations on lists are to loop over them, and either |
|
266 pick out the elements that meet a certain criterion, or apply some function to |
|
267 each element. For example, given a list of strings, you might want to pull out |
|
268 all the strings containing a given substring, or strip off trailing whitespace |
|
269 from each line. |
|
270 |
|
271 The existing :func:`map` and :func:`filter` functions can be used for this |
|
272 purpose, but they require a function as one of their arguments. This is fine if |
|
273 there's an existing built-in function that can be passed directly, but if there |
|
274 isn't, you have to create a little function to do the required work, and |
|
275 Python's scoping rules make the result ugly if the little function needs |
|
276 additional information. Take the first example in the previous paragraph, |
|
277 finding all the strings in the list containing a given substring. You could |
|
278 write the following to do it:: |
|
279 |
|
280 # Given the list L, make a list of all strings |
|
281 # containing the substring S. |
|
282 sublist = filter( lambda s, substring=S: |
|
283 string.find(s, substring) != -1, |
|
284 L) |
|
285 |
|
286 Because of Python's scoping rules, a default argument is used so that the |
|
287 anonymous function created by the :keyword:`lambda` statement knows what |
|
288 substring is being searched for. List comprehensions make this cleaner:: |
|
289 |
|
290 sublist = [ s for s in L if string.find(s, S) != -1 ] |
|
291 |
|
292 List comprehensions have the form:: |
|
293 |
|
    [ expression for expr1 in sequence1
                 for expr2 in sequence2 ...
                 for exprN in sequenceN
                 if condition ]
|
298 |
|
299 The :keyword:`for`...\ :keyword:`in` clauses contain the sequences to be |
|
300 iterated over. The sequences do not have to be the same length, because they |
|
301 are *not* iterated over in parallel, but from left to right; this is explained |
|
302 more clearly in the following paragraphs. The elements of the generated list |
|
303 will be the successive values of *expression*. The final :keyword:`if` clause |
|
304 is optional; if present, *expression* is only evaluated and added to the result |
|
305 if *condition* is true. |
|
306 |
|
307 To make the semantics very clear, a list comprehension is equivalent to the |
|
308 following Python code:: |
|
309 |
|
    for expr1 in sequence1:
        for expr2 in sequence2:
        ...
            for exprN in sequenceN:
                if (condition):
                    # Append the value of
                    # the expression to the
                    # resulting list.
|
318 |
|
This means that when there are multiple :keyword:`for`...\ :keyword:`in`
clauses and no :keyword:`if` clause, the length of the resulting list will be
equal to the product of the lengths of all the sequences. If you have two
sequences of length 3, the output list is 9 elements long::

    >>> seq1 = 'abc'
    >>> seq2 = (1,2,3)
    >>> [ (x,y) for x in seq1 for y in seq2]
    [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
    ('c', 2), ('c', 3)]
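
As a further sketch of the equivalence described above, the following builds
the same list twice, once with explicit nested loops and once with a
comprehension that includes an :keyword:`if` clause. The names ``by_loops`` and
``by_comprehension`` are made up for the example::

    seq1 = 'abc'
    seq2 = (1, 2, 3)

    # Explicit nested loops, following the expansion shown earlier.
    by_loops = []
    for x in seq1:
        for y in seq2:
            if y != 2:
                by_loops.append((x, y))

    # The same result written as a list comprehension.
    by_comprehension = [ (x, y) for x in seq1 for y in seq2 if y != 2 ]

Both lists hold 6 tuples rather than 9, because the condition discards every
pair whose second element is 2.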
|
329 |
|
330 To avoid introducing an ambiguity into Python's grammar, if *expression* is |
|
331 creating a tuple, it must be surrounded with parentheses. The first list |
|
332 comprehension below is a syntax error, while the second one is correct:: |
|
333 |
|
334 # Syntax error |
|
335 [ x,y for x in seq1 for y in seq2] |
|
336 # Correct |
|
337 [ (x,y) for x in seq1 for y in seq2] |
|
338 |
|
339 The idea of list comprehensions originally comes from the functional programming |
|
340 language Haskell (http://www.haskell.org). Greg Ewing argued most effectively |
|
341 for adding them to Python and wrote the initial list comprehension patch, which |
|
342 was then discussed for a seemingly endless time on the python-dev mailing list |
|
343 and kept up-to-date by Skip Montanaro. |
|
344 |
|
345 .. ====================================================================== |
|
346 |
|
347 |
|
348 Augmented Assignment |
|
349 ==================== |
|
350 |
|
351 Augmented assignment operators, another long-requested feature, have been added |
|
352 to Python 2.0. Augmented assignment operators include ``+=``, ``-=``, ``*=``, |
|
353 and so forth. For example, the statement ``a += 2`` increments the value of the |
|
354 variable ``a`` by 2, equivalent to the slightly lengthier ``a = a + 2``. |
|
355 |
|
356 The full list of supported assignment operators is ``+=``, ``-=``, ``*=``, |
|
357 ``/=``, ``%=``, ``**=``, ``&=``, ``|=``, ``^=``, ``>>=``, and ``<<=``. Python |
|
358 classes can override the augmented assignment operators by defining methods |
|
359 named :meth:`__iadd__`, :meth:`__isub__`, etc. For example, the following |
|
360 :class:`Number` class stores a number and supports using += to create a new |
|
361 instance with an incremented value. |
|
362 |
|
363 .. The empty groups below prevent conversion to guillemets. |
|
364 |
|
365 :: |
|
366 |
|
    class Number:
        def __init__(self, value):
            self.value = value
        def __iadd__(self, increment):
            return Number( self.value + increment)

    n = Number(5)
    n += 3
    print n.value
|
376 |
|
377 The :meth:`__iadd__` special method is called with the value of the increment, |
|
378 and should return a new instance with an appropriately modified value; this |
|
379 return value is bound as the new value of the variable on the left-hand side. |
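
One consequence is worth spelling out: because the name on the left-hand side
is rebound to whatever :meth:`__iadd__` returns, other references to the
original object are unaffected when the method returns a new instance. A
minimal sketch, repeating the :class:`Number` class from above with an extra,
hypothetical ``alias`` variable::

    class Number:
        def __init__(self, value):
            self.value = value
        def __iadd__(self, increment):
            # Return a fresh instance instead of changing self.
            return Number(self.value + increment)

    n = Number(5)
    alias = n        # second reference to the original instance
    n += 3           # rebinds n to the object returned by __iadd__

    # alias.value is still 5, while n.value is now 8.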
|
380 |
|
381 Augmented assignment operators were first introduced in the C programming |
|
382 language, and most C-derived languages, such as :program:`awk`, C++, Java, Perl, |
|
383 and PHP also support them. The augmented assignment patch was implemented by |
|
384 Thomas Wouters. |
|
385 |
|
386 .. ====================================================================== |
|
387 |
|
388 |
|
389 String Methods |
|
390 ============== |
|
391 |
|
392 Until now string-manipulation functionality was in the :mod:`string` module, |
|
393 which was usually a front-end for the :mod:`strop` module written in C. The |
|
394 addition of Unicode posed a difficulty for the :mod:`strop` module, because the |
|
395 functions would all need to be rewritten in order to accept either 8-bit or |
|
396 Unicode strings. For functions such as :func:`string.replace`, which takes 3 |
|
397 string arguments, that means eight possible permutations, and correspondingly |
|
398 complicated code. |
|
399 |
|
400 Instead, Python 2.0 pushes the problem onto the string type, making string |
|
401 manipulation functionality available through methods on both 8-bit strings and |
|
402 Unicode strings. :: |
|
403 |
|
404 >>> 'andrew'.capitalize() |
|
405 'Andrew' |
|
406 >>> 'hostname'.replace('os', 'linux') |
|
407 'hlinuxtname' |
|
408 >>> 'moshe'.find('sh') |
|
409 2 |
|
410 |
|
411 One thing that hasn't changed, a noteworthy April Fools' joke notwithstanding, |
|
412 is that Python strings are immutable. Thus, the string methods return new |
|
413 strings, and do not modify the string on which they operate. |
|
414 |
|
415 The old :mod:`string` module is still around for backwards compatibility, but it |
|
416 mostly acts as a front-end to the new string methods. |
|
417 |
|
418 Two methods which have no parallel in pre-2.0 versions, although they did exist |
|
419 in JPython for quite some time, are :meth:`startswith` and :meth:`endswith`. |
|
420 ``s.startswith(t)`` is equivalent to ``s[:len(t)] == t``, while |
|
421 ``s.endswith(t)`` is equivalent to ``s[-len(t):] == t``. |
|
422 |
|
423 One other method which deserves special mention is :meth:`join`. The |
|
424 :meth:`join` method of a string receives one parameter, a sequence of strings, |
|
425 and is equivalent to the :func:`string.join` function from the old :mod:`string` |
|
426 module, with the arguments reversed. In other words, ``s.join(seq)`` is |
|
427 equivalent to the old ``string.join(seq, s)``. |
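
The equivalences mentioned in the last two paragraphs can be checked with a
short sketch; the sample strings are arbitrary::

    s = 'whatsnew20.tex'
    prefix = 'whats'
    suffix = '.tex'

    # startswith and endswith compared with the slicing idioms.
    starts = s.startswith(prefix)
    starts_by_slice = (s[:len(prefix)] == prefix)
    ends = s.endswith(suffix)
    ends_by_slice = (s[-len(suffix):] == suffix)

    # The separator is the string whose method is called;
    # the sequence of strings is the argument.
    joined = ', '.join(['a', 'b', 'c'])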
|
428 |
|
429 .. ====================================================================== |
|
430 |
|
431 |
|
432 Garbage Collection of Cycles |
|
433 ============================ |
|
434 |
|
435 The C implementation of Python uses reference counting to implement garbage |
|
436 collection. Every Python object maintains a count of the number of references |
|
437 pointing to itself, and adjusts the count as references are created or |
|
438 destroyed. Once the reference count reaches zero, the object is no longer |
|
439 accessible, since you need to have a reference to an object to access it, and if |
|
440 the count is zero, no references exist any longer. |
|
441 |
|
442 Reference counting has some pleasant properties: it's easy to understand and |
|
443 implement, and the resulting implementation is portable, fairly fast, and reacts |
|
444 well with other libraries that implement their own memory handling schemes. The |
|
445 major problem with reference counting is that it sometimes doesn't realise that |
|
446 objects are no longer accessible, resulting in a memory leak. This happens when |
|
447 there are cycles of references. |
|
448 |
|
449 Consider the simplest possible cycle, a class instance which has a reference to |
|
450 itself:: |
|
451 |
|
452 instance = SomeClass() |
|
453 instance.myself = instance |
|
454 |
|
455 After the above two lines of code have been executed, the reference count of |
|
456 ``instance`` is 2; one reference is from the variable named ``'instance'``, and |
|
457 the other is from the ``myself`` attribute of the instance. |
|
458 |
|
459 If the next line of code is ``del instance``, what happens? The reference count |
|
460 of ``instance`` is decreased by 1, so it has a reference count of 1; the |
|
461 reference in the ``myself`` attribute still exists. Yet the instance is no |
|
462 longer accessible through Python code, and it could be deleted. Several objects |
|
463 can participate in a cycle if they have references to each other, causing all of |
|
464 the objects to be leaked. |
|
465 |
|
Python 2.0 fixes this problem by periodically executing a cycle detection
algorithm which looks for inaccessible cycles and deletes the objects involved.
A new :mod:`gc` module provides functions to perform a garbage collection,
obtain debugging statistics, and tune the collector's parameters.
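
As an illustration, the self-referencing instance from earlier in this section
can be reclaimed by asking the collector to run. This is a sketch rather than a
recommended pattern; ``SomeClass`` is an empty placeholder class and ``found``
is an illustrative name::

    import gc

    class SomeClass:
        pass

    gc.collect()                 # clear out any garbage that already exists

    instance = SomeClass()
    instance.myself = instance   # the instance now refers to itself
    del instance                 # the cycle is unreachable, but not yet freed

    # collect() runs a collection and returns the number of
    # unreachable objects it found.
    found = gc.collect()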
|
470 |
|
471 Running the cycle detection algorithm takes some time, and therefore will result |
|
472 in some additional overhead. It is hoped that after we've gotten experience |
|
473 with the cycle collection from using 2.0, Python 2.1 will be able to minimize |
|
474 the overhead with careful tuning. It's not yet obvious how much performance is |
|
475 lost, because benchmarking this is tricky and depends crucially on how often the |
|
476 program creates and destroys objects. The detection of cycles can be disabled |
|
477 when Python is compiled, if you can't afford even a tiny speed penalty or |
|
478 suspect that the cycle collection is buggy, by specifying the |
|
479 :option:`--without-cycle-gc` switch when running the :program:`configure` |
|
480 script. |
|
481 |
|
482 Several people tackled this problem and contributed to a solution. An early |
|
483 implementation of the cycle detection approach was written by Toby Kelsey. The |
|
484 current algorithm was suggested by Eric Tiedemann during a visit to CNRI, and |
|
485 Guido van Rossum and Neil Schemenauer wrote two different implementations, which |
|
486 were later integrated by Neil. Lots of other people offered suggestions along |
|
487 the way; the March 2000 archives of the python-dev mailing list contain most of |
|
488 the relevant discussion, especially in the threads titled "Reference cycle |
|
489 collection for Python" and "Finalization again". |
|
490 |
|
491 .. ====================================================================== |
|
492 |
|
493 |
|
494 Other Core Changes |
|
495 ================== |
|
496 |
|
497 Various minor changes have been made to Python's syntax and built-in functions. |
|
498 None of the changes are very far-reaching, but they're handy conveniences. |
|
499 |
|
500 |
|
501 Minor Language Changes |
|
502 ---------------------- |
|
503 |
|
A new syntax makes it more convenient to call a given function with a tuple of
arguments and/or a dictionary of keyword arguments. In Python 1.5 and earlier,
you'd use the :func:`apply` built-in function: ``apply(f, args, kw)`` calls the
function :func:`f` with the argument tuple *args* and the keyword arguments in
the dictionary *kw*. :func:`apply` is the same in 2.0, but thanks to a patch
from Greg Ewing, ``f(*args, **kw)`` is a shorter and clearer way to achieve the
same effect. This syntax is symmetrical with the syntax for defining
functions::
|
512 |
|
    def f(*args, **kw):
        # args is a tuple of positional args,
        # kw is a dictionary of keyword args
        ...
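
The call-side syntax can be sketched as follows. The function ``describe`` and
the sample values are hypothetical::

    def describe(*args, **kw):
        # Return what was received so the call can be inspected.
        return args, kw

    positional = (1, 2)
    keywords = {'sep': '-'}

    # Unpack the tuple and the dictionary at the call site.
    received_args, received_kw = describe(*positional, **keywords)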
|
517 |
|
518 The :keyword:`print` statement can now have its output directed to a file-like |
|
519 object by following the :keyword:`print` with ``>> file``, similar to the |
|
520 redirection operator in Unix shells. Previously you'd either have to use the |
|
521 :meth:`write` method of the file-like object, which lacks the convenience and |
|
522 simplicity of :keyword:`print`, or you could assign a new value to |
|
523 ``sys.stdout`` and then restore the old value. For sending output to standard |
|
524 error, it's much easier to write this:: |
|
525 |
|
526 print >> sys.stderr, "Warning: action field not supplied" |
|
527 |
|
528 Modules can now be renamed on importing them, using the syntax ``import module |
|
529 as name`` or ``from module import name as othername``. The patch was submitted |
|
530 by Thomas Wouters. |
|
531 |
|
532 A new format style is available when using the ``%`` operator; '%r' will insert |
|
533 the :func:`repr` of its argument. This was also added from symmetry |
|
534 considerations, this time for symmetry with the existing '%s' format style, |
|
535 which inserts the :func:`str` of its argument. For example, ``'%r %s' % ('abc', |
|
536 'abc')`` returns a string containing ``'abc' abc``. |
|
537 |
|
538 Previously there was no way to implement a class that overrode Python's built-in |
|
539 :keyword:`in` operator and implemented a custom version. ``obj in seq`` returns |
|
540 true if *obj* is present in the sequence *seq*; Python computes this by simply |
|
541 trying every index of the sequence until either *obj* is found or an |
|
542 :exc:`IndexError` is encountered. Moshe Zadka contributed a patch which adds a |
|
543 :meth:`__contains__` magic method for providing a custom implementation for |
|
544 :keyword:`in`. Additionally, new built-in objects written in C can define what |
|
545 :keyword:`in` means for them via a new slot in the sequence protocol. |
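
A minimal sketch of a class that defines :meth:`__contains__`; the
:class:`EvenNumbers` class is invented for this example and is not part of
Python::

    class EvenNumbers:
        # Hypothetical container representing every even integer.
        def __contains__(self, obj):
            return obj % 2 == 0

    evens = EvenNumbers()
    four_is_even = 4 in evens
    seven_is_even = 7 in evens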
|
546 |
|
547 Earlier versions of Python used a recursive algorithm for deleting objects. |
|
548 Deeply nested data structures could cause the interpreter to fill up the C stack |
|
549 and crash; Christian Tismer rewrote the deletion logic to fix this problem. On |
|
550 a related note, comparing recursive objects recursed infinitely and crashed; |
|
551 Jeremy Hylton rewrote the code to no longer crash, producing a useful result |
|
552 instead. For example, after this code:: |
|
553 |
|
554 a = [] |
|
555 b = [] |
|
556 a.append(a) |
|
557 b.append(b) |
|
558 |
|
559 The comparison ``a==b`` returns true, because the two recursive data structures |
|
560 are isomorphic. See the thread "trashcan and PR#7" in the April 2000 archives of |
|
561 the python-dev mailing list for the discussion leading up to this |
|
562 implementation, and some useful relevant links. Note that comparisons can now |
|
563 also raise exceptions. In earlier versions of Python, a comparison operation |
|
564 such as ``cmp(a,b)`` would always produce an answer, even if a user-defined |
|
565 :meth:`__cmp__` method encountered an error, since the resulting exception would |
|
566 simply be silently swallowed. |
|
567 |
|
568 .. Starting URL: |
|
569 .. http://www.python.org/pipermail/python-dev/2000-April/004834.html |
|
570 |
|
571 Work has been done on porting Python to 64-bit Windows on the Itanium processor, |
|
572 mostly by Trent Mick of ActiveState. (Confusingly, ``sys.platform`` is still |
|
573 ``'win32'`` on Win64 because it seems that for ease of porting, MS Visual C++ |
|
574 treats code as 32 bit on Itanium.) PythonWin also supports Windows CE; see the |
|
575 Python CE page at http://starship.python.net/crew/mhammond/ce/ for more |
|
576 information. |
|
577 |
|
578 Another new platform is Darwin/MacOS X; initial support for it is in Python 2.0. |
|
579 Dynamic loading works if you specify ``configure --with-dyld --with-suffix=.x``. |
|
580 Consult the README in the Python source distribution for more instructions. |
|
581 |
|
582 An attempt has been made to alleviate one of Python's warts, the often-confusing |
|
583 :exc:`NameError` exception when code refers to a local variable before the |
|
584 variable has been assigned a value. For example, the following code raises an |
|
585 exception on the :keyword:`print` statement in both 1.5.2 and 2.0; in 1.5.2 a |
|
586 :exc:`NameError` exception is raised, while 2.0 raises a new |
|
587 :exc:`UnboundLocalError` exception. :exc:`UnboundLocalError` is a subclass of |
|
588 :exc:`NameError`, so any existing code that expects :exc:`NameError` to be |
|
589 raised should still work. :: |
|
590 |
|
591 def f(): |
|
592     print "i=",i |
|
593     i = i + 1 |
|
594 f() |
|
595 |
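The subclass relationship is what keeps older ``except NameError`` handlers working. The following sketch shows this under the assumption of a current Python 3 interpreter, so :keyword:`print` is called as a function:

```python
def f():
    # "i" is local to f() because of the assignment on the next line,
    # so reading it here fails before any value has been bound.
    print("i =", i)
    i = i + 1


try:
    f()
except NameError as exc:
    # A handler written for NameError also catches UnboundLocalError.
    caught = exc
    print(type(exc).__name__)
```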
|
596 Two new exceptions, :exc:`TabError` and :exc:`IndentationError`, have been |
|
597 introduced. They're both subclasses of :exc:`SyntaxError`, and are raised when |
|
598 Python code is found to be improperly indented. |
|
599 |
|
600 |
|
601 Changes to Built-in Functions |
|
602 ----------------------------- |
|
603 |
|
604 A new built-in, :func:`zip(seq1, seq2, ...)`, has been added. :func:`zip` |
|
605 returns a list of tuples where each tuple contains the i-th element from each of |
|
606 the argument sequences. The difference between :func:`zip` and ``map(None, |
|
607 seq1, seq2)`` is that :func:`map` pads the sequences with ``None`` if the |
|
608 sequences aren't all of the same length, while :func:`zip` truncates the |
|
609 returned list to the length of the shortest argument sequence. |
|
610 |
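The difference between truncating and padding can be sketched as follows. This assumes a current Python 3 interpreter, where :func:`zip` returns an iterator and ``map(None, ...)`` is no longer supported; ``itertools.zip_longest`` stands in for the padding behaviour here:

```python
from itertools import zip_longest

letters = ["a", "b", "c"]
numbers = [1, 2]

# zip() stops at the end of the shortest argument sequence.
truncated = list(zip(letters, numbers))

# zip_longest() pads the shorter sequence with None instead.
padded = list(zip_longest(letters, numbers))

print(truncated)
print(padded)
```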
|
611 The :func:`int` and :func:`long` functions now accept an optional "base" |
|
612 parameter when the first argument is a string. ``int('123', 10)`` returns 123, |
|
613 while ``int('123', 16)`` returns 291. ``int(123, 16)`` raises a |
|
614 :exc:`TypeError` exception with the message "can't convert non-string with |
|
615 explicit base". |
|
616 |
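A short sketch of the "base" parameter, run here on a current Python 3 interpreter (the exact wording of the error message may differ between versions):

```python
decimal_value = int("123", 10)
hex_value = int("123", 16)
print(decimal_value, hex_value)

try:
    # An explicit base only makes sense for string input.
    int(123, 16)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```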
|
617 A new variable holding more detailed version information has been added to the |
|
618 :mod:`sys` module. ``sys.version_info`` is a tuple ``(major, minor, micro, |
|
619 level, serial)``. For example, in a hypothetical 2.0.1beta1, ``sys.version_info`` |
|
620 would be ``(2, 0, 1, 'beta', 1)``. *level* is a string such as ``"alpha"``, |
|
621 ``"beta"``, or ``"final"`` for a final release. |
|
622 |
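Because ``sys.version_info`` is a tuple, it can be unpacked and compared directly. A minimal sketch, assuming only the five-element layout described above:

```python
import sys

# Unpack the five fields in the documented order.
major, minor, micro, level, serial = sys.version_info

# Tuples compare element by element, which makes feature checks simple.
has_version_info_era_features = sys.version_info >= (2, 0)

print(major, minor, micro, level, serial)
```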
|
623 Dictionaries have an odd new method, :meth:`setdefault(key, default)`, which |
|
624 behaves similarly to the existing :meth:`get` method. However, if the key is |
|
625 missing, :meth:`setdefault` both returns the value of *default* as :meth:`get` |
|
626 would do, and also inserts it into the dictionary as the value for *key*. Thus, |
|
627 the following lines of code:: |
|
628 |
|
629 if dict.has_key( key ): return dict[key] |
|
630 else: |
|
631     dict[key] = [] |
|
632     return dict[key] |
|
633 |
|
634 can be reduced to a single ``return dict.setdefault(key, [])`` statement. |
|
635 |
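A common use of :meth:`setdefault` is grouping values under a key. The sketch below uses made-up data and a hypothetical ``index`` dictionary:

```python
index = {}
for word in ["apple", "avocado", "banana"]:
    # Insert an empty list the first time a letter is seen, then append.
    index.setdefault(word[0], []).append(word)

print(index)
```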
|
636 The interpreter sets a maximum recursion depth in order to catch runaway |
|
637 recursion before filling the C stack and causing a core dump or GPF. |
|
638 Previously this limit was fixed when you compiled Python, but in 2.0 the maximum |
|
639 recursion depth can be read and modified using :func:`sys.getrecursionlimit` and |
|
640 :func:`sys.setrecursionlimit`. The default value is 1000, and a rough maximum |
|
641 value for a given platform can be found by running a new script, |
|
642 :file:`Misc/find_recursionlimit.py`. |
|
643 |
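Reading and adjusting the limit looks roughly like this. The increase of 500 is an arbitrary value chosen for illustration; a safe maximum depends on the platform:

```python
import sys

original = sys.getrecursionlimit()

# Raise the limit temporarily, then restore the previous value.
sys.setrecursionlimit(original + 500)
raised = sys.getrecursionlimit()
sys.setrecursionlimit(original)

print(original, raised)
```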
|
644 .. ====================================================================== |
|
645 |
|
646 |
|
647 Porting to 2.0 |
|
648 ============== |
|
649 |
|
650 New Python releases try hard to be compatible with previous releases, and the |
|
651 record has been pretty good. However, some changes are considered useful |
|
652 enough, usually because they fix initial design decisions that turned out to be |
|
653 actively mistaken, that breaking backward compatibility can't always be avoided. |
|
654 This section lists the changes in Python 2.0 that may cause old Python code to |
|
655 break. |
|
656 |
|
657 The change which will probably break the most code is tightening up the |
|
658 arguments accepted by some methods. Some methods would take multiple arguments |
|
659 and treat them as a tuple, particularly various list methods such as |
|
660 :meth:`.append` and :meth:`.insert`. In earlier versions of Python, if ``L`` is |
|
661 a list, ``L.append( 1,2 )`` appends the tuple ``(1,2)`` to the list. In Python |
|
662 2.0 this causes a :exc:`TypeError` exception to be raised, with the message: |
|
663 'append requires exactly 1 argument; 2 given'. The fix is to simply add an |
|
664 extra set of parentheses to pass both values as a tuple: ``L.append( (1,2) )``. |
|
665 |
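The stricter behaviour and the fix can be sketched as follows (run on a current Python 3 interpreter, where the error message is worded differently):

```python
L = []

try:
    L.append(1, 2)  # two arguments: rejected
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

L.append((1, 2))  # one argument, which happens to be a tuple
print(L)
```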
|
666 The earlier versions of these methods were more forgiving because they used an |
|
667 old function in Python's C interface to parse their arguments; 2.0 modernizes |
|
668 them to use :func:`PyArg_ParseTuple`, the current argument parsing function, |
|
669 which provides more helpful error messages and treats multi-argument calls as |
|
670 errors. If you absolutely must use 2.0 but can't fix your code, you can edit |
|
671 :file:`Objects/listobject.c` and define the preprocessor symbol |
|
672 ``NO_STRICT_LIST_APPEND`` to preserve the old behaviour; this isn't recommended. |
|
673 |
|
674 Some of the functions in the :mod:`socket` module are still forgiving in this |
|
675 way. For example, :func:`socket.connect( ('hostname', 25) )` is the correct |
|
676 form, passing a tuple representing an IP address, but :func:`socket.connect( |
|
677 'hostname', 25 )` also works. :func:`socket.connect_ex` and :func:`socket.bind` |
|
678 are similarly easy-going. 2.0alpha1 tightened these functions up, but because |
|
679 the documentation actually used the erroneous multiple argument form, many |
|
680 people wrote code which would break with the stricter checking. GvR backed out |
|
681 the changes in the face of public reaction, so for the :mod:`socket` module, the |
|
682 documentation was fixed and the multiple argument form is simply marked as |
|
683 deprecated; it *will* be tightened up again in a future Python version. |
|
684 |
|
685 The ``\x`` escape in string literals now takes exactly 2 hex digits. Previously |
|
686 it would consume all the hex digits following the 'x' and take the lowest 8 bits |
|
687 of the result, so ``\x123456`` was equivalent to ``\x56``. |
|
688 |
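A quick way to see the two-digit rule is to check the length of such a literal:

```python
s = "\x123456"

# Only "12" belongs to the escape; "3456" are four ordinary characters.
print(len(s))
print(s == "\x12" + "3456")
```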
|
689 The :exc:`AttributeError` and :exc:`NameError` exceptions have a more friendly |
|
690 error message, whose text will be something like ``'Spam' instance has no |
|
691 attribute 'eggs'`` or ``name 'eggs' is not defined``. Previously the error |
|
692 message was just the missing attribute name ``eggs``, and code written to take |
|
693 advantage of this fact will break in 2.0. |
|
694 |
|
695 Some work has been done to make integers and long integers a bit more |
|
696 interchangeable. In 1.5.2, large-file support was added for Solaris, to allow |
|
697 reading files larger than 2 GiB; this made the :meth:`tell` method of file |
|
698 objects return a long integer instead of a regular integer. Some code would |
|
699 subtract two file offsets and attempt to use the result to multiply a sequence |
|
700 or slice a string, but this raised a :exc:`TypeError`. In 2.0, long integers |
|
701 can be used to multiply or slice a sequence, and it'll behave as you'd |
|
702 intuitively expect it to; ``3L * 'abc'`` produces 'abcabcabc', and |
|
703 ``(0,1,2,3)[2L:4L]`` produces (2,3). Long integers can also be used in various |
|
704 contexts where previously only integers were accepted, such as in the |
|
705 :meth:`seek` method of file objects, and in the formats supported by the ``%`` |
|
706 operator (``%d``, ``%i``, ``%x``, etc.). For example, ``"%d" % 2L**64`` will |
|
707 produce the string ``18446744073709551616``. |
|
708 |
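The operations above can be tried as follows. Note the assumption: this sketch targets a current Python 3 interpreter, where the separate long integer type and its ``L`` suffix no longer exist, so plain integers are used throughout:

```python
big = 2 ** 64

repeated = 3 * "abc"          # sequence repetition
sliced = (0, 1, 2, 3)[2:4]    # slicing
formatted = "%d" % big        # % formatting of a very large integer

print(repeated, sliced, formatted)
```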
|
709 The subtlest long integer change of all is that the :func:`str` of a long |
|
710 integer no longer has a trailing 'L' character, though :func:`repr` still |
|
711 includes it. The 'L' annoyed many people who wanted to print long integers that |
|
712 looked just like regular integers, since they had to go out of their way to chop |
|
713 off the character. This is no longer a problem in 2.0, but code which does |
|
714 ``str(longval)[:-1]`` and assumes the 'L' is there, will now lose the final |
|
715 digit. |
|
716 |
|
717 Taking the :func:`repr` of a float now uses a different formatting precision |
|
718 than :func:`str`. :func:`repr` uses the ``%.17g`` format string for C's |
|
719 :func:`sprintf`, while :func:`str` uses ``%.12g`` as before. The effect is that |
|
720 :func:`repr` may occasionally show more decimal places than :func:`str`, for |
|
721 certain numbers. For example, the number 8.1 can't be represented exactly in |
|
722 binary, so ``repr(8.1)`` is ``'8.0999999999999996'``, while ``str(8.1)`` is |
|
723 ``'8.1'``. |
|
724 |
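The two precisions can be compared directly with the ``%`` operator. This sketch applies the format strings by hand, because later Python versions changed how :func:`repr` formats floats:

```python
value = 8.1

seventeen_digits = "%.17g" % value  # precision used by repr() in 2.0
twelve_digits = "%.12g" % value     # precision used by str() in 2.0

print(seventeen_digits)
print(twelve_digits)
```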
|
725 The ``-X`` command-line option, which turned all standard exceptions into |
|
726 strings instead of classes, has been removed; the standard exceptions will now |
|
727 always be classes. The :mod:`exceptions` module containing the standard |
|
728 exceptions was translated from Python to a built-in C module, written by Barry |
|
729 Warsaw and Fredrik Lundh. |
|
730 |
|
731 .. Commented out for now -- I don't think anyone will care. |
|
732 The pattern and match objects provided by SRE are C types, not Python |
|
733 class instances as in 1.5. This means you can no longer inherit from |
|
734 \class{RegexObject} or \class{MatchObject}, but that shouldn't be much |
|
735 of a problem since no one should have been doing that in the first |
|
736 place. |
|
737 .. ====================================================================== |
|
738 |
|
739 |
|
740 Extending/Embedding Changes |
|
741 =========================== |
|
742 |
|
743 Some of the changes are under the covers, and will only be apparent to people |
|
744 writing C extension modules or embedding a Python interpreter in a larger |
|
745 application. If you aren't dealing with Python's C API, you can safely skip |
|
746 this section. |
|
747 |
|
748 The version number of the Python C API was incremented, so C extensions compiled |
|
749 for 1.5.2 must be recompiled in order to work with 2.0. On Windows, it's not |
|
750 possible for Python 2.0 to import a third party extension built for Python 1.5.x |
|
751 due to how Windows DLLs work, so Python will raise an exception and the import |
|
752 will fail. |
|
753 |
|
754 Users of Jim Fulton's ExtensionClass module will be pleased to find out that |
|
755 hooks have been added so that ExtensionClasses are now supported by |
|
756 :func:`isinstance` and :func:`issubclass`. This means you no longer have to |
|
757 remember to write code such as ``if type(obj) == myExtensionClass``, but can use |
|
758 the more natural ``if isinstance(obj, myExtensionClass)``. |
|
759 |
|
760 The :file:`Python/importdl.c` file, which was a mass of #ifdefs to support |
|
761 dynamic loading on many different platforms, was cleaned up and reorganised by |
|
762 Greg Stein. :file:`importdl.c` is now quite small, and platform-specific code |
|
763 has been moved into a bunch of :file:`Python/dynload_\*.c` files. Another |
|
764 cleanup: there were also a number of :file:`my\*.h` files in the Include/ |
|
765 directory that held various portability hacks; they've been merged into a single |
|
766 file, :file:`Include/pyport.h`. |
|
767 |
|
768 Vladimir Marangozov's long-awaited malloc restructuring was completed, to make |
|
769 it easy to have the Python interpreter use a custom allocator instead of C's |
|
770 standard :func:`malloc`. For documentation, read the comments in |
|
771 :file:`Include/pymem.h` and :file:`Include/objimpl.h`. For the lengthy |
|
772 discussions during which the interface was hammered out, see the Web archives of |
|
773 the 'patches' and 'python-dev' lists at python.org. |
|
774 |
|
775 Recent versions of the GUSI development environment for MacOS support POSIX |
|
776 threads. Therefore, Python's POSIX threading support now works on the |
|
777 Macintosh. Threading support using the user-space GNU ``pth`` library was also |
|
778 contributed. |
|
779 |
|
780 Threading support on Windows was enhanced, too. Windows supports thread locks |
|
781 that use kernel objects only in case of contention; in the common case when |
|
782 there's no contention, they use simpler functions which are an order of |
|
783 magnitude faster. A threaded version of Python 1.5.2 on NT is twice as slow as |
|
784 an unthreaded version; with the 2.0 changes, the difference is only 10%. These |
|
785 improvements were contributed by Yakov Markovitch. |
|
786 |
|
787 Python 2.0's source now uses only ANSI C prototypes, so compiling Python now |
|
788 requires an ANSI C compiler, and can no longer be done using a compiler that |
|
789 only supports K&R C. |
|
790 |
|
791 Previously the Python virtual machine used 16-bit numbers in its bytecode, |
|
792 limiting the size of source files. In particular, this affected the maximum |
|
793 size of literal lists and dictionaries in Python source; occasionally people who |
|
794 are generating Python code would run into this limit. A patch by Charles G. |
|
795 Waldman raises the limit from ``2^16`` to ``2^32``. |
|
796 |
|
797 Three new convenience functions intended for adding constants to a module's |
|
798 dictionary at module initialization time were added: :func:`PyModule_AddObject`, |
|
799 :func:`PyModule_AddIntConstant`, and :func:`PyModule_AddStringConstant`. Each |
|
800 of these functions takes a module object, a null-terminated C string containing |
|
801 the name to be added, and a third argument for the value to be assigned to the |
|
802 name. This third argument is, respectively, a Python object, a C long, or a C |
|
803 string. |
|
804 |
|
805 A wrapper API was added for Unix-style signal handlers. :func:`PyOS_getsig` gets |
|
806 a signal handler and :func:`PyOS_setsig` will set a new handler. |
|
807 |
|
808 .. ====================================================================== |
|
809 |
|
810 |
|
811 Distutils: Making Modules Easy to Install |
|
812 ========================================= |
|
813 |
|
814 Before Python 2.0, installing modules was a tedious affair -- there was no way |
|
815 to figure out automatically where Python was installed, or what compiler options |
|
816 to use for extension modules. Software authors had to go through an arduous |
|
817 ritual of editing Makefiles and configuration files, which only really worked on |
|
818 Unix and left Windows and MacOS unsupported. Python users faced wildly |
|
819 differing installation instructions which varied between different extension |
|
820 packages, which made administering a Python installation something of a chore. |
|
821 |
|
822 The SIG for distribution utilities, shepherded by Greg Ward, has created the |
|
823 Distutils, a system to make package installation much easier. They form the |
|
824 :mod:`distutils` package, a new part of Python's standard library. In the best |
|
825 case, installing a Python module from source will require the same steps: first |
|
826 you simply unpack the tarball or zip archive, and then run "``python |
|
827 setup.py install``". The platform will be automatically detected, the compiler |
|
828 will be recognized, C extension modules will be compiled, and the distribution |
|
829 installed into the proper directory. Optional command-line arguments provide |
|
830 more control over the installation process; the distutils package offers many |
|
831 places to override defaults -- separating the build from the install, building |
|
832 or installing in non-default directories, and more. |
|
833 |
|
834 In order to use the Distutils, you need to write a :file:`setup.py` script. For |
|
835 the simple case, when the software contains only .py files, a minimal |
|
836 :file:`setup.py` can be just a few lines long:: |
|
837 |
|
838 from distutils.core import setup |
|
839 setup (name = "foo", version = "1.0", |
|
840 py_modules = ["module1", "module2"]) |
|
841 |
|
842 The :file:`setup.py` file isn't much more complicated if the software consists |
|
843 of a few packages:: |
|
844 |
|
845 from distutils.core import setup |
|
846 setup (name = "foo", version = "1.0", |
|
847 packages = ["package", "package.subpackage"]) |
|
848 |
|
849 A C extension can be the most complicated case; here's an example taken from |
|
850 the PyXML package:: |
|
851 |
|
852 from distutils.core import setup, Extension |
|
853 |
|
854 expat_extension = Extension('xml.parsers.pyexpat', |
|
855     define_macros = [('XML_NS', None)], |
|
856     include_dirs = [ 'extensions/expat/xmltok', |
|
857                      'extensions/expat/xmlparse' ], |
|
858     sources = [ 'extensions/pyexpat.c', |
|
859                 'extensions/expat/xmltok/xmltok.c', |
|
860                 'extensions/expat/xmltok/xmlrole.c', |
|
861               ] |
|
862     ) |
|
863 setup (name = "PyXML", version = "0.5.4", |
|
864        ext_modules =[ expat_extension ] ) |
|
865 |
|
866 The Distutils can also take care of creating source and binary distributions. |
|
867 The "sdist" command, run by "``python setup.py sdist``", builds a source |
|
868 distribution such as :file:`foo-1.0.tar.gz`. Adding new commands isn't |
|
869 difficult; "bdist_rpm" and "bdist_wininst" commands have already been |
|
870 contributed to create an RPM distribution and a Windows installer for the |
|
871 software, respectively. Commands to create other distribution formats such as |
|
872 Debian packages and Solaris :file:`.pkg` files are in various stages of |
|
873 development. |
|
874 |
|
875 All this is documented in a new manual, *Distributing Python Modules*, that |
|
876 joins the basic set of Python documentation. |
|
877 |
|
878 .. ====================================================================== |
|
879 |
|
880 |
|
881 XML Modules |
|
882 =========== |
|
883 |
|
884 Python 1.5.2 included a simple XML parser in the form of the :mod:`xmllib` |
|
885 module, contributed by Sjoerd Mullender. Since 1.5.2's release, two different |
|
886 interfaces for processing XML have become common: SAX2 (version 2 of the Simple |
|
887 API for XML) provides an event-driven interface with some similarities to |
|
888 :mod:`xmllib`, and the DOM (Document Object Model) provides a tree-based |
|
889 interface, transforming an XML document into a tree of nodes that can be |
|
890 traversed and modified. Python 2.0 includes a SAX2 interface and a stripped- |
|
891 down DOM interface as part of the :mod:`xml` package. Here we will give a brief |
|
892 overview of these new interfaces; consult the Python documentation or the source |
|
893 code for complete details. The Python XML SIG is also working on improved |
|
894 documentation. |
|
895 |
|
896 |
|
897 SAX2 Support |
|
898 ------------ |
|
899 |
|
900 SAX defines an event-driven interface for parsing XML. To use SAX, you must |
|
901 write a SAX handler class. Handler classes inherit from various classes |
|
902 provided by SAX, and override various methods that will then be called by the |
|
903 XML parser. For example, the :meth:`startElement` and :meth:`endElement` |
|
904 methods are called for every starting and end tag encountered by the parser, the |
|
905 :meth:`characters` method is called for every chunk of character data, and so |
|
906 forth. |
|
907 |
|
908 The advantage of the event-driven approach is that the whole document doesn't |
|
909 have to be resident in memory at any one time, which matters if you are |
|
910 processing really huge documents. However, writing the SAX handler class can |
|
911 get very complicated if you're trying to modify the document structure in some |
|
912 elaborate way. |
|
913 |
|
914 For example, this little program defines a handler that prints a message |
|
915 for every starting and ending tag, and then parses the file :file:`hamlet.xml` |
|
916 using it:: |
|
917 |
|
918 from xml import sax |
|
919 |
|
920 class SimpleHandler(sax.ContentHandler): |
|
921     def startElement(self, name, attrs): |
|
922         print 'Start of element:', name, attrs.keys() |
|
923 |
|
924     def endElement(self, name): |
|
925         print 'End of element:', name |
|
926 |
|
927 # Create a parser object |
|
928 parser = sax.make_parser() |
|
929 |
|
930 # Tell it what handler to use |
|
931 handler = SimpleHandler() |
|
932 parser.setContentHandler( handler ) |
|
933 |
|
934 # Parse a file! |
|
935 parser.parse( 'hamlet.xml' ) |
|
936 |
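The same handler idea can be tried without a file on disk. The sketch below is written for a current Python 3 interpreter; the ``TagCollector`` class and the tiny inline document are made up for illustration, and ``xml.sax.parseString`` is used in place of a parser object:

```python
from xml import sax


class TagCollector(sax.ContentHandler):
    """Hypothetical handler that records events instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, sorted(attrs.keys())))

    def endElement(self, name):
        self.events.append(("end", name))


handler = TagCollector()
sax.parseString(b"<play><title lang='en'>Hamlet</title></play>", handler)

for event in handler.events:
    print(event)
```

Collecting events in a list rather than printing them makes the handler easier to inspect afterwards.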
|
937 For more information, consult the Python documentation, or the XML HOWTO at |
|
938 http://pyxml.sourceforge.net/topics/howto/xml-howto.html. |
|
939 |
|
940 |
|
941 DOM Support |
|
942 ----------- |
|
943 |
|
944 The Document Object Model is a tree-based representation for an XML document. A |
|
945 top-level :class:`Document` instance is the root of the tree, and has a single |
|
946 child which is the top-level :class:`Element` instance. This :class:`Element` |
|
947 has children nodes representing character data and any sub-elements, which may |
|
948 have further children of their own, and so forth. Using the DOM you can |
|
949 traverse the resulting tree any way you like, access element and attribute |
|
950 values, insert and delete nodes, and convert the tree back into XML. |
|
951 |
|
952 The DOM is useful for modifying XML documents, because you can create a DOM |
|
953 tree, modify it by adding new nodes or rearranging subtrees, and then produce a |
|
954 new XML document as output. You can also construct a DOM tree manually and |
|
955 convert it to XML, which can be a more flexible way of producing XML output than |
|
956 simply writing ``<tag1>``...\ ``</tag1>`` to a file. |
|
957 |
|
958 The DOM implementation included with Python lives in the :mod:`xml.dom.minidom` |
|
959 module. It's a lightweight implementation of the Level 1 DOM with support for |
|
960 XML namespaces. The :func:`parse` and :func:`parseString` convenience |
|
961 functions are provided for generating a DOM tree:: |
|
962 |
|
963 from xml.dom import minidom |
|
964 doc = minidom.parse('hamlet.xml') |
|
965 |
|
966 ``doc`` is a :class:`Document` instance. :class:`Document`, like all the other |
|
967 DOM classes such as :class:`Element` and :class:`Text`, is a subclass of the |
|
968 :class:`Node` base class. All the nodes in a DOM tree therefore support certain |
|
969 common methods, such as :meth:`toxml` which returns a string containing the XML |
|
970 representation of the node and its children. Each class also has special |
|
971 methods of its own; for example, :class:`Element` and :class:`Document` |
|
972 instances have a method to find all child elements with a given tag name. |
|
973 Continuing from the previous 2-line example:: |
|
974 |
|
975 perslist = doc.getElementsByTagName( 'PERSONA' ) |
|
976 print perslist[0].toxml() |
|
977 print perslist[1].toxml() |
|
978 |
|
979 For the *Hamlet* XML file, the above few lines output:: |
|
980 |
|
981 <PERSONA>CLAUDIUS, king of Denmark. </PERSONA> |
|
982 <PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA> |
|
983 |
|
984 The root element of the document is available as ``doc.documentElement``, and |
|
985 its children can be easily modified by removing, adding, or moving nodes:: |
|
986 |
|
987 root = doc.documentElement |
|
988 |
|
989 # Remove the first child |
|
990 root.removeChild( root.childNodes[0] ) |
|
991 |
|
992 # Move the new first child to the end |
|
993 root.appendChild( root.childNodes[0] ) |
|
994 |
|
995 # Insert the new first child (originally, |
|
996 # the third child) before the 20th child. |
|
997 root.insertBefore( root.childNodes[0], root.childNodes[20] ) |
|
998 |
|
999 Again, I will refer you to the Python documentation for a complete listing of |
|
1000 the different :class:`Node` classes and their various methods. |
|
1001 |
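For readers without a copy of :file:`hamlet.xml`, the same operations can be tried on an inline string. This is a sketch for a current Python 3 interpreter, and the two-element document is invented for the example:

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<PLAY><PERSONA>CLAUDIUS</PERSONA><PERSONA>HAMLET</PERSONA></PLAY>"
)
root = doc.documentElement

# Find elements by tag name and serialize the first one.
personae = doc.getElementsByTagName("PERSONA")
first_xml = personae[0].toxml()

# Appending an existing child moves it to the end of its parent.
root.appendChild(root.childNodes[0])
result = root.toxml()

print(first_xml)
print(result)
```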
|
1002 |
|
1003 Relationship to PyXML |
|
1004 --------------------- |
|
1005 |
|
1006 The XML Special Interest Group has been working on XML-related Python code for a |
|
1007 while. Its code distribution, called PyXML, is available from the SIG's Web |
|
1008 pages at http://www.python.org/sigs/xml-sig/. The PyXML distribution also used |
|
1009 the package name ``xml``. If you've written programs that used PyXML, you're |
|
1010 probably wondering about its compatibility with the 2.0 :mod:`xml` package. |
|
1011 |
|
1012 The answer is that Python 2.0's :mod:`xml` package isn't compatible with PyXML, |
|
1013 but can be made compatible by installing a recent version of PyXML. Many |
|
1014 applications can get by with the XML support that is included with Python 2.0, |
|
1015 but more complicated applications will require that the full PyXML package |
|
1016 be installed. When installed, PyXML versions 0.6.0 or greater will replace the |
|
1017 :mod:`xml` package shipped with Python, and will be a strict superset of the |
|
1018 standard package, adding a bunch of additional features. Some of the additional |
|
1019 features in PyXML include: |
|
1020 |
|
1021 * 4DOM, a full DOM implementation from FourThought, Inc. |
|
1022 |
|
1023 * The xmlproc validating parser, written by Lars Marius Garshol. |
|
1024 |
|
1025 * The :mod:`sgmlop` parser accelerator module, written by Fredrik Lundh. |
|
1026 |
|
1027 .. ====================================================================== |
|
1028 |
|
1029 |
|
1030 Module changes |
|
1031 ============== |
|
1032 |
|
1033 Lots of improvements and bugfixes were made to Python's extensive standard |
|
1034 library; some of the affected modules include :mod:`readline`, |
|
1035 :mod:`ConfigParser`, :mod:`cgi`, :mod:`calendar`, :mod:`posix`, |
|
1036 :mod:`xmllib`, :mod:`aifc`, :mod:`chunk`, :mod:`wave`, :mod:`random`, :mod:`shelve`, |
|
1037 and :mod:`nntplib`. Consult the CVS logs for the exact patch-by-patch details. |
|
1038 |
|
1039 Brian Gallew contributed OpenSSL support for the :mod:`socket` module. OpenSSL |
|
1040 is an implementation of the Secure Sockets Layer, which encrypts the data being |
|
1041 sent over a socket. When compiling Python, you can edit :file:`Modules/Setup` |
|
1042 to include SSL support, which adds an additional function to the :mod:`socket` |
|
1043 module: :func:`socket.ssl(socket, keyfile, certfile)`, which takes a socket |
|
1044 object and returns an SSL socket. The :mod:`httplib` and :mod:`urllib` modules |
|
1045 were also changed to support "https://" URLs, though no one has implemented FTP |
|
1046 or SMTP over SSL. |
|
1047 |
|
1048 The :mod:`httplib` module has been rewritten by Greg Stein to support HTTP/1.1. |
|
1049 Backward compatibility with the 1.5 version of :mod:`httplib` is provided, |
|
1050 though using HTTP/1.1 features such as pipelining will require rewriting code to |
|
1051 use a different set of interfaces. |
|
1052 |
|
1053 The :mod:`Tkinter` module now supports Tcl/Tk version 8.1, 8.2, or 8.3, and |
|
1054 support for the older 7.x versions has been dropped. The Tkinter module now |
|
1055 supports displaying Unicode strings in Tk widgets. Also, Fredrik Lundh |
|
1056 contributed an optimization which makes operations like ``create_line`` and |
|
1057 ``create_polygon`` much faster, especially when using lots of coordinates. |
|
1058 |
|
1059 The :mod:`curses` module has been greatly extended, starting from Oliver |
|
1060 Andrich's enhanced version, to provide many additional functions from ncurses |
|
1061 and SYSV curses, such as colour, alternative character set support, pads, and |
|
1062 mouse support. This means the module is no longer compatible with operating |
|
1063 systems that only have BSD curses, but there don't seem to be any currently |
|
1064 maintained OSes that fall into this category. |
|
1065 |
|
1066 As mentioned in the earlier discussion of 2.0's Unicode support, the underlying |
|
1067 implementation of the regular expressions provided by the :mod:`re` module has |
|
1068 been changed. SRE, a new regular expression engine written by Fredrik Lundh and |
|
1069 partially funded by Hewlett Packard, supports matching against both 8-bit |
|
1070 strings and Unicode strings. |
|
1071 |
|
1072 .. ====================================================================== |
|
1073 |
|
1074 |
|
1075 New modules |
|
1076 =========== |
|
1077 |
|
1078 A number of new modules were added. We'll simply list them with brief |
|
1079 descriptions; consult the 2.0 documentation for the details of a particular |
|
1080 module. |
|
1081 |
|
1082 * :mod:`atexit`: For registering functions to be called before the Python |
|
1083 interpreter exits. Code that currently sets ``sys.exitfunc`` directly should be |
|
1084 changed to use the :mod:`atexit` module instead, importing :mod:`atexit` and |
|
1085 calling :func:`atexit.register` with the function to be called on exit. |
|
1086 (Contributed by Skip Montanaro.) |
|
1087 |
|
1088 * :mod:`codecs`, :mod:`encodings`, :mod:`unicodedata`: Added as part of the new |
|
1089 Unicode support. |
|
1090 |
|
1091 * :mod:`filecmp`: Supersedes the old :mod:`cmp`, :mod:`cmpcache` and |
|
1092 :mod:`dircmp` modules, which have now become deprecated. (Contributed by Gordon |
|
1093 MacMillan and Moshe Zadka.) |
|
1094 |
|
1095 * :mod:`gettext`: This module provides internationalization (I18N) and |
|
1096 localization (L10N) support for Python programs by providing an interface to the |
|
1097 GNU gettext message catalog library. (Integrated by Barry Warsaw, from separate |
|
1098 contributions by Martin von Löwis, Peter Funk, and James Henstridge.) |
|
1099 |
|
1100 * :mod:`linuxaudiodev`: Support for the :file:`/dev/audio` device on Linux, a |
|
1101 twin to the existing :mod:`sunaudiodev` module. (Contributed by Peter Bosch, |
|
1102 with fixes by Jeremy Hylton.) |
|
1103 |
|
1104 * :mod:`mmap`: An interface to memory-mapped files on both Windows and Unix. A |
|
1105 file's contents can be mapped directly into memory, at which point it behaves |
|
1106 like a mutable string, so its contents can be read and modified. They can even |
|
1107 be passed to functions that expect ordinary strings, such as the :mod:`re` |
|
1108 module. (Contributed by Sam Rushing, with some extensions by A.M. Kuchling.) |
|
1109 |
|
1110 * :mod:`pyexpat`: An interface to the Expat XML parser. (Contributed by Paul |
|
1111 Prescod.) |
|
1112 |
|
1113 * :mod:`robotparser`: Parse a :file:`robots.txt` file, which is used for writing |
|
1114 Web spiders that politely avoid certain areas of a Web site. The parser accepts |
|
1115 the contents of a :file:`robots.txt` file, builds a set of rules from it, and |
|
1116 can then answer questions about the fetchability of a given URL. (Contributed |
|
1117 by Skip Montanaro.) |
|
1118 |
|
1119 * :mod:`tabnanny`: A module/script to check Python source code for ambiguous |
|
1120 indentation. (Contributed by Tim Peters.) |
|
1121 |
|
1122 * :mod:`UserString`: A base class useful for deriving objects that behave like |
|
1123 strings. |
|
1124 |
|
1125 * :mod:`webbrowser`: A module that provides a platform-independent way to launch |
|
1126 a web browser on a specific URL. For each platform, various browsers are tried |
|
1127 in a specific order. The user can alter which browser is launched by setting the |
|
1128 *BROWSER* environment variable. (Originally inspired by Eric S. Raymond's patch |
|
1129 to :mod:`urllib` which added similar functionality, but the final module comes |
|
1130 from code originally implemented by Fred Drake as |
|
1131 :file:`Tools/idle/BrowserControl.py`, and adapted for the standard library by |
|
1132 Fred.) |
|
* :mod:`_winreg`: An interface to the Windows registry. :mod:`_winreg` is an
  adaptation of functions that have been part of PythonWin since 1995, but it
  has now been added to the core distribution and enhanced to support Unicode.
  :mod:`_winreg` was written by Bill Tutt and Mark Hammond.
|
* :mod:`zipfile`: A module for reading and writing ZIP-format archives. These
  are archives produced by :program:`PKZIP` on DOS/Windows or :program:`zip` on
  Unix, not to be confused with :program:`gzip`\ -format files (which are
  supported by the :mod:`gzip` module). (Contributed by James C. Ahlstrom.)
|
* :mod:`imputil`: A module that provides a simpler way to write customised
  import hooks than the existing :mod:`ihooks` module. (Implemented by Greg
  Stein, with much discussion on python-dev along the way.)
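
As a rough illustration of three of the modules listed above (:mod:`mmap`,
:mod:`robotparser` and :mod:`zipfile`), here is a minimal sketch. It is written
for a current Python 3 interpreter rather than for Python 2.0, which is an
assumption on our part: there the mapped data is bytes rather than a string,
and :mod:`robotparser` lives at ``urllib.robotparser``. The helper names
(``mmap_demo``, ``robots_demo``, ``zip_demo``), the sample data and the
``ExampleBot`` user agent are all hypothetical.

```python
import io
import mmap
import re
import tempfile
import zipfile
from urllib import robotparser  # plain ``robotparser`` in Python 2.0


def mmap_demo():
    """Map a temporary file into memory, edit it in place, and search it."""
    with tempfile.TemporaryFile() as f:
        f.write(b"hello world, hello mmap")
        f.flush()
        m = mmap.mmap(f.fileno(), 0)  # a length of 0 maps the whole file
        try:
            m[0:5] = b"HELLO"  # slice assignment must not change the size
            words = re.findall(rb"[a-z]+", m)  # re accepts the map directly
            return m[:], words
        finally:
            m.close()


def robots_demo():
    """Build rules from robots.txt lines and ask whether URLs may be fetched."""
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /private/",
    ])
    return (
        rp.can_fetch("ExampleBot", "http://example.com/index.html"),
        rp.can_fetch("ExampleBot", "http://example.com/private/data.html"),
    )


def zip_demo():
    """Write a ZIP archive to an in-memory buffer, then read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hi there")
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist(), zf.read("hello.txt")
```

Each helper returns its result instead of printing it, so the three can be
tried independently at an interactive prompt.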
|
.. ======================================================================

|
IDLE Improvements
=================
|
IDLE is the official Python cross-platform IDE, written using Tkinter. Python
2.0 includes IDLE 0.6, which adds a number of new features and improvements. A
partial list:
|
* UI improvements and optimizations, especially in the area of syntax
  highlighting and auto-indentation.

* The class browser now shows more information, such as the top-level functions
  in a module.

* Tab width is now a user-settable option. When opening an existing Python file,
  IDLE automatically detects the indentation conventions and adapts to them.

* There is now support for calling browsers on various platforms, used to open
  the Python documentation in a browser.

* IDLE now has a command line, which is largely similar to the vanilla Python
  interpreter.

* Call tips were added in many places.

* IDLE can now be installed as a package.

* In the editor window, there is now a line/column bar at the bottom.

* Three new keystroke commands: Check module (Alt-F5), Import module (F5) and
  Run script (Ctrl-F5).
|
.. ======================================================================

|
Deleted and Deprecated Modules
==============================
|
A few modules have been dropped because they're obsolete, or because there are
now better ways to do the same thing. The :mod:`stdwin` module is gone; it was
for a platform-independent windowing toolkit that's no longer developed.
|
A number of modules have been moved to the :file:`lib-old` subdirectory:
:mod:`cmp`, :mod:`cmpcache`, :mod:`dircmp`, :mod:`dump`, :mod:`find`,
:mod:`grep`, :mod:`packmail`, :mod:`poly`, :mod:`util`, :mod:`whatsound`, and
:mod:`zmod`. If you have code that relies on a module that's been moved to
:file:`lib-old`, you can simply add that directory to ``sys.path`` to get it
back, but you're encouraged to update any code that uses these modules.
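
Adding the directory to ``sys.path`` might look like the following minimal
sketch. The helper name ``add_lib_old`` and the idea of passing in the
standard library directory are assumptions made for illustration; the actual
location of :file:`lib-old` depends on where Python was installed.

```python
import os
import sys


def add_lib_old(stdlib_dir):
    """Append <stdlib_dir>/lib-old to sys.path if it is not already there.

    ``stdlib_dir`` is a hypothetical parameter: the directory holding the
    standard library for your installation.
    """
    lib_old = os.path.join(stdlib_dir, "lib-old")
    if lib_old not in sys.path:
        # Appending (rather than inserting at the front) keeps the current
        # standard library modules ahead of the obsolete ones.
        sys.path.append(lib_old)
    return lib_old
```

After a call such as ``add_lib_old("/usr/local/lib/python2.0")`` (the path is
only an example), an ``import`` of one of the moved modules would be resolved
from that directory.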
|

Acknowledgements
================
|
The authors would like to thank the following people for offering suggestions on
various drafts of this article: David Bolen, Mark Hammond, Gregg Hauser, Jeremy
Hylton, Fredrik Lundh, Detlef Lannert, Aahz Maruch, Skip Montanaro, Vladimir
Marangozov, Tobias Polzin, Guido van Rossum, Neil Schemenauer, and Russ Schmidt.