:mod:`heapq` --- Heap queue algorithm
=====================================

.. module:: heapq
   :synopsis: Heap queue algorithm (a.k.a. priority queue).
.. moduleauthor:: Kevin O'Connor
.. sectionauthor:: Guido van Rossum <guido@python.org>
.. sectionauthor:: François Pinard

.. versionadded:: 2.3

This module provides an implementation of the heap queue algorithm, also known
as the priority queue algorithm.

Heaps are arrays for which ``heap[k] <= heap[2*k+1]`` and ``heap[k] <=
heap[2*k+2]`` for all *k*, counting elements from zero.  For the sake of
comparison, non-existing elements are considered to be infinite.  The
interesting property of a heap is that ``heap[0]`` is always its smallest
element.

The API below differs from textbook heap algorithms in two aspects: (a) we use
zero-based indexing, which makes the relationship between the index for a node
and the indexes for its children slightly less obvious, but is more suitable
since Python uses zero-based indexing; (b) our pop method returns the smallest
item, not the largest (this is called a "min heap" in textbooks; a "max heap"
is more common in texts because of its suitability for in-place sorting).

These two aspects make it possible to view the heap as a regular Python list
without surprises: ``heap[0]`` is the smallest item, and ``heap.sort()``
maintains the heap invariant!

To create a heap, use a list initialized to ``[]``, or transform a populated
list into a heap via the function :func:`heapify`.
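
Both routes yield the same heap structure; a quick illustrative doctest:

   >>> from heapq import heapify, heappush
   >>> built = []
   >>> for value in [3, 1, 2]:
   ...     heappush(built, value)
   ...
   >>> transformed = [3, 1, 2]
   >>> heapify(transformed)
   >>> built == transformed == [1, 3, 2]
   True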

The following functions are provided:


.. function:: heappush(heap, item)

   Push the value *item* onto the *heap*, maintaining the heap invariant.


.. function:: heappop(heap)

   Pop and return the smallest item from the *heap*, maintaining the heap
   invariant.  If the heap is empty, :exc:`IndexError` is raised.


.. function:: heappushpop(heap, item)

   Push *item* on the heap, then pop and return the smallest item from the
   *heap*.  The combined action runs more efficiently than :func:`heappush`
   followed by a separate call to :func:`heappop`.

   .. versionadded:: 2.6
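
   For example, when the pushed item is no larger than the current smallest,
   it comes straight back; a small illustrative doctest:

   >>> from heapq import heappushpop
   >>> heap = [1, 3, 2]
   >>> heappushpop(heap, 0)    # 0 <= heap[0], so the heap is left untouched
   0
   >>> heappushpop(heap, 4)    # pops 1, then sifts 4 into place
   1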


.. function:: heapify(x)

   Transform list *x* into a heap, in-place, in linear time.
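
   For instance:

   >>> from heapq import heapify
   >>> x = [5, 4, 3, 2, 1]
   >>> heapify(x)
   >>> x[0]
   1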


.. function:: heapreplace(heap, item)

   Pop and return the smallest item from the *heap*, and also push the new
   *item*.  The heap size doesn't change.  If the heap is empty,
   :exc:`IndexError` is raised.  This is more efficient than :func:`heappop`
   followed by :func:`heappush`, and can be more appropriate when using a
   fixed-size heap.  Note that the value returned may be larger than *item*!
   That constrains reasonable uses of this routine unless written as part of a
   conditional replacement::

      if item > heap[0]:
          item = heapreplace(heap, item)
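
   That idiom is how a fixed-size heap can track the largest items seen so
   far; a small sketch (the seed values assume all real data is positive):

   >>> from heapq import heapify, heapreplace
   >>> largest_three = [0, 0, 0]
   >>> heapify(largest_three)
   >>> for item in [9, 4, 7, 1, 8]:
   ...     if item > largest_three[0]:
   ...         item = heapreplace(largest_three, item)
   ...
   >>> sorted(largest_three)
   [7, 8, 9]

   :func:`nlargest` below packages this pattern for you.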

Example of use:

   >>> from heapq import heappush, heappop
   >>> heap = []
   >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   >>> for item in data:
   ...     heappush(heap, item)
   ...
   >>> ordered = []
   >>> while heap:
   ...     ordered.append(heappop(heap))
   ...
   >>> print ordered
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
   >>> data.sort()
   >>> print data == ordered
   True

The module also offers three general purpose functions based on heaps.


.. function:: merge(*iterables)

   Merge multiple sorted inputs into a single sorted output (for example,
   merge timestamped entries from multiple log files).  Returns an
   :term:`iterator` over the sorted values.

   Similar to ``sorted(itertools.chain(*iterables))`` but returns an iterable,
   does not pull the data into memory all at once, and assumes that each of
   the input streams is already sorted (smallest to largest).

   .. versionadded:: 2.6
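
   A quick doctest:

   >>> from heapq import merge
   >>> list(merge([1, 3, 5], [2, 4, 6]))
   [1, 2, 3, 4, 5, 6]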


.. function:: nlargest(n, iterable[, key])

   Return a list with the *n* largest elements from the dataset defined by
   *iterable*.  *key*, if provided, specifies a function of one argument that
   is used to extract a comparison key from each element in the iterable (for
   example, ``key=str.lower``).  Equivalent to:
   ``sorted(iterable, key=key, reverse=True)[:n]``.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      Added the optional *key* argument.
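
   For example, with a *key* that ignores case:

   >>> from heapq import nlargest
   >>> nlargest(2, ['b', 'A', 'c'], key=str.lower)
   ['c', 'b']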


.. function:: nsmallest(n, iterable[, key])

   Return a list with the *n* smallest elements from the dataset defined by
   *iterable*.  *key*, if provided, specifies a function of one argument that
   is used to extract a comparison key from each element in the iterable (for
   example, ``key=str.lower``).  Equivalent to:
   ``sorted(iterable, key=key)[:n]``.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      Added the optional *key* argument.
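
   For example:

   >>> from heapq import nsmallest
   >>> nsmallest(3, [9, 1, 8, 2, 7])
   [1, 2, 7]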

The latter two functions perform best for smaller values of *n*.  For larger
values, it is more efficient to use the :func:`sorted` function.  Also, when
``n==1``, it is more efficient to use the built-in :func:`min` and :func:`max`
functions.


Theory
------

(This explanation is due to François Pinard.  The Python code for this module
was contributed by Kevin O'Connor.)

Heaps are arrays for which ``a[k] <= a[2*k+1]`` and ``a[k] <= a[2*k+2]`` for
all *k*, counting elements from 0.  For the sake of comparison, non-existing
elements are considered to be infinite.  The interesting property of a heap is
that ``a[0]`` is always its smallest element.

The strange invariant above is meant to be an efficient memory representation
for a tournament.  The numbers below are *k*, not ``a[k]``::

                                   0

                  1                                 2

          3               4                5                6

      7       8       9       10      11      12      13      14

    15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

In the tree above, each cell *k* is topping ``2*k+1`` and ``2*k+2``.  In a
usual binary tournament we see in sports, each cell is the winner over the two
cells it tops, and we can trace the winner down the tree to see all opponents
s/he had.  However, in many computer applications of such tournaments, we do
not need to trace the history of a winner.  To be more memory efficient, when
a winner is promoted, we try to replace it by something else at a lower level,
and the rule becomes that a cell and the two cells it tops contain three
different items, but the top cell "wins" over the two topped cells.

If this heap invariant is protected at all times, index 0 is clearly the
overall winner.  The simplest algorithmic way to remove it and find the "next"
winner is to move some loser (let's say cell 30 in the diagram above) into the
0 position, and then percolate this new 0 down the tree, exchanging values,
until the invariant is re-established.  This is clearly logarithmic in the
total number of items in the tree.  By iterating over all items, you get an
O(n log n) sort.
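
In fact, heapifying a list and popping it empty is all there is to that sort;
a short sketch using the module's own functions::

   from heapq import heapify, heappop

   def heapsort(iterable):
       # heapify() is O(n); each of the n heappop() calls is O(log n),
       # so the whole sort is O(n log n).
       heap = list(iterable)
       heapify(heap)
       return [heappop(heap) for _ in range(len(heap))]

Unlike :func:`sorted`, this sketch is not stable for equal keys.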

A nice feature of this sort is that you can efficiently insert new items while
the sort is going on, provided that the inserted items are not "better" than
the last 0'th element you extracted.  This is especially useful in simulation
contexts, where the tree holds all incoming events, and the "win" condition
means the smallest scheduled time.  When an event schedules other events for
execution, they are scheduled into the future, so they can easily go into the
heap.  So, a heap is a good structure for implementing schedulers (this is
what I used for my MIDI sequencer :-).
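
Such a scheduler reduces to a heap of ``(time, event)`` pairs; a minimal
sketch, with event labels made up for illustration::

   import heapq

   pending = []                                 # the event heap
   heapq.heappush(pending, (5.0, 'note-off'))   # (scheduled time, event)
   heapq.heappush(pending, (1.0, 'note-on'))

   while pending:
       when, event = heapq.heappop(pending)     # always the earliest event
       # Handling the event may heappush() new, later events here.
       print when, event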

Various structures for implementing schedulers have been extensively studied,
and heaps are good for this, as they are reasonably speedy, the speed is
almost constant, and the worst case is not much different than the average
case.  However, there are other representations which are more efficient
overall, yet the worst cases might be terrible.

Heaps are also very useful in big disk sorts.  You most probably all know that
a big sort implies producing "runs" (which are pre-sorted sequences, whose
size is usually related to the amount of CPU memory), followed by merging
passes for these runs, which merging is often very cleverly organised [#]_.
It is very important that the initial sort produces the longest runs possible.
Tournaments are a good way to achieve that.  If, using all the memory
available to hold a tournament, you replace and percolate items that happen to
fit the current run, you'll produce runs which are twice the size of the
memory for random input, and much better for input fuzzily ordered.
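
A simplified sketch of this "replacement selection" scheme (the names are
illustrative, and items that miss the current run are parked in a side list
rather than percolated in place)::

   import heapq
   from itertools import islice

   def sorted_runs(records, memory=4):
       # ``memory`` stands in for how many records fit in RAM.
       it = iter(records)
       heap = list(islice(it, memory))
       heapq.heapify(heap)
       run, deferred = [], []
       while heap:
           smallest = heapq.heappop(heap)
           run.append(smallest)                 # "write to disk"
           for rec in islice(it, 1):            # read one replacement record
               if rec >= smallest:
                   heapq.heappush(heap, rec)    # still fits the current run
               else:
                   deferred.append(rec)         # must wait for the next run
           if not heap:                         # the current run is finished
               yield run
               run, heap, deferred = [], deferred, []
               heapq.heapify(heap)

For random input, each run comes out around twice ``memory`` long, as claimed
above.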

Moreover, if you output the 0'th item on disk and get an input which may not
fit in the current tournament (because the value "wins" over the last output
value), it cannot fit in the heap, so the size of the heap decreases.  The
freed memory could be cleverly reused immediately for progressively building a
second heap, which grows at exactly the same rate the first heap is melting.
When the first heap completely vanishes, you switch heaps and start a new run.
Clever and quite effective!

In a word, heaps are useful memory structures to know.  I use them in a few
applications, and I think it is good to keep a 'heap' module around. :-)

.. rubric:: Footnotes

.. [#] The disk balancing algorithms which are current, nowadays, are more
   annoying than clever, and this is a consequence of the seeking capabilities
   of the disks.  On devices which cannot seek, like big tape drives, the
   story was quite different, and one had to be very clever to ensure (far in
   advance) that each tape movement would be the most effective possible (that
   is, would best participate at "progressing" the merge).  Some tapes were
   even able to read backwards, and this was also used to avoid the rewinding
   time.  Believe me, real good tape sorts were quite spectacular to watch!
   Sorting has always been a Great Art! :-)