Intro
=====

The basic rule for dealing with weakref callbacks (and __del__ methods too,
for that matter) during cyclic gc:

    Once gc has computed the set of unreachable objects, no Python-level
    code can be allowed to access an unreachable object.

If that can happen, then the Python code can resurrect unreachable objects
too, and gc can't detect that without starting over. Since gc eventually
runs tp_clear on all unreachable objects, if an unreachable object is
resurrected then tp_clear will eventually be called on it (or may already
have been called before resurrection). At best (and this has been an
historically common bug), tp_clear empties an instance's __dict__, and
"impossible" AttributeErrors result. At worst, tp_clear leaves behind an
insane object at the C level, and segfaults result (historically, most
often by setting a new-style class's mro pointer to NULL, after which
attribute lookups performed by the class can segfault).

OTOH, it's OK to run Python-level code that can't access unreachable
objects, and sometimes that's necessary. The chief example is the callback
attached to a reachable weakref W to an unreachable object O. Since O is
going away, and W is still alive, the callback must be invoked. Because W
is still alive, everything reachable from its callback is also reachable,
so it's also safe to invoke the callback (although that's trickier than it
sounds, since other reachable weakrefs to other unreachable objects may
still exist, and be accessible to the callback -- there are lots of painful
details like this covered in the rest of this file).
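
The "chief example" above can be sketched at the Python level. This is only
an illustration, not part of gc itself: the class Node and the function
reachable_weakref_demo are hypothetical names invented here, and the sketch
assumes a CPython build with the cyclic collector available.

```python
import gc
import weakref


class Node:
    """Plain object used to build a reference cycle."""


def reachable_weakref_demo():
    events = []

    # O: two objects that refer to each other, so refcounts alone
    # never free them.
    a, b = Node(), Node()
    a.partner, b.partner = b, a

    # W: a weakref kept alive by a local variable, so it stays reachable.
    w = weakref.ref(a, lambda ref: events.append("callback ran"))

    del a, b        # the cycle is now unreachable cyclic trash
    gc.collect()    # gc clears W, then invokes its callback

    return events, w() is None
```

The callback only touches `events`, which is reachable from the live frame,
so it never gets a chance to look at the trash it is being told about.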

Python 2.4/2.3.5
================

The "Before 2.3.3" section below turned out to be wrong in some ways, but
I'm leaving it as-is because it's more right than wrong, and serves as a
wonderful example of how painful analysis can miss not only the forest for
the trees, but also miss the trees for the aphids sucking the trees
dry <wink>.

The primary thing it missed is that when a weakref to a piece of cyclic
trash (CT) exists, then any call to any Python code whatsoever can end up
materializing a strong reference to that weakref's CT referent, and so
possibly resurrect an insane object (one for which cyclic gc has called-- or
will call before it's done --tp_clear()). It's not even necessarily that a
weakref callback or __del__ method does something nasty on purpose: as
soon as we execute Python code, threads other than the gc thread can run
too, and they can do ordinary things with weakrefs that end up resurrecting
CT while gc is running.

    http://www.python.org/sf/1055820

shows how innocent it can be, and also how nasty. Variants of the three
focussed test cases attached to that bug report are now part of Python's
standard Lib/test/test_gc.py.

Jim Fulton gave the best nutshell summary of the new (in 2.4 and 2.3.5)
approach:

    Clearing cyclic trash can call Python code. If there are weakrefs to
    any of the cyclic trash, then those weakrefs can be used to resurrect
    the objects. Therefore, *before* clearing cyclic trash, we need to
    remove any weakrefs. If any of the weakrefs being removed have
    callbacks, then we need to save the callbacks and call them *after* all
    of the weakrefs have been cleared.

Alas, doing just that much doesn't work, because it overlooks what turned
out to be the much subtler problems that were fixed earlier, and described
below. We do clear all weakrefs to CT now before breaking cycles, but not
all callbacks encountered can be run later. That's explained in horrid
detail below.
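
The "clear first, call back later" ordering can be observed from Python. The
following is a minimal sketch under stated assumptions: Node and
cleared_before_callbacks_demo are hypothetical names, and both weakrefs are
given callbacks on purpose so that the sketch relies only on the behavior
described above.

```python
import gc
import weakref


class Node:
    """Plain object used to build a reference cycle."""


def cleared_before_callbacks_demo():
    seen = []
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a and b form a cycle

    def callback(ref):
        # Record what both weakrefs yield at the moment a callback runs.
        seen.append((wa(), wb()))

    # Two reachable (non-CT) weakrefs to two different pieces of CT.
    wa = weakref.ref(a, callback)
    wb = weakref.ref(b, callback)

    del a, b
    gc.collect()
    return seen
```

Each callback invocation finds both weakrefs already dead, so neither
callback can use the other weakref to resurrect its referent.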

Older text follows, with some later comments in [] brackets:

Before 2.3.3
============

Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
Segfaults in Zope3 resulted.

weakrefs in Python are designed to, at worst, let *other* objects learn
that a given object has died, via a callback function. The weakly
referenced object itself is not passed to the callback, and the presumption
is that the weakly referenced object is unreachable trash at the time the
callback is invoked.

That's usually true, but not always. Suppose a weakly referenced object
becomes part of a clump of cyclic trash. When enough cycles are broken by
cyclic gc that the object is reclaimed, the callback is invoked. If it's
possible for the callback to get at objects in the cycle(s), then it may be
possible for those objects to access (via strong references in the cycle)
the weakly referenced object being torn down, or other objects in the cycle
that have already suffered a tp_clear() call. There's no guarantee that an
object is in a sane state after tp_clear(). Bad things (including
segfaults) can happen right then, during the callback's execution, or can
happen at any later time if the callback manages to resurrect an insane
object.

[That missed that, in addition, a weakref to CT can exist outside CT, and
any callback into Python can use such a non-CT weakref to resurrect its CT
referent. The same bad kinds of things can happen then.]

Note that if it's possible for the callback to get at objects in the trash
cycles, it must also be the case that the callback itself is part of the
trash cycles. Else the callback would have acted as an external root to
the current collection, and nothing reachable from it would be in cyclic
trash either.

[Except that a non-CT callback can also use a non-CT weakref to get at
CT objects.]

More, if the callback itself is in cyclic trash, then the weakref to which
the callback is attached must also be trash, and for the same kind of
reason: if the weakref acted as an external root, then the callback could
not have been cyclic trash.

So a problem here requires that a weakref, that weakref's callback, and the
weakly referenced object, all be in cyclic trash at the same time. This
isn't easy to stumble into by accident while Python is running, and, indeed,
it took quite a while to dream up failing test cases. Zope3 saw segfaults
during shutdown, during the second call of gc in Py_Finalize, after most
modules had been torn down. That creates many trash cycles (esp. those
involving new-style classes), making the problem much more likely. Once you
know what's required to provoke the problem, though, it's easy to create
tests that segfault before shutdown.

In 2.3.3, before breaking cycles, we first clear all the weakrefs with
callbacks in cyclic trash. Since the weakrefs *are* trash, and there's no
defined-- or even predictable --order in which tp_clear() gets called on
cyclic trash, it's defensible to first clear weakrefs with callbacks. It's
a feature of Python's weakrefs too that when a weakref goes away, the
callback (if any) associated with it is thrown away too, unexecuted.

[In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
those weakrefs are themselves CT, and whether or not they have callbacks.
The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
after all weakrefs-to-CT have been cleared. The callbacks (if any) on CT
weakrefs (if any) are never invoked, for the excruciating reasons
explained here.]
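
The difference between CT and non-CT weakrefs can be sketched as follows.
This is an illustration only: Node, trash_weakref_demo, and the attribute
name inner_ref are hypothetical, chosen here to make one weakref part of the
trash and keep the other alive from outside it.

```python
import gc
import weakref


class Node:
    """Plain object used to build a reference cycle."""


def trash_weakref_demo():
    calls = []
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a and b form a cycle

    # Stored on b, so this weakref is reachable only through the cycle:
    # it is itself cyclic trash once a and b are dropped.
    b.inner_ref = weakref.ref(a, lambda ref: calls.append("CT weakref"))

    # Held by a local variable, so this weakref is not cyclic trash.
    outer_ref = weakref.ref(a, lambda ref: calls.append("non-CT weakref"))

    del a, b
    gc.collect()
    return calls
```

Only the callback on the weakref held outside the trash runs; the one
attached to the trash weakref is thrown away unexecuted.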

Just that much is almost enough to prevent problems, by throwing away
*almost* all the weakref callbacks that could get triggered by gc. The
problem remaining is that clearing a weakref with a callback decrefs the
callback object, and the callback object may *itself* be weakly referenced,
via another weakref with another callback. So the process of clearing
weakrefs can trigger callbacks attached to other weakrefs, and those
latter weakrefs may or may not be part of cyclic trash.

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash,

[That was always wrong: we can't stop Python code from running when gc
is breaking cycles. If an object with a __del__ method is not itself in
a cycle, but is reachable only from CT, then breaking cycles will, as a
matter of course, drop the refcount on that object to 0, and its __del__
will run right then. What we can and must stop is running any Python
code that could access CT.]
                                    it's not quite enough just to invoke
tp_clear() on weakrefs with callbacks first. Instead the weakref module
grew a new private function (_PyWeakref_ClearRef) that does only part of
tp_clear(): it removes the weakref from the weakly-referenced object's list
of weakrefs, but does not decref the callback object. So calling
_PyWeakref_ClearRef(wr) ensures that wr's callback object will never
trigger, and (unlike weakref's tp_clear()) also prevents any callback
associated *with* wr's callback object from triggering.

[Although we may trigger such callbacks later, as explained below.]

Then we can call tp_clear on all the cyclic objects and never trigger
Python code.

[As above, not so: it means never trigger Python code that can access CT.]
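
The bracketed corrections above say Python code can still run while gc deals
with cyclic trash. A small sketch of the __del__ case follows; Node, Noisy,
and del_during_gc_demo are hypothetical names. Note that on CPython 3.4 and
later, finalizers of unreachable objects run in a separate pass (PEP 442)
rather than as a side effect of a refcount dropping during cycle breaking,
so the mechanism differs from the description above even though the
observable result in this sketch is the same.

```python
import gc


class Node:
    """Plain object used to build a reference cycle."""


class Noisy:
    """Object with a __del__ method; it is never part of a cycle here."""

    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("__del__ ran")


def del_during_gc_demo():
    log = []
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a and b form a cycle
    a.helper = Noisy(log)         # reachable only from the cycle

    del a, b
    before = list(log)            # nothing has been finalized yet
    gc.collect()                  # Python-level __del__ runs during this call
    return before, log
```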

After we do that, the callback objects still need to be decref'ed. Callbacks
(if any) *on* the callback objects that were also part of cyclic trash won't
get invoked, because we cleared all trash weakrefs with callbacks at the
start. Callbacks on the callback objects that were not part of cyclic trash
acted as external roots to everything reachable from them, so nothing
reachable from them was part of cyclic trash, so gc didn't do any damage to
objects reachable from them, and it's safe to call them at the end of gc.

[That's so. In addition, now we also invoke (if any) the callbacks on
non-CT weakrefs to CT objects, during the same pass that decrefs the
callback objects.]

An alternative would have been to treat objects with callbacks like objects
with __del__ methods, refusing to collect them, appending them to gc.garbage
instead. That would have been much easier. Jim Fulton gave a strong
argument against that (on Python-Dev):

    There's a big difference between __del__ and weakref callbacks.
    The __del__ method is "internal" to a design. When you design a
    class with a del method, you know you have to avoid including the
    class in cycles.

    Now, suppose you have a design that has no __del__ methods but
    that does use cyclic data structures. You reason about the design,
    run tests, and convince yourself you don't have a leak.

    Now, suppose some external code creates a weakref to one of your
    objects. All of a sudden, you start leaking. You can look at your
    code all you want and you won't find a reason for the leak.

IOW, a class designer can out-think __del__ problems, but has no control
over who creates weakrefs to his classes or class instances. The class
user has little chance either of predicting when the weakrefs he creates
may end up in cycles.

Callbacks on weakref callbacks are executed in an arbitrary order, and
that's not good (a primary reason not to collect cycles with objects with
__del__ methods is to avoid running finalizers in an arbitrary order).
However, a weakref callback on a weakref callback has got to be rare.
It's possible to do such a thing, so gc has to be robust against it, but
I doubt anyone has done it outside the test case I wrote for it.
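
For readers who have never seen one, a "weakref callback on a weakref
callback" can be built like this. It is a minimal sketch, not the test case
mentioned above: Node, callback_on_callback_demo, inner, and watcher are
hypothetical names. The sketch only relies on the callback object being
released at some point after it has been called, without assuming exactly
when.

```python
import gc
import weakref


class Node:
    """Plain object used to build a reference cycle."""


def callback_on_callback_demo():
    log = []

    def inner(ref):
        log.append("inner callback ran")

    # A weakref to the callback function itself, with its own callback.
    watcher = weakref.ref(
        inner, lambda ref: log.append("callback object died"))

    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a and b form a cycle
    w = weakref.ref(a, inner)     # reachable weakref to future trash

    del inner    # the function is now kept alive only by w
    del a, b
    gc.collect() # invokes inner
    del w        # by now the last reference to inner is certainly gone
    return log
```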

[The callbacks (if any) on non-CT weakrefs to CT objects are also executed
in an arbitrary order now. But they were before too, depending on the
vagaries of when tp_clear() happened to break enough cycles to trigger
them. People simply shouldn't try to use __del__ or weakref callbacks to
do fancy stuff.]