----------------------------------------
from gc import commonsense - Finish Him!
----------------------------------------

.. class:: endnote

+---------------------------------------------------------+-------------------------------------------+
| .. image:: ../hacking_python_s1/claudiofreire.jpg       |**Author:** Claudio Freire                 |
|    :class: right foto                                   |                                           |
|                                                         |                                           |
+---------------------------------------------------------+-------------------------------------------+

.. raw:: pdf

    Spacer 0 1cm

I'm not sure if everyone does, but many of us who use Python (or any other high-level language, in fact) feel drawn to its elegant abstractions. Not the least of these is the one that abstracts memory management, known as the ``garbage collector``.

This weird thing, as revered as it is ignored, lets us (so the lore tells) program without worrying about memory. There's no need to reserve it, no need to free it... the ``garbage collector`` takes care of it all.

And, as with any tale of lore, there's *some* truth to it.

In this column we'll analyze the myths and truths of automated memory management. Much of what we'll cover applies to many languages - all those that use some kind of automated memory management - but, of course, we'll focus on the kind used by Python. And not just any Python flavour, since there are many: CPython.

Finalization
------------

Before getting into the gritty, visceral details of how CPython reserves and frees bytes, we must know the surface that covers them. So let's step back from the menial for a while and look at something that has profound repercussions for memory and resource management in general.

If the reader has programmed in various object-oriented languages (not just Python), he/she will know a lot about constructors. Little wee functions that, well, construct instances of some particular class. For example:

.. code-block:: Pycon

    >>> class UselessClass:
    ...     def __init__(self, value):
    ...         self.value = value
    ...
    >>> uselessObject = UselessClass(3)
    >>> uselessObject.value
    3

The same reader will also remember something much less common in Python: destructors. Destructors (so called in many languages, but known by other names as well) are functions invoked to *free* the resources associated with an instance. For example, if our ``UselessClass`` held a file as its value, or a socket, or anything else that needs to be *"closed"* or *"freed"*, we'd want a *destructor* that does so when the instance ceases to exist. That's called finalization, and in Python it's written like so:

.. code-block:: Pycon

    >>> class UselessClass:
    ...     def __init__(self, filey):
    ...         print "opening"
    ...         self.value = open(filey, "r")
    ...     def __del__(self):
    ...         print "closing"
    ...         self.value.close()
    ...
    >>> uselessObject = UselessClass("filey.txt")
    opening
    >>> uselessObject = None
    closing

Another reader would say *Huh... interesting.* I won't say otherwise. And yes, the class is called useless because file objects in Python already have their own built-in destructor, which closes the file. But it's just an example.

Lifetime of a class
-------------------

Now comes *the* question to ask oneself. When, then, does an instance cease to exist? When is ``__del__`` called?

In most high-level languages that manage memory for us, the definition is vague: at some point, once there are no reachable references left to it.

In that little phrase lies a world of under-specification. What is a reachable reference? When exactly? Immediately after the remaining references become unreachable? A minute afterwards? An hour? A day? When?

Since the first question is a tough one, we'll leave it for next time. As for the second, third, fourth, fifth and sixth questions... well... there's no precise answer **coming from the language's specification**. The specification is, thus, vague, and intentionally so.

The usefulness of a vague specification (one that isn't clear about when an instance has to be finalized) is big indeed, believe it or not. If it weren't for that, Jython would not exist.

For those who don't know Jython, it's an implementation of the Python language written in Java - because nothing says all implementations must be written in C, and because nothing is stopping it. If the specification had said that all objects are finalized immediately after becoming unreachable, an implementation in Java would have been far less efficient, because such a requirement is very different from those imposed on Java's ``garbage collector``. Being vague, Python's specification allows Jython to reuse Java's ``garbage collector``, which is what makes Jython viable.

And any reader who has coded finalizers in Java will already have spotted the issue: Python, as a language, gives us no guarantee about when our ``__del__`` destructor runs, only that it is run. Sometime. Today, tomorrow, the next day... or when the computer is turned off. Whatever. The specification doesn't specify; any of those options is good enough for Python.

Actually, it's worse: Python's specification says there's not even a guarantee that the destructor will be called for objects still alive when the interpreter shuts down. That is, if I call ``sys.exit(0)``, objects alive at the time may or may not be finalized. So there's not even a guarantee that the destructor is eventually called in all cases.

But CPython, as opposed to Jython, implements a type of ``garbage collector`` that is much more immediate in detecting unreachable references - at least in most cases. This makes destructors seem magic, immediate, almost like C++'s destructors. And that's the reason why destructors in CPython are ten times more useful than they are in, say, Java. Or Jython.
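
That immediacy is easy to observe. What follows is only a minimal sketch, written in Python 3 syntax (unlike the Python 2 sessions in this column) and assuming CPython; ``Tracked`` and ``events`` are names made up for the illustration. On Jython, or any other implementation, the recorded order may well differ, which is precisely the point.

.. code-block:: python

    events = []  # records what happened, in order


    class Tracked:
        """Hypothetical class that logs its own finalization."""

        def __init__(self, name):
            self.name = name

        def __del__(self):
            events.append("finalized " + self.name)


    obj = Tracked("a")
    alias = obj    # second reference to the same instance

    obj = None     # one reference is still alive: nothing is finalized yet
    events.append("one reference dropped")

    alias = None   # last reference gone: CPython finalizes right here
    events.append("last reference dropped")

On CPython the ``finalized a`` entry lands between the other two, because the destructor runs the moment the last reference disappears. The language itself promises no such ordering.
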

Many Python programmers wrongly credit that immediacy to Python (the language) instead of CPython (the implementation), where it actually belongs. Sadly, I'm one of them. It's very comfy, one has to admit, so if we're going to base our code on that comfiness, let's do it in good conscience, knowing full well what we're doing and what the limits are.

Circular references
-------------------

Our useless class uses a destructor to close the file... something that is considered incorrect in Python. Why? So many people ask. So let's see:

.. code-block:: Pycon

    >>> uselessObject = UselessClass("filey.txt")
    opening
    >>> uselessObject2 = UselessClass("filey.txt")
    opening
    >>> uselessObject.circle = uselessObject2
    >>> uselessObject2.circle = uselessObject
    >>> uselessObject = uselessObject2 = None

Now, an exercise for the reader: think about what will come out on the console after that last statement. It's not uncommon to go wrong here and say: *it prints "closing" twice*. Nope. It does not. Go ahead, try it out.

To understand what's going on, type ``import gc ; gc.collect() ; gc.garbage`` in the console (the ``gc.collect()`` call forces a collection pass, in case one hasn't run yet). There they are, our two instances of ``UselessClass``.

What happened? We'll see it in detail in another installment. The important thing to remember here is that destructors don't get along very well with circular references. And there are many, many ways to create circular references unknowingly; they're not always easy to spot, and they're always harder to get rid of.

``gc.garbage`` will be our best friend when we suspect this kind of problem.

Reviving objects
----------------

People aren't the only ones to get CPR. Objects in Python can too.

Honestly, I never found it useful. For absolutely anything. But someone must have thought it was cool, because it's part of the language. If a destructor, in the process of destructing, creates a *new reachable reference to itself*, the destruction is cancelled, and the object lives on.
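
A minimal sketch of such a revival, in Python 3 syntax and assuming CPython; ``Phoenix`` and ``survivors`` are hypothetical names used only for this illustration:

.. code-block:: python

    import gc

    survivors = []  # module-level list: anything stored here stays reachable


    class Phoenix:
        """Hypothetical class whose destructor leaks a reference to self."""

        def __del__(self):
            # Storing self somewhere reachable cancels the destruction.
            survivors.append(self)


    p = Phoenix()
    p = None       # last reference dropped: on CPython __del__ runs right away
    gc.collect()   # not needed on CPython; nudges other implementations

    revived = survivors[0]  # the "destroyed" instance, alive and well

After the last line, ``revived`` refers to the very instance whose destructor already ran.
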

Maybe it's useful for debugging, or to do crazy stuff. Let's imagine a resource that simply has to be destroyed in the main thread (it's not unheard of; it happens quite often). The destructor will then call ``thread.get_ident()`` and compare the result against the main thread's identifier. If it's not running in the right thread, it will queue the instance's destruction for the proper thread to process. Upon queuing, a new reachable reference is created, and CPython will detect this. It's perfectly legal.

It could also happen by accident, and this is the important thing to remember, because I doubt many readers will want to do it on purpose. So it's important not to let a reference to ``self`` escape from a destructor, or we'll end up in ugly situations. Memory leaks, unclosed resources, exceptions. Ugly things.

Let's look at precisely one case where we'll get away with it, because Python itself handles it in its own way:

.. code-block:: Pycon

    >>> class UselessClass:
    ...     def __init__(self, filey):
    ...         print "opening"
    ...         self.value = open(filey, "r")
    ...     def __del__(self):
    ...         raise RuntimeError, "I wanna break ya'"
    ...
    >>> try:
    ...     x = UselessClass("filey.txt")
    ...     # stuff
    ...     x = None
    ... except:
    ...     pass
    ...
    opening
    Exception RuntimeError: RuntimeError("I wanna break ya'",) in <bound method UselessClass.__del__ of <__main__.UselessClass instance at 0x...>> ignored

The fun part of the code above isn't that it blows up. That would be obvious; after all, I raised a ``RuntimeError`` quite explicitly. The fun part is that it **does not**. One would expect the ``RuntimeError`` to be caught by the ``except`` clause and **silently** ignored, with no printout about the matter. Instead, the exception never reaches the ``except`` clause, and CPython prints a warning.

If the exception did propagate, though, the reference would not disappear, because when the exception is raised, a reference to ``self`` is stored in the ``Traceback`` of the exception. And when coming out of the ``except`` block it would try to destroy the object again, raising another exception, which revives the object one more time... and so on and so on. Infinite fun.
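
The way a traceback holds on to objects can be observed outside a destructor too. The following is a sketch only, in Python 3 syntax and assuming CPython; ``Resource``, ``fail`` and ``probes`` are hypothetical names, and a weak reference is used purely as a probe to tell whether the instance is still alive.

.. code-block:: python

    import gc
    import weakref


    class Resource:
        """Hypothetical stand-in for an object that holds something open."""


    def fail(probes):
        res = Resource()                 # local variable of this frame
        probes.append(weakref.ref(res))  # weak probe: does not keep res alive
        raise ValueError("boom")


    probes = []
    saved = None
    try:
        fail(probes)
    except ValueError as exc:
        saved = exc  # exception -> traceback -> frame of fail() -> res

    # The function returned long ago, yet its local is still alive.
    alive_while_saved = probes[0]() is not None

    saved = None     # drop the exception, and the traceback goes with it
    gc.collect()     # not needed on CPython; helps elsewhere
    alive_after_drop = probes[0]() is not None

Here ``alive_while_saved`` ends up true and ``alive_after_drop`` false: as long as the exception is kept around, so is everything its traceback can reach.
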

**Note**: *It so happens that all exceptions hold a reference to the local variables of the place where they were raised, because that's useful for debuggers, and this can keep instances alive or even revive them.*

So CPython, quite aware of the matter, ignores exceptions that try to escape a destructor. If the destructor doesn't catch an exception, it won't be propagated to the "caller". Which makes sense, if you think about it, because the code that called the destructor did so implicitly, by pure chance, and would rarely know how to handle the exception.

Another common way to let a reference to ``self`` escape, one that tends to go unnoticed, is through ``closures``. Lambda expressions like ``lambda x : self.attribute + x`` carry an implicit reference to ``self``, and if such an expression escapes, ``self`` escapes with it.

Context managers
----------------

In conclusion, destructors are useful, comfortable, and hard to predict. They have to be used with care, and whenever we assume that a destructor is called immediately after an instance is dereferenced, we're writing code that only works properly on CPython.

For reliable file closing, Python provides a better, more predictable and more uniformly supported tool: the ``with`` statement:

.. code-block:: Pycon

    >>> with open("archivito.txt", "r") as f:
    ...     data = f.read()  # do something
    ...     # no need to call f.close(),
    ...     # it's called automatically when exiting the 'with' block
    ...

We won't go into the ``with`` statement here, but it's worth mentioning that it doesn't replace destructors, only the use we've been giving them throughout this article: closing files. The ``with`` statement also has many more uses, so I invite you to do some research yourselves.
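
As a starting point for that research, here is a minimal sketch of how a class like our useless one could close its file through ``with`` instead of ``__del__``. It is written in Python 3 syntax; ``ManagedFile`` and the temporary file are hypothetical, made up so the example is self-contained.

.. code-block:: python

    import os
    import tempfile


    class ManagedFile:
        """Hypothetical counterpart of UselessClass: closes in __exit__."""

        def __init__(self, path):
            self.value = open(path, "r")

        def __enter__(self):
            return self  # this is what 'as' binds to

        def __exit__(self, exc_type, exc_value, traceback):
            self.value.close()
            return False  # do not swallow exceptions raised in the block


    # Create a throwaway file so the sketch does not depend on existing files.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write("hello")

    with ManagedFile(path) as managed:
        content = managed.value.read()

    closed_after_block = managed.value.closed  # closed by __exit__, not __del__
    os.remove(path)

The file is closed when the block exits, whichever implementation runs the code and whatever references to ``managed`` remain.
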