Memory debugging and defragmentation

../images/memory_fragmentation/claudiofreire.jpg Author: Claudio Freire

Introduction

This lecture was planned to be given in 20 minutes. If you can't read it in 20 minutes, reread, reread and re-reread until it takes 20 minutes :-p

Well, actually the text here is only slightly based in the real lecture. I'm one known to improvise in lectures, all I have are the slides as a guide, the general guidelines for the lecture, and this format doesn't lend itself to improvisation. Besides, no freaking way I'm going to remember the exact words I used ;)

Now, really... this is a deep convoluted subject. Reread and re-reread until you get it.

When I started looking for lecture subjects, I decided for memory fragmentation because not long ago I though it was a myth. Ok, yes, it can happen... but, does that happen to anybody? Is it a problem? How naive of me to doubt that could answers could be "yes".

In PyConAr 2010 (and here in this space now), not only I had the chance to transmit around 4 months of picking Python's guts apart, trying to find out what was going on with my applications (where I suffered from memory fragmentation), I also learned that this issue is much, much more common than I'd expected.

When the lecture started, I asked: Does anybody know what memory fragmentation is?. Exactly zero attendants raised their hand. Exactly the amount I had expected. But, in the end, many approached me saying that the same thing that was happening to me (and that I described, and will describe below) was happening to them.

That, according to my post-PyCon estimations, means a 10% of everyone present (assuming they're Python developers) suffer from that issue, and to a 3% of them it means serious operative problems. So the issue was far more relevant than I had expected.

How does Python manage memory?

Before going into this fragmentation thing, we must study some internal details of how memory management works in Python.

How would you say Python manages memory? Malloc? (a C function that reserves memory for those unfamiliar with C)

Well, no. Python has unusual and very specific requirements, if it used malloc for all of its memory needs, it would be too inefficient. Python uses, instead, a series of techniques and strategies designed to minimize malloc calls, which are very slow to be used in the core of Python's virtual machine.

  • Pools
    • Integer
    • Floating point
  • Free lists
    • Tuples
    • Lists
    • Frames (yes, the frames are objects and need to be managed too)
  • Arenas

Lets see what each is about, one by one

Pools

Not for swimming.

They're arrays, object vectors of the same type, that Python uses usually to accelerate the creation and destruction of common objects, like integers and floating point numbers.

When python needs one of these objects, it doesn't create one, it creates many (as many as they fit in a pool block), and keeps a linked list of which objects are free within each block.

../images/memory_fragmentation/pools.jpg

Structure of an object pool

In a pool, the ob_type field (present in all PyObject) is used to link free objects. When a new one is needed, the field is restored (trivial to do), and in general there's no need of further initialization, so it is very fast.

In an object pool, creation and destruction is very fast, and, depending on the type of object, it can save a lot of initialization (in the case of integers and floating point numbers, this is especially true), the objects remain well pack in memory, all close together, and everything works very well.

Free lists

The idea of not having to ask for memory to create or destroy objects that are frequently created and destroyed is something that can be generalized from pools to any type of object (not only those of a particular type), even objects of variable size (where having them all packed in an array is infeasible).

When you do that, you have a free list:

../images/memory_fragmentation/freelists.jpg

Structure of a free list

Free lists are very similar to a pool, but instead of keeping an array of objects, it simply keeps a linked list of free objects. When an object is destroyed, it can simply go into the free list instead of freeing its memory, to be able to reuse it later.

This idea of having free lists saves tons of calls to malloc, and is particularly useful for strings, lists and tuples, which are variable-sized objects very intensely used in Python, and where a pool wouldn't be practical. They're also the main reason why Python is particularly sensitive to memory fragmentation, and we'll soon see why.

Note also that frames use free lists. Frames are objects, also of variable size (because they need a stack, temporal space for local variables and objects our code will generate), also very intensely used in Python (one gets "created" every time a function call is made). Free lists of frames are a very important optimization (they save a lot of time initializing the frames, a costly operation), but it also contributes to memory fragmentation (as every free list does).

One important thing to remember about Python free lists is that the decision whether to destroy an object or put it into the free list is done at the moment of dereferencing it (when its reference count reaches zero). Once in the free list, it remains there until reused. Python, in order to take this decision, contains a series of limits - X frames, Y 1-sized tuples, Z 2-sized tuples, W strings, etc... (this limit is usually 100 in each case).

Arenas

So... what about the rest? Malloc?

Well, yes and no.

It uses arenas. Which is not spanish for where the glass comes, but a hybrid between pools, free lists and malloc.

For small objects, Python keeps a list of pools for each concrete size (remember that pools need objects of the same size, for they're vectors). Each pool has its free list, and each pool block is 8Kb in size. This is called an arena.

For big objects (more than 256 bytes), Python calls malloc directly.

Like Python object sizes grow in 8-byte increments (due to their structure), then there are exactly 32 arenas.

All Python objects are created in this way, even those that use free lists.

Arenas also induce an inner memory fragmentation problem, since no arena block can be freed until all the objects residing in it are freed.

Fragmentation

So, lets see what this memory fragmentation thing is:

../images/memory_fragmentation/memoria.jpg

Map of fragmented memory

If black is used space, and white is free space, you can see here how there's a lot of free space, but unusable for objects beyond some size, because it's not contiguous space.

Put simply, memory fragmentation happens when there's lots of free space, but it's not contiguous. Like in the above map, much of it is free space, but it's unusable for big objects because, in contrast with a file you can split into several blocks on disk, an object's memory region needs to be contiguous.

So, in contrast with fragmentation in a file system, memory fragmentation makes portions of it unusable. If I needed memory for a big object, say a few megabytes, I would have to use the area towards the end of the map (that is, extend the virtual image of the process). This is effectively what malloc does when it's faced with this situation.

The immediate, visible effect here, is an inefficiency in the use of available memory. If my program needed 2GB of memory in theory, it could be reserving 4GB from the operating system (because it has many small pieces reserved that it cannot use). If I have bad luck, this could make my system swap. If I have too much bad luck, it could trash, and die.

Lets see code that fragments memory:

>>> l = []
>>> for i in xrange(1,100):
...   ll = [ " " * i * 16 for j in xrange(1000000 / i) ]
...   ll = ll[::2]
...   l.extend(ll)
...   print sum(map(len,l))
...
8000000
16000000
…
792005616
>>>

After this, top says:

10467 claudiof  20   0 1676m 1.6g 1864 S    0 82.7   1:17.07 python

Meaning, even though according to our calculations the program had to consume 800M of memory, it effectively consumes 1.6G. Double that.

Why?

Well the example was specifically tailored to create 50% of unusable holes. The memory is fragmented, then, by 50%.

But there's something worse. Suppose I do:

>>> del l
>>> del ll

I get from top:

10467 claudiof  20   0 1532m 1.5g 1864 S    0 75.6   1:17.96 python

If I repeat the fragmentation example, I can confirm that those 1.5G are effectively free for python:

10467 claudiof  20   0 1676m 1.6g 1864 S    0 82.8   2:33.39 python

But if I try to free them (to the operating system), I can't.

¿WTF?

Enter Guppy

Guppy is a little red fish commonly seen in fish tanks everywhere. Those little fish there, those are called guppy.

Really.

It's also an extension library for Python that contains a module, heapy, which allows me to do memory diagnostics.

Really.

../images/memory_fragmentation/guppy.jpg

Guppy

So, let's try to use it:

>>> from guppy import hpy
>>> hp = hpy()
>>> hp.heap()
Partition of a set of 23778 objects. Total size = 1760692 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0  10642  45   696652  40    696652  40 str
    1   5432  23   196864  11    893516  51 tuple
    2    345   1   112968   6   1006484  57 dict (no owner)
    3   1546   7   105128   6   1111612  63 types.CodeType
    4     66   0   104592   6   1216204  69 dict of module
    5    174   1    93168   5   1309372  74 dict of type
    6    194   1    86040   5   1395412  79 type
    7   1472   6    82432   5   1477844  84 function
    8    124   1    67552   4   1545396  88 dict of class
    9   1027   4    36972   2   1582368  90 __builtin__.wrapper_descriptor

That is, Python (just by launching it) already consumes 1.7MB. In python objects. Heapy doesn't count what is not Python objects, so all that is reported there is memory used directly by python objects.

This means strings, lists, dictionaries, arrays, but not mmap objects, or memory used by extension libraries (eg: SDL surfaces in pygame).

Lets diagnose then:

>>> l = []
>>> for i in xrange(1,100):
...    

>>> hp.heap()
Partition of a set of 2612542 objects. Total size = 866405844 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0 2599386  99 854833304  99 854833304  99 str
    1    132   0 10516640   1 865349944 100 list
    2   5433   0   197064   0 865547008 100 tuple
    3    351   0   113784   0 865660792 100 dict (no owner)
    4   1547   0   105196   0 865765988 100 types.CodeType
    5     67   0   105112   0 865871100 100 dict of module
    6    174   0    93168   0 865964268 100 dict of type
    7    194   0    86040   0 866050308 100 type
    8   1472   0    82432   0 866132740 100 function
    9    124   0    67552   0 866200292 100 dict of class

So, just as we had calculated, more or less 800M (850M) in Python objects. That's what heapy says.

>>> del l
>>> del ll
>>> hp.heap()
Partition of a set of 23844 objects. Total size = 1765236 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0  10690  45   698996  40    698996  40 str
    1   5433  23   197068  11    896064  51 tuple
    2    351   1   113784   6   1009848  57 dict (no owner)
    3   1547   6   105196   6   1115044  63 types.CodeType
    4     67   0   105112   6   1220156  69 dict of module
    5    174   1    93168   5   1313324  74 dict of type
    6    194   1    86040   5   1399364  79 type
    7   1472   6    82432   5   1481796  84 function
    8    124   1    67552   4   1549348  88 dict of class
    9   1027   4    36972   2   1586320  90 __builtin__.wrapper_descriptor

¿WTF?

Heapy tells us that Python uses 1.7MB again. Top keeps saying 1.6G. I believe top.

It happens that, in fact, the rest is “free” space (free for Python, not for the operating system).

Doing a differential analysis, we'll get some perspective into the matter:

>>> from guppy import hpy
>>> hp = hpy()
>>> heap1 = hp.heap()
>>> # experimento
>>> heap2 = hp.heap()
>>> new_stuff = heap2  heap1
>>> del l, ll
>>> garbage = heap3  heap1

That results in 3 heap snapshots. heap1, as it is after launching Python. heap2, after the experiment, and heap3 after "freeing" everything, and two "differentials", new_stuff, what's in heap2 that is new (not in heap1), and garbage, what is in heap3 that is not in heap1 (that is, what wasn't freed).

>>> new_stuff
Partition of a set of 2588725 objects. Total size = 864642976 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0 2588706 100 854134668  99 854134668  99 str
    1      2   0 10506304   1 864640972 100 list
    2      6   0      816   0 864641788 100 dict (no owner)
    3      2   0      676   0 864642464 100 types.FrameType
    4      2   0      272   0 864642736 100 dict of guppy.etc.Glue.Owner
    5      1   0       68   0 864642804 100 types.CodeType
    6      2   0       64   0 864642868 100 guppy.etc.Glue.Owner
    7      2   0       64   0 864642932 100 tuple
    8      1   0       32   0 864642964 100 exceptions.KeyboardInterrupt
    9      1   0       12   0 864642976 100 int

It's worth questioning: Only 850M in strings? And the other 800M to complete the 1.6G?

Well, it happens that the memory looks a bit like gruyere cheese at this time. There's 800M in relatively small strings, but as in each step we'd been freeing half of them (ll = ll[::2]), we also have 800M in free but unusable space. Because in each step, also, I need strings a bit bigger than before, so the free holes cannot be reused.

Lets see what happens after dereferencing everything:

>>> garbage
Partition of a set of 29 objects. Total size = 2520 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0      6  21      816  32       816  32 dict (no owner)
    1      2   7      748  30      1564  62 types.FrameType
    2     10  34      364  14      1928  77 str
    3      2   7      272  11      2200  87 dict of guppy.etc.Glue.Owner
    4      2   7       80   3      2280  90 __builtin__.weakref
    5      1   3       68   3      2348  93 types.CodeType
    6      2   7       64   3      2412  96 guppy.etc.Glue.Owner
    7      2   7       64   3      2476  98 tuple
    8      1   3       32   1      2508 100 exceptions.KeyboardInterrupt
    9      1   3       12   0      2520 100 int

Gotcha!

That's important!

These 29 objects stop the heap from shrinking. What reminds me...

The Heap

...of the heap.

Normally, the heap gets bigger and smaller.

../images/memory_fragmentation/heap.jpg

Heap lifecycle

The heap expands and contracts, but in each cycle there can be leftover "garbage", or maybe useful live objects, that prevent it from fully contracting. The memory that is left trapped in the middle cannot be reused by other processes, it's only free to Python.

As you can see on the figure, every time it shrinks, it doesn't completely. Sometimes, objects remain alive in high addresses - since the heap cannot be fragmented (you cannot free space in the middle of the heap, it can only expand or contract), these objects keep memory in the middle reserved for Python. Python can reuse it, but not the rest of the operating system.

This hurts disk cache, other processes (maybe other Python processes, in a webserver it can happen that we have more than one worker running Python), it damages the performance of the whole system.

Guess who have the habit of leaving objects alive in high addresses...

...that's right. Free lists. Here, with guppy, we found 29 objects, probably all kept alive thanks to some free list that keeps them alive. We see a pair of them are Frames, as I said, Frames cause this kind of issues.

We all want to know how to avoid this, so:

Guppy tips

  • Don't leave garbage lying around on the floor

    • If you'll be creating lots of small objects, create persistent ones first, transient ones last.
      • Compiling code (eg: using eval or doing imports) generates permanent strings, called interned strings, so compiling on-demand is something to avoid just as well.
      • SQLAlchemy and many other libraries have internal caches, research them and be aware of them and their eviction politics.
    • Every time it is possible, prefer a few big objects over many small ones:
      • String lists → comma-separated strings. Or pipe-separated. Or newlines. Whatever.
      • Number lists → array.array or numpy.array
  • Sweep from time to time

    • If you keep caches with expiration times, clean the cache regularly to remove expired elements. Don't be lazy.

    • Sometimes you can “desfragment” memory, rebuilding persistent structures like caches.

      In fact, java's garbage collector does exactly this, automatically, and many projects are seeking to implement a similar garbage collector for Python, but Python's extension API, Python/C, makes it hard since it allows unmanaged pointers to PyObject, structures that represent Python objects)*.

  • Change is good

    • Don't create eternal structures.
    • Caches always expire.
    • Threads get renewed.
  • The house is for living, the office for working

    • Whenever possible, perform memory intensive tasks in a subprocess, which will free all memory and tidily clean up when finished.
    • The subprocess is the office. You work there.
    • The main process is the house. You live there.

Help PET: Donate

blog comments powered by Disqus

Last Change: Sat Jul 9 15:00:35 2011.  -  This magazine is under a Creative Commons license