Thursday, May 19, 2011

Proving Rails memory leak with gdb

This is a follow up to Rails3 leaks (more) memory?.

I am no gdb expert by any means, but I thought I'd give it a try to see if I can identify the cause behind the controller instances. The basic approach is to catch the GC in the act of marking a controller instance and follow the chain to try to identify why it is being marked (as marked stuff is not swept).

Couple setup steps. First, I re-compiled ruby with debugging symbols which turned out to be trivial thanks to rvm. Second, I removed all memprof stuff from my simple test app.

With that out of the way, I can start the app with rails server -e production and get it in the right state with,
curl http://localhost:3000/authors/new curl http://localhost:3000/authors/new curl http://localhost:3000/authors/new curl http://localhost:3000/final_gc Finally I can attach to this running process with gdb.

I know there is an instance of the AuthorsController somewhere on the heap. So, I can look up the class that is the AuthorsController with,
(gdb) call rb_const_get(rb_cObject, rb_intern("AuthorsController")) $14 = 4315109640 and set a conditional breakpoint in rc_gc_mark function in gc.c to look for objects of the class i'm interested in,
(gdb) break gc.c:1361 (gdb) cond 1 (rb_type(ptr) == 2 && rb_class_real(rb_class_of(obj)) == 4315109640) (gdb) c Now I hit the server with curl http://localhost:3000/final_gc and wait ... the condition on the breakpoint makes the gc process much slower, but eventually it stopped,
Breakpoint 1, rb_gc_mark (ptr=4377820520) at gc.c:1361 warning: Source file is more recent than executable. /Users/pkmiec/.rvm/src/ree-1.8.7-2010.02/source/gc.c:1361:30666:beg:0x100043cb4 (gdb) bt #0 rb_gc_mark (ptr=4377820520) at gc.c:1361 #1 0x000000010003064b in blk_mark (data=0x102a50b00) at eval.c:8596 #2 0x00000001000443e4 in gc_mark_children (ptr=4377672960) at gc.c:1570 #3 0x0000000100043ce7 in rb_gc_mark (ptr=4377673120) at gc.c:1366 #4 0x0000000100043b6f in mark_entry (key=66906, value=4377673120) at gc.c:1293 #5 0x00000001000a959c in st_foreach (table=0x10250df70, func=0x100043b56 , arg=0) at st.c:487 #6 0x0000000100043b9e in rb_mark_tbl (tbl=0x10250df70) at gc.c:1302 #7 0x00000001000442ce in gc_mark_children (ptr=4315109640) at gc.c:1538 #8 0x0000000100043ce7 in rb_gc_mark (ptr=4315109640) at gc.c:1366 #9 0x000000010004434d in gc_mark_children (ptr=4315134280) at gc.c:1551 ... First, I verified the breakpoint condition,
(gdb) call ((struct RString *)classname(rb_class_real(4315109640)))->ptr $5 = 0x10250df50 "AuthorsController" What caught my eye next is line #8 ... rb_gc_mark of 4315109640 ... but wait 4315109640 is the class. So luckily the "distance" between something known and the instance is very short! Then I tried to decipher what line #4 means,
(gdb) call rb_id2name(66906) $8 = 0x102731700 "@_config" (gdb) call rb_type(4377673120) $2 = 11 # <- Hash (gdb) p ((RVALUE *)4377673120)->as.hash $15 = { basic = { flags = 16971, klass = 4315911160 }, tbl = 0x10047aa40, iter_lev = 0, ifnone = 4377672960 } So, the class has an instance variable called @_config and it's value is a hash. You can see this is in fact true by looking in active_support (3.0.7) at configurable.rb line 12. Line #3 says this hash is being marked and line #2 says 4377672960 is marked. Notice 4377672960 is the ifnone of the hash ... so in ruby this is the block passed to Hash.new. This is exactly how ActiveSupport::InheritableOptions.new works (which is what is assigned to @_config).
(gdb) call rb_type(4377672960) $7 = 34 (gdb) p ((RVALUE *)4377672960)->as.data $11 = { basic = { flags = 98, klass = 4331614640 }, dmark = 0x1000305f1 , dfree = 0x1000306ea , data = 0x102a94c90 } I didn't know what data is :(, so I looked at blk_mark function at eval.c:8596,
static void blk_mark(data) struct BLOCK *data; { while (data) { rb_gc_mark_frame(&data->frame); rb_gc_mark((VALUE)data->scope); rb_gc_mark((VALUE)data->var); rb_gc_mark((VALUE)data->body); rb_gc_mark((VALUE)data->self); <---------- line 8596 !!! rb_gc_mark((VALUE)data->dyna_vars); rb_gc_mark((VALUE)data->cref); rb_gc_mark(data->wrapper); rb_gc_mark(data->block_obj); data = data->prev; } } Bingo! A request comes to /authors/new and is routed to an instance of AuthorsController. That instance renders a view and that view asks the controller instance for it's config. The controller instance asks the controller class for its config. The AuthorsController hasn't initialized its config yet so it creates an instance of ActiveSupport::InheritableOptions (which extends OrderedOptions which extends OrderedHash). ActiveSupport::InheritableOptions.new is implemented as super() { |h,k| parent[k] }. That block creates a closure which captures self which is the controller instance. Here is where I smell a ruby bug as I don't understand why self at this point is the controller instance.

This makes sense. The controller instances left in memory are the ones that perform lazy initialization and that lazy initialization creates scopes which are assigned to global or class level variables. This is why eager loading at start-up works.

5 Comments:

Blogger Unknown said...

Nice job tracking this down.

When you know the address of a specific object that's being kept alive, you can also use hardware breakpoints to watch for changes to its mark bit: https://gist.github.com/355820

`blk_mark` has a `while(data)` that follows the `data->prev` linked list up. Do you know if the hash's `ifnone->as.data.data->self` is holding a reference, or is it further up the chain? I would guess it is further up, and that `data->prev` is incorrectly being set to a non-NULL value somewhere in the linked list.

May 19, 2011 at 10:15 PM  
Blogger Paul Kmiec said...

Thanks for the tips Aman. I'll do more research this weekend. One question: initially i was trying to use the object_id in gdb (for example, with id2ref) but i was just getting crashes. Can the object_id be tied back to memory location in gdb?

May 20, 2011 at 11:06 AM  
Blogger Unknown said...

You basically need to do the opposite of `rb_obj_id()`. On 64-bit systems, this is usually as simple as `obj_id >> 1`.

May 20, 2011 at 8:13 PM  
Blogger Paul Kmiec said...

Ah .. sorry, I didn't look at the gist yet when I asked about using object_id. The gist is doing exactly that :).

Your intuition is correct (https://gist.github.com/985014). The "chain" has two blocks in it and it's the second one holding the reference to the controller instance.

I'm going try to figure out how it ends up like that. Looks like blocks are created with PUSH_BLOCK and the usage in rb_f_binding seems like where it happens in this case.

May 21, 2011 at 5:11 PM  
Blogger Unknown said...

Also, I noticed the comment form on your blog is messed up, the publish button is hidden, I think below a div. I had to hack the css styles to get it show up.

June 5, 2012 at 2:29 PM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home