Memory profiling in hooks
This tutorial involves hacking core. If you aren't comfortable with doing that, you're probably in the wrong place :). I created a dropbucket.org drop called Memory profiling in hooks which has all the code details in case you want to dig in yourself. You'll need to modify includes/module.inc and also have devel module enabled for this to work.
I've been trying to do some low-level tuning while developing a course site in ELMSLN. As part of the project I'm constantly trying to push as many users into one system as I can prior to scaling resources. You can read about past efforts to push drupal core in this and other linked posts: Drupal speed tuning: analyzing and further optimizing Pressflow
Today, I wanted to get a better sense of why a page was taking 17.75megs of ram to render. The better we can understand what's running to accomplish page delivery, the better we can dig in and help drop that number. The lower that number per page load, the more users we can support on the same resources and the more we can keep costs down while increasing service offering to our users.
After applying the code from dropbucket, you'll see something like this whenever you load any page (see right). The hook modification monitors the memory in use at the start and end of every call to module_invoke_all. This tells you how much memory was required to fire off that hook.
It does the same thing for each function invoking that hook and then repeats the same thing for drupal_alter statements as well. This gives you a ton of data to sort through but as you do, you can start to get a clearer picture of both what it takes to deliver the page as well as where bottlenecks in RAM usage might be.
XDebug is a php library that can also be useful for finding bottlenecks if you are interested in digging even deeper. I wanted per page load stats and to start to do so per role so that's why I went the modifying drupal core route (this time).
We know what's bloated, now what?
In my testing I was looking for a few things specifically; what was slow but more importantly, was it anything custom I had written. Because a major library that I use in ELMSLN has a lot of potential performance implications, I was looking for memory usage by functions starting with "cis".
As it turns out, memory usage was pretty low because I followed this great lullabot article by Jeff Eaton while developing the calls that happen quite often. I put in the artificial metric that if it was a function under 50KB I didn't care about it so that I got less data (otherwise you get things that are like 48 bytes all over the place).
I did find a few functions that I need to do some additional tuning on / make decisions about as a result of this test. The functions in question were:
- book_node_view (584 KB)
- special_menu_items_init (514 KB)
- og_context_init (475 KB)
- context_init (448 KB)
- token_module_implements_alter (262 KB )
- feeds_forms (145 KB)
- regions_page_alter (116 KB)
- stream_wrappers (115 KB)
- boxes_page_alter (110 KB)
In total, this is telling me that of my 17.75 MB page load, around 2.7 MBs can be attributed to 9 functions. There are hundreds / thousands of functions that build any given page and now I know that 9 of those account for 15% of the memory required to deliver any given page.
From here, I have some decisions to make now that I have this information. First, I have to look at the low hanging fruit functions that I can get rid of. Feeds (in the MOOC distribution) is only used to initially import content, after that it's there for convenience but that convenience is apparently (feeds_forms) adding 145 KB to every page for every user regardless of if they are using feeds. That's a bit ridiculous right? So, when in production, turn off Feeds (or at least I can). We also only user Boxes module for 1 block at the footer of the page; is that really worth having Boxes load on every page for one block to be loaded and be exportable. special_menu_items is used to prettify one menu system, which definitely can't be justified with it's half a meg footprint so it will most likely be dropped from the core supported package.
What can't I really change?
The second decision. Well, all systems in ELMSLN are built on top of Context, OG, Tokens, and (MOOC at least) the Book module. This makes it rather difficult to get rid of og_context, context_init, book_node_view, and token_module_implements_alter. Steam_wrappers is also something built into core which just takes time to run so I don't have much of a choice there.
I may have to live with book_node_view or look and see if I'm invoking it to load in such a way that it's called when it doesn't have to be. If you look at the API docs, it's basically just adding previous / next navigation to the bottom of pages and styling it. Maybe we can write a custom replacement to cache this call in memory so that after it initially builds previous / next links that it just has access to this data in memory. It might seem like a lot of effort, but to save .5 megs per page load it could be a big win.
What can I fix without much effort?
regions_page_alter is provided as part of the Regions module, one of my own. I can either fix this or, as we're starting to do in MOOC / the new ELMSLN UX; transition away from it (for the most part). This is something I have total control over though so it might be worth investigating if I can fix it quickly or at least reduce its footprint. I may also just have to remove some modules like boxes, regions, and feeds which are all sure to drop the page load.
What should I explore "fixing" if its even possible?
Well, my top for calls account for almost 2 megs on every page load, those are obviously where I have to start. The problem I may find though is that Context and OG are heavily used by a ton of people. I doubt I'll find inefficiencies there but it's possible. Special Menu Items is worth investigation though it might be better served to just remove it from the core package. Book links I already outlined a work around which would be to cache the call to the data needed from it. As book is core and we're this late in the game I'm not going to see a patch accepted to core to fix this in my life time :).
What's 2 megs in the grand scheme of life?
Let's say I have 4 GB of ram on my server, 2 GB set aside for Apache transactions. From there we can do some simple math to see how many concurrent connections we can support. Again, the goal here is do as much as possible with as little as possible (yes pushing this into the cloud / NGINX / advanced caching techniques for auth'ed traffic can drop it even more).
Me logged in
2048 Megs / 17.75 MB/user = 115 users
I'm not the target audience though, so me logged in as a student (when giving the role ability to view devel output)
2048 Megs / 16.75 MB/user = 122 users
Right there, by switching roles and analyzing I was able to drop 1 meg which translated to 7 additional users on the same resources. Now let's factor in a theoretical reduction of that bloated 2.7 MB.
2048 Megs / 14.05 MB/user = 146 users
This is where low level performance tuning can have big impact. If I can clean up just 2.7 megs from the page load this would increase out load potential by 24 users or a 16% gain in scale without actually scaling up our systems.
By tuning for the least common demoninator, when we start to scale up and get into distributed memcache servers or adding a more realistic amount of RAM, these percentages all can add up to a lot more users on a lot less resources. With a theoretical 16 GB of RAM allocated to Apache we could serve 1,166 concurrent users. As past numbers have shown though, even with 2,200 - 3,500 students a semester, we're never anywhere close to this number of concurrent users.
Since performing this audit I've been able to decouple our system (in dev testing) from og_context, regions and boxes modules. Then, using hook_module_implements_alter I was able to further optimize the book_node_view by creating a new module called Book Cache and completely removed special_menu_items_init which wasn't serving the purpose we have for the module. I also discovered a performance enhancement for the og_ui module which will increase the performance of every node load once accepted! https://github.com/Gizra/og/pull/38
After applying these techniques and disabling feeds I was able to get things down to the following numbers when logged in as me:
14.5 MB on the test page which is 3.25 MB smaller then when I started!
As the target audience role the drop was from 16.75 MB to 14.75 MB for a 2 MB gain for typical users!
2048 Megs / 14.75 MB/user = 139 users which is a 12.25% gain over the original configuration
Page delivery times I've optimized in the past via the APC user bin so authenticated page delivery times are usually around 250ms for a site running around 145 modules