Do It With Drupal: Drupal Under Pressure: Performance and Scalability
- Browser | Apache | PHP | SQL queries | MySQL
- Common pattern for optimization: inspect each layer, add little buckets of caches everywhere
- "Fast track" through the different layers to get out requests more efficiently
- On the browser side: mod_expires sends headers that tell the browser "you've already downloaded this and it hasn't changed, keep using your copy"
- Firebug will show you all the individual requests and how many KB each one takes to download (if a refresh only has to re-download a little, that's good)
- CDNs (Content Delivery Networks) and reverse proxy caches: anything that hasn't changed doesn't have to hit your internal infrastructure - hand it off to geolocated servers optimized to serve that content quickly
- Proxy cache can be in front of your infrastructure (offload things Drupal would keep doing over and over)
- PHP level: OpCode cache
- MySQL level: query cache - takes all the read queries (most of the select statements) and stores the results in memory
- Query cache and opcode cache: a half hour or less to set up, significant improvements
- Proxy caches and CDNs are a bit larger of a task
- Component between the database and PHP: Memcache - an in-memory clone of some of Drupal's tables
- Memcache: take all the cache tables and hold them in memory
- Memcache is also used for sessions - if your sessions table is locking up, your site is about to implode
- Memcache is also used to speed up path alias lookups
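A minimal settings.php sketch for wiring this up, assuming the contributed Memcache module and a memcached daemon on localhost:11211 (paths and keys vary by module version):

    // settings.php: route Drupal's cache_* tables through memcached
    // instead of MySQL (requires the Memcache module and a running memcached).
    $conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
    $conf['memcache_servers'] = array('localhost:11211' => 'default');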
Apache Requirements
- Apache 1.3.x or 2.x, ability to read .htaccess files, AllowOverride All
- Taking the rules in .htaccess, putting them in the main Apache config file, and turning off dynamic configuration is faster - it might not be a huge bump in performance, but Apache no longer re-reads .htaccess on every request
- mod_rewrite (clean URLs), mod_php (Apache integration), mod_expires
- MaxClients: the number of simultaneous connections Apache will accept; if you set it too high for your server, you'll run out of memory
- MaxClients = RAM available to Apache / average Apache process size
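A worked example of that formula (the numbers are illustrative, not a recommendation):

    # 2 GB of RAM, minus ~500 MB reserved for MySQL and the OS,
    # divided by an average Apache+PHP process of ~30 MB (check with top/ps):
    #   (2048 - 500) / 30 ≈ 51
    # So in httpd.conf:
    MaxClients 50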
mod_expires
- ExpiresDefault A1209600 ("A" = access time plus 1209600 seconds, AKA "two weeks")
- ExpiresByType text/html A1 (images, CSS, and JavaScript get cached for two weeks; text/html expires one second after access)
- We can't cache html in Drupal because that's dynamic
- This is telling Apache to send the headers to the browser that tell the browser it's ok to cache it
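Put together, the mod_expires configuration looks roughly like this (Drupal's stock .htaccess ships a very similar block):

    <IfModule mod_expires.c>
      ExpiresActive On
      # Cache everything for two weeks after it was last accessed...
      ExpiresDefault A1209600
      # ...except HTML, which expires one second after access because it's dynamic.
      ExpiresByType text/html A1
    </IfModule>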
KeepAlive
- There's overhead to opening TCP/IP connections
- "We can have a conversation this long" - Apache and browser can keep a conversation going long enough to download an entire page
- KeepAliveTimeout 2 (you can monitor Apache threads to see when a process turns into a wait process, and refine the value - see the snippet after this list)
- Resources: linuxgazette.net/123/vishnu.html
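The corresponding httpd.conf directives, as a sketch (the timeout is a starting point to refine by watching your Apache processes):

    # Keep one TCP connection open for all the requests that make up a page,
    # but don't let idle connections tie up workers for long.
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 2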
PHP requirements
- PHP 5.2.x, XML extension, GD image library, cURL support, register_globals: off, safe_mode: off
- PHP opcode cache: removes the "compile to operation codes" step - go right from parsing PHP to executing cached opcodes
- APC: http://pecl.php.net/package/APC
- php.ini: max_execution_time = 60, memory_limit = 96M (see the sketch after this list)
- If you're uploading big files you might need more; if you're doing image handling/manipulation (e.g. ImageCache dynamically creating image derivatives) you may need to increase memory_limit
- Opcode cache is going to increase size of each Apache process? Or maybe not? (Debate ensues)
- In any case, check and see if Apache is holding onto more memory
- Use PHP best practices: don't count things over and over - store that count and then move on (see the examples below)
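The php.ini settings above, plus APC, as a sketch (apc.shm_size is an assumed starting value; older APC releases read it as megabytes, newer ones also accept "64M"):

    ; php.ini
    max_execution_time = 60
    memory_limit = 96M

    ; APC opcode cache (requires the APC extension from PECL)
    extension = apc.so
    apc.shm_size = 64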
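And the "don't count things over and over" point as a tiny PHP example (process() stands in for whatever work you're doing):

    <?php
    // Wasteful: count() runs on every iteration of the loop.
    for ($i = 0; $i < count($items); $i++) {
      process($items[$i]);
    }

    // Better: count once, store the result, then loop.
    $total = count($items);
    for ($i = 0; $i < $total; $i++) {
      process($items[$i]);
    }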
True or False?
- The more modules you enable, the slower your site becomes (TRUE!)
- Sometimes you may not need a module for that - 5 lines of code and it's done (you don't need a birthday module with candles and cake if you just need the number)
- "Do I really need to enable this module?"
- When my site is getting hammered, I should increase the MaxClients option to handle more traffic (FALSE!)
- You'll run out of memory, start swapping, and die
- echo() is faster than print() (WHO CARES?)
- This is taking things a little too far
Database server
- MySQL 5.0.x, or 5.1.33 or higher (there are problems with CCK on 5.1 releases before 5.1.33)
- MyISAM by default
- In Drupal 7 this changes - MyISAM locks the entire table against writes while any row is being written; the access log, users, and sessions tables get written to on every page request, and that can cause problems
- Drupal 7 uses InnoDB - row-level locking, transactions, foreign key support, more robustness (less likely to get corrupted tables)
- If you have a table that's primarily read, MyISAM is a little faster
- Query caching: specify query_cache_size (64M?) and max_allowed_packet (16M?) - see the my.cnf sketch after this list
- Is query cache size relative to table size? Yes, basically: it's a bucket for read-query result sets - how many result sets do you want to store in the query cache
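A my.cnf sketch for the settings mentioned above (the sizes are the suggested starting points, not universal values):

    [mysqld]
    # Result cache for repeated read queries.
    query_cache_size   = 64M
    # Largest packet/result the server will accept.
    max_allowed_packet = 16M

If a hot table needs row-level locking before Drupal 7, a one-off conversion such as ALTER TABLE sessions ENGINE=InnoDB; does it (back up first).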
Query optimization
- Find a slow query (can look at slow query log in MySQL), debug the query using EXPLAIN, it shows what's getting joined together and all sorts of other details; save the query, save the world
- log-slow-queries = /var/log/slow_query.log
- long_query_time = 5 (5 seconds - any query slower than that gets logged)
- #log-queries-not-using-indexes: catches the little queries that run a ton (e.g. Voting API casting a vote); tweak those and you'll optimize the site
- Add an index to reduce the number of rows it has to look through (tradeoff: it adds a little bit of time before a write can happen)
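A sketch of that workflow against a made-up slow query (table and column names are illustrative):

    -- EXPLAIN shows the join order, the indexes considered, and how many rows get scanned.
    EXPLAIN SELECT COUNT(*) FROM votingapi_vote WHERE content_id = 123;

    -- If it's scanning the whole table, add an index on the column you filter by
    -- (tradeoff: every write to this table now also has to update the index).
    ALTER TABLE votingapi_vote ADD INDEX content_id (content_id);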
Drupal
- Use Pressflow: same APIs as Drupal core, but supports MySQL replication, reverse proxy caching, and PHP 5 optimizations
- pressflow.org
- Almost all Pressflow changes make it back to core Drupal for the next release
- Cron is serious business - run it
- Drupal performance screen (/admin/settings/performance)
- We can't cache HTML like we can cache other things... but there's a way to do it
- It's disabled by default; the normal page cache takes the pages anonymous users view and stores the rendered output in the database
- Aggressive cache bypasses some of the normal startup-y kind of things
- The performance screen warns you if any enabled modules (such as Devel) might be affected by aggressive caching
- MTV runs on 4 web servers and a database server - and has a TON of caching/CDN in front
- CDN is great for a huge spike in traffic
- If you don't have $$ for a CDN, use a reverse proxy like Varnish: don't ask Drupal to keep generating stuff for anonymous traffic
- Block caching is good
- Optimize CSS = aggregate and merge (20 requests for CSS files can go to 2)
- JSAggregator does compression for javascript (but be sure that you've got all the right semicolons)
Tools of the trade
- Reverse proxy caches: like your own mini mini CDN; Varnish (varnish-cache.org)
- Set time to live for your content - this leads to regulated traffic off the originating server
- whitehouse.gov is served entirely through Akamai; only when you do a search or post something do you start hitting the origin Drupal servers
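A quick way to see what a proxy or CDN is doing for a given URL is to inspect the response headers (curl -I sends a HEAD request; exact header names vary by proxy):

    curl -I http://www.example.com/
    # Look for Expires: and Cache-Control: (what you told browsers/proxies to do),
    # and, with Varnish in front, headers such as Age: or X-Varnish: showing cache hits.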
- ApacheBench - measure the impact of your code on your site
- It's built-in with Apache (ab from command line)
- ab -n 10 -c 10 http://www.example.com/ (10 requests, 10 at a time)
- You get back a number (requests per second your site can handle)
- More complicated for authenticated users: first turn off all caching (worst-case scenario), look at your browser cookie to get the session ID, then run: ab -n 10 -c 10 -C PHPSESSID=[whatever it is] http://www.example.com/
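Both runs side by side, as a sketch (the cookie value is a placeholder you copy from your own logged-in browser session):

    # Anonymous baseline: 10 requests, 10 at a time.
    ab -n 10 -c 10 http://www.example.com/

    # Authenticated worst case: disable caching first, then replay your session cookie
    # so every request hits Drupal as a logged-in user.
    ab -n 10 -c 10 -C PHPSESSID=abc123placeholder http://www.example.com/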
devel module
- Not suggested for a production site; the Masquerade module is the one for switching users on a live site
- Print out database queries for each page
- Switch users
- View session information
- dsm()
- db_queryd()
- timer_start(), timer_stop()
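A hedged sketch of those helpers inside custom module code (the module name and query are made up; dsm() and db_queryd() come from Devel, timer_start()/timer_stop() are Drupal core):

    <?php
    // Time a chunk of work.
    timer_start('mymodule_report');

    // db_queryd() behaves like db_query() but prints debugging output for the query.
    $result = db_queryd("SELECT COUNT(*) FROM {node} WHERE status = 1");

    // dsm() pretty-prints any variable to the Drupal message area.
    dsm(db_result($result), 'Published node count');

    // timer_stop() returns the timer array, including elapsed time in milliseconds.
    dsm(timer_stop('mymodule_report'), 'Elapsed');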
MySQL Tuning Scripts
- blog.mysqltuner.com
- www.maatkit.org - makes human-friendly reports from the slow query log
Kinds of scalability
- Scalability - how long can you survive the load
- Scaling viral widgets: there, the mantra isn't "protect the database", it's "protect the web servers" - get more web servers
- Spike in anonymous user traffic (getting Slashdotted): site is a place for authenticated users, offload anonymous user traffic
- Tons of authenticated users: 100k employees logging into an infrastructure from 9 to 5 - big, beefy servers in a hosting location
Where do you start?
- Do the quick wins first
- Save time for load testing
- RAM is cheap, MemCache is a nice solution
- If you get advance warning about an upcoming spike in traffic, that's the trigger for a reverse proxy cache or CDN
- Work with hosting companies that know their infrastructure; build a relationship with them early on to have these kinds of conversations
- Some crashes are just a misunderstanding about what Drupal needs (going from a static site to Drupal without making changes)
When your server's on fire
- Always have breathing room if you can
- If you've done MemCache, query caching, gone through all of that... add another box
- Add another virtual server
- Scalability = redundancy; back yourself up
- If the site goes down, will you lose money? If yes, invest in infrastructure