• Browser | Apache | PHP | SQL Queries | MySQL
  • Common pattern for optimization: inspect each layer, add little buckets of caches everywhere
  • "Fast track" through the different layers to get requests out more efficiently
  • On the browser side: mod_expires sends a message to the browser that says "I've got this info, you've already looked at it, we're good"
  • Firebug will show you all the individual requests and how many KB each takes to download (if you only have to download a little bit when you refresh, that's good)
  • CDN - Content Delivery Networks and reverse proxy caches: any stuff that hasn't changed, you don't have to ask your internal infrastructure to handle that (hand it off to geolocated servers optimized to quickly serve out that info)
  • Proxy cache can be in front of your infrastructure (offload things Drupal would keep doing over and over)
  • PHP level: OpCode cache
  • MySQL level: query cache - takes all the read queries (most of the select statements) and stores the results in memory
  • Query cache, OpCode cache: half hour or less, significant improvements
  • Proxy caches and CDNs are a bit larger of a task
  • Component between database and PHP: MemCache - clone of some of Drupal's tables
  • MemCache: take all the cached tables, hold it in memory
  • MemCache also used for sessions - if your sessions table is locking up, your site is about to implode
  • MemCache also used to speed up path aliasing stuff
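The memcached setup above is usually wired in through Drupal's settings.php. A minimal sketch, assuming the Drupal 6 Memcache module lives at sites/all/modules/memcache and a single memcached instance is listening on its default port:

```php
<?php
// settings.php additions - a sketch; the module paths below are assumptions.
// Route Drupal's cache tables through memcached instead of the database.
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
// Store sessions in memcached too (helps when the sessions table locks up).
$conf['session_inc'] = './sites/all/modules/memcache/memcache-session.inc';
// One memcached instance on the default port, used for everything.
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
```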

Apache Requirements

  • Apache 1.3.x or 2.x, ability to read .htaccess files, AllowOverride All
  • Moving the directives from .htaccess into the main Apache config file and turning off dynamic configuration is faster - it might not be a huge bump in performance, but it's free
  • mod_rewrite (clean URLs), mod_php (Apache integration), mod_expires
  • MaxClients: the number of simultaneous connections Apache will accept; if you set it too high for your server, you'll run out of memory
  • Rule of thumb: MaxClients = RAM / average Apache process size
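The formula above works out as a quick shell calculation. The RAM and per-process numbers below are made-up examples, and the commented ps line is one (assumed) way to measure the average on a running box:

```shell
# Rough MaxClients estimate: RAM available to Apache / average process size.
# On a live server you could measure the average with something like:
#   ps -ylC apache2 --sort:rss | awk '{sum+=$8; n++} END {print sum/n/1024 " MB"}'
RAM_MB=2048          # example: 2 GB reserved for Apache
AVG_APACHE_MB=25     # example: ~25 MB per Apache process
echo $(( RAM_MB / AVG_APACHE_MB ))   # -> 81
```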

mod_expires

  • ExpiresDefault A1209600 ("A" = after access; 1,209,600 seconds, AKA "two weeks")
  • ExpiresByType text/html A1 (images, CSS, and JavaScript get cached for two weeks; text/html expires after one second)
  • We can't cache html in Drupal because that's dynamic
  • This tells Apache to send headers that let the browser know it's OK to cache the file
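Put together, the rules above might look like this in the Apache config (the IfModule wrapper is an assumption, there as a safety net if mod_expires isn't loaded):

```apache
<IfModule mod_expires.c>
  ExpiresActive On
  # Cache everything for two weeks after first access...
  ExpiresDefault A1209600
  # ...except HTML, which Drupal generates dynamically: expire after one second.
  ExpiresByType text/html A1
</IfModule>
```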

KeepAlive

  • There's overhead to opening TCP/IP connections
  • "We can have a conversation this long" - Apache and browser can keep a conversation going long enough to download an entire page
  • KeepAliveTimeout 2 (but you can monitor Apache threads to determine when a process turns into a wait process, refine it)
  • Resources: linuxgazette.net/123/vishnu.html
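A minimal sketch of the KeepAlive settings discussed above; the MaxKeepAliveRequests value is an assumed common default, not from the session:

```apache
# Keep the TCP connection open long enough to download a whole page,
# but time out quickly so idle connections don't tie up Apache processes.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 2
```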

PHP requirements

  • PHP 5.2.x, XML extension, GD image library, cURL support, register_globals: off, safe_mode: off
  • PHP Opcode Cache: removes "compile to operation codes" steps - go right from parse PHP to execute
  • APC: http://pecl.php.net/package/APC
  • php.ini: max_execution_time = 60, memory_limit = 96M
  • If you're uploading big things, you might need more; if you're doing image handling/image manipulating (image cache to dynamically create image derivatives) may need to increase memory
  • Opcode cache is going to increase size of each Apache process? Or maybe not? (Debate ensues)
  • In any case, check and see if Apache is holding onto more memory
  • Use PHP best practice (don't count things over and over - store that count and then move on)
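Pulling the php.ini values above together as one fragment; the APC lines are illustrative assumptions (they're commented out, and the shared-memory size depends on your codebase):

```ini
; php.ini values from the session
max_execution_time = 60
memory_limit = 96M

; If APC is installed (e.g. via pecl install apc) - assumed settings:
;extension = apc.so
;apc.shm_size = 64
```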

True or False?

  • The more modules you enable, the slower your site becomes (TRUE!)
    • Sometimes you may not need a module for that - 5 lines of code and it's done (don't need a birthday module with candles, etc if you just need the number)
    • "Do I really need to enable this module?"
  • When my site is getting hammered, I should increase the MaxClients option to handle more traffic (FALSE!)
    • You'll run out of memory, start swapping, and die
  • echo() is faster than print() (WHO CARES?)
    • This is taking things a little too far

Database server

  • MySQL 5.0.x, or 5.1.33 or higher (there are some problems with CCK before 5.1.33)
  • MyISAM by default
  • In Drupal 7, there are changes: MyISAM locks the entire table for writes whenever anything is being written, and the access log, users, and sessions tables get written to on every page request - this can cause problems
  • Drupal 7 uses InnoDB - row-level locking, transactions, foreign key support, more robustness (less likely to get corrupted tables)
  • If you have a table that's primarily read, MyISAM is a little faster
  • Query caching - specify query_cache_size (64M?), max_allowed_packet (16M?)
  • Is query cache size relative to table size? - yes, basically a bucket for read queries; how many result sets do you want to store in query cache
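As a my.cnf fragment, the sizes suggested above look like this (both are starting points to tune against your workload, not fixed recommendations):

```ini
[mysqld]
# Memory set aside for cached SELECT result sets
query_cache_size   = 64M
# Largest single packet/row the server will accept
max_allowed_packet = 16M
```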

Query optimization

  • Find a slow query (can look at slow query log in MySQL), debug the query using EXPLAIN, it shows what's getting joined together and all sorts of other details; save the query, save the world
  • log-slow-queries = /var/log/slow_query.log
  • long_query_time = 5 (log queries that take longer than 5 seconds)
  • #log-queries-not-using-indexes: catches the little queries that run a ton; tweak those and you'll optimize the site (e.g., Voting API casting a vote)
  • Add an index to reduce the number of rows it has to look through (tradeoff: it adds a little bit of time before a write can happen)
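The EXPLAIN-then-index workflow might look like the sketch below; the query and index are hypothetical examples against Drupal's node table, not taken from the session:

```sql
-- Step 1: run the slow query through EXPLAIN to see how MySQL executes it.
-- A "rows" count near the table size means a full table scan.
EXPLAIN SELECT nid, title
FROM node
WHERE type = 'story' AND status = 1;

-- Step 2: add an index covering the filtered columns so fewer rows
-- are examined (tradeoff: writes to this table get slightly slower).
ALTER TABLE node ADD INDEX node_type_status (type, status);
```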

Drupal

  • Use Pressflow: same APIs as Drupal core but supports MySQL replication, reverse proxy caching, PHP 5 optimizations
  • pressflow.org
  • Almost all Pressflow changes make it back to core Drupal for the next release
  • Cron is serious business - run it
  • Drupal performance screen (/admin/settings/performance)
  • We can't cache HTML like we can cache other things... but there's a way to do it
  • It's disabled by default; the normal version stores the pages anonymous users view in the database
  • Aggressive cache bypasses some of the normal startup-y kind of things
  • Aggressive cache lets you know if there's any modules that might be affected by enabling aggressive caching (such as Devel module)
  • MTV runs on 4 web servers and a database server - and has a TON of caching/CDN
  • CDN is great for a huge spike in traffic
  • If you don't have $$ for a CDN, use a reverse proxy like Varnish: don't ask Drupal to keep generating stuff for anonymous traffic
  • Block caching is good
  • Optimize CSS = aggregate and merge (20 requests for CSS files can go to 2)
  • JSAggregator does compression for javascript (but be sure that you've got all the right semicolons)

Tools of the trade

  • Reverse proxy caches: like your own mini mini CDN; Varnish (varnish-cache.org)
  • Set a time-to-live for your content; that keeps regular traffic off the originating server
  • whitehouse.gov is being served all through Akamai; when you do a search, or post something you start to hit the original Drupal
  • Apache Benchmark - impact of your code on your site
  • It's built-in with Apache (ab from command line)
  • ab -n 10 -c 10 http://www.example.com/ (10 requests, 10 at a time)
  • You get back a number (requests per second your site can handle)
  • More complicated for authenticated users: first, turn off all caching (for the worst-case scenario), look at the cookie to get the session ID, and do: ab -n 10 -c 10 -C PHPSESSID=[whatever it is] http://www.example.com
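The Varnish approach mentioned above can be sketched in VCL. This is illustrative only - the syntax differs between Varnish versions - and it assumes Apache has been moved to port 8080 and that Drupal's session cookie name starts with "SESS":

```vcl
backend default {
  .host = "127.0.0.1";
  .port = "8080";   # assumption: Apache moved off port 80 for Varnish
}

sub vcl_recv {
  # Logged-in users carry a session cookie: pass them straight to Drupal.
  if (req.http.Cookie ~ "SESS") {
    return (pass);
  }
  # Anonymous traffic: strip cookies so the page is cacheable,
  # and Drupal stops regenerating the same page over and over.
  unset req.http.Cookie;
}
```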

devel module

  • Not suggested for a production site; Masquerade module is for switching users on a live site
  • Print out database queries for each page
  • Switch users
  • View session information
  • dsm()
  • db_queryd()
  • timer_start(), timer_stop()

MySQL Tuning Scripts

  • blog.mysqltuner.com
  • www.maatkit.org - makes human-friendly reports from slow query report

Kinds of scalability

  • Scalability - how long can you survive the load
  • Scaling viral widgets: there, the mantra isn't "protect the database", it's "protect the web servers" - get more web servers
  • Spike in anonymous user traffic (getting Slashdotted): site is a place for authenticated users, offload anonymous user traffic
  • Tons of authenticated users: 100k employees logging into an infrastructure from 9 to 5 - big, beefy servers in a hosting location

Where do you start?

  • Do the quick wins first
  • Save time for load testing
  • RAM is cheap, MemCache is a nice solution
  • If you get a warning about an upcoming spike in traffic, that's the trigger for a reverse proxy cache or CDN
  • Work with hosting companies that know their infrastructure; build a relationship with them early on to have these kinds of conversations
  • Some crashes are just a misunderstanding about what Drupal needs (going from a static site to Drupal without making changes)

When your server's on fire

  • Always have breathing room if you can
  • If you've done MemCache, query caching, gone through all of that... add another box
  • Add another virtual server
  • Scalability = redundancy; back yourself up
  • If the site goes down, will you lose money? If yes, invest in infrastructure