Do It With Drupal: Drupal Under Pressure: Performance and Scalability

  • Browser | Apache | PHP | SQL Queries | MySQL
  • Common pattern for optimization: inspect each layer, add little buckets of caches everywhere
  • "Fast track" through the different layers to get out requests more efficiently
  • On the browser side: mod_expires sends headers that tell the browser "you've already looked at this and it hasn't changed, keep your copy"
  • Firebug will show you all the individual requests and how many KB each takes to download (if you only have to download a little bit when you refresh, that's good)
  • CDN - Content Delivery Networks and reverse proxy caches: any stuff that hasn't changed, you don't have to ask your internal infrastructure to handle that (hand it off to geolocated servers optimized to quickly serve out that info)
  • Proxy cache can be in front of your infrastructure (offload things Drupal would keep doing over and over)
  • PHP level: OpCode cache
  • MySQL level: query cache - takes all the read queries (most of the select statements) and stores the results in memory
  • Query cache, opcode cache: half an hour of work or less for significant improvements
  • Proxy caches and CDNs are a bit larger of a task
  • Component between the database and PHP: Memcache - holds clones of some of Drupal's tables
  • Memcache takes all the cache tables and holds them in memory
  • MemCache also used for sessions - if your sessions table is locking up, your site is about to implode
  • MemCache also used to speed up path aliasing stuff
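The pattern Memcache provides is a read-through cache: check memory first, fall back to the expensive lookup, store the result for next time. A toy Python sketch (a plain dict stands in for a memcached client; the names are illustrative, not Drupal's API):

```python
# Read-through cache sketch: only hit the "database" on a miss.
cache = {}  # stand-in for a memcached client

def cached_lookup(key, expensive_fn):
    if key not in cache:
        cache[key] = expensive_fn(key)   # database is queried only on a miss
    return cache[key]

calls = []
lookup = lambda k: (calls.append(k), f"row-for-{k}")[1]  # pretend DB query
print(cached_lookup("node:1", lookup))  # row-for-node:1 (from the "database")
print(cached_lookup("node:1", lookup))  # row-for-node:1 (served from cache)
print(len(calls))                       # 1 - the database was queried once
```

This is exactly why Memcache helps with hot tables like sessions and path aliases: repeated reads never reach MySQL.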

Apache Requirements

  • Apache 1.3.x or 2.x, ability to read .htaccess files, AllowOverride All
  • If we take the information in .htaccess and put it in the main Apache config file, it's faster - it might not be a huge bump in performance, but it lets you turn off dynamic configuration of Apache
  • mod_rewrite (clean URLs), mod_php (Apache integration), mod_expires
  • MaxClients- number of connections you can have to Apache at once; if you set it too high for your server, you'll run out of memory
  • MaxClients = total RAM / average Apache process size
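The rule of thumb above, with hypothetical numbers (measure your own average process size with ps or top):

```python
# MaxClients rule of thumb: RAM / average Apache process size.
# Both numbers below are hypothetical examples.
total_ram_mb = 2048   # RAM you can dedicate to Apache
avg_proc_mb = 40      # average size of one Apache process
max_clients = total_ram_mb // avg_proc_mb
print(max_clients)    # 51
```

Setting MaxClients above this pushes the box into swap, which is the failure mode described below.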


  • ExpiresDefault A1209600 (access time plus 1,209,600 seconds, AKA "two weeks")
  • ExpiresByType text/html A1 (images, CSS, and JavaScript get cached for two weeks; text/html expires after one second)
  • We can't cache HTML in Drupal because it's dynamic
  • This is telling Apache to send the headers to the browser that tell the browser it's ok to cache it
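Put together, the directives above look something like this sketch of an Apache config (or .htaccess) fragment; 1,209,600 seconds is 14 days:

```apacheconf
# Let browsers cache static assets for two weeks after first access...
ExpiresActive On
ExpiresDefault A1209600
# ...but let dynamic HTML expire almost immediately
ExpiresByType text/html A1
```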


  • There's overhead to opening TCP/IP connections
  • "We can have a conversation this long" - Apache and browser can keep a conversation going long enough to download an entire page
  • KeepAliveTimeout 2 (but you can monitor Apache threads to determine when a process turns into a wait process, refine it)
  • Resources:

PHP requirements

  • PHP 5.2.x, XML extension, GD image library, cURL support, register_globals: off, safe_mode: off
  • PHP Opcode Cache: removes "compile to operation codes" steps - go right from parse PHP to execute
  • APC:
  • php.ini: max_execution_time = 60, memory_limit = 96M
  • If you're uploading big things, you might need more; if you're doing image handling/image manipulating (image cache to dynamically create image derivatives) may need to increase memory
  • Opcode cache is going to increase size of each Apache process? Or maybe not? (Debate ensues)
  • In any case, check and see if Apache is holding onto more memory
  • Use PHP best practice (don't count things over and over - store that count and then move on)
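A php.ini sketch of the values mentioned above (the APC shared-memory size is a hypothetical starting point - size it to your codebase):

```ini
; php.ini
max_execution_time = 60
memory_limit = 96M      ; raise for big uploads or heavy ImageCache work
; APC opcode cache (hypothetical starting value)
apc.shm_size = 64M
```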

True or False?

  • The more modules you enable, the slower your site becomes (TRUE!)
    • Sometimes you may not need a module for that - 5 lines of code and it's done (don't need a birthday module with candles, etc if you just need the number)
    • "Do I really need to enable this module?"
  • When my site is getting hammered, I should increase the MaxClients option to handle more traffic (FALSE!)
    • You'll run out of memory, start swapping, and die
  • echo() is faster than print() (WHO CARES?)
    • This is taking things a little too far

Database server

  • MySQL 5.0.x, or 5.1.33 or higher (there are problems before 5.1.33 with CCK)
  • MyISAM by default
  • In Drupal 7, there are changes - MyISAM locks the entire table for writing when one thing is getting written somewhere; the access column of the users table and the sessions table get written to on every page request - this can cause problems
  • Drupal 7 uses InnoDB - row-level locking, transactions, foreign key support, more robustness (less likely to get corrupted tables)
  • If you have a table that's primarily read, MyISAM is a little faster
  • Query caching - specify query_cache_size (64M?), max_allowed_packet (16M?)
  • Is query cache size relative to table size? - yes, basically a bucket for read queries; how many result sets do you want to store in query cache
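In my.cnf, these suggestions look something like the following (the question-marked values from the session are starting points, not gospel):

```ini
# my.cnf
[mysqld]
query_cache_size   = 64M   # shared bucket for cached SELECT result sets
max_allowed_packet = 16M
```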

Query optimization

  • Find a slow query (can look at slow query log in MySQL), debug the query using EXPLAIN, it shows what's getting joined together and all sorts of other details; save the query, save the world
  • log-slow-queries = /var/log/slow_query.log
  • long_query_time = 5 (seconds, not milliseconds - log anything slower than 5 seconds)
  • #log-queries-not-using-indexes: catches the little queries that get run a ton (e.g. Voting API casting a vote); tweak those and you'll optimize the site
  • Add an index to reduce the number of rows it has to look through (tradeoff: it adds a little bit of time before a write can happen)
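The effect of adding an index can be demonstrated with SQLite via Python's sqlite3 (used here purely for illustration - MySQL's EXPLAIN output looks different, but the scan-vs-index distinction is the same; the votes table is hypothetical, in the spirit of the Voting API example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (uid INTEGER, nid INTEGER, value INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a column describing the strategy
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

query = "SELECT value FROM votes WHERE nid = 5"
before = plan(query)   # without an index: a full table scan ("SCAN ...")
conn.execute("CREATE INDEX votes_nid ON votes (nid)")
after = plan(query)    # with the index: "SEARCH ... USING INDEX votes_nid"
print("SCAN" in before, "INDEX" in after)
```

The tradeoff noted above still applies: the index makes this read cheap, but every write to votes now also updates votes_nid.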


  • Use Pressflow: same APIs as Drupal core but supports MySQL replication, reverse proxy caching, PHP 5 optimizations
  • Almost all Pressflow changes make it back to core Drupal for the next release
  • Cron is serious business - run it
  • Drupal performance screen (/admin/settings/performance)
  • We can't cache HTML like we can cache other things... but there's a way to do it
  • It's disabled by default; the normal page cache stores the pages anonymous users view in the database
  • Aggressive cache bypasses some of the normal startup-y kind of things
  • Aggressive cache lets you know if there's any modules that might be affected by enabling aggressive caching (such as Devel module)
  • MTV runs on 4 web servers and a database server - and has TON of caching/CDN
  • CDN is great for a huge spike in traffic
  • If you don't have $$ for a CDN, use a reverse proxy like Varnish: don't ask Drupal to keep generating stuff for anonymous traffic
  • Block caching is good
  • Optimize CSS = aggregate and merge (20 requests for CSS files can go to 2)
  • JSAggregator does compression for JavaScript (but be sure you've got all the right semicolons, or compression will break your scripts)

Tools of the trade

  • Reverse proxy caches: like your own mini mini CDN; Varnish
  • Set time to live for your content - this leads to regulated traffic off the originating server
  • The site is served entirely through Akamai; when you do a search, or post something, you start to hit the original Drupal
  • Apache Benchmark - impact of your code on your site
  • It's built-in with Apache (ab from command line)
  • ab -n 10 -c 10 (10 requests, 10 at a time)
  • You get back a number (requests per second your site can handle)
  • More complicated for authenticated users: first, turn off all caching (for the worst-case scenario), look at the cookie to get the session ID, then do: ab -n 10 -c 10 -C PHPSESSID=[whatever it is]

devel module

  • Not suggested for a production site; Masquerade module is for switching users on a live site
  • Print out database queries for each page
  • Switch users
  • View session information
  • dsm()
  • db_queryd()
  • timer_start(), timer_stop()

MySQL Tuning Scripts

  • - makes human-friendly reports from slow query report

Kinds of scalability

  • Scalability - how long can you survive the load
  • Scaling viral widgets: there, the mantra isn't "protect the database", it's "protect the web servers" - get more web servers
  • Spike in anonymous user traffic (getting Slashdotted): site is a place for authenticated users, offload anonymous user traffic
  • Tons of authenticated users: 100k employees logging into an infrastructure from 9 to 5 - big, beefy servers in a hosting location

Where do you start?

  • Do the quick wins first
  • Save time for load testing
  • RAM is cheap, MemCache is a nice solution
  • If you get a warning about upcoming spikes in traffic, that triggers reverse proxy cache, CDN
  • Work with hosting companies that know their infrastructure; build a relationship with them early on to have these kinds of conversations
  • Some crashes are just a misunderstanding about what Drupal needs (going from a static site to Drupal without making changes)

When your server's on fire

  • Always have breathing room if you can
  • If you've done MemCache, query caching, gone through all of that... add another box
  • Add another virtual server
  • Scalability = redundancy; back yourself up
  • If the site goes down, will you lose money? If yes, invest in infrastructure


Do It With Drupal: Drupal In The Cloud

Josh Koenig
josh - at -
About the cloud

  • "Cloud" as new model for hosting
  • Traditional hosting = real estate (rack space)
  • Most real estate customers are renters, few love their landlord - landlords sometimes cut corners and do the bare minimum to keep you happy... but you need this
  • Owning comes with lots of responsibilities and hidden costs
  • Large scale projects are expensive, slow, and prone to setbacks
  • "The Cloud" = hosting as an API: on-demand availability
  • Hourly pricing
  • Reliable, reusable start-states: people make mistakes vs. programs that do things and you know exactly what they're going to give you
  • You can say: I want a new server, here's the distro, here's the information, here's the configuration - and I want five of them
  • The cloud = less waste, more freedom, flexibility... but not a silver bullet
  • Performance can vary (don't use it for scientifically accurate benchmarks)
  • Abstractions aren't the same as the real thing (not the same as physical servers - but for what it's worth this hasn't been a problem for Drupal)
  • New tricks to learn - power of API
  • The Cloud is Drupal's destiny - increasing Drupal's reach; you can start with pennies, scale to millions
  • Create products cheaply
  • Grow organically, but still grow fast

Launch a server in the cloud

  • ElasticFox - Amazon control panel for Firefox
  • Amazon just added locations for US west coast
  • Pantheon project: create images for cloud services that are targeted towards Drupal
  • Three images: high performance production hosting image (all the tricks already done), another for an Aegir, another for a continuous integration environment for Drupal
  • Grand vision for world-class Drupal infrastructure for pennies an hour
  • High performance production has the most work since people have been the most interested
  • Ubuntu 9.04 base config, whole LAMP stack, Pressflow pre-installed, memcached, APC, all of it is already there
  • Can monitor processes, do everything you like to do as root
  • v0.8.1 beta - but people are using it in production (in spite of disclaimer)

Who are the cloud providers

  • AWS: most mature, a lot of features, still moving quickly, added a load balancer earlier in the year; they're a utility, not interested in your particular use case; they don't tell people what they're working on or how it works
  • AWS has infrastructure for giving away free images - most don't
  • Rackspace - has Rackspace Cloud Sites (you don't get root, you put your Drupal in there, they scale it for you with mixed results); scaling any particular site requires deep knowledge of it; Rackspace Cloud Servers is better (Slicehost is built on top of Rackspace Cloud Servers)
  • Rackspace is looking to break into the space; willing to do deals, talk to you, etc
  • Voxel: smaller/smarter, also in Asia; cloud product just emerging from beta, but it's good - also lets you intermingle cloud and physical infrastructure
  • And more every day!
  • VPS hosting is becoming quite cloudy (Slicehost and others)
  • Custom/managed cloud services (security, regulatory compliance issues - people will build a cloud for you: Eucalyptus, Neospire, others)
  • Cloud value-adders: Rightscale, Scalr - cloud/cluster management services
  • Cloudkick - cross-cloud services, managing different cloud providers (want to be able to move servers from one service to another); it's free; open-source LibCloud project to prevent people from getting locked into one provider
  • Cloud tools for Drupal -


  • How do you do a cost-analysis? If you're going to leave servers on all the time, you probably won't see financial benefits right away - the savings come from scaling with changing use patterns, adding and removing instances as needed.
  • Cost/benefit comes in disk speed performance - most cloud providers have poorer I/O performance than a physical server
  • How do you solve that problem for Drupal? - All performance/scalability work is about making Drupal do less work
  • Oriented around Drupal doing only what it needs to, and not bogging it down with things like showing the user the same page he saw a minute ago
  • Database replication for read-only queries
  • Use other tools that are better at repeated-action type jobs for those things

What is it good for

  • Testing/continuous integration
  • (Drupal testing Drupal) - not in the cloud, but will soon release cloud image of it
  • People can spin these up if Drupal finds itself in a testing bottleneck, just for the day
  • Development infrastructure: new server for each site
  • Putting things like version control (unfuddle, beanstalk)
  • Products and services: Lefora (forums), crowdfactory, olark (start with pennies, scale to millions)
  • Database layer for Drupal can be a choke point - you can duplicate it
  • High availability production hosting: Acquia is on EC2
  • Most cloud infrastructure isn't cheap at this level (running many servers, keeping them always online); if you're really big, you'll find yourself at the top end going back to traditional managed hosting, because some levels of performance are capped by the virtualization layer
  • Control costs for traffic patterns - geographically centralized audience for most people
  • Turning things on and off to deal with daily peaks - two more servers only on during the day
  • Instances fail, though not much more often than real servers (and remember that instances exist on real servers that do break)
  • Performance can be impacted by other local activity
  • Virtual disks tend to have relatively poor I/O performance
  • Accept the inevitability of failure, embrace the paradigm of "rapid recovery", develop architecture with modular, replaceable parts (images for each server), minimize disk/CPU utilization for menial tasks
  • "RAM is cheap" - the more you can push to things that read/write out of memory, the better

Production hosting in the cloud

  • Monitor your load - you have to look more carefully than just hits
  • Spin up more instances (scale horizontally) as you need more power
    • How does this work?
    • Could be manual process ("we need a server, let's do it") - does need some manual intervention somewhere, though in theory you could script it
    • Amazon offers an auto-scaling feature (when we need more, add servers, up to X number of instances - Amazon AutoScale)
    • AutoScale is simple (doesn't cost anything, too)
    • How does this work? How do the pieces work together?
    • You need to have an image with all the pieces needed at the system level; you should use version control and have a boot script as part of the image (when the instance starts, the script checks out the current code base from version control and makes all the necessary connections), then AutoScale makes the pieces aware of what's out there
    • You can also do load balancing more manually
    • Role of sysadmin is changing - new set of things where now you don't have to worry about hard drives, but scaling up/down, saving money
    • When you're doing horizontal scaling, you trigger your image to be built and it checks out the code; Amazon also offers a virtual drive service (EBS) if you're working with an application with a lot of data in the file system - you can connect that data quickly
    • Bake in as much as you can to the image, then have automatic processes that fire that get the latest information, check it into infrastructure, start distributing load there
  • Add layers (scale vertically) when bottlenecks emerge
  • Create images for each layer in your infrastructure
  • Use best practices to keep things speedy

About best practices

  • Front-side caching: use Pressflow with Varnish and/or Nginx (Drupal 7 will support some of this natively)
  • Drupal is slow: complex, wonderful, brainy tool - if you're looking at the same thing over and over again, go get a tool that does only that, and quickly
  • Use APC and/or Memcached to minimize queries to the database and eliminate costly unserialize() calls
  • Drupal's native caches are good, but it does it in the database (this isn't the highest performance option, serializing/unserializing big arrays/objects)
  • Architect for vertical scaling by utilizing all service layers, even if it's one box
  • This is what "Mercury" is about
  • CREAM: Cache rules everything around me


  • Freely available on Amazon, as a VMware image, in as many ways as we can
  • Also on-demand as a service
  • "Drupal hosting, 200 times faster"
  • Standardized high-performance stack: single server image with everything you want for cluster infrastructure
  • Features: Varnish, HTTP/PHP, APC Cache, Apache Solr, MySQL
  • Make Drupal run fast, hold up under large traffic spikes
  • From one box to cluster
  • If you're running all four layers and are still falling down, either you're doing something horribly write-heavy (like Twitter) or horribly wrong (all code embedded in PHP content nodes)


  • Mercury: going to implement configuration management system (BCFG2, probably)
  • Mercury/Pantheon - not Amazon-centric, can roll the stack out anywhere (physical hardware, whatever)
  • You'd probably make your own variant image, and sync as necessary using the configuration management system
  • If you haven't customized things heavily, you can take the latest version of Mercury, re-apply changes, and you're done (if you don't want to use the config management)
  • You can keep old images around for pennies a month


Do It With Drupal: Drupal Under Fire: Website Security

  • Your site is vulnerable (really, it is)
  • GVS offers security review service for Drupal
  • Bad things: abusing resources, stealing data, altering data
    • Abusing resources: DDOS (extorting money from site owner), using open relay in a mail sending module for spam
    • Stealing data: from users (their passwords, e-mail addresses)
    • Altering data: defacement
  • You don't hear about security vulnerabilities much; Drupal core mentions vulnerabilities (and updates) but not so much for modules
  • Worry in a prioritized way
  • Choose your strategy: stay ahead of the pack, or protect valuable assets?
  • Attacks focus on sites that are out of date
  • Know about releases, have a method to update your site, do it
  • Look into Aegir if you're running multiple sites


  • Available updates settings - e-mail notifications when modules you use are updated
  • Security review module:
  • Will show if your site is under attack with a SQL injection
  • Part of security review: check off which roles are considered "trusted" - the module then checks and points out which permissions are dangerous to give to untrusted users
  • Can skip some of the checks so they don't nag you (if this is on a dev server, and it's not relevant)
  • There is a hook to be able to run additional checks, but not sure whether modules should be able to declare their things (do we trust module developers to come up with the right set of rules?)
  • If there's something that can take an action on your site, accessible via a link - that could be a vulnerability (i.e. the "turn off this check" feature of the security module)
  • Run it before you launch, after you make big config changes; could do it as a periodic check and e-mail the report. Is ok to have always-on for a live site though

Vulnerabilities by type

  • Announcements from; most sites have custom modules, almost always have custom themes
  • Analysis of one site: 3-4 vulnerabilities in Drupal core, 20 in contrib modules, 100 vulnerabilities in custom theme/modules (no one else is reviewing that stuff except for you)
  • XSS (cross-site scripting) - one of the hardest to fix
  • Access bypass - good ways to fix this
  • Cross-site request forgeries
  • SQL injection - easy to protect against, only getting easier


  • Anything you can do XSS can do (better)
  • XSS can change password for user 1
  • Most people don't know they've been a victim of XSS; it's in your browser, browser just executes javascript, don't know until you try to log back in
  • XSS tools exist to probe your network - even if your Drupal is on the intranet
  • Automated tools are a great way to get started, but not all that valuable in actually identifying things (false positives, false negatives)

Insecure configuration of input formats

  • Input formats and filters are confusing - people do what they need, and forget about it, and open themselves up to XSS
  • Anonymous users: shouldn't be allowed to have more than one input format
  • Filtered HTML is the right thing for untrusted roles
  • To this day, WYSIWYG modules say "give everyone access to Full HTML and TinyMCE will just work" - NO! DON'T DO THAT!
  • Defaults are good: filtered HTML is a good thing
  • It's tempting to add images, spans, divs, etc - but different browsers have different vulnerabilities that way
  • There's a page on that talks about what's safe to put in (there's some gray area - depends on your users and their browsers)
  • Weights: HTML corrector needs to go last

XSS for Themers/Coders (and reviewers)

  • Browscap module: analyzes user agents for people who go to the site
  • There's a Firefox extension for changing the default user agent - it used to be for making Firefox pretend to be IE, but now people use it for other things
  • Hackers can take a normal user agent string and replace it with JavaScript/jQuery that gets sent as the user agent - if the site prints it unescaped, PWNED
  • Is there a module that will strip out JavaScript from the input box? - Filtered HTML does that
  • You can strip the script, or you can escape it (so it shows up as harmless text)
  • Filtered HTML also gets rid of attributes
  • There's a module that says which attributes can come through on which tags - well, the admin screen for it is huge, the input format area is a problem because it's confusing, so do you want to add an even more confusing module?
  • Themers: read the tpl.php files and default implementations; rely on your module developer for variables that are ready to be printed (hook_preprocess)
  • Developers: where does the text come from, is there a way a user can change it, in what context is it being used?
  • More is from the user than you think (user agents are from the user)
  • Filtered HTML makes things safe for the browser context
  • When data leaves Drupal and goes into MySQL - need to escape the data to make it safe for putting into the database
  • Contexts: mail (some clients sorta support javascript, need to specify plaintext), database, web, server
  • Take an hour:
  • Drupal philosophy: make things secure by default
  • Escape variables using the check_plain() function
  • If your site is translatable, it's probably also secure (strings go through t(), whose placeholders escape user input)
  • If you're using the API properly, you probably don't need to worry about security (but it takes a while to learn how to use the API properly)
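check_plain() is essentially HTML-escaping; the same idea in Python, for illustration only (in Drupal you'd call check_plain() itself):

```python
import html

user_input = '<script>alert("xss")</script>'
# Escaping turns markup into harmless text instead of executable code
safe = html.escape(user_input)
print(safe)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

This is what makes text "safe for the browser context" in the sense used above; database and mail contexts need their own escaping.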

Cross Site Request Forgeries (CSRF)

  • Taking an action without confirming the intent of that action
  • User Protect module - makes it harder to delete user 1; protects anonymous user, user 1, can add other users
  • Drupal's form API has protection from this - using links doesn't
  • An anonymous user can insert an "image" (the browser goes to look for it, and if that "image" is the link for User Protect that deletes the protection for user 1, that's bad)
  • In the case of User Protect, there's now a confirmation form - the browser would just fetch the confirmation form and throw it away; the requirement that you click the "submit" button saves you from anything bad happening
  • If you really want to use links like User Protect does, create a token based on something unique to the site, the user, and the action (and validate the token when the action is requested)
  • User session ID (unique key private to site, generated randomly at login) + form ID
  • When the action is submitted, Drupal will validate that it's still there
  • Is it possible to give permissions to manage everything EXCEPT user 1? - that's what User Protect does
  • Or, just use the form API - it includes this protection by default
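The token scheme described above can be sketched as follows - an illustrative HMAC version, not Drupal's exact drupal_get_token() implementation, and all the names are hypothetical:

```python
import hashlib
import hmac

def get_token(session_id: str, private_key: str, action: str) -> str:
    # Tie the token to the user's session, the site's private key,
    # and the specific action being protected.
    msg = (session_id + ":" + action).encode()
    return hmac.new(private_key.encode(), msg, hashlib.sha256).hexdigest()

def valid_token(token: str, session_id: str, private_key: str, action: str) -> bool:
    # Constant-time comparison when the action is requested
    return hmac.compare_digest(token, get_token(session_id, private_key, action))

tok = get_token("sess-abc", "site-private-key", "userprotect/delete-protection")
print(valid_token(tok, "sess-abc", "site-private-key", "userprotect/delete-protection"))   # True
print(valid_token(tok, "sess-evil", "site-private-key", "userprotect/delete-protection"))  # False
```

A forged request from another site can't include a valid token, because the attacker knows neither the victim's session ID nor the site's private key.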

Security and usability

  • Confirmation forms suck
  • BUT, truly destructive actions should be hard to do
  • Don't delete, archive and provide undo
  • Choose links or forms for usability, not security


  • - XSS Cheat Sheet
  • - CSRF


  • Rainbow tables - precomputed MD5 values for every possible password up to 6 characters
  • - has resources including list of security modules (Salt module has salting of passwords)
  • Any way to hide you're running Drupal? - data in the CSS files, standard Drupal jQuery, a few files in the root directory, expiration date for anonymous is Dries's birthday; there's all sorts of things that fingerprint a Drupal site, trying to hide you're running Drupal takes more time than it's worth if you just keep up with updates
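Salting is why rainbow tables fail: the same password hashed with two different salts yields two different digests, so no single precomputed table covers them. A sketch (PBKDF2 is used here for illustration; the Salt module's actual scheme differs):

```python
import hashlib
import os

def hash_password(password, salt=None):
    # A fresh random salt per user defeats precomputed hash tables
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest.hex()

s1, h1 = hash_password("secret")
s2, h2 = hash_password("secret")
print(h1 != h2)  # True: identical passwords, different salts, different hashes
```

Verification re-runs the hash with the stored salt: hash_password("secret", s1) reproduces h1.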


Do It With Drupal: The Power of Features

See also Features on

  • Jeff Miccolis & Eric Gundersen - Development Seed, building a lot of products (things like Open Atrium)
  • Drupal is very configurable - but that's also a weakness: no distinction between what's configuration (views settings) and what's content
  • Workflow problem: when you build a site, you build in a dev environment, but client/boss wants to see what it looks like before it goes live
    • So, you stage it somewhere, then move it over
    • Development: where the action happens (possibly your laptop)
    • Staging: where it's reviewed (much closer to where it's going to live)
    • Production: where it's live. (developing on the live site is always a bad idea)
  • Three people working on a project that needs to go live
    • Musician, developer, themer
    • Round 1 goes great - everyone works together and the site goes live
    • Round 2 is a PITA: new views built on dev, rebuilt on staging, rebuilt on dev, rebuilt on staging, over and over, rebuilt on prod
    • Extensive note taking, prone to human error, loads of repeated tasks
  • The solution? Make a distinction between config and content - views and settings are heavily and clearly distinguished from the actual content - then write this configuration to code and get it out of the database
  • You can do version control with your config - this lets you track changes
  • Node types, CCK fields, menu, blocks, views - these are config
  • You can say "these components taken together define a feature" - something the site does
  • "Features" module - Feature = Drupal parts that do something specific (Views, ImageCache presets, content types, fields, etc.)
  • Features = Drupal module that allows for the capture of configuration into code
  • (Sorry about the name; the Features module makes Feature modules which have things)
  • Feature modules have Core exportables: content types, permissions, input filters, menu items
  • Contrib support: contexts, views, ImageCache, Ctools (panels, feeds, etc.)
  • Features is a system to capture the various components that describe how your site behaves
  • Features should be used throughout the development process - you can take a live site and capture existing features, but it requires you to change your thinking about how users interact with the site
  • Concepting what's part of which feature, what's shared, etc. gives you stronger features

Making Features

  • Create a Feature: you can add components, cycle through the various elements, and click the ones you want into your module
  • Features come as a nice tarball - turn it on in your website, you get all the stuff that comes with it
  • But then people start changing the view - you can see the status in the Features module (has it been changed?)
  • If something has been changed, it'll show you what
  • "Recreate" button will give you another tarball, with the current state of things

Create, Update, Revert

  • Drush commands - features, features export, features update, features revert
  • Views changes are made only once, each change has a commit log, if you check it into SVN like you should
  • If you move your development to a real dev environment, and leave the staging site as a staging site (that you can show clients, etc without worrying it broke in the last five minutes) this is good

Distributing Features

  • Are your features appropriate for
  • Is the configuration an IP issue?
  • How can I get that nifty update status thing behind the firewall?
  • If you can't/don't want to send it to, but want to manage it internally over time: Features server
  • Create projects, make new releases, subscribe to updates, etc
  • For automatic packaging, try the Project module
  • Feature server is much simpler, lets you get off the ground fast
  • Based on implicit standards: update status xml, exportables, drush make


