Do It With Drupal: Drupal In The Cloud

Josh Koenig
drupal.org/user/3313
josh - at - chapterthree.com
getpantheon.com

About the cloud

"Cloud" as new model for hosting
Traditional hosting = real estate (rack space)
Most real estate customers are renters, few love their landlord - landlords sometimes cut corners and do the bare minimum to keep you happy... but you need this
Owning comes with lots of responsibilities and hidden costs
Large scale projects are expensive, slow, and prone to setbacks
"The Cloud" = hosting as an API: on-demand availability
Hourly pricing
Reliable, reusable start-states: people make mistakes, whereas programs do the same thing every time, so you know exactly what they're going to give you
You can say: I want a new server, here's the distro, here's the information, here's the configuration - and I want five of them
The cloud = less waste, more freedom, flexibility... but not a silver bullet
Performance can vary (don't use it for scientifically accurate benchmarks)
Abstractions aren't the same as the real thing (not the same as physical servers - but for what it's worth this hasn't been a problem for Drupal)
New tricks to learn - power of API
The Cloud is Drupal's destiny - increasing Drupal's reach; you can start with pennies, scale to millions
Create products cheaply
Grow organically, but still grow fast
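
The "hosting as an API" request described above (a distro image, a configuration, and "I want five of them") can be sketched as plain data. This is a minimal sketch, not any provider's client: the key names mirror the parameters of Amazon EC2's RunInstances call (ImageId, InstanceType, MinCount, MaxCount, UserData), while the helper name, the image ID, and the boot script are hypothetical placeholders.

```python
def build_launch_request(image_id, instance_type, count, boot_script=""):
    """Describe 'N identical servers from this image' as plain data."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,            # the distro/start-state to boot from
        "InstanceType": instance_type,  # size of each server
        "MinCount": count,              # ask for exactly `count` servers
        "MaxCount": count,
        "UserData": boot_script,        # configuration applied at first boot
    }


# Hypothetical values: "ami-12345678" is a placeholder, not a real image.
request = build_launch_request(
    image_id="ami-12345678",
    instance_type="m1.small",
    count=5,
    boot_script="#!/bin/sh\necho 'configure me'\n",
)
print(request)
```

With the boto3 library, a dictionary like this could be passed as `ec2_client.run_instances(**request)`; that step is left out so the sketch runs without an account.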

Launch a server in the cloud

ElasticFox - Amazon control panel for Firefox
Amazon just added locations for US west coast
Pantheon project: create images for cloud services that are targeted towards Drupal
Three images: a high performance production hosting image (all the tricks already done), another for Aegir, and another for a continuous integration environment for Drupal
Grand vision for world-class Drupal infrastructure for pennies an hour
The high performance production image has had the most work, since it's the one people have been most interested in
Ubuntu 9.04 base config, whole LAMP stack, Pressflow pre-installed, memcached, APC, all of it is already there
Can monitor processes, do everything you like to do as root
v0.8.1 beta - but people are using it in production (in spite of disclaimer)

Who are the cloud providers

AWS: most mature, a lot of features, still moving quickly, added a load balancer earlier in the year; they're a utility, not interested in your particular use case; they don't tell people what they're working on or how it works
AWS has infrastructure for giving away free images - most don't
Rackspace - has Rackspace Cloud Sites (you don't get root, you put your Drupal in there, they scale it for you with mixed results, since scaling any particular site requires deep knowledge of it); Rackspace Cloud Servers is better (it is built on technology from Slicehost, which Rackspace acquired)
Rackspace is looking to break into the space; willing to do deals, talk to you, etc
Voxel: smaller/smarter, also in Asia; cloud product just emerging from beta, but it's good - also lets you intermingle cloud and physical infrastructure
And more every day!
VPS is becoming quite cloudy (linode.com, slicehost, vps.net)
Custom/managed cloud services (security, regulatory compliance issues - people will build a cloud for you: Eucalyptus, Neospire, others)
Cloud value-adders: Rightscale, Scalr - cloud/cluster management services
Cloudkick - cross-cloud services for managing different cloud providers (you want to be able to move servers from one service to another); it's free; they started the open-source libcloud project to prevent people from getting locked into one provider
Cloud tools for Drupal - getpantheon.com

Questions

How do you do a cost-analysis? You probably won't see the financial benefits right away if you're going to leave everything on all the time. The savings come from scaling with changing use patterns by adding and removing instances.
The main cost on the other side is disk speed - most cloud providers have poorer I/O performance than a physical server
How do you solve that problem for Drupal? - All performance/scalability work is about making Drupal do less work
Oriented around Drupal doing only what it needs to, and not bogging it down with things like showing the user the same page he saw a minute ago
Database replication for read-only queries
Use other tools that are better at repeated-action type jobs for those things
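
The "database replication for read-only queries" point above can be sketched as a small router that sends writes to a primary server and spreads reads across replicas. This is a minimal sketch under stated assumptions, not how Drupal or Pressflow implements it: the class name, the host names, and the "starts with SELECT" test are all illustrative.

```python
import itertools


class QueryRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        # Treat only plain SELECT statements as safe for replicas.
        first_word = sql.lstrip().split(None, 1)[0].upper() if sql.strip() else ""
        if first_word == "SELECT":
            return next(self._read_targets)
        return self.primary


# Hypothetical host names, for illustration only.
router = QueryRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.target_for("SELECT title FROM node WHERE nid = 1"))
print(router.target_for("UPDATE node SET title = 'x' WHERE nid = 1"))
print(router.target_for("SELECT * FROM users"))
```

Replicas can lag behind the primary, so a read that must see a row written a moment ago would still need to go to the primary.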

What is it good for

Testing/continuous integration
testing.drupal.org (Drupal testing Drupal) - not in the cloud, but will soon release cloud image of it
People can spin these up if Drupal finds itself in a testing bottleneck, just for the day
Development infrastructure: new server for each site
Putting things like version control in the cloud (Unfuddle, Beanstalk)
Products and services: Lefora (forums), crowdfactory, olark (start with pennies, scale to millions)
Database layer for Drupal can be a choke point - you can duplicate it
High availability production hosting: Acquia is on EC2
Most cloud infrastructure isn't cheap at this level (running many servers, keeping them always on-line); if you're really big, you'll find yourself going to traditional managed hosting at the top end, because there are some levels of performance that are capped by the virtualization layer
Control costs for traffic patterns - geographically centralized audience for most people
Turning things on and off to deal with daily peaks - two more servers only on during the day
Instances fail, though not much more often than real servers (and remember that instances exist on real servers that do break)
Performance can be impacted by other local activity
Virtual disks tend to have relatively poor I/O performance
Accept the inevitability of failure, embrace the paradigm of "rapid recovery", develop architecture with modular, replaceable parts (images for each server), minimize disk/CPU utilization for menial tasks
"RAM is cheap" - the more you can push to things that read/write out of memory, the better
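
The "two more servers only on during the day" idea above comes down to simple arithmetic on hourly pricing. A minimal sketch, assuming a flat hourly rate: the 0.10 price, the server counts, and the 12-hour peak are hypothetical numbers chosen for illustration, not real provider pricing.

```python
def monthly_cost(hourly_rate, always_on, peak_extra, peak_hours_per_day, days=30):
    """Cost of base servers running 24h plus extra servers on during peaks."""
    base_hours = always_on * 24 * days
    peak_hours = peak_extra * peak_hours_per_day * days
    return hourly_rate * (base_hours + peak_hours)


RATE = 0.10  # hypothetical price per server-hour, for illustration only

# Four servers always on vs. two always on plus two for a 12-hour daytime peak.
fixed = monthly_cost(RATE, always_on=4, peak_extra=0, peak_hours_per_day=0)
elastic = monthly_cost(RATE, always_on=2, peak_extra=2, peak_hours_per_day=12)
print(f"fixed: ${fixed:.2f}  elastic: ${elastic:.2f}  saved: ${fixed - elastic:.2f}")
```

The saving only appears when the extra servers are really switched off outside the peak; leaving everything on all the time costs the same as the fixed setup.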

Production hosting in the cloud

Monitor your load - you have to look more carefully than just hits
Spin up more instances (scale horizontally) as you need more power
- How does this work?
- Could be manual process ("we need a server, let's do it") - does need some manual intervention somewhere, though in theory you could script it
- Amazon offers an auto-scaling feature (when we need more, add servers, up to X number of servers - Amazon AutoScale)
- AutoScale is simple, and the feature itself doesn't cost anything
- How does this work? How do the pieces work together?
- You need to have an image with all the pieces needed at the system level; you should use version control and have a boot script as part of the image (when the image starts, the script checks out the current code base and sets up the database and all the other necessary connections), then AutoScale makes the pieces aware of what's out there
- You can also do load balancing more manually
- Role of sysadmin is changing - you no longer have to worry about hard drives; the new set of concerns is scaling up/down and saving money
- When you're doing horizontal scaling, you trigger your image to be built, it checks out the code; Amazon also offers virtual drive service (if you're working with an application with a lot of data in file system) - can connect that data quickly
- Bake in as much as you can to the image, then have automatic processes that fire that get the latest information, check it into infrastructure, start distributing load there
Add layers (scale vertically) when bottlenecks emerge
Create images for each layer in your infrastructure
Use best practices to keep things speedy
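
The "spin up more instances as you need more power" rule above can be sketched as a threshold check with a floor and a cap. This is a minimal sketch of the general idea, not Amazon's AutoScale algorithm: the function name, the thresholds, and the limits are hypothetical.

```python
def desired_instances(current, avg_load, scale_up_at=0.75, scale_down_at=0.25,
                      minimum=1, maximum=10):
    """Return how many instances to run given average load per instance (0..1)."""
    if avg_load > scale_up_at:
        target = current + 1
    elif avg_load < scale_down_at:
        target = current - 1
    else:
        target = current
    # Never go below the floor or above the "up to X" cap.
    return max(minimum, min(maximum, target))


print(desired_instances(current=2, avg_load=0.90))   # busy: add one
print(desired_instances(current=2, avg_load=0.10))   # idle: remove one
print(desired_instances(current=10, avg_load=0.95))  # already at the cap
```

The gap between the two thresholds keeps the cluster from flapping between sizes when load hovers around a single value.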

About best practices

Front-side caching: use Pressflow with Varnish and/or Nginx (Drupal 7 will support some of this natively)
Drupal is slow: complex, wonderful, brainy tool - if you're looking at the same thing over and over again, go get a tool that does only that, and quickly
Use APC and/or Memcached to minimize queries to the database and to eliminate costly unserialize() calls
Drupal's native caches are good, but they live in the database (this isn't the highest performance option, since it means serializing/unserializing big arrays/objects)
Architect for vertical scaling by utilizing all service layers, even if it's one box
This is what "Mercury" is about
CREAM: Cache rules everything around me

Mercury

Freely available on Amazon, as a VMware image, in as many ways as we can
Also on-demand as a service
"Drupal hosting, 200 times faster"
Standardized high-performance stack: single server image with everything you want for cluster infrastructure
Features: Varnish, HTTP/PHP, APC Cache, Apache Solr, MySQL
Make Drupal run fast, hold up under large traffic spikes
From one box to cluster
If you're running all four layers and are still falling down, you're either doing something horribly right (Twitter) or horribly wrong (all code embedded in PHP content nodes)
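
The front layer of a stack like this holds up under traffic spikes by answering repeat requests for the same page without touching Drupal at all. The sketch below illustrates that idea only; it is not Varnish's behaviour or Mercury's configuration. The class, the 60-second lifetime, and the "skip the cache when the visitor has a session" rule are assumptions.

```python
import time


class PageCache:
    """Tiny front-side cache: serve stored pages to anonymous visitors."""

    def __init__(self, backend, ttl_seconds=60):
        self.backend = backend      # callable that renders a page (the slow path)
        self.ttl = ttl_seconds
        self.store = {}             # url -> (expiry_time, page)

    def get(self, url, has_session=False, now=None):
        now = time.time() if now is None else now
        if has_session:
            # Logged-in users get personalized pages, so skip the cache.
            return self.backend(url)
        entry = self.store.get(url)
        if entry and entry[0] > now:
            return entry[1]
        page = self.backend(url)
        self.store[url] = (now + self.ttl, page)
        return page


renders = []


def slow_render(url):
    """Stand-in for a full Drupal page build."""
    renders.append(url)
    return f"<html>{url}</html>"


cache = PageCache(slow_render, ttl_seconds=60)
for _ in range(500):
    cache.get("/front", now=0)
cache.get("/front", has_session=True, now=0)
print("backend renders:", len(renders))
```

Five hundred anonymous requests cost one render; only the request carrying a session goes through to the backend again.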

Questions

Mercury: going to implement configuration management system (BCFG2, probably)
Mercury/Pantheon - not Amazon-centric, can roll the stack out anywhere (physical hardware, whatever)
You'd probably make your own variant image, and sync as necessary using the configuration management system
If you haven't customized things heavily, you can take the latest version of Mercury, re-apply changes, and you're done (if you don't want to use the config management)
You can keep old images around for pennies a month