Josh Koenig
drupal.org/user/3313
josh - at - chapterthree.com
getpantheon.com
About the cloud

  • "Cloud" as new model for hosting
  • Traditional hosting = real estate (rack space)
  • Most real estate customers are renters, and few love their landlord: landlords sometimes cut corners and do the bare minimum to keep you happy... but you still need them
  • Owning comes with lots of responsibilities and hidden costs
  • Large scale projects are expensive, slow, and prone to setbacks
  • "The Cloud" = hosting as an API: on-demand availability
  • Hourly pricing
  • Reliable, reusable start-states: people make mistakes, but programs do the same thing every time - you know exactly what they're going to give you
  • You can say: I want a new server, here's the distro, here's the information, here's the configuration - and I want five of them (see the sketch after this list)
  • The cloud = less waste, more freedom, flexibility... but not a silver bullet
  • Performance can vary (don't use it for scientifically accurate benchmarks)
  • Abstractions aren't the same as the real thing (a virtual machine is not a physical server - but for what it's worth, this hasn't been a problem for Drupal)
  • New tricks to learn - power of API
  • The Cloud is Drupal's destiny - increasing Drupal's reach; you can start with pennies, scale to millions
  • Create products cheaply
  • Grow organically, but still grow fast
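
  As a concrete illustration of "hosting as an API", here is a minimal sketch using boto3, Amazon's current Python SDK (which postdates this talk); the AMI ID and key pair name are placeholders:

      import boto3

      # Ask EC2 for five identical servers from one known-good image.
      ec2 = boto3.client("ec2", region_name="us-west-1")
      response = ec2.run_instances(
          ImageId="ami-00000000",   # placeholder: the image/distro you want
          InstanceType="m1.small",
          KeyName="my-keypair",     # placeholder SSH key pair
          MinCount=5,               # "...and I want five of them"
          MaxCount=5,
      )
      for instance in response["Instances"]:
          print(instance["InstanceId"])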

Launch a server in the cloud

  • ElasticFox - a Firefox extension that acts as an Amazon EC2 control panel
  • Amazon just added US west coast locations
  • Pantheon project: create images for cloud services that are targeted towards Drupal
  • Three images: a high-performance production hosting image (all the tricks already done), one for Aegir, and one for a continuous-integration environment for Drupal
  • Grand vision for world-class Drupal infrastructure for pennies an hour
  • The high-performance production image has seen the most work, since it's where people have been most interested
  • Ubuntu 9.04 base config, whole LAMP stack, Pressflow pre-installed, memcached, APC, all of it is already there
  • Can monitor processes, do everything you like to do as root
  • v0.8.1 beta - but people are using it in production (in spite of disclaimer)

Who are the cloud providers

  • AWS: most mature, a lot of features, still moving quickly, added a load balancer earlier in the year; they're a utility, not interested in your particular use case; they don't tell people what they're working on or how it works
  • AWS has infrastructure for giving away free images - most don't
  • Rackspace has Rackspace Cloud Sites (you don't get root; you put your Drupal in there and they scale it for you, with mixed results - scaling any particular site requires deep knowledge of it); Rackspace Cloud Servers is better (it's built on technology from Slicehost, which Rackspace acquired)
  • Rackspace is looking to break into the space; willing to do deals, talk to you, etc
  • Voxel: smaller/smarter, also in Asia; cloud product just emerging from beta, but it's good - also lets you intermingle cloud and physical infrastructure
  • And more every day!
  • VPS hosting is becoming quite cloudy (linode.com, Slicehost, vps.net)
  • Custom/managed cloud services: for security or regulatory-compliance issues, people will build a cloud for you (Eucalyptus, NeoSpire, others)
  • Cloud value-adders: RightScale, Scalr - cloud/cluster management services
  • Cloudkick - cross-cloud services for managing different cloud providers (the goal: being able to move servers from one service to another); it's free, and their open-source libcloud project helps prevent people from getting locked into one provider (see the sketch after this list)
  • Cloud tools for Drupal - getpantheon.com
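
  To make the libcloud idea concrete, here is a minimal sketch of provider-agnostic Python; the credentials are placeholders, and constructor arguments vary somewhat between libcloud versions and providers:

      from libcloud.compute.types import Provider
      from libcloud.compute.providers import get_driver

      # Pick a provider driver; swapping Provider.EC2 for another
      # supported provider leaves the rest of the code unchanged.
      cls = get_driver(Provider.EC2)
      driver = cls("ACCESS_KEY_ID", "SECRET_KEY", region="us-west-1")

      # The same call works against any provider libcloud supports.
      for node in driver.list_nodes():
          print(node.name, node.state, node.public_ips)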

Questions

  • How do you do a cost-analysis? You probably won't see the financial benefits right away if you're going to leave everything on all the time; the savings come from scaling with changing use patterns, adding and removing instances as demand shifts
  • The cost/benefit trade-off shows up in disk I/O performance - most cloud providers have poorer I/O than a physical server
  • How do you solve that problem for Drupal? - All performance/scalability work is about making Drupal do less work
  • Orient Drupal around doing only what it needs to, and don't bog it down with things like showing the user the same page they saw a minute ago
  • Database replication for read-only queries (see the sketch after this list)
  • Use other tools that are better at repeated-action type jobs for those things
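
  To illustrate the read-replica idea: Pressflow can route Drupal's read-only queries to a replica at the PHP layer; the sketch below shows the same pattern in Python with pymysql, with hypothetical hostnames and credentials:

      import pymysql

      # One connection to the primary (writes) and one to a read replica.
      primary = pymysql.connect(host="db-primary.example.com", user="drupal",
                                password="secret", database="drupal", autocommit=True)
      replica = pymysql.connect(host="db-replica.example.com", user="drupal",
                                password="secret", database="drupal", autocommit=True)

      def run_query(sql, params=()):
          """Send SELECTs to the replica; everything else goes to the primary."""
          conn = replica if sql.lstrip().upper().startswith("SELECT") else primary
          with conn.cursor() as cursor:
              cursor.execute(sql, params)
              return cursor.fetchall()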

What is it good for

  • Testing/continuous integration
  • testing.drupal.org (Drupal testing Drupal) - not in the cloud, but will soon release cloud image of it
  • People can spin these up if Drupal finds itself in a testing bottleneck, just for the day
  • Development infrastructure: new server for each site
  • Hosting things like version control (Unfuddle, Beanstalk)
  • Products and services: Lefora (forums), crowdfactory, olark (start with pennies, scale to millions)
  • Database layer for Drupal can be a choke point - you can duplicate it
  • High availability production hosting: Acquia is on EC2
  • Most cloud infrastructure isn't cheap at this level (running many servers, keeping them always online); if you're really big, you'll find yourself at the top end going back to traditional managed hosting, because some levels of performance are capped by the virtualization layer
  • Control costs based on traffic patterns - most people have a geographically centralized audience
  • Turning things on and off to deal with daily peaks - e.g. two more servers that are only on during the day (see the sketch after this list)
  • Instances fail, though not much more often than real servers (and remember that instances exist on real servers that do break)
  • Performance can be impacted by other local activity
  • Virtual disks tend to have relatively poor I/O performance
  • Accept the inevitability of failure, embrace the paradigm of "rapid recovery", develop architecture with modular, replaceable parts (images for each server), minimize disk/CPU utilization for menial tasks
  • "RAM is cheap" - the more you can push to things that read/write out of memory, the better

Production hosting in the cloud

  • Monitor your load - you have to look more carefully than just hits
  • Spin up more instances (scale horizontally) as you need more power
    • How does this work?
    • Could be a manual process ("we need a server, let's do it") - it does need some manual intervention somewhere, though in theory you could script it
    • Amazon offers an auto-scaling feature: when you need more, it adds servers, up to a cap of X instances (Amazon AutoScale)
    • AutoScale is simple (and doesn't cost anything extra, either)
    • How does this work? How do the pieces work together?
    • You need an image with all the pieces in place at the system level; use version control, and have a boot script as part of the image (when the image starts, the script checks out the current code base from the repository and establishes all the necessary connections); then AutoScale makes the pieces aware of what's out there (see the sketch after this list)
    • You can also do load balancing more manually
    • The role of the sysadmin is changing - a new set of concerns where you no longer worry about hard drives, but about scaling up/down and saving money
    • When you're doing horizontal scaling, you trigger your image to be built and it checks out the code; Amazon also offers a virtual drive service, Elastic Block Store (useful if your application keeps a lot of data in the file system) - you can connect that data quickly
    • Bake as much as you can into the image, then have automatic processes that fire on boot to fetch the latest code, register the instance with the infrastructure, and start distributing load to it
  • Add layers (scale vertically) when bottlenecks emerge
  • Create images for each layer in your infrastructure
  • Use best practices to keep things speedy
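
  Putting the pieces above together (image, boot script, AutoScale), here is a sketch using boto3's classic launch-configuration API; the AMI ID, repository URL, and resource names are placeholders, and the scaling policy would normally be tied to a CloudWatch alarm (not shown):

      import boto3

      autoscaling = boto3.client("autoscaling", region_name="us-west-1")

      # Boot script baked into the launch configuration: every fresh
      # instance checks out the current code base before taking traffic.
      user_data = """#!/bin/sh
      # Fetch the current code base on first boot (placeholder repo URL).
      git clone https://example.com/site.git /var/www/html
      apachectl restart
      """

      autoscaling.create_launch_configuration(
          LaunchConfigurationName="web-lc-v1",
          ImageId="ami-00000000",        # placeholder: image with the stack baked in
          InstanceType="m1.small",
          UserData=user_data,            # boto3 base64-encodes this automatically
      )

      autoscaling.create_auto_scaling_group(
          AutoScalingGroupName="web-tier",
          LaunchConfigurationName="web-lc-v1",
          MinSize=2,                     # always-on baseline
          MaxSize=10,                    # the "up to X" cap
          AvailabilityZones=["us-west-1a"],
      )

      # Simple policy: add one instance each time the (unshown) alarm fires.
      autoscaling.put_scaling_policy(
          AutoScalingGroupName="web-tier",
          PolicyName="scale-out",
          AdjustmentType="ChangeInCapacity",
          ScalingAdjustment=1,
      )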

About best practices

  • Front-side caching: use Pressflow with Varnish and/or nginx (Drupal 7 will support some of this natively)
  • Drupal is slow: it's a complex, wonderful, brainy tool - if you're serving the same thing over and over again, go get a tool that does only that, and does it quickly
  • Use APC and/or memcached to minimize database queries and eliminate costly unserialize() calls (see the sketch after this list)
  • Drupal's native caches are good, but they live in the database - not the highest-performance option, since big arrays/objects get serialized and unserialized on every hit
  • Architect for vertical scaling by utilizing all service layers, even if it's one box
  • This is what "Mercury" is about
  • CREAM: Cache rules everything around me
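
  A generic cache-aside sketch of the memcached pattern, in Python with pymemcache; expensive_query() is a hypothetical stand-in for a costly Drupal database operation:

      import json
      from pymemcache.client.base import Client

      cache = Client(("127.0.0.1", 11211))   # assumes a local memcached

      def expensive_query(key):
          # Hypothetical stand-in for a slow database query.
          return {"key": key, "body": "..."}

      def get_cached(key):
          cached = cache.get(key)
          if cached is not None:
              return json.loads(cached)      # cheap in-memory read
          value = expensive_query(key)       # slow path: hit the database
          cache.set(key, json.dumps(value), expire=300)
          return value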

Mercury

  • Freely available on Amazon, as VMWare image, in as many ways as we can
  • Also on-demand as a service
  • "Drupal hosting, 200 times faster"
  • Standardized high-performance stack: single server image with everything you want for cluster infrastructure
  • Features: Varnish, Apache HTTP/PHP, APC cache, Apache Solr, MySQL
  • Make Drupal run fast, hold up under large traffic spikes
  • From one box to cluster
  • If you're running all four layers and are still falling down, you're either doing something massively write-heavy (like Twitter) or something horribly wrong (like embedding all your code in PHP content nodes)

Questions

  • Mercury is going to implement a configuration management system (probably Bcfg2)
  • Mercury/Pantheon - not Amazon-centric, can roll the stack out anywhere (physical hardware, whatever)
  • You'd probably make your own variant image, and sync as necessary using the configuration management system
  • If you haven't customized things heavily, you can take the latest version of Mercury, re-apply your changes, and you're done (if you don't want to use config management)
  • You can keep old images around for pennies a month