<p>Rob <a href="mailto:robpurdie@economist.com">robpurdie@economist.com</a> - Scrum Practice Leader<br /> twitter.com/robpurdie<br /> facebook.com/robpurdie</p> <p><strong>Overview</strong></p> <ul> <li>Moving incrementally and iteratively to Drupal- making improvements as you move bit by bit</li> <li>User comments and recommendations served from Drupal, along with comment history pages, article comments pages</li> <li>Syncing data to Drupal every 5 minutes-- all content and comments</li> <li>Soon, article pages served from Drupal-- running into a few performance problems</li> <li>Next: channel pages served from Drupal, third-party services, registration</li> <li>We benefit from Drupal sooner by taking this approach; rather than building the whole site in the background and not benefitting until the end, this way we benefit from improved functionality sooner</li> <li>"The Economist is so old that the guy who started it had to be painted rather than photographed"</li> </ul> <p><strong>The old way</strong></p> <ul> <li>20-30 mil page views, 3-4 million unique visitors per month - lots of performance and scalability issues</li> <li>Want to build the foremost destination online for analyzing and debating global agenda; want to bring visitors into that debate; current system isn't enough to support this vision, that's why they moved to Drupal particularly for comments</li> <li>Increase publishing volume with user-generated content (more content w/o more costs)</li> <li>The old way: custom CMS built on proprietary stack (MS, ColdFusion, Oracle)</li> <li>Blogs were originally MovableType, now are all Drupal</li> <li>Broken waterfall processes meant frequent fire-fighting</li> <li>Needed to be more responsive to change, deliver business value sooner (projects take a long time to deliver value to organization), more sustainable, happier</li> <li>Making these changes incrementally and iteratively; "perfect is the enemy of better"</li> </ul> <p><strong>Why Drupal?</strong></p> <ul> <li>Looked at OpenCMS, Alfresco, Joombla, met with other newspapers, considered building a custom system, buying a proprietary system, or going open source</li> <li>Drupal as strategic fit: community and content publishing, robust development framework, development language, free software</li> <li>Strength of Drupal community</li> <li>Selling Drupal internally was a challenge: no suit-wearing Drupal sales force</li> <li>Attended DrupalCon Boston 2008, networking within community, engaging w/ Lullabot for workshops and training</li> <li>Proof-of-concept to reproduce article page in Drupal; how to use CCK fields to make a rich article content type</li> </ul> <p><strong>Using Scrum</strong></p> <ul> <li>3 million registered users, articles - data migration is daunting</li> <li>Manage the move using Scrum - selling it was easy with charts (developing business value sooner and throughout, management can see progress throughout, shining a spotlight on issues/dysfunction and attacking them along the way - risk decreases a lot faster)</li> <li>Take requirements, prioritize based on business value: which are the most important to organization, do those first</li> <li>Trained management team in Scrum, development team in Drupal, then started sprinting with help from consultants (2-week sprints, delivering something of value at the end)</li> <li>"Maybe not the largest Drupal project, but the most expensive" - lots of consultants</li> </ul> <p><strong>Integrating CMS's</strong></p> <ul> <li>Proxy approach: Drupal sends JSON over HTTP back and forth with Existing ColdFusion system</li> <li>Using native Drupal comments; comments have to be attached for nodes - there has to be a node for every piece of content on the legacy system</li> <li>Create nodes on the fly for every ColdFusion request that comes in</li> <li>Notion of proxy nodes is a pattern that comes up during integration of Drupal with other systems</li> <li>Voting API votes used for recommends; these are also attached to proxy nodes</li> <li>Started with proxy approach only; then moved to doing some with subdomain approach - hope to be doing neither soon after moving entirely to Drupal</li> </ul> <p><strong>Migrating data</strong></p> <ul> <li>Migrating and syncing data every 5 minutes - don't wait until the end to figure out that piece</li> <li>Table Wizard and Migrate modules</li> <li>Table Wizard writes Views integration for MySQL tables</li> <li>Migrate lets you migrate certain views, push into Drupal as nodes/users/taxonomy terms/etc</li> <li>Client is involved in how legacy data gets organized in Drupal</li> <li>Sat down with client to browse through content and decide what data needs to be moved and what it means</li> <li>Migrate keeps track of everything you've done, gives you a dashboard, tells you how far along you are - keeps a mapping table, legacy ID, you can check and see what came across and fix things; does your bookkeeping for you</li> <li>Drupal expects to have all the info it needs in its database; something getting published in Oracle needs to be in Drupal promptly - synchronization</li> </ul> <p><strong>Questions</strong></p> <ul> <li>How did you decide what to put into Drupal first? <ul> <li>Business value: comments, user profiles, recommends</li> </ul> </li> <li>How many Drupal servers does it take to scale that big? <ul> <li>Not entirely sure how many servers we have; let's say +/- 12</li> <li>Master MySQL server, a few slave MySQL servers - more important aspects have to do with Pressflow</li> <li>Pressflow = high performance variant of Drupal 6, completely API compatible with Drupal, but it takes some patches that are in Drupal 7 and moves them in to Drupal 6</li> <li>Use Varnish's full capability; Varnish = reverse proxy server, takes load off Drupal/PHP/MySQL</li> </ul> </li> <li>How do you stop people from trying to shove their emergencies into Scrum process? <ul> <li>Don't want people going directly to the team like they traditionally do</li> <li>Team, Scrum Master, product owner - customer, person who represents the client, has to have power to make decisions on behalf of organization, responsible for managing stakeholders</li> <li>Product owner comes to team w/ prioritized list of features for next sprint</li> <li>Had two teams in New York and one team in London all doing 3-week iterations in parallel</li> <li>Split up site into component parts: profiles, article pages, channel pages, had three product owners who had to manage stakeholders</li> <li>Works reasonably well; now we're doing two teams, one system that shows what all teams will do; someone has to keep "product backlog" in order, stopping people from shoving in their "one little thing"</li> </ul> </li> </ul> <p><strong>Features</strong></p> <ul> <li>Base theme is 960 px grid - laying out themes as a series of columns, all sections have to fit into the grid</li> <li>Selenium for "user journey" testing; building environments to help manage configurations</li> <li>Continuous integration using Hudson - needed a shared place where user tests could run</li> <li>Set of servers running on Amazon; Hudson sets off user tests every time there's a commit to the SVN repository</li> <li>Apache SOLR search hosted by Acquia- 100,000k articles that have to be available through site search</li> <li>People were unhappy with relevance of matches in old site search</li> <li>Acquia's hosted search service: really fast, good results</li> <li>Apache SOLR: can start filtering results further and further - faceting</li> <li>"How do I get SOLR running on my website?" - can self-host, but we went with Acquia</li> </ul> <p><strong>Questions</strong></p> <ul> <li>Other tools for managing people/process? <ul> <li>In Scrum, less about resource management - we just want dedicated co-located teams, don't worry about availability because of multiple projects: single focus</li> <li>Redundancy of function - generalizing specialists, specialists can create bottlenecks/risks</li> <li>"How many people need to be hit by a bus before your project fails?"</li> <li>agilemanifesto.org</li> <li>Use Google Docs a lot - project backlogs are all spreadsheets, a big wiki, project dashboards that "radiate information to the rest of the organization"</li> <li>Focus is on people, not tools</li> <li>Test-driven development, writing tests first can sometime be hard with Drupal</li> </ul> </li> </ul> <p><strong>Impediments to progress</strong></p> <ul> <li>Previous processes/structure/culture: command and control - hard habit to break</li> <li>Project manager telling people what to do and when to do it by - this is bad management; it has an impact on people</li> <li>We want self-organizing teams</li> <li>Previously, black box development: low visibility during the project process</li> <li>For Scrum, everything needs to be transparent, frequently inspect outcomes, adapt as we go - can't have a postmortem after everything's done, need to do that every day</li> <li>Hero developers who go off and solve problems heroically aren't compatible with Scrum</li> <li>Previously, developmental silos - departments based on function, these have been removed, but people still want to exist within their old silos</li> <li>People want to work on multiple projects like they used to, rather than working on a single project in a dedicated manner</li> <li>Previously, traditional line management: where you stack up in the line doesn't matter now, this was a big change</li> <li>Engineering practices (specifically quality) - big issue; Scrum is a wrapper for your existing engineering practices, doesn't say anything about testing</li> <li>Scrum assumes your engineering practices are great, or you'll make them great quickly</li> <li>You can say "we're going to do Scrum" but old habits die hard - focusing on what "done" means and providing a deliverable at the end of each sprint, have to deliver quality too-- have to go live successfully</li> <li>Want to deliver "potentially shippable code" at the end of each session - have to have a testing environment that's representative of live environment; been bitten by differences in configuration</li> <li>Everything has to be identical in the test environment (just with a scaled down number of servers) - same data center, same network issues, etc</li> <li>Hard to bite the bullet on the costs involved in building a testing environment, but it's important</li> <li>Hard to simulate kinds of traffic you get in production - plus, have to keep track of session cookies</li> <li>Form fields can hurt you - replaying post requests</li> <li>Cron jobs that run all the time - cron jobs can stack up and site starts to decay</li> </ul> <p><strong>Questions</strong></p> <ul> <li>Migration of real-time data: code changes are easier to migrate than content changes, what's the process for moving bits of content from development to production? <ul> <li>When there's content you need to work on for a while before it goes live, work on the live servers but make sure end-users can't see it</li> <li>Can use the unpublished flag on a Drupal node to do that; use "views" to see everything unpublished in sports category</li> <li>For a small team, that's a reasonable solution</li> <li>For bigger organizations with a lot of people working together, use "Workflow" module - nodes step through a series of states</li> <li>If it's a business requirement that content has to start off on staging servers and only then push to live, use module "Deploy" - push-button way to push nodes and their dependencies-- users, taxonomy terms, etc-- to another environment</li> </ul> </li> <li>Technical reason for using external searching - why use SOLR at all? What about Drupal search? <ul> <li>Drupal 6 is better than previous search mechanisms, but falls apart at a certain scale</li> <li>Slow queries, sub-optimal results</li> <li>A lot of non-Drupal people have worked on Apache SOLR, Drupal has integrated it well</li> <li>Self-hosting, or with Acquia - if you have the talent to run Java apps in your data center and keep it running, self-hosting is a great idea; will reduce latency</li> <li>Most of us are struggling to keep PHP/MySQL up as it is, this is where Acquia comes in</li> <li>Acquia service is pretty much plug-and-play</li> <li>Built-in search doesn't come with facets; can add on facets with the "Faceted Search" module</li> <li>SOLR is an enterprise search system; used by Netflix, Expedia, etc.</li> </ul> </li> <li>Could you use Views instead of facets? <ul> <li>There's a lot of overlap there, and different possible approaches.</li> <li>Full-text searches need SOLR rather than Views</li> </ul> </li> <li>Some of the wins you've had with Scrum/Drupal, and some weaknesses <ul> <li>Wins by development teams - prefer this way of working, where business people are only concerned with relative priority of requirements, have no say in how long it takes to implement</li> <li>Product owners prioritize "stories", developers size those stories relative to each other, rather than in hours of effort</li> <li>Stops the cycle of cutting corners on quality in order to get it done in a shorter timeframe</li> <li>Can't get productivity gains w/o changing the way you work</li> <li>Product owners need to be involved, can't change requirements mid-sprint</li> <li>Have "working agreements" - a kind of social contract</li> <li>Scrum isn't a prescription - you can pick and choose the parts that you want that meet your organization's needs</li> <li>Specific processes layered on top of simple framework of transparency, working together, and adapting to testing results, can vary</li> </ul> </li> <li>When will the Economist be fully on Drupal? <ul> <li>Description says "this month" - that was the plan</li> <li>People paying the bills get to make decisions; is it most important for us to go all-Drupal ASAP, or extend functionality of site to be competitive?</li> <li>Recent decision was for the latter</li> <li>Don't know when</li> </ul> </li> </ul>