Building a virtual research environment (VRE) in Drupal (in under 5 hours)

I recently sat down with a fresh Drupal install, and by the time the winter sun was setting, I'd made a lightweight custom virtual research environment (VRE) for a collaborative project. It took about eight hours, but if you cut out the e-mail answering, eating, false starts, and rethinking how I wanted to structure things, I bet I could re-do it in under five hours-- and so can you.

It was a familiar story-- two scholars, separated by geography and a time zone, gathering data from a variety of sources to illuminate a common set of real-world things from different angles. They had been passing Word documents back and forth, and had recently switched to Google Docs, but that wasn't doing what they needed, either. They had a data problem.

I've been using Drupal for website projects for years. It has a way of making my programmers sob when they look at the database. Sometimes it refuses to cooperate. I've never done a major version update (I abandoned my 4.x project before 5.x was released, and my 5.x project was winding down when 6.x came out), but I hear it's agony. That said, it's amazing what you can spin up given a single day of work, without writing a line of PHP, and keep running for quite some time with little attention or effort.

The VRE I built is very emphatically custom, with content types, taxonomies, and data views specific to a particular project. As such, I don't think it would be a good candidate for an installation profile, but my goal here is to write out the general steps for what I did, so anyone can repeat them with some tweaks for a different project.

Drupal

Drupal 7 has been released, but not all the modules I needed were available. I also found 7.0 to be a little buggy, and the new admin UI threw me for a loop when I first tried it. Wrangling a major release of Drupal not even a month old yet would've taken far too much of my day, so I opted for Drupal 6; the latest stable release was 6.20.

Modules

I installed the following modules:

  • Administration menu: I find it makes it easier to jump between configuration sections
  • Content Construction Kit (CCK): a must for defining custom content types; I enabled Content, Node Reference, Option Widgets, and Text
    • FileField: the data included audio files, and I wanted to have an audio upload field
    • FileField Paths: to make the /sites/default/files directory a little less messy
    • ImageField: the data included images
  • ImageCache: for resizing images; also enable ImageCache UI
  • ImageAPI: a prerequisite for ImageCache; also enable ImageAPI ImageMagick
  • Glossary: provides tool-tip full text for abbreviations (or other text that could use further explanation)
  • Core: the core modules I enabled were Database logging (in case any debugging was needed), Help (didn't see a reason to turn it off), Menu (to create custom menus), Path (to rename URLs), and Update status (for my administrative convenience, to be able to tell when to update modules). I turned off Comment-- this was for a private VRE between two people, and adding comments wouldn't have been the easiest way for them to communicate with each other
  • Token: another essential module, necessary for a variety of other modules
  • Pathauto: to allow for automatic renaming of URLs
  • Automatic Nodetitles: to save the users from having to name every bit of content
  • Revision All: to automatically save each update as a revision, so the users can go back and revert if something gets messed up
  • Views: I went with version 6.x-3.x-dev; I thought the ability to expose a "sort by" option to the user would come in handy (I ended up not needing it), but 6.x-3.0-alpha3 has some annoying bugs that have been sorted out in the dev version. Be sure to enable Views UI, too.
  • Reverse Node Reference: This module lets you pull data from nodes that refer to the node you're focused on. It's a lot easier to understand with an example-- look at the "Plant list" View, below.
  • Auto Menu: automatically adds nodes of a given content type to a menu.

(A note for Drupal newbies: to install a module, download it from Drupal.org, unzip it, and put the whole unzipped folder into sites/all/modules. If there isn't a sites/all/modules folder, create one. Then go to [your site URL]/admin/build/modules/list, check the boxes for the modules you want to install, and hit "Save configuration" at the bottom.)
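
If you'd rather script the unzip-and-move step, here's a minimal sketch in Python. It assumes you've already downloaded the module's .tar.gz from Drupal.org; the function name and its arguments are made up for illustration and aren't part of Drupal. You still have to enable the module on the admin page afterwards.

```python
import tarfile
from pathlib import Path


def unpack_module(archive, drupal_root):
    """Unpack a downloaded module archive into sites/all/modules.

    `archive` is a .tar.gz already downloaded from Drupal.org;
    `drupal_root` is the directory that holds the Drupal install.
    Both names are illustrative, not part of Drupal.
    """
    modules_dir = Path(drupal_root) / "sites" / "all" / "modules"
    # Create sites/all/modules if it isn't there yet.
    modules_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Only unpack archives you trust: extractall() writes whatever
        # paths the archive contains.
        tar.extractall(modules_dir)
    return modules_dir
```

Calling it with the path to a downloaded archive and the path to your Drupal directory (both of which depend on your setup) leaves the module's folder sitting in sites/all/modules, ready to be checked off on the modules page.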

Modules I tried and removed

  • CTools + Panels: I just disabled these, rather than removing them; I suspect they're going to be essential when it's time to organize and present some of the data for public consumption, but I ended up not needing them just yet. Others might use them to create a "dashboard" or the like.
  • Image: I started off with Image, but then realized there were two kinds of images that the users would be uploading, with different fields necessary for each of them. Since Image generates a single content type ("Image"), it was easier to uninstall Image, install ImageField, and create two different content types, each with an image upload option.
  • CKEditor: I've used CKEditor for WYSIWYG on other sites, but when I copied and pasted data from the Word files I had been given, the results were ghastly-- even when using the "copy from Word" feature (which I expect would be used inconsistently at best by the users). The sample data I had been given suggested that there wasn't any use of text formatting (bold, italics, etc.) that couldn't be taken care of using CSS later on, so it seemed like the best way to eliminate crap from Word was to just provide a plaintext box.

Content types

The data collected in this VRE centers around plants-- words for plants, recipes involving plants, how plants are collected, etc. The public presentation of the material will also be primarily organized by plant. As such, almost all the data needs to be linked to one or more plant entities.

Plant

There's a number of ways you could accomplish this kind of data linkage in Drupal; the two that most immediately come to mind are Taxonomy and Node Reference. I used Taxonomy for various other kinds of data (see below), but I chose Node Reference to associate other content types with a plant. The feature that made the difference in my decision was that, when you configure a Node Reference, you can select a View whose output will determine what gets listed for people to choose from. If I went with Taxonomy, the users would only be able to select from the Taxonomy terms (which would probably be the scientific names). That's less usable for the non-botanist, who's more familiar with the common names of the plants. By creating a "Plant" content type, which contains a field for the common name, I could set up a View that lists both the scientific and common name of the plant, so both botanists and non-botanists will find something familiar in the list.
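
To make the idea concrete, here's a rough Python sketch of the kind of label that View produces for the pick-list. It's only an illustration of the logic, not anything Drupal runs; the function name and the sample records (including the common name) are invented for the example.

```python
def reference_label(scientific_name, common_name=None):
    """Build the text shown for one plant in the Node Reference pick-list."""
    # If nobody has filled in a common name yet, fall back to the
    # scientific name on its own.
    if common_name:
        return f"{common_name} ({scientific_name})"
    return scientific_name


# Invented sample data: one plant with a common name, one without.
plants = [
    {"title": "Campanula rotundifolia", "common": "Harebell"},
    {"title": "Achillea millefolium", "common": None},
]
labels = sorted(reference_label(p["title"], p["common"]) for p in plants)
```

With a Taxonomy-only setup, the list would contain just the scientific names; here each entry leads with whichever name a non-botanist is likely to recognize.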

Fields

I created a new content type called "Plant". Under Submission form settings, I changed "Title" to "Scientific name" (which just changes what text is displayed to the user, nothing more), and removed the text from the "Body" field. In the future I might turn it into something like a notes field (by putting "Notes" in "Body field label"), but for now all I needed was a scientific name and a common name.

Saving that, I went into Manage fields, and created a new field labeled "Preferred common name" (field_preferred_common_name), set it to "Text", and chose "Text field". In the subsequent configuration screen, I was tempted to make it required, but it occurred to me that perhaps the botanist might create a plant, not knowing what the preferred common name was, and the non-botanists might add it in later. If I made it required, that would be problematic, so I stuck with the default settings.

The botanist also wanted to be able to add (botanical) taxonomic information about the plant, so I went to Content Management > Taxonomy (using the Admin Menu at the top), then Add Vocabulary, and created a tagging vocabulary associated with the Plant content type. I figured it'd be easier for her to freely add terms, then arrange the terms hierarchically later on, either by going to individual terms, going to the Advanced setting, and specifying the parent, or using one of the Drupal modules that let you do it more easily in a drag-and-drop way. The botanist was pleased that she could click on any of the terms after she'd saved a plant, and it'd show all plants tagged that way.

To look at the full field list for a few of the content types I created, see 'Diagram', below.

Associated content

All said and done, I made six other content types, each with a different set of fields. The users gave me eight kinds of content, but three of them were identical in terms of what data the users would be filling out, so I consolidated them into a single content type, with a Text + Check Boxes/Radio Buttons field that listed three options they could choose from to indicate which specific kind of content a given piece of data was. A few notes on the kinds of fields I created:

  • Every content type had a Node Reference field that references the Plant content type. At the bottom of the configuration page for the field (where it sends you as soon as you create the new field), I went into Advanced - Nodes that can be referenced (View), and selected a View I created that would show the preferred common name of plants, followed by the scientific name. See below for how I set up that View.
  • When configuring image fields, I went to Title text settings and checked the box for "Enable custom title text" and chose "textfield" for the entry. My plan is to use the Image Caption module (which pulls from the title) for image captions on the public version of the site.
  • I created a general tagging taxonomy and associated it with a number of content types where there might be some metadata the users want to capture, without planning out a list of taxonomy terms in advance (e.g. something relevant for ceremonies, or for pregnant women, etc.)

Pathauto and Automatic Nodetitles

After you're done setting up your content types, but before you put in any non-test content, you should set up your Pathauto (Site Building > URL Aliases > Patterns). By default, everything gets saved at "content/[node title]", but you can change that for each content type, taxonomy, and user path. The path carries more weight than just a vanity URL: Blocks (which can sometimes be derived from a View) can be configured to appear only on pages with a certain path. Batch changing the URLs of pages is a little bit of a hassle-- you need to install Views Bulk Operations.
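
Here's a simplified Python sketch of what a path pattern does, and why the resulting path matters for block visibility. The "[title]" placeholder and both function names are stand-ins I made up; the real replacement patterns are listed on the Pathauto settings page, and Pathauto's own cleanup rules are more involved than this.

```python
import re
from fnmatch import fnmatchcase


def make_alias(pattern, title):
    """Expand a Pathauto-style pattern into a URL alias (simplified)."""
    # Lowercase the title and collapse every run of characters that
    # aren't letters or digits into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return pattern.replace("[title]", slug)


def block_visible(path, visibility_pattern):
    """Check a page path against a wildcard pattern such as 'plant/*'."""
    return fnmatchcase(path, visibility_pattern)


alias = make_alias("plant/[title]", "Campanula rotundifolia")
shows_block = block_visible(alias, "plant/*")
```

Because every plant ends up under the same prefix, a single wildcard is enough to say "show this block on plant pages and nowhere else".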

Sometimes it makes more sense to automatically generate titles for nodes, instead of making your users think up a title for a node containing content that doesn't have an obvious "title". In some cases, you can change the display name of the required Drupal node title (under Submission form settings when you're editing a content type) to something meaningful for your user. For the Plant content type, each plant has a unique scientific name, so I re-named "Title" to "Scientific Name". If it makes more sense for the node title (which is important, because it'll appear in the admin list of content, etc.) to be some combination of fields, it's a job for Automatic Nodetitles. When you're editing a content type, at the very top of that form there's a toggle-down for Automatic title generation if you have the module enabled. (It's easy to overlook.) I've always gone with "Automatically generate the title and hide the title field", but you can also make automatic title generation the fallback plan if the user doesn't put in a title. Enter the form you want the title to take, using some combination of the replacement patterns (available in a toggle-down below the text field) and whatever other character data you want to use.
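
The two modes described above can be sketched in a few lines of Python. This is just an illustration of the behavior, under the assumption that placeholders look like "[field-name]"; the function, the placeholder names, and the sample values are hypothetical, and the module's actual replacement patterns are the ones listed in its toggle-down.

```python
def resolve_title(entered, pattern, fields, hide_title_field=True):
    """Pick a node title the way the two modes described above behave.

    With the title field hidden, the pattern always wins; otherwise the
    pattern is only a fallback for a blank title.
    """
    if not hide_title_field and entered and entered.strip():
        return entered.strip()
    title = pattern
    for name, value in fields.items():
        # Replace each [field-name] placeholder with that field's value.
        title = title.replace(f"[{name}]", str(value))
    return title


# Hypothetical pattern and field values.
pattern = "[kind] for [plant]"
fields = {"kind": "Recipe", "plant": "Campanula rotundifolia"}
generated = resolve_title(None, pattern, fields)
```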

Diagram

Here's a diagram of four of the content types, their relationship to one another, and how the titles are being generated when Automatic Nodetitles is being used. Note the shared fields (which appear in the same color in different content types), and the fact that the non-plant content types are referencing Plant by means of a View that lists the common name, rather than the default setup, which would show the scientific name (i.e. the node title). Click for a larger version, on Flickr.
Drupal VRE CCK diagram

References

All the information in the VRE needs to be traceable back to its source. To accomplish that, there's a reference field in all the supplementary content types. But writing out the full citation every single time is asking for inconsistency and trouble-- if, halfway through, they decide they want to switch from Chicago Manual of Style citations to MLA, it means either a lot of tedious hand-correction or wrangling some SQL queries behind-the-scenes.

The solution I went for involves the Glossary module.

Go to Content management > Taxonomy > Add vocabulary and create a new vocabulary. Don't associate it with any content type, or enable Multi-select or Tags or anything-- just a bare-bones vocabulary. Then, go to Site Configuration > Input formats. Edit your default input format, and under Filters, check the box for "Glossary filter". (You can associate a different input format with your glossary, or multiple input formats, but your user will have to remember to change the input format for the reference field to one that has the glossary enabled, if you choose something other than the default format. They can do that by clicking the "Input formats" toggle-down underneath the text box.)

Next, go to Site Configuration > Glossary Settings. You'll see the name of the input format(s) you've enabled for Glossary in tabs across the top. Select the input format, then under Input format settings, check the box for the vocabulary you're using for the Glossary. Configure the rest of the settings as you see fit.

To populate the Glossary, I asked the users to come up with-- and use-- standard shortened citation forms (abbreviations based on the title, or author-date, or whatever else, so long as it's consistent). They add these citation forms to the taxonomy vocabulary associated with the Glossary, with the shortened form as the "term" and the full citation as the "description". When text in the reference box matches a taxonomy entry in the Glossary vocabulary, the user can hover their mouse over it and the full form will appear.
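
In spirit, the filter does something like the following Python sketch: find each known short form in the text and wrap it in markup that carries the full citation as hover text. This is my own approximation, not the Glossary module's code, and the markup it really emits may well differ; the function name and the placeholder citation are invented.

```python
import html
import re


def add_tooltips(text, glossary):
    """Wrap each known short citation in an <abbr> holding the full one."""
    # Try longer terms first so a longer short form wins over a shorter
    # one that it happens to contain.
    terms = sorted(glossary, key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    parts = []
    # With one capturing group, split() alternates plain text and matches.
    for i, piece in enumerate(pattern.split(text)):
        if i % 2:  # odd positions are the matched terms
            full = html.escape(glossary[piece], quote=True)
            parts.append(f'<abbr title="{full}">{html.escape(piece)}</abbr>')
        else:
            parts.append(html.escape(piece))
    return "".join(parts)


# The "description" here is a placeholder, not a real citation.
glossary = {"USDA": "Full citation text"}
marked_up = add_tooltips("See USDA, p. 4", glossary)
```

The payoff is the one described above: the full citation lives in exactly one place, so switching citation styles means editing the vocabulary, not every node.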

To make it easier for the users to add new references, I added a link directly to the taxonomy vocabulary associated with the Glossary in the main menu.

Images

There are places on the site (e.g. the list of plants) where displaying a thumbnail version of a plant image makes more sense than using the large, full-size image. First, make sure that the image modules listed above are enabled. Then, go to Site configuration > Image API, hit Configure, and enter the path to the ImageMagick convert binary. (The server I was using already had ImageMagick installed.) Then, go to Site building > ImageCache > Add new preset. Give the preset a name; then, choose "Scale and crop". For the thumbnail, I set the maximum width and height to 100 px. You can create as many presets as you'd like. If you decide later that you want to re-configure a preset, go to Site building > ImageCache > List and click on "Flush" for the preset you've changed. It'll automatically re-generate the images according to the new settings.
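
For the curious, the arithmetic behind a "Scale and crop" preset looks roughly like this. It's a generic Python sketch of the technique (scale until the image covers the target box, then trim the overflow from the middle), not ImageCache's own code, so its rounding may differ in edge cases; the function name is made up.

```python
def scale_and_crop(width, height, target_w, target_h):
    """Return (scaled_size, crop_box) for a scale-and-crop resize."""
    # Scale by the larger ratio so the image fully covers the target box.
    scale = max(target_w / width, target_h / height)
    # Guard against rounding leaving a side one pixel short of the target.
    scaled_w = max(target_w, round(width * scale))
    scaled_h = max(target_h, round(height * scale))
    # Trim the overflow evenly from both sides to centre the crop.
    left = (scaled_w - target_w) // 2
    top = (scaled_h - target_h) // 2
    return (scaled_w, scaled_h), (left, top, left + target_w, top + target_h)


# A 400x200 photo going into the 100 px thumbnail preset.
scaled_size, crop_box = scale_and_crop(400, 200, 100, 100)
```

For that landscape photo, the image is first shrunk to 200x100 and then the middle 100x100 square is kept, so the thumbnail is always exactly the preset's size regardless of the original proportions.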

You can use the presets as part of the node display, by going to Content management > Content types > Edit [content type with the image field in question] > Display fields. In the drop-down for the image field, select the preset you'd like to use for the Teaser and/or the Full node. You can also use the image presets when you add an image field to a View (see below).

Views & display

To make the site easier to use, I turned off the users' access to all the admin pages, including the content list (Content Management > Content) which I personally tend to use for all my content management. Consequently, the users only really access the content using a series of Views.

Plant list

Fields
I created a view, with the View type "node", and for Fields selected Node: Title (which, for the plants, is the scientific name) and Content: Preferred common name. (An aside: the Node/Content distinction can be a little confusing at first-- anything built into Drupal (title, body, content type, date published, etc.) is under Node; any custom fields are under Content, even though one might be inclined to think of the title and body as "content". Even after years of using Drupal, I still look in the wrong place sometimes.) Generally, I get rid of the Label (towards the bottom of the field configuration form) because when using most of the Styles, this results in something ugly like "Field Name: Content". I rendered this one as a Table, though, and for Tables you want there to be titles, because those become the table headers. I did change the Widget Label to "Custom" for "Preferred common name", and shortened it to "Common name". For "Common name", I unchecked "link field to node", since the scientific name was already linked to the node. I added one more field, Node: edit link so the users could jump right to editing the plants from that view.

Basic settings
I clicked on Use Pager, and chose "Display all items"-- there weren't going to be that many plants, and I didn't want the users to have to flip through multiple pages.

Style
I then went to Style Settings, clicked on Unformatted (the default style) and changed it to Table. I made both Scientific Name and Common Name sortable. (If you want to get back to the configuration screen after you've saved it, click the cog to the right of the Style; clicking on the name of the Style, which should now be Table, will let you change the Style, but not configure the current Style.)

Filter
The preview output at this point is terrible-- titles of things that aren't plants, and a bunch of empty "Common name:" lines. Under Filters, I chose Node: Type, then "is one of" and selected "Plant", resulting in the output being limited to plants.

Page display
On the far left, I selected Add display for a Page display type. I already had everything configured the way I wanted it, so I didn't have to do any page-specific overrides. (See "User alerts", below, for that.) On the bottom left, there's a Page Settings box; be sure to click on Path to select the URL where you want the page to be, otherwise Views will complain. I also went to Menu in that same box, specified "Primary links", and gave it a title for the menu.

Bonus: pulling data from a different content type
I decided it'd be neat if I could include a photo for each plant, when available. The additional complexity involved isn't trivial, but it's a good illustration of (reverse) references in Views.

Relationships
I added Content reverse reference: Plant, and under Delta selected 1-- that way, if there's multiple images for a plant, it'll only display the first one.

Fields
I added the field referring to the image, and for Relationship (at the top of the configuration screen for the field), I selected the relationship I'd created. Under Format (towards the bottom of the configuration screen), I chose "thumbnail image linked to node". I also checked the box for "Hide if empty".

Advanced settings
At this point, the images should be there, but there will be a lot of duplicate plant entries (one, I think, for every piece of content associated with the plant). I went to Advanced settings > Query settings and checked the box for "distinct". This eliminated all the duplicates.
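
Why the duplicates show up is easier to see outside of Views. Here's a small Python sketch, with invented sample data, of a plant list joined to everything that references each plant, followed by a collapse back to one row per plant. It's a conceptual illustration only: it isn't the query Views generates, and "prefer the first row with an image" is my own simplification of what Delta and "distinct" accomplish together.

```python
# Invented sample data: one plant, three nodes that reference it.
plants = [{"nid": 1, "title": "Campanula rotundifolia"}]
referencing = [
    {"nid": 10, "type": "image", "plant_nid": 1, "image": "photo-1.jpg"},
    {"nid": 11, "type": "recipe", "plant_nid": 1, "image": None},
    {"nid": 12, "type": "image", "plant_nid": 1, "image": "photo-2.jpg"},
]


def join_reverse(plants, referencing):
    """One row per (plant, referencing node) pair, like an SQL LEFT JOIN."""
    rows = []
    for plant in plants:
        matches = [r for r in referencing if r["plant_nid"] == plant["nid"]]
        # A plant with nothing pointing at it still gets one (empty) row.
        for ref in matches or [None]:
            rows.append({
                "plant": plant["title"],
                "plant_nid": plant["nid"],
                "image": ref["image"] if ref else None,
            })
    return rows


def one_row_per_plant(rows):
    """Collapse duplicates, preferring the first row that has an image."""
    best = {}
    for row in rows:
        kept = best.get(row["plant_nid"])
        if kept is None or (kept["image"] is None and row["image"]):
            best[row["plant_nid"]] = row
    return list(best.values())


joined = join_reverse(plants, referencing)   # three rows for one plant
plant_list = one_row_per_plant(joined)       # back down to one
```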

Data by plant

When the user is viewing the page for a plant, I wanted to be able to show them all the data that's associated with the plant. Here are the View settings I used to create a block that would appear on each plant page with that data.

Basic settings
  • Title: Materials for %1 [pulls from Relationship]

Fields
  • Node: Nid [in Style Settings, check Exclude from display]
  • Node: Type [in Style Settings, check Exclude from display]
  • Node: Title
  • Node: Teaser

Style settings
  • Style: Unformatted [for Style options, select Node: Type for Grouping field; this will get all the content of the same type to show up grouped together]

Relationships
  • Content: Plant [check Require this relationship]

Arguments
  • Node: Nid [select Plant for Relationship]

Filters
  • Content: Plant [select Plant for Relationship]

Display
Add a Block display, and on the block admin page (Site building > Blocks), put the block in one of the regions, and configure it (under Page specific visibility settings) so it only appears on nodes with whatever path you configured for the Plant content type in Pathauto.
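
Stripped of the Views vocabulary, the block boils down to "take the plant being viewed, find everything that references it, and group the results by content type". A rough Python equivalent, with made-up node data and a made-up function name, might look like this:

```python
from collections import defaultdict


def materials_for(plant_nid, nodes):
    """Group every node that references `plant_nid` by its content type."""
    grouped = defaultdict(list)
    for node in nodes:
        # The argument (the plant's nid) limits the output to nodes
        # whose Plant reference points at the page being viewed.
        if plant_nid in node["plant_nids"]:
            grouped[node["type"]].append(node["title"])
    return dict(grouped)


# Hypothetical nodes; plant 1 is the page being viewed.
nodes = [
    {"title": "Recipe A", "type": "Recipe", "plant_nids": [1]},
    {"title": "Word list B", "type": "Linguistic", "plant_nids": [1, 2]},
    {"title": "Recipe C", "type": "Recipe", "plant_nids": [2]},
    {"title": "Recipe D", "type": "Recipe", "plant_nids": [1]},
]
materials = materials_for(1, nodes)
```

The grouping field is what turns a flat list of teasers into sections per content type on the plant page.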

Supporting collaborative work

When one user has a question for the other user, they currently call each other up or send an e-mail, which has gotten to be pretty inefficient. I did two things on the site to make that process easier:

  1. Alert field: I created a taxonomy and populated it with the names of the users. (In retrospect, I'd do this with a Text + Check Boxes/Radio Buttons field-- there's nothing to gain from the Taxonomy menu here, there's a small number of collaborators, and the names always have to show up with the other Taxonomy fields, even if I'd prefer them somewhere else.) I also created a text field where the user can put in their question. I explained that if a question arises, all the user has to do is select the name of the person who should look at the problem, when editing the node, and put their question in the text box. When the person has looked at the problem, they just need to de-select their name. There's a View set up for each user (it includes fields for the node title, updated date, the alert text field, and it has a filter for the name of the person being alerted) where they can see all the questions that have been assigned to them.
  2. On-line status: I went to Site Building > Blocks and put the "Who's on-line" block (which is built into Drupal core) in the footer. That way, if multiple users are on-line at the same time, they will know that "You can't edit this node because it's been modified by another user" messages are probably legit, and not a weird technical error.
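
The alert workflow from the first item can be sketched in Python as a filter over nodes. Everything here is illustrative: the user names, dates, and questions are invented, and sorting newest-first is my assumption about a sensible default, not a setting described above.

```python
def alerts_for(user, nodes):
    """Nodes where `user` is flagged, most recently updated first."""
    flagged = [n for n in nodes if user in n["alert"]]
    return sorted(flagged, key=lambda n: n["updated"], reverse=True)


# Invented sample nodes; "alert" holds the names currently selected.
nodes = [
    {"title": "Recipe A", "updated": "2011-01-20",
     "alert": {"botanist"}, "question": "Is this the right species?"},
    {"title": "Word list B", "updated": "2011-01-22",
     "alert": set(), "question": ""},
]

inbox = alerts_for("botanist", nodes)       # one question waiting
nodes[0]["alert"].discard("botanist")       # answered: name de-selected
inbox_after = alerts_for("botanist", nodes) # nothing left to look at
```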

Theme

I'm really not a designer, and I hate doing Drupal theming. The Drupal-default Garland theme isn't ghastly (and it's easy to re-color for that "sorta custom but not really at all" feeling). I often enough go with some color of the default, when the client really, truly doesn't care about how it looks, but for this project I wanted something that didn't scream "Drupal". I was looking for a theme that was simple, clean, and had a reasonable selection of regions for blocks (header, footer, left sidebar, right sidebar, content, and content bottom).

This turned out to be a harder task than anticipated.

There are over 500 themes on drupal.org for Drupal 6.x. Even when sorting by popularity-- on the hope that the "wisdom of the crowds" would provide a filter against utter dreck-- I found a lot of base themes that would require extensive customization to look decent, themes that were designed for marketing sites, themes without sidebars, and themes that were buggy and/or just plain ugly.

I ultimately ended up going with Corolla, and putting a modified Navigation menu block in the Header Menu region. Corolla has the benefit of offering drop-downs for child menu items for whatever menu block you put in that region, which has been very handy for the users.

Menus

I ended up using the default Navigation menu as my main menu, but disabled all the default menu items except "Add content" (renamed simply to "Add"). Then I went back to all the other major pages-- the View for alerts for each contributor, a View showing all the linguistic data (not described here because it's really boring-- all the fields from the Linguistic content type, arranged in a table)-- and added them to the Navigation menu. I also added a link to the Taxonomy admin interface (just using "Add item" in /admin/build/menu-customize/navigation) so they could easily arrange their biological taxonomy and language terms, and a link directly to the taxonomy that the Glossary module uses, so they can add new references.

There's also a menu item, Plants, that goes to the list of plants; it has child menu items for each individual plant. Instead of the users having to remember to add new plants to the menu, I configured Auto Menu to do it. When editing the content type in question (in this case, Plant), under Workflow settings, you can select the parent item that Auto Menu should add each node to.

Blocks

I didn't do much with blocks (Site building > Blocks), other than assigning the "data by plant" view block to the content area for each plant page (see above). In the footer area, I put the default Drupal "who's on-line" block and the "user login" block.

Screenshots

Click to see a larger version on Flickr.

The front page of the site, which uses the plant list view.
Drupal VRE

The plant profile for Campanula rotundifolia, using the Data by Plant view. Note how the nodes are grouped by type. Under Reference, USDA is underlined because hovering over it will pop up a tooltip from the Glossary.
Material by plant
