Thursday, August 9, 2012

Methods for exporting content from a Drupal website

I just wrote thoughts on a universal import/export format for website content, and there's obviously the risk of reinventing the wheel.  With the help of Google, let's take a look at some existing methods to export content from Drupal.

Node Export module: "This module allows users to export nodes and then import it into another Drupal installation, or on the same site."  Okay, that sounds useful and cool.  Further, the module is available for both Drupal 6 and Drupal 7, and can be run from the command line using Drush.  It exports in one of these formats:

  • JSON - JavaScript Object Notation code which is known for being security friendly. (Drupal 7 only)
  • Drupal var export - A Drupalized PHP array which is similar to var_export(). (Drupal 7 only)
  • Node code - A customized PHP array which is similar to var_export(). (Drupal 6 only)
  • CSV - RFC4180 compliant CSV code. Ideal for viewing in Windows software, and editing data as spreadsheets.
  • Serialize - Very robust, though not human readable, representation through Serialization using the PHP serialize function.
  • XML - XML 1.0 representation which is good for machine-readability and human-readability.

Data export import: This is a sandbox module that hopes to be a generic dataset import/export utility.  Its project page is written at too high and generic a level to tell whether the module is useful or not.

Content migration, import and export: This is a groups.d.o group focusing on import/export.  

Comparison of Content and User Import and Export Modules: A useful chart listing the various import and export modules for Drupal, and their characteristics.

HTML Export: "HTML Export allows you to take your Drupal site and select paths from it based on criteria to export to HTML. It supports OG, results from Views, per content type, all menu router items, and all nodes as default criteria for publishing to html."  Sounds really close to what I wrote about in the previous post on this blog (see link at the top).  The Drupal 6 version is in Beta, and the Drupal 7 version is in DEV.

Creating a static archive of a Drupal site: A list of methods for generating a pile of HTML pages out of a Drupal website.  For example:

wget -q --mirror -p --html-extension --base=./ -k -P ./ http://example.com

This could be useful, but the export format doesn't give you the CCK fields as individual data items, doesn't give you the revisions, doesn't do a lot of other things.  It occurs to me the Boost module does something like this as well.

Drush CTools Export Bonus: Hmm... "Adds more functionality to the CTools drush bulk export commands. drush ctex modulename will export all known ctools exportables of a site to a module. But there are also other configurations you'd like to have in code. This project fills in that gap with extra configuration and drush commands."  Basically, ctools has a method of exporting exportables, and this extends that functionality.  It exports stuff into a "module".  Sigh.  So much for reusability of the data in other venues besides Drupal.

How to make configuration objects exportable with CTools:  Goes over some grotty programming API details of how to ..blahblah..


Thoughts on a universal export format for websites built with Drupal or other CMS's

This blog exists to record my thoughts on migrating a Drupal website to some other blog or website platform.  This problem has been sitting in the back of my mind for a while, and here's where I've gotten to.  The first step is to create a bold brilliant universal website export format.  Okay, maybe that's too grandiose.  But what I mean is to create a neutral format to represent website content, and then to write software to create data in that format from a Drupal website.

At first blush the typical thing to do is generate a single file containing all the data in the Drupal website.  For example that is what Blogger gives you when you export a Blogger blog, an XML file containing everything there is about that blog.  I've looked inside that file and found that it even includes the template being used by the blog.

But on reflection a single file is too monolithic and unnecessary.  My thought instead is to create a directory structure containing data files for each piece of content.  The directory structure needs to match the URLs of the website.

For a Drupal website, this will mean a directory named "node" with files containing the content of /node/1, /node/2, etc.  For sites using pathauto and/or the URL alias modules, there would be a set of directories matching the URL structure of the aliases, with files containing essentially a "redirect" over to the related /node/NNN.

What would be stored in the file named /node/12345?  Here's where I'm a little unsure.  It could be a directory, containing a series of data files, or it could be a text file.  In Drupal it has to represent the metadata of the node, the node content, as well as all revisions, and all CCK fields.

Something to explore is how to represent attached files - that is, a file uploaded directly on the node.  The file itself would be under /sites/default/files but needs to be related to the node.

This leaves taxonomy and user data.  These could each be exported as a single data file rather than using the per-URL layout I just described.
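
To make the layout concrete, here is a minimal sketch in Python of what writing such an export could look like.  Everything in it is an assumption for illustration: the file names (meta.json, fields.json, files.json), the shape of the node dictionary, and the choice of JSON are mine, not an existing format.  It takes the "directory per node" option, and writes each alias as a small file pointing back at /node/NNN.

```python
import json
import tempfile
from pathlib import Path


def export_node(root, node):
    """Write one node as a directory: <root>/node/<nid>/ (hypothetical layout)."""
    node_dir = Path(root) / "node" / str(node["nid"])
    (node_dir / "revisions").mkdir(parents=True, exist_ok=True)

    # Node metadata, kept apart from the content itself.
    meta = {k: node.get(k) for k in ("nid", "type", "title", "author", "created")}
    (node_dir / "meta.json").write_text(json.dumps(meta, indent=2))

    # CCK fields as individual data items.
    (node_dir / "fields.json").write_text(json.dumps(node.get("fields", {}), indent=2))

    # Attached files are recorded by path (e.g. under /sites/default/files), not copied.
    (node_dir / "files.json").write_text(json.dumps(node.get("files", []), indent=2))

    # One file per revision, named by revision id.
    for rev in node.get("revisions", []):
        (node_dir / "revisions" / f"{rev['vid']}.html").write_text(rev["body"])
    return node_dir


def export_alias(root, alias, nid):
    """Write <root>/<alias>.json holding a 'redirect' to the related /node/<nid>."""
    alias_file = Path(root) / (alias.strip("/") + ".json")
    alias_file.parent.mkdir(parents=True, exist_ok=True)
    alias_file.write_text(json.dumps({"redirect": f"/node/{nid}"}))
    return alias_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        export_node(tmp, {
            "nid": 12345, "type": "story", "title": "Hello", "author": "admin",
            "created": 1344470400,
            "fields": {"field_subtitle": "A test"},
            "files": ["/sites/default/files/photo.jpg"],
            "revisions": [{"vid": 1, "body": "<p>First draft</p>"}],
        })
        export_alias(tmp, "/blog/hello", 12345)
        for path in sorted(Path(tmp).rglob("*")):
            if path.is_file():
                print(path.relative_to(tmp).as_posix())
```

Putting the alias in a file with a .json suffix sidesteps the case where one alias (blog/hello) is also the parent directory of another (blog/hello/world).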

Creating an export format is obviously just the first step.  The next step would be software that reads this exported data and imports it into some other system, such as an importer for another CMS, or a tool that directly builds a static HTML website.  If the export format is well enough designed, and there are importers for a variety of CMSs, it could truly become a universal export/import format for exchanging website content between CMSs.

Thursday, August 2, 2012

Migrating from Drupal to TYPO3

David Kordsmeier, Head of Sales at AOE media San Francisco, speaks at the 2010 TYPO3 conference about why in many cases migrating from Drupal 6 to TYPO3 is the smarter move than migrating from Drupal 6 to Drupal 7.






http://vimeo.com/16538241

Jekyll and converting from Drupal/WP/etc to Jekyll


Jekyll is a static website generator written in Ruby.  http://jekyllrb.com/  What this means is you write page content in some format, then use Jekyll to generate an HTML website with navigational stuff etc.  The HTML website can then be served by Apache very quickly, since no code runs per request.

Jekyll includes a bunch of tools for importing from a variety of other blogging platforms, including WordPress, Drupal and Blogger.  https://github.com/mojombo/jekyll/wiki/blog-migrations
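
To give a flavor of what such an import produces: a Jekyll post is a file under _posts named YEAR-MONTH-DAY-title with a block of YAML "front matter" at the top.  Here is a rough Python sketch of turning one Drupal node into such a file.  This is not how Jekyll's own importers are written (those are Ruby); the function names, the "post" layout name, and the node fields passed in are assumptions for illustration.

```python
import re
from datetime import datetime, timezone


def slugify(title):
    """Lower-case the title and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def jekyll_post(title, created, body):
    """Return (filename, file contents) for one node.

    `created` is a Unix timestamp, as Drupal stores it.  The body is assumed
    to already be HTML, so the file gets an .html extension.
    """
    date = datetime.fromtimestamp(created, tz=timezone.utc)
    filename = f"_posts/{date:%Y-%m-%d}-{slugify(title)}.html"
    safe_title = title.replace('"', '\\"')  # keep the YAML string well-formed
    front_matter = (
        "---\n"
        "layout: post\n"  # assumes the site defines a layout named "post"
        f'title: "{safe_title}"\n'
        f"date: {date:%Y-%m-%d %H:%M:%S} +0000\n"
        "---\n"
    )
    return filename, front_matter + body + "\n"


if __name__ == "__main__":
    name, text = jekyll_post("Hello, World!", 1344470400, "<p>Hi</p>")
    print(name)
    print(text)
```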

A nice blog post went over why to leave Drupal, and why to choose Jekyll, and how to do so.  http://www.garron.me/linux/switching-drupal-jekyll-migrate.html

The guy's reason?  His blog had grown pretty popular and was seeing thousands of page hits a day, and routinely getting top rank in Digg.  It meant having to migrate the site from shared hosting to a VPS, and then learn the arcana of nginx, varnish, etc, all just to host a blog.  Specifically: "As you can see in order to keep a server ready to receive the next tsunami of visits, I had to do too much work keeping Apache, PHP, MySQL, Nginx, Varnish and Drupal up to date and secured, also being sure that all of them could work in sync."

That's too much overhead (voice of experience) for a blog.

The Onion Uses Django, And Why It Matters To Us

Back when I started with Drupal (in the 4.6 or 4.7 days) one of the WOW moments was learning that The Onion (one of America's finest news sites) was being run with Drupal.  I thought, if the Onion was happy with Drupal, then surely it can do things for me.

Turns out that The Onion's tech team has since switched to Django.  The reason was partly that they believe Python to be a better language than PHP, but they were also able to implement a more cleanly designed website.


http://www.reddit.com/r/django/comments/bhvhz/the_onion_uses_django_and_why_it_matters_to_us/

7 reasons to switch from Drupal to Yii

Written by "the CTO of a site that switched from Drupal to Yii in 2010."  It's clear from the article that his focus is on developing "applications" rather than hosting content.


  1. Drupal isn't the best way to avoid starting from scratch
  2. If Drupal is a framework, only Rube Goldberg could love it
  3. Community-contributed modules are prone to featuritis and the bugs that result from unneeded complexity
  4. Drupal has PHP 4 baggage
  5. Don't want Drupal 6 or 7 to slow down a Drupal 5 site? Deal with an outdated jQuery
  6. Drupal's field API/content construction kit (CCK) will drive you crazy, and it's part of Drupal 7 core
  7. Drupal is a LOT slower than Yii.






What's Yii?  It's an object-oriented application development framework.  This blog post does a decent job of describing its attractiveness .. http://programmersnotes.info/2009/02/24/yii_framework_of_my_choice/ ..  One key thing said is that PHP only acquired good object-oriented programming features with PHP5, meaning that any framework designed prior to that date simply could not have a proper object-oriented design.  Cough, cough, that's one of the principal objections I have to Drupal, that the programming model is so completely strange.

He found that only Zend Framework (http://framework.zend.com/) and Symfony (http://www.symfony-project.org/) had a good quality object-oriented design.  But neither had support for event-oriented programming.  He then found the PRADO framework (http://www.pradosoft.com/) which then led him to Yii (http://yiiframework.com/).

Advantages to Yii

  • 100% OO architecture. It is really good application design.
  • Authentication & roles mechanism
  • Caching techniques
  • DB access, which is based on PDO
  • Active record and relational active record implementation
  • Validation – that is really, really nice. To create quite complex register form (check if login is unique, if email is unique, email match with confirmation, passwords match, validate integer/string values, check empty fields and give nice error messages for each field you need only template (view) and model with rules defined. It took me 10-15 mins to do that!)
  • Component concept. Just to give an idea, why is it nice – you can define getter and setter methods for properties, you can define read-only properties for components, define and invoke events, attach event handlers and additional features to the class without modifying it, just by attaching additional behaviour to it





Migrating from Drupal to Plone with transmogrifier

Migrating Drupal 6 to WordPress 3

The "Modeling Languages portal" transitioned from Drupal 6 to WordPress 3 in early 2011.  They had assumed there would be tools for this purpose, but found nothing beyond a few SQL scripts:

http://blog.room34.com/archives/4530

http://socialcmsbuzz.com/convert-import-a-drupal-6-based-website-to-wordpress-v27-20052009/

http://info4admins.com/migrate-convert-import-the-drupal-6-database-to-wordpress-3/

The process used by these scripts is to create the database tables for a WordPress instance in the same database that holds the tables for the Drupal instance.  The SQL scripts then do an in-place conversion of data from the Drupal tables to the WordPress tables.
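
The heart of that in-place approach is an INSERT ... SELECT from the Drupal tables into the WordPress ones.  Here is a toy sketch of the idea, run from Python against an in-memory SQLite database rather than MySQL so that it is self-contained.  The table definitions are cut down to a handful of columns and the sample rows are made up; this is not any of the scripts linked above, just the general shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both sets of tables live in one database. Schemas are heavily simplified.
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, type TEXT,
                   title TEXT, status INTEGER, created INTEGER);
CREATE TABLE node_revisions (vid INTEGER PRIMARY KEY, nid INTEGER,
                             body TEXT, teaser TEXT);
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT,
                       post_content TEXT, post_excerpt TEXT,
                       post_date TEXT, post_status TEXT);

-- Made-up sample data standing in for a real Drupal site.
INSERT INTO node VALUES (1, 10, 'story', 'Hello', 1, 1344470400);
INSERT INTO node VALUES (2, 20, 'story', 'Draft', 0, 1344556800);
INSERT INTO node_revisions VALUES (10, 1, '<p>Body</p>', 'Teaser');
INSERT INTO node_revisions VALUES (20, 2, '<p>WIP</p>', '');
""")

# Empty the target table, then copy each node with its current revision.
conn.execute("DELETE FROM wp_posts")
conn.execute("""
INSERT INTO wp_posts (ID, post_title, post_content, post_excerpt,
                      post_date, post_status)
SELECT n.nid, n.title, r.body, r.teaser,
       datetime(n.created, 'unixepoch'),          -- Unix time to a date string
       CASE n.status WHEN 1 THEN 'publish' ELSE 'draft' END
FROM node n
JOIN node_revisions r ON r.vid = n.vid
""")
conn.commit()

rows = conn.execute(
    "SELECT ID, post_title, post_date, post_status FROM wp_posts ORDER BY ID"
).fetchall()
for row in rows:
    print(row)
```

Keeping the Drupal nid as the WordPress post ID is a convenience in this sketch: it lets later steps (comments, term relationships) reuse the same IDs without a lookup table.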

Based on those the "Modeling Languages Portal" team created a Java program to do a better conversion, still using the same process.

http://modeling-languages.com/wp-content/uploads/DrupalToWordpress.java

Finally, they've set up a service company to do such migrations: http://migratetowp.com/

Turns out that a key thing left out by the scripts above is conversion of the user tables & passwords over to WordPress users.  The MigrateToWP service company does conversion of user tables & passwords.

http://modeling-languages.com/migrating-drupal-6-to-wordpress-3/


A satisfied customer blogged about the experience here: http://howardowens.com/2011/08/23/migrating-from-drupal-to-wordpress/

The instruction steps for running the Java program should give a flavor for how this conversion process works:


  1. Create your wordpress installation in the same database where you have your Drupal installation (not mandatory but will facilitate things)
  2. Truncate the data in the following WordPress tables: comments, posts, postmeta, term_relationships, term_taxonomy, terms. Delete the users table (if you want to migrate Drupal users) except for the first one (the site administrator)
  3. Clone the taxonomies of categories you had in Drupal.  Basically, data in  term_data is moved to the WordPress table terms and term_hierarchy to term_taxonomy (each row in these two tables represent a parent-child relationship)
  4. Clone the posts.  This involves retrieving data from three drupal tables: node (basic post info), node_revisions (body and teaser) and url_alias (url info) and inserting it into the single posts wordpress table.
  5. Fix the image/files URLs in the posts. Images in wordpress are usually stored in the uploads folder while Drupal stores them in the sites/default/files folder. So, the body content must be updated using a sentence like this one:  body=body.replaceAll("/sites/default/files/contentImages/", "/wp-content/uploads/");
  6. Fix the post URL itself. Drupal stores the full URL while WordPress only the last part (the one corresponding to a url-friendly version of the post title).
  7. Relate posts and categories. Take the relationships in the Drupal tables node and term_node and store them in the term_relationships wordpress table.
  8. Update the category count attribute now that we have inserted the data in term_relationships
  9. Migrate the comments. Quite straightforward, from the comments Drupal table to the comments wordpress table.
  10. Update the comment_count attribute in the posts table (if not, comments will not be displayed even if they have been properly inserted in the database).
  11. (not part of the script) Don’t forget to define the redirection rules that will forward all requests to the previous URLs of your drupal posts to the new URLs of your  wordpress ones!! You want people to be able to find your posts!
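
Steps 5, 6 and 11 are mostly string handling, so here is a small Python sketch of them (not the Java program itself).  The function names are mine, and the redirect targets assume a WordPress permalink structure of /post-name/, which depends on how the new site is configured.

```python
def fix_file_urls(body):
    """Step 5: point file URLs in the post body at the WordPress uploads folder."""
    return body.replace("/sites/default/files/contentImages/", "/wp-content/uploads/")


def post_slug(alias):
    """Step 6: keep only the last part of the full Drupal path."""
    return alias.strip("/").rsplit("/", 1)[-1]


def redirect_map(aliases):
    """Step 11: map old Drupal paths to new WordPress paths.

    `aliases` maps node id -> Drupal URL alias.  Both /node/<nid> and the
    alias itself get redirected to the assumed /<slug>/ permalink.
    """
    redirects = {}
    for nid, alias in aliases.items():
        new_path = f"/{post_slug(alias)}/"
        redirects[f"/node/{nid}"] = new_path
        redirects["/" + alias.strip("/")] = new_path
    return redirects


if __name__ == "__main__":
    print(fix_file_urls('<img src="/sites/default/files/contentImages/a.png">'))
    print(post_slug("content/2011/my-post"))
    for old, new in redirect_map({12: "content/my-post"}).items():
        print(old, "->", new)
```

The resulting map would still have to be turned into whatever redirect rules the web server or a WordPress redirect plugin expects.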