Thursday, February 21, 2013

Generic blog conversion primarily in/out of Blogger

Description of Blogger import/export XML format:

Matrix of possible conversions:

Migrating content from Drupal 6 or 7 to Blogger

I have a task in front of me to migrate a blog site from Drupal to Blogger, and fortunately I just found that someone had written a script to do exactly this.  The task is a subset of the larger "Migrate From Drupal" task in that it is a conversion from a Drupal blog specifically to a Blogger blog.  That's a little different from the model I've envisioned so far, which is an export of content from a Drupal site into a neutral format that could conceivably be imported into any other website platform.  The latter, exporting into a generic website content format, would be preferable, but in the meantime I do need to convert this specific Drupal site into a Blogger site.

Blog post:


The interesting thing about this script is that it doesn't even attempt to work with the Drupal API.  It just goes into the database and pulls out data.  So obviously it's going to miss any CCK fields you've defined.  On the other hand the script is simple enough for someone to add to the queries to pull in data from CCK fields.  (note: the script is Drupal 6, so CCK is the correct term)
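
As a rough illustration of the direct-to-database approach (this is not the script itself, which isn't reproduced here), below is a minimal Python sketch.  It assumes the stock Drupal 6 tables `node` and `node_revisions`, and it uses an in-memory SQLite database with made-up rows as a stand-in for the site's real database.  The names `fetch_blog_posts` and `make_demo_db` are hypothetical.

```python
import sqlite3

# Join each published node to its current revision.
# Table and column names assume a stock Drupal 6 schema.
POSTS_QUERY = """
    SELECT n.nid, n.title, n.created, r.body
    FROM node n
    JOIN node_revisions r ON r.vid = n.vid
    WHERE n.type = ? AND n.status = 1
    ORDER BY n.created
"""


def fetch_blog_posts(conn, node_type="blog"):
    """Return published nodes of one content type as a list of dicts."""
    rows = conn.execute(POSTS_QUERY, (node_type,)).fetchall()
    return [
        {"nid": nid, "title": title, "created": created, "body": body}
        for nid, title, created, body in rows
    ]


def make_demo_db():
    """Build a tiny stand-in database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (nid INTEGER, vid INTEGER, type TEXT,
                           title TEXT, status INTEGER, created INTEGER);
        CREATE TABLE node_revisions (nid INTEGER, vid INTEGER,
                                     title TEXT, body TEXT);
        INSERT INTO node VALUES (1, 3, 'blog', 'First post', 1, 1344470400);
        INSERT INTO node VALUES (2, 4, 'page', 'About', 1, 1344470500);
        INSERT INTO node VALUES (5, 9, 'blog', 'Draft', 0, 1344470600);
        INSERT INTO node_revisions VALUES (1, 2, 'First post', 'old text');
        INSERT INTO node_revisions VALUES (1, 3, 'First post', 'new text');
        INSERT INTO node_revisions VALUES (2, 4, 'About', 'about text');
        INSERT INTO node_revisions VALUES (5, 9, 'Draft', 'draft text');
    """)
    return conn


if __name__ == "__main__":
    for post in fetch_blog_posts(make_demo_db()):
        print(post["nid"], post["title"])
```

Pulling in CCK fields would mean extra joins against the CCK tables, which are specific to each site and so are left out of this sketch.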

Another guy took the above and created a Drupal 7 version:



Another blog post gives detailed instructions on its use:

Another fellow concocted a Rube Goldberg-style method where you would move the Drupal content to a WordPress site using the RSS feed importer in WordPress, then use a WordPress-to-Blogger service to import that to Blogger.

The wordpress2blogger converter is here:

Source code:

It's actually a generic blog converter doohickey and the matrix of conversions is here:

Thursday, August 9, 2012

Methods for exporting content from a Drupal website

I just wrote thoughts on a universal import/export format for website content, and there's obviously the risk of reinventing the wheel.  With the help of Google, let's take a look at some existing methods to export content from Drupal.

Node Export module: "This module allows users to export nodes and then import it into another Drupal installation, or on the same site."  Okay, that sounds useful and cool.  Further, the module is available for both Drupal 6 and Drupal 7, and can be run at the command line using Drush.  It exports in one of these formats:

  • JSON - JavaScript Object Notation code which is known for being security friendly. (Drupal 7 only)
  • Drupal var export - A Drupalized PHP array which is similar to var_export(). (Drupal 7 only)
  • Node code - A customized PHP array which is similar to var_export(). (Drupal 6 only)
  • CSV - RFC4180 compliant CSV code. Ideal for viewing in Windows software, and editing data as spreadsheets.
  • Serialize - Very robust, though not human readable, representation through Serialization using the PHP serialize function.
  • XML - XML 1.0 representation which is good for machine-readability and human-readability.

Data export import: This is a sandbox module that hopes to be a generic dataset import/export utility.  Its project page is written at much too high and generic a level to tell whether the module is useful or not.

Content migration, import and export: This is a groups.d.o group focusing on import/export.  

Comparison of Content and User Import and Export Modules: A useful chart listing the various import and export modules for Drupal, and their characteristics.

HTML Export: "HTML Export allows you to take your Drupal site and select paths from it based on criteria to export to HTML. It supports OG, results from Views, per content type, all menu router items, and all nodes as default criteria for publishing to html."  Sounds really close to what I wrote about in the previous post on this blog (see link at the top).  The Drupal 6 version is in Beta, and the Drupal 7 version is in DEV.

Creating a static archive of a Drupal site: A list of methods for generating a pile of HTML pages out of a Drupal website.  For example:

wget -q --mirror -p --html-extension --base=./ -k -P ./

This could be useful, but the export format doesn't give you the CCK fields as individual data items, doesn't give you the revisions, doesn't do a lot of other things.  It occurs to me the Boost module does something like this as well.

Drush CTools Export Bonus: Hmm... "Adds more functionality to the CTools drush bulk export commands. drush ctex modulename will export all known ctools exportables of a site to a module. But there are also other configurations you'd like to have in code. This project fills in that gap with extra configuration and drush commands."  Basically, ctools has a method of exporting exportables, and this extends that functionality.  It exports stuff into a "module".  Sigh.  So much for reusability of the data in other venues besides Drupal.

How to make configuration objects exportable with CTools:  Goes over some grotty programming API details of how to ..blahblah..

Thoughts on a universal export format for websites built with Drupal or other CMS's

This blog exists to record my thoughts on migrating a Drupal website to some other blog or website platform.  This problem has been fiddling in the back of my mind for a while, and here's where I've gotten to.  The first step is to create a bold brilliant universal website export format.  Okay, maybe that's too grandiose.  But what I mean is to create a neutral format to represent website content, and then to write software to create data in that format from a Drupal website.

At first blush the typical thing to do is generate a single file containing all the data in the Drupal website.  For example that is what Blogger gives you when you export a Blogger blog, an XML file containing everything there is about that blog.  I've looked inside that file and found that it even includes the template being used by the blog.

But on reflection a single file is too monolithic and unnecessary.  My thought instead is to create a directory structure containing data files for each piece of content.  The directory structure needs to match the URLs of the website.

For a Drupal website, this will mean a directory named "node" with files containing the content of /node/1, /node/2, etc.  For sites using Pathauto and/or the URL alias modules, there would be a set of directories matching the URL structure of the aliases, with files containing essentially a "redirect" over to the related /node/NNN.
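
To make the layout concrete, here is a minimal Python sketch of writing such a tree.  Everything in it is an assumption made for illustration: the function name `write_export_tree`, the sample nodes and alias, and the one-line `redirect:` convention for alias files.

```python
import tempfile
from pathlib import Path


def write_export_tree(root, nodes, aliases):
    """Write one file per node plus one redirect file per URL alias.

    nodes:   dict mapping node id -> content string
    aliases: dict mapping alias path (e.g. "blog/hello") -> node id
    Returns the list of paths written.
    """
    root = Path(root)
    written = []
    for nid, content in nodes.items():
        path = root / "node" / str(nid)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        written.append(path)
    for alias, nid in aliases.items():
        path = root / alias
        path.parent.mkdir(parents=True, exist_ok=True)
        # Hypothetical convention: an alias file holds a one-line redirect.
        path.write_text(f"redirect: /node/{nid}\n", encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_export_tree(
            tmp,
            nodes={1: "Hello", 2: "World"},
            aliases={"blog/hello-world": 1},
        )
        for p in paths:
            print(p.relative_to(tmp))
```

One wrinkle this sketch ignores: an alias such as "blog" cannot be both a file and the parent directory of "blog/hello-world", so a real exporter would need a rule for that case.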

What would be stored in the file named /node/12345?  Here's where I'm a little unsure.  It could be a directory, containing a series of data files, or it could be a text file.  In Drupal it has to represent the metadata of the node, the node content, as well as all revisions, and all CCK fields.
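
One possible answer, sketched in Python purely as a thought experiment: a single JSON document per node, holding the metadata, the current content, every revision, and the CCK fields.  The layout and the name `build_node_record` are my own assumptions, not an existing format.

```python
import json


def build_node_record(nid, node_type, title, author, created,
                      revisions, fields=None):
    """Assemble one node as a plain dict ready for json.dumps.

    revisions: list of dicts, oldest first, each with
               "vid", "timestamp" and "body" keys
    fields:    dict of CCK field name -> value (optional)
    """
    if not revisions:
        raise ValueError("a node needs at least one revision")
    return {
        "metadata": {
            "nid": nid,
            "type": node_type,
            "title": title,
            "author": author,
            "created": created,
        },
        # The newest revision is treated as the current content.
        "body": revisions[-1]["body"],
        "revisions": list(revisions),
        "fields": dict(fields or {}),
    }


if __name__ == "__main__":
    record = build_node_record(
        nid=12345,
        node_type="blog",
        title="Example post",
        author="admin",
        created=1344470400,
        revisions=[
            {"vid": 1, "timestamp": 1344470400, "body": "First draft"},
            {"vid": 2, "timestamp": 1344556800, "body": "Final text"},
        ],
        fields={"field_source_url": "http://example.com/"},
    )
    print(json.dumps(record, indent=2, sort_keys=True))
```

A text file like this is easy to diff and to read from any language.  The directory-per-node alternative would split the same pieces into separate files instead.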

Something to explore is how to represent attached files - that is, a file uploaded directly on the node.  The file itself would be under /sites/default/files but needs to be related to the node.

This leaves taxonomy and user data.  These could be exported as a data file rather than going to a setup like I just described.

Creating an export format is obviously just the first step.  The next step would be software that deals with this exported data and imports it into some other format, such as an importer into another CMS or a tool that directly builds static HTML websites.  If the export format is well enough designed, and there are importers for a variety of CMS's, it could truly become a universal export/import format for exchanging website content between CMS's.
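
As a very rough sketch of the "directly build static HTML" idea, the Python below walks a hypothetical export directory and writes one HTML page per node file.  It assumes each file under `node/` is plain text; the names `build_static_site` and `PAGE`, and the page template itself, are invented for the example.

```python
import html
import tempfile
from pathlib import Path

# Bare-bones page template; a real importer would use a proper theme.
PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
    "<body><h1>{title}</h1>\n<pre>{body}</pre></body></html>\n"
)


def build_static_site(export_root, site_root):
    """Render every file in <export_root>/node to <site_root>/node/<id>.html."""
    export_root, site_root = Path(export_root), Path(site_root)
    out_paths = []
    for src in sorted((export_root / "node").iterdir()):
        if not src.is_file():
            continue
        body = src.read_text(encoding="utf-8")
        dest = site_root / "node" / (src.name + ".html")
        dest.parent.mkdir(parents=True, exist_ok=True)
        page = PAGE.format(
            title=html.escape("Node " + src.name),
            body=html.escape(body),  # escape so content cannot break the markup
        )
        dest.write_text(page, encoding="utf-8")
        out_paths.append(dest)
    return out_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as export_dir, \
            tempfile.TemporaryDirectory() as site_dir:
        node_dir = Path(export_dir) / "node"
        node_dir.mkdir()
        (node_dir / "1").write_text("Hello, world", encoding="utf-8")
        for page_path in build_static_site(export_dir, site_dir):
            print(page_path.relative_to(site_dir))
```

An importer for another CMS would read the same files and push them through that system's own API rather than writing HTML.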

Thursday, August 2, 2012

Migrating from Drupal to TYPO3

David Kordsmeier, Head of Sales at AOE media San Francisco, speaks at the 2010 TYPO3 conference about why, in many cases, migrating from Drupal 6 to TYPO3 is a smarter move than migrating from Drupal 6 to Drupal 7.

Jekyll and converting from Drupal/WP/etc to Jekyll

Jekyll is a static website generator apparently written in Ruby.  What this means is you write page content in some format, then with Jekyll generate an HTML website with navigational stuff etc.  The HTML website then can be served by Apache at high rates of speed.

Jekyll includes a bunch of tools for importing from a variety of other blogging platforms, including WordPress, Drupal and Blogger.

A nice blog post went over why to leave Drupal, and why to choose Jekyll, and how to do so.

The guy's reason?  His blog had grown pretty popular and was seeing thousands of page hits a day, and routinely getting top rank on Digg.  It meant having to migrate the site from shared hosting to a VPS, and then learn the arcana of nginx, varnish, etc, all just to host a blog.  Specifically: "As you can see in order to keep a server ready to receive the next tsunami of visits, I had to do too much work keeping Apache, PHP, MySQL, Nginx, Varnish and Drupal up to date and secured, also being sure that all of them could work in sync."

That's too much overhead (voice of experience) for a blog.

The Onion Uses Django, And Why It Matters To Us

Back when I started with Drupal (in the 4.6 or 4.7 days) one of the WOW moments was learning that The Onion (one of America's finest news sites) was being run with Drupal.  I thought, if The Onion was happy with Drupal, then surely it can do things for me.

Turns out that The Onion's tech team has since switched to Django.  The reason was partly that they believe Python to be a better language than PHP, but they were also able to implement a more cleanly designed website.