Thursday, August 9, 2012

Thoughts on a universal export format for websites built with Drupal or other CMSs

This blog exists to record my thoughts on migrating a Drupal website to some other blog or website platform.  This problem has been nagging at the back of my mind for a while, and here's where I've gotten to.  The first step is to create a bold, brilliant, universal website export format.  Okay, maybe that's too grandiose.  What I mean is to create a neutral format for representing website content, and then to write software that produces data in that format from a Drupal website.

At first blush, the typical thing to do is generate a single file containing all the data in the Drupal website.  For example, that is what Blogger gives you when you export a Blogger blog: an XML file containing everything there is to know about that blog.  I've looked inside that file and found that it even includes the template being used by the blog.

But on reflection, a single file is too monolithic, and it is also unnecessary.  My thought instead is to create a directory structure containing a data file for each piece of content.  The directory structure needs to match the URLs of the website.

For a Drupal website, this will mean a directory named "node" with files containing the content of /node/1, /node/2, etc.  For sites using pathauto and/or the URL alias modules, there would also be a set of directories matching the URL structure of the aliases, with files that each contain essentially a "redirect" over to the related /node/NNN.
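To make that layout concrete, here is a rough sketch in Python.  Everything specific in it is my own assumption rather than a settled design: the `.json` suffix, the `redirect` key, and the function names `export_path` and `write_alias` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def export_path(root: Path, url_path: str) -> Path:
    """Map a URL path such as '/node/12' to '<root>/node/12.json'."""
    parts = [p for p in url_path.split("/") if p]
    if not parts:
        raise ValueError("the site root has no file of its own in this sketch")
    # Append the suffix by hand so aliases containing dots are left intact.
    return root.joinpath(*parts[:-1], parts[-1] + ".json")


def write_alias(root: Path, alias: str, nid: int) -> Path:
    """Write a small 'redirect' record for a URL alias pointing at /node/<nid>."""
    target = export_path(root, alias)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps({"redirect": f"/node/{nid}"}), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory; the alias here is made up.
    demo_root = Path(tempfile.mkdtemp())
    print(write_alias(demo_root, "/blog/2012/export-format", 12345))
```

The point of the sketch is only that a URL maps mechanically to a file location, so an importer can find the content for any URL without consulting an index.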

What would be stored in the file named /node/12345?  Here's where I'm a little unsure.  It could be a directory containing a series of data files, or it could be a single text file.  Either way, for Drupal it has to represent the metadata of the node, the node content, all of its revisions, and all of its CCK fields.
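As one possible answer, here is a minimal sketch of the single-file option, with the node serialized as JSON.  The class names `NodeExport` and `Revision`, and every field name in them, are hypothetical choices of mine; they are not taken from Drupal's schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Revision:
    vid: int          # revision id (assumed to increase over time)
    timestamp: int    # Unix time the revision was saved
    title: str
    body: str


@dataclass
class NodeExport:
    nid: int
    type: str
    author: str
    created: int
    revisions: List[Revision] = field(default_factory=list)
    # CCK fields keyed by field name; each field holds a list of values.
    fields: Dict[str, list] = field(default_factory=dict)

    def current(self) -> Revision:
        """Return the revision with the highest vid."""
        return max(self.revisions, key=lambda r: r.vid)

    def to_json(self) -> str:
        """Serialize the node, its revisions, and its fields to JSON text."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping every revision inside the one file makes the node self-contained, at the cost of large files for heavily edited nodes.  The directory option would trade that the other way.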

Something to explore is how to represent attached files - that is, a file uploaded directly on the node.  The file itself would be under /sites/default/files but needs to be related to the node.
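One way to relate the two, sketched below, is a small manifest stored with the node that lists the paths of its attached files.  This is an assumption on my part, not a worked-out design; the function name `attachment_manifest` and the manifest keys are hypothetical.

```python
def attachment_manifest(nid: int, file_paths: list) -> dict:
    """Build a record linking /node/<nid> to files under /sites/default/files."""
    entries = []
    for path in file_paths:
        rel = path.lstrip("/")
        # Only accept files that live in the site's files directory.
        if not rel.startswith("sites/default/files/"):
            raise ValueError(f"not under /sites/default/files: {path}")
        entries.append({"path": "/" + rel, "filename": rel.rsplit("/", 1)[-1]})
    return {"node": f"/node/{nid}", "attachments": entries}
```

The files themselves stay where they are in the export; the manifest only records which node each one belongs to.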

This leaves taxonomy and user data.  These could each be exported as a single data file rather than as a directory structure like the one I just described.

Creating an export format is obviously just the first step.  The next step would be software that reads this exported data and imports it into some other system, such as an importer for another CMS, or a tool that builds a static HTML website directly.  If the export format is designed well enough, and there are importers for a variety of CMSs, it could truly become a universal export/import format for exchanging website content between CMSs.
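As a sketch of what the simplest such consumer might look like, here is a static HTML builder in Python.  It assumes a layout I have not actually settled on: each exported file is JSON holding either a `redirect` key or `title` and `body` keys.  The function name `build_static_site` is hypothetical too.

```python
import html
import json
import tempfile
from pathlib import Path


def build_static_site(export_root: Path, out_root: Path) -> int:
    """Turn every exported .json record into an .html page; return the page count."""
    count = 0
    for src in sorted(export_root.rglob("*.json")):
        record = json.loads(src.read_text(encoding="utf-8"))
        # Mirror the export's directory structure in the output.
        dest = out_root / src.relative_to(export_root).with_suffix(".html")
        dest.parent.mkdir(parents=True, exist_ok=True)
        if "redirect" in record:
            # Alias files become client-side redirects to the node's page.
            target = html.escape(record["redirect"], quote=True)
            page = f'<meta http-equiv="refresh" content="0; url={target}.html">'
        else:
            # The body is assumed to be HTML already, so only the title is escaped.
            page = f"<h1>{html.escape(record['title'])}</h1>\n{record['body']}"
        dest.write_text(page, encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__":
    # Demo on a throwaway export holding a single made-up node.
    tmp = Path(tempfile.mkdtemp())
    (tmp / "export" / "node").mkdir(parents=True)
    (tmp / "export" / "node" / "1.json").write_text(
        json.dumps({"title": "Hello", "body": "<p>First post</p>"}), encoding="utf-8")
    print(build_static_site(tmp / "export", tmp / "site"))
```

Because the export mirrors the site's URLs, the builder needs no routing table: walking the directory tree is enough.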
