Moving at the Speed of Creativity by Wesley Fryer

Advice for web 1.0 to 2.0 (WordPress) page conversion?

Geeky but important question: Has anyone had luck batch converting a large number of similar, static webpages into a format that is importable into WordPress? I’m thinking a program like TextWrangler, which supports batch grep search-and-replace commands, might work, but that would be a pretty geeky / technical way to do this.

Anyone know of any programs that can streamline the conversion process? These are the pages which need to be converted. A solution that runs on Windows XP would be best in this specific case. (I’m asking on behalf of Mark Ahlness.)

Technorati Tags:
wordpress, grep, convert

Posted

July 23, 2007

blogs, WordPress

Wesley Fryer

Comments

4 responses to “Advice for web 1.0 to 2.0 (WordPress) page conversion?”

Anthony Chivetta

July 24, 2007

One option would be to try and convert the pages into RSS…

For example, on unix you could download all the html files into a directory and then run something like (I ran this in zsh when I was playing):

(for file in $(find ./); do echo "$file"; cat "$file"; echo ""; done) > foo.rss

The resulting file will have a number of issues, such as the description elements containing and such as well as needing a tag, etc.

The biggest issue here is that you are trying to take unstructured data (html files, not all of which follow the same format) and turn it into structured data (RSS, or whatever format you want to import).
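To make the unstructured-versus-structured point concrete, here is a minimal sketch that tries to pull one structured field (a title) out of each page and falls back to the file name when a page does not follow the expected format. The function name `page_title` is made up for illustration, and it assumes each page has at most one title element containing no nested tags.

```shell
#!/bin/sh
# Hypothetical helper, not from the original thread: print a page's title.
# Assumes at most one <title> element per file, with no tags inside it.
page_title() {
    # Join the file into one line so a title split across lines still matches.
    title=$(tr '\n' ' ' < "$1" |
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p')
    # Pages without a usable title fall back to the file name minus .html.
    [ -n "$title" ] || title=$(basename "$1" .html)
    printf '%s\n' "$title"
}
```

Running it over every page first (for example `for f in *.html; do page_title "$f"; done`) is one way to see how many pages deviate from the common layout before committing to a conversion routine.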

If you can get away with it, making posts that just link to the original html files may be much easier… something like Feed43 ( http://feed43.com/how-it-works.html ) might be able to help you with that…
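As a rough sketch of the link-only idea, the function below prints one feed item per page that carries just a title and a link back to the original file. The name `link_items` and the base URL in the usage line are assumptions for illustration, and the output would still need to be wrapped in the usual rss and channel elements before any importer could read it.

```shell
#!/bin/sh
# Hypothetical helper, not from the original thread: one link-only item per page.
# $1 = local directory holding the pages (no trailing slash)
# $2 = URL under which the same pages are published (assumed, e.g. http://example.com/old)
# Assumes file names contain no characters that need XML escaping (&, <, >).
link_items() {
    dir=$1
    base_url=$2
    find "$dir" -type f -name '*.html' | sort | while IFS= read -r file; do
        rel=${file#"$dir"/}   # path relative to the directory
        printf '<item><title>%s</title><link>%s/%s</link></item>\n' \
            "$rel" "$base_url" "$rel"
    done
}
```

Usage would look something like `link_items ./pages http://example.com/old > items.xml`.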

Hope that helps!

Anthony Chivetta

July 24, 2007

It looks like part of my post got mangled, one of the above paragraphs should read like (assuming it comes out correctly this time):

The resulting file will have a number of issues, such as the description elements containing and such as well as needing a tag, etc.

Anthony Chivetta

July 24, 2007

(OK, third try is a charm, right?… replace the bracketed tags with normal tags)

(for file in $(find . -type f -name '*.html'); do echo "[item][title]$file[/title][description]"; cat "$file"; echo "[/description][/item]"; done) > foo.rss

The resulting file will still have a number of issues: the description elements will contain [html] and similar tags, and the feed needs an enclosing [channel] tag, etc.
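For anyone who wants to go a step further, here is a sketch of the same loop written with real tags. It adds the enclosing rss and channel elements and escapes the page markup, so each description holds the HTML as text instead of as stray tags. The function names and the channel title, link, and description are placeholders I made up; whether a given importer turns the escaped description back into a usable post body is something to test on a few pages first.

```shell
#!/bin/sh
# Sketch only: function names and channel metadata are placeholders, not part
# of the original one-liner.

# Escape the characters that would otherwise be read as XML markup.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print a minimal RSS 2.0 feed with one item per .html file below directory $1.
html_dir_to_rss() {
    dir=$1
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<rss version="2.0"><channel>'
    echo '<title>Converted pages</title>'
    echo '<link>http://example.com/</link>'
    echo '<description>Static pages wrapped as feed items</description>'
    find "$dir" -type f -name '*.html' | sort | while IFS= read -r file; do
        # The file path stands in for a title, as in the one-liner above.
        printf '<item><title>%s</title><description>' \
            "$(printf '%s' "$file" | xml_escape)"
        xml_escape < "$file"
        echo '</description></item>'
    done
    echo '</channel></rss>'
}
```

Something like `html_dir_to_rss ./pages > foo.rss` would then produce the feed. Reading file names line by line also avoids the word splitting that `$(find ...)` does on names containing spaces.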

Mark Ahlness

July 24, 2007

Wesley, thanks so much for putting the question out there! Anthony, thank you as well. I got a little mired down and lost in variable land over at feed43.com. I had looked at RSS as well. I do have a routine spelled out to convert all the HTML files to a WordPress-friendly file, but it will take me a while, as there are over 600 HTML docs, and it’s manual. Will let you know how it turns out. Thanks again! – Mark