pyblosxom2wxr

pyblosxom2wxr.sh is a shell script that migrates content from PyBlosxom to WordPress. It converts PyBlosxom posts and comments into a WXR (WordPress eXtensible RSS) file that can be imported into WordPress.

Notes:

The post file extension is hard-coded to .txt, since that’s what PyBlosxom expects.
Pages are supported as well as posts. pyblosxom2wxr assumes that post filenames start with the date in YYYY-MM-DD format, e.g. 2010-07-28_my_post.txt. Files without a prefix in that format are assumed to be pages. (This is hard-coded but would be easy to change. Search for the date_re variable.)
The filename is used as the WordPress post/page GUID, and the first line of the file is extracted and used as the title. The second line is assumed to be blank. If your files don’t follow that format, you’ll want to preprocess them or tweak the script.
Categories are not (yet) supported. All posts and pages are assigned to the “uncategorized” category in WordPress.
WordPress limits import files to 2 MB, but pyblosxom2wxr can generate output files larger than that. If that happens, you can split the output file manually or with a tool like ChoppedPress.
By default, the last modified time of post and page files is used as their timestamp. However, if you have a timestamps file from the hardcodedates PyBlosxom plugin, it will be used instead. The default path is ../timestamp; you can customize this by editing the timestamp_file variable in the script.
If you use Markdown or another markup language where line breaks and whitespace are meaningful, you’ll want to apply this patch to the WordPress importer.
pyblosxom2wxr doesn’t assign post ids. It omits <wp:post_id> elements from the output file, so WordPress allocates post ids itself.
However, WordPress won’t allocate comment ids itself, so pyblosxom2wxr has to do that and populate them in <wp:comment_id> elements. This means that importing a WXR file generated by pyblosxom2wxr may overwrite any existing comments!
If you use PyBlosxom’s compact_comments.sh, comments imported from -all.cmt files may not be ordered by date. See my page on extracting compacted PyBlosxom comments for a workaround.
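
The filename and title conventions described in the notes above can be sketched roughly as follows. This is a minimal illustration, not code taken from pyblosxom2wxr.sh: the function names entry_type, entry_title, and entry_body are hypothetical, and a shell glob stands in here for the script's date_re regular expression.

```shell
# Hypothetical helpers illustrating the conventions in the notes.
# None of these names come from pyblosxom2wxr.sh itself.

# Print "post" if the file's basename starts with YYYY-MM-DD, otherwise "page".
# (Assumption: a glob pattern is used here in place of the script's date_re.)
entry_type() {
  case "$(basename "$1")" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*) echo post ;;
    *) echo page ;;
  esac
}

# Print the title, which is the first line of the file.
entry_title() {
  head -n 1 "$1"
}

# Print the body: everything from line 3 onward, skipping the title
# and the second line, which is assumed to be blank.
entry_body() {
  tail -n +3 "$1"
}
```

For example, entry_type 2010-07-28_my_post.txt prints post, while entry_type about.txt prints page. If your files put the title somewhere other than the first line, entry_title and entry_body are the kind of step you would need to adapt.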

Known bugs:

Posts with more than 256 comments are not well supported. Only the last 256 comments will be imported, and they will likely be ordered incorrectly. See the TODO near the end of the script.

Ryan Barrett
