Once I had a grip on the language of PHP and an understanding of MySQL, I needed to start thinking about how my website content would convert into database material. I needed to figure out how to get this information not only into the database, but into a shape that meets WordPress’ needs for generating the content. Nothing is harder than trying to stick a square peg into a round hole, and I needed to get my website content into a form that WordPress will not only accept but like to work with.
I decided to begin slowly, with only 65 web pages at first, so any damage I did would be limited to a small number of my 500+ web pages. I knew that this would get me through the learning curve, and take less time to figure out the process than if I were smashing through all 500+ files. Once I figured out the process, I knew it would go much faster across the rest of the files.
Luckily, WordPress has put a lot of effort into making it as easy as possible for the users of other blog software to import their data. WordPress supplies a wide range of import scripts for popular blog software, but little help specifically for non-blog software, like the strictly old-fashioned, do-it-yourself, HTML-encoded blogs or websites, like mine.
What they haven’t done is come up with a simple way to CONVERT the data, only import. So it’s up to me to make my site data conform to one of their import systems. Unfortunately, WordPress imports are designed to move data from one database system to another, and not from a static website INTO a database. This means I have to convert my data into something that one of the import files will accept, digest, and spit into the database.
Of course, I wanted the easy way out. I wanted to find software that would convert HTML into database material – easy to import. What I was asking for was for someone to create software that would read through all my HTML and CSS coding, pick out what was important to little old me, and strip away the gunk and leave something nice and pretty, ready for import to the database. WRONG!
There is no clean and fast way. As good as software is, we are not yet living in the Star Trek world, and there is nothing that can read my mind, or my HTML pages, and give me database material. I looked everywhere. Some programs hint at it, but the best they can do is strip the code out so the HTML converts to plain text. Taking the HTML out would destroy my layout, so that wasn’t an option. There is no nice and clean and fast way to do this. I was stuck.
Converting Static HTML to Import Material
Ah, au contraire, my friends! Lorelle found a way to do it, and actually quite easily. Shooting in the dark with only a little help from the forums (until I used capital letters and started pleading), I figured it out. Come along for the long ride.
I began by thoroughly studying the data format required by WordPress in order to “easily” import the information. I decided that the simplest format I could convert my HTML pages to was the import-mt format, used for importing Movable Type blogs into the WordPress database. I liked it because it looked simple to convert my site’s web pages to something similar, and it would also allow importing of HTML/XHTML tags so I could keep my formatting within each page, such as tip boxes, photographs, and graphics. All I had to do was sort my HTML information into the import-mt format.
I found a tutorial on How to Import MovableType Entries into your WordPress Blog and MoveableType instructions for importing data and began to memorize them, tearing each apart and putting it together so I understood each element.
Work From a Copy
The next step was to make sure I had the most current version of my site to work from, and that it was validated to death. I copied my entire website from the server to my computer’s hard drive into a new folder, ready to destroy and rearrange. The originals are still protected and backed up, thank goodness, because if I screw up along the way, I have to have more than one backup.
I also spent some time double checking and validating a good number of pages to make sure that what I was starting with was in good condition. I’m glad I did because I found a lot of little errors, not life shattering but capable of causing me grief later on after the conversion from HTML to XHTML. XHTML requires every tag to either have a closing tag or be self-closing, which made it very important to find or add every closing </p> and </li> tag.
One of the little things that also caught me off guard was the requirement that all tags be in lowercase. I’d made that a “rule” during my last major revision of the website, in order to be compliant for when I finally made the move to XHTML, but some still slipped through, left over from a prior HTML editor that capitalized HTML tags by default. I had to go through and check for those stray capitalized tags, and thankfully, I only found a few in my test pages.
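For anyone who would rather not hunt for these by eye, here is a rough sketch of how the two checks could be automated in Python. This is not what I used; the function names are made up for illustration, and the simple pattern matching is only a first pass that assumes ordinary markup. A real validator is still the final word.

```python
import re

# Matches the name of any opening or closing tag, e.g. "P" in <P> or </P>.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def uppercase_tags(html):
    """Return the sorted set of tag names that are not all lowercase."""
    return sorted({name for name in TAG_NAME.findall(html) if name != name.lower()})


def unbalanced(html, tags=("p", "li")):
    """Return {tag: (opens, closes)} for tags whose open/close counts differ."""
    report = {}
    for tag in tags:
        # The lookahead keeps <p> from also matching <pre> or <param>.
        opens = len(re.findall(rf"<{tag}(?=[\s>])", html, re.IGNORECASE))
        closes = len(re.findall(rf"</{tag}\s*>", html, re.IGNORECASE))
        if opens != closes:
            report[tag] = (opens, closes)
    return report


if __name__ == "__main__":
    sample = "<P>One<p>Two</p><UL><li>a<li>b</li></UL>"
    print(uppercase_tags(sample))
    print(unbalanced(sample))
```

Counting opens against closes will not tell you where the missing tag belongs, only that a page deserves a closer look.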
The next step is probably the most tedious. I thought it would be easy, but it may not turn out that way. I first have to convert the HTML into XHTML in order to validate and meet the requirements of WordPress.
After a bunch of research, it turns out that HTML Tidy is the best way to go, but using this program is like going back to Windows before they had numbers after the title. I’m telling you, it’s like working with Pre-Windows 3.11. It harkens back to DOS 3.3 and earlier. Does anyone remember those “good old days” where dreams of having 640K of RAM were still fantasy?
Anyway, Tidy is powerful and archaic, to say the least. A few Tidy GUI programs out there turn Tidy into a Windows program, but only a few. If you can handle the old method, 106-IBM has good instructions for helping you through the process, but only as a starting point. After looking at my few choices and failing to make the old-fashioned DOS versions work, I settled on Hab Utilities HABTidy, a very simple free Windows GUI which does multiple files, but no more than 11 at a time.
I set the custom options to include “output xml” and chose “Process Files as a Group”. It works really fast so I didn’t cry too hard at being limited to only 11 files. Select the files and set the selector at the bottom to read “Format with custom options” and you are good to go. HABTidy saves a backup of the file as ~filename.html and changes the original. This conversion isn’t perfect if you have unvalidated HTML, but if you have seriously validated your code prior to beginning, the conversion works like a charm.
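If you are braver than I am with the command line, roughly the same thing can be scripted. The sketch below is an assumption on my part, not what HABTidy does internally: it copies each page to a ~filename backup first, then calls a command-line tidy (assumed to be installed and on your path) with -asxhtml, which is my guess at the closest match for the “output xml” setting. The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def backup_path(page):
    """Return the ~filename backup path next to the page, as HABTidy names it."""
    page = Path(page)
    return page.with_name("~" + page.name)


def tidy_command(page):
    """Build the tidy call: convert to XHTML, modify in place, stay quiet."""
    return ["tidy", "-asxhtml", "-m", "-q", str(page)]


def convert(page):
    """Back up the page, run tidy on it, and return tidy's exit code."""
    shutil.copy2(page, backup_path(page))
    result = subprocess.run(tidy_command(page))
    # Tidy exits with 0 when clean, 1 when it had warnings, 2 on errors.
    return result.returncode


if __name__ == "__main__":
    # Hypothetical folder name; point this at your working copy, never the originals.
    for page in sorted(Path("site-copy").glob("*.html")):
        if page.name.startswith("~"):
            continue  # skip backups left by an earlier run
        print(page.name, convert(page))
```

Any page that comes back with a 2 still has errors Tidy could not fix by itself, so check it by hand before moving on.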
Once I had the converted files, it was time to convert those into something WordPress and MySQL would recognize for import via the Movable Type import for WordPress.
It seems that Movable Type uses the same format for import and export, so by following the Movable Type instructions for importing data, I had a format to follow.
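To give an idea of the target, here is a small Python sketch that writes one entry in that format as I understand it: a few KEY: value lines, a line of five hyphens, the BODY: section, another five hyphens, and a line of eight hyphens to end the entry. The field names and date layout are from my reading of the import instructions, so check them against the documentation before trusting them, and the function name is made up for illustration.

```python
from datetime import datetime


def mt_entry(title, author, date, body, category=None):
    """Return one entry as text in the Movable Type import format (as assumed above)."""
    # Dates are written as MM/DD/YYYY hh:mm:ss AM|PM.
    stamp = date.strftime("%m/%d/%Y %I:%M:%S ") + ("AM" if date.hour < 12 else "PM")
    lines = [f"TITLE: {title}", f"AUTHOR: {author}", f"DATE: {stamp}"]
    if category:
        lines.append(f"CATEGORY: {category}")
    # Five hyphens separate sections; eight hyphens close the entry.
    lines += ["-----", "BODY:", body, "-----", "--------"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mt_entry(
        "Sample Page",
        "Lorelle",
        datetime(2005, 1, 31, 15, 31, 5),
        '<p class="tip">Formatting tags stay inside the body.</p>',
        category="Travel",
    ))
```

The body goes in untouched, which is the whole reason I picked this format: the XHTML tags ride along with the text.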
It was time to begin the Search and Replace.