Every web page developer and designer has their own way of presenting their code, and packaged HTML software often overcodes its output, so no matter how hard you have worked to streamline your code and make it consistent and valid, there will always be exceptions to your own rules. You still have to go through your own code to find what can go and what can’t, being careful at every step. The key is to look for redundant code you can quickly remove with a search and replace. The more streamlined and CSS-dependent your site is, the easier this entire process is.
Since I’d spent several years turning my site into a seriously CSS dependent site, stripping all unnecessary code left and right and top and bottom to optimize the pages for fast access, the process of cleaning it out wasn’t so difficult.
In fact, if it weren’t for all these notes I am writing as I go, plus stopping to check the laundry, look for answers to my inquiries on the forums, take telephone calls, eat, and check email, the first batch of 65 files would have taken about 2 hours. The work will speed up by leaps and bounds as I get more familiar with the process, make fewer mistakes, and back things up more often.
Finding a program or utility to do these mass search and replaces on my static HTML pages wasn’t hard. I’d already discovered two great programs to make this process very simple from my earlier site conversion from tables layout to CSS. You can use these, or look for your own, but let me tell you the features you must have in order to do this process effectively and efficiently.
A good search and replace program or utility must be able to search and replace through multiple files, not just the file you are in. It must also be able to search and replace across multiple lines. This is a much harder feature to find. Many search and replace utilities in Word, WordPerfect, Notepad, and other word processing and text editing programs will only search and replace a single line of information, often limited to only a few characters. Some limit the number of characters to about 100 or at most 255, so if you have a section of code or text that exceeds that limit, you end up searching and replacing twice, tackling the first half and then the second half of the same code block instead of the entire block. This can be frustrating, because successful search and replace depends on matching unique and consistent information across multiple files. If you divide, say, an image tag into two parts, the first half features the unique filename, but the second half may only have the width and height of the image, which might be found on other images within your files. Replace that second half and some image tags will suddenly be missing their back half, screwing up your code and layout. Find a program that will do multiple files and multiple lines, and that has either no limit on search and replace content or a limit large enough to deal with big blocks of information.
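For readers without such a program, the same three requirements (multiple files, multiple lines, no length limit) can be sketched in a few lines of Python. This is only an illustration, not what HotDog Pro or Inforapid do internally; the function name `replace_in_files` and its parameters are made up for this example, and it assumes the files are UTF-8 text.

```python
from pathlib import Path


def replace_in_files(root, old, new, pattern="*.html"):
    """Replace the literal text `old` with `new` in every matching file.

    `old` may span as many lines as needed; there is no length limit.
    Subfolders of `root` are searched too. Returns a dict that maps each
    changed file path to the number of replacements made in it.
    """
    changed = {}
    for path in sorted(Path(root).rglob(pattern)):
        # read_text() normalizes line endings to "\n", so a multi-line
        # search block written with "\n" matches regardless of platform.
        text = path.read_text(encoding="utf-8")
        count = text.count(old)
        if count:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed[str(path)] = count
    return changed
```

Passing an empty string as `new` deletes the block. Because the files are rewritten in place, run it on a copy of the folder, never on the only copy.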
HotDog Pro, my longtime favorite HTML editor, makes searching and replacing content within multiple files ridiculously easy. So easy that I’ve screwed up a few times, but all is fixable from my readily available and constantly created backups... RIGHT?
Okay, not so right, but close enough. The only other decent program I’ve found for searching and replacing across multiple files is Inforapid Search and Replace, a free program. It lets you either confirm each replacement as it moves from file to file or run a global search and replace that requires no attention, and it can limit the search to specific types of files. One of the reasons I really love it is that it isn’t picky about which files it crawls through. HotDog Pro will only search and replace through HTML files, so when I need to do a multiple-file search and replace on JavaScript or CSS files, like when we changed our domain name, it won’t work. Inforapid goes mindlessly through anything.
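The two modes described above, asking before each change or running globally, along with a file type filter, might look roughly like this in Python. It is a simplified sketch: it asks once per file rather than once per match, the name `replace_with_confirm` is invented here, and the domain names in the usage note are placeholders, not the real ones.

```python
from pathlib import Path


def replace_with_confirm(root, old, new,
                         extensions=(".html", ".css", ".js"),
                         confirm=None):
    """Replace `old` with `new` in files under `root` with a listed extension.

    `confirm(path, count)` is called for each file that contains a match,
    and the file is rewritten only if it returns True. With confirm=None,
    every matching file is changed without asking (a global replace).
    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue  # skip folders and file types we were not asked to touch
        text = path.read_text(encoding="utf-8")
        count = text.count(old)
        if count == 0:
            continue
        if confirm is None or confirm(path, count):
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path.name)
    return changed
```

For a domain change, a call such as `replace_with_confirm("site", "old.example", "new.example")` would sweep HTML, CSS, and JavaScript files in one pass. For the cautious mode, pass a `confirm` function that prompts with `input()` and returns True only on a "y".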
With these two programs in hand, I knew that this would be an easy process…or so I thought.
Clean and Begin the Search and Replace
Before beginning the search and replace process, I deleted all the files featuring tildes (~), left over from the Tidy conversion, and all backup and non-essential files from the folder I would be working in. Why search and replace through double the number of files? This also means only the working files I need are in my test folder. HotDog Pro and Inforapid will work across subfolders, but I’m starting small for my test run, with only 65 pages.
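Clearing out the tilde files can also be scripted. The sketch below is one cautious way to do it in Python: by default it only lists what it would delete, and nothing is removed until `dry_run=False` is passed. The function name is invented for this example, and it assumes that any file with a tilde anywhere in its name is a leftover that is safe to remove.

```python
from pathlib import Path


def remove_tilde_files(folder, dry_run=True):
    """List files in `folder` whose names contain a tilde (~).

    With dry_run=True (the default) nothing is deleted. With
    dry_run=False the listed files are removed. Subfolders are left
    alone. Returns the matching file names either way.
    """
    matches = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and "~" in p.name
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return [p.name for p in matches]
```

Run it once with the default to check the list, then again with `dry_run=False` once the list looks right.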
The first things that were easily searched for and replaced across the more than 500 web pages were the head and meta tags. These are fairly consistent across all my pages.
All the meta tags and code will disappear with the conversion, so I just searched for the code and replaced it with nothing. HotDog Pro’s multiple file search and replace utility made the process very fast and easy. I also knew that the doctype and other information in the header would be supplied by WordPress when it generates the pages, so that could go too.
Here is an example of what I could easily remove:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html dir="ltr" lang="en-US"><head>
And meta tags like these:
<meta name="resource-type" content="document">
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<meta http-equiv="distribution" content="Global">
<meta name="Rating" content="General">
<meta name="ROBOTS" content="INDEX, FOLLOW">
<meta name="revisit-after" content="30 Days">
What wasn’t so easy to remove were the keywords, title, and descriptions, which were, for the most part, unique to the document. These would have to be either incorporated into the import file or deleted by hand. So I left them wrapped in their meta tags so they would jump out at me as I manually went through the end results to clean them up.
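As an alternative to one literal search per tag, a regular expression can drop the boilerplate meta tags in a single pass while leaving the keywords and description tags in place for the manual review. This is a different technique from the literal replace used above, shown only as a sketch: the names `KEEP`, `META_TAG`, and `strip_boilerplate_meta` are invented, and it assumes each meta tag starts with a double-quoted `name` or `http-equiv` attribute and contains no `>` inside its values.

```python
import re

# Meta tags worth keeping for the manual cleanup pass.
KEEP = {"keywords", "description"}

# Matches one <meta name="..."> or <meta http-equiv="..."> tag plus any
# whitespace that follows it, capturing the attribute value.
META_TAG = re.compile(
    r'<meta\s+(?:name|http-equiv)="([^"]+)"[^>]*>\s*',
    re.IGNORECASE,
)


def strip_boilerplate_meta(html, keep=KEEP):
    """Remove meta tags, except those whose name is listed in `keep`."""
    def decide(match):
        # Return the tag unchanged to keep it, or "" to delete it.
        return match.group(0) if match.group(1).lower() in keep else ""
    return META_TAG.sub(decide, html)
```

The title tag is not a meta tag, so it is left untouched as well.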
Other consistent data included the footer and parts of the sidebar, which I knew would be totally revised and replaced by the WordPress template files, so these could go. Since they were consistent, for the most part, across all of the files, they went quickly and easily, such as this redundant code within the sidebar:
<!-- Sidebar -->
<div id="sidebar"><div id="menu">
<ul><li><a href="../index.html" title="Home Zone">HOME</a></li>
<li><a href="../doing.html" title="What are they doing?">DOING</a></li>
<li><a href="../being.html" title="The Art of Being">BEING</a></li>
<li><a href="../going.html" title="Tips and Advice for Taking Your Camera on the Road">GOING</a></li>
<li><a href="../living.html" title="Tips and Advice for Living on the Road">LIVING</a></li>
<li><a href="../asking.html" title="Asking and Answering Questions about Life on the Road">ASKING</a></li>
<li><a href="../telling.html" title="Telling Stories of Life on the Road">TELLING</a></li>
<li><a id="active" href="http://cameraontheroad.com/category/learning/" title="Learning about nature photography, business, web pages, and the Internet">LEARNING</a></li>
</ul></div>
If you are paying attention, you would see that in this example, the anchor (link) tags include relative link references ("../telling.html"). On my static site, I have three levels of folders with relative tags linking categories and documents together. So I had to do three search and replace sets to cover the root folder, and the two below it, with the last one featuring ../../telling.html links.
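The three search sets do not have to be typed by hand. Assuming the block is written the way it appears one folder below the root, with every relative link starting with exactly one "../", a short Python helper can generate the version for each folder depth. The function name `link_variants` is hypothetical.

```python
def link_variants(block, levels=3):
    """Return `block` rewritten for each folder depth, root first.

    Assumes every relative link in `block` starts with exactly one "../"
    (the form used one folder below the root). Absolute links such as
    http://... do not match and are left as they are.
    """
    return [
        block.replace('href="../', 'href="' + "../" * depth)
        for depth in range(levels)
    ]
```

With the default three levels, the first variant has links like telling.html for the root folder, the second is the original block, and the third has ../../telling.html links. Each variant can then be used as the search text for one search and replace pass.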
With most of the head, header, sidebar, and footer information gone, it was time to customize the remaining data into the format WordPress required for importing MovableType blog entries.