with Lorelle and Brent VanFossen

Importing Into WordPress with the Import-mt

I finally got the import-mt.php import file that comes with the default WordPress installation to work. Whew! What a long and hard ride, but I learned a lot and I hope you will find some lessons here, too.

Some of the following information might be a little redundant, but if you are finding this page first, and not reading through all my previous attempts YET, then this article will be of more value than the others. I do recommend you take a peek at them as you will learn a whole lot about this process.

The magic of making the Movable Type import work for static HTML pages is having consistent formatting on every page. On my original pages, every title was in an H2 heading, which was never used elsewhere on the page. The author is also in a unique div, as are the content and footer, and so on. The more uniform your page’s code layout, the easier this process will be.

First, copy your HTML code to a text editor. If you use Word or WordPerfect or another word-processing program, you are asking for mistakes unless you are an expert at tweaking that program so it will not convert quote marks or hyphens into character codes and will not screw up your HTML tags. If you know what I’m talking about, you can use a word processor. If you don’t, don’t use one. Trust me. You are only asking for totally borked code. So get a powerful text or HTML editor and you are good to go.

The only criterion, and this makes things a little more difficult, is that the program should have extensive search and replace capabilities, specifically searching and replacing across multiple lines of text.

During this process, we recommend you make many backups along the way for you will make mistakes. It’s part of the process. Make many and frequent backups. Name the backups by the date and time so you can easily go back to the most recent one if you make a mistake.

During the extensive search and replace process that turns your HTML document into the form import-mt needs, your goal is to reproduce the layout in the following example. This import technique emulates the Movable Type import/export format, and the information MUST BE IN THIS EXACT ORDER AND LAYOUT.

--------
AUTHOR: Author Name
TITLE: Title of the Post or Article
STATUS: Publish
ALLOW COMMENTS: 0
CONVERT BREAKS: 0
ALLOW PINGS: 0
PRIMARY CATEGORY: Home
CATEGORY: About
DATE: 7/10/2000 03:10:03 PM
-----
BODY:
<p>Article text is here with html and all kinds of information including <i class="red">html using quotes</i> lots of lists and other information....on and on to the end.</p>
-----
EXTENDED BODY:
A rattling on of the article information.
-----
EXCERPT:
A summary of the information which is about nothing important.
-----
KEYWORDS:
fred, sally, nothing, important, but keywords, here
-----
COMMENT:
AUTHOR: ben
EMAIL: something@something.org
IP: 123-45-6789
URL: http://www.asite.com
DATE: 10/07/2002 06:58:26 PM
So...How did you do?
-----
COMMENT:
AUTHOR: fred smith
EMAIL: somethingelse@something.org
IP: 123-45-6789
URL: http://www.asite.com
DATE: 10/08/2002 08:58:34 AM
Comment here that rattles on about something important.
-----
--------
AUTHOR: Silly Person
TITLE: Some Fascinating Idea
STATUS: Publish
ALLOW COMMENTS: 2
CONVERT BREAKS: 0
ALLOW PINGS: 0
PRIMARY CATEGORY:
DATE: 10/05/2002 03:10:03 PM
-----
BODY:
.....and it continues on.
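
If you would rather generate records than type them, here is a minimal Python sketch of the layout above. The function name `build_record` and its parameters are my own invention for illustration, not part of WordPress or Movable Type, and it only covers the header and BODY fields.

```python
RECORD_DIVIDER = "-" * 8   # separates whole records (posts)
FIELD_DIVIDER = "-" * 5    # separates fields within a record


def build_record(author, title, date, body, status="Publish",
                 allow_comments=0, convert_breaks=0, allow_pings=0,
                 primary_category="", category=""):
    """Return one post as text in the import layout shown above."""
    lines = [
        RECORD_DIVIDER,
        f"AUTHOR: {author}",
        f"TITLE: {title}",
        f"STATUS: {status}",
        f"ALLOW COMMENTS: {allow_comments}",
        f"CONVERT BREAKS: {convert_breaks}",
        f"ALLOW PINGS: {allow_pings}",
        f"PRIMARY CATEGORY: {primary_category}".rstrip(),
    ]
    if category:
        # The CATEGORY line is left out when there is no subcategory.
        lines.append(f"CATEGORY: {category}")
    lines += [
        f"DATE: {date}",
        FIELD_DIVIDER,
        "BODY:",
        body.strip(),
        FIELD_DIVIDER,
    ]
    return "\n".join(lines) + "\n"
```

Calling `build_record("Author Name", "Title of the Post or Article", "07/10/2000 03:10:03 PM", "<p>Article text</p>", primary_category="Home", category="About")` gives one record you can append to your import file.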

Begin The Process

Copy each HTML page into the editor and put an 8-dash line (--------) between each HTML page, taking advantage of the doctype or <html> that begins every web page to use for your search and replace. This dashed line is the divider between your “records” (individual web pages).
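
If you have many pages, a short script can do this step for you. This is only a sketch: `join_pages` is a hypothetical helper name, and it assumes you have already read each page into a string.

```python
RECORD_DIVIDER = "--------"  # the 8-dash line between records


def join_pages(pages):
    """Put the 8-dash divider line in front of every HTML page."""
    chunks = []
    for html in pages:
        chunks.append(RECORD_DIVIDER + "\n" + html.strip() + "\n")
    return "".join(chunks)
```

The result is one long text with a divider line in front of every page, ready for the search and replace steps.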

Now, we are going to use that 8 dash line as our starting point for the next search and replace sequence. Search for the 8 dashes followed by a hard return (line break):

--------

and replace it with the 8 dashes, the hard return (line break) and the following:

--------
AUTHOR: Author Name
TITLE: Title of the Post or Article
STATUS: Publish
ALLOW COMMENTS: 2
CONVERT BREAKS: 0
ALLOW PINGS: 0
PRIMARY CATEGORY: Home
DATE: 7/10/2000 03:10:03 PM
-----
BODY:

Between BODY and the line above is a five dash line (-----). This is the separator between fields.
Adjust the information to your needs. Some of this information may need to be individually searched and replaced. For example, my articles have the author name in a unique DIV, so I searched and replaced:

<div id=author>Author Name</div>

with:

AUTHOR: Author Name

Replacing “Author Name” with the correct name.
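
The same idea can be scripted with regular expressions. The sketch below assumes markup like mine, with one H2 for the title and one author DIV per page; the names `header_fields`, `TITLE_RE`, and `AUTHOR_RE` are made up for this example, and any tags inside the title are left in place for you to clean up.

```python
import re

# Assumed markup: one <h2> per page for the title, one <div id=author> for the author.
TITLE_RE = re.compile(r"<h2[^>]*>(.*?)</h2>", re.IGNORECASE | re.DOTALL)
AUTHOR_RE = re.compile(r"""<div\s+id=["']?author["']?\s*>(.*?)</div>""",
                       re.IGNORECASE | re.DOTALL)


def header_fields(html):
    """Return (author, title) pulled from one page, or None where missing."""
    def first(pattern):
        match = pattern.search(html)
        if match is None:
            return None
        # Collapse line breaks and runs of spaces so the field fits on one line.
        return " ".join(match.group(1).split())
    return first(AUTHOR_RE), first(TITLE_RE)
```

A `None` in the result tells you which pages need the author or title filled in by hand.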

At the end of the post content, before comments and other information, search for the code that ends the post and replace it with a five dash line. For example, I had a DIV that states:

<div id="next"><p><a title="next article in the series" href="article42.html">Next Article: Article Name 42</a></p></div>

This is consistent (with a different file name) at the end of every article, so I could easily replace:

<div id="next">

with the 5 dash line as the field separator:

-----
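
Here is a rough Python version of that step. It goes a little further than my search and replace: it assumes you work on one page at a time and that nothing after the marker is worth keeping, so it throws away everything from the marker onward. The function name `close_body` is hypothetical.

```python
FIELD_DIVIDER = "-----"  # the 5-dash line that ends a field


def close_body(page_text, end_marker='<div id="next">'):
    """Drop everything from end_marker onward and end the body with 5 dashes."""
    head, found, _tail = page_text.partition(end_marker)
    if not found:
        # Marker missing: leave the page alone so it can be fixed by hand.
        return page_text
    return head.rstrip() + "\n" + FIELD_DIVIDER + "\n"
```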

Now, you will still have a lot of excess information left in your file: meta tags, sidebar information, CSS, and other lines of code that you won’t need any more. If they are consistent, search and replace to get rid of them. If they aren’t, get rid of as much as you can and then you will have to clean up the rest manually.

HTML Meets XHTML

To make sure that everything is XML compliant and ready for WordPress, I went through and checked all the code. Here is a list of my search and replaces:

  • <hr> --> <hr />
  • <br> --> <br />
  • Curly Quotes to text quotes (no character codes only quote marks)
  • Curly Apostrophes to plain apostrophes (no character codes only apostrophes)
  • [hyphen] --> - (hard-encoded dash, usually formed by pressing Ctrl+hyphen)
  • Double Blank Lines --> Single Blank Lines
  • img tag endings from "> to " /> (inspect each one before changing as "> is found on hyperlinks)

These are the most common. Your html code may be different, so either have it converted using special software or manually inspect for your own needs.
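
Most of the replacements in the list above can be run in one pass. This is a sketch rather than a complete converter: the `CLEANUPS` table and `clean_markup` name are my own, treating en and em dashes as the "hard encoded dash" is an assumption, and the img tag endings are deliberately left out because each one needs to be inspected.

```python
import re

# (pattern, replacement) pairs, applied in order.
CLEANUPS = [
    (re.compile(r"<hr\s*>", re.IGNORECASE), "<hr />"),
    (re.compile(r"<br\s*>", re.IGNORECASE), "<br />"),
    (re.compile("[\u201c\u201d]"), '"'),   # curly double quotes
    (re.compile("[\u2018\u2019]"), "'"),   # curly single quotes and apostrophes
    (re.compile("[\u2013\u2014]"), "-"),   # en and em dashes (assumed)
    (re.compile(r"\n{3,}"), "\n\n"),       # runs of blank lines -> one blank line
]


def clean_markup(text):
    """Apply every cleanup in turn and return the cleaned text."""
    for pattern, replacement in CLEANUPS:
        text = pattern.sub(replacement, text)
    return text
```

Tags that are already self-closed, such as `<br />`, are not matched, so running it twice does no harm.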

Manually Check the Data

Besides checking for XML compliance and characters that WordPress doesn’t like, I go through the data and clean up things that either might mess up the import, or that just need cleaning.

For instance, occasionally I would break a title into two lines with a line break. This won’t work, so I had to go through manually and clean those out. I sometimes also use an ID in the heading, such as <h2 id="information">, which would be missed in a simple <h2> search and replace. This has to be caught and corrected.

Other code, tags, and information that didn’t belong any more, and that wasn’t consistent across multiple pages, had to be deleted. This is time consuming, but it has to be removed.

When you think you have it all cleaned up, you probably don’t. Go through every bit of data and double check it. As you go, you will need to fill in the “missing” information from your earlier search and replaces. Check the following:

Author
Enter the name(s) of the article or post author(s). If these names deviate in any way from the spelling of the WordPress administrator’s name, new Author User Profiles will be created and the maximum permission level these users will have is 9. Out of ignorance, I set my User Profile name to be “Lorelle” and yet my import listed “Lorelle VanFossen”. “Lorelle” has level 10 top administrator status, but “Lorelle VanFossen” is stuck at user level 9. Do check your name carefully if you are the only author so your posts will be put into the administrator’s name.
Title
Add the title for each article or post. Think about the post title as you add it. If you choose to use permalinks, these will become the new link titles for your post. If your title was “Another Day at the Office” the permalink would become:

http://example.com/another-day-at-the-office

If this is the point of the post, leave it. If the story of the post is about the copy machine breaking down and your attempts to confront its innards results in a burst of toner that covers you from head to toe and the panic that followed, maybe the title should be more fitting and be “Assaulted by Copier” or something more memorable.

STATUS
You have two choices for your post status: publish or draft. If draft is chosen, the post will be added to your draft post list and you can access and edit it from the Write Post screen from a link below the menu tabs. If you choose publish, the post will be immediately viewable on your website after import.
ALLOW COMMENTS
You have two choices here, too. If you put the number 1 here, comments will be open. If you use a number 0, comments for this post will be closed. You can open them later, and it’s recommended that you set them to be closed until you have finished all the editing and checks after the import has been made, or people might be writing comments about how messed up the post is and you will spend more time checking comments than cleaning up your site. It’s up to you.

CONVERT BREAKS
Again, you have two value choices. To not convert breaks, use 0, and to convert breaks, use 1.
ALLOW PINGS
To allow pings, use the value 1, and to turn them off on this post, use 0.

PRIMARY CATEGORY
You have two choices for your categories, a primary category and a subcategory, called “category”. WordPress uses parent categories and the subcategories are also known as children categories. If your post has only one category, list it here. The category can be one or more words with spaces or dashes in between, but no commas or other characters.
Category
If your post is also in a subcategory or is a child category, state the subcategory here. If the post isn’t in a subcategory, remove this line.
Date
Manually edit the date based upon the American format of month/day/year and include the time as follows:

07/21/2003 03:10:03 PM

If you want a different date format, that is controlled from within WordPress. WordPress sorts posts chronologically from most recent to oldest. Even if dates aren’t important to your content, they are important for creating an order to the posts in a series. Posts which run in series should be dated as follows:

Article 1 March 15
Article 2 March 14
Article 3 March 13
Article 4 March 12
Article 5 March 11
Article 6 March 10

This way, the order is preserved even if the dates are unimportant.
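
If you have a long series, the dates can be generated instead of typed. The sketch below builds the date strings in the format shown above, starting with the newest date for Article 1 and stepping back one day per article. The function names are my own, and the year in the usage note is just an example.

```python
from datetime import datetime, timedelta


def mt_date(moment):
    """Format a datetime as MM/DD/YYYY hh:mm:ss AM|PM."""
    hour12 = moment.hour % 12 or 12
    suffix = "AM" if moment.hour < 12 else "PM"
    return (f"{moment.month:02d}/{moment.day:02d}/{moment.year:04d} "
            f"{hour12:02d}:{moment.minute:02d}:{moment.second:02d} {suffix}")


def series_dates(first_article_date, count):
    """Date article 1 latest and step back one day per later article."""
    return [mt_date(first_article_date - timedelta(days=i)) for i in range(count)]
```

For example, `series_dates(datetime(2003, 3, 15, 15, 10, 3), 6)` gives six DATE values running from March 15 back to March 10.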

Recheck Manually

Eyes get tired going through all this data, but take time to rest them by saving this information and then coming back to it at least six hours later. Overnight is even better. This way, you are refreshed and ready to look at all of this with new eyes.

Look for little details that might have been forgotten or missed in the first manual edits. Stray unwanted code might have slipped through, or a title, author, or some other detail might have been left out. Go through it all with a mental magnifying glass to see if you can catch anything that you don’t want or that might get in the way of the import.

Make sure that every web page record has the record divider of the 8 dashes and the fields are separated by 5 dashes. If anything is blank, delete it. Leave only the barest essentials you need for the import.

If you are really serious about this, spell check your post content and then do one more thorough edit to see if there is anything more that can be cleaned up.

With the HTML inside of the post content, make sure that all the tags are still there, and all open tags are closed and all self-closing tags are closed.
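
Tired eyes miss things, so a small script can do a first pass on the dividers and header fields. This is a sketch only: the list of fields in `REQUIRED` is my own guess at the barest essentials, not a rule taken from import-mt.php, and the function names are hypothetical.

```python
RECORD_DIVIDER = "--------"
FIELD_DIVIDER = "-----"
REQUIRED = ("AUTHOR:", "TITLE:", "STATUS:", "DATE:")  # assumed minimum set


def split_records(text):
    """Split the import text into records on lines of exactly 8 dashes."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == RECORD_DIVIDER:
            if any(item for item in current):
                records.append(current)
            current = []
        else:
            current.append(line.rstrip())
    if any(item for item in current):
        records.append(current)
    return records


def check_record(lines):
    """Return a list of problems found in one record (empty list = looks fine)."""
    problems = []
    if FIELD_DIVIDER in lines:
        # Only the lines before the first 5-dash line are header fields.
        header = lines[:lines.index(FIELD_DIVIDER)]
    else:
        header = lines
        problems.append("no 5-dash field divider")
    for key in REQUIRED:
        if not any(l.startswith(key) and l[len(key):].strip() for l in header):
            problems.append(f"missing or empty {key[:-1]}")
    if "BODY:" not in lines:
        problems.append("missing BODY:")
    return problems
```

Run `check_record` on each item from `split_records` and look closely at any record that reports a problem. It will not catch unclosed HTML tags, so the manual check is still needed.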

Things You Need To Know

A lot of questions come up during these imports and here are a few of them with the answers:

What about my intra-site links?
Intra-site links are links within a post or article that link to another one on your site. Leave them. They will import just fine and you can later either manually edit them to the correct URL address or redirect the links through the use of the .htaccess file rewrites and redirects.
Will quote marks or apostrophes halt my import?
Unlike an import directly into the MySQL database, WordPress’s import-mt process ignores quotes and apostrophes in the post content section so they will import without any problems. Just leave them alone.
What about links to my graphics and photographs? Will I lose them?
No, as long as you leave your graphics and photographs in the same folders that they currently reside in and replace the relative links to them with links that start at your site’s root. For example, a relative image link might be:

../../photos/travel/spain/barcelona42.jpg

Remove the dots and slashes to get the following, if photos is in your site’s root folder:

/photos/travel/spain/barcelona42.jpg

If it isn’t, then add the parent folder before photos folder with a forward slash in front of it. If you keep all your graphics and photographs in specific folders, this can be easily changed later, after the import, if you have problems seeing the images.
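
The image link change can also be scripted. This sketch assumes every `../` run in a src attribute should resolve to the site root (or to the parent folder you pass in); the names `root_relative_images` and `REL_SRC_RE` are made up for the example, and links that already start with a slash are left alone.

```python
import re

# src="../../photos/..." -> src="/photos/..."; only leading ../ runs are touched.
REL_SRC_RE = re.compile(r"""(\bsrc\s*=\s*["'])(?:\.\./)+""", re.IGNORECASE)


def root_relative_images(html, parent=""):
    """Rewrite ../-style src values to start at the site root."""
    folder = parent.strip("/")
    prefix = "/" + folder + "/" if folder else "/"
    return REL_SRC_RE.sub(lambda m: m.group(1) + prefix, html)
```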

What will happen to my styles listed in the head of each web page?
If you have any styles listed in the head of any web pages, this information must be moved into the core style.css for the WordPress Theme you are using, or set into another separate style sheet that you can add later using one of the many WordPress conditional tags within the WordPress PHP Loop. If it is not saved, it will be lost, as such information must be removed from the import file.

Any inline styles such as:

<p style="font-size:110%; color: green; margin: 10px">

will remain undisturbed so you can leave them.

Will it import duplicate posts?
By default, the WordPress import scripts will not import duplicate posts, saving you a lot of time and effort to track those down.

Begin the Import

Once you have triple checked your import document, and your concerns have been answered, then it is time to put WordPress’ import-mt.php to work.

Save the file as import.txt, making sure that the file is indeed a text file and not any other type. Upload the file to the wp-admin folder on your WordPress site. Then direct your browser to the following address, using your specific website information:

http://example.com/wp-admin/import-mt.php

The rest of the process is up to WordPress.

If you do get an error, it is usually very specific so you can track it down. Otherwise, the sign of trouble is that not all of the posts import. WordPress will show you the list of what has been imported. If some posts haven’t been imported, this is usually because the 8-dash and 5-dash lines weren’t set right, or because some other detail is not right in the import file, like “category” being misspelled. Carefully check that post against the other ones to help find the error.

If you find the error, you can either copy and paste that page record into its own import.txt file and repeat the import, or reimport the fixed original import.txt file. Because WordPress will not permit duplicate posts to be imported, the fixed post will be imported and the rest should be ignored.
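
If the import file is large, pulling one record out by hand is fiddly. Here is a small sketch that finds a record by its exact TITLE line; `extract_record` is a hypothetical helper, and it assumes the title is unique in the file.

```python
RECORD_DIVIDER = "--------"


def extract_record(import_text, title):
    """Return the single record whose TITLE matches, divider line included."""
    wanted = "TITLE: " + title
    record = []
    for line in import_text.splitlines():
        if line.strip() == RECORD_DIVIDER:
            if wanted in record:
                break  # the record we want is complete
            record = []
        else:
            record.append(line.rstrip())
    if wanted not in record:
        return None
    return "\n".join([RECORD_DIVIDER] + record) + "\n"
```

Save the returned text as its own import.txt, fix the problem there, and run the import again.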

If you are really having trouble with a particular post, then manually add it to WordPress through the Administration Write Post screen, copying each bit of information into the right spot.

If it worked great and everything imported, then it’s time to start checking the results in WordPress. To view the new material, type your site’s URL in the browser and crawl around looking. There may be a few little bits and pieces that aren’t right, but you can now go into the WordPress Admin area and edit your posts to clean up these details. As long as they are in the database, you can do anything you want with them.

Our goal has been achieved!

4 Comments

  • Posted February 19, 2007 at 11:50 | Permalink

    I’m finally getting around to using this piece of yours to guide my import — you’d recommended it to me at your Lorelle on WordPress Comments page. So far so good, but a question. My WP site is set to display authors based on the Nickname instead of the Username. Should I make my Author items in the import match the Nickname or the Username?

  • Posted February 19, 2007 at 22:25 | Permalink

    Good question, and apologies for the mess here. At least I think it’s a mess. ;-)

    If your WordPress site is set up to display authors based upon the nickname, you better use it. I would not use username as that is part of the password access. I would use one of the other author template tags in the WordPress Theme to feature the author’s name. Then your import should match “Author”.

    After the import, you can use the Users panel to move posts around to the “right” author or author name very easily.

    Good luck and let me know how this turns out. It was a long and tedious process to figure this out.

    And I am anticipating redoing all these instructions for Lorelle on WordPress next month, after the current Plugin series is OVER! Can’t wait. ;-)

  • Posted February 26, 2007 at 15:36 | Permalink

    Well, I completed my import work. Generally, a success. Thanks so much for your advice here, it was immeasurably helpful. A few things worth noting about how it went:

    Uploading: The import appears to be a bit different now than in your instructions. No need to upload the file to a spot first, the import itself allows you to point directly to a file on your local drive. Convenient, especially when importing multiple files, which I had.

    Authors: The import, presumably also due to changes, allows you to choose an existing author to map import file posts to, so that whole thing was basically a non-issue. Piece of cake to handle.

    Allow pings: Doesn’t work! I had ALLOW PINGS: 1 as directed, but all imported posts came out with the box unchecked. Alas.

    Search results affected: Whether searching from the site itself or doing a search within the admin interface such as at Manage Posts, I discovered that many of my imported posts weren’t showing up, but some were. I soon realized that the ones that were showing up were ones that I had gone in to modify after the import. It appears that importing alone isn’t good enough to get that content into whatever index WP uses for the search functions. For that, it appears you have to hit a manual save on each individual post. Good thing I wanted to add Ultimate Tag Warrior tags in my posts anyway, so I’ve got a good excuse to go into each one. But, of course:

    Tags: Naturally, since UTW is a plugin, there is no way to import tags automatically, so, bad news, you have to go in manually to each to add them! It sure would have been nice if there was a way to map the MT Keywords to UTW Tags. Alas. So be it.

    Multiple files: I consistently got time-out errors if I tried to import more than 30-32 posts at a time. I quickly realized that I had to just break things up into blocks of that size. No big deal.

    Well, I think that’s it. Overall, a great way to get tons of content in quickly, and good to be aware of the handful of pitfalls. Thanks again for your help!

  • Posted February 27, 2007 at 0:12 | Permalink

    You are a star. Thank you for the excellent feedback and tips. That will certainly help in updating the article when I put it on Lorelle on WordPress.

    About the search feature – that’s very odd. I’m going to ask about that and see what’s going on. I never had to do that.

    This is terrific info. Thank you so very much!
