with Lorelle and Brent VanFossen

Manually Importing Into WordPress Databases

Throughout this long and tedious process, I even tried manually importing my static html pages into WordPress, after they had been converted to XHTML, cleaned up and restructured to meet the manual MySQL import layout. Since I was working with material I’d already cleaned up, changing it from the import-mt format to the MySQL import format was fairly simple, but not that easy.

This process will work if you have already installed WordPress and everything is working fine in the default format. You’ve done a few test posts and everything is good, then you can do the manual import. Again, for those just jumping into this WordPress import process, the import data must be XML validated, so make sure it is validated as XML before you begin. And some cleanup after the import will also have to happen, but the majority of your critical data will be in the database.

view of the SQL panel of PHPMyAdminThere are two methods to get the information into your database. One is INSERT and the other is LOAD DATA INFILE. After a lot of research, I thought the best one for me was INSERT as it felt much more open and easy to work with, but it turned out that LOAD DATA INFILE, while requiring the data to be in the exact field in order, was the best for importing the majority of the information directly into the database. I did use INSERT later, as I became more familiar with the process, but only for much smaller file uploads.

Using LOAD DATA INFILE

The LOAD DATA INFILE gets information from a specific import file and loads it into the database. The import file must be on the server site. The format which follows comes form the MySQL documents on the LOAD DATA command. The format is:

LOAD DATA LOCAL INFILE 'file_name.txt'
     [REPLACE | IGNORE]
      INTO TABLE tbl_name
     [FIELDS
     [TERMINATED BY '\t']
     [[OPTIONALLY] ENCLOSED BY '']
     [ESCAPED BY '' ]
  ]
   [LINES
     [STARTING BY '']
     [TERMINATED BY '\n']
  ]
   [IGNORE number LINES]
   [(col_name,...)]

The WordPress v1.5 wp-post table has the following fields to import into, and while these are the default fields, they may be different in your database. Check your database’s WordPress installation to make sure they match.

1. ID
2. post_author
3. post_date
4. post_date_gmt
5. post_content
6. post_title
7. post_category
8. post_excerpt
9. post_status
10. comment_status
11. ping_status
12. post_password
13. post_name
14. to_ping
15. pinged
16. post_modified
17. post_modified_gmt
18. post_content_filtered
19. post_parent
20. guid
21. menu_order

Look at your data and determine which elements you need. I needed:

post_author
post_date
post_content
post_title
post_category
post_excerpt
post_status

Like the MoveableType import, the order of the data is critical and you must know the specific names of the fields and their place in the structure in order to properly import the information into your database.

The structure of the import file must be one “record” per line (with no line breaks inside the record) and quotes marks and commas between each “field”. This complicates things a little.

The catch is that the quote marks in the HTML won’t allow the import file to actually import into the database. The quote marks have to be “escaped”. This means that they have to be hidden from the import process.

Let’s make this easier to understand since it took me days to figure it out. If the import process detects a quote mark, it thinks the import is finished, at the end of record, or something other than what your intention is. To keep the quote marks around things like:

<img src="ball.fig" height="50" width="45" />

in your import, and not make the import break when it encounters the quote mark, you have to put a a backslash before EVERY quote mark in your data. This sounds like the perfect solution, but wait! There is hope.

As you do the massive search and replace process to change the form you have the information in to match the form and order structure of the import, when you are ready to begin searching and replacing the identifying text and markers for the beginning of the different table fields, STOP.

In the fourth line of the MySQL import format for LOAD DATA INFILE you will see the “Fields”. This is where the fields need to be set, as listed above. Each Field is separated by a quote mark, comma, quote mark. If we put these into the import file, then search and replace every quote mark with a backslash quote mark (“) to “escape” the quote marks, we’ve just killed our fields.

Before putting in these field separators, we need to “make our escape” – search and replace every quote mark with the backslash quote mark (“). Once that is done, then we can search and replace each of the line breaks or dividing codes to replace them with quote mark, comma, quote mark (“,”).

It is very important that all of the information in the import file be ONE LINE PER RECORD. That means that every “page” is a record and it must not have an ENTER, line break, or hard return (whatever you want to call it) inside of the record as that means it is the end of the record. To create this break-less record, all of the line breaks in your HTML must be removed. I recommend that you first set the beginning and ending of the record sections with some unique markings like XXXQQQ so you know where the record begins and ends. Then search the entire document for line breaks and replace them with nothing. Not even a space bar. Then search for the XXXQQQ and replace it with the closing quote, line break, open quote to end the field previous and start the next field and record. Be sure and add the missing quotes at the beginning of the very first record at the top of the document, and also check for a closing quote on the very last one at the bottom of the document, as those can get forgotten.

If you are using a text editor that forces line breaks, this is bad news. Make sure whatever program you are using will not force line breaks while you are using it, or while saving the file as that will totally screw up the import.

My import command using LOAD DATA INFILE looked like this:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
     (post_author,post_date,post_content,post_title,post_category,post_excerpt,post_status)
     FIELDS TERMINATED BY ','
     ENCLOSED BY '"'
     LINES TERMINATED BY '\n';
     ESCAPED BY '';

And the first two records (pages) of the condensed version of the import file looked like this:

"Lorelle VanFossen","11/25/2004 03:31:05 PM","<div class="boxblue"><div class="bluetitle"><span>Checking Content</span></div><div class="bluebody"><span>Begin your web page validation</span> by checking the content of your page. It's actually fairly simple. Is everything spelled correctly? Are the sentences compete? Is the grammar right? While this should be done at the time of creation, after you've messed around with the code for a while, you often have a fresh perspective on the content, so take another look to make sure it says what you want it to say, and it says it well and accurately......","Validating the Code Behind the Page","Learn","Working with HTML, CSS, and web page validators to validate web pages for accessibility and search engines.","publish"
"Lorelle VanFossen","11/25/2004 03:31:05 PM","<p>In 2003, experts are saying that the ratio of virus-infected messages to other e-mail traffic increased by nearly 85 percent. If this doesn't scare you into purchasing an anti-virus software program and keeping your operating system up-to-date, I don't know what will.</p><p>If you haven't protected your computer against the people with too much time on their hands who think that violence and trouble is a way to win favor in the world, we have some tips to help you get started leraning about.....","Viruses, Trojans, Worms, and Other Email Miseries","Learn","Tips on avoiding and stopping viruses, trojans, worms, and email invaders and spam.","publish"

I also learned quickly that the size of the file to import had to be less than 2 megabytes. Maybe this is a limitation of my host server’s MySQL version, or something else, I could not find the reason, but if I passed 2 megabytes in the file size, it often choked. So this was piece meal work, but it worked.

Using MySQL INSERT

I also tried to use the MySQL INSERT command, which worked great for small imports, like for categories with only a few posts. I was trying everything, so here is my attempt to use MySQL INSERT to get my web pages into WordPress.

Like LOCAL DATA INFILE, I had to “escape” the quote marks and keep each record on its own line, but I could play with the order of the fields using INSERT because their structure is set in the first part of the SQL statement. Here is a condensed version of two of my web pages being loaded into the WordPress database on my site using INSERT:

INSERT INTO wp-posts ("post_excerpt", "post_title", "post_author", "post_date", "post_category", "post_content", "post_status")
VALUES ("Working with HTML, CSS, and web page validators to validate web pages for accessibility and search engines.","Validating the Code Behind the Page","Lorelle VanFossen","11/25/2004 03:31:05 PM","Learn","<div class="boxblue"><div class="bluetitle"><span>Checking Content</span></div><div class="bluebody"><span>Begin your web page validation</span> by checking the content of your page. It's actually fairly simple. Is everything spelled correctly? Are the sentences compete? Is the grammar right? While this should be done at the time of creation, after you've messed around with the code for a while, you often have a fresh perspective on the content, so take another look to make sure it says what you want it to say, and it says it well and accurately......","publish"
"Tips on avoiding and stopping viruses, trojans, worms, and email invaders and spam.","Viruses, Trojans, Worms, and Other Email Miseries","Lorelle VanFossen","11/25/2004 03:31:05 PM","Learn","<p>In 2003, experts are saying that the ratio of virus-infected messages to other e-mail traffic increased by nearly 85 percent. If this doesn't scare you into purchasing an anti-virus software program and keeping your operating system up-to-date, I don't know what will.</p><p>If you haven't protected your computer against the people with too much time on their hands who think that violence and trouble is a way to win favor in the world, we have some tips to help you get started learning about.....","publish")

Either method will work, so you can try them when the normal import methods fail for getting your web page material into the WordPress database.

2 Comments

  • Posted December 1, 2005 at 18:34 | Permalink

    Just out of curiosity, do you know if there is a limit on the number of pages (static posts) WP can handle. I tried to import about 1,300 pages and it caused things to go batty and not allow me to access any of the pages. But, if I put in 50, all was good in the hood.

  • Posted December 2, 2005 at 0:31 | Permalink

    To answer your question, we need to clear some things up. First of all, you can import Pages or posts into WordPress. The problem is that all will come in as posts.

    The idea of more than 3-5 “static posts”, know as Pages in WordPress, is outrageous. If you mean Pages, then you are thinking old thought and old school web design and not modern thinking. Posts are web pages, and they can consist of articles, posts, blogs, or whatever name for the content you want.

    Keep your articles or posts as “posts” and not Pages and you will be able to use the full power of WordPress. Make them Pages, and you will have lots of problems. For instance, as far as I know, currently any searches of your site in WordPress will only include posts not Pages. Pages are for About, Contact, Site Map, static information outside of the context of your main content.

    Now that this is cleared up, we have two more issues at hand.

    First is the fact that MySQL has limits on how much information you can import at a time. This is not an issue of WordPress. I found that you are limited to about 1 meg. I was able to do 1.5 megs but it croaked on 2 megs. It also depends upon the version and your host limits and they can freak if you make too many hits on your database at one time. This caused me no end of trauma. I had to wait an hour between imports for the massive amount of data I had when I triggered the “too many hits” on the server. Dumb.

    Second, WordPress can handle tons of information in the database. I used to know exactly, but trust me, you can write a novel every day of the week for years on end and not reach that peak.

    The size of content, post size, and such is only limited by the database, usually MySQL’s limitations, not by WordPress. WordPress can handle just about anything you give it. It’s the database limits you have to worry about. So enter years and years of posts and you’ll not have to worry about anything but the limits set by your host.

    Importing is a pain, but it’s worth it to have all your content follow you into your new site structure.

    Oh, and there are no limits on the number of posts or Pages you can have within a WordPress site, just remember that if you use Pages to create your posts, you will regret it really fast. I tried that and I learned the hard way. Learn from me and you won’t suffer so bad.

    I hope this helps clear things up.

Post a Comment

Your email is kept private. Required fields are marked *