How to Use Google Sitemap Generator with WordPress MU

As you might know, it is important to create an XML sitemap for your site so that you can specify which links you want search engines to crawl. While it might be argued that today’s search engines would eventually crawl your entire website even without one, an XML sitemap helps you expose links that might not be easily discovered otherwise, among other things. You also need one if you want to use Google’s webmaster tools effectively.

For users of WordPress, there is a popular plug-in called Google Sitemap Generator (aka Google XML Sitemaps). This plug-in can automatically generate your sitemap and submit it to Google, MSN Search, and Yahoo among other search engines. You can also generate a sitemap manually from the plug-in options page or by using a GET query if required.

By default, search engines look for the sitemap in the root directory of a website. This does not work for a platform such as WordPress MU, because MU uses virtual sub-domains to construct the addresses of the blogs you create, so the root directory of every blog, including the main blog, is the same. As a result, the sitemaps of all your MU blogs will overwrite each other unless each blog’s sitemap is saved in its own location and that address is served to the search engines’ crawlers.

Figure 1: blogs.dir directory structure

WordPress MU stores each blog’s files in the following structure, relative to your MU root directory: wp-content/blogs.dir/blog_id/files, where blog_id is a unique ID number assigned to each blog when it is created. The main blog always has an ID of 1.
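
As a small illustration of that layout, the shell sketch below builds the files path for a given blog ID. The function name blog_files_dir is made up for this example and is not part of WordPress MU.

```shell
#!/bin/sh
# Hypothetical helper (not part of WordPress MU): print the files
# directory of a blog, relative to the MU root, from its numeric ID.
blog_files_dir() {
    printf 'wp-content/blogs.dir/%s/files\n' "$1"
}

blog_files_dir 1   # main blog
blog_files_dir 3   # a blog whose ID is 3
```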

There is a post at richardpalace.com that introduces a hack to make Google Sitemap Generator work with WordPress MU. The post is titled Google Sitemap for WordPress MU Plugin 1.513101, and it shows the steps you need to follow for sitemaps to work properly with MU. I followed that post for my blog, but found some redundant or unneeded steps, especially if you are the only administrator of your MU installation, that is, you do not have other users with their own blogs.

So, in this post I will outline the steps I followed with as many illustrations as possible and try to make the process clearer. I will refer to and quote the original article, since my post is based on it.

Update Google Sitemap Generator

First of all, update Google Sitemap Generator to the latest version. We need to change a core file of that plug-in, which means that every time the plug-in is updated the change will be overwritten and we will have to make it again, at least until an automated solution comes up. I assume you have the latest version of WordPress MU, but that does not really matter since, contrary to what the reference post says, we will not be modifying any MU core files.

Modify sitemap-core.php

Our next step is to modify the plug-in source file sitemap-core.php, which is found in wp-content/plugins/google-sitemap-generator/. Open the file in your favorite source editor (I use Notepad++) and look for the second occurrence of return $res. For me, using version 3.1.4 of the plug-in, that is on line 954, inside a function called GetHomePath(). Right before that line, add the following code:

global $blog_id;
$res .= 'wp-content/blogs.dir/'.$blog_id.'/files/';

What we have done here is tell the plug-in to return the WordPress MU blog’s own directory for each blog rather than always returning the root directory. This solves the main issue, because it allows the plug-in to save each blog’s sitemap in a directory specifically assigned to it.
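
If you would rather find the insertion point from a shell than from an editor, the sketch below lists every line of a file that contains return $res, together with its line number. The function name find_return_res is hypothetical, and the line numbers it reports will differ between plug-in versions.

```shell
#!/bin/sh
# Hypothetical helper: list the numbered lines containing "return $res"
# in the given PHP file. -F treats the pattern as a fixed string, so the
# dollar sign is not interpreted as a regular-expression anchor.
find_return_res() {
    grep -n -F 'return $res' "$1"
}

# Usage, from the WordPress MU root directory:
# find_return_res wp-content/plugins/google-sitemap-generator/sitemap-core.php
```

The second line of the output is the one to edit, provided it falls inside GetHomePath().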

Note that the original code suggested by the reference post has an if-condition which excludes the main blog from being pointed to blogs.dir. That is supposed to mean that the sitemap files for the main blog would go into the root directory. But as we will see next in the .htaccess modifications, the rewrite rules always redirect to blogs.dir, so leaving the condition out is fine and works for our purpose.

Setting Up Mime-Types

This step is needed to add the XML sitemap (.xml) and its compressed counterpart (.xml.gz) to the WordPress MU list of trusted file types. You most probably do not need these steps if you are the only administrator of your MU installation, but I chose to do the easy one, which does not involve changing any MU files. If you have other users with their own blogs in your MU installation, you need to do both steps.

From your dashboard, go to Options under Site Admin. Scroll down if needed and you should come across a text field labeled “Upload File Types”. At the end of this text field’s content add a space then xml then another space and xml.gz. It should look something like the image below.

Figure 2: Site Admin, Options. File types text field.

Now if you need to do the other setting, which involves adding the xml mime-types to WordPress MU core source files, I would advise that you install and use a plug-in such as WordPress Mime-Config which allows you to configure extra mime-types without modifying any code. In the plug-in options for Mime-Config you need to add extension xml and text/xml as the corresponding mime-type.

Note that the Mime-Config plug-in is old. It should work properly with the current version of WordPress MU, but I have not tried it myself.

Modifying the .htaccess File

The .htaccess file contains configuration directives that are read and acted upon by the Apache web server. One of its uses is to define mod_rewrite rules, which allow WordPress to do all the magic things with URIs (aka URLs) like pretty links and redirections. Although some of you might have IIS 6 with a mod_rewrite emulator installed for WordPress MU to work, I am going to assume that such a setup behaves exactly like Apache when it comes to rewriting, since I have no information about or experience with the IIS setup.

The .htaccess file can exist in many places throughout your website(s), but the one we are interested in is in the root directory of the WordPress MU installation. You can download this file through FTP, modify it locally on your computer, and upload it again, or edit it through shell access to your hosting server (if your hosting provider offers this feature). Personally, I have a VPS setup and prefer to do most, if not all, of the work in an SSH client.

Here you have to do steps 11 and 12 from the article on richardpalace.com and then save the file.

11. Open .htaccess and find

RewriteRule ^(.*/)?files/(.*) wp-content/blogs.php?file=$2 [L]

12. After the above line, add the following: (The lines might wrap. Each line starts with RewriteRule and ends with [L])

RewriteRule ^(.*/)?sitemap\.xml$ wp-content/blogs.php?file=sitemap.xml [L]
RewriteRule ^(.*/)?sitemap\.xml\.gz$ wp-content/blogs.php?file=sitemap.xml.gz [L]
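
Once the rules are saved and the sitemaps have been generated, you may want to check from a shell that both addresses respond. The sketch below only builds the two URLs for a blog address; the function name sitemap_urls and the address blog1.example.com are placeholders, and the commented loop assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the two sitemap URLs for one blog address.
# A trailing slash on the address is removed first.
sitemap_urls() {
    base=${1%/}
    printf '%s/sitemap.xml\n%s/sitemap.xml.gz\n' "$base" "$base"
}

# Usage (needs curl and network access; blog1.example.com is a placeholder):
# for url in $(sitemap_urls "http://blog1.example.com/"); do
#     curl -s -o /dev/null -w "%{http_code} $url\n" "$url"
# done
```

A 404 at this stage may simply mean the sitemap has not been generated yet for that blog.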

Setting the Google Sitemap Generator Options

Finally, browse to the options page of the Google Sitemap Generator plug-in for each blog on your MU installation and make the necessary settings. In particular, make sure that the selected location of the sitemap is wp-content/blogs.dir/blog_id/files, then generate your sitemap(s). The plug-in will take care of submitting them to the search engines you have selected.

Solving Permission Errors

If you get permission errors from the plug-in when trying to generate the sitemaps, you have to do some extra work: create the files yourself and make them writable. This should work through FTP. Create two empty files on your computer with the text editor of your choice, one called sitemap.xml and the other sitemap.xml.gz. Upload these files to the files directory of each blog under blogs.dir, for example blogs.dir/1/files, blogs.dir/2/files, etc., as shown in figure 3.

Figure 3: Sitemap files locations, sitemap.xml and sitemap.xml.gz

Note that if the blog_id directory and/or its files directory does not exist, you need to create it first and give it the right ownership. I will explain how to do this in an SSH client session.

Assuming you are in the root directory of your WordPress MU installation, do the following:

1. From the Blogs option in the Site Admin menu of your dashboard, take note of all the blogs that you want to create a sitemap for and their corresponding ID values.

2. In your SSH client, and assuming you are already in your WordPress MU root directory:

cd wp-content/blogs.dir

3. For each blog in the list you created in step 1, do the following:

a. Create the corresponding blog_id directory if it does not exist. Suppose the blog_id is 3:

[root@xyz blogs.dir]# mkdir 3
[root@xyz blogs.dir]# cd 3

b. In the blog_id directory, create a directory called files if it does not already exist:

[root@xyz 3]# mkdir files
[root@xyz 3]# cd files

c. In the files directory, create the two sitemap files required by the sitemap plug-in. Note that we are creating empty files, so the idea is to open the vi editor with the name of the file we want to create, write the file to disk, and quit. The vi commands use a colon “:” followed by a letter or phrase:

[root@xyz files]# vi sitemap.xml
:w
:q
[root@xyz files]# vi sitemap.xml.gz
:w
:q

d. Now you need to set the permissions on the two files so that they are writable by the plug-in.

[root@xyz files]# chmod 777 sitemap.xml
[root@xyz files]# chmod 777 sitemap.xml.gz

e. Go back to the blogs.dir directory:

[root@xyz files]# cd ../../

f. Repeat for the other blogs.

4. After you finish the procedure for all the blogs you listed in step 1, go to the blogs.dir directory and change the ownership of all the blog_id directories from root (which is the login you most probably used for the SSH session) to your web-user account. The syntax of the command is as follows:

[root@xyz blogs.dir]# chown -R <webusername>:<webusergroup>  *

Note that the web user’s group name is usually the same as the web user’s account name; for example: chown -R webmaster:webmaster *

Note: Be careful not to change ownership in any directory other than the one specified here, because it could have dire consequences for your site, especially if you somehow end up doing this in the root of your VPS. That would be a disaster-level event. Trust me, I know!
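
If you have many blogs, steps 3a to 3d can be scripted instead of typed by hand. The following is only a sketch under a few assumptions: the function name prepare_sitemap_files is made up, it uses touch instead of vi to create the empty files, and it does not change ownership, so the chown from step 4 still has to be run afterwards.

```shell
#!/bin/sh
# Sketch of steps 3a to 3d for several blogs at once.
# prepare_sitemap_files is a hypothetical name; the first argument is the
# path to blogs.dir, the remaining arguments are blog IDs.
prepare_sitemap_files() {
    blogs_dir=$1
    shift
    for id in "$@"; do
        dir="$blogs_dir/$id/files"
        # -p creates the blog_id and files directories only if missing.
        mkdir -p "$dir" || return 1
        for name in sitemap.xml sitemap.xml.gz; do
            # touch creates an empty file; the content of an existing
            # file is left as it is.
            touch "$dir/$name" || return 1
            # Same mode as in step 3d above.
            chmod 777 "$dir/$name" || return 1
        done
    done
}

# Usage, from the WordPress MU root directory, for blogs 1, 2 and 3:
# prepare_sitemap_files wp-content/blogs.dir 1 2 3
```

Passing the blogs.dir path as an argument keeps the function from touching anything outside the directory you name, which matters given the warning above.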

After you complete all the steps outlined here, you should be able to have the Google Sitemap Generator plug-in create the sitemap from the plug-in’s options page of each blog.

Other Options

There is a plug-in on WPMU Dev Premium which is a hacked version of All-in-One SEO. You can install it in your mu-plugins directory and it will create a sitemap for each blog you create and submit it to Google.

Check the plug-in post Sitemaps and SEO – WordPress MU Style for more information. You need to join the WPMU Dev premium service in order to be able to download and use this plug-in.
