DEV Community

Bouchaala Reda

Migrating a WordPress blog from subdirectory to subdomain without losing URL structure with Nginx

If you just want the full Nginx config, jump to the end of the post.

From an SEO perspective, whether to use a subdirectory or a subdomain for your blog/site is a subject of debate. Have a look at this article for an example.

The goal of this post, however, is to help you physically move your WordPress site from a subdirectory to a subdomain without actually changing your URL structure (i.e. it still looks like it lives in a subdirectory).

The reasons why you'd want to do this may vary. Here are mine:

  • We had a WP blog that lived in the same repository as the main website (let's call it site.com).
  • In order for the SEO team to manage the blog, they had to go through the tech team every time they needed something changed/added.
  • We have a lot of blog traffic, and we didn't want to lose any traffic or SEO credit by moving everything from site.com/blog to blog.site.com.

So the requirements for this migration are now clear:

  1. Move the WP blog from the site.com repository to a managed WordPress hosting provider like HostGator, Hostinger, etc.
  2. The blog under site.com/blog should still be accessible by visitors as normal, and it should serve the new managed blog at blog.site.com.
  3. blog.site.com MUST NOT be accessible directly by visitors, only via site.com/blog.
  4. Blog traffic must be served with HTTPS.

Requirement number 1 is quite straightforward. We are left with requirements 2 to 4.

The way I did it was by adding an Nginx reverse proxy on the main website (site.com) to serve the blog contents from blog.site.com as if they were hosted under site.com/blog. So let's make a start on that.

Here's a diagram of how things work. We'll dive into the Nginx config next.

[Diagram: request/response flow between a visitor, Nginx, and our blog]

Nginx Reverse Proxy Config

A basic reverse proxy config looks like this:



# Any URL path that starts with blog will be using this config block.
location /blog {
    # Request paths coming into our main site will be /blog/something
    # But we want to send requests to blog.site.com as /something
    # So we use rewrite to strip /blog/ from the request path
    rewrite /blog/(.*) /$1  break;

    proxy_pass http://blog.site.com;
}



This is the base config we can work with. Nginx basically catches any request made to /blog and fetches the contents from blog.site.com: a reverse proxy in its simplest form.

Problem 1: We're not using HTTPS

But as you can see, we're using HTTP and not HTTPS, which does not fulfill requirement number 4. So let's configure Nginx to use HTTPS and secure the connection to the blog using HTTP Basic Auth.



location /blog {
    # ...
    rewrite /blog/(.*) /$1  break;

    # SSL config for proxy
    proxy_ssl_server_name on;                   # 1
    proxy_ssl_session_reuse on;                 # 2
    proxy_set_header Host blog.site.com;        # 3
    proxy_set_header X-Forwarded-Proto https;   # 3
    proxy_set_header X-Forwarded-Port 443;      # 3
    proxy_set_header X-Real-IP $remote_addr;    # 3
    proxy_set_header X-Forwarded-Host $host;    # 3

    proxy_set_header Authorization "Basic {CREDENTIALS}"; #4

    # Proxy
    proxy_pass https://blog.site.com;
}



Here's an explanation of the directives we added to our config:

  1. proxy_ssl_server_name on; makes Nginx pass the server name via TLS SNI (Server Name Indication) when it connects to blog.site.com. This is typically required when the blog's host serves several websites, each with its own SSL certificate, from one server using one IP address: with SNI, the host knows which SSL certificate to present. Note that this directive was introduced in Nginx 1.7.0, so make sure you're using 1.7.0+.
  2. proxy_ssl_session_reuse on; re-uses the previously negotiated SSL session to do an abbreviated handshake, which is cheaper than doing a full, CPU-intensive handshake each time we connect to the blog. This is a performance improvement (on is also Nginx's default).
  3. The proxy_set_header directives set some required and informational headers to be sent along with the request to blog.site.com, e.g. the correct Host header.
  4. Here we set the Authorization header for HTTP Basic Auth. The username/password need to be configured at the blog.site.com level (if you're using managed WP hosting, there will most likely be an HTTP Basic Auth section somewhere in the site config). {CREDENTIALS} is just a placeholder; replace it with the base64 encoding of username:password.
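
For readers who run blog.site.com on their own Nginx server rather than on managed hosting, the other side of point 4 might look roughly like the sketch below. It uses Nginx's auth_basic and auth_basic_user_file directives; the realm text and the password file path are assumptions chosen for illustration, and the rest of the server block is left out.

```nginx
# On the server that hosts blog.site.com (NOT on site.com).
server {
    listen 443 ssl;
    server_name blog.site.com;

    # ssl_certificate / ssl_certificate_key directives omitted in this sketch.

    # Require HTTP Basic Auth for every request to the blog.
    auth_basic           "Restricted";           # realm text, arbitrary
    auth_basic_user_file /etc/nginx/.htpasswd;   # hypothetical path

    # ... the rest of your WordPress config ...
}
```

The username and password stored in that file are the ones the reverse proxy sends, base64-encoded, in its Authorization header.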

Our Nginx config is now ready, and it will start serving the blog traffic from blog.site.com as requested.

Problem 2: Incorrect links in blog pages

We're now faced with another problem. The WP blog at blog.site.com will have all of its page links point to blog.site.com/page-url, which exposes our managed blog and breaks our requirements.

That means that when a visitor first opens our blog, everything will look good, but whenever the visitor clicks on any link on the page, they'll be redirected to blog.site.com/page-url. Definitely not what we want.

Fortunately, Nginx can help us with that as well. The solution is to use Nginx's ngx_http_sub_module, which lets us modify the response from blog.site.com by replacing string occurrences with other ones before sending it to the visitor. Let's see how we might do that by adding to our previous Nginx config:



location /blog {
    # ...
    rewrite /blog/(.*) /$1  break;

    # SSL config for proxy
    proxy_ssl_server_name on;                   # 1
    proxy_ssl_session_reuse on;                 # 2
    proxy_set_header Host blog.site.com;        # 3
    proxy_set_header X-Forwarded-Proto https;   # 3
    proxy_set_header X-Forwarded-Port 443;      # 3
    proxy_set_header X-Real-IP $remote_addr;    # 3
    proxy_set_header X-Forwarded-Host $host;    # 3

    proxy_set_header Authorization "Basic {CREDENTIALS}"; #4

    # 5
    proxy_set_header Accept-Encoding "";

    # 6
    sub_filter_once off;
    sub_filter_last_modified on;
    sub_filter_types text/html text/css text/xml text/javascript application/json;
    sub_filter 'blog.site.com' 'site.com/blog';

    # 7
    sub_filter 'src="/wp-content/' 'src="/blog/wp-content/';

    # 8
    sub_filter 'http:' 'https:';

    # Proxy
    proxy_pass https://blog.site.com;
}



Let's explain what we added there:

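  5. Disable response compression (by sending an empty Accept-Encoding header to the blog), which is required to be able to change the response.
  6. We substitute every occurrence of blog.site.com in the response with site.com/blog on all HTML, CSS, JS, XML & JSON response types. sub_filter_once off makes Nginx replace every occurrence rather than only the first one, and sub_filter_last_modified on preserves the original Last-Modified header, which Nginx would otherwise remove when it modifies a response.
  7. We prefix root-relative asset URLs sent by WP (src="/wp-content/...") with /blog/.
  8. We just replace all insecure http: links with secure https: ones so that we don't get any browser errors, since we are using HTTPS on our main site and also between site.com & blog.site.com.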
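
Note that the /wp-content/ rule only matches src attributes. If your theme also emits root-relative href attributes, or assets under /wp-includes/ (an assumption, so check your page source first), extra sub_filter lines along these lines could cover them. These rules are hypothetical additions and are not part of the config described in this post.

```nginx
# Inside the existing location /blog block, next to the other sub_filter lines.
# Hypothetical extra rules: only add the ones your pages actually need.
sub_filter 'href="/wp-content/'  'href="/blog/wp-content/';
sub_filter 'href="/wp-includes/' 'href="/blog/wp-includes/';
sub_filter 'src="/wp-includes/'  'src="/blog/wp-includes/';
```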

Problem 3: Incorrect blog redirects

The last problem we have is a tricky one to solve. The only good solution I found is kind of a hack, so if you have better ideas please let me know in the comments. (I say "good" because I probably could've used the if directive, but it is known to cause problems when used in a location block.)

The problem is that whenever the blog actually returns a redirect response, the link in the Location header will be incorrect in some cases. Sometimes the blog returns relative URLs without https:// (e.g. /faq-page), and sometimes it returns absolute URLs that are complete and start with https://. We only need to fix Location header links if they are relative, not absolute.

With the help of Nginx's ngx_http_map_module and its map directive (think of it as a simple switch/case statement), we can create a dynamic variable (its value depends on other values/variables) that will hold the prefix we need to add to the Location header link.

Add this section before the server block of your Nginx config:



map $upstream_http_location $_upstream_http_location_prefix { # 1
  default $upstream_http_location; # 2
  "~^/"                 "/blog";   # 3
  "~*^http"             "";        # 4
  "~*^((?!http|\/).)*"  "/blog/";  # 5
}



Let's explain what each line does:

  1. The first line uses the map directive to create a dynamic variable called $_upstream_http_location_prefix whose value depends on the $upstream_http_location variable. The latter holds the Location value sent by the upstream (our blog.site.com).
  2. The default value of the variable is the Location header value itself. This is just to be on the safe side; it is probably never going to be used, because the cases below cover practically every value.
  3. If the Location header value starts with /, then the prefix will be /blog.
  4. If the Location header value starts with http, then the prefix will be empty.
  5. If the Location header value starts with neither / nor http, then the prefix will be /blog/ (with a trailing slash, since the value itself has no leading one).

Now that we have our Location header prefix ready, we can use it in our location /blog block like so:



location /blog {
    # ...

    proxy_hide_header Location; # 1
    add_header Location "$_upstream_http_location_prefix$upstream_http_location"; # 2

    # Proxy
    proxy_pass https://blog.site.com;
}


  1. We first hide the original Location header sent to us by the upstream (our blog).
  2. Add a new Location header whose value is OUR_CALCULATED_PREFIX + ORIGINAL VALUE and by doing that we effectively re-wrote the header value depending on what it starts with.
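
As a possible alternative for the leading-slash case, Nginx also has a proxy_redirect directive that rewrites the Location (and Refresh) headers of proxied responses by prefix. The sketch below is untested against this setup and only covers redirects that start with /; it is not the method used in this post, and it does not handle the third case of the map above.

```nginx
location /blog {
    # ...

    # Rewrite upstream redirects that start with "/" so they stay under /blog/.
    # e.g. an upstream "Location: /faq-page" is sent to the visitor
    # as a redirect to /blog/faq-page.
    proxy_redirect / /blog/;

    # Proxy
    proxy_pass https://blog.site.com;
}
```

If you try this, use it instead of the proxy_hide_header / add_header pair rather than alongside it, so the header is only rewritten once.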

Problem 4: Visitors can access WP admin login via /blog

I really didn't want the admin login page to be accessible via site.com/blog, so I think we're better off hiding it completely. Anyone who's interested in logging in to the WP admin site needs to go to blog.site.com, log in using HTTP Basic Auth, then log in using their WP account credentials.

We can easily do that by adding a couple of location blocks:



# Return the blog's 404 page when accessing WP login/admin
location /blog/wp-admin { return 301 /blog/404; }
location /blog/wp-login.php { return 301 /blog/404; }

# Prevent the blog's robots.txt from being proxied
location /blog/robots.txt { return 404; }

location /blog {
    # ...
}



Final Nginx config

Here's the full Nginx config if you want to copy and paste the whole thing.

Before the server block, add this:



map $upstream_http_location $_upstream_http_location_prefix { # 1
  default $upstream_http_location; # 2
  "~^/"                 "/blog";   # 3
  "~*^http"             "";        # 4
  "~*^((?!http|\/).)*"  "/blog/";  # 5
}



Then, inside the server block of your site, add this. Make sure to replace site.com with your website's domain.



# Return the blog's 404 page when accessing WP login/admin
location /blog/wp-admin { return 301 /blog/404; }
location /blog/wp-login.php { return 301 /blog/404; }

# Prevent the blog's robots.txt from being proxied
location /blog/robots.txt { return 404; }


location /blog {
    # strip /blog/ from the request path
    rewrite /blog/(.*) /$1  break;

    # SSL config for proxy
    proxy_ssl_server_name on;                   # 1
    proxy_ssl_session_reuse on;                 # 2
    proxy_set_header Host blog.site.com;        # 3
    proxy_set_header X-Forwarded-Proto https;   # 3
    proxy_set_header X-Forwarded-Port 443;      # 3
    proxy_set_header X-Real-IP $remote_addr;    # 3
    proxy_set_header X-Forwarded-Host $host;    # 3

    # Set HTTP Basic auth for connecting to the blog.
    proxy_set_header Authorization "Basic {CREDENTIALS}"; #4

    # Correct Location header (if present).
    proxy_hide_header Location;
    add_header Location "$_upstream_http_location_prefix$upstream_http_location";

    # Disable response compression so we can change it.
    proxy_set_header Accept-Encoding "";

    # Change the response's links to correct ones.
    sub_filter_once off;
    sub_filter_last_modified on;
    sub_filter_types text/html text/css text/xml text/javascript application/json;
    sub_filter 'blog.site.com' 'site.com/blog';
    sub_filter 'src="/wp-content/' 'src="/blog/wp-content/';
    sub_filter 'http:' 'https:';

    # Proxy
    proxy_pass https://blog.site.com;
}



This Nginx config is something I worked on, and it is operational at the time of writing this post. The blog being served by this Nginx reverse proxy averages 1.2+ million unique visitors per month, and is doing just fine. So I can safely say that this solution is well tested in the real world.

That's it, thanks for reading the article and make sure to drop a comment if you have any questions or feedback!

Top comments (2)

cheonmux • Edited

Hi, this method works for me. Thank you so much for the detailed guide.

I'm using this method to open a multilingual site and have small issues.

How do I prevent visitor access to blog.site.com?

Currently, my website is accessible and shows the same content at both blog.site.com/aa and site.com/blog/aa.

And if bots crawl both websites, duplicate content issues may occur.

I simply added "return 301 https://$server_name/blog$request_uri;" to blog.site.com's server block, but it caused a redirect loop.

Do I need to set disallow: / in robots.txt on blog.site.com?

Bouchaala Reda

Hi, sorry I just saw your comment. Glad this method helped you out!

Yeah, you'd want to restrict access to blog.site.com, via HTTP Basic Auth, so that people and bots can't access it.

Adding that entry you mentioned to your robots.txt will help prevent duplicate content, but won't keep visitors away which isn't great. You want to block both bots and people.

If you look at the 2nd code example, exactly in point number 4, I add a header called Authorization which adds the user and pass you set up in blog.site.com Basic Auth, when connecting to it.

Here's a guide to help add HTTP Basic Auth in Nginx: link

Hope this helps out, let me know if I can help with anything else!