Twitter will eat your URLs

My HTML periodic table has been getting a lot of attention on Twitter over the last few days. Because the page has a relatively short URL, a lot of people have been tweeting the actual URL rather than using a URL shortening service. This has been good for me, since shorteners strip the HTTP referrer and would otherwise stop me from seeing where my Twitter traffic comes from.

A peek at my error logs did reveal one potential problem, though. I’ve had well over a thousand hits to invalid URLs like http://joshduck.com/perio. These are obviously URLs that ran up against Twitter’s infamous 140-character limit and were truncated. That means wasted traffic for me and wasted time for my visitors, so I decided to push a quick fix.

I was already redirecting 404s to a custom PHP page, so I added a check that redirects anyone who hits a 404 via a truncated URL to the correct page. To stop similar problems in the future, I’ve also added a short script that looks at the static pages and blog posts on my site and tries to match them against the requested URL when serving a 404 page. The matching URLs are then used to display a list of suggested links to the visitor. The script also does a little regex magic (read: hackery) to find the titles of the suggested links. Now URLs like http://joshduck.com/photo or http://joshduck.com/blog/201 will give visitors a push in the right direction.

The actual script I use is specific to my own site’s code, but I’ve supplied a generic version below.

<?php
$request_path = $_SERVER['REQUEST_URI'];

// Special case: redirect anyone trying to reach periodic-table.html straight there.
// Note the argument order: we check whether the *requested* path is a prefix of the
// full URL, so truncated requests like /perio will match.
if (strlen($request_path) > 2 && strpos('/periodic-table.html', $request_path) === 0) {
	header('Location: /periodic-table.html');
	die();
}

// Check static files in the document root for URLs that start with the requested path.
$suggestions = array();
foreach (glob($_SERVER['DOCUMENT_ROOT'] . '/*.html') as $file) {
	$url = '/' . basename($file);
	if (strpos($url, $request_path) === 0) {
		// Pull the page title out of the markup; fall back to the URL itself.
		$content = file_get_contents($file);
		if (preg_match('/<title>([^<]+)/i', $content, $matches)) {
			$title = trim($matches[1]);
		} else {
			$title = $url;
		}
		$suggestions[$url] = $title;
	}
}
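The script builds the $suggestions array but stops short of displaying it. A minimal rendering step might look like the sketch below; render_suggestions is a name of my own choosing, not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): turn the $suggestions
// array of url => title pairs into a simple HTML list of links.
function render_suggestions(array $suggestions) {
	$html = "<ul>\n";
	foreach ($suggestions as $url => $title) {
		// Escape both values so a mangled URL can't inject markup into the 404 page.
		$html .= sprintf("\t<li><a href=\"%s\">%s</a></li>\n",
			htmlspecialchars($url, ENT_QUOTES), htmlspecialchars($title));
	}
	return $html . "</ul>\n";
}
```

On the 404 page itself you would then echo render_suggestions($suggestions) beneath the error message.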

5 Responses

  1. Great idea, but a couple of comments about the code:

    ~Why not have a hash of `special cases` in case you need to have more later?
    ~Index JSON/HTML instead of a glob?

  2. Nice idea, thanks for sharing this.

    If my memory isn’t playing tricks on me, isn’t this a feature already available in WordPress?

  3. We whipped up a similar sort of thing in a pre-rolled solution anyone can use:

    http://clean404.com/

  4. I’ve come across numerous 404 pages in the last few months…on major sites. Great idea.

  5. Good job, it will give my blog more life. Until now my 404 page was just a simple notification that gave anyone who crashed there no clue beyond going back to the homepage.