Twitter will eat your URLs
My HTML periodic table has been getting a lot of attention on Twitter over the last few days. Because the page has a relatively short URL, a lot of people have been tweeting the actual URL rather than using a URL shortening service. This has been good for me because shorteners remove the HTTP referrer and stop me from seeing where my Twitter traffic comes from.
A peek at my error logs did reveal one potential problem though. I've had well over a thousand hits to invalid URLs like http://joshduck.com/perio. These are obviously URLs which have run up against Twitter's infamous 140-character limit and have been truncated. This results in wasted traffic for me and wasted time for my visitors, so I decided to push a quick fix.
I was already redirecting 404s to a custom PHP page, so I added a check which redirects anyone who gets a 404 after accessing a truncated URL to the correct page. To stop similar problems from happening in the future, I've added a short script that looks at static pages and blog posts on my site and tries to match them against the requested URL when serving a 404 page. The matching URLs are then used to display a list of suggested links to the user. The script also does a little regex magic (read: hackery) to find the titles of the suggested links. Now URLs like http://joshduck.com/photo or http://joshduck.com/blog/201 will give visitors a push in the right direction.
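The matching step boils down to a prefix check: a page is a candidate if its URL starts with whatever is left of the truncated request. Here is a minimal sketch of that idea on its own, assuming the known pages are already available as a URL-to-title array. The function name `suggest_pages` and the sample pages in the usage note are hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: return the known pages whose URL begins with the
// (possibly truncated) request path. $known_pages maps URL => title.
function suggest_pages($request_path, array $known_pages) {
    // Compare only the path, ignoring any query string.
    $parts = explode('?', $request_path, 2);
    $path = $parts[0];

    $matches = array();
    // Very short paths such as "/" or "/p" would match far too much.
    if (strlen($path) <= 2) {
        return $matches;
    }
    foreach ($known_pages as $url => $title) {
        // strpos() returning exactly 0 means $url starts with $path.
        if (strpos($url, $path) === 0) {
            $matches[$url] = $title;
        }
    }
    return $matches;
}
```

With a made-up page list such as `array('/periodic-table.html' => 'Periodic Table', '/photos.html' => 'Photos')`, a request for `/perio` would match only the periodic table entry, while `/p` would be treated as too short to guess from and return nothing.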
The actual script I use is specific to the code I use for my own site, but I've supplied a generic version of the script below.
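<?php
// Path of the request that triggered the 404, e.g. "/perio".
$request_path = $_SERVER['REQUEST_URI'];

// Special case: redirect anyone trying to get to periodic-table.html straight there.
// The length check stops very short paths such as "/p" from triggering the redirect.
if (strlen($request_path) > 2 && strpos('/periodic-table.html', $request_path) === 0) {
    header('Location: /periodic-table.html');
    die();
}

// Check static files. This assumes the .html files sit directly in the document root.
$suggestions = array();
$root = rtrim($_SERVER['DOCUMENT_ROOT'], '/');
foreach (glob($root . '/*.html') ?: array() as $file) {
    $url = '/' . basename($file);
    // Suggest any page whose URL starts with the (possibly truncated) request path.
    if (strpos($url, $request_path) === 0) {
        $content = file_get_contents($file);
        // Pull the page title out of the markup; fall back to the URL if there is none.
        if (preg_match('/<title>([^<]+)/i', $content, $matches)) {
            $title = trim($matches[1]);
        } else {
            $title = $url;
        }
        $suggestions[$url] = $title;
    }
}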