Improved Site Search

After recently completing my site improvements, I realized there was one thing I still wanted to fix.

If you searched the site (search is currently only available on the Archives page), you’d be brought to a Google site search. The reason for this is that in addition to the 816 WordPress posts that I’ve published, I have an additional 296 that were published via Blogger and were migrated over to this site in 2007. A Google site search seemed like the easiest way to search everything.

This didn’t work well in practice though. Not all posts are in Google’s index. And each time you search you’re brought over to Google, where Google gets a data point on you and you’re served an ad. Similar to how I removed the Facebook Like and Twitter Share buttons because reading this blog shouldn’t require you to give up data to those companies, I felt the same way about searching.

So I decided to create a search engine for my old Blogger posts and integrate that into the WordPress search results. It took a few hours, but it works pretty well. Here’s what I did:

  • Since those posts are static files, I wrote a PHP script to scan those directories and open each file.
  • I then used an XPath query to extract the post title and post content.
  • I saved the post information (title, content, url) for each post to a single static JSON file. No database needed!
  • I wrote a search function that opens the JSON file, scans through each post for keyword matches, and returns any matches ordered by keyword density. I used a simple formula: a keyword in the title counts 3x as much as a keyword in the post content.
  • I included up to 10 results at the bottom of the first page of the WordPress search results and styled the results to match.
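
For anyone curious, the search step above might look roughly like the sketch below. It is written in Python rather than the PHP of the actual script, and it only covers the scoring and ranking, not the directory scan or the XPath extraction. The function names, the field names (`title`, `content`, `url`), and the example posts are all assumptions made for illustration. The score here is a plain weighted match count; whether the real script also normalizes by post length is not something this sketch tries to guess.

```python
import json
import re

TITLE_WEIGHT = 3  # from the post: a title match counts 3x a content match


def load_posts(path):
    """Load the list of posts from a static JSON file (hypothetical layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_matches(keyword, text):
    """Count case-insensitive occurrences of keyword in text."""
    return len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))


def score_post(post, keywords):
    """Weighted match count: title hits are worth TITLE_WEIGHT content hits."""
    return sum(
        TITLE_WEIGHT * count_matches(kw, post["title"])
        + count_matches(kw, post["content"])
        for kw in keywords
    )


def search_posts(posts, query, limit=10):
    """Return up to `limit` posts with at least one match, best score first."""
    keywords = query.split()
    scored = [(score_post(post, keywords), post) for post in posts]
    matches = [pair for pair in scored if pair[0] > 0]
    # Sort on the score only, so the post dicts are never compared.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in matches[:limit]]


# Made-up posts standing in for the contents of the JSON file.
EXAMPLE_POSTS = [
    {"title": "Weekend notes",
     "content": "Detailed Image had a busy week. Detailed Image shipped a lot.",
     "url": "/2006/02/notes.html"},
    {"title": "Detailed Image launch",
     "content": "We launched the store today.",
     "url": "/2006/01/launch.html"},
    {"title": "Unrelated",
     "content": "Nothing to see here.",
     "url": "/2006/03/unrelated.html"},
]

if __name__ == "__main__":
    for post in search_posts(EXAMPLE_POSTS, "Detailed Image"):
        print(post["title"], post["url"])
```

With these example posts, the title match outranks the post that mentions the keywords twice in its body, and the unrelated post is dropped entirely. The `limit=10` default mirrors the 10-result cap mentioned above.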

This is really cool, if for no other reason than that I’m able to find old content more easily now. If you’d like to give it a quick test, check out the results for “Detailed Image”. The Blogger results are under the “From The Blogger Archive (2005 – 2007)” heading.