It was New Year’s Eve. Mike, George, and I would normally be out at a party, but because we were all sick and the weather sucked, we were all home resting. Around 8 PM I woke up from a nap and made my way over to my computer to read my feeds, when I noticed all of our sites were down. Not “down” per se, but every one of them was returning a 500 Internal Server Error like the one below from the Pure Adapt site:
Now, I’ve seen this plenty of times, but never on ALL the sites at once. Usually it happens while I’m messing with some configuration in the .htaccess file; I immediately undo what I did and the problem is solved. Right away I called Mike and George, and we confirmed that none of us had done anything that would cause such a crazy error, particularly since we were all in bed sick when it happened.
I quickly opened a ticket with Liquid Web, the company that manages our server. They have an average response time of 10 minutes (guaranteed within 30 min) and as usual they did not disappoint. Within an hour of when I noticed the problem, we were back up and running for the most part. It took me a few more hours to work out all of the kinks, but as of today, 1/1, I think everything’s basically back.
The odd part is, no one knows what happened. They told us our PHP configuration was severely corrupted. They reconfigured everything and it worked. I then had to double back and tweak a few things so everything worked OK with all of our sites. But I’ve been pressing them for an answer, and they are just as clueless as I am. We didn’t change anything, they didn’t change anything, and they don’t think we were hacked. They offered a few far-fetched possibilities, but nothing that really explains it. As long as it doesn’t happen again, it’s water under the bridge. If it does, I’ll raise hell.
Speaking of raising hell, it turns out that the error was showing for a good 2 hours before I discovered it, because my website monitoring setup didn’t work. Or, more specifically, Montastic hadn’t monitored our sites in 2 days! A far cry from the 10 minutes they claim on their site. So while Liquid Web was sorting everything out, I signed up for free accounts with SiteUptime, a service I used for years with SL, and immediately received the text message that both DI and TD were down (I configured it to forward emails to my phone as text messages, just like I did in this tutorial).
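For the curious, the kind of check a monitoring service runs is not complicated. Here is a minimal sketch in Python of the idea, and nothing more than that: it is not how Montastic or SiteUptime actually work, and the function names (`fetch_status`, `is_down`, `check_sites`) are made up for illustration. It assumes a site counts as "down" if it can't be reached at all or if it answers with a 5xx status, like the 500 error we were getting.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if it can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still available.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return 0


def is_down(status: int) -> bool:
    """Assumption: unreachable (0) or any 5xx response counts as down."""
    return status == 0 or status >= 500


def check_sites(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Return {url: status} for every site that looks down."""
    down = {}
    for url in urls:
        status = fetch(url)
        if is_down(status):
            down[url] = status
    return down
```

Calling `check_sites(["https://example.com/"])` on a schedule and emailing whatever it returns would cover the basics. The `fetch` parameter is only there so the logic can be tried out without touching the network.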
In sum – we don’t really know what happened, it’s fixed now, and we have web monitoring that actually works moving forward. The odd part was how this coincided with us being home sick. Had we all been out drinking, this problem might have dragged on for much more than a few hours on New Year’s Eve, costing us much more than it did (which probably wasn’t much considering when it happened). Since we were home sick, we were able to take care of it. I guess if I had to be sick and the server had to have a problem, this was good timing. Chalk one up for “everything happens for a reason”.
And of course, this is just one more reminder that the life of a business owner isn’t always quite normal. If I had been out, I would have had to leave, no matter how good a time I was having. There are a lot of advantages to running your own business, but there are also some disadvantages. The buck stops with you. A lot of people can’t accept that, or don’t factor it in when romanticizing the life of a business owner. Me, I’m OK with it. It comes with the territory. You can’t be the one reaping all of the rewards if you aren’t the one taking the risks.