The Bad News: The server crashed hard early Friday morning, and it appears to have been scrambled in the process.
The Good News: Data recovery efforts have been proceeding well, and I am working on getting all the data hosted somewhere so everyone can retrieve their journals.
The Full Story:
On Friday the 16th, sometime between midnight and 3 A.M., we had a power surge that caused the circuit breaker to trip, and we lost all power. The 10-minute battery on the UPS quickly drained, and I didn't hear the alarm from the bedroom. When I booted the server the next morning, it quickly became obvious that it had been scrambled.
The server boots, but it says it's unable to determine its domain name and is using 127.0.0.1 instead. When I log in locally, everything is slow, and any network-related commands time out. Trying to read man pages gives a screen full of garbage, and pressing the arrow keys in any program produces error beeps and more on-screen garbage.
Now, the problem is that I am not a sysadmin. I barely have a clue what to do when the server is running fine, and this is pretty far above my skill level. The consensus among my geek friends seems to be that the server will need to be reinstalled, which scares me, because I don't think I'll be able to get it back to the way it was configured before.
Further complicating things is the fact that none of the data had been backed up for several months. Frustratingly enough, researching how to set up a cron job to do daily and weekly backups was on my to-do list, and was probably going to get done in the next week or two. And since I don't know if reinstalling or upgrading the OS will wipe out files, my first goal had to be data recovery.
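For what it's worth, what I had in mind was a couple of crontab entries along these lines (the paths and times here are only illustrative; nothing like this was ever actually in place):

    # Nightly dump of every MySQL database at 3:30 A.M.
    30 3 * * *   /usr/local/bin/mysqldump --all-databases > /backup/daily/databases.sql
    # Weekly archive of the web and home directories, Sundays at 4:00 A.M.
    0 4 * * 0    /usr/bin/tar czf /backup/weekly/www-home.tar.gz /usr/local/www /home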
The data recovery, at least, seems to have gone smoothly. I was able to run mysqldump to get a dump of all the databases, and to tar-gzip all the web directories, home directories, and mail. Then I plugged in a Windows-formatted hard drive and mounted it to copy the files off (I had to do it that way because I had no network access). Those recovered files are now safe on two separate computers, and over the next week I'll be working on extracting everyone's data into a usable format and finding a place to host it.
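For the technically curious, the recovery boiled down to roughly the following (the exact device names and paths here are illustrative, not the real ones):

    # Dump every MySQL database to a single file
    mysqldump --all-databases > /root/recovery/all-databases.sql
    # Archive the web sites, home directories, and mail spools
    tar czf /root/recovery/www.tar.gz  /usr/local/www
    tar czf /root/recovery/home.tar.gz /home
    tar czf /root/recovery/mail.tar.gz /var/mail
    # Mount the Windows-formatted (FAT) drive and copy everything onto it
    mount -t msdosfs /dev/ad1s1 /mnt
    cp /root/recovery/* /mnt/
    umount /mnt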
I purchased some hosting to at least get a status message up for you all so you don't think that I'm ignoring the problem or anything.
What's next? Well, I'll be taking it slowly as I try to upgrade or reinstall FreeBSD on the server. If I can get it working again, I'll put it back up as it was, and KillingMachines and all the other sites will return shortly.
If I can't get it working again, I'm still probably going to spend about a month trying to get some form of server set back up that can run KMorg. But if I can't get anything set up, then KMorg will probably have to be shut down. The site was built around the fact that it ran on its own server and could update the primary DNS records every 15 minutes, which is not something I'll be able to simulate on someone else's server.
I want to reassure you of two things: first, your data has been recovered and is safe on two separate computers; and second, I have every intention of getting the sites back up if I possibly can.
However, I have to warn you all that this server has been a pain in my ass since the day Steve abandoned it and left me in charge of all the services it was providing. This is not a job I want, nor one I'm qualified for. While I will do everything in my power to put KMorg back up and safeguard people's data, everyone needs to remember that I'm doing all of this in my spare time, on top of a full-time job. I'm not asking for sympathy, just understanding if it doesn't all happen overnight.
Thank you for your patience. I'll let you know through this page if anything changes here.
-- Scott Vandehey