This weekend I had some major problems with my web server. Here is a brief log of events.
Saturday 07/16/2006
- 3:00pm: I try to visit one of my sites and get "can not connect to database". It seems that all of the connections are in use. Weird. I check the server… 500 open DB connections? I reboot the MySQL server, and things seem to be working.
- 4pm – 5pm: I go out to eat with my friends. Using my phone and an SSH client, I check the database server, and things seem to be OK. (Damn, I'm a nerd! I'm checking my Linux database server via SSH on a cell phone!)
- 6pm – 7pm: The MySQL server is still running. I look at the message board on notopular.com: "database error". After doing some research, I realize that the hard drive the databases sit on is 100% full. Not good. Looking at the files, I find a ton of error logs that have been generated. I clear up some room on the drive and reboot the SQL server. Sweet, now I can write to the DB again.
- 8pm – 10pm: I look at my message board: "Cannot connect to DB". After some more research, I realize that some of the tables for the forums application are corrupt. Great. I break out the good ol' "myisamchk -r" command and get to work. A bunch of the indexes are totally hosed. After everything is done running, I can connect to and view the database, so I back everything up.
- 12 midnight – 4am: I check the message boards again. I can view everything, but I can't post to the forums! The table that contains the message text (a HUGE table, 670,000+ records) has something wrong with it; I can't write to it. Again I break out "myisamchk -r", but I get an error: "Please use the -o flag, -r can not repair this table". CRAP! I need this table! It's the guts of the message boards! I know that the -o flag is going to take a while, so I bring down the Apache web server to free up some CPU power, and then run "myisamchk -o" on the table. I figure out how long it's going to take… so I renice the process to turbocharge it and cut down on the time… and by cutting down on the time, I mean it's only going to take 3 hours to run!
- 4am – 5am: The rebuild of the table indexes has worked! All of the records were salvaged! I bring the web server back up and start checking. Everything is looking good. I back up the database locally and offsite.
- 5:30am: The sun is coming up. I'm going to bed.
All in a night's work for a developer/webmaster/systems administrator.
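The repair order from that night (try "myisamchk -r" first, fall back to the slower "myisamchk -o" when -r can't fix the table) can be sketched in Python. This is a minimal illustration, not the exact commands typed during the outage. It assumes myisamchk exits with a nonzero status when a repair mode fails, and the helper names (`run_command`, `repair_myisam_table`) are hypothetical. MySQL's documentation recommends stopping the server, or flushing and locking the tables, before running myisamchk on them.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command as an argv list and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status (0 conventionally means success)."""
    return subprocess.run(list(argv)).returncode


def repair_myisam_table(table: str, runner: Runner = run_command) -> str:
    """Try the faster -r (recover) pass first, then fall back to -o (safe recover).

    Assumption: myisamchk exits nonzero when a repair mode fails.
    Returns the flag that succeeded, or raises RuntimeError if neither did.
    """
    for flag in ("-r", "-o"):
        status = runner(["myisamchk", flag, table])
        if status == 0:
            return flag
        print(f"myisamchk {flag} failed on {table} (exit status {status})")
    raise RuntimeError(f"myisamchk could not repair {table}")
```

For example, `repair_myisam_table("/var/lib/mysql/forum/posts")` (a made-up path) would return "-r" or "-o" depending on which pass succeeded. Passing a different `runner` lets you dry-run the fallback logic without touching a real table.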