Recovering from strange mysql crash

So, yesterday my server started emailing some cron errors over. One particular script that runs every hour was throwing an error about 10 seconds in to running that appeared to be the server hanging up the query. I thought I may have set some timeout too low such that the server wouldn’t allow queries longer than 10 seconds (as this is a web server then nothing should take that long apart from a few analytics scripts that run overnight). Running the query by hand showed the same problem, so I started doing some analysis of the component parts to see which was taking so long. Then, I looked at the timeouts set and noticed:

mysql> show status like "%time%";
| Variable_name              | Value |
| Uptime                     | 105   |
| Uptime_since_flush_status  | 105   |

Hmmm that looks bad. Looking in dmesg confirms that mysql has not been hanging up – it’s actually been crashing!

[504510.549172] init: mysql main process (15317) terminated with status 1
[504510.549186] init: mysql main process ended, respawning

I ran mysqlcheck on the database in question and the server crashed again, even though I was able to query the table fine and even added an index before realizing that there was some issue with it. So, rather than restore from a backup I thought I’ll just clone the table and replace the existing one with it:

mysql> create table t like client_songs;
mysql> insert t select * from client_songs;
mysql> check table t;

New table looks fine, lets do a final update (as it’s continually being inserted into)

mysql> insert t select * from client_songs cs where client_song_ts >= ( select max(client_song_ts) from t ) on duplicate key update select_count= cs.select_count, download_count = cs.download_count, rating = cs.rating, client_song_ts = cs.client_song_ts, print_count = cs.print_count;

Then put it live:

mysql> drop table client_songs;
mysql> rename table t to client_songs;
mysql> check table client_songs;

Everything working again. I wish mysql (5.5.37 from ubuntu 14.04 LTS) was more reliable that’s why I tend to use postgres for new projects these days. It’s really strange that the table could be read fine but one particular query caused it to crash – probably a case of the particular index that was being for the query being corrupted but not the row-data.

Leave a Reply

Your email address will not be published. Required fields are marked *