February 14, 2010

Restarting MySQL master-master replication

Posted in Software at 21:24 by graham

If your MySQL (5.0+) replication is broken, there’s two ways to fix it: The easy way, and the right way.

Run commands starting with $ on Unix. Run commands starting with mysql> in the MySQL client.

The easy way: Skip the problem

If you hit both databases at the same time, with the same INSERT, they will create their own record, and try and replicate to the other, which already has that record, causing a duplicate error.

In a simple case like that, you just want to skip the offending statement:

mysql>SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;

More details on skipping MySQL duplicate errors

Most of the time, you skip one statement, and replication breaks again straight away, because there’s a whole queue of problem statements coming up.

The right way: Rebuild

If you are not sure that you can skip the duplicate, or if replication has been broken long enough that your two servers are out of synch, pick one database to be the master, and rebuild the other from a copy of that master.

First make sure your site is only using the master server. Stop any processes that might modify data on the server you need to rebuild.

We have two database servers:

  • Good Server: The good one, with the correct data.
  • Rebuilding Server: The one we are fixing. All it’s data will be erased with the Good Server data.

You’ll need a unix command line and SQL command line on both servers. If you don’t already use it, after you’ve got this fixed, I recommend checking out Screen.

1. On the Good Server

Dump data from the Good Server. The master-data switch adds a statement at the end of the file to start replication.

The quick switch makes dumping large tables use a lot less memory, so on a VPS it’s much faster.

All tables will locked during the dump.

Replace ‘my_database’ with your database name.

$ mysqldump --add-drop-table --master-data
    --quick -u root -p my_database > my_database.sql
$ bzip2 my_database.sql

2. On the Rebuilding Server

Copy the dump onto the Rebuilding Server. Replace ‘myuser’, ‘good-server’ and ‘my_database’ as appropriate.

$ scp myuser@good-server:my_database.sql.bz2 .
$ bunzip my_database.sql

Load the dump. This can take a few minutes for a large database.

mysql> stop slave;
$ mysql -u root -p my_database < my_database.sql
mysql> show slave status\G

You should see Slave_IO_Running: Yes and Slave_SQL_Running: Yes. The master-data switch to mysqldump, in step 1, started replication at the right place for us. How nice. I love MySQL.

The \G means show vertical instead of the usual horizontal. It works with any MySQL command.

Now you have statements flowing Good Server –> Rebuilding Server. Next we need to get data going the other way.

mysql> flush tables with read lock;
mysql> show master status;

Make a note of the File and Position rows.

3. On the Good Server

Set the slave here to be in synch with Rebuilding Server. Use the file name and log position from the previous step.

mysql> stop slave;
mysql> change master to master_log_file='mysql-bin.000044',
master_log_pos=132059667;
mysql> start slave;
mysql> show slave status\G

4. On the Rebuilding Server

mysql> unlock tables;

Start the processes that point to the Rebuilding Server’s database. Done!

Monitor replication

The database is done, but you’re not done yet.

If you don’t already have good monitoring, take 15 minutes to setup Monit on all your production servers.

Monit is easy to setup (a world away from the monster that is Nagios), and will save again and again.

Once you have Monit working, take another 15 mins to monitor MySQL replication with Monit.

Now you’re done :-)

11 Comments »

  1. Richard Sims said,

    January 16, 2013 at 15:07

    You don’t have to laboriously go to the good server, do a mysqldump, create a huge file, and somehow get that over to the problem server, and load it there. It is often overlooked that the MySQL commands in general are client-server. This is to say that the commands are not restricted to operating on what’s local. Exploiting this functionality allows you to drastically improve database content transfer. Here’s the prototype for that:

    mysqldump -h $master_host –single-transaction –flush-logs –master-data $dbname | mysql $dbname”;

    Thus, on the receiving system, you invoke mysqldump to operate on a remote system, where the client software pulls that data from the remote server to feed it to local command ‘mysql’, to restore the database to the local store.

  2. neil said,

    October 5, 2012 at 16:04

    Hello Graham,

    I followed the procedure, however I get duplicate key errors when starting slave on the ‘rebuilding server’.

    I found on the web suggestions to set in the my.cnf an ignore switch for the duplicate error, however I’m concerned that may lead to other problems.

    Any suggestions as to what the best course of action would be to resolve this whilst preserving data consistency between servers?

  3. cwiggler said,

    May 28, 2012 at 06:59

    Hi Graham, Thanks for the great guide on how to rebuild the DB on a nonworking replication of MySQL.

    I have question that I would like to clarify regarding the procedures. If I have a stored procedures and functions. do I need to dump it and restored it on the rebuilding server or only data should be enough?

    my setup is master-master. thanks

  4. graham said,

    February 2, 2012 at 04:15

    @Callum: Fixed it, thanks!

  5. Callum said,

    January 31, 2012 at 14:55

    Thanks for the article, and particularly the link to replication monitoring with monit. I spotted what I think is a typo in your post, I think you mean “a world away” rather than “a word away” here:

    Monit is easy to setup (a word away from the monster that is Nagios), and will save again and again.
  6. Graham King said,

    September 21, 2011 at 19:27

    @Azz That only avoids conflicts on the primary key. If you’re writing to both masters, any other unique keys you have could still conflict. I’m avoiding master-master, and only using master-slave. See my reply to Antony here: http://www.darkcoding.net/software/restarting-mysql-master-master-replication/#comment-18558

  7. Azz said,

    September 21, 2011 at 16:51

    For master-master set up that avoids index conflicts, see:

    http://stackoverflow.com/questions/325791/which-is-the-best-way-to-bi-directionally-synchronize-dynamic-data-in-real-time-u/325850#325850

  8. Dominic O’Brien said,

    April 20, 2011 at 23:56

    The amount of times I have come to your site as THE reference on how to do it. I got about 3 print outs of your web site floating around in different places just in case I have no connection when I need to remember how its done. Man you saved my bacon a number of times. A big thank you. This can be a really confusing subject and this makes it comprehensible and doable.

  9. pradeep said,

    October 22, 2010 at 09:04

    Hi! i am new in mysql and trying to establish master master replication in mysql

    on first server i have added these lines to mysqld section of my.cnf(mysql Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (i686) using readline 5.1)

    datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock log-bin=/var/lib/mysql/mysql-bin.log binlog-do-db=sample1 binlog-ignore-db=mysql binlog-ignore-db=information_schema binlog-ignore-db=test2 binlog-ignore-db=sample3 binlog-ignore-db=example3 binlog-ignore-db=endpoints binlog-ignore-db=meetme binlog-ignore-db=test binlog-ignore-db=sample2

    server-id=1 master-host = 192.xxx.x.xxx master-user = abc master-password = abc_pass master-connect-retry = 60

    relay-log = /var/lib/mysql/slave-relay.log relay-log-index = /var/lib/mysql/slave-relay-log.index

    On second server i have added these lines to mysqld section of my.cnf(mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1)

    server-id=2

    master-host = 192.xxx.x.xxx master-user = xyz master-password = xyz_pass master-connect-retry = 60

    relay-log = /var/lib/mysql/slave-relay.log relay-log-index = /var/lib/mysql/slave-relay-log.index

    log-bin = /var/lib/mysql/mysql-bin.log binlog-do-db = sample1 binlog-ignore-db=mysql binlog-ignore-db=information_schema binlog-ignore-db=phpmyadmin

    on first server slave status is

    *************************** 1. row *************************** Slave_IO_State: Connecting to master Master_Host: 192.xxx.x.xx Master_User: abc Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 36158 Relay_Log_File: slave-relay.000001 Relay_Log_Pos: 98 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: No Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 36158 Relay_Log_Space: 98 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL 1 row in set (0.00 sec)

    ERROR: No query specified

    On second server slave status is

    *************************** 1. row *************************** Slave_IO_State: Connecting to master Master_Host: 192.xxx.x.xxx Master_User: xyz Master_Port: 3306 Connect_Retry: 60 Master_Log_File: Read_Master_Log_Pos: 4 Relay_Log_File: slave-relay.000003 Relay_Log_Pos: 4 Relay_Master_Log_File: Slave_IO_Running: No Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 5 Exec_Master_Log_Pos: 0 Relay_Log_Space: 106 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 1045 Last_IO_Error: error connecting to master ‘abc@192.xxx.x.xxx:3306′ – retry-time: 60 retries: 86400 Last_SQL_Errno: 0 Last_SQL_Error: 1 row in set (0.00 sec)

    ERROR: No query specified

    Do’nt know why Slave_IO_Running is no.

    i have tried a lot to start replication but i am still unable to start replication. i have also commented

    bind-address = 127.0.0.1

    and

    skip-external-locking

    i need your help. Please send your suggestions and solution for this problem.

    Thanks in advance.

    regards, pradeep

  10. Graham King said,

    July 5, 2010 at 05:07

    @Antony:

    To Nagios, I say only this: http://identi.ca/notice/10418281

    Glad my blog entry helped. If you are currently using master-master replication, I really strongly urge you to stop. Switch to a single master and multiple slaves.

    I found that if you are writing to both masters, with a decent amount of traffic, and have any unique indexes, you inevitably get conflict, which stops replication, and the data on the two servers gets rapidly inconsistent. It’s a nightmare to repair. I got it several times, faster and faster as traffic grew.

    I searched and searched but could not find anyone out there using master-master replication and writing to both masters. Most people seem to use it as a hot standby. In fact most people seem to have one master, which takes the writes, and lots of slaves, which take the reads.

  11. Antony Sohal said,

    June 17, 2010 at 16:12

    This info saved me loads of time. Thanks.

    Regarding good monitoring, have you tried Nagios?

Leave a Comment

Note: Your comment will only appear on the site once I approve it manually. This can take a day or two. Thanks for taking the time to comment.