
A SAN file system must implement an efficient and reliable system of file locking to ensure that different computers cannot write to the same file at the same time. The shared file system master/slave failover pattern depends on a reliable file locking mechanism in order to function correctly.


OCFS2 is incompatible with this failover pattern, because mutex file locking from Java is not supported.


NFSv3 is incompatible with this failover pattern. In the event of an abnormal termination of a master broker, which is an NFSv3 client, the NFSv3 server does not time out the lock held by the client. This renders the Fuse Message Broker data directory inaccessible, because the slave broker cannot acquire the lock and therefore cannot start up. In this case, the only way to unblock the failover cluster in NFSv3 is to reboot all broker instances.

On the other hand, NFSv4 is compatible with this failover pattern, because its design includes timeouts for locks. When an NFSv4 client holding a lock terminates abnormally, the lock is automatically released after 30 seconds, allowing another NFSv4 client to grab the lock.
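The difference between the two NFS versions can be illustrated with a small conceptual model of a lock that expires when its holder stops renewing it. This is only a sketch of the lease idea, not NFSv4 code: the class `LeasedLock`, its methods, and the explicit `nowMillis` clock parameter are all hypothetical names introduced for illustration.

```java
/** Conceptual model only (hypothetical class): a lock whose holder must keep renewing a lease. */
class LeasedLock {
    private final long leaseMillis;
    private String holder;      // null while the lock is free
    private long lastRenewal;   // time of the last acquire or renew

    LeasedLock(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Grants the lock if it is free, already held by this client, or the holder's lease has expired. */
    synchronized boolean tryAcquire(String client, long nowMillis) {
        boolean expired = holder != null && nowMillis - lastRenewal >= leaseMillis;
        if (holder == null || expired || holder.equals(client)) {
            holder = client;
            lastRenewal = nowMillis;
            return true;
        }
        return false;
    }

    /** A live holder calls this periodically; a crashed holder simply stops calling it. */
    synchronized boolean renew(String client, long nowMillis) {
        if (client.equals(holder) && nowMillis - lastRenewal < leaseMillis) {
            lastRenewal = nowMillis;
            return true;
        }
        return false;
    }
}
```

With a 30-second lease, a second client is refused while the first keeps renewing, and is granted the lock once 30 seconds pass without a renewal. A lock with no expiry, as described for NFSv3 above, would refuse the second client indefinitely.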

Figure 4.3 shows the initial state of a shared file system master/slave cluster. When all of the brokers in the cluster are started, one of them grabs the exclusive lock on the broker data file, thus becoming the master. All of the other brokers in the cluster remain slaves and pause while waiting for the exclusive lock to be freed up. Only the master starts its transport connectors, so all of the clients connect to it.
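The election described above can be sketched with the standard Java NIO file locking API. This is a minimal illustration under stated assumptions, not the broker's actual implementation: the class `SharedFileLocker`, its method names, and the polling interval are hypothetical, and the lock file path is supplied by the caller.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: a broker becomes master once it holds the exclusive file lock. */
class SharedFileLocker implements AutoCloseable {
    private final FileChannel channel;
    private FileLock lock;

    SharedFileLocker(Path lockFile) throws IOException {
        // WRITE access is required to take an exclusive lock on the channel.
        this.channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Tries once to take the exclusive lock; returns true if this instance holds it. */
    boolean tryBecomeMaster() throws IOException {
        if (lock != null && lock.isValid()) {
            return true;
        }
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            lock = null;              // the lock is already held inside this JVM
        }
        return lock != null;
    }

    /** Slaves pause here, polling until the lock is freed and acquired. */
    void waitUntilMaster(long pollMillis) throws IOException, InterruptedException {
        while (!tryBecomeMaster()) {
            Thread.sleep(pollMillis);
        }
    }

    @Override
    public void close() throws IOException {
        if (lock != null && lock.isValid()) {
            lock.release();
        }
        channel.close();
    }
}
```

In this sketch a broker would call `waitUntilMaster` before starting its transport connectors, so that only the lock holder accepts client connections. Whether the lock is actually exclusive across machines depends on the underlying file system, which is why the file system requirements above matter.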

Figure 4.4 shows the state of the cluster after the original master has shut down or failed. As soon as the master gives up the lock (or after a suitable timeout, if the master crashes), the lock on the broker data file frees up and another broker in the cluster grabs the lock and gets promoted to master (broker2 in the figure).

After the clients lose their connection to the original master, they automatically try all of the other brokers listed in the failover URL. This enables them to find and connect to the new master.
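As an illustration, a failover URL that lists every broker in the cluster can be assembled as follows. The helper class `FailoverUrls`, the broker host names, and the port are assumptions made for this sketch. The `firstReachable` method only models the idea of trying each listed broker in turn; the real failover transport has its own reconnection logic and options.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper for building and reasoning about a failover URL. */
class FailoverUrls {

    /** Joins the broker URIs into the form failover:(uri1,uri2,...). */
    static String build(List<String> brokerUris) {
        return "failover:(" + String.join(",", brokerUris) + ")";
    }

    /**
     * Models a client trying each listed broker in order. Only the master
     * has started its transport connectors, so only it accepts the connection.
     */
    static Optional<String> firstReachable(List<String> brokerUris,
                                           Predicate<String> canConnect) {
        return brokerUris.stream().filter(canConnect).findFirst();
    }
}
```

For example, `FailoverUrls.build(List.of("tcp://broker1:61616", "tcp://broker2:61616"))` yields `failover:(tcp://broker1:61616,tcp://broker2:61616)`. If broker2 has just been promoted to master, only its URI would satisfy the `canConnect` check.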
