Description of Possible Failures During Database Mirroring
Overview
SQL Server is relational database management system, which is discovered for an enterprise. Dual copies of a database is present on several computers known as server instance. Failures in SQL server could occur in a physical manner, defect in operating system, or the server problem.
This management of the server doesn’t continuously examine the components on which working of Sqlserver.exe file is dependent. With the help of database mirroring, one can verify whether the components are functioning properly or it have failed.
There can be two types of errors due to which, failures occurs during database mirroring:
Hard Error: Such types of errors in which problem is caused due to defect in physical part of machine or the server is known as hard error. These physical parts can be machine’s hardware failure, problem with operating system, disk storage space limit exceeds, etc. However, it is being noticed that such type of error can not be recovered.
Soft Error: It is the type of error, which occurs due to some problem in the application itself. Moreover, this error type can be resolved either by any manual tricks or any third-party utility.
The performance of database mirroring session i.e., Error detection speed & reaction time, is dependent on whether the error is hard or soft. Some of the problems occur due to network failure, delay in reporting time, etc. When dealing with soft errors, the error detection speed is determined by mirroring length time-out period, which is 10 seconds (by default)
List of Possible Failures During Database Mirroring which Occurs Due to Hard Error
If the possible failures during database mirroring occurs due to hard error, then the following symptoms will occur:
- Automatic modification in firewall settings
- Suddenly connection is broken or closed
- Drive Log where transactions are performed
- Machine operating system or process failure
When a log drive within primary database becomes unresponsive and gets failed, the operating system will generate an error message that sqlserver.exe file is not responsive and is unable access the server. Some physical elements like network devices, IO subsystem, etc., have there own expiry date (time-out) that causes failure. Such dates are independent of the server, which does not has any information about them and hence, unaware of system behaviour. In such cases, the delay increases the time between a failure and when RDBMS encounters a hard error.
To troubleshoot these hard errors, one will have to contact the service (or network) provider because they are well-known about error messages that are sent to a port at the time when following consequences occurs on a TCP connection:
- Domain name system is not working
- The wires or cables are unplugged
- Restrictions on firewall network
- A Windows-based server is renamed/rebooted
List of Possible Failures Which Occurs Due to Soft Error
There are several situations that causes mirroring time-outs and some of them are mentioned below:
- Improper configuration of network devices on your machine
- A hanging/crashed operating system, server or database state
- The time of the Windows server is out or delayed
- Insufficient amount of computer resources
NOTE: In case of insufficient memory space, one must increase the time-out/delay period, reduce the temporary files, or change the machine hardware for handling the workload.
An occurrence of soft error could cause a server instance to wait for an indefinite time period because these errors can not be detected directly from the server instance. Therefore, to prevent such waiting situation, the database mirroring carries out its own time-out mechanism that is based on each server instance of a session.
In order to open a connection, the server instance should be notified about the time-out period and the overall time required to send another notification. Encountering a notification while time-out duration indicates that the connection is active & secondary servers are communicating over it. After getting a notification, the server instance resets time-out counter on current connection.
If the server instance is not notified during time-out period, then it considers that the connection time is out. Hence, secondary server terminates the time-out connection & tackle the current event, based on state and session operating mode.
In addition, if the mode is high, then the time-out period will be 10 seconds. This value is enough for avoiding the false failures and in this high mode, the default time-out period value will be 10 seconds (you can also change this value). Moreover, to avoid such failures, it is recommended to keep this value 10 seconds or more.
TIP: For high-safety mode, you can change the value of time-out period with help following command:
ALTER DATABASE SET PARTNER TIMEOUT
To view the current time-out value, run the following command:
mirroring_connection_timeout in sys.database_mirroring
Conclusion
The blog focuses upon all the possible failures during database mirroring. Moreover, it also gives an idea to learn how to get rid off such false failures.