As we have learnt over recent times, disasters can take many shapes and forms. That is why the ability to fail over reliably is such a crucial part of disaster recovery when delivering critical software to an organisation. Yet it has also become clear that not many people, even within the IT fraternity, understand what failover is and why testing it is so important. AMS recently completed its annual production failover test for the AMS Pulse cloud platform, as outlined below.
Importance of failover testing for payroll software
30 Aug, 2021
What is failover?
According to Wikipedia, failover is:
"Switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network…Systems designers usually provide failover capability in servers, systems or networks requiring near-continuous availability and a high degree of reliability."
In simpler terms, AMS has a backup system ready and waiting to take over automatically should the primary system fail for any reason. In New Zealand, failover has been needed during earthquakes, severe storms and many other ‘disaster’ situations. This matters most when you are providing a critical service such as payroll, where timesheets, rosters and pay must be processed no matter what happens.
In the midst of a disaster, an even bigger disaster would be your ‘failover’ failing. That is why it is absolutely critical to test failover capability annually, to ensure your systems keep performing with zero disruption to customers’ operational environments. This is top of mind because AMS recently tested the failover service for AMS Pulse, and it came through with flying colours.
AMS failover testing
AMS disaster recovery (DR) testing is carried out by AMS infrastructure and application support teams, in accordance with the AMS Pulse Business Continuity Plan and the AMS Pulse DR Failover Guide.
Under normal circumstances, all AMS production environments operate out of the primary hosting facility in Auckland. During failover testing, each of these production environments is systematically ‘failed over’ to a secondary hosting facility in a geographically separate location. The production environments then run from the secondary facility for no less than five business days. The switch is fully transparent to our customers, and any integrations from other systems into AMS Pulse must also keep working from this environment.
The failover to the secondary hosting facility is carried out outside normal business hours to minimise disruption to the service. We only deem the test complete once all workloads have been returned to the primary hosting facility. When workloads were rolled back to complete this year’s test, no production instance lost any data, meeting the Operational Recovery Point Objective (RPO) of one hour. Following this verification, the 2021 Annual Failover Test was deemed complete and successful.
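To make the one-hour recovery point objective more concrete: an RPO caps how much recent data may be lost, measured as the gap between the last change safely replicated to the secondary site and the moment of failure. The short Python sketch below is a hypothetical illustration only, not AMS tooling. The function names and example timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical illustration of a Recovery Point Objective (RPO) check.
# Not AMS tooling: function names and timestamps are assumptions.
RPO = timedelta(hours=1)  # the one-hour Operational RPO quoted in the text


def data_loss_window(last_replicated: datetime, failure_time: datetime) -> timedelta:
    """Span of writes at risk: changes made after the last replicated point
    and before the failure never reached the secondary site."""
    return max(failure_time - last_replicated, timedelta(0))


def meets_rpo(last_replicated: datetime, failure_time: datetime,
              rpo: timedelta = RPO) -> bool:
    """True if the potential data-loss window is within the objective."""
    return data_loss_window(last_replicated, failure_time) <= rpo


if __name__ == "__main__":
    failure = datetime(2021, 1, 1, 22, 0)  # example time, not real test data
    # Replication current to within five minutes of the failure: within RPO.
    print(meets_rpo(datetime(2021, 1, 1, 21, 55), failure))
    # Replication 90 minutes behind: the one-hour RPO is breached.
    print(meets_rpo(datetime(2021, 1, 1, 20, 30), failure))
```

In these terms, a result of "no data loss" corresponds to a loss window of zero, which sits comfortably inside a one-hour objective.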
AMS promotes disaster recovery capability
It is increasingly important that more organisations become aware of the need for failover testing. There appears to be very little transparency about whether disaster recovery testing is actually carried out, or about its results. If you are procuring cloud services or reviewing internally hosted applications, disaster recovery should be on your list of questions.
You need to be sure the failover test was not merely a simulation. For real peace of mind, the live production environment itself needs to be failed over and put through its paces, just as it would be if disaster recovery were actually invoked. This isn’t something you want to skimp on.
Nor is it enough simply to confirm that disaster recovery testing has been completed. Vendors should be willing to share failover outcomes confidentially with customers and prospects, as assurance that the stated outcomes did in fact occur. AMS is more than happy to share the results of its latest failover test. All you need to do is ask.