We all know and understand the merits of a backup system... well, that is, intellectually at least. We certainly claim to understand triple redundancy of PLC, DCS and emergency shutdown hardware. But how many of us are caught unawares when we try to restore a backup?
Technews, the publisher of SA Instrumentation & Control, has better backup systems than most. All file servers run in a redundant array of inexpensive disks (RAID) configuration, thereby reducing the risk of data loss through disk failure. Critical documents are also replicated across a wide area network (WAN). Then, to reduce risk even further, the same documents are carried off-site to a third location every night.
Just when we thought nothing could go wrong go wrong go wrong, during the production of this issue of SAI&C the wheels fell off with grand aplomb!
First came a failure of part of the RAID array. Not a problem in itself. Just swap the faulty disk out with a new one and the RAID array reconstructs itself. But the service technician failed to ensure that all staff were logged off the network before bringing down the main file server. The net result was a corruption of data files. Not a problem in itself. Just restore either the WAN or off-site backup... or so we thought.
Technews uses an off-the-shelf piece of software to schedule the backups. The software contains a list of all the paths that must be backed up and the times at which they must be backed up. To add to the overall security, a log is generated showing any problems that occurred during the backup. However, a bug in the software meant that if the backup user did not have access rights to a path, the log would not indicate that there had been a problem. As it so happened, the backup user did not have access rights to the folder in which all the corrupted data resided (a simple oversight). The result was that when we came to restore our missing data, all was missing... aaargh!
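For readers who script their own checks, the access-rights trap described above can be caught before the backup ever runs. What follows is only a minimal sketch in Python, not the software Technews uses: the function names check_path and preflight are hypothetical, and it assumes the script is run as the same user account that performs the backup. The idea is simply to report every listed path that is missing or unreadable instead of skipping it in silence.

```python
import os
import stat


def check_path(path):
    """Return None if the current user can read `path`, else a short reason.

    Hypothetical helper for illustration; run it as the backup user.
    """
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "access denied"
    except OSError as exc:
        return f"error: {exc}"

    # A folder must be both readable and traversable to be backed up.
    if stat.S_ISDIR(info.st_mode):
        needed = os.R_OK | os.X_OK
    else:
        needed = os.R_OK
    return None if os.access(path, needed) else "access denied"


def preflight(paths):
    """Return a list of (path, reason) for every path that cannot be backed up."""
    problems = []
    for path in paths:
        reason = check_path(path)
        if reason is not None:
            problems.append((path, reason))
    return problems
```

A scheduled job could call preflight() with the same list of paths given to the backup software and raise an alarm whenever the returned list is not empty. Note that this only checks the top-level paths; files deeper in a folder tree may carry their own permissions.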
Whilst the context of this disaster might have been an 'office' environment, the principle also applies to industrial environments. When last did you ensure that all your 'off-site' PLC programs could actually be restored and installed onto a working PLC? Or do you know that you can actually still read those floppy disks or CD-ROMs that contain your treasured backups? Or better still, do you even know if your backups still exist, or that they contain the correct version of the data?
Perhaps it is time for you to schedule 'backup retrieval testing' as a part of the next maintenance program. The result might be a considerably happier production team with far less downtime. Remember that no backup is a backup until it has been successfully restored.
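As a starting point for such a test, a restored copy can be compared file by file against a known-good reference. The sketch below is one possible approach in Python, offered under stated assumptions rather than as a prescribed method: the function names are hypothetical, and it assumes you have restored the backup into a scratch folder and still have a reference copy to compare it with.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(reference_root, restored_root):
    """Report files that are missing, changed or unexpected in the restore."""
    reference = tree_digests(reference_root)
    restored = tree_digests(restored_root)
    common = reference.keys() & restored.keys()
    return {
        "missing": sorted(reference.keys() - restored.keys()),
        "changed": sorted(k for k in common if reference[k] != restored[k]),
        "unexpected": sorted(restored.keys() - reference.keys()),
    }
```

An empty report means the restore matches the reference. Because live data keeps changing after a backup is taken, it is usually more meaningful to compare against digests recorded at backup time than against today's working files. For PLC programs, a matching file is still only half the test; loading it onto a working PLC is the other half.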
Back issues of SA Instrumentation & Control are available on our website, where you can search for past articles. Visit www.instrumentation.co.za and make use of this valuable resource. The SA Instrumentation & Control Buyers' Guide is also available free of charge online at www.ibg.co.za.
Graeme Bell
Editorial director: SA Instrumentation & Control
© Technews Publishing (Pty) Ltd | All Rights Reserved