From my old
Practical Unix & Internet Security book --
A Classic Case of Backup Horror --
Sometimes, the weakest link in the backup chain is the human responsible for making the backup. Even when everything is automated and requires little thought, things can go badly awry. The following was presented to one of the authors as a true story. The names and agency have been omitted for obvious reasons.
It seems that a government agency had hired a new night operator to do the backups of their Unix systems. The operator indicated that she had prior computer operations experience. Even if she hadn't, that was okay - little was needed in this job because the backup was largely the result of an automated script. All the operator had to do was log in at the terminal in the machine room located next to the tape cabinet, start up a command script, and follow the directions. The large disk array would then be backed up with the correct options.
All went fine for several months, until one morning, the system administrator met the operator leaving. She was asked how the job was going. "Fine", she replied. Then the system administrator asked if she needed some extra tapes to go with the tapes she was using every night - he noticed that the disks were getting nearer to full capacity as they approached the end of the fiscal year. He was met by a blank stare and the chilling reply, "What tapes?"
Further investigation revealed that the operator didn't know she was responsible for selecting tapes from the cabinet and mounting them. When she started the command file (using the Unix dump program), it would pause while mapping the sectors on disk that it needed to write to tape. She would wait a few minutes, see no message, and assume that the backup was proceeding. She would then retire to the lounge to read.
Meanwhile, the tape program would, after some time, begin prompting the operator to mount a tape and press the return key. No tape was forthcoming, however, and the mandatory security software installed on the system logged out the terminal and cleared the screen after 60 minutes of no typing. The operator would come back some hours later and see no error messages of any kind.
The panicked supervisor immediately started day-zero dumps of all the computer's disks. Fortunately, the system didn't crash during the process. Procedures were changed, and the operator was given more complete training.
Let this be a double reminder for everyone:
(1) Make regular backups and double-check everything.
(2) Don't trust a government bureaucrat to be even the slightest bit competent, at anything, ever.