Safety engineers distinguish different extents of defective operation: A "fault" is said to occur when some piece of equipment does not operate as designed. A "failure" only occurs if a human being (other than a repair person) has to cope with the situation. A "critical" failure endangers one or a few people. A "catastrophic" failure endangers, harms or kills a significant number of people.
Safety engineers also identify different modes of safe operation: A "probabilistically safe" system has no single point of failure, and enough redundant sensors, computers and effectors so that it is very unlikely to cause harm (usually "very unlikely" means less than one human life lost in a billion hours of operation). An "inherently safe" system is a clever mechanical arrangement that cannot be made to cause harm. This is obviously the best arrangement, but it is not always possible. For example, "inherently safe" airplanes are not possible. A "fail-safe" system is one that cannot cause harm when it fails. A "fault-tolerant" system can continue to operate with faults, though its operation may be degraded in some fashion.
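The "one in a billion hours" figure can be made concrete with a little arithmetic. The sketch below is only illustrative: it assumes each redundant channel fails independently with the same per-hour probability, that harm occurs only when every channel fails in the same hour, and that the target can be read as a per-hour probability. Real systems also suffer common-cause failures, which this ignores. The function names and the example channel probabilities are hypothetical.

```python
TARGET_PER_HOUR = 1e-9  # "one in a billion hours", read here as a per-hour probability (assumption)


def all_channels_fail(p_channel: float, n: int) -> float:
    """Probability that n independent channels all fail in the same hour."""
    return p_channel ** n


def channels_needed(p_channel: float, target: float = TARGET_PER_HOUR) -> int:
    """Smallest number of independent channels whose joint failure
    probability is at or below the target."""
    if not 0.0 < p_channel < 1.0:
        raise ValueError("p_channel must be strictly between 0 and 1")
    n = 1
    while all_channels_fail(p_channel, n) > target:
        n += 1
    return n


# Hypothetical channel that fails once in 10,000 hours.
print(channels_needed(1e-4))
```

Under these assumptions, less reliable channels can still meet the target, but only by adding more of them, which is why independence between channels matters so much in practice.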
These terms combine to describe the safety needed by systems: For example, most biomedical equipment is only "critical," and often another identical piece of equipment is nearby, so it can be merely "probabilistically fail-safe". Train signals can cause "catastrophic" accidents (imagine chemical releases from tank-cars) and are usually "inherently safe". Aircraft "failures" are "catastrophic" (at least for their passengers and crew), so aircraft are usually "probabilistically fault-tolerant". Without any safety features, nuclear reactors might have "catastrophic failures", so real nuclear reactors are required to be at least "probabilistically fail-safe", and some pebble bed reactors are "inherently fault-tolerant".
Ideally, safety engineers take an early design of a system, analyze it to find what faults can occur, and then propose changes to make the system safer. In an early design stage, a fail-safe system can often be made acceptably safe with a few sensors and some software to read them. Probabilistically fault-tolerant systems can often be made by using more, but smaller and less-expensive, pieces of equipment.
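The idea of using more but smaller pieces of equipment can be illustrated with a standard "k out of n" calculation. This is a rough sketch, not a design method: it assumes the units fail independently and all have the same reliability, and the reliability value 0.9 and the pump scenario are made up for the example.

```python
from math import comb


def at_least_k_of_n(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent units work,
    when each unit works with probability r."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical comparison: one large pump versus four small pumps,
# any three of which can carry the full load.
one_large = at_least_k_of_n(1, 1, 0.9)
four_small = at_least_k_of_n(3, 4, 0.9)
print(one_large, four_small)
```

With these illustrative numbers the four-pump arrangement is more likely to deliver full capacity than the single pump, even though no individual pump is more reliable.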
Historically, many organizations viewed "safety engineering" as a process to produce documentation to gain regulatory approval, rather than as a real asset to the engineering process. These same organizations have often turned that view into a self-fulfilling prophecy by assigning less-able personnel to safety engineering.
Far too often, rather than actually helping with the design, safety engineers are assigned to prove that an existing, completed design is safe. If a competent safety engineer then discovers significant safety problems late in the design process, correcting them can be very expensive. This project management error has wasted large sums of money in the development of commercial nuclear reactors.
The two most common fault modeling techniques are called "failure modes and effects analysis" and "fault tree analysis." These techniques are just ways of finding problems and of making plans to cope with failures.
In the technique known as "failure modes and effects analysis", an engineer starts with a block diagram of a system. The engineer then considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of the effects. The design of the system is then corrected, and the table adjusted, until the system is not known to have unacceptable problems. Of course, the engineers may make mistakes, so it is very helpful to have several engineers review the failure modes and effects analysis.
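The table described above can be sketched as a small data structure. This is a minimal illustration rather than a standard format: the severity scale simply reuses the fault, failure, critical and catastrophic terms from this article, and the blocks, failure modes and acceptance limit are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Evaluation scale borrowed from the terms used in this article."""
    FAULT = 1
    FAILURE = 2
    CRITICAL = 3
    CATASTROPHIC = 4


@dataclass(frozen=True)
class FmeaRow:
    block: str         # block of the diagram being considered
    failure_mode: str  # how that block fails
    effect: str        # what happens to the system as a result
    severity: Severity # evaluation of the effect


def unacceptable(rows, limit=Severity.FAILURE):
    """Return the rows whose severity exceeds the acceptance limit (assumed)."""
    return [row for row in rows if row.severity > limit]


# Hypothetical first-pass table for a small cooling loop.
table = [
    FmeaRow("pump", "stops", "coolant flow lost", Severity.CRITICAL),
    FmeaRow("level sensor", "reads low", "false alarm", Severity.FAULT),
]

for row in unacceptable(table):
    print(f"redesign needed: {row.block} ({row.failure_mode}) -> {row.effect}")
```

In this sketch the design would be revised, for instance by adding a backup pump, the affected rows re-evaluated, and the check repeated until `unacceptable` returns an empty list.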