Communications of the ACM had a fascinating post about how NASA built Artemis II’s fault-tolerant computer. Three excerpts stood out.

(1) Eight CPUs with several backup scenarios: “Orion utilizes two Vehicle Management Computers, each containing two Flight Control Modules, for a total of four FCMs. But the redundancy goes even deeper: each FCM consists of a self-checking pair of processors. Effectively, eight CPUs run the flight software in parallel. The engineering philosophy hinges on a ‘fail-silent’ design. The self-checking pairs ensure that if a CPU performs an erroneous calculation due to a radiation event, the error is detected immediately and the system responds. ‘We can lose three FCMs in 22 seconds and still ride through safely on the last FCM,’ said Uitenbroek. A silenced FCM doesn’t become dead weight, however; the system is designed to reset, re-synchronize its state with the operating modules, and re-join the group mid-flight.”

(2) Multiple redundancies with…
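To make the first excerpt concrete, here is a toy Python sketch of the fail-silent idea: four modules, each comparing the results of two processors and going quiet on a mismatch. It is only an illustration, not NASA's flight software. The names `FlightControlModule`, `control_law`, `upset_pending`, `reset_and_rejoin`, and `select_output` are all made up for this example, and the "first healthy module wins" selection rule is an assumption, since the excerpt does not say how outputs are chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


def control_law(x: int) -> int:
    """Stand-in for one step of flight software (hypothetical)."""
    return 2 * x + 1


@dataclass
class FlightControlModule:
    """Toy model of one FCM: a self-checking pair of processors."""
    name: str
    silenced: bool = False
    upset_pending: bool = False  # test hook: corrupt one CPU's next result

    def step(self, x: int) -> Optional[int]:
        if self.silenced:
            return None
        a = control_law(x)  # processor A
        b = control_law(x)  # processor B
        if self.upset_pending:
            b ^= 1  # simulate a single bit flip from a radiation event
            self.upset_pending = False
        if a != b:
            # Fail-silent: emit nothing rather than a possibly wrong value.
            self.silenced = True
            return None
        return a

    def reset_and_rejoin(self) -> None:
        """Model the reset and re-sync step by clearing the silenced flag."""
        self.silenced = False


def select_output(fcms: List[FlightControlModule], x: int) -> Optional[int]:
    """Step every module, then take the first non-silent result (assumed rule)."""
    outputs = [fcm.step(x) for fcm in fcms]
    return next((o for o in outputs if o is not None), None)


if __name__ == "__main__":
    fcms = [FlightControlModule(f"FCM{i}") for i in range(1, 5)]
    for fcm in fcms[:3]:
        fcm.upset_pending = True  # three modules suffer an upset
    print(select_output(fcms, 10))     # the last FCM still answers
    print([f.silenced for f in fcms])  # the first three are silenced
    fcms[0].reset_and_rejoin()
    print(select_output(fcms, 10))     # FCM1 is back in the group
```

The point of the pairing is that a single module never has to decide whether its own answer is right; disagreement alone is enough to take it out of the loop, and the remaining modules carry on.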