All Failures are Human in Origin, Part 2

All Failures are Human in Origin, Part 2

Ok, so the splitters have provided some feedback on my “All Failures are Human in Origin” blog. (Side note: I had a biology prof who said there were “lumpers” and “splitters” — those who want fewer categories, and those who want more.) Their issue is that there are difficulties which can cause failure in a design that are outside the control of the designer.

Sure. The original post, a bit of engineering philosophy, is intended to be a mental exercise for the student. It’s a starting point for an engineering ethos. It is not intended to be a “be all end all” statement of engineering, and indeed our friend Godel would argue that it can’t be…

Titan Launch Failure

The problem I have is that when a hacker/maker/designer/engineer starts blaming outside forces for their design failures, it becomes easy to blame all failures on outside forces. Do you have any idea how many times I’ve heard an engineer, while developing firmware, say “the chip must be bad”. “Does the ICD connect to it?” “Yes.” “Is the oscillator running?” “Yes.” “Is it drawing excessive current?” “No.” “Why do you think it’s bad?” “Because my code won’t run.” Arg. What.The.Fuck.Ever.

Pony, Please

But, to make the splitters a little bit happier, let’s delve in to so cases where my little axiom about all failures of your design being your fault doesn’t apply, or at least doesn’t apply thoroughly.

First that statement assumes you have perfect information about all materials (no manufacturer ever makes a mistake in silicon, or has incorrect datasheets, right?), and infinite amount of time, and an infinite budget. Obviously none of these are ever true.

As a design becomes more complicated, the number of failure modes goes exponential. It becomes physically impossible to test and verify ever aspect of the design. And testing and simulation of subsystems is exactly that — testing and simulation of parts. It’s not the same thing as testing the whole system in intended use. Here’s a case in point, the F-14 jet fighter.

Check this out, from “Tales of the F-14”, over at Air & Space Magazine:

The F-14 had an all-titanium hydraulic system with an 84-gallon-per-minute pump on each engine with no accumulators, all in the interest of saving weight. Each pump had nine pistons, which were varied in output by a swash plate. As it turned out, each time one of the nine pistons did its thing, it sent a 200-300-pounds-per-square-inch pulse down the basic 3,000-psi system. Apparently, without accumulators to dampen the pulses, a resonance occurred which fatigued the lines. Engineering duplicated the failure on a full-scale mockup of the system in 1.2 minutes at just the right pump RPM. When the line was changed to stainless steel, the line failed in 23 minutes. The answer was not material, but proper forming and clamping of the line to prevent resonance. The second F-14 did not make its first flight until May 24, 1971. There were no hydraulic problems again on the F-14 program.

Hrm. Ok, resonance in the hydraulic lines. Maybe that falls in to a “we shoulda thought of that”, and I bet those guys do now. But as a first or second order design criteria? Well, maybe not.

Admittedly I’ve taken Carroll Smith’s discussion of metal fatigue and statement “all component failures” to an extreme by applying this to any design.

Simple designs or components, with few failure modes, are much easier to verify and design for reliability than are complex components or systems. Interrupt latency in an 8-bit microcontroller embedded system better be spot on. Engine connecting rods better not fail at half the intended service life. Predicting up time on a Microsoft based personal computer running arbitrary applications? Ok, you got me there.

So let’s expand the FirmWarez engineering ethos a little. And maybe this makes it lean towards a paradox. I’m cool with that. First off, I stand by the concept that “things” don’t just fail. The failure is due to some human decision, and looking at this as hackers/makers/engineers/inventors, that human decision is on our shoulders. Like I said before, embrace it. Think about my embedded engineer who kept blaming “the chip” for his firmware issues. If he had approached the problem with the “my design’s failures are due to my actions” attitude, he would have saved time, the most valuable resource on a project.

But the second, paradoxical point, is that Godel is right. And he proved this with math and stuff. Due to discovery, economics, real world pressures, system complexity, the drunken haze of building a Tom Servo replica at 0300, there are times when we can’t analyze, simulate, think through every detail that could lead to failure. You end up with hydraulic fluid spewing out of your jet fighter at 10,000 feet.

So, uh, to use some forum speak, “what do?” Well, that’s the paradox and part of the phun. You are the hacker. You are the engineer. You are the maker. Your design is yours, it’s your creation, and you are responsible for it’s success and failure. Yet you can’t analyze and predict everything. It’s why we have test pilots; it’s why we race Formula 1 cars. You can’t predict it all. What you can do is do your best and accept responsibility for whatever happens once that design goes in to the market place of ideas, the skies, or the racetrack. Do your best as an engineer, then seek out the best to compete against.

Oh, and if you think I said “you are a failure” and not “you failed in your duties as a designer”, you need to HTFU. Buck up old boy, and get back to it!

« | »

Leave a Reply