This is just an idea I've been kicking around, and it's kind of a variation on every other Good Idea in Software Development, all of which basically boil down to "understand why things happen before you create a plan to respond to them". Still, I haven't seen this specific version developed much elsewhere.
With some frequency, I see suggestions to "add more logging", "add more monitoring", "add more tests", etc., or related questions ("how can I find the logs?"). I think these are often the wrong questions to ask, because they skip the "understand the problem" step and assume a solution (usually the least surgical, most broad-spectrum, one-size-fits-all one). Logging, monitoring, and tests are very poor solutions to some problems in these domains.
Narrower Focus | Broader Focus |
---|---|
Logging as an active diagnostic tool. | Observability of the system. |
Where are the logs? | How can I observe/diagnose the behavior? |
Should we log this [to help future operator-at-keyboard active diagnostics]? | How can we make this system more observable? |
Monitoring as a reliability layer. | Reliability of the system. |
Is the system monitored? | Is the system reliable? |
Should we add monitoring? | How can we make the system more reliable? |
Unit tests as regression protection. | Robustness of the system [to change]. |
Do we have test coverage? | Is the system robust to change? |
Should we add tests? | How can we make the system more robust to change? |
These are really just special cases of "describe the problem, not the solution", but they're kind of a weird flavor of it?