Last modified at 2020-09-06 17:08:38 +0000
If you’ve worked in information security engineering for any length of time, you have probably seen interrupt-driven workflows: workflows in which alerts fire and pull you away from whatever task you would otherwise be working on.
For some teams, this means having a person or group of people on-call, waiting for the next relevant alert to fire. Some folks love this, others abhor it.
For many, however, COVID-19 has prompted the need for new flows: either these systems do not fire at all (usually because detection is based on-premises), or they fire too much. Whether the extra alerts are the result of people spending more time on their work machines is unclear; it depends on the signal being detected.
But in both situations, how urgently we treat alerts becomes a priority. Do our signals still make sense? Do we still respond with the same urgency when someone visits a low-risk site from home as we did when they were in the office?
I’ve tried to adopt a model I saw on Twitter.
“An out-of-office reply from a female Assistant Professor that warrants a tweet: ‘I do not respond to emails on weekends. If this is an emergency, please call my mobile. If you do not have my mobile number, then you do not have a weekend emergency.’” (Stephana Cherak)
It’s simplistic (and a bit crass), but it gets the point across. High-value (high-risk) incidents should get the focus. If something is not urgent, it will not be any more urgent for having waited until the weekday.
But treating the alerts (in this case, symptoms of a system that is not within a theoretical secure baseline) is not the proper solution. One approach I’ve been exploring further is developing antifragile solutions. The concept of antifragility is best summed up by Taleb himself.
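To make that rule concrete, here is a minimal sketch of how the "weekend emergency" test might look as alert routing. The names here (`Severity`, `Alert`, `route`) and the three outcomes are my own illustrative assumptions, not part of any particular alerting product.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Severity(Enum):
    # Hypothetical severity levels; a real team would define its own.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Alert:
    name: str
    severity: Severity
    fired_at: datetime


def is_weekend(moment: datetime) -> bool:
    # datetime.weekday() returns 0 for Monday, so 5 and 6 are Saturday and Sunday.
    return moment.weekday() >= 5


def route(alert: Alert) -> str:
    """Decide how an alert is handled.

    'page'   -> interrupt the on-call person now
    'defer'  -> hold until the next weekday
    'review' -> handle during the normal working day
    """
    if alert.severity is Severity.HIGH:
        return "page"
    if is_weekend(alert.fired_at):
        return "defer"
    return "review"
```

With this sketch, a low-risk site visit detected on a Sunday is deferred, while a high-severity alert at the same moment still pages someone. The interesting work is deciding what earns `Severity.HIGH`, which the code deliberately leaves open.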
“Some things benefit from shocks; they thrive and grow when exposed to volatility, randomness, disorder, and stressors and love adventure, risk, and uncertainty. Yet, in spite of the ubiquity of the phenomenon, there is no word for the exact opposite of fragile. Let us call it antifragile. Antifragility is beyond resilience or robustness. The resilient resists shocks and stays the same; the antifragile gets better.” (Nassim Nicholas Taleb)
In examining antifragile systems, we are looking beyond the principles of resilience and robustness, abstracted from concepts like availability, scaling, storage, and so on. We’re speaking at a more meta level: how is the system performing? What is its health? While both of these questions may relate to traditional health metrics, the point is not simply to measure whether a system is healthy or not. Is the system improving on the mistake it saw before? Is it treating data in an appropriate manner? Ultimately, the question we’re asking is singular: is the system performing how it should be?
Developing systems with this paradigm in mind is key to antifragility. I see two particular paths stemming from this: one is learning from failure, the other is formally verified systems.
Over the coming months, I’m going to document my approach to building such a framework, showing how we can build systems that operate on the second notion: preventing anything but the intended actions from happening. No specific language will be used, although a domain-specific language may satisfy some of the needs within my framework. Again, the end goal here is a system that is antifragile: it should benefit from the chaos. This will mean a system that learns from the mistakes it makes, or does not make them in the first place.
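As a rough illustration of "preventing anything but the intended actions from happening", here is a small sketch in Python. It is not formal verification, only a runtime allowlist: every permitted (state, action) pair is written down, and anything else is rejected. The states, actions, and names (`ALLOWED_TRANSITIONS`, `apply_action`, `UnintendedAction`) are hypothetical and exist purely for the example.

```python
class UnintendedAction(Exception):
    """Raised when an action is not explicitly allowed in the current state."""


# Hypothetical lifecycle for an alert. Each key is (current state, action),
# each value is the resulting state. Anything not listed is not intended.
ALLOWED_TRANSITIONS = {
    ("open", "triage"): "triaged",
    ("triaged", "escalate"): "escalated",
    ("triaged", "close"): "closed",
    ("escalated", "close"): "closed",
}


def apply_action(state: str, action: str) -> str:
    """Return the next state, or refuse if the transition is not on the allowlist."""
    try:
        return ALLOWED_TRANSITIONS[(state, action)]
    except KeyError:
        raise UnintendedAction(
            f"action {action!r} is not allowed in state {state!r}"
        ) from None
```

Calling `apply_action("open", "triage")` moves the alert forward, while `apply_action("open", "close")` raises `UnintendedAction` because closing an untriaged alert was never declared as intended. A formally verified system would prove such properties ahead of time rather than check them at runtime, but the default-deny shape is the same.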
I’d love to have others join. If you’re interested in collaborating, feel free to contact me via any of the mediums listed below.