Introduction
Operationalizing a program that efficiently drives down meaningful security vulnerabilities is one of the most common challenges security organizations face, and one they often fail to meet. The scale of the problem can be immense, internal resistance to disruption is high, and the tools meant to alleviate this burden often simply shift the work onto other (non-security) teams. The rigid, inflexible SLA-based approach that security organizations commonly adopt produces plenty of “security output as a proxy for progress.”
However, meaningful risk reduction requires security organizations to take a flexible, contextualized approach. At DigitalOcean, we redesigned our vulnerability management program in 2022 around a concept of “security debt,” and it has succeeded in driving down real risk. Beyond the security program itself, other business units have adopted this reporting model for their own metrics.
We’re not the first to attempt this kind of approach. In 2021, Carta published an article about measuring and reporting security risk as credit card-like debt. We also drew a lot of inspiration from Twilio Segment’s approach and recommend their presentations on the subject: “Democratizing Vulnerability Management” and “Embracing Risk Responsibly.” These peer resources formed the foundation of our program, and we are publishing this article to similarly share what we believe is a better model for vulnerability management with the broader information security community.
Security SLAs aren’t real
We’ve written about how security practices that shift toil from security teams onto product or engineering teams impede an organization’s velocity to deliver for its customers more than they improve its security posture. We spoke at OWASP AppSec Global in 2023 about the need for security organizations to take an enablement approach to security programs. In that talk, we admit to a revelation: we don’t believe in SLAs for security vulnerabilities.
Security teams often focus on individual vulnerability tickets: this one has this SLA, that one has that SLA. It doesn’t matter how many tickets the dev team has overall, what the system they are responsible for does, or any other contextual factor. These security teams rely on tactical, obstructionist security outputs instead of strategic, holistic security outcomes. It is not uncommon for many of these issues to land in the business’s risk register for missing their SLA, a state often considered acceptable as long as an executive signs the risk register line item each year. Outputs as a proxy for progress. We prefer the reasoning behind Accepted Insecure Time (AIT) to the term “SLA,” but either way we are left with a process companies have used for years that does not produce the desired outcomes and leaves every stakeholder wanting something better.
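To make the critique concrete, here is a minimal sketch of the rigid model in Python. The severity names and remediation windows are illustrative assumptions, not any real policy:

```python
from datetime import datetime, timedelta

# Illustrative severity-to-deadline table; the windows are assumed values,
# not a real policy.
SLA_WINDOWS = {
    "critical": timedelta(days=7),
    "high": timedelta(days=30),
    "medium": timedelta(days=90),
    "low": timedelta(days=180),
}

def remediation_deadline(reported_at: datetime, severity: str) -> datetime:
    # Severity alone sets the deadline. The owning team's workload, what
    # the affected system does, and every other contextual factor are
    # invisible to this function.
    return reported_at + SLA_WINDOWS[severity]
```

Everything contextual is simply out of scope for a policy shaped like this, which is exactly the complaint.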
DigitalOcean vulnerability management
At DigitalOcean, we never took the risk register/exceptions deferral path. At one point in our history, we held all teams accountable for planning and fixing every reported security issue. We have a lot of security-conscious developers at DigitalOcean, which is great! However, this meant that when we reported a security issue, even a lower-severity one, developers would often jump straight into working on a fix.
While this is the type of behavior we’d like to encourage for high-severity issues, lower-severity issues were creating high levels of roadmap disruption without real justification. When higher-severity issues deserved immediate attention, we were sometimes met with frustration from product owners and engineering managers over yet another disruption. We were perceived to be crying wolf more often than was justified, which hurt our ability to rally support around actual emergencies. Some lower-severity issues that were not immediately acted upon fell into the Jira ticket abyss, and we didn’t have a good way to track those outstanding issues and follow up with the appropriate teams.
As security triaged new vulnerabilities, they would apply contextual insight into how each vulnerability impacted DigitalOcean. Given our distinctive posture as a cloud provider, some vulnerabilities treated by the wider industry as lower severity were a big deal for us, while other issues considered critical had little to no impact on our platform. Once this context was appended to a ticket, security would reach out to the appropriate application team and inform them of the vulnerability and of security’s SLA.
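To illustrate what that kind of contextual triage can look like, here is a hypothetical sketch. The factor names and adjustments are our own assumptions for the example, not DigitalOcean’s actual triage logic:

```python
SEVERITIES = ["low", "medium", "high", "critical"]

def contextual_severity(industry_severity: str,
                        crosses_tenant_isolation: bool,
                        internet_facing: bool,
                        compensating_controls: bool) -> str:
    """Adjust an industry severity rating with platform context.

    Hypothetical factors for illustration: a multi-tenant cloud provider
    cares enormously about tenant isolation, while compensating controls
    can blunt an otherwise severe finding.
    """
    level = SEVERITIES.index(industry_severity)
    if crosses_tenant_isolation:
        level = min(level + 2, len(SEVERITIES) - 1)  # isolation bugs escalate
    if internet_facing:
        level = min(level + 1, len(SEVERITIES) - 1)
    if compensating_controls:
        level = max(level - 1, 0)  # mitigations reduce urgency
    return SEVERITIES[level]

# A "low" industry rating that crosses tenant isolation comes back "high";
# a "critical" finding behind strong compensating controls drops to "high".
print(contextual_severity("low", True, False, False))       # high
print(contextual_severity("critical", False, False, True))  # high
```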
Then, the waiting game began. Many issues would be acted upon and completed within the first few days. However, others would fall through the cracks. Someone on the security team would be responsible for following up with the app team—tapping them on the metaphorical shoulder and asking “Are we done yet? Is it fixed yet?”
Eventually, the security engineer’s attention would be redirected to new, incoming issues or higher-priority tasks. When they next followed up with the app team, they might learn that the team had fixed the issue weeks or months earlier and had simply not notified security. We could now close the vulnerability, but our metrics over the prior period had not accurately reflected the organization’s true risk posture.
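A small worked example shows how much that lag can distort the metrics. The dates here are invented:

```python
from datetime import date

reported_at = date(2023, 1, 3)   # security files the ticket
fixed_at    = date(2023, 1, 10)  # app team actually ships the fix
closed_at   = date(2023, 3, 1)   # security learns of the fix, closes the ticket

true_exposure     = (fixed_at - reported_at).days   # 7 days of real risk
recorded_exposure = (closed_at - reported_at).days  # 57 days on the dashboard
```

For nearly two months, dashboards reported an open vulnerability that no longer existed.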

