There’s something brewing in the incident management category – new and extreme competition. In the last week, FireHydrant, Rootly, and Incident.io all released on-call products that compete with PagerDuty and OpsGenie. But why did it all happen at nearly the exact same time?
Just like incidents, there are multiple contributing factors – and I wanted to take some time to write about why this market is shifting so unbelievably fast, from my perspective as the CEO of FireHydrant.
The rapid rise of commercial incident management tooling
Incident management is not a new idea, but it is a relatively new market. Just over five years ago the Google SRE book made its debut, John Allspaw spoke at every conference imaginable, debates about root cause rattled Twitter timelines, and all of a sudden there was a groundswell of interest in responding to incidents more efficiently, running effective retrospectives (postmortems), and adopting SRE best practices. That interest was quickly met with solutions: commercial tools and structured advice for how to actually implement change. At a rapid clip, SRE became one of the most in-demand jobs.
I merged the -m “initial commit” of FireHydrant in September 2017 and announced it on Hacker News six months later (cue the Dropbox comment vibes). Blameless, Kintaba, RigD, Squadcast, and Transposit all entered the scene around the same time. Monitoring and observability providers took notice, as did the alerting and on-call incumbents. A frenzy of acquisitions followed, resulting in Atlassian’s status pages, PagerDuty’s Rundeck automation, and Grafana OnCall. By 2021, focused incident management providers had emerged to compete with FireHydrant (hello, Rootly and Incident.io). More recently, BetterStack joined the club.
It’s easy to understand why this explosion of standalone reliability tools has occurred: all software will break at some point. And there’s money to be saved (and made) by the way teams respond.
A cosmic shift toward tool consolidation
For more than a decade, the budget for software development (and its accompanying tooling) seemed to grow on trees. Then, shortly after the incident management category solidified, the world pressed pause. For the last few years, engineering teams have been under increasing pressure to get by with less expensive tools, limit new purchases, and consolidate the jobs they need done into fewer vendors.
Since PagerDuty is often a massive line item on an engineering budget, we felt the squeeze from customers and prospects who had to justify paying for one tool for alerting and on-call and a separate one for incident management. Certainly they couldn’t wipe out their alerting provider (who takes the batteries out of their smoke detector?). But perhaps this incident management stuff could go. “Why can’t PagerDuty just do both?” their CFOs would ask, according to our customers. “Their website says they do incident response.”
But by 2023, incident response tooling had proven itself as more than a nice-to-have. Teams could use in-product analytics to demonstrate critical improvements in their reliability (and the $$ associated with less downtime). Last year FireHydrant signed some of our largest logos yet – including one featured front and center in a recent PagerDuty earnings report.
The writing was on the wall: modern engineering teams don’t want to live without commercial incident management tooling. The pressure is on to find a budget and consolidate tools. But not every team can withstand this pressure. And why should they have to? For savvy incident management providers, consolidation of alerting and on-call with incident management tooling is a natural progression.
But aren’t you scared PagerDuty will just build this?
I’m openly critical of PagerDuty, but I mean no ill will toward its hard-working employees. You’re generating over $400M in revenue yearly – way more than anyone in the incident management category has (for now 😏). But let’s be honest here: the PagerDuty product has not had any material innovation in its core focus area (alerting and on-call) in years. You can’t add multiple services to an incident, you can’t page a team, you can’t alert anyone without also opening an incident – it’s genuinely baffling. From the outside, it has felt like the answer to the lack of innovation was simply to build more things. The billboards listing a bunch of random tech terms don’t help with that sentiment.
Naturally, the leadership at PagerDuty sees the same trends we do – both the market opportunity and customer demand for a consolidated incident workflow. Otherwise, they wouldn’t have acquired Rundeck and more recently Jeli. The trouble is, engineers aren’t demanding more tools. They’re demanding more efficiency. And you can’t Frankenstein your way to cleaner and more efficient incidents.
What I expect in the next year
PagerDuty isn’t going to cease to exist tomorrow, but at the current rate, I’d expect a significant change in their business operations in the next year or so. There’s already been one PE buyout rumor this year, the stock itself has not moved, and multiple financial institutions have downgraded it.
No matter what happens, though, one thing is certain: alerting and incident management belong together. Those markets are going to merge, and the companies that move the fastest with excellent execution will be the long-term winners. The ones that don’t will struggle. It’s a classic example of the innovator’s dilemma.
PagerDuty left the gate down over their moat, and FireHydrant’s Signals (generally available) led the charge into the castle grounds. Days later, Incident.io (invite only) and Rootly (coming soon) followed suit. It’s become clear that there’s plenty of excitement for a consolidated alerting and incident management tool, and that lots of smart people are building their own innovative takes. For now, our eyes are laser-focused on our own approach: deeply integrated, staunchly reliable, rich with data, meaningfully bolstered by AI, and charting toward a place where the best incidents are the ones that never happen. ✨ Watch this space ✨
I wouldn’t be doing my job if I didn’t also say: Check out Signals – our modern, fairly priced, and kickass on-call and alerting tool.
Great blog!
I work on oncallscheduler.com, a scheduling add-on for on-call providers. I've been watching all of you pile into the on-call market, wondering what's going on. I agree that the best end-customer experience is for incident management and on-call to be part of the same product. The same is true for on-call scheduling. One UI to learn. One way to authenticate. One REST API to use. One provider relationship. One... And most importantly: a super-smooth flow of data across these tied-at-the-hip user scenarios.
I don't think there's an innovator's dilemma situation going on, though. Are the new players really building on-call in a completely different way from PagerDuty? I don't see anything like the spinning hard drive vs. the solid-state disk, or the analog camera vs. digital. This looks more like everybody competing with the same basic approach. I agree that PagerDuty stopped innovating in their core business for a number of years, so there is an opportunity for the new players.
For us at oncallscheduler.com, the result is that we'll have to expand beyond our PagerDuty/Opsgenie/Grafana OnCall integrations and also build schedule sync for all you new entrants. It'll be fun!