March 10, 2023 | Trundl | Guide
The impact of enterprise incidents is measured in thousands of dollars per minute. Let’s compare two giants of incident management that keep companies on top of unexpected incidents.
Contents
- Overview
- Why PagerDuty vs. Opsgenie?
- Product History
- Alerts & Notifications
- Centralized Monitoring
- Reports & Analytics
- Stakeholder Communications
- Runbooks & Access to Knowledge Base
- Associating Incidents to Problems or Changes
- Integrations
- Post-Mortems & Archiving
- Price
- Use Case
- Conclusion
Overview
Enterprise incidents may be unavoidable, but they don’t have to be unmanageable. With the right strategies and the right preparation, you can respond to any incident quickly and effectively and minimize potential damage to your reputation and your bottom line. In this guide, we’ll explore two of the most widely used incident management platforms in detail, so that you can choose the right tool and be ready to manage an incident effectively. So, let’s get started!
What is Incident Management?
An incident is any event that impacts or has the potential to affect the quality of service. An example of this would be a business application going down or a web server running slowly, which could lead to a complete failure. These incidents can cause disruption and reduce productivity.
Having the right incident management process is essential for any organization. When service outages occur, teams need to be able to respond and resolve the incident rapidly in order to reduce the financial impact on the business.
DevOps and IT Operations teams utilize incident management as a way to address and recover from any unexpected event or service disruption, bringing the service back to its normal functioning state.
The severity of an incident can vary greatly, from a global web service crash to a handful of users facing intermittent issues. An incident is resolved when the service is restored and functioning as intended; resolution includes any tasks needed to reduce the impact and restore the service’s functionality.
Why is it smart to get Incident Management right?
The cost of major incidents can be extremely high, with some web-based services costing over $300,000 per hour of system downtime. To help reduce these costs, organizations must have a reliable incident management process in place.
The advantages of such a process are clear:
- Quicker incident resolution
- Reduced costs and revenue losses
- Improved communication during incidents
- Opportunities for continuous learning and improvement
Incident management processes vary from one company to another in both approach and tooling; there is no one-size-fits-all solution. A common approach is IT-focused, such as one that follows ITIL best practices. Other teams prefer Site Reliability Engineering (SRE) or DevOps-style processes. Whatever your preference, tooling sits at the center of it.
Why PagerDuty vs. Opsgenie?
PagerDuty and Opsgenie are two widely used incident management platforms. How different or similar are they? How do you decide which one best suits your business and its specific requirements? Which platform is the better investment? To answer these questions, let’s take a closer look at what each one offers.
What is PagerDuty?
PagerDuty provides a SaaS-based platform to help developers, DevOps teams, IT operations, and business leaders prevent and resolve incidents that would negatively impact customer experience. With integrations, on-call scheduling and escalations, machine learning, response orchestration, analytics, and more, PagerDuty gets the right data to the right people in real time, helping protect customer satisfaction and brand reputation.
What is Opsgenie?
Opsgenie is an advanced incident management platform that alerts and empowers teams to respond quickly to incidents affecting always-on services and applications. It leans heavily toward a team-based approach. It provides centralized monitoring, smart/prioritized alerting, and an incident war room where incident owners and stakeholders can perform root cause analysis, access and execute remediation steps through runbooks, and bring in more resources as needed.
Opsgenie also provides an on-call scheduling solution that lets you quickly reach the right people through multiple communication channels, including voice calls, email, SMS, and push notifications on mobile devices. If an alert goes unnoticed, Opsgenie automatically escalates it through other communication methods and to alternate contacts. Lastly, Opsgenie makes incident post-mortems and related remediation actions easy.
Though Opsgenie and PagerDuty share many similarities, they still have key differences. Let’s explore them!
_________________________________________________
Product History
PagerDuty:
Initially released as an MVP, PagerDuty launched in 2009. It lacked some of the more robust features such as services and incidents, and alarms were limited to being either “on” or “off.” Additionally, scheduling was only possible in weekly or daily formats, and the escalation process was limited to three levels—primary, secondary, and tertiary.
In the years that followed, PagerDuty improved greatly and is now highly regarded as a powerful incident management tool. PagerDuty has thousands of customers across the globe, including SAP, Cox Automotive, Vodafone, Hyland, and Zoom.
Opsgenie:
Acquired by Atlassian in 2018 for $295 million, Opsgenie is an established solution for DevOps and IT Operations teams that need streamlined incident management. At the time of the acquisition, Atlassian’s Jira Service Management was a PinkVERIFY™ certified ITIL 4 platform for IT teams, and a popular choice among Atlassian’s 200,000-strong customer base. However, Jira Service Management at the time lacked true enterprise-level incident management capabilities. Bringing Opsgenie into the fold changed that.
Already known as an integration-friendly tool with powerful APIs, Opsgenie joined a tightly integrated portfolio of collaboration tools including Jira Software, Confluence, and Bitbucket. The new possibilities for connecting cross-functional teams on incidents were a strategic home run for Atlassian and its future as a DevOps industry leader.
Opsgenie | PagerDuty |
---|---|
Founded in 2012, acquired by Atlassian in 2018 | Released Beta version in 2009 |
Headquartered in Boston, with offices in Virginia & Ankara | Founded in Toronto, now has its headquarters in San Francisco |
Originated as a monitoring aggregation tool with complete incident response orchestration. Notable for its strong APIs, solid scheduling, automation capabilities, and team management support | Originated as an MVP product focused on incident alarms. Developed into a highly respected incident management platform known for its support for customizations and integrations |
_________________________________________________
Alerts & Notifications
Alerts let us know what’s happening and when, and they keep everyone impacted (customers, stakeholders, executives) informed and prepared to take action.
PagerDuty:
As expected, PagerDuty is a very effective alerting platform. An event captured by PagerDuty triggers an alert and incident status, which in turn triggers resource, stakeholder, and customer communications. You can have multiple alerts compiled into one incident for triage, which makes hand-off between teams easier and also centralizes critical information and notifications.
You can group alerts or move them to another incident, either manually or automatically. Automated alert grouping offers three methods: intelligent, content-based, and time-based.
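PagerDuty does not document how its grouping methods work internally, so the following is only a minimal sketch of the general idea behind time-based grouping. It assumes a hypothetical `Alert` record and a five-minute window: an alert joins its service’s most recent incident if it arrives within the window of that incident’s last alert, and otherwise opens a new incident.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    service: str
    summary: str
    created_at: datetime


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)


def group_by_time_window(alerts, window=timedelta(minutes=5)):
    """Hypothetical time-based grouping (not PagerDuty's actual algorithm)."""
    incidents = []
    latest_by_service = {}  # service -> most recent incident for that service
    for alert in sorted(alerts, key=lambda a: a.created_at):
        current = latest_by_service.get(alert.service)
        if current and alert.created_at - current.alerts[-1].created_at <= window:
            current.alerts.append(alert)  # close enough in time: same incident
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            latest_by_service[alert.service] = current
    return incidents


t0 = datetime(2023, 3, 10, 12, 0)
alerts = [
    Alert("checkout", "5xx spike", t0),
    Alert("search", "slow queries", t0 + timedelta(minutes=1)),
    Alert("checkout", "latency", t0 + timedelta(minutes=2)),
    Alert("checkout", "timeouts", t0 + timedelta(minutes=4)),
    Alert("checkout", "5xx spike", t0 + timedelta(minutes=20)),
]
for incident in group_by_time_window(alerts):
    print(incident.service, len(incident.alerts))
```

Measuring the window from the most recent alert keeps a steady stream of related alerts in one incident; a tool could just as reasonably measure it from the incident’s first alert instead.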
If you want to see, sort through, and drill down on all alerts, there’s an Alerts Table. Search by incident type, by service, by team, or other criteria.
Alerts can be set up with just about any service or third-party monitoring tool (e.g., CloudWatch, Nagios). There are some limitations and exceptions to full bi-directional integration for alerting; notably, Jira is one of them.
The Standard ($19/user/mo) and Enterprise ($29/user/mo) plans provide the best alerting features | The Automation Actions add-on (an additional $20/user/mo) provides |
---|---|
Automated Actions | Threshold Alerts |
Advanced Alert Customization | Add Notes |
Custom Alert Actions | Scheduled Event Rules |
Alert & Notification Policies | Recurring Event Rules |
Paused Incident Notifications | |
Disable Event Rules | |
Opsgenie:
Opsgenie was built around incident communications to stakeholders. Opsgenie notifies stakeholders with automatic/orchestrated notifications and status page updates according to your company’s requirements. These notifications can be automated to keep stakeholders informed about incident resolution progress and service health. In short, when creating an incident or setting up an incident template, you can automate notifications to be sent to any relevant stakeholders tied to a service. It’s just a matter of configuration.
Alerts can be created from several sources:
- Incoming/Bi-Directional Integrations
- Emails
- Alert API
- Heartbeat Monitoring
- Incoming Calls
- Web/mobile Applications
You can view an alerts page showing all alerts that are visible or assigned to a user. From that page you can apply filters or execute bulk actions (close multiple alerts, acknowledge all, etc.). You can also drill into each alert to see whether specific users have “seen” it. Too many alerts cause stakeholders to tune out, so once Opsgenie recognizes that users have seen an alert, it adjusts their notifications accordingly.
All alerts keep an activity log for their entire lifecycle for auditing purposes. You can view responder statuses and responder metrics (escalations, quiet periods, forwards, etc.) to get a full view of the organization’s reaction.
Lastly, alert recipients can be notified by email, SMS, mobile push, and phone call, and you can add conditional logic to each of those. Prime examples are overriding a phone’s silent mode for critical alerts, and escalating alerts based on responses (or, most commonly, a lack of responses).
The Verdict:
There is parity in all important areas, but you’ll pay more for it with PagerDuty. If you are considering both, drill down on the limitations noted above for services, monitoring tools, or collaboration tools that don’t offer bi-directional integration. Some of these gaps may require manual intervention in your alerting workflows.
_________________________________________________
Centralized Monitoring
Centralized monitoring ensures not only that your critical programs and services are running optimally, but also that their statuses and health metrics are in one place where your support teams can see them.
PagerDuty:
PagerDuty provides an essential hub for monitoring operations, giving teams immediate awareness of any issues concerning essential systems and services.
During an incident, a conference bridge is a central channel where all responders can gather, connecting through a web conferencing provider, phone, or meeting URL. Bridges can be set up manually, automatically, or based on specific account-level or service-level conditions. The conference bridge feature is limited to the two higher-tier plans: Business ($41/user/mo) and Digital Operations.
Opsgenie:
Opsgenie’s Incident Command Center (ICC) is an integrated suite that enables centralized monitoring and communications needed during an incident response. Think of it as a war room where every piece of information you need (alerts, heartbeat monitoring, resources, knowledge, statuses, statistics, more) is accessible. All information is organized as part of an incident timeline as well, giving all team members context on what happened, when remediation started, who was involved, and more.
ChatOps, Opsgenie-hosted, and Zoom-hosted bridges let you have real-time collaboration and communication with on-call and stakeholder resources.
The Verdict:
Very similar, feature-wise. Both tools provide one place to access all critical information on services, utilities, and alerts, as well as intuitive incident escalation paths that lead to resources, remediation, and resolution.
_________________________________________________
Reports & Analytics
Incidents are often complex and involve a lot of data. Whether you are handling an incident as it occurs or explaining afterward what happened and what could improve, reporting is how you make sense of that data.
PagerDuty:
In PagerDuty, you have access to Basic or Advanced Reporting, depending on your plan. The Free and Professional plans include Basic Reporting, which covers only Notification and Incident volumes. The Business and Digital Operations plans include Advanced Reporting, which comprises the following reports:
- System Report
- Team Report
- User Report
- Notifications Report
- Incidents Report
Opsgenie:
With Opsgenie, you can gain valuable insights into the success of your operations and the areas you can improve. Opsgenie tracks all alerts and incidents, enabling you to use advanced reporting and analytics to discover where most of your alerts originate, how your team responds to and resolves them, and how on-call workloads are divided.
Opsgenie’s reporting and analytics capabilities include:
- Operational efficiency analytics
- Monthly overview analytics
- Downloadable and schedulable reports
- User and team productivity analytics
- On-call analytics
- Conference attendance and efficiency analytics
- Service and infrastructure health reporting
- Post-incident analysis reporting
To take advantage of most of these reports, we recommend the Enterprise plan ($29/user/mo), which adds:
- User & Team Productivity Analytics
- Post Incident Analysis Reporting
- Service and Infrastructure Health Analysis
- On Call Analytics
The Verdict:
Slight edge to Opsgenie. Both tools provide effective reporting on the metrics that matter most for driving down MTTR (mean time to resolution) and for addressing incidents throughout their lifecycle. However, Opsgenie comes with more reports, and they are built into the platform rather than sold separately, while PagerDuty’s licensing relies on add-ons.
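MTTR itself is simple arithmetic: the total time from trigger to resolution, divided by the number of resolved incidents. As a minimal sketch, assuming incidents are available as hypothetical `(triggered_at, resolved_at)` pairs rather than in either vendor’s export format:

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved_at - triggered_at) across resolved incidents.

    `incidents` is a list of (triggered_at, resolved_at) datetime pairs;
    pairs whose resolved_at is None are still open and are skipped.
    """
    durations = [resolved - triggered for triggered, resolved in incidents if resolved is not None]
    if not durations:
        return None  # no resolved incidents, so MTTR is undefined
    return sum(durations, timedelta()) / len(durations)


day = datetime(2023, 3, 10)
history = [
    (day.replace(hour=12), day.replace(hour=12, minute=30)),  # 30 minutes
    (day.replace(hour=13), day.replace(hour=14, minute=30)),  # 90 minutes
    (day.replace(hour=15), None),                             # still open
]
print(mean_time_to_resolution(history))  # 1:00:00
```

Skipping open incidents keeps the average honest about resolved work; reporting open incidents separately avoids hiding long-running ones.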
_________________________________________________
Stakeholder Communications
Optimizing collaboration is essential for prompt, effective incident management. Let’s compare these tools on their ability to keep the right people informed (and calm).
PagerDuty:
PagerDuty provides great options for ensuring business stakeholders and service owners are proactively notified and enabled to address the issue at hand, in real time. Like Opsgenie, monitoring events trigger alerts, which in turn trigger notifications to stakeholders. Also like Opsgenie, there is a high level of control over alerts and triggers for secondary and escalatory alerts.
There are multiple alert channels, including SMS, mobile app push notifications, phone/voice, and email. Within those channels, you can refine notifications further with:
- Multi-User Alerting
- Alert Noise Reduction / Alert Groupings
- Enriched Incident Context Alerts
- Rich HTML Email Notifications
- Dynamic Notifications (by channel, payloads, service, or time of day)
Opsgenie:
Opsgenie offers four main contact methods for qualifying incidents: email, SMS, phone voice calls, and mobile push notifications (Android, iOS, and Opsgenie for BlackBerry Dynamics).
You can set notification rules to configure conditions, time constraints, and notification steps for an event for each notification action type. Tie those to alert customization, alert tagging, Heartbeats, and Opsgenie Edge Encryption, and you have powerful options.
There are nine types of notification events:
- New Alert
- Acknowledged Alert
- Closed Alert
- Re-Notified Alert
- Assigned Alert
- Added Note to Alert
- Schedule Start
- Schedule End
- Incoming Call Routing
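To make the idea of conditions, time constraints, and notification steps concrete, here is a minimal sketch of how such a rule could be evaluated. The `NotificationRule` and `NotificationStep` types are hypothetical and are not Opsgenie’s actual data model or API; the action-type strings simply reuse the event names listed above.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class NotificationStep:
    delay: timedelta  # how long after the event this step fires
    channel: str      # e.g. "push", "sms", "voice"


@dataclass
class NotificationRule:
    action_type: str  # e.g. "New Alert", "Acknowledged Alert"
    priorities: set   # condition: alert priorities this rule applies to
    start: time       # time restriction start (inclusive)
    end: time         # time restriction end (exclusive)
    steps: list       # NotificationStep items


def in_window(t, start, end):
    """True if t falls in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def plan_notifications(rules, action_type, priority, at):
    """Return sorted (send_time, channel) pairs from every rule that matches."""
    plan = []
    for rule in rules:
        if rule.action_type != action_type or priority not in rule.priorities:
            continue  # condition not met
        if not in_window(at.time(), rule.start, rule.end):
            continue  # outside the rule's time restriction
        plan.extend((at + step.delay, step.channel) for step in rule.steps)
    return sorted(plan)


rules = [
    # Nights: only P1 alerts; push first, then a voice call after 5 minutes.
    NotificationRule("New Alert", {"P1"}, time(22), time(7),
                     [NotificationStep(timedelta(0), "push"),
                      NotificationStep(timedelta(minutes=5), "voice")]),
    # Daytime: P1 and P2 alerts; push first, then SMS after 10 minutes.
    NotificationRule("New Alert", {"P1", "P2"}, time(7), time(22),
                     [NotificationStep(timedelta(0), "push"),
                      NotificationStep(timedelta(minutes=10), "sms")]),
]
for when, channel in plan_notifications(rules, "New Alert", "P1", datetime(2023, 3, 10, 23, 30)):
    print(when.strftime("%H:%M"), channel)
```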
Escalations ensure that an alert gets the necessary attention when it is not acknowledged within a set amount of time. Incoming call routing applies the same idea to phone calls: if no one is available, Opsgenie takes a message, generates an alert, and notifies the right person via their preferred notification channel. Call details are attached to the notification, and recipients can listen to the message.
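The escalation behavior described above reduces to a simple rule: each level of a policy is notified once its delay has passed, unless someone acknowledged the alert first. The sketch below assumes a hypothetical three-level policy with illustrative names; real Opsgenie and PagerDuty escalation policies support more options than this models, such as repeating the policy.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class EscalationLevel:
    notify: str       # hypothetical responder, schedule, or team name
    after: timedelta  # delay after alert creation before this level is notified


def notified_so_far(levels, elapsed, acknowledged_at: Optional[timedelta] = None):
    """Return who has been notified `elapsed` after the alert was created.

    A level fires once its `after` delay has passed, but escalation stops
    if the alert was acknowledged before that level was due.
    """
    notified = []
    for level in sorted(levels, key=lambda lv: lv.after):
        if elapsed < level.after:
            break  # this level's delay has not passed yet
        if acknowledged_at is not None and acknowledged_at < level.after:
            break  # someone acknowledged before this level was due
        notified.append(level.notify)
    return notified


policy = [
    EscalationLevel("primary on-call", timedelta(0)),
    EscalationLevel("secondary on-call", timedelta(minutes=10)),
    EscalationLevel("engineering manager", timedelta(minutes=30)),
]
print(notified_so_far(policy, elapsed=timedelta(minutes=15)))
# ['primary on-call', 'secondary on-call']
```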
Status pages reduce distractions and allow responders to focus. Opsgenie does offer a service status page for an overview of system health.
The Verdict:
Equal. Both platforms are known for their mature and powerful alerting, and both give companies granular control over automations and alerts that fit the incident, the team, the time, and a host of other conditions.
_________________________________________________
Runbooks & Access to Knowledge Base
When an incident happens… what’s your role? What’s your team’s role? What approved process do you follow?
PagerDuty:
PagerDuty offers Intelligent Triage, which gives responders more context on the affected service in two ways: past incidents and related incidents.
You can also use what PagerDuty calls Response Plays. These help you plan your response processes and procedures so that you can mobilize quickly and accurately on future incidents. Response Plays enable:
- Multi-team response mobilization for major incidents
- Proactive planning of response by responders and key stakeholders
- Response to incident creation via chat, web/mobile app, monitoring, and custom integrations
- Predefined status updates on resolution progress, accessible to stakeholders and the broader organization
Response Plays are available only on PagerDuty’s Business tier or higher.
Opsgenie:
In Opsgenie, incident responders get immediate access to runbooks and pre-filled incident data. You can store various runbooks and make them available to stakeholders based on conditional logic (service, severity, type, team).
When an incident is resolved, Opsgenie provides a knowledge base that can be used for post-mortem analysis. This knowledge can help you anticipate future problems or changes and explain what led to them.
If you are already using Opsgenie within Jira Service Management (Cloud Premium), you can create and document incidents, problems, and change tickets to improve resolution processes and capture learnings. These documents can include runbooks (a list of steps for resolving an issue) and postmortem analyses (a summary of what happened and how it was addressed). By recording these details, you can build a knowledge base for future issues and identify potential areas for improvement.
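Conceptually, conditional runbook access is a matching problem: a runbook is surfaced when every condition it declares matches the incident. The sketch below is a hypothetical model, not Opsgenie’s configuration format or API; the runbook titles and the example.com URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    title: str
    url: str
    # Each condition is a set of allowed values; an empty set matches anything.
    services: set = field(default_factory=set)
    severities: set = field(default_factory=set)
    incident_types: set = field(default_factory=set)
    teams: set = field(default_factory=set)


def matching_runbooks(runbooks, service, severity, incident_type, team):
    """Return every runbook whose conditions all match the incident."""
    def allowed(values, value):
        return not values or value in values

    return [
        rb for rb in runbooks
        if allowed(rb.services, service)
        and allowed(rb.severities, severity)
        and allowed(rb.incident_types, incident_type)
        and allowed(rb.teams, team)
    ]


runbooks = [
    Runbook("Payments outage", "https://wiki.example.com/payments-outage",
            services={"payments"}, severities={"P1"}),
    Runbook("General triage checklist", "https://wiki.example.com/triage"),
    Runbook("Search relevance issues", "https://wiki.example.com/search",
            teams={"search-team"}, incident_types={"degradation"}),
]
for rb in matching_runbooks(runbooks, "payments", "P1", "outage", "payments-team"):
    print(rb.title)
```

Treating an empty condition as “match anything” lets a general checklist appear alongside more specific runbooks.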
The Verdict:
Equal. Accessibility of knowledge, incident timelines, and other data varies, but both tools provide robust options and are built to support best practices. Both provide access to pre-set runbooks for the type of incident, and both tools capture knowledge from previous incidents for reference on future ones.
_________________________________________________
Associating Incidents to Problems or Changes
PagerDuty:
PagerDuty is an incident management tool, but it does not offer problem or change management directly within a ticketing/collaboration system. Companies using PagerDuty need to integrate a separate service management tool, such as Jira Service Management, Zendesk, Salesforce Service Cloud, or Freshdesk, to manage problems or changes, and they need to be on the Business or Digital Operations plan for this to work optimally. Integrations with other tools, such as BMC Remedy ITSM or IBM DOORS, may not always be completely seamless.
Opsgenie:
Opsgenie makes it easy to associate incidents with problems or changes with ticketing tools. If you have already invested in Opsgenie and Jira Service Management (JSM), you’re in luck – both are from the same company and integrate seamlessly.
Additionally, Opsgenie has the capability to tie incidents to problems or changes in JSM. If you use another ticketing/collaboration platform, Opsgenie has almost all of the established integrations available (BMC Remedy, Freshdesk, Microsoft Azure DevOps, Salesforce Service Cloud, others).
The Verdict:
Leans toward Opsgenie. With Opsgenie, you can easily identify the source of a problem or change, because it is linked to the incident that may have triggered it. In contrast, PagerDuty requires integration with an external platform to trace an issue back to its source. Opsgenie therefore provides more direct and reliable visibility into the underlying causes behind problem and change tickets.
_________________________________________________
Integrations
PagerDuty:
PagerDuty’s ecosystem includes 650+ platform integrations as part of its verified integration program, 300+ of which are native. Like Opsgenie, it integrates with all of the established services, such as ServiceNow, Nagios, and Jira. For in-house apps or tools not on PagerDuty’s list, you can set up email integrations and custom API integrations (an HTTP call or a custom script).
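For custom integrations, PagerDuty’s Events API v2 accepts JSON events sent by HTTP POST to `https://events.pagerduty.com/v2/enqueue`. The sketch below, using only Python’s standard library, builds a `trigger` event without sending it. The routing key is a placeholder for a service’s integration key, and the summary, source, and dedup key values are illustrative.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger(routing_key, summary, source, severity="critical", dedup_key=None):
    """Build (but do not send) a PagerDuty Events API v2 'trigger' request."""
    if severity not in {"critical", "error", "warning", "info"}:
        raise ValueError(f"unsupported severity: {severity}")
    body = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }
    if dedup_key is not None:
        # Later trigger events with the same dedup_key fold into the same open alert.
        body["dedup_key"] = dedup_key
    return urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Perform the network call; PagerDuty replies 202 Accepted on success."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status, json.loads(resp.read())


request = build_trigger("YOUR_ROUTING_KEY", "Disk full on db-1", "db-1", dedup_key="db-1/disk")
print(request.get_method(), request.full_url)
# send(request)  # only with a real routing key
```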
Opsgenie:
Opsgenie connects to more than 200 established apps, web services, monitoring tools, and communication tools being used every day. It also has bi-directional functionality in native integrations, and powerful JSON over HTTPS API for applications not found as part of native integrations. In both methods, integrations can be part of your alerts, escalations, schedules, teams, and Heartbeat requirements. Overall, Opsgenie should be able to run both passive and active monitoring of your environment.
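As an illustration of that JSON-over-HTTPS API, the sketch below builds a request for Opsgenie’s create-alert endpoint (`POST https://api.opsgenie.com/v2/alerts`, authenticated with a `GenieKey` API key) without sending it. The key, team name, and alert values are placeholders; accounts hosted in the EU region use `api.eu.opsgenie.com` instead.

```python
import json
import urllib.request

ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert(api_key, message, alias=None, priority="P3", team=None, tags=None):
    """Build (but do not send) an Opsgenie Alert API 'create alert' request."""
    if priority not in {"P1", "P2", "P3", "P4", "P5"}:
        raise ValueError(f"unsupported priority: {priority}")
    # The API limits `message` to 130 characters, so longer text is truncated here.
    body = {"message": message[:130], "priority": priority}
    if alias:
        body["alias"] = alias  # the alias is the key used for alert de-duplication
    if team:
        body["responders"] = [{"name": team, "type": "team"}]
    if tags:
        body["tags"] = list(tags)
    return urllib.request.Request(
        ALERTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"GenieKey {api_key}"},
        method="POST",
    )


request = build_alert("YOUR_API_KEY", "High CPU on web-1", alias="web-1/cpu",
                      priority="P2", team="ops-team", tags=["cpu"])
print(request.get_method(), request.full_url)
```

Sending it works the same way as any `urllib` request (`urllib.request.urlopen(request)`); Opsgenie processes create requests asynchronously, so the immediate response acknowledges receipt rather than confirming the alert exists.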
For customers with an on-premises/private cloud deployment, Opsgenie offers the Edge Connector, which securely connects on-premises solutions (e.g., Jira Server, Nagios, SolarWinds) through a single outbound port. It can also run executables triggered by Opsgenie.
At a minimum, you will want the Standard plan ($19/user/mo) or higher, which provides:
- Outbound Integrations
- Bi-Directional Integrations with ITSM Tools
- Integrations with In-house/On-premise Systems
- Action Mappings System
- Service Options through APIs
- Service Subscriptions
- Service and Infrastructure Health Analysis (Enterprise only)
The Verdict:
Close, but slight edge to PagerDuty. PagerDuty’s list of integrations is technically larger; however, for 98% of use cases, both Opsgenie and PagerDuty provide all the integrations that enterprises would need or expect. The APIs are equally powerful.
_________________________________________________
Post-Mortems & Archiving
After an incident is resolved, the post-mortem process is an essential part of the incident lifecycle. It enables teams to gain insight into what happened (root cause) and how to prevent similar incidents from occurring.
PagerDuty:
The post-mortems feature is available for accounts on PagerDuty’s Business ($41/user/mo) and Digital Operations (“Call for pricing”) plans. It’s essentially a template that a post-mortem owner completes with stakeholders (ideally within five days of incident resolution). The template lets you document root causes, timeline information, notification audit trails, changes, and analysis notes. It also supports action items that, through integration, can become tickets in collaboration tools (e.g., Jira, BMC).
Opsgenie:
Opsgenie’s Post-Incident Analysis Report allows users to create a comprehensive incident report by summarizing the key information related to an incident in an easy-to-read format. It includes all of the data from the Incident Command Center (ICC), including incident impact, mitigation steps, root cause, and follow-up actions. NOTE: You must be on an Enterprise plan to access incidents and postmortems.
The Verdict:
Leans toward Opsgenie, thanks to better documentation automation. PagerDuty post-mortems require more manual work to capture information that was gathered during the incident. Opsgenie pre-fills more of this data after resolution, so the team responsible for post-mortem analysis can focus on prevention and follow-up actions rather than documentation.
_________________________________________________
Price
PagerDuty:
PagerDuty has more complicated pricing than Opsgenie.
- Free – $0 – On-call and incident response for small teams
- Professional – $21/user/mo – On-call and incident response for growing teams
- Business – $41/user/mo – Streamlined incident response for the enterprise
- Digital Operations – “Call Us” – End-to-end digital operations solution tailored to you
Added Features and Costs
- PagerDuty Event Intelligence – Apply AI to reduce noise, provide real-time context – $24/user/mo
- PagerDuty Automation Actions – Connect external automation to diagnose and remediate incidents. Delegate automated jobs or proactively trigger through Event Orchestration. – $20/user/mo
- PagerDuty for Stakeholders – Real time visibility for stakeholders to keep awareness on incidents and customer impact – $3/user/mo
- PagerDuty Status Pages – Proactively communicate real-time status of digital operations for customers – $89/mo/1000 subscribers
- Add-On: PagerDuty Event Intelligence – Apply AI to reduce noise, provide real-time context and eliminate manual tasks. – $20/user/mo
Opsgenie:
Opsgenie is a more cost-effective solution. It also has a simpler pricing model based on graduating tiers. Below are the Opsgenie standalone prices and tiers:
- Free – $0 – Basic Alerting and On-Call Management for Small Teams
- Essentials – $9/user/mo – Alerting and Incident Management, Optimized for Simplicity
- Standard – $19/user/mo – Unlimited Alerting and Incident Management, Built for Flexibility
- Enterprise – $29/user/mo – Advanced Incident Management with Enterprise Collaboration & Business Visibility
As you go up in price, you unlock more capabilities, including increased API requests, routing rules, rich alerts, bi-directional integrations, call center capabilities, access management, and service integrations. You can view the tiers and trade-offs here: https://www.atlassian.com/software/opsgenie/pricing
An important factor when comparing Opsgenie pricing against other tools is its relationship with its sibling software in the Atlassian portfolio. If parts of your organization already use Jira Software and need to be kept in the loop on incidents, they can be assigned stakeholder roles at no extra cost.
Better still, if your organization already uses Jira Service Management Cloud Premium, you already have Opsgenie. In other words, Opsgenie’s additional cost can be $0 (a sunk cost).
The Verdict:
Opsgenie is the clear winner whether you’re taking advantage of Opsgenie as part of a Jira Service Management, or buying it standalone. Let’s see a sample license cost comparison for a like-for-like solution for a mid-sized company.
_________________________________________________
Let’s explore a real use case comparing cost:
The company’s requirement, by the numbers:
- 250 engineers who need to be able to respond to and collaborate on incidents
- 100 stakeholders who need to be informed of incident response progress/resolution
- Requirements to automate alerts and assignments with conditional logic on tickets/incidents
- Requirements for a Status Page for outside parties to be informed of progress, with 1000 Subscribers
PagerDuty:
PagerDuty Business ($41/User/Mo for 250 engineers): $123,000/yr
Add-On: PagerDuty Automation Actions ($20/User/Mo for 250 engineers): $60,000/yr
Add-On: PagerDuty for Stakeholders ($3/User/Mo for 100 stakeholders): $3,600/yr
Add-On: PagerDuty Status Pages ($89/mo/1000 Subscribers): $1,068/yr
Cost: $187,668/yr
Opsgenie:
Opsgenie Enterprise ($29/User/Mo for 250 engineers): $87,000/yr
Statuspage Business ($399/Mo): $4,788/yr
Cost: $91,788/yr
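The totals above are straightforward list-price arithmetic (monthly rate × units × 12). The short script below reproduces it under the same assumptions: the list prices quoted in this guide, 250 engineer licenses, PagerDuty stakeholder licenses for all 100 stakeholders, no separate stakeholder charge on the Opsgenie side, and no volume discounts. At the quoted $3/user/mo, 100 stakeholders come to $3,600/yr, which puts the PagerDuty total at $187,668/yr.

```python
def annual(monthly_rate, units=1):
    """Annual cost of a monthly charge (per user, or flat when units=1)."""
    return monthly_rate * units * 12


ENGINEERS, STAKEHOLDERS = 250, 100

pagerduty = {
    "Business plan": annual(41, ENGINEERS),
    "Automation Actions": annual(20, ENGINEERS),
    "Stakeholders": annual(3, STAKEHOLDERS),
    "Status Pages (1,000 subscribers)": annual(89),
}
opsgenie = {
    "Enterprise plan": annual(29, ENGINEERS),
    "Statuspage Business": annual(399),
}

for name, items in (("PagerDuty", pagerduty), ("Opsgenie", opsgenie)):
    print(f"{name}: ${sum(items.values()):,}/yr")
# PagerDuty: $187,668/yr
# Opsgenie: $91,788/yr
```

Changing `ENGINEERS` or `STAKEHOLDERS` gives a quick first-pass estimate for a different team size, though negotiated pricing will differ.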
_________________________________________________
Conclusion
Both platforms are industry leaders in incident management for a reason. However, if you are currently using PagerDuty, you should seriously consider a switch to Opsgenie. Whether you get Opsgenie “for free” with Jira Service Management Cloud Premium or you’re getting Opsgenie standalone, you get essentially the same capabilities for, in some cases, half the cost.
Cost is likely to be the main factor, but depending on your business model, other aspects may make Opsgenie the better choice. If your business has already invested in Atlassian tools (Jira Software, Jira Service Management) for its incident stakeholder teams (engineering, infrastructure, IT), Opsgenie is more effective: alerts, collaboration, knowledge, and post-incident actions are all more closely integrated when Opsgenie and Jira are combined.
Get an estimate for Opsgenie
Drop us a line, and we will get back to you about licenses, deployment, and support.