Incident postmortem analysis with AI: finding themes across a year of retrospectives

How engineering leaders use AI to find recurring root-cause themes across dozens of incident postmortems, with traceability back to source.

Quick answer: Incident postmortem analysis with AI means reading dozens of past postmortem documents as a single qualitative dataset, rather than one at a time, to surface recurring root-cause themes and the services or processes that keep appearing. Tools like Rootly, PagerDuty, and FireHydrant document individual incidents well. Skimle complements them by analysing the accumulated set, with every theme traceable back to the specific postmortem and passage it came from.

Most engineering organisations are good at running a single postmortem. A senior engineer facilitates a blameless review, the timeline gets reconstructed from Slack and pager logs, contributing factors are documented, action items get assigned, and the report is filed. Do this 40 or 80 times a year, across a growing service catalogue, and something curious happens: nobody goes back and reads the whole pile. Each postmortem is a well-documented single data point. The set of them, taken together, is where the more expensive patterns hide, and almost nobody is looking there.

This guide covers why cross-postmortem analysis is a different problem from running a good single postmortem, what the popular incident management tools do and do not do in this space, and how a qualitative analysis approach (treating postmortems as a text corpus rather than a stack of tickets) surfaces the systemic patterns that justify investment decisions.

Why one good postmortem is not the same as understanding the pattern

The blameless postmortem is one of the more durable ideas in site reliability engineering. Google's SRE book states the principle plainly: a blamelessly written postmortem "assumes that everyone involved in an incident had good intentions and did the right thing with the information they had," and the goal is to fix systems and processes rather than people. This is correct, well-established practice, and nothing in this article argues against it. If anything, cross-postmortem analysis depends on it: aggregating findings accurately only works if the underlying documents focus on systemic causes rather than individual fault.

The trouble is that a blameless culture, applied incident by incident, tends to produce blameless amnesia at the organisational level. Each review does its job: it identifies what went wrong in this incident, assigns a handful of action items, and closes. What it rarely does, because it is not designed to, is ask whether this incident's contributing factors match the contributing factors of the incident four months ago, or the one before that, in a different service owned by a different team. That comparison requires reading across the set, and reading across 60 postmortem documents is not a task most engineering leaders have time for, however much value sits inside them.

What's actually being missed

The kind of pattern that goes unnoticed across postmortems is rarely a single dramatic cause. It tends to be a recurring contributing factor that shows up in a clause or paragraph of each document, never quite the headline:

  • A particular service's alerting consistently fires too late or too noisily to be acted on, mentioned as a secondary factor in five separate incidents over a year
  • A deployment or change-review process that gets blamed, in slightly different words, every time a release goes wrong, without ever becoming the subject of its own dedicated review
  • An on-call handover gap that contributes to slower detection, noted once as "could have been faster" and then never raised again until it recurs
  • A specific upstream dependency or vendor whose failures keep showing up as a contributing factor across unrelated incidents

None of these show up if you are looking at one postmortem at a time, because each individual mention looks like a minor, already-actioned detail. They show up only when you read the full-text narrative of dozens of postmortems side by side and ask: where does the same root cause keep appearing, in different words, across documents nobody thought to compare?

What incident management tools actually do (and do not do)

It is worth being precise about the popular tools here, because their marketing sometimes blurs the line between "analysing one incident well" and "analysing many incidents together."

Rootly is an incident management platform that automates the operational side of running an incident and writing it up: it captures a timeline of Slack messages, alerts, and commands as the incident unfolds, uses AI to draft a first version of the postmortem narrative, and syncs action items into Jira or Linear. Its strength is making a single postmortem faster and more complete to produce.

PagerDuty owns the alerting and on-call layer and has been migrating its postmortem feature into a broader "Post-Incident Reviews" capability, with customisable templates and detailed per-incident timelines. In late 2023, PagerDuty acquired Jeli, an incident analysis platform built specifically around comparing incidents to each other. Jeli's distinguishing feature, now part of the PagerDuty product line, is a cross-incident view that aggregates tags and structured annotations (which team, which service, what time of day, who responded) across an organisation's incident history to surface organisational patterns such as recurring on-call or communication issues. This is the closest any of the mainstream incident tools come to cross-incident analysis, and it is a meaningful capability. Its limit is scope: it works from structured metadata and coordination data captured by the tool itself, and only for incidents that were run through Jeli or PagerDuty from the start.

FireHydrant focuses on making a single retrospective thorough and well-facilitated: customisable templates, contributing-factor fields, and AI-generated incident summaries within one retrospective document.

The pattern across all of them: each is built first to make running and documenting a single incident faster, more complete, and less painful. That is a real and valuable problem to solve, and none of this is a criticism of what these tools are built to do. The gap is in the free-text narrative itself. A postmortem document contains a written account: what happened, what the team believed at each stage, what they tried, what they got wrong, what they would do differently. That narrative is qualitative data, and qualitative data does not reduce well to the tags and dropdowns that structured cross-incident dashboards rely on. Two postmortems can describe the same underlying alerting failure in completely different language, tagged under different services, and never show up as related in a tag-based view.

A side-by-side comparison

| | Rootly, PagerDuty, FireHydrant (single-incident tools) | Jeli / PagerDuty cross-incident view | Skimle (qualitative cross-postmortem analysis) |
| --- | --- | --- | --- |
| Primary job | Run and document one incident well | Compare structured tags/metadata across incidents run through the platform | Read the full text of many existing postmortems as one dataset |
| Input | Live incident data (Slack, alerts, commands) as it happens | Tags and annotations captured during incident response | Postmortem documents already written, in any format, from any source |
| What it finds | A complete timeline and narrative for this incident | Patterns in who, when, and which team, based on how incidents were tagged | Recurring root-cause and contributing-factor themes expressed in the prose itself, even when worded differently each time |
| Traceability | Back to the incident's own timeline | Back to tags and annotations | Back to the specific postmortem document and exact passage |
| Works retroactively on a backlog of old reports | Partially, if the report was produced in that tool | Only for incidents run through that platform | Yes, this is the primary use case |

How cross-postmortem thematic analysis works in practice

The mechanics are closer to thematic analysis of interview transcripts than to incident response tooling, because the input (a written account of what happened and why) is the same kind of unstructured text a qualitative researcher works with.

Collect the corpus. Export or upload the postmortem documents you already have, whether they live in Rootly, PagerDuty, Confluence, Google Docs, or a folder of PDFs. Skimle accepts a range of document formats, so the source tool does not matter. An organisation running 40 to 80 incidents a year typically has a meaningful corpus after 12 to 18 months.
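As a minimal sketch of the collection step, assuming the postmortems have already been exported as plain-text or Markdown files into one folder: the `PostmortemDoc` record, the `load_corpus` function, and the folder layout are hypothetical names chosen for illustration, not part of Skimle or any tool mentioned here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PostmortemDoc:
    """One exported postmortem; doc_id is the file name without its extension."""
    doc_id: str
    text: str


def load_corpus(folder: str, suffixes: tuple[str, ...] = (".txt", ".md")) -> list[PostmortemDoc]:
    """Read every matching file in `folder` into a list, sorted by file name."""
    docs = []
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and formats this sketch does not parse (e.g. PDF).
        if path.is_file() and path.suffix.lower() in suffixes:
            docs.append(PostmortemDoc(doc_id=path.stem, text=path.read_text(encoding="utf-8")))
    return docs
```

Keeping a stable `doc_id` per file is the part that matters later: every finding needs something to point back to.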

Add metadata for the dimensions that matter. Service or team affected, incident severity, month, whether it was a deployment-related incident or not. Skimle's metadata features let you attach these as structured fields and then slice the thematic findings by them, so you can ask not just "what root causes recur" but "which root causes recur specifically in the payments team's incidents" or "has the mix of contributing factors changed since we adopted the new deploy pipeline."
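To make the slicing idea concrete, here is a small sketch in plain Python. The field names follow the dimensions listed above; the `Finding` record, the `theme_counts` helper, and the sample data are invented for illustration and do not describe how Skimle stores metadata.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One theme occurrence, tagged with the metadata of its postmortem."""
    theme: str
    team: str
    severity: str
    month: str            # e.g. "2024-03"
    deploy_related: bool


def theme_counts(findings, **filters):
    """Count themes among findings whose fields equal every given filter value."""
    selected = [
        f for f in findings
        if all(getattr(f, name) == value for name, value in filters.items())
    ]
    return Counter(f.theme for f in selected)


# Hypothetical sample data.
findings = [
    Finding("late alerting", "payments", "SEV2", "2024-02", False),
    Finding("late alerting", "payments", "SEV1", "2024-05", True),
    Finding("unclear rollback", "payments", "SEV2", "2024-06", True),
    Finding("late alerting", "search", "SEV3", "2024-06", False),
]

print(theme_counts(findings, team="payments").most_common())
# [('late alerting', 2), ('unclear rollback', 1)]
```

The same call with `deploy_related=True` answers the deploy-pipeline question instead, which is the point of attaching metadata as structured fields rather than burying it in the prose.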

Run the analysis. Skimle's automatic thematic analysis reads every postmortem and groups contributing factors, root causes, and remediation themes into categories, the same way it would group themes across customer interviews or survey responses. A root cause described as "alerting threshold was too conservative" in one document and "we did not get paged until the error rate was already high" in another lands in the same theme, because the grouping works on meaning, not exact wording.
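The sketch below shows only the shape of the input and output of this step. It is a deliberately crude stand-in: it matches hand-written cue phrases, whereas grouping by meaning would rely on a language model or similar technique. The `THEME_CUES` table and `group_by_theme` function are hypothetical and are not how Skimle works internally.

```python
from collections import defaultdict

# Hypothetical cue phrases per theme. A real system would infer similarity
# from meaning rather than from a fixed list like this one.
THEME_CUES = {
    "late or insensitive alerting": [
        "alerting threshold",
        "did not get paged",
    ],
    "risky deploy process": [
        "rolled out without",
        "skipped review",
    ],
}


def group_by_theme(statements):
    """Map each theme to the statements that contain one of its cue phrases."""
    grouped = defaultdict(list)
    for statement in statements:
        lowered = statement.lower()
        for theme, cues in THEME_CUES.items():
            if any(cue in lowered for cue in cues):
                grouped[theme].append(statement)
    return dict(grouped)


statements = [
    "Alerting threshold was too conservative",
    "We did not get paged until the error rate was already high",
]
groups = group_by_theme(statements)
print(groups["late or insensitive alerting"])
```

Both statements land under the same theme despite sharing no wording, which is the behaviour the analysis step needs. The fixed phrase list is exactly what breaks down at scale, and why meaning-based grouping is the practical choice.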

Trace every theme back to its source. Each theme in Skimle links to the underlying insights and quotes that produced it, and each of those links back to the specific postmortem document and passage. If a theme says "alerting gaps in the checkout service contributed to five incidents this year," you (and anyone you show the finding to) can open each of the five passages and read exactly what was written, in context.
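One way to picture this traceability is as a data structure in which a theme never exists without the passages behind it. The sketch below is an assumption about what such a structure could look like, with hypothetical names (`Passage`, `Theme`, `cite`) and a made-up document id; it is not Skimle's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Passage:
    doc_id: str   # which postmortem
    start: int    # character offsets of the quote in that document
    end: int
    quote: str


@dataclass
class Theme:
    label: str
    passages: list[Passage] = field(default_factory=list)

    def sources(self) -> list[str]:
        """Distinct postmortem ids behind this theme, in first-seen order."""
        return list(dict.fromkeys(p.doc_id for p in self.passages))


def cite(doc_id: str, text: str, phrase: str) -> Passage:
    """Locate `phrase` in a postmortem and record where it sits."""
    start = text.index(phrase)  # raises ValueError if the quote is not verbatim
    return Passage(doc_id, start, start + len(phrase), phrase)


doc = "14:05 We did not get paged until the error rate was already high."
theme = Theme("late alerting")
theme.passages.append(cite("PM-031", doc, "did not get paged"))
print(theme.sources())  # ['PM-031']
```

Refusing to record a quote that is not found verbatim in the source is a cheap guard against a summary drifting away from what the document actually says.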

Review before presenting. This is the step that matters most, and it deserves real attention rather than a quick sign-off. Postmortems are written in a blameless register on purpose, focused on systems and decisions rather than people. An AI summarising across dozens of them can occasionally phrase a theme so that it drifts toward a name or a team, which the source documents did not intend, particularly if one engineer's name appears repeatedly across timelines simply because they were often on call. Before sharing any cross-postmortem finding with leadership, someone should read the underlying passages and confirm the theme is stated at the systemic level the postmortems themselves were written at. This is a discipline question as much as a tooling one: aggregation should never reintroduce the blame that the individual reviews were structured to avoid.
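A simple pre-check can help the reviewer decide where to look first. The sketch below flags any theme statement that mentions a known responder by name; the function name, the roster, and the sample themes are all hypothetical. It catches only literal name matches, so it supplements reading the passages rather than replacing it.

```python
import re


def flag_named_themes(theme_texts, responder_names):
    """Return (theme_text, matched_names) pairs that need human review."""
    flagged = []
    for text in theme_texts:
        hits = [
            name for name in responder_names
            # Whole-word, case-insensitive match on each roster name.
            if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
        ]
        if hits:
            flagged.append((text, hits))
    return flagged


themes = [
    "Alerting gaps in the checkout service delayed detection",
    "Dana was slow to acknowledge pages",
]
print(flag_named_themes(themes, ["Dana", "Lee"]))
# [('Dana was slow to acknowledge pages', ['Dana'])]
```

A flagged theme is a prompt to restate the finding at the level of the system (for example, acknowledgement time during overnight shifts), not a finding in itself.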

How does this change the conversation with leadership?

The practical payoff of cross-postmortem analysis is evidence rather than novelty. Most engineering organisations already sense, informally, that certain services or processes cause disproportionate trouble. The problem is that the case for fixing them usually rests on whichever incident is freshest in memory, which is a weak basis for a headcount or tooling investment decision.

A theme that says "this contributing factor appeared in 9 of the last 52 postmortems, across 4 different services, with the underlying passages attached" is a different kind of argument. It is not "the database team had a bad outage last month, again." It is "the same alerting gap has now contributed to incidents in checkout, billing, and notifications over the past year, and here are the 9 passages that say so." That is the difference between an anecdote and an evidence base, and it is considerably harder for a budget conversation to wave away.
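The headline sentence for such a theme is simple arithmetic over the traced occurrences. As an illustrative sketch, with a hypothetical `summarise_theme` helper, made-up postmortem ids, and "search" added as an invented fourth service:

```python
def summarise_theme(label, occurrences, corpus_size):
    """occurrences: iterable of (postmortem_id, service) pairs for one theme."""
    occurrences = list(occurrences)
    postmortems = {pm for pm, _ in occurrences}
    services = sorted({svc for _, svc in occurrences})
    share = len(postmortems) / corpus_size
    return (
        f"'{label}' appeared in {len(postmortems)} of {corpus_size} postmortems "
        f"({share:.0%}), across {len(services)} services: {', '.join(services)}"
    )


# Hypothetical occurrences of one theme.
occurrences = [
    ("PM-01", "checkout"), ("PM-07", "checkout"), ("PM-12", "billing"),
    ("PM-19", "billing"), ("PM-23", "notifications"), ("PM-30", "checkout"),
    ("PM-38", "search"), ("PM-44", "billing"), ("PM-51", "notifications"),
]
print(summarise_theme("late alerting", occurrences, corpus_size=52))
# 'late alerting' appeared in 9 of 52 postmortems (17%), across 4 services: billing, checkout, notifications, search
```

Counting distinct postmortems rather than raw mentions keeps one unusually wordy report from inflating the figure.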

The same shift shows up in other domains, such as competitive intelligence and research repositories: scattered qualitative records, read individually, produce anecdotes. The same records, analysed as a set, produce a defensible pattern. Engineering postmortems are simply a domain where this has not yet become standard practice, in part because most of the dedicated tooling in the space is still optimised for running the next incident, not for mining the last eighty.

Knowledge workers outside dedicated research roles run into the same problem whenever they accumulate a pile of documents, meeting notes, or reports and need to find what recurs across them without a formal research background. If that describes how your engineering org ends up with postmortems, see how Skimle supports curious professionals who need to make sense of scattered documents without becoming qualitative researchers first.

What the data says about why this matters

Three figures are worth keeping in view when making the case for systematic postmortem review.

ITIC's 2024 Hourly Cost of Downtime Survey, which polled more than 1,000 organisations worldwide, found that the average cost of a single hour of downtime exceeds $300,000 (€280,000) for over 90% of mid-size and large enterprises, and 41% of enterprises report hourly downtime costs of $1 million to over $5 million (€930,000 to €4.6 million). Recurring root causes are not a minor inefficiency at that scale. They are a multiplier on an already large number, because every incident traceable to a known but unaddressed pattern is a repeat cost that better visibility could have prevented.
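A back-of-the-envelope version of that multiplier, as a sketch only: the $300,000 hourly figure is the survey threshold quoted above, while the incident count and average outage length are invented inputs. Because a recurring theme is usually one contributing factor among several, the result is best read as an upper bound on exposure, not as a cost attributable to the theme.

```python
def repeat_cost(incidents, avg_hours_down, hourly_cost):
    """Rough exposure: incidents x average hours of downtime x cost per hour."""
    return incidents * avg_hours_down * hourly_cost


# Hypothetical inputs: 9 incidents sharing a theme, 1.5 hours of downtime each.
estimate = repeat_cost(incidents=9, avg_hours_down=1.5, hourly_cost=300_000)
print(f"${estimate:,.0f}")  # $4,050,000
```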

Google's DORA research gives a useful benchmark for what good recovery looks like: in the 2024 Accelerate State of DevOps Report, elite-performing teams recover from failures in under one hour, while teams below elite performance take substantially longer. Postmortem themes that point at the same recurring delay (a slow alert, an unclear escalation path, a missing runbook) are themes that directly affect where an organisation sits on that curve.

Atlassian's 2025 State of AI in Incident Management report, based on a survey of over 500 software developers, IT professionals, and decision-makers in the US, found that 79% of teams are already exploring AI for incident trending, even as 74% cite security risk as a top barrier to expanding that use. The appetite for AI-assisted pattern-finding in incident data is already there. The open question for most organisations is less "should we use AI here" and more "what exactly should it be reading."

Frequently asked questions

Does cross-postmortem analysis conflict with blameless postmortem culture?

No, and it should be designed to reinforce it. The analysis works on the same systemic-cause language that blameless postmortems are written in, and the output should be phrased at the same level: which services, which processes, which categories of failure recur, not which people were involved. Anyone running this kind of analysis should treat individual names appearing in source documents as a reason for extra review, not as a finding to surface.

How many postmortems do we need before this is worth doing?

Most organisations find the analysis becomes useful somewhere between 30 and 50 postmortems, enough that a recurring theme reflects a real pattern rather than two coincidental incidents. Below that, simply re-reading the small set manually is often still feasible.

Can this replace the incident management tool we already use?

No, and it is not meant to. Tools like Rootly, PagerDuty, and FireHydrant remain the right place to run an incident and produce a thorough single postmortem. Cross-postmortem thematic analysis is a periodic exercise, perhaps quarterly or twice a year, that reads the accumulated output of those tools as a dataset, looking backward across the set rather than forward through the next incident.

What if our postmortems are inconsistent in format or detail?

Inconsistent postmortems are still usable, though the analysis is more reliable when documents share a reasonably similar structure (timeline, contributing factors, root cause, action items). If formats vary a lot across teams, standardising the template for future postmortems, even loosely, improves the quality of cross-postmortem analysis from this point on without requiring you to rewrite old reports.
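A loose template is also easy to check automatically. The sketch below reports which of the four sections named above never appear as a line-leading heading in a document. The `missing_sections` helper and its matching rule (a section name at the start of a line, optionally preceded by `#` characters) are assumptions for illustration.

```python
import re

EXPECTED_SECTIONS = ("timeline", "contributing factors", "root cause", "action items")


def missing_sections(text, expected=EXPECTED_SECTIONS):
    """Return expected section names that never start a line in the document."""
    missing = []
    for name in expected:
        # Match the name at the start of a line, allowing leading '#' or spaces.
        pattern = rf"^[#\s]*{re.escape(name)}\b"
        if not re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing


report = "# Timeline\n10:02 alert fired\n\nRoot cause\nConfig drift\n"
print(missing_sections(report))  # ['contributing factors', 'action items']
```

Run across a corpus, a check like this shows which teams' reports will be thin on contributing factors before the analysis starts.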

Does this work if our postmortems live across multiple tools?

Yes. The analysis works on the exported text of the documents themselves, not on a live connection to any single incident management platform, so postmortems pulled from Rootly, PagerDuty, FireHydrant, Confluence, or plain documents can be analysed together as one corpus.

A practical starting point

If you want to try this without committing to a full quarterly process, a reasonable first pass is:

  1. Export the last 12 months of postmortems from whichever tool or wiki holds them
  2. Upload them as a single project, adding metadata for service, severity, and month if your reports record those consistently
  3. Run thematic analysis and review the categories that emerge for recurring root-cause and contributing-factor themes
  4. For any theme that recurs more than two or three times, open the underlying passages and read them before drawing a conclusion
  5. Bring the themes, with passages attached, to the next engineering leadership review as evidence for where investment should go
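Step 4 can be sketched as a small filter that turns the raw themes into a reading list. The `reading_list` function, the threshold of three distinct postmortems, and the sample data are assumptions chosen for illustration.

```python
def reading_list(theme_passages, min_postmortems=3):
    """theme_passages: {theme: [(postmortem_id, quote), ...]}.

    Keep themes seen in at least `min_postmortems` distinct postmortems,
    most widespread first, with their passages attached for manual reading.
    """
    kept = []
    for theme, passages in theme_passages.items():
        distinct = {pm for pm, _ in passages}
        if len(distinct) >= min_postmortems:
            kept.append((theme, len(distinct), passages))
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical themes and passages.
themes = {
    "late alerting": [("PM-01", "paged late"), ("PM-07", "threshold too high"),
                      ("PM-12", "no alert fired"), ("PM-19", "alert was noisy")],
    "vendor outage": [("PM-03", "upstream DNS failed"), ("PM-03", "DNS again")],
}
for theme, count, passages in reading_list(themes):
    print(theme, count, [pm for pm, _ in passages])
```

Here only "late alerting" survives the filter. "vendor outage" has two passages, but both come from the same postmortem, so it does not yet count as recurring.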

Ready to find out what your postmortems have been telling you all along? Try Skimle for free and run a thematic analysis across your last year of incident reports.

About the authors

Henri Schildt is a Professor of Strategy at Aalto University School of Business and co-founder of Skimle. He has published over a dozen peer-reviewed articles using qualitative methods, including work in Academy of Management Journal, Organization Science, and Strategic Management Journal.

Olli Salo is a former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets, develop winning strategies, and improve their operating models. He has conducted over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond.

