To do thematic analysis in Excel, set up a workbook with one row per quote and columns for: document name, the quote itself, an initial code, a theme, and notes. Paste your interview excerpts or survey responses into the quote column, assign short descriptive codes in the next column, then use filters and a pivot table to group codes into broader themes. For projects with fewer than a dozen documents, this workflow is genuinely practical. For larger datasets, or anything involving multiple coders or evolving codebooks, a purpose-built tool like Skimle will save you significant time and reduce the risk of losing your analytical thread.
When Excel actually works for thematic analysis
There is a tendency in qualitative research circles to dismiss spreadsheet-based analysis as insufficiently rigorous. That view underestimates Excel's real strengths for smaller projects.
If you are analysing 5 to 15 interview transcripts for a consulting engagement, a market research project, or an internal employee survey, Excel can handle the job. You probably already have it open. Your clients or colleagues can open your file without installing anything. And the discipline of manually reading each quote and assigning a code forces you to stay close to the data in a way that automated tools sometimes skip over.
Braun and Clarke's foundational framework for thematic analysis emphasises that the analyst's close engagement with the data is not a weakness to be engineered away — it is the point. Excel, used thoughtfully, supports that engagement.
The boundary sits at roughly a dozen documents. Below that threshold, Excel is a reasonable choice. Above it, the manual overhead starts to dominate your time, and the structural limitations of a flat spreadsheet become frustrating. More on that later.
For now, let us walk through exactly how to do thematic analysis in Excel, step by step.
Setting up your Excel workbook for thematic coding
Before you import a single quote, get the structure right. A clean column layout makes everything else easier.
The recommended column structure
| Column | Header | Purpose |
|---|---|---|
| A | Document | File name or respondent ID |
| B | Quote | The verbatim excerpt |
| C | Initial code | Your first-pass label |
| D | Theme | The higher-level theme it belongs to |
| E | Notes | Analytical memos, flags, questions |
Keep one row per quote. Do not put multiple quotes in a single cell. Do not merge cells. Merged cells break sorting and filtering, which you will need later.
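If it helps to see the layout as data, here is the same one-row-per-quote structure sketched in Python (the example rows and codes are invented for illustration). It writes a CSV that Excel opens directly:

```python
import csv
import io

HEADER = ["Document", "Quote", "Initial code", "Theme", "Notes"]

# Hypothetical rows: one quote per row, never several quotes in one cell.
ROWS = [
    ("Interview 01", "I kept waiting for someone to decide.", "decision paralysis", "", ""),
    ("Interview 01", "Nobody owned the final call.", "unclear ownership", "", ""),
    ("Interview 02", "The updates just stopped coming.", "communication gap", "", ""),
]

def to_csv(rows):
    """Write the analysis sheet as CSV text that Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(ROWS))
```

The point of the flat shape is exactly what the sorting and filtering advice above relies on: every operation later in this guide assumes one quote, one code, one row.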
A few layout tips
Freeze the top row. Go to View > Freeze Panes > Freeze Top Row. As your dataset grows to hundreds of rows, you will be glad you did this.
Wrap text in the Quote column. Select column B, right-click, Format Cells > Alignment, and tick Wrap Text. Set the row height to auto. This makes quotes readable without having to click into each cell.
Add a second sheet for your codebook. As you assign codes, maintain a running list on Sheet 2 with columns for Code, Description, and Example Quote. This becomes your reference when you are deciding whether a new excerpt fits an existing code or needs a new one.
Use a third sheet for themes. Once you start grouping codes, a separate sheet listing Theme, Codes included, and a short definition helps you think through the structure before committing it to the main data sheet.
Step 1: Prepare your data
The most tedious part of thematic analysis in Excel is getting your raw data into the right shape. Transcript files, survey exports, and interview notes all arrive in different formats.
From interview transcripts: Open each transcript, read through it, and copy the excerpts that are relevant to your research question into the Quote column. You do not need to include everything. Select the passages that speak to your questions. Record the document name in column A so you can always trace a quote back to its source.
From survey exports: Survey tools like Typeform or SurveyMonkey export open-text responses as CSV files. Open the CSV in Excel, copy the relevant column of responses into your analysis sheet, and add the respondent ID or survey submission ID to column A. Delete any columns you do not need.
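That copy step can also be scripted. A minimal Python sketch, assuming a hypothetical export with a `respondent_id` column and an open-text column named `q3_feedback` (adjust both names to your survey tool's actual headers); it also drops blank responses, which survey exports are full of:

```python
import csv
import io

# Hypothetical CSV export -- your column names will differ.
EXPORT = """respondent_id,q3_feedback
R001,The onboarding felt rushed.
R002,
R003,Support answered within a day.
"""

def extract_quotes(csv_text, id_col, text_col):
    """Pull non-empty open-text answers into (Document, Quote) rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        answer = (record.get(text_col) or "").strip()
        if answer:  # skip respondents who left the question blank
            rows.append((record[id_col], answer))
    return rows

quotes = extract_quotes(EXPORT, "respondent_id", "q3_feedback")
print(quotes)  # R002 left the field blank, so two rows survive
```

Whether you do this in Excel or in a script, the output is the same: respondent ID in column A, verbatim answer in column B.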
Formatting guidance. Keep quotes as close to verbatim as possible. If you shorten a quote, use square brackets to indicate the edit: "We found the process [confusing] from the start." Avoid paraphrasing at this stage — paraphrase strips out the language that often contains the most analytically interesting material.
This preparation phase connects directly to good data practices described in our guide to how to analyse interview transcripts.
After this step, you will have a table with 20 to 30 rows per source document (e.g., around 250 rows for 10 interviews), capturing the most analytically interesting excerpts. If you need to go back to the original source data, a simple word search will find the passage. Trying to include all the verbatim data from every interview is a common beginner mistake: it bloats the analysis file.
Depending on your workflow, you might also want to anonymise source data at this copying step, either manually or using tools like [Skimle Anonymise](../features/what#anonymise-documents). This way, the downstream analyses can be shared more broadly without needing to worry about revealing individual respondent identities.
Step 2: First-pass coding
With your data in place, read through each quote and assign a short initial code in column C. At this stage, stay close to the language of the data. Descriptive codes are better than interpretive ones at first pass.
If a respondent says "I kept waiting for someone to make a decision and it never happened," your initial code might be "decision paralysis" or "unclear ownership," not "poor governance" (which is an interpretation you may or may not end up standing behind).
How many codes? For a typical 20-interview project, you might end up with 50–120 initial codes. That range is normal. The codes will consolidate in the next step.
Using colour to mark progress. Colour coding in Excel is a useful visual signal, but treat it as a supplement to the code text, not a replacement. A practical convention: yellow for quotes you are unsure about and want to return to, green for quotes that feel analytically important, and no fill for everything else. This gives you a quick visual scan of where you need more thought. Some analysts also record a confidence level in an additional column.
Avoid the temptation to assign multiple codes to a single row by cramming them into one cell in column C. If a quote genuinely serves multiple codes, duplicate the row and give each copy its own code. It feels redundant, but it makes filtering and analysis much cleaner.
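If you inherit a sheet where several codes were already crammed into single cells, the clean-up can be scripted rather than done by hand. A minimal Python sketch, assuming a semicolon separator (the rows and codes are invented):

```python
def one_code_per_row(rows, sep=";"):
    """Split rows whose code cell holds several codes into one row per code."""
    out = []
    for doc, quote, codes in rows:
        for code in codes.split(sep):
            code = code.strip()
            if code:  # ignore empty fragments from trailing separators
                out.append((doc, quote, code))
    return out

rows = [("Interview 03", "It never felt like my decision.", "decision paralysis; low trust")]
split_rows = one_code_per_row(rows)
print(split_rows)
```

The quote text is duplicated on purpose: each row now filters and counts independently, which is exactly what the pivot-table stage below depends on.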
For a deeper discussion of coding approaches, including the difference between inductive and deductive strategies, see our guide on how to code qualitative data.
Step 3: Developing themes from codes
Once you have coded your way through the dataset, switch to an analytical mode. The goal now is to look across your codes and identify patterns: groups of codes that are pointing at the same underlying idea.
Using a pivot table. A pivot table is the most powerful Excel tool for this stage.
- Click anywhere in your data table.
- Go to Insert > PivotTable and place it on a new sheet.
- Drag "Initial code" to the Rows field and "Document" to the Values field (set to Count).
- This gives you a frequency table showing how many quotes each code was applied to. (To count distinct documents per code instead, tick "Add this data to the Data Model" when inserting the pivot and set the value field to Distinct Count.)
Codes that appear once across 20 documents may be outliers or overly specific labels. Codes that appear in 8 or 10 documents are likely pointing at something important. Use this as a rough guide, not a rule: a single quote from a single document can be analytically significant if it captures something no-one else articulated.
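The two counts the pivot table gives you — quotes per code and distinct documents per code — can also be sketched in a few lines of Python, which makes the difference between them explicit (the coded rows are invented for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical (Document, Initial code) pairs from the coded sheet.
coded = [
    ("Interview 01", "decision paralysis"),
    ("Interview 01", "unclear ownership"),
    ("Interview 02", "decision paralysis"),
    ("Interview 02", "decision paralysis"),
    ("Interview 03", "communication gap"),
]

# How many quotes carry each code (what a plain Count pivot shows).
quotes_per_code = Counter(code for _, code in coded)

# How many distinct documents each code appears in (spread of evidence).
docs_per_code = defaultdict(set)
for doc, code in coded:
    docs_per_code[code].add(doc)

print(quotes_per_code["decision paralysis"])     # 3 quotes...
print(len(docs_per_code["decision paralysis"]))  # ...across 2 documents
```

The distinction matters for the judgment above: three quotes from one enthusiastic respondent is weaker evidence of a pattern than three quotes from three different people.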
Grouping codes into themes. Once you have a sense of the code clusters, go back to your main data sheet and start filling in column D (Theme). Group 3 to 8 related codes under a single theme label. A common mistake is making themes too broad ("communication" as a theme covering everything from email frequency to board reporting) or too narrow ("lack of weekly update emails" as a theme when it is really a code-level observation).
If you are savvy with Excel, you can pull the theme name for each code from a lookup table with VLOOKUP — for example, `=VLOOKUP(C2, Themes!A:B, 2, FALSE)`, assuming a sheet named Themes with codes in column A and theme names in column B. If this sounds like gibberish to you, simple sorting and filtering followed by manually copying the theme name does the same trick.
Our thematic analysis complete guide covers the distinction between themes and codes in depth, and is worth reading alongside this practical walkthrough.
Step 4: Reviewing and refining themes
Themes rarely come out right on the first pass. This step is about iterating.
Read all quotes under each theme together. Filter column D to show a single theme and read every quote in that group. Ask: do these quotes actually belong together? Is there a subgroup that deserves its own theme? Are there quotes here that would be better placed elsewhere?
Renaming in Excel. When you decide to rename a theme, use Find & Replace (Ctrl+H) rather than editing cells one by one. Search for the old theme name in column D and replace with the new one. This prevents the situation where you have "Leadership issues," "Leadership challenges," and "Leadership concerns" as three separate themes that should be one.
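The same exact-match rename can be expressed in a few lines of Python, which makes the safety property explicit: only cells that match the old theme name exactly are touched, so near-duplicates collapse one at a time under your control (the rows are invented):

```python
def rename_theme(rows, old, new):
    """Rows are (doc, quote, code, theme); only exact theme matches change."""
    return [(d, q, c, new if t == old else t) for d, q, c, t in rows]

rows = [
    ("I1", "q1", "c1", "Leadership issues"),
    ("I2", "q2", "c2", "Leadership challenges"),
    ("I3", "q3", "c3", "Process"),
]
rows = rename_theme(rows, "Leadership issues", "Leadership")
rows = rename_theme(rows, "Leadership challenges", "Leadership")
print(sorted({t for _, _, _, t in rows}))  # → ['Leadership', 'Process']
```

In Excel, the equivalent discipline is ticking "Match entire cell contents" in the Find & Replace dialog, so that a rename never mangles a longer theme name that happens to contain the old one.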
Splitting and merging themes. If a theme feels too big, add a subtheme column (column F) to break it down. If two themes feel redundant, merge them by updating one theme name to match the other and then reviewing the combined set.
Keeping a change log. Add a tab called "Decisions" where you note when you changed a theme name, merged two codes, or dropped a code entirely, and why. This is your analytical audit trail. It is the closest Excel gets to the kind of version control that dedicated QDA tools provide.
This iterative process is described well in the demystifying thematic analysis guide, which walks through the review phase in detail.
Step 5: Writing up from your Excel analysis
Once your themes are stable, Excel becomes a source document for your report rather than an active workspace.
Extracting representative quotes. Filter by theme, scan the quotes, and copy the most illustrative ones into your report or slide deck. A good theme section in a report typically includes 2–4 direct quotes that capture the range of perspectives within the theme, not just the most extreme example.
Summarising with a pivot table. A second pivot table — with Theme in Rows and Document in Values (Count) — gives you a quick frequency overview: how many documents touched each theme. This is useful context for report readers and for your own sense of where the weight of evidence lies.
Preserving the connection to source. Column A (Document) is your audit trail. When you quote a respondent in the report, note the source. "Several respondents described the process as chaotic (Interviews 3, 7, 12, 18)" is more credible than an unattributed claim.
For guidance on turning your coded analysis into a coherent narrative, see our guide on how to synthesise user research and the post on how to write up a thematic analysis.
Where Excel breaks down
Excel is a general-purpose tool, often tortured into doing something it was not designed for. For small projects, the workarounds are manageable. For larger ones, they become genuinely costly.
Volume. Once you are working with 20 or more documents, a spreadsheet with several hundred or thousands of rows becomes unwieldy. Scrolling, filtering, and navigating between sheets takes real time. More importantly, it becomes harder to hold the whole dataset in your head, which is where analytical insight comes from.
Multiple coders. If two or more people are coding the same dataset, Excel falls apart quickly. Shared files get overwritten. Conflicting edits are hard to reconcile. Tracking inter-rater reliability, a standard quality check in systematic analysis, requires custom formulas that most people do not set up correctly. For consulting teams or research groups working together, this is a significant limitation. See qualitative research for consultants: tools and workflow for how professional teams typically handle this.
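For reference, the standard quality check those custom formulas try to reproduce is Cohen's kappa: observed agreement between two coders, corrected for the agreement they would reach by chance. A minimal Python sketch (the code labels are invented, and this simple version assumes each coder assigns exactly one label per quote):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same list of quotes."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Proportion of quotes where the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["x", "x", "y", "y", "x", "y"]  # coder 1's labels, quote by quote
b = ["x", "x", "y", "x", "x", "y"]  # coder 2's labels for the same quotes
print(round(cohens_kappa(a, b), 2))  # → 0.67
```

Setting this up in spreadsheet formulas is exactly the kind of fiddly work that goes wrong silently, which is why dedicated tools compute it for you.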
Changing codes mid-analysis. Qualitative coding is iterative. You will want to rename codes, split one code into two, or merge three codes into one. In Excel, every rename is a manual find-and-replace operation (or change in the lookup table) with real risk of inconsistency. A dedicated tool tracks these changes automatically.
Tracing quotes back to context. In Excel, you have a quote in a cell and a document name in another cell. To re-read that quote in context, you need to open the original document, search for the text, and find the surrounding passage. In a dataset of 20 interviews, doing this for 200 quotes adds hours to your analysis.
No AI assistance. If you want to use AI to help identify patterns, generate initial codes, or flag contradictions across the dataset, Excel offers nothing. You end up pasting chunks of data into ChatGPT, losing the connection to source and creating serious data governance questions. Our post on whether ChatGPT can analyse qualitative data covers the risks of that approach in detail. There are workarounds (e.g. Google Sheets has AI functions, as do some Copilot Office licences), but they still only work on individual cells, take time, and require you to develop the codebook in advance as something for the AI to code against.
Audit trail for stakeholders. In consulting and market research contexts, clients increasingly ask how you reached your conclusions. An Excel file with colour-coded cells is not a compelling answer. A structured tool that links every insight to the underlying quotes is.
What to use when Excel isn't enough
When your project outgrows Excel, you have two realistic options for professional qualitative analysis.
Traditional QDA software (NVivo, MAXQDA, ATLAS.ti)
These tools, NVivo in particular, were built for academic qualitative research and offer powerful features: hierarchical node structures, complex queries, mixed-methods integration, and detailed audit trails. If you are writing a PhD thesis or publishing in a peer-reviewed journal, they remain the gold standard.
The trade-offs are significant for business contexts. NVivo costs around $599 (€550) per year for an individual licence. The learning curve is steep: most users spend several days on training before they feel confident. And the interface reflects academic workflows rather than business ones. For a consultant who needs to complete analysis in three days, not three weeks, that investment does not always make sense. Our QDA software comparison covers the full landscape.
A few years back, traditional QDA software was the only alternative to spreadsheets, and in practice it was rarely used in business settings like consulting or market research.
Modern AI tool combining rigour with speed: Skimle
Skimle is built for the situation this article is about: a professional with tens or even hundreds of documents who needs rigorous, transparent analysis without a steep learning curve or weeks of late nights wrangling data in the back office.
You upload your transcripts or documents, and Skimle reads them, identifies themes, and surfaces the quotes behind each one. Every insight is traceable back to the source document and the exact passage, so you can re-read context without ever leaving the platform. Codes and themes can be adjusted and the analysis updated without manual find-and-replace operations. Multiple team members can work on the same project without file conflicts, and agents can also be connected to the same dataset using MCP.
For consultants running commercial due diligence or customer research, the time saving is typically measured in weeks rather than hours. For market researchers handling large survey datasets, the ability to analyse open-text responses at scale without losing analytical rigour is what makes the difference. See how teams use it at the consultants and investors use case page and the customer and market researchers page. The Noren case study gives a real-life example of how this Helsinki-based strategy consulting company uses Skimle for transcription, document management and analysis.
The AI assistance in Skimle is transparent rather than opaque: you can see which quotes drove which themes, and the system does not present conclusions without evidence. That matters when you are presenting findings to clients or publishing results. Our post on thematic analysis with AI in 2026 explains the methodological principles behind AI-assisted qualitative analysis.
Ready to move beyond the spreadsheet? Try Skimle for free and see how purpose-built qualitative analysis compares to the Excel approach. No training required — most users complete their first analysis on the same day they sign up.
Want to go deeper on thematic analysis methodology? Read our complete thematic analysis guide, the practical walkthrough in demystifying thematic analysis, or the five-step guide to analysing interview transcripts.
About the author
Olli Salo is a former Partner at McKinsey & Company where he spent 18 years helping clients understand the markets and themselves, develop winning strategies and improve their operating models. He has done over 1000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile
