Complete guide to thematic analysis - from raw data to actionable insights across academic and business settings

Master thematic analysis with this comprehensive guide covering academic research, business consulting, policy analysis, and market research. Learn systematic methods for identifying patterns in interviews, documents, and qualitative data.


Thematic analysis is the most widely used method for analysing qualitative data, yet many researchers and analysts struggle to apply it systematically. Whether you're an academic researcher analysing interview transcripts, a consultant synthesising expert calls for due diligence, a policy analyst reviewing consultation responses, or a market researcher exploring customer insights, thematic analysis provides a rigorous framework for transforming unstructured qualitative data into structured insights.

This comprehensive guide covers everything you need to know: from foundational concepts and methodological approaches to step-by-step implementation, practical applications across different contexts, and how modern AI-assisted tools are transforming thematic analysis while maintaining academic rigour.


What is thematic analysis?

At its core, thematic analysis is a method for systematically identifying, analysing, and reporting patterns (themes) within qualitative data. These patterns represent something important about the data in relation to your research question, and often describe a level of patterned meaning within the dataset.

Think of it as moving from chaos to clarity: you start with dozens or hundreds of pages of transcripts, documents, or open-ended survey responses, and through systematic analysis, you end up with a coherent set of themes that capture the essential patterns in your data.

What makes it "thematic"?

The term "thematic" refers to identifying themes, which are not simply topics, categories, or interview-by-interview summaries arranged as a list. A theme captures something significant about the data in relation to the research question and represents some level of patterned response or meaning. For example:

  • Not a theme: "Summary of Bob's, Mary's, and Alice's interviews" (single informants' responses summarised). Actual theme: "Fear of AI eroding demand for high-quality qualitative research"

  • Not a theme: "Price" (just a topic). Actual theme: "Perceived value mismatch between price and delivered features" (captures meaning and pattern)

  • Not a theme: "Onboarding" (just a category). Actual theme: "Implementation complexity as a barrier to early adoption" (describes a relationship and pattern)

Why thematic analysis?

Thematic analysis has become the foundational method for qualitative research across disciplines for several compelling reasons:

Flexibility: Unlike some qualitative methods tied to specific epistemological positions (like interpretative phenomenological analysis or grounded theory), thematic analysis can be applied across theoretical frameworks and research paradigms.

Accessibility: The method doesn't require the same level of technical knowledge as methods like discourse analysis or conversation analysis, making it accessible to researchers new to qualitative work.

Richness: When done well, thematic analysis produces rich, detailed insights that capture complexity and nuance in ways quantitative methods cannot.

Credibility: The systematic nature provides a defensible evidence trail. When stakeholders ask "How do you know that?", you can point to specific data supporting each theme.

Versatility: The method works across data types (interviews, focus groups, documents, open-ended survey responses) and research contexts (academic, commercial, policy, legal).


Core methodological approaches to thematic analysis

Before diving into the step-by-step process, it's crucial to understand that thematic analysis isn't a single monolithic method. Researchers make several important methodological choices that shape how they approach the analysis.

Inductive vs. deductive approach

Inductive (data-driven) thematic analysis means themes are strongly linked to the data themselves, with minimal influence from pre-existing theory or the researcher's preconceptions. You let the themes emerge from the data.

Example: You're analysing exit interviews from employees who left your company. Rather than starting with predetermined categories like "compensation" or "work-life balance", you read through all the data and let employees' own framings shape your themes. You might discover unexpected patterns like "lack of clear career progression" or "misalignment between stated and lived values".

When to use inductive: Exploratory research, understanding new phenomena, policy consultations where you want all voices heard, market research discovering unanticipated customer needs.

Deductive (theory-driven) thematic analysis means themes are approached with pre-existing theories, frameworks, or specific research questions in mind. You're testing or applying an existing framework to new data.

Example: You're using job satisfaction theory to analyse employee feedback. You structure your analysis around known factors (e.g., autonomy, mastery, purpose, fairness) and code data into these predetermined categories, while remaining open to data that doesn't fit.

When to use deductive: Testing specific hypotheses, applying established frameworks, benchmarking against prior research, consulting projects with defined scope.

Hybrid approach: Many researchers use a pragmatic combination, starting with broad theoretical concepts but remaining open to unexpected themes emerging from the data. This is probably the most common approach in business and applied research contexts.

Semantic vs. latent themes

Semantic themes identify explicit, surface-level meanings. You're reporting what participants actually said, without looking for underlying meanings.

Example: Analysing customer interviews about a product feature, a semantic theme might be "Difficulty finding the export function" based on multiple customers explicitly stating they couldn't locate this feature in the interface.

Latent themes go beyond surface meaning to interpret underlying ideas, assumptions, and conceptualisations that shape the semantic content.

Example: Looking at the same data through a latent lens, you might identify a deeper theme of "Discoverability crisis reflecting fundamental assumption mismatch between developer mental models and user expectations" — interpreting what the difficulty finding features reveals about deeper product philosophy issues.

When to use semantic: Practical business questions requiring actionable answers, policy analysis focused on explicit stakeholder positions, descriptive research documenting what people say.

When to use latent: Academic research building theory, understanding underlying belief systems, strategic consulting exploring cultural assumptions, situations where "what people say" differs meaningfully from "what that reveals about deeper issues".

Reflexive vs. codebook approaches

Two distinct analytical traditions have emerged, each with different philosophical commitments:

Reflexive thematic analysis (Braun & Clarke tradition) treats theme development as an active, interpretive process where the researcher is the instrument. Themes are not "discovered" in data but actively constructed by researchers engaging with data. Coding schemes evolve throughout analysis. Grounded theory (Glaser and Strauss 1965 and 1967) goes even further, using the identified themes as a basis for developing theory on how they are related.

Codebook thematic analysis develops a structured coding framework (often called a codebook) early in the process, with clearly defined codes and their meanings. Multiple coders can apply the framework, and inter-rater reliability can be measured.

For business and applied research: Codebook approaches tend to work better when multiple analysts need to apply consistent categorisation, when you're analysing very large datasets manually, or when you need quantification (e.g., frequency of different issue types). Reflexive approaches work better for smaller, complex datasets requiring deep interpretation.


The six phases of thematic analysis: comprehensive methodology

The foundational framework comes from Braun and Clarke's influential 2006 paper, which outlined six phases of thematic analysis. This remains the most widely cited framework in qualitative research. Here's how to apply it rigorously across different research contexts. Note that for a more pragmatic, purely business-oriented take on the phases, we have previously published a practical step-by-step guide to thematic analysis for business people on Signal & Noise.

Phase 1: Familiarising yourself with the data

What it means: Immerse yourself in the data to develop deep familiarity before any formal analysis begins. This isn't passive reading; it's active engagement with the dataset as a whole.

What this looks like in practice:

Block out dedicated time for deep reading (several hours uninterrupted). If you conducted the interviews yourself, re-read transcripts even though you "know" the content. If you're analysing documents or written submissions, read them end-to-end in one sitting where possible.

Take notes on initial impressions, recurring ideas, or striking moments, but resist the urge to start formal coding yet. If you have audio recordings, listen while reading transcripts to capture tone, emotion, and emphasis that text alone misses.

For very large datasets (100+ documents), you might sample strategically for initial familiarisation, then conduct more systematic reading during coding. But don't skip this phase.

Why it matters: Starting analysis without familiarisation leads to missing the forest for the trees. You might identify micro-patterns in early documents but miss macro-patterns that only emerge across the full dataset. As we discussed in our guide to analysing interview transcripts, immersion prevents premature fixation on themes that seem important initially but prove peripheral later.

Example from academic research: A PhD student analysing 35 interviews about remote work adaptation initially might notice many references to "Zoom fatigue" in early transcripts. If they'd started coding immediately, they might have over-emphasised this. After reading all transcripts, they realised the deeper pattern was about "loss of informal learning through casual observation instead of active immersion" — Zoom fatigue was just one manifestation of this broader theme.

Example from consulting: A due diligence team analysing 20 expert calls about a target company's market position noticed early calls heavily emphasising "pricing pressure". However, systematic reading revealed this was specific to one market segment. The actual cross-cutting theme was "unclear value proposition relative to low-cost alternatives" — a much more strategic insight for the acquisition decision.

Phase 2: Generating initial codes

What it means: Systematically work through the entire dataset, identifying interesting features and assigning short descriptive labels (codes) to chunks of data relevant to your research questions.

What this looks like in practice:

Work through data line-by-line or paragraph-by-paragraph, depending on density. For every interesting segment, assign one or more codes that capture what's relevant about that extract. Codes should be specific enough to be meaningful but not so narrow that you end up with hundreds of unique single-use codes.

Use a mix of semantic codes (describing surface content) and interpretive codes (capturing meaning). Include contradictions and outliers, not just dominant patterns. Code the same data extract with multiple codes if it's relevant to several ideas.

Keep a list of all codes with brief definitions to maintain consistency. In traditional manual analysis, this means a spreadsheet or document. In previous-generation thematic analysis software (NVivo, ATLAS.ti, MAXQDA), it's a codebook that you edit manually, albeit with an optimised interface and, in some cases, rudimentary AI assistance. In modern AI-first tools like Skimle, the system creates the preliminary category structure and coding automatically, which the researcher then edits, with full two-way transparency.

Inductive coding example (business context): Analysing customer churn interviews without predetermined framework:

  • "Implementation took 6 months instead of promised 2 weeks" → [timeline-mismatch] [implementation-complexity] [unmet-expectations]
  • "Support team kept transferring me between departments" → [support-fragmentation] [process-inefficiency] [customer-frustration]
  • "We never figured out how to integrate with our existing systems" → [integration-challenges] [technical-barriers] [incomplete-solution]

Deductive coding example (academic research): Using expectancy-violation theory (Burgoon 1978) to analyse the same data:

  • "Implementation took 6 months" → [expectancy-violation: timeline] [negative-outcome-evaluation]
  • "Support team kept transferring" → [expectancy-violation: support-quality] [relationship-satisfaction-decline]
  • "Never figured out integration" → [expectancy-violation: capabilities] [continued-use-barriers]
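To make the bookkeeping concrete, here is a minimal Python sketch of how a code list with definitions and multi-coded extracts could be kept consistent, mirroring the churn-interview codes above. It is purely illustrative: the customer labels and code definitions are invented for the example, and dedicated QDA tools or AI-first platforms handle this for you.

```python
# Illustrative sketch only: a code list with brief definitions, plus coded extracts
# that each carry their source, verbatim text, and one or more codes.

codebook = {
    "timeline-mismatch": "Gap between promised and actual delivery or implementation timelines",
    "implementation-complexity": "Effort, duration, or difficulty of getting the product working",
    "support-fragmentation": "Support experience split across teams, hand-offs, or channels",
    "integration-challenges": "Difficulty connecting the product to existing systems",
}

coded_extracts = [
    {
        "source": "Customer 07",  # hypothetical source label
        "text": "Implementation took 6 months instead of promised 2 weeks",
        "codes": ["timeline-mismatch", "implementation-complexity"],
    },
    {
        "source": "Customer 12",
        "text": "We never figured out how to integrate with our existing systems",
        "codes": ["integration-challenges", "implementation-complexity"],
    },
]

# Simple consistency check: every applied code must exist in the codebook,
# which keeps code meanings from drifting as the analysis grows.
for extract in coded_extracts:
    unknown = [c for c in extract["codes"] if c not in codebook]
    assert not unknown, f"Undefined codes in {extract['source']}: {unknown}"
```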

How many codes?: There's no magic number, but expect dozens to low hundreds depending on dataset size. Guest et al.'s research (referenced in our guide on sample size for qualitative analysis) found that most codes emerge in the first 9-12 interviews, but meaning development continues longer.

Practical time investment: Traditional manual coding takes 1 to 3 (or more) hours per interview hour. For a typical qualitative study of 25 interviews, this phase alone can represent 25-75 hours of work. This is why AI-assisted coding has become increasingly valuable, reducing this to hours rather than weeks while maintaining rigour.

Phase 3: Searching for themes

What it means: Analyse your codes to identify broader patterns of meaning. Codes are building blocks; themes are the structures you build from them.

What this looks like in practice:

Print or display all your codes and start grouping related codes together. Look for codes that cluster naturally around a central organising concept. Some codes will clearly belong to emerging themes; others won't fit anywhere initially (that's normal).

Create hierarchical structures with main themes and sub-themes. For example, a main theme of "organisational barriers to adoption" might include sub-themes of "budget approval processes", "technical infrastructure limitations", and "change resistance from existing users".

Use visual mapping techniques. Many researchers literally spread codes out on a large table or wall, physically moving post-it notes into thematic clusters. Software tools provide digital equivalents. The key is seeing relationships spatially.

Example from market research: Analysing interviews about why customers chose a competitor, you might have codes like [expensive], [poor-value-perception], [hidden-costs], [competitor-transparency], [pricing-structure-complexity]. These might cluster into themes like:

  • Theme: "Price opacity eroding trust"
  • Theme: "Competitor value proposition clarity"

Example from policy analysis: Reviewing consultation responses on environmental regulation, codes like [compliance-cost], [unclear-requirements], [enforcement-uncertainty], [timeline-concerns] might form themes such as:

  • Theme: "Implementation burden on small operators"
  • Theme: "Regulatory ambiguity creating compliance risk"
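Once codes start clustering, it can help to record the provisional theme-to-code mapping explicitly so each candidate theme stays traceable to its supporting quotes. Below is a minimal, hypothetical Python sketch using the market-research codes above; the quotes are invented for illustration, and a spreadsheet or QDA tool serves the same purpose.

```python
from collections import defaultdict

# Candidate themes as groupings of related codes (from the market-research example above).
# The grouping itself is the analyst's interpretive judgement; the code only records it.
# In this simplified sketch each code belongs to a single theme.
themes = {
    "Price opacity eroding trust": [
        "hidden-costs", "pricing-structure-complexity", "poor-value-perception",
    ],
    "Competitor value proposition clarity": ["competitor-transparency"],
}

# Coded extracts in the same shape as the Phase 2 sketch (illustrative, invented quotes).
coded_extracts = [
    {"source": "Customer 03", "text": "The invoice was full of charges nobody had mentioned",
     "codes": ["hidden-costs"]},
    {"source": "Customer 09", "text": "Their competitor publishes every price on the website",
     "codes": ["competitor-transparency"]},
]

def extracts_by_theme(coded_extracts, themes):
    """Collect the supporting quotes for each candidate theme via its codes."""
    code_to_theme = {code: theme for theme, codes in themes.items() for code in codes}
    grouped = defaultdict(list)
    for extract in coded_extracts:
        for code in extract["codes"]:
            theme = code_to_theme.get(code)
            if theme:
                grouped[theme].append(extract["text"])
    return grouped

for theme, quotes in extracts_by_theme(coded_extracts, themes).items():
    print(theme, quotes)
```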

How many themes?: Typically 5-8 main themes for a focused study, up to 12-15 for broader exploratory research. More than that usually indicates themes aren't sufficiently abstracted. Fewer might suggest you haven't captured enough diversity, unless your research question is very focused.

Phase 4: Reviewing themes

What it means: Check that your themes work at both the coded extract level and the entire dataset level. This is where you refine, split, combine, or discard potential themes.

What this looks like in practice:

Level 1 review - Internal homogeneity: Read all coded extracts for each theme. Do they form a coherent pattern? If a theme feels too diverse or contradictory, it might actually be two themes or a poorly defined theme.

Level 2 review - External heterogeneity: Check whether themes are distinct from each other. If you're struggling to articulate what makes theme A different from theme B, they might need combining or reconceptualising.

Create a thematic map showing relationships between themes and sub-themes. This visual representation often reveals structural issues. For example, you might notice that what you thought was a single theme is actually a mediating variable explaining relationships between two other themes.

Example of splitting a theme: In employee retention research, you initially have a theme "Work-life balance issues". Reviewing the extracts, you realise some data is about time pressures (quantitative: too many hours) while other data is about flexibility (qualitative: inability to control when/where work happens). These are distinct patterns that might merit separate themes if you're interested in understanding this topic in more depth.

Example of combining themes: Analysing policy consultation data, you have separate themes for "unclear regulatory requirements" and "difficulty accessing guidance". Reviewing reveals these consistently co-occur and represent the same underlying issue: "information accessibility barriers in compliance". Combining strengthens the theme.

Example of discarding a theme: In due diligence interviews, you identified "legacy system challenges" as a theme. Reviewing reveals this only appears in 2 of 18 interviews, both from the same customer segment. It's not a robust pattern across your dataset. You might keep it as a minor insight but not as a main theme.

This phase often feels frustrating because you're "going backwards", but it's essential for quality. Expect to cycle between Phases 3 and 4 several times.

Phase 5: Defining and naming themes

What it means: Develop a detailed analysis of each theme, clearly defining what it's about and what it's not about. Create clear, concise names that immediately communicate the essence of each theme.

What this looks like in practice:

For each theme, write a narrative describing what the theme is about. This should be detailed enough that someone unfamiliar with your data could understand the theme's scope and boundaries. Include how the theme relates to your research question and to other themes.

Identify the "story" each theme tells and how it fits into the broader narrative about your data. Check that each theme has enough data to support it (at least several coded extracts from multiple sources).

Create names that are concise but informative. Good theme names are specific enough to convey meaning but general enough to encompass the full scope of the theme.

Examples of poor vs good theme names:

  • Poor: "Price" → Better: "Pricing concerns" → Best: "Perceived value mismatch undermining purchase decisions"

  • Poor: "Problems with onboarding" → Better: "Onboarding challenges" → Best: "Implementation complexity as barrier to early value realisation"

  • Poor: "What customers said about competitors" → Better: "Competitor comparison" → Best: "Competitive feature parity forcing differentiation on service quality"

Defining theme boundaries: Write explicit criteria for what belongs in each theme and what doesn't, for example:

Theme: "Information overwhelm in decision-making"
Includes: References to too much information, difficulty filtering relevant from irrelevant, analysis paralysis, unclear prioritisation
Excludes: Lack of information (opposite problem), specific technical complexity, time pressure (related but distinct)

This clarity is crucial when you write up findings or when multiple analysts need to apply codes consistently (common in large-scale consulting or government research projects).
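If it helps keep boundary definitions consistent across a team, the same information can be captured in a simple structured record. The sketch below is one hypothetical way to do that in Python, using the "Information overwhelm" example above; a shared codebook document or template works just as well.

```python
from dataclasses import dataclass, field

# Illustrative structure for an explicit theme definition with boundaries.
@dataclass
class ThemeDefinition:
    name: str
    definition: str
    includes: list[str] = field(default_factory=list)
    excludes: list[str] = field(default_factory=list)

information_overwhelm = ThemeDefinition(
    name="Information overwhelm in decision-making",
    definition="Participants describe having too much input to process when making decisions.",
    includes=[
        "Too much information",
        "Difficulty filtering relevant from irrelevant",
        "Analysis paralysis",
        "Unclear prioritisation",
    ],
    excludes=[
        "Lack of information (opposite problem)",
        "Specific technical complexity",
        "Time pressure (related but distinct)",
    ],
)
```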

Phase 6: Producing the report / article / analysis

What it means: Create a compelling narrative that tells the story of your data through the themes you've identified, supported by vivid examples.

What this looks like in different contexts:

Academic papers: Structure around themes as main sections of the results, each with exemplar quotes demonstrating the pattern. Include a description of the analytical process in the methodology section and a discussion of how findings relate to existing literature. Our guide to using AI in qualitative research discusses emerging norms for using and disclosing AI in qualitative research.

Business reports: Lead with executive summary of key themes, use themes as organisational structure for detailed findings. Include relevant quantification where appropriate (e.g., "mentioned by 18 of 25 customers") alongside rich qualitative description. See how consultants synthesise expert interviews for practical frameworks.

Policy briefings: Frame themes in terms of policy implications, use stakeholder quotes to give voice to different constituencies. Balance comprehensive coverage with clear prioritisation of most significant patterns.

Due diligence reports: Organise themes by strategic relevance (market position, operational risks, growth opportunities), include representative and outlier views, quantify where possible (e.g., "unanimous among competitors", "mentioned by 60% of customers").

In all cases, when selecting quotes, choose extracts that vividly demonstrate each theme, but ensure they're representative, not just dramatic. Include context so quotes make sense standalone. Show diversity within themes, not just the clearest examples. Your final report shouldn't just list themes; it should tell a story about what the data means. How do themes relate to each other? What's the bigger picture they reveal? What are the implications?


Applying thematic analysis across contexts

The methodological principles of thematic analysis remain consistent, but practical application varies significantly across contexts. Here are some nuances of how to apply thematic analysis in different professional settings:

Academic research

Thematic analysis is used in academic research across virtually all social sciences and humanities disciplines. Applications include phenomenological studies of lived experience, evaluation research examining programme effectiveness, policy research exploring stakeholder perspectives, and theoretical development in management, education, health, and psychology.

Distinctive features: Emphasis on methodological rigour and transparency, explicit discussion of epistemological position, concern with trustworthiness (credibility, dependability, transferability), often aiming for theoretical saturation, peer review standards requiring detailed methodology sections.

Sample sizes: As discussed in our guide to interview sample sizes, academic studies typically involve 20-50 interviews, with sample size justified by saturation principles and methodological framework (IPA requires fewer, grounded theory requires more).

Timeline: PhD projects might dedicate 6-12 months to data collection and analysis. Journal articles often represent 12-18 months of work from conception to submission.

Quality indicators: Inter-coder reliability checks (where multiple researchers code subset of data), member checking (verifying interpretations with participants), audit trails documenting analytical decisions, reflexive consideration of researcher positionality, discussions on thematic saturation to justify sample size, and clear contributions to theory instead of just showing results specific to a situation.

Business consulting

Consulting firms use thematic analysis (though rarely calling it that) for due diligence, market research, customer insight development, strategic assessment, and organisational diagnostics. The analysis informs decisions worth millions of euros, so rigour matters even if academic conventions don't apply. One might even argue that the quality standards should be higher still, although in practice this is often not the case...

Distinctive features: Tight timelines (weeks not months), focus on actionable insights not theoretical contribution, commercial sensitivity requiring confidentiality, multiple stakeholders with different needs (client executives, deal teams, subject matter experts), often blending qualitative and quantitative data.

Sample sizes: Pragmatically driven by available expert network contacts and budget. Might be 10-25 interviews for focused due diligence, 30-50 for comprehensive market assessment. Quality of sources (industry experts, senior executives) often matters more than quantity.

Timeline: Typical consulting project runs 4-8 weeks from kickoff to final presentation, with 1-2 weeks for data collection and 1-2 weeks for analysis. Doing fewer interviews well is better than many interviews poorly.

Quality indicators: Internal review by senior consultants, triangulation with quantitative data and secondary research, client workshops validating emerging themes, clear documentation allowing hand-off between team members, crisp synthesis and "so what" statements to make it practical.

Policy and government research

Governments and policy organisations use thematic analysis for public consultations, programme evaluations, stakeholder engagement, and evidence synthesis informing legislation or policy changes.

Distinctive features: Democratic legitimacy requiring all voices be heard, need for transparency and audit trails given public accountability, large volumes (sometimes 500+ consultation responses), diverse stakeholder groups with conflicting interests, political sensitivity requiring careful framing.

Sample sizes: Public consultations may receive hundreds of responses that must all be analysed. Stakeholder engagement studies typically involve 30-40 interviews across different constituent groups.

Timeline: Policy cycle requirements often impose rigid deadlines (e.g., respond to consultation within 3 months). Analysis needs to be thorough but achievable within these constraints. AI-assisted tools are increasingly essential for handling large volumes within tight timeframes, as demonstrated in our EU Digital Omnibus consultation case study with 500+ statements to analyse.

Quality indicators: Systematic coverage of all submissions, clear audit trail from data to findings, transparency about analytical methods, balance between majority and minority views, appropriate handling of duplicate/form responses.

Market and UX research

Commercial research firms and in-house teams use thematic analysis for customer feedback analysis, user research, product development, brand perception studies, and competitive intelligence.

Distinctive features: Action-oriented (informing product roadmaps, marketing strategies), fast pace requiring rapid turnaround, mix of structured and unstructured data sources, often integrated with analytics and behavioural data, need to communicate findings to non-researcher stakeholders.

Sample sizes: Varies widely from exploratory studies (10-15 users) to large-scale feedback analysis (1000s of support tickets or reviews). Continuous research programmes accumulate data over time.

Timeline: Shorter cycles than academic research (days to weeks not months), often iterative with findings feeding immediately into product decisions.

Quality indicators: Internal validation through product metrics (does identified pain point correlate with usage data?), stakeholder buy-in (do product managers recognise patterns?), actionability (can findings drive specific decisions?), reproducibility (can subsequent research validate findings?).

Legal and compliance research

Law firms and compliance teams use thematic analysis for discovery document review, witness statement analysis, regulatory investigation, compliance monitoring, and risk assessment.

Distinctive features: Extreme emphasis on accuracy and completeness (missing something can be catastrophic), need for absolute confidentiality, often very large document sets, requirement for defensible methodology, potential use in litigation requiring robust evidence trail.

Sample sizes: Discovery in major litigation might involve reviewing "millions of documents" (though many are filtered through keyword searches before human review). Compliance investigations might analyse 50-200 documents or interviews.

Timeline: Driven by legal deadlines which are often inflexible. Might need to review thousands of documents in weeks.

Quality indicators: Multiple reviewer validation, systematic documentation of inclusion/exclusion criteria, technology-assisted review with human validation, clear chain of custody, ability to explain methodology under cross-examination.


Manual vs. AI-assisted thematic analysis

The fundamental methodology of thematic analysis remains unchanged, but how researchers execute that methodology has been transformed by AI-assisted tools. Understanding the trade-offs helps you choose the right approach for your context.

Traditional manual analysis

Tools: Pen and paper, Microsoft Word/Excel, or dedicated previous-generation qualitative data analysis software (QDA) like NVivo, ATLAS.ti, or MAXQDA.

Process: Researcher reads data, highlights relevant passages, applies codes manually, physically or digitally organises codes into themes, writes interpretive memos. Full analysis of 25 interviews takes several weeks of dedicated analytical work.

Strengths:

  • Complete researcher control over every interpretive decision
  • Deep immersion in data through manual engagement
  • Established within academic traditions with clear quality standards
  • No concerns about algorithmic bias or errors

Limitations:

  • Extremely time-intensive, limiting feasible sample sizes for qualitative research
  • Consistency challenges when analysis takes weeks (codes drift in meaning)
  • Difficulty managing very large datasets
  • Fatigue effects reduce quality in later stages
  • Hard to iterate (reworking code structure means starting over)

Best for: Small studies (under 20 interviews), phenomenological research requiring deep interpretive engagement, projects where researcher immersion is valued as part of methodology, contexts where AI use would raise ethical concerns.

AI-assisted analysis (done rigorously, not just "paste to ChatGPT")

Tools: Specialist qualitative analysis platforms like Skimle that systematically structure data using AI, following established qualitative methodology rather than just one-shot LLM summaries. Note that simply "chatting with your documents using ChatGPT" or trying to run them through a simple RAG model does not qualify as analysis!

Process (Skimle example): Upload transcripts or documents, AI systematically analyses each document to extract insights and build category structure (similar to manual coding but automated), researcher reviews and refines the AI-generated structure, iteratively develops themes through human-AI collaboration. Full analysis of 25 interviews takes 5-15 hours of human review and refinement work.

Strengths:

  • Dramatically reduced time investment (70-80% time saving vs manual, letting you spend time on insights rather than coding)
  • Can handle larger datasets (50-500+ interviews feasible)
  • Consistency across dataset (AI applies coding logic uniformly)
  • Easy iteration (can quickly restructure categories and re-analyse)
  • Real-time insight development (analyse first batch to guide later data collection)
  • Full two-way transparency through linking categories to quotes and quotes to categories.

Limitations:

  • The AI-generated structure still requires careful human review and refinement; the researcher remains responsible for interpretation
  • Not appropriate where AI use raises ethical or confidentiality concerns
  • Quality depends on the tool following systematic methodology rather than one-shot summarisation

Best for: Larger studies (20+ interviews), time-sensitive projects, exploratory research where iteration is valuable, projects where systematic coverage is crucial (policy consultations, due diligence), research programmes where efficiency enables more ambitious scope.

Critical distinction: Not all "AI for qualitative analysis" is created equal. Simple approaches that feed documents to ChatGPT or dump data into RAG databases lack the systematic methodology required for rigorous analysis. Proper AI-assisted analysis must follow established qualitative methodology, structuring data systematically rather than just querying it at runtime.

Hybrid approaches

Researchers can use AI for initial systematic coding (Phase 2) and pattern identification (Phase 3), then apply human judgment for theme refinement (Phase 4), definition (Phase 5), and narrative construction (Phase 6). This combines efficiency gains with human interpretive expertise where it matters most.

Example: An academic researcher analysing 40 interviews uses Skimle to generate initial codes and provisional themes in a few hours, then spends a week deeply reviewing the structure, verifying correct coding of all materials, refining theme boundaries, identifying relationships between themes, and developing theoretical insights. Total time: 2 weeks instead of 8 weeks for fully manual analysis, with better quality as time can be spent on insights, not manual coding.


Common pitfalls in thematic analysis

Even experienced researchers make mistakes in thematic analysis. Here are the most common problems we've come across in both academic and business settings, and how to prevent them:

Pitfall 1: Confusing topics with themes

The problem: Listing what your data covers (topics) rather than identifying patterns of meaning (themes).

Example of topic-focused analysis:

  • Topic 1: Pricing
  • Topic 2: Customer support
  • Topic 3: Product features
  • Topic 4: Onboarding

This tells you nothing about patterns, relationships, or insights.

Example of thematic analysis:

  • Theme 1: Value perception disconnect between pricing and delivered capabilities
  • Theme 2: Support accessibility as proxy for overall relationship quality
  • Theme 3: Feature complexity exceeding user needs and creating adoption barriers
  • Theme 4: Onboarding inadequacy as critical vulnerability period

Solution: For each potential theme, articulate what pattern or meaning it captures, not just what topic it addresses. At McKinsey we called this the "so what" and often had a design principle that every slide in the client deliverable had to carry its synthesis as an action title above the analysis. So a graph showing the correlation between price and churn would not be titled "Price vs. churn" but instead something like "High price sensitivity suggests review of pricing approach needed to optimise total lifetime value of customers".

Pitfall 2: Unwarranted frequency-based claims

The problem: Treating theme prevalence as indicating importance without considering context.

Why it's wrong: A theme mentioned by every participant might reflect socially desirable answering rather than deep importance. A theme mentioned by only 3 participants might be a critical insight that others hadn't articulated or considered. A theme mentioned by half of the participants but omitted by the other half might actually be considered important by everyone: randomness, interviewer approach, or other reasons might have led some respondents not to bring it up.

Example: In employee satisfaction research, 90% mention "I like the people here", but this might be superficial social expectation. Meanwhile, only 15% mention "unclear decision-making authority", yet further probing reveals this is a profound root cause of frustration that others hadn't consciously identified, even if they do bring up related issues such as red tape, slow approvals, and general difficulty getting things done.

Solution: Use prevalence as one indicator among many. Consider depth of discussion, emotional intensity, how central a theme is to other themes, and whether absence might be meaningful (people avoiding difficult topics).

Pitfall 3: Confirmation bias in coding

The problem: Seeing what you expect or hope to find, coding data selectively to support pre-existing beliefs.

Example: A product team analyses user feedback expecting to find that requested features would solve churn. They over-code any mention of features they're already planning and under-code references to fundamental usability issues they'd prefer not to confront.

When developing Skimle we have caught ourselves doing this multiple times in customer discussions... it's so much nicer to find validation of our own ideas than to be forced to make painful deprioritisation decisions in favour of features suggested by actual users.

Solution: Actively look for disconfirming evidence. Code everything relevant, including data contradicting your hypotheses. Have others review a subset of coding. Be transparent about researcher positionality and preconceptions.

Pitfall 4: Insufficient or excessive abstraction

The problem: Themes are either too concrete (barely abstracted from raw data) or too abstract (untethered from actual data).

Too concrete example: "Customers complained about the checkout process taking too many clicks" — this is basically just summarising data, not identifying a pattern.

Too abstract example: "Temporal dissonance in expectation management" — without clear link to what this actually means in the data, this is academic jargon obscuring rather than revealing.

Just right example: "Transaction friction undermining conversion at critical decision point" — captures pattern (multiple friction sources affect conversion), relates to data (specific checkout issues), and conveys meaning (why it matters).

Solution: Themes should abstract from specific instances to patterns, but remain clearly grounded in data. Test whether someone unfamiliar with your data could understand what a theme means from its name and description. While still at McKinsey, I found the "barbecue test" useful here: if you had to explain your findings to a friend at a barbecue, how would you describe them without slides?

Pitfall 5: Overlap and redundancy in themes

The problem: Themes that aren't sufficiently distinct from each other, creating confusion about where coded data belongs.

Example: "Implementation challenges", "Adoption barriers", and "Onboarding difficulties" might all be describing the same underlying pattern in different words.

Solution: Review for external heterogeneity (Phase 4). If you struggle to explain how theme A differs from theme B, they probably need combining or reconceptualising. Draw boundaries: what belongs in each theme and what doesn't? The root causes of overlap include a wish to keep specific themes alive from earlier iterations (to avoid, say, rewriting sections) and a desire to stick to a pre-set number of themes (e.g., wanting "7 top reasons for churn"). Be ready to adjust for communicability rather than stick to a structure that doesn't work.

Pitfall 6: Quantity over quality in data collection

The problem: Conducting many superficial interviews rather than fewer deep interviews, leading to thin data that can't support rich thematic analysis.

Why it's wrong: As discussed in our sample size guide, quality matters more than quantity. Fifty rushed 15-minute conversations likely produce worse analysis than five thoughtful hour-long interviews. Thematic analysis requires sufficient depth to identify patterns, not just coverage.

Solution: Design interviews that allow depth (see our guide to conducting effective interviews). Plan adequate time. Use follow-up questions. Accept that smaller n with better data often produces superior insights.

Advanced techniques and considerations

Once you've mastered the fundamentals, several advanced techniques can enhance your thematic analysis:

Cross-case and within-case analysis

For studies with clear subgroups (e.g., different customer segments, multiple organisations, before/after intervention), conduct parallel analyses:

Within-case analysis: Identify themes within each subgroup separately. What patterns characterise each group?

Cross-case analysis: Then examine themes across all groups. What's universal? What's distinctive? How do themes vary by context?

This approach is particularly powerful in comparative research, policy analysis with multiple stakeholder groups, or market segmentation studies.

Temporal analysis

When data has a time dimension (longitudinal studies, historical documents, process narratives), examine how themes change over time:

Developmental patterns: How do experiences or perceptions evolve? What drives transitions?

Turning points: Are there critical moments where themes shift? What triggers them?

Persistence and change: What remains constant despite changes? What's fleeting?

Negative case analysis

Deliberately seek instances that don't fit your emerging themes. These "deviant cases" can be incredibly informative:

Refining theme boundaries: Exceptions help clarify what a theme really encompasses.

Identifying moderating factors: Why do some cases differ? What contextual factors explain variation?

Alternative explanations: Negative cases might suggest a different way of interpreting the pattern.

Quantitising qualitative data

While thematic analysis is fundamentally qualitative, sometimes counting is informative:

Simple frequency: How many participants mentioned this theme? (But remember Pitfall 2 above: prevalence ≠ importance)

Co-occurrence: Do certain themes consistently appear together in the same interviews or documents?

Intensity coding: Beyond presence/absence, you might code how centrally a theme featured in each case.

Demographic patterns: Does theme prevalence vary by participant characteristics?

Use quantification to complement, not replace, qualitative interpretation. The numbers describe the dataset structure; the qualitative analysis explains what that structure means.
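As a concrete illustration of the counting described above, the following Python sketch computes theme frequency and pairwise co-occurrence from a hypothetical theme-per-interview mapping. The data is invented for the example; the point is only that such counts are mechanical once themes have been assigned, while their interpretation remains qualitative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping of interviews to the themes identified in them.
themes_by_interview = {
    "Interview 01": {"Implementation burden", "Regulatory ambiguity"},
    "Interview 02": {"Implementation burden"},
    "Interview 03": {"Regulatory ambiguity", "Information accessibility"},
    "Interview 04": {"Implementation burden", "Regulatory ambiguity"},
}

# Simple frequency: how many interviews mention each theme.
frequency = Counter(theme for themes in themes_by_interview.values() for theme in themes)

# Co-occurrence: how often each pair of themes appears in the same interview.
co_occurrence = Counter(
    pair
    for themes in themes_by_interview.values()
    for pair in combinations(sorted(themes), 2)
)

print(frequency.most_common())
print(co_occurrence.most_common())
```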

Team-based analysis

When multiple researchers analyse the same dataset, special considerations apply:

Code development: Create initial codebook collaboratively, pilot on sample of data, refine definitions until acceptable inter-coder agreement.

Division of labour: Some teams have everyone code everything (time-intensive but maximises engagement). Others divide the dataset by interview but regularly discuss emerging themes. In consulting settings I would sometimes split work streams by topic (e.g., market dynamics vs. target assessment), which meant each person searched for specific themes across all interview notes.

Resolving disagreements: Differences in coding often reflect interesting analytical questions, not just errors. Discuss disagreements to clarify theme boundaries and deepen understanding.

Audit trails: Document who made what analytical decisions when, especially important in high-stakes contexts like legal discovery or policy analysis.
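Where teams do quantify the inter-coder agreement mentioned under code development, Cohen's kappa is a common statistic. Below is a minimal, illustrative Python sketch for two coders applying a single code (present/absent) to the same segments; in practice you would typically rely on your QDA software or an established library such as scikit-learn's cohen_kappa_score.

```python
# Illustrative Cohen's kappa for two coders and one binary code (1 = applied, 0 = not).
def cohens_kappa(coder_a, coder_b):
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)
    # Observed agreement: share of segments where both coders made the same decision.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement by chance, from each coder's label proportions.
    expected = sum((coder_a.count(lab) / n) * (coder_b.count(lab) / n) for lab in labels)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical coding decisions for eight segments.
coder_a = [1, 1, 0, 1, 0, 0, 1, 1]
coder_b = [1, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.47 for this invented example
```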

Ensuring quality and rigour

How do you know if you've done good thematic analysis? Several quality indicators help:

Transparency

Can others follow your analytical process? This requires:

  • Clear description of methodology, including why qualitative analysis was chosen and how the sample size was determined
  • Explanation of epistemological approach (inductive/deductive, semantic/latent)
  • Documentation of coding decisions
  • Examples of coded extracts for each theme
  • Discussion of how themes were developed and refined

For academic research, this appears in methodology sections. For applied research, it might be an appendix or technical report accompanying client-facing deliverables. In any case, you want a crisp and solid rationale for what you did and why, as a foundation for the insights and recommendations to stand on.

Coherence

Do themes hold together internally and relate logically to each other? Check:

  • Is each theme internally consistent?
  • Are themes distinct from each other?
  • Is there a clear narrative connecting themes?
  • Do themes address your research question?

Grounding

Are themes clearly evidenced in data? Each theme should:

  • Be supported by multiple data extracts
  • Appear across multiple participants or documents (unless explicitly noted as minority view)
  • Include vivid exemplars that demonstrate the pattern
  • Connect back to the research question

Reflexivity

Have you considered your role in constructing the analysis? Reflect on:

  • Your pre-existing assumptions and how they might shape interpretation
  • How your identity/position might influence what participants shared
  • Alternative interpretations you considered and rejected
  • Analytical decisions and what drove them

Catalytic validity

For applied research (consulting, policy, UX): do findings enable action? Check whether:

  • Insights are specific enough to guide decisions
  • Themes resonate with stakeholders as genuine and important
  • Findings suggest clear next steps or interventions

Conclusion: from method to mastery

Thematic analysis is both simpler and more complex than it first appears. The basic steps are straightforward: familiarise, code, identify themes, refine, define, report. Yet executing these steps with rigour, avoiding common pitfalls, and producing genuinely insightful analysis requires practice and methodological understanding.

The method's flexibility is both strength and challenge. It can adapt to virtually any qualitative research question across any discipline or context. But this flexibility requires researchers to make thoughtful methodological choices about approach (inductive/deductive, semantic/latent, reflexive/codebook) appropriate for their specific research goals.

For academic researchers, thematic analysis provides a rigorous, transparent method meeting peer review standards while remaining more accessible than specialised approaches like discourse analysis or grounded theory. For business analysts, consultants, and policy researchers, it offers a systematic framework for extracting actionable insights from qualitative data, with clear audit trails and defensible conclusions.

Modern AI-assisted tools like Skimle are transforming the practical execution of thematic analysis, dramatically reducing the time required for systematic coding while maintaining methodological rigour. This efficiency enables more ambitious research designs, larger sample sizes, and faster turnaround times, making systematic qualitative analysis feasible in contexts where time and resource constraints previously forced either superficial analysis or abandoning qualitative approaches altogether.

Approaching thematic analysis as a craft that improves with practice will serve you well. Each project teaches you something new about identifying patterns, interpreting meaning, and transforming complex qualitative data into coherent insights that advance knowledge or inform decisions. At Skimle we believe that, thanks to the Jevons paradox, the emergence of AI will further increase the need for qualitative researchers, as larger and deeper analyses become possible with advanced tools for interviewing and analysing responses.

Ready to transform your qualitative data into structured insights? Try Skimle for free and experience how AI-assisted thematic analysis can help you handle larger datasets while maintaining academic rigour and generating deeper insights.

Want to deepen your qualitative research skills? Explore our guides on how to conduct effective interviews, determining appropriate sample sizes, analysing interview transcripts systematically, and choosing the right qualitative analysis tools.

About the Author

Olli Salo is a former Partner at McKinsey & Company where he spent 18 years helping clients understand the markets and themselves, develop winning strategies and improve their operating models. He has done over 1000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile