AI qualitative data analysis checklist: 20 questions before you publish

A practical pre-publication checklist for qualitative research that used AI tools — covering documentation, traceability, transparency, and peer reviewer expectations.


Using AI tools in qualitative analysis is increasingly common and increasingly accepted — but acceptance is conditional. Peer reviewers and ethics boards have specific expectations about documentation, transparency, and the role of human judgement in AI-assisted work. Research that does not address these expectations invites rejection or revision requests that could take months to resolve.

This checklist covers 20 questions you should be able to answer before submitting qualitative research that used AI assistance. It is organised around the four areas reviewers scrutinise most closely: documentation, traceability, quality and bias, and human oversight.

Work through this before finalising your methods section. Many of the items below are not just good practice — they are increasingly explicit requirements in journal submission guidelines.


Section 1: Documentation (questions 1-6)

1. Have you named the AI tool and described its specific role?

"AI-assisted analysis" is not enough. Your methods section should name the specific tool (ChatGPT-4o, Skimle, ATLAS.ti AI Assist, etc.) and describe precisely what it did: initial code generation, transcript summarisation, theme clustering, quote retrieval, or something else.

Reviewers increasingly expect the same level of specificity you would apply to any other analytical instrument. If you used multiple tools at different stages, document each one separately.

2. Have you documented the prompts or settings used?

If you used a prompt-based tool (ChatGPT, Claude, Gemini), document the prompts you used for each analytical task, or at minimum describe their structure and intent. If you used a structured tool with fixed settings (like Skimle), document the configuration — number of theme levels, any custom instructions, metadata variables used for filtering.

This matters because another researcher attempting to replicate or evaluate your method needs to understand what the AI was asked to do, not just that it was used.

3. Have you described how you validated and refined the AI's output?

The AI produced a first pass. What did you do with it? Your methods section should describe the validation process: did you read through the AI's code list and add, remove, or modify codes? Did you check that themes reflected the actual data rather than plausible-sounding patterns? How many rounds of refinement were involved?

Reviewers are not looking for perfection here. They are looking for evidence that the AI's output was critically engaged with rather than accepted uncritically.

4. Have you addressed AI use in your ethics documentation?

Depending on your institution and funder, AI-assisted analysis may need to be addressed in your ethics application or consent forms. Relevant questions: does your data leave your institution's environment if processed by a cloud AI tool? Were participants informed their data might be processed by third-party AI? Does the AI tool store or use your data for model training?

Check your institution's guidance. This is an area where norms are still developing, and what is expected varies significantly between institutions and jurisdictions.

5. Have you cited the methodological literature for the analytical approach, not just the AI tool?

AI tools assist with analysis; they do not replace the methodological framework. If you conducted thematic analysis with AI assistance, your methods section still needs to cite Braun and Clarke (or whichever framework you used) and describe the analytical procedure in relation to that framework. The AI tool citation is additional, not a substitute.

See how to write up a thematic analysis for the full methods section structure.

6. Is your data handling compliant with privacy regulations?

If your research involves personal data (which most interview-based qualitative research does), you need to confirm that processing that data through an AI tool is compliant with GDPR, HIPAA, or the relevant regulatory framework in your jurisdiction. Most cloud AI providers offer data processing agreements, but you need to have signed one and confirmed it covers your use case.

Anonymisation before processing is the safest approach — but anonymisation has limits, and some combinations of demographic information can re-identify participants even without names.


Section 2: Traceability (questions 7-11)

7. Can every theme in your results be traced back to specific quotes?

This is the core quality requirement for AI-assisted qualitative analysis, and it is the same requirement as for manual analysis. For each theme you report, there should be a set of quotes that support it, drawn from specific participants in specific documents.

Tools that maintain this linkage automatically — where you can navigate from a theme to the supporting quotes to the source document — make this easy to demonstrate. If your AI tool did not maintain this linkage, you need to establish it manually before writing up your results.
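If you need to establish the linkage yourself, the underlying record can be very simple: each theme maps to quote records that carry their participant and source document. The sketch below is a minimal illustration, not any particular tool's schema; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    text: str          # verbatim passage from the transcript
    participant: str   # pseudonymised participant identifier
    document: str      # source transcript filename

# Each theme maps to the quotes that ground it.
themes: dict[str, list[Quote]] = {
    "Workload pressure": [
        Quote("I never leave before seven", "P03", "interview_03.txt"),
        Quote("the deadlines just stack up", "P07", "interview_07.txt"),
    ],
}

def sources_for(theme: str) -> set[str]:
    """Navigate from a theme to its supporting source documents."""
    return {q.document for q in themes.get(theme, [])}
```

A structure like this makes the theme → quote → document navigation demonstrable on demand, which is exactly what a reviewer asking "where does this theme come from?" wants to see.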

8. Have you checked that the AI's reported quotes are verbatim?

Large language models occasionally generate plausible-sounding quotes that are not verbatim from the source document. This is well-documented and not specific to any one tool, though it varies significantly between tools depending on their architecture. At Skimle, we ended up using fully deterministic string matching to verify that each quote appears verbatim in the source document.

Before citing any quote that came from an AI-assisted process, verify it against the original transcript. This is non-negotiable. Fabricated quotes are a research integrity issue, and "the AI generated it" is not a defence.
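The verification step can be fully deterministic. A minimal sketch, with normalisation limited to collapsing whitespace so line breaks in the transcript do not cause false negatives (the function name is illustrative):

```python
import re

def is_verbatim(quote: str, transcript: str) -> bool:
    """Return True only if the quote appears word-for-word in the transcript.

    Whitespace is collapsed before comparison; the words themselves
    must match exactly -- no fuzzy or semantic matching.
    """
    norm = lambda s: re.sub(r"\s+", " ", s).strip()
    return norm(quote) in norm(transcript)

transcript = "I was exhausted,\nbut I kept going because the team needed me."
is_verbatim("I kept going because the team needed me", transcript)   # True
is_verbatim("I kept going because my manager asked me to", transcript)  # False
```

Anything stricter than this (punctuation-sensitive matching, no normalisation at all) is also defensible; the point is that the check is mechanical and reproducible rather than a judgement call.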

9. Can you identify which participants or documents contributed to each theme?

Your results section should be able to say (at minimum) how many participants expressed each theme, and ideally which participant segments or subgroups showed the theme more prominently. If the AI produced themes that you cannot connect back to specific participants, the analysis is not sufficiently grounded.

This also matters for transferability claims. If Theme 3 only appeared in interviews with one subgroup of participants, that is analytically important and needs to be reported.
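Once the theme-to-quote linkage exists, these counts fall out mechanically. A sketch over a hypothetical list of coded segments, tracking both distinct participants and the subgroups represented per theme:

```python
from collections import defaultdict

# Hypothetical coded segments: (theme, participant, subgroup)
segments = [
    ("Theme 1", "P01", "nurses"),
    ("Theme 1", "P02", "nurses"),
    ("Theme 3", "P02", "nurses"),
    ("Theme 3", "P05", "nurses"),
    ("Theme 1", "P09", "managers"),
]

participants = defaultdict(set)   # theme -> distinct participants
subgroups = defaultdict(set)      # theme -> subgroups represented

for theme, pid, group in segments:
    participants[theme].add(pid)
    subgroups[theme].add(group)

# In this toy data, Theme 3 appears only among nurses --
# exactly the kind of pattern that needs to be reported.
print(len(participants["Theme 1"]))   # 3
print(sorted(subgroups["Theme 3"]))  # ['nurses']
```

Counting distinct participants (a set) rather than coded segments matters: one talkative participant can generate many segments for a theme without the theme being widely shared.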

10. Are your themes grounded in the data, not in the AI's general knowledge?

This is the subtlest risk in AI-assisted qualitative analysis. Large language models have extensive general knowledge about most domains. When analysing interview data about, say, employee motivation, the AI may produce themes that reflect well-known theories of motivation (self-determination theory, Maslow's hierarchy, etc.) rather than what your specific participants actually said.

Check each theme: does it emerge from the data in this study, or does it map onto a pre-existing framework the AI knows about? The latter is not automatically wrong, but it should be the result of a deliberate analytical choice, not an artefact of the AI's training data.

11. Have you retained the source data and your analytical record?

Journal data archiving requirements are tightening. Many journals now expect qualitative researchers to retain and, in some cases, share analytical records — codebooks, memos, theme hierarchies — even if the underlying transcripts cannot be shared due to participant confidentiality.

Ensure you have retained the AI's outputs, your validation notes, and your final codebook in a form that could be provided to a reviewer or deposited in a data repository if required.


Section 3: Quality and bias (questions 12-16)

12. Have you conducted a negative case analysis?

Negative case analysis — actively seeking data that challenges or complicates your emerging themes — is a quality criterion in most qualitative traditions, and it becomes more important when AI tools are used. AI tools are pattern-matching systems; they are better at finding what fits a pattern than what disrupts it.

Deliberately look for interview segments that do not fit your themes, or where participants expressed conflicting views. Document what you found and how it affected your analysis.

13. Have you addressed your own positionality?

Reflexivity is a requirement in most qualitative traditions, and using AI tools adds a dimension to it. You need to address both your own positionality (how your background may have shaped your interpretive choices) and the positionality embedded in the AI tool (what kinds of patterns and framings the model is likely to favour given its training data).

The latter is harder to address fully, which is an argument for tools that make their analytical process more transparent.

14. Have you checked for systematic gaps or biases in what the AI identified?

AI tools may under-identify certain types of content: emotionally nuanced passages, culturally specific references, implicit communication, or content that requires contextual knowledge the model does not have. If your research touches on topics where these limitations are likely to matter, document them and describe what additional steps you took to address them.

15. Have you applied the appropriate saturation standard?

If you are claiming thematic saturation, your methods section needs to specify which saturation standard you applied (code saturation or meaning saturation) and provide an empirical basis for the claim. See how many interviews are enough for qualitative research for the relevant literature on saturation thresholds.

AI-assisted analysis does not change the saturation standard. It may change when you reach saturation (because you can process more data more quickly), but the standard itself is a property of your research question and population, not your analytical tool.

16. Have you addressed the quality of your source data?

AI analysis quality is constrained by input data quality. If your transcripts are low quality (incomplete, inaccurate, heavily paraphrased), the AI's output will reflect those problems. If you used automated transcription, note the accuracy level and any limitations.


Section 4: Human oversight (questions 17-20)

17. Can you clearly state that analytical judgements were made by the researcher?

This is the core accountability question for AI-assisted research. The AI organised and surfaced patterns in the data. The researcher made interpretive judgements: which patterns matter, what they mean, how they relate to each other and to the research question.

Your methods section should make this division clear. If a reviewer cannot identify where the human interpretation happened in your analytical process, you will be asked to clarify — or to defend claims that may be difficult to substantiate.

18. Have you engaged with the AI's output critically rather than accepting it?

The most common failure mode in AI-assisted qualitative analysis is using the AI's output as the final answer rather than as a starting point. If your results section presents themes exactly as the AI generated them, without evidence of your own interpretive engagement, reviewers will notice.

Concrete signs of critical engagement: you renamed or redefined some of the AI's themes; you merged themes that the AI separated; you split themes the AI combined; you added themes the AI missed; you rejected themes the AI proposed as insufficiently grounded. Document these decisions in your methods section.

19. Is your methods section sufficient for another researcher to evaluate your analytical process?

This is the reproducibility standard applied to qualitative work. Another qualitative researcher should be able to read your methods section and understand, in enough detail to evaluate, how your analysis was conducted. They do not need to be able to replicate it exactly (qualitative analysis is not replicable in the quantitative sense), but they should be able to assess whether your process was systematic and appropriate.

Peer review, at its best, is this evaluation. Write your methods section for a sceptical but fair reviewer, not for a sympathetic one.

20. Have you read your journal's specific guidance on AI-assisted research?

Many journals have published specific guidance on AI use in research in the past 12-18 months. Some require disclosure in a specific section; some restrict certain types of AI assistance; some have specific requirements for documentation. Check your target journal's author guidelines before submitting.

Major publishers including Elsevier, Springer Nature, Wiley, and SAGE have all published position statements. The guide on using AI in qualitative research for academics covers the current state of journal-specific requirements in detail.


Working through the checklist

Not every item on this list applies to every study. A study using AI only for transcript summarisation (not coding or theme development) has different documentation requirements from one where AI drove the entire thematic analysis. Use your judgement about which items are most relevant to your specific use of AI tools.

What the checklist is designed to prevent is the common situation where researchers have used AI tools effectively but have not documented that use in ways that will satisfy reviewers. Good AI-assisted research is defensible; undocumented AI-assisted research is not.

For a structured workflow that builds traceability and documentation into the analysis process from the start, see how Skimle approaches thematic analysis. The approach is designed so that the AI's contribution is auditable at every stage and the human researcher's interpretive choices are preserved in the analytical record.


Ready to run your qualitative analysis through a process that generates an auditable analytical record by design? Try Skimle for free and see how structured AI-assisted analysis handles the documentation requirements.



About the author

Olli Salo is a former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets and themselves, develop winning strategies, and improve their operating models. He has conducted over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond.

