Descriptive coding is a first-cycle qualitative coding method where you apply a fixed set of short, summary labels to passages of data — describing what each passage is about rather than interpreting what it means. To do it: build a codebook of descriptive labels derived from your research questions, apply those labels consistently to every relevant passage across your transcripts, and use the coded structure as the foundation for more interpretive analysis. With Skimle's predefined categories feature, you define your codebook in advance, and the AI applies it across your full corpus simultaneously — then you review, adjust, and refine each assignment. The result is a consistently coded dataset in hours rather than weeks, with every code traceable back to the source passage.
What is descriptive coding?
Descriptive coding — sometimes called topic coding — is the practice of labelling passages of qualitative data with short phrases that capture their subject matter. Where in vivo coding uses the participant's own words as codes, and process coding uses gerunds to describe actions, descriptive coding uses the researcher's neutral summary of what is being talked about.
Johnny Saldaña, in The Coding Manual for Qualitative Researchers, describes descriptive coding as assigning labels that "summarise in a word or short phrase — most often as a noun — the basic topic of a passage." The codes are descriptive rather than interpretive: they tell you what the data is about, not what it means.
A descriptive code might look like:
- ONBOARDING EXPERIENCE
- MANAGER RELATIONSHIP
- PRICING CONCERN
- FEATURE REQUEST
- TECHNICAL ISSUE
- COMPETITOR MENTION
Each code is applied to passages wherever that topic appears. Once coded, you can sort by code to gather everything participants said about onboarding, or everything that mentioned competitors, and begin to see patterns across the corpus.
When to use descriptive coding
Descriptive coding is the right approach when you are working with a predefined set of topics you need to map across your data. It suits:
Research with a fixed question structure. If you ran interviews or surveys with a structured interview guide, your questions define a natural set of descriptive codes. Every interview covered the same topics — descriptive coding lets you gather responses by topic across all participants.
Deductive analysis. When your analytical framework is established before fieldwork begins — from a theoretical model, a policy question, or a prior literature review — descriptive coding operationalises that framework. You are applying a structure to the data, not building a structure from it.
Content analysis at scale. Descriptive coding is the backbone of systematic qualitative content analysis. When you need to document the frequency and distribution of topics across a large corpus (support tickets, consultation responses, open-text surveys), descriptive coding produces the structured counts that more interpretive methods cannot provide.
Mixed-methods research. Descriptive codes produce data that can be quantified — how many participants mentioned X, how often topic Y appeared in each cohort — which makes them useful for connecting qualitative and quantitative strands of a study.
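The quantification described above — how many participants mentioned a code, how often a code appeared per cohort — reduces to simple counting once the data is coded. A minimal sketch; the passage tuples below are hypothetical stand-ins for an exported coded dataset:

```python
from collections import Counter

# Hypothetical coded passages: (participant_id, cohort, code) tuples.
# In practice these would come from your exported coded dataset.
coded_passages = [
    ("p1", "SMB", "PRICING CONCERN"),
    ("p1", "SMB", "FEATURE REQUEST"),
    ("p2", "SMB", "PRICING CONCERN"),
    ("p3", "enterprise", "TECHNICAL ISSUE"),
    ("p3", "enterprise", "PRICING CONCERN"),
    ("p4", "enterprise", "FEATURE REQUEST"),
]

# How many distinct participants mentioned each code at least once
participants_per_code = {
    code: len({pid for pid, _, c in coded_passages if c == code})
    for code in {c for _, _, c in coded_passages}
}

# How often each code appeared in each cohort
frequency_by_cohort = Counter((cohort, code) for _, cohort, code in coded_passages)

print(participants_per_code["PRICING CONCERN"])         # 3
print(frequency_by_cohort[("SMB", "PRICING CONCERN")])  # 2
```

The participant-level count and the passage-level count answer different questions, which is why both are computed separately here.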
It is worth distinguishing descriptive coding from thematic analysis, which is inductive and interpretive. Thematic analysis builds themes from the data upward; descriptive coding applies predetermined labels downward. Many research projects use both: descriptive coding as a first pass to organise the data, thematic analysis to interpret what the patterns mean.
How to build a descriptive codebook
A codebook is the reference document that defines every code: its name, definition, inclusion and exclusion criteria, and example passages. Without a codebook, codes drift — the same label gets applied differently by different coders, or differently by the same coder on Tuesday versus Friday.
A good codebook entry for each code includes:
Code name: A short noun phrase, in capitals by convention. PRICING CONCERN, not "concern about pricing" or "they mentioned price."
Definition: One to two sentences on what this code covers. "Applied to any passage where the participant raises concerns, objections, or uncertainty about the product's price, pricing model, or value-for-money."
Inclusion criteria: What counts. "Includes direct price comparisons, mentions of budget constraints, questions about discounts or tiers."
Exclusion criteria: What does not count. "Does not include general satisfaction scores or passages where price is mentioned positively without concern."
Example passage: A brief quote from your data or a constructed example that a coder could use as a benchmark.
Building the codebook typically happens in two passes. The first pass uses your research questions and interview guide to define an initial set of codes — these are the topics you know you need to cover. The second pass runs after you have read a subset of your transcripts (typically the first five to eight), and it adds codes for topics that came up but were not anticipated. After this, the codebook is locked for consistent application across the rest of the corpus.
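The codebook entry described above maps naturally onto a small data structure. This is an illustrative sketch, not a prescribed format — the field names mirror the sections above, and the PRICING CONCERN entry reuses the example definitions:

```python
from dataclasses import dataclass, field

@dataclass
class CodebookEntry:
    """One code in a descriptive codebook, mirroring the fields above."""
    name: str                # short noun phrase, capitals by convention
    definition: str          # one to two sentences on what the code covers
    includes: list = field(default_factory=list)   # what counts
    excludes: list = field(default_factory=list)   # what does not count
    example: str = ""        # benchmark passage for coders

pricing_concern = CodebookEntry(
    name="PRICING CONCERN",
    definition=("Applied to any passage where the participant raises concerns, "
                "objections, or uncertainty about the product's price, pricing "
                "model, or value-for-money."),
    includes=["direct price comparisons", "budget constraints",
              "questions about discounts or tiers"],
    excludes=["general satisfaction scores",
              "price mentioned positively without concern"],
    example="We like the product, but the next tier is hard to justify.",
)

# Index the codebook by code name for lookup during coding
codebook = {entry.name: entry for entry in [pricing_concern]}
```

Keeping the codebook in a structured form like this makes it easy to version, share between coders, and later paste definitions into a tool's category setup.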
How to apply descriptive coding manually
Manual descriptive coding follows a standard workflow:
1. Segment the data. Work through each transcript and identify passages that represent a single topic or unit of meaning. Passages may be a sentence, a paragraph, or a longer exchange — the right unit depends on your data. Mark the boundaries.
2. Apply the code. For each passage, assign the code from your codebook that best describes its topic. Some passages will receive more than one code if they cover two topics simultaneously.
3. Check against the codebook. When you are uncertain, consult the definition and example passage in the codebook before assigning. Do not code from memory — the codebook is authoritative.
4. Flag uncoded passages. If a passage is topically relevant but does not fit any existing code, flag it for review. This is how you discover that your codebook needs a new category.
5. Calculate inter-rater reliability if needed. For academic research where rigour of coding needs to be demonstrated, a second coder applies the same codebook to a sample of the data. Agreement is measured (typically Cohen's kappa). Discrepancies are resolved through discussion, which often leads to clearer codebook definitions.
6. Compile by code. Once all transcripts are coded, gather every passage assigned to each code into a single view. This is where analysis begins: you can now read everything participants said about PRICING CONCERN in one place, sorted by participant, segment, or interview date.
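The Cohen's kappa check from step 5 is straightforward to compute once both coders have labelled the same sample. A minimal sketch, with hypothetical labels from two coders on the same ten passages:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same passages."""
    n = len(coder_a)
    # Observed agreement: proportion of passages where the coders agree
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

coder_a = ["PRICING", "PRICING", "ONBOARDING", "ONBOARDING", "PRICING",
           "ONBOARDING", "PRICING", "ONBOARDING", "PRICING", "PRICING"]
coder_b = ["PRICING", "PRICING", "ONBOARDING", "PRICING", "PRICING",
           "ONBOARDING", "ONBOARDING", "ONBOARDING", "PRICING", "PRICING"]

print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.583
```

Here the coders agree on 8 of 10 passages (0.8 observed agreement), but because both use PRICING often, chance agreement is 0.52, so kappa lands well below the raw agreement rate — which is exactly the correction the statistic exists to make.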
The bottleneck is time. A corpus of 30 one-hour interviews — roughly 300,000 words — takes most researchers four to six weeks of intensive reading and coding at this level of care. The codebook helps with consistency, but it does not reduce the volume of reading.
How to do descriptive coding at scale with Skimle
Skimle's predefined categories feature is built for exactly this workflow: you define your codebook, and the AI applies it across your full corpus simultaneously.
Step 1: Upload your documents. Import your transcripts, open-text responses, or other qualitative documents into a Skimle project. The platform accepts PDFs, Word documents, and plain text files. If you recorded your interviews, AI transcription can convert audio directly.
Step 2: Define your categories. In the predefined categories mode, enter each code from your codebook along with its definition. The more precise your definition, the more accurately the AI will apply it — this mirrors the codebook logic for human coders. You can enter as many categories as your framework requires.
Step 3: Run the analysis. The AI reads every document and assigns your predefined codes to passages that match each definition. It works across the entire corpus simultaneously, applying the same codebook consistently regardless of how many documents you have.
Step 4: Review the assignments. This is the critical step. Skimle shows you every passage the AI assigned to each category, with the full source context visible. You can:
- Confirm assignments that are correct
- Move a passage to a different category if the AI miscoded it
- Remove a passage that was incorrectly included
- Add passages the AI missed by browsing the document directly
The review step is where your judgement operates. The AI applies the codebook mechanically and consistently; you apply the interpretive knowledge that only the researcher has. For passages near the boundary of a code's definition, your decision during review effectively refines how the codebook is being applied.
Step 5: Identify uncoded material. Skimle shows you what proportion of each document was assigned to your predefined categories. Passages that received no code are visible for review — this is where you discover whether your codebook has gaps that need a new category.
Step 6: Analyse by code and segment. Once the coding is complete, you can sort all passages by code, filter by document metadata (participant segment, interview wave, cohort), and export the full coded dataset. The metadata analysis feature lets you compare code frequencies across segments — for example, how often PRICING CONCERN appears in SMB versus enterprise interviews.
For a corpus of 30 interviews, this workflow typically takes a few hours rather than weeks — with the review step ensuring that the speed does not come at the cost of accuracy.
Reviewing and refining AI-applied codes
The review step deserves more attention because it is where descriptive coding with AI differs most from descriptive coding without it. In manual coding, the coder makes every assignment decision. In AI-assisted coding, the coder makes review decisions — confirming, correcting, or overriding the AI's assignments.
This is a genuine methodological difference, and it matters for how you write about the coding process in a methods section. The key principle is that every assignment in the final dataset has been reviewed by a human researcher. The AI is not an autonomous coder; it is a first-pass reader that applies the codebook to a scale no human could match alone.
A few practices that improve review quality:
Sample-check first. Before reviewing the full corpus, take a random sample of 20-30 passages from each code and read them carefully. This quickly reveals whether the AI understood your definition correctly or whether the definition needs sharpening.
Review borderline cases closely. Skimle flags lower-confidence assignments. These are the passages that fall near the edge of a code's definition — exactly where a human coder would consult the codebook most carefully. Spend disproportionate time here.
Document your decisions. When you move or remove a passage during review, note why. This documentation forms part of your audit trail and is useful when writing the methods section or responding to peer reviewer questions.
Update the definition if needed. If you find yourself consistently correcting the AI in one direction — say, it keeps including passages that you keep removing — the issue is usually that the definition is ambiguous. Refine it and note the version change.
This review process is methodologically analogous to inter-rater reliability checking, but faster. You are not starting from scratch to verify the coding; you are reviewing an already-coded dataset and correcting errors. For large corpora, this is a significant efficiency gain even after accounting for review time.
Moving from descriptive codes to thematic analysis
Descriptive coding organises your data by topic. Thematic analysis interprets what those topics mean across participants. Most research projects need both.
Once your descriptive coding is complete, the coded passages for each topic become the input for the next analytical stage. For each code:
Look for patterns within the code. Gather everything coded as ONBOARDING EXPERIENCE and read it as a set. What is consistent across participants? What varies? Are there sub-types — onboarding that went well, onboarding that created lasting problems, onboarding that participants had workarounds for?
Compare across segments. If your data includes metadata variables (participant role, company size, tenure), compare code frequencies and content across those dimensions. PRICING CONCERN might mean very different things for individual users versus procurement teams.
Identify relationships between codes. Some topics co-occur consistently — every time TECHNICAL ISSUE appears, MANAGER RELATIONSHIP appears nearby. These co-occurrence patterns are often where the most interesting analytical claims emerge.
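Co-occurrence patterns like the one above are easy to tabulate once each passage carries its set of assigned codes. A sketch with hypothetical passages:

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded passages: each passage is the set of codes assigned to it
passages = [
    {"TECHNICAL ISSUE", "MANAGER RELATIONSHIP"},
    {"TECHNICAL ISSUE", "MANAGER RELATIONSHIP", "PRICING CONCERN"},
    {"PRICING CONCERN"},
    {"TECHNICAL ISSUE", "MANAGER RELATIONSHIP"},
    {"ONBOARDING EXPERIENCE", "PRICING CONCERN"},
]

# Count every unordered pair of codes appearing in the same passage
co_occurrence = Counter()
for codes in passages:
    for pair in combinations(sorted(codes), 2):
        co_occurrence[pair] += 1

print(co_occurrence[("MANAGER RELATIONSHIP", "TECHNICAL ISSUE")])  # 3
```

Sorting each passage's codes before pairing ensures that (A, B) and (B, A) are counted as the same pair. The high-count pairs are the candidates worth reading closely.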
Move toward themes. Descriptive codes answer "what are they talking about?" Themes answer "what does this mean?" The leap from code to theme is the interpretive work that the researcher — not the AI — must do. If you are working in the reflexive thematic analysis tradition, this is the stage where your theoretical position and analytical perspective shape the themes you name.
Skimle's automatic thematic analysis can run over the same corpus after descriptive coding to surface interpretive themes from the coded material — giving you both levels of analysis from the same project. See how to analyse interview transcripts for the full analytical workflow from raw transcript to final themes.
Frequently asked questions
What is the difference between descriptive coding and thematic analysis?
Descriptive coding is deductive: you apply a predetermined set of labels to describe what each passage is about. Thematic analysis is typically inductive: you build themes upward from the data, looking for patterns of meaning rather than applying predefined categories. Descriptive coding produces organised, searchable data; thematic analysis produces interpretive findings. Most research projects use descriptive coding as a first pass and thematic analysis as the interpretive stage. See thematic analysis: a complete guide for the full method.
How many codes should a descriptive codebook have?
Enough to cover the topics your research needs to address, no more. A codebook with 8-15 codes is manageable and produces results you can actually use in analysis. A codebook with 60 codes produces a dataset so fragmented that patterns are hard to see. If your research requires many codes, consider a hierarchical structure: 8-10 parent codes, each with 3-5 child codes.
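The hierarchical structure suggested above can be kept as a simple parent-to-children mapping. The code names here are illustrative, not a recommended set:

```python
# A hypothetical two-level codebook: parent codes map to child codes
codebook = {
    "PRICING": ["PRICING CONCERN", "DISCOUNT REQUEST", "TIER COMPARISON"],
    "PRODUCT": ["FEATURE REQUEST", "TECHNICAL ISSUE", "ONBOARDING EXPERIENCE"],
    "MARKET": ["COMPETITOR MENTION"],
}

# Flatten for application; keep the parent for roll-up counts later
flat = {child: parent
        for parent, children in codebook.items()
        for child in children}

print(flat["TECHNICAL ISSUE"])  # PRODUCT
```

Coding is done at the child level; the parent lookup lets you roll counts up to 8-10 high-level topics when the fragmented child-level view obscures the pattern.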
Can I add new codes after analysis has started?
Yes, but with care. Adding a new code mid-analysis means you need to go back and check whether earlier documents contain passages that should have received the new code. With Skimle, you can add a new category, rerun the analysis for that category only, and review the new assignments across the full corpus — which is significantly faster than going back through transcripts manually.
How is this different from content analysis?
Descriptive coding and content analysis overlap significantly. Content analysis is the broader methodology; descriptive coding is one of the coding approaches used within it. Quantitative content analysis counts the frequency of codes. Qualitative content analysis uses coding to organise data for interpretation. Descriptive coding sits at the intersection: it produces a structured, countable dataset while preserving access to the full qualitative context of each passage.
How do I report descriptive coding in a methods section?
Describe the codebook development process (how codes were derived, from what sources), the application process (how codes were applied to the data), any reliability procedures (inter-rater checks, review process), and any changes made to the codebook during analysis. If you used AI-assisted coding, describe the tool, the review process, and the researcher's role in confirming assignments. Many journal editors now accept AI-assisted coding as methodologically sound, provided the review process is clearly documented.
What qualitative research software supports predefined codebooks?
NVivo, MAXQDA, and ATLAS.ti all support manual coding with a predefined codebook. None of them apply the codebook automatically across the corpus — the researcher applies every code manually. Skimle's predefined categories feature applies your codebook across the full corpus using AI and then surfaces the assignments for researcher review — which is a different workflow, not a different method.
Ready to apply your codebook across your full corpus without spending weeks on manual coding? Try Skimle for free — define your descriptive codes, run predefined category analysis, and review every assignment with full source traceability.
Related reading:
- How to analyse interview transcripts: 5 steps from raw data to insights
- Thematic analysis: a complete guide
- How to write the perfect interview guide
About the authors
Henri Schildt is a Professor of Strategy at Aalto University School of Business and co-founder of Skimle. He has published over a dozen peer-reviewed articles using qualitative methods, including work in Academy of Management Journal, Organization Science, and Strategic Management Journal. His research focuses on organisational strategy, innovation, and qualitative methodology. Google Scholar profile
Olli Salo is a former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets and themselves, develop winning strategies, and improve their operating models. He has conducted over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile
