You have just closed a survey with 847 respondents. The multiple-choice questions are already in neat charts. But then you scroll to the final open-ended question: "What else would you like us to know?"
There are 312 responses. Each one is different. Some are single words ("Nothing"), others are paragraph-long essays about specific frustrations or feature requests. One person has written about pricing, another about customer service, a third about a bug in the mobile app, and a fourth about how much they love the product but wish it integrated with their CRM system.
You have two choices: spend days reading and categorising every response manually, or skim the surface and miss most of what your respondents actually said. There has to be a better way.
The challenge: hundreds of responses, dozens of different topics
Open text survey responses are uniquely difficult to analyse because respondents do not follow your agenda. Unlike interview data where you can guide the conversation, or focus group transcripts where themes emerge through discussion, survey responses are scattered observations across dozens of different topics.
A typical dataset might include:
- 200 responses about product features (but mentioning 15 different features)
- 50 responses about pricing (ranging from "too expensive" to "great value")
- 30 responses about customer support experiences
- 20 responses about the onboarding process
- 15 responses mixing multiple topics in the same answer
- Plus assorted comments about everything from UI design to delivery times to competitor comparisons
The fundamental challenge is that every response discusses something slightly different. You cannot just read through them linearly and expect coherent insights to emerge. You need structure.
What most people do (and why it fails)
When faced with hundreds of open text responses, most teams default to one of three approaches:
Approach 1: Word clouds and frequency counts
Tools like SurveyMonkey or Google Forms will generate a word cloud showing that "price," "quality," and "support" appear frequently. This tells you almost nothing useful. Yes, people mentioned those words. But did they say the price was too high or surprisingly reasonable? Was the support helpful or frustrating? Word clouds strip away all context and nuance, leaving you with pretty visualisations that cannot inform decisions.
Approach 2: Cherry-picking interesting quotes
Someone reads through the responses and pulls out 5-10 quotes that seem representative or particularly compelling. These get dropped into a presentation deck with headers like "Customer Feedback" or "What Users Are Saying." The problem is that cherry-picking is inherently biased: you notice the quotes that confirm what you already believed or that are emotionally striking, and you miss the 99% of responses that contain more subtle but equally important patterns.
Approach 3: Manual categorisation without proper methodology
A well-meaning analyst creates categories on the fly ("positive feedback," "feature requests," "complaints") and starts sorting responses. But without systematic methodology, the categories become either too broad to be useful ("27 responses about features" tells you nothing) or too specific to capture meaningful patterns (50 tiny categories with 1-2 responses each).
These approaches fail because they lack the fundamental structure that makes qualitative analysis rigorous and reliable.
The proper way: systematic categorisation and analysis
Professional qualitative researchers follow a clear process when analysing open text responses at scale:
Step 1: Create a robust category framework
Read through a sample of responses (maybe 50-100) to understand the range of topics being discussed. Create categories that are:
- Mutually exclusive: Categories should not overlap, so it is always clear which one applies to any given point (a single response can still touch several distinct categories)
- Meaningful: Categories should represent actionable insights, not superficial groupings
- Appropriately granular: Not so broad that they lose meaning, not so specific that patterns disappear
For example, instead of a single "product feedback" category, you might create:
- Feature requests: Mobile app functionality
- Feature requests: Reporting and analytics
- Feature requests: Integration capabilities
- Product usability issues
- Performance and reliability concerns
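Whether you track the framework in a spreadsheet or a script, it helps to pin each category to a one-line definition so different coders apply it the same way. Here is a minimal sketch in Python; the category names and definitions are illustrative, not a prescribed frame:

```python
# An illustrative coding frame: each category carries a short definition
# so that different coders (human or AI-assisted) apply it consistently.
CODING_FRAME = {
    "feature_request_mobile": "Requests for new or improved mobile app functionality",
    "feature_request_reporting": "Requests for better reporting and analytics",
    "feature_request_integrations": "Requests to connect with external systems such as a CRM",
    "usability": "Difficulty using existing features as intended",
    "performance_reliability": "Speed, stability, bugs and downtime concerns",
}
```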
Step 2: Code every response into categories
Go through each response systematically and assign it to the appropriate category. Some responses will mention multiple topics and need to be coded into several categories. This is time-consuming but essential. You cannot understand what your respondents are actually saying if you only analyse a sample.
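If you keep the coded data in a structured form, a multi-topic response is simply one that carries more than one code, and counting becomes trivial. A small sketch continuing the illustrative frame above (the response texts and IDs are invented examples):

```python
# Each response keeps its row ID so every code can be traced back to the
# original text. Multi-topic answers carry more than one code.
coded_responses = [
    {"id": 17, "text": "Love the product, but it really needs a CRM integration.",
     "codes": ["feature_request_integrations"]},
    {"id": 42, "text": "The mobile app keeps crashing when I open reports.",
     "codes": ["feature_request_mobile", "performance_reliability"]},
]

def count_by_code(responses):
    """Tally how many responses carry each code; multi-coded answers count once per code."""
    counts = {}
    for r in responses:
        for code in r["codes"]:
            counts[code] = counts.get(code, 0) + 1
    return counts

print(count_by_code(coded_responses))
```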
Step 3: Summarise each category
Once responses are grouped, write a summary for each category that captures:
- The main point or pattern across these responses
- The range of opinions or experiences expressed
- Representative quotes that illustrate the theme
- How many responses fell into this category
For example: "Customer support responsiveness (42 responses): Most respondents praised the speed of initial responses but noted frustration when issues required multiple interactions. Common pattern: fast first reply, slow resolution. 'Your support team replies within hours, but then it takes days to actually fix the problem' - typical of this group."
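Capturing each summary as structured data rather than free-floating prose keeps reporting consistent and makes the counts easy to check. A sketch of what one such record might look like, reusing the hypothetical example above:

```python
# One summary record per category; the count and quotes come straight from
# the coded responses, so every claim can be traced back to the data.
category_summary = {
    "category": "Customer support responsiveness",
    "n_responses": 42,
    "main_pattern": "Fast first reply, slow resolution once issues need several interactions.",
    "range_of_views": "Praise for initial speed; frustration with follow-up and fixes.",
    "representative_quotes": [
        "Your support team replies within hours, but then it takes days to actually fix the problem",
    ],
}
```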
This systematic approach gives you defensible insights backed by the actual data.
The time problem: doing it properly takes days
Here is where theory meets reality. Following proper qualitative methodology for 312 open text responses can easily take 15-20 hours of focused work:
- Creating a robust category framework: 2-3 hours
- Coding all responses: 8-10 hours (about 2 minutes per response when done carefully)
- Writing category summaries: 3-4 hours
- Reviewing for consistency and refining categories: 2-3 hours
That is multiple days of work for a single survey question. If you have multiple open-ended questions or collect feedback regularly, the time requirement becomes prohibitive. Teams either skip the analysis entirely or rush through it so quickly that the categories become meaningless and the summaries superficial.
Fast but sloppy analysis is arguably worse than no analysis. If your categories are poorly defined or your summaries miss key points, you are making decisions based on misleading interpretations of what respondents said. That is more dangerous than admitting you have not looked at the data yet.
Why basic AI tools do not solve this problem
When ChatGPT and similar tools became widely available, many people assumed they had found the solution. Just paste your responses into a chat interface, ask it to "analyse and summarise," and get instant insights. The reality is more complicated.
Basic AI chatbots fail in several critical ways when dealing with real-world open text response datasets:
Problem 1: They forget or hallucinate responses
Ask a chatbot to analyse 312 responses and it will often skip some entirely, especially responses that are longer or touch on multiple topics. Worse, it sometimes invents quotes that sound plausible but no respondent actually said. When you are trying to understand what customers genuinely think, you cannot afford tools that make up data. As we have explored in detail in our guide on whether ChatGPT can analyse qualitative data, basic prompting approaches cannot maintain accuracy across large datasets.
Problem 2: Categories lack coherence and robustness
Chatbots will generate categories, but they often lack internal consistency. Run the same analysis twice and you get different categories. Or the categories overlap confusingly ("product improvements" versus "feature requests" versus "enhancement suggestions"). Professional qualitative analysis requires stable, well-defined categories that different people would apply consistently. Generic AI tools do not provide this.
Problem 3: No transparency or verifiability
When a chatbot tells you "35% of responses were about pricing concerns," you have no way to verify this. Which specific responses were counted? What exact criteria defined "pricing concerns"? How did it handle responses that mentioned pricing alongside other topics? Without traceability from summary back to source data, you are taking the AI's word for it, with no ability to audit or verify.
Problem 4: Shallow summaries that miss nuance
Chatbots excel at producing plausible-sounding text but struggle with capturing the actual nuance in what people said. A summary like "Most users want better mobile functionality" sounds reasonable but misses that 15 users want specific calendar integration, 8 want offline mode, and 12 are frustrated that the mobile app does not have feature parity with desktop. The general summary obscures the actionable specificity.
Basic AI tools give you the illusion of analysis without the rigour. For critical business decisions based on customer feedback, that is an unacceptable trade-off. The challenge is finding tools that combine AI efficiency with methodological soundness.
How to analyse open text responses properly with AI
Analysing open text survey responses at scale requires tools designed specifically for this task, using proper qualitative analysis methodology rather than generic text generation.
Import your responses
Export your survey data as a CSV file with one column containing all the open text responses. Import this into a qualitative analysis platform that can handle structured categorisation.
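If you want to sanity-check the export before importing it, a few lines of Python are enough; the file name and column name below are assumptions about your export, not a required format:

```python
import pandas as pd

# Load the survey export and keep only non-empty open text answers.
df = pd.read_csv("survey_export.csv")           # assumed file name
responses = df["open_text_response"].dropna()   # assumed column name
print(f"{len(responses)} non-empty open text responses ready to import")
```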
Let AI create an intelligent category framework
Rather than manually reading through samples to identify themes, AI can rapidly identify the main topics being discussed across all responses. The key difference from chatbots is that proper tools create a stable category framework based on the actual data structure, not just pattern matching that changes each time you run it.
The categories should automatically:
- Capture the major themes respondents actually discussed (not predetermined categories you expected)
- Be granular enough to be actionable (not generic buckets like "positive" and "negative")
- Handle responses that touch on multiple topics by coding them into several categories
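Conceptually, the difference from ad-hoc chatbot prompting is that the framework is proposed once, against the full dataset, and then held fixed while responses are coded. The sketch below is not how any particular product works internally; `call_llm` stands in for whichever model API you use:

```python
def propose_coding_frame(responses, call_llm):
    """Ask a language model to propose a coding frame grounded in every response.

    `call_llm` is a placeholder for your model API of choice; the important
    part is that the prompt covers the full dataset, not a convenient sample,
    and that the resulting categories are frozen before coding begins.
    """
    prompt = (
        "Read every survey response below and propose 8-15 categories.\n"
        "Give each category a name and a one-sentence definition.\n"
        "Categories must be specific enough to act on, and a response may "
        "belong to more than one category.\n\n"
        + "\n".join(f"- {r}" for r in responses)
    )
    return call_llm(prompt)
```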
Review and refine the categories
AI-generated categories are a starting point, not a final answer. You should be able to:
- Rename categories to be more specific or clearer
- Merge categories that overlap
- Split categories that are too broad
- See which responses were assigned to each category and verify the logic
This is where quality becomes the differentiator. Tools that allow this level of refinement and control enable you to maintain analytical rigour while benefiting from AI speed.
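Whatever tool you use, these refinements are simple, auditable operations on the coded data. A sketch of rename and merge as plain functions (hypothetical helpers, continuing the earlier data structure):

```python
def rename_category(responses, old, new):
    """Rename a category everywhere it appears in the coded responses."""
    for r in responses:
        r["codes"] = [new if c == old else c for c in r["codes"]]

def merge_categories(responses, sources, target):
    """Fold several overlapping categories into one, without double-counting."""
    for r in responses:
        merged = [target if c in sources else c for c in r["codes"]]
        r["codes"] = list(dict.fromkeys(merged))  # drop duplicates, keep order
```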
Get summaries with full transparency
Each category should produce a summary that shows:
- How many responses fell into this category
- The key patterns or messages across these responses
- Representative quotes illustrating the theme
- Complete traceability so you can click through to see every response that was categorised this way
This transparency means you can verify that the AI correctly understood the responses and that the summaries accurately reflect what people said. You are not blindly trusting a black box.
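One practical way to think about this transparency: every summary should carry the IDs of the responses it is based on, so counts are derived rather than asserted and every quote traces to a counted response. An illustrative sketch with invented data (the field names are not any product's format):

```python
# An auditable summary record: the count is derived from the ID list, and
# each quote must point to a response that was actually counted.
auditable_summary = {
    "category": "Pricing concerns",
    "response_ids": [3, 9, 14, 27, 31],
    "summary": "Base pricing seen as fair; recurring objection to per-seat add-ons.",
    "quotes": [
        {"id": 27, "text": "The base price is fair, but the add-ons pile up quickly."},
    ],
}

n_responses = len(auditable_summary["response_ids"])
assert all(q["id"] in auditable_summary["response_ids"]
           for q in auditable_summary["quotes"])
```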
Try it yourself
If you have open text survey responses sitting in a spreadsheet waiting to be analysed, try Skimle for free. Import your CSV file, let the AI create an intelligent category structure, review and refine as needed, and generate summaries that maintain full transparency back to the source data.
Proper analysis of open text responses does not have to take days or sacrifice rigour for speed. With the right tools, you can turn scattered feedback into structured insights that actually inform decisions.
Want to learn more about qualitative analysis methods? Read our guides on thematic analysis methodology and choosing the right qualitative analysis tools.
About the Authors
Henri Schildt is a Professor of Strategy at Aalto University School of Business and co-founder of Skimle. He has published over a dozen peer-reviewed articles using qualitative methods, including work in Academy of Management Journal, Organization Science, and Strategic Management Journal. His research focuses on organizational strategy, innovation, and qualitative methodology. Google Scholar profile
Olli Salo is a former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets and themselves, develop winning strategies, and improve their operating models. He has conducted over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile
