Building personas from real research data with AI: a practical guide

How to generate user or customer personas from real interview and survey data with AI, the persona types that exist, and how to avoid baking in bias from skewed input.


A persona built from real interview and survey data is only as good as the synthesis behind it: it needs to group respondents who actually share patterns, attach real quotes to back up each trait, and show where the data disagrees rather than smoothing it into one tidy archetype. AI can do the grouping and quote-attribution quickly, but the persona is only trustworthy if you can trace every claim in it back to the data it came from.

One clarification before anything else, because the terminology gets confused constantly: this article is about generating persona documents from real human research data you have already collected (interviews, surveys, support tickets). It is not about synthetic respondents, meaning AI-generated fictional people used to simulate survey answers before you collect any real data. That is a different practice with different risks, and conflating the two leads to bad decisions about when each one is appropriate.

What a persona actually is, and the types that exist

A persona is a composite profile representing a group of real users or customers who share patterns in needs, behaviours, or motivations, built so that a team can make decisions ("would this feature help Maria?") without re-reading every transcript every time. The output is usually a short document: a name, a few defining traits, goals, frustrations, and representative quotes.

Personas are not all the same kind of artefact. A few common types worth distinguishing:

  • Demographic or role-based personas, organised around who someone is (job title, age, industry)
  • Behavioural personas, organised around what someone actually does (how they use a product, how often, in what context), which tend to predict product decisions better than demographic ones because behaviour is closer to the thing you are actually designing for
  • Goal-directed personas, organised around what someone is trying to achieve, which map well onto Jobs-to-be-Done research
  • Proto-personas, quick, low-rigour sketches built from assumptions and a small amount of data, used early when no real research exists yet

Most teams default to the demographic version because it is the easiest to imagine, but it is frequently the weakest predictor of how someone will behave. A behavioural or goal-directed persona, grounded in what the people in your data did and wanted, holds up better once the persona is used to make a decision.

Why most personas decay into wall decorations

The well-known failure mode for personas is not that they get built badly. It is that they get built once, printed, pinned to a wall, and never revisited, while the underlying data quietly goes stale or turns out to have been thinner than the polished one-pager suggested. Two structural problems usually cause this:

The synthesis step is slow, so it only happens once. Properly building a persona means reading through dozens of transcripts, grouping similar respondents, and pulling representative quotes for each trait. Done by hand, this takes long enough that most teams do it for a big kickoff project and then never again, even as new research accumulates that should update the picture.

The traceability gets lost in the polish. By the time a persona is a clean one-pager with a stock photo and a catchy name, the link back to which respondents and which quotes actually support each trait is usually gone. This is fine until someone in a meeting asks "wait, do we know that's actually true?" and nobody can answer quickly.

Building personas from your own qualitative data with AI

The practical fix for both problems is the same: keep the synthesis fast enough to redo regularly, and keep every trait linked back to its source.

1. Bring your existing qualitative data into one place. Interview transcripts, open-ended survey responses, support tickets, anything with respondents talking in their own words. Skimle accepts mixed formats and sources in the same project, so you do not need everything in one clean spreadsheet first.

2. Run the thematic analysis before you think about personas at all. Skimle's automatic thematic analysis builds a category structure from what respondents actually said: their goals, frustrations, behaviours, and the language they use. This is the foundation a persona should be built on, not a separate creative exercise that happens after the "real" research is filed away.

3. Use metadata to find natural groupings. If your data includes structured attributes (role, plan tier, usage frequency, tenure), filtering the theme structure by those variables shows you where respondents with different attributes actually diverge in what they said, which is a far more defensible basis for a persona split than guessing at segments in advance. A persona boundary that is invisible in the actual theme data is not a real persona boundary.

4. Draft each persona from the categories that hold together, not from a template. Look for clusters of themes that consistently co-occur in the same respondents. A persona is well-formed when its defining traits are themes that actually travelled together in the data, not separate findings stitched onto one invented character because the slide needed a face and a name.

5. Attach the actual quotes, every time. For each trait you put in a persona, link the specific quote it came from. This is what two-way transparency is for: if someone questions whether "frustration with onboarding" is real for this segment, you can show them the quotes in seconds rather than asking them to trust the document.

6. Re-run it when new data arrives. Because the analysis takes minutes rather than weeks, a persona built this way can be refreshed every time a new batch of interviews or survey responses comes in, rather than becoming a wall decoration from a project that wrapped up eighteen months ago.
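Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Skimle works internally: the `CodedQuote` record, the field names, and the sample data are all hypothetical, and the sketch assumes the thematic analysis has already tagged each quote with a theme.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CodedQuote:
    respondent: str   # hypothetical respondent ID
    segment: str      # hypothetical metadata attribute, e.g. plan tier
    theme: str        # theme assigned during thematic analysis
    text: str         # verbatim quote, kept for traceability


def theme_share_by_segment(quotes):
    """Step 3: share of respondents in each segment who raised each theme."""
    members = defaultdict(set)   # segment -> respondents in that segment
    raised = defaultdict(set)    # (segment, theme) -> respondents who raised it
    for q in quotes:
        members[q.segment].add(q.respondent)
        raised[(q.segment, q.theme)].add(q.respondent)
    return {key: len(who) / len(members[key[0]]) for key, who in raised.items()}


def theme_cooccurrence(quotes):
    """Step 4: number of respondents who raised both themes in each pair."""
    themes = defaultdict(set)    # respondent -> themes they raised
    for q in quotes:
        themes[q.respondent].add(q.theme)
    counts = defaultdict(int)
    for raised_themes in themes.values():
        for pair in combinations(sorted(raised_themes), 2):
            counts[pair] += 1
    return dict(counts)


def supporting_quotes(quotes, theme, segment):
    """Step 5: the quotes behind one trait for one segment."""
    return [q for q in quotes if q.theme == theme and q.segment == segment]


# Hypothetical toy data, far too small to support a real persona.
QUOTES = [
    CodedQuote("r1", "free", "onboarding friction", "I gave up on setup twice."),
    CodedQuote("r1", "free", "price sensitivity", "I can't justify paying yet."),
    CodedQuote("r2", "free", "onboarding friction", "Setup took me a whole afternoon."),
    CodedQuote("r2", "free", "price sensitivity", "The paid tier is out of reach."),
    CodedQuote("r3", "paid", "reporting needs", "I export everything to slides."),
    CodedQuote("r4", "paid", "reporting needs", "My boss wants a weekly summary."),
    CodedQuote("r4", "paid", "onboarding friction", "Inviting my team was confusing."),
]

shares = theme_share_by_segment(QUOTES)
pairs = theme_cooccurrence(QUOTES)
evidence = supporting_quotes(QUOTES, "onboarding friction", "paid")
```

In this toy data, "onboarding friction" and "price sensitivity" travel together among the free-tier respondents, which is the kind of cluster step 4 is looking for, and `evidence` holds the single quote behind the same trait for paid respondents. Keeping the verbatim `text` on every record is what makes step 5 cheap: the quote never gets separated from the trait it supports.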

Common mistakes that make personas useless

Naming the persona before the data is analysed. Once a persona has a name and a stock photo, it is much harder to revise, because the team has already started treating "Maria" as a real fact rather than a working hypothesis about a segment. Hold off on names and faces until the underlying theme clusters are stable.

Mixing unrelated traits into one persona because there are only three slides left. A persona deck with a fixed number of personas, decided in advance of the analysis, pressures whoever builds it to merge groups that do not actually belong together. Let the number of personas come out of how many distinct clusters the data actually supports, even if that number is inconveniently two or inconveniently seven.

Treating a persona as permanent. A persona describes a segment as it looked in the data available at the time. Markets shift, products change, and a persona built two product cycles ago describing customer goals that no longer match the current product is actively misleading, not just outdated.

Skipping the quote-checking step because the persona "feels right." A persona that confirms what the team already believed is the one most likely to get rubber-stamped without checking the underlying quotes, which is exactly backwards: a finding that conveniently confirms existing assumptions deserves more scrutiny, not less.

This is where product teams most often go wrong with personas: building them once at the start of a product's life and then making roadmap decisions against an increasingly outdated picture for years afterward.

The bias risk that AI persona generation actually has

AI does not remove the most important risk in persona-building; it just moves it. If the underlying data over-represents one kind of respondent (the customers who reply to surveys, the users who bother to give interviews, one market more than another), an AI-assisted synthesis will faithfully summarise that skew into a confident-sounding persona, without flagging that the input was unbalanced in the first place. A clean, well-written persona document can make a biased sample look more authoritative, not less.

The practical defence is checking representativeness in the data before trusting the synthesis: look at who is and is not in your dataset, and use metadata filtering to see whether a trait holds up across different segments or only describes the loudest group in the room. A closer look at how AI bias shows up specifically in qualitative analysis, and what to do about it, is worth reading before you ship a persona deck built on a sample you have not interrogated.
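One way to make the "who is and is not in your dataset" check concrete is to compare the make-up of your sample with the make-up of your customer base. The sketch below assumes you know roughly what share of customers falls into each segment (for example from a CRM export); the function name, the segment labels, the numbers, and the 10-point tolerance are all hypothetical.

```python
from collections import Counter


def representation_gaps(sample_segments, population_shares, tolerance=0.10):
    """Compare each segment's share of the sample with its share of the customer base.

    Returns {segment: sample_share - population_share} for every segment whose
    gap exceeds `tolerance` in either direction.
    """
    counts = Counter(sample_segments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    gaps = {}
    for segment, expected in population_shares.items():
        observed = counts.get(segment, 0) / total
        gap = observed - expected
        if abs(gap) > tolerance:
            gaps[segment] = round(gap, 2)
    return gaps


# Hypothetical sample: one entry per interviewed respondent.
sample = ["enterprise"] * 12 + ["smb"] * 6 + ["free"] * 2
# Hypothetical customer-base shares.
population = {"enterprise": 0.20, "smb": 0.30, "free": 0.50}

gaps = representation_gaps(sample, population)
```

Here enterprise customers make up 60% of the interviews but only 20% of the customer base, and free users are under-represented by the same margin. A persona drawn from this sample mostly describes enterprise customers, however general the finished document sounds.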

Frequently asked questions

How many interviews or responses do I need to build a reliable persona?

There is no fixed number, but a persona built from fewer than 15-20 respondents per segment is fragile, since one or two unusual respondents can dominate the picture. AI-assisted analysis makes it practical to work with larger samples, so a persona can rest on the 40-60 or more respondents that a confident segment claim really needs.
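The fragility is simple arithmetic: the share of respondents showing a trait moves by k/n when k respondents out of n change. A small sketch, with a hypothetical function name and k fixed at two unusual respondents:

```python
def share_swing(n_respondents, n_unusual=2):
    """Largest change in a trait's share if `n_unusual` respondents flip."""
    return n_unusual / n_respondents


# Swing in percentage points for a range of segment sizes.
swings = {n: round(share_swing(n) * 100, 1) for n in (10, 15, 20, 40, 60)}
```

With 15 respondents, two unusual people move a trait's share by about 13 percentage points; with 60, the same two people move it by about 3.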

Should I use behavioural or demographic personas?

Behavioural and goal-directed personas tend to predict product decisions better, because they are built on what people actually do and want rather than who they are on paper. Demographic traits are useful as descriptive colour, not as the primary basis for the persona split.

How is this different from using ChatGPT to write a persona?

A generic chat tool can write a plausible-sounding persona from a prompt, but it cannot show you which of your actual respondents support each trait, and it will happily invent specific-sounding details that are not in your data. The traceability gap is exactly the problem that makes a polished AI-written persona risky to present without verification.
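If a persona was drafted with a generic chat tool, one cheap verification is to check that every quote it cites appears verbatim in the transcript it is attributed to. The sketch below assumes a hypothetical structure in which each persona trait lists (respondent, quote) pairs and transcripts are plain strings keyed by respondent ID; real transcripts would need more forgiving matching than a case-insensitive substring test.

```python
def untraceable_claims(persona, transcripts):
    """Return (trait, respondent, quote) triples that cannot be traced to the data.

    A trait with no citations at all is reported as (trait, None, None).
    """
    problems = []
    for trait, citations in persona.items():
        if not citations:
            problems.append((trait, None, None))
        for respondent, quote in citations:
            source = transcripts.get(respondent, "")
            if quote.lower() not in source.lower():
                problems.append((trait, respondent, quote))
    return problems


# Hypothetical transcripts and persona draft.
transcripts = {
    "r1": "Honestly, I gave up on setup twice before it worked.",
    "r2": "We mostly use it for the weekly report.",
}
persona = {
    "Frustrated by onboarding": [("r1", "I gave up on setup twice")],
    "Reports weekly to management": [("r2", "my boss demands a dashboard")],
    "Price sensitive": [],
}

problems = untraceable_claims(persona, transcripts)
```

The first trait passes. The second is flagged because the quote is not in r2's transcript, and the third because nothing backs it at all, which are the two ways an invented detail typically slips into a polished persona.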

Can I update personas as new research comes in?

Yes, and you should. Because the synthesis step is fast rather than a multi-week project, re-running the analysis on an updated dataset and checking whether the persona structure still holds is realistic to do quarterly or even after every major research project, rather than once a year if at all.

What is the difference between a persona and a market segment?

A segment is usually defined by measurable variables (firm size, spend, region) that are decided in advance and applied to data afterward. A persona is built from the data itself: the goals, frustrations, and behaviours that actually co-occur in real respondents. The two can align, but a persona built only to match a pre-existing segment, rather than from what the data actually shows, is closer to a label than a research finding.


Ready to build personas you can actually defend? Try Skimle for free on your existing interview or survey data and see the categories a persona should be built from.

Want the research behind better personas? Read how to synthesise user research and Jobs-to-be-Done interview methodology.


About the authors

Henri Schildt is a Professor of Strategy at Aalto University School of Business and co-founder of Skimle. He has published over a dozen peer-reviewed articles using qualitative methods, including work in Academy of Management Journal, Organization Science, and Strategic Management Journal. Google Scholar profile

Olli Salo is a co-founder at Skimle and former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets and themselves, develop winning strategies, and improve their operating models. He has conducted over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile

