How to anonymise interview transcripts when conducting sensitive business interviews

You have 20 interview transcripts from conversations with senior executives about a sensitive corporate transformation programme. You promised confidentiality, and the participants really opened up to you.

But the transcripts now need to go to a team of four junior analysts, quotes need to be sourced for executive presentations, and the client wants the key parts of the research shared across business units. You cannot risk sharing the raw files, and you cannot spend a week on manual find-and-replace of all sensitive data either. With Skimle Anonymise, you can de-identify the full set in under an hour, with AI detection across six identifier categories and full user control.

The business case for getting this right

Most research teams know they should anonymise. The discipline often slips in practice because the manual approach is slow and the consequences of doing it imperfectly are not immediately visible.

But the exposure is real. Interview transcripts from a corporate transformation contain full names, job titles, references to specific business units, dates of key events, and details that make individuals identifiable even without a name: "the only female engineer on the production floor", "the CFO who joined from the acquisition", "the manager based in the Tampere office". Uploading these files to a shared drive, pasting quotes into a PowerPoint, or feeding them into an AI tool without proper de-identification is a breach of the promise made to participants. Under GDPR, it may also be a breach of the regulation, which treats interview data about identifiable individuals as personal data.

For consultants working on sensitive mandates — restructurings, due diligence, leadership assessments — the reputational and client-relationship consequences of a de-identification failure can be severe. For market research teams building longitudinal panels or tracking studies, getting anonymisation right once and doing it consistently matters as much as any other methodological decision.

You want to stop sensitive data at the source, before it spreads across your project. The workflow below is designed for practitioners who need to do this properly without it becoming the longest step in the project.

A quick note on terminology: Anonymisation, pseudonymisation, and redaction are often used interchangeably but describe different levels of protection. In practice, most corporate research requires pseudonymisation: identifiers are replaced with codes or labels, a re-identification key is retained internally, and the pseudonymised transcripts are what gets shared. True anonymisation — where re-identification is not reasonably possible and the key is destroyed — is a higher bar and is typically reserved for the most sensitive academic or clinical contexts. Redaction removes text entirely, which tends to produce transcripts that lose analytical value. For a detailed treatment of the differences and how they map to GDPR and HIPAA, see our guide to IRB-compliant anonymisation for qualitative research. The workflow below uses pseudonymisation throughout, which is the right approach for most corporate research.

The full workflow, step by step

Step 1: Record and transcribe

You cannot anonymise what you do not have in text form. The first step is converting your audio or video recordings into transcripts. Automated transcription tools — we cover the full setup in our guide to practical interview setup with audio recording and automated transcribing — can produce a first draft in minutes, but they will include names, places, and other personal data just as spoken. Treat these raw transcripts as sensitive data from the moment they are created: store them securely, limit access, and do not share them until they have been de-identified.

One useful practice is to note during the interview itself which participants are likely to be highly identifiable — for instance, the only person in a particular role, or someone who described a unique personal experience. You will want to pay extra attention to these transcripts during the review stage.

Step 2: Upload transcripts to Skimle Anonymise

Once you have your raw transcripts, upload them to Skimle Anonymise. You can process the full set in one batch — in our example, all 20 interviews from the transformation programme. Skimle analyses each transcript and detects identifiers across six categories:

Names — personal names of participants, colleagues, and third parties mentioned in conversation
Titles and roles — job titles, seniority levels, and functional designations
Locations — cities, countries, office names, building references
Organisations — company names, division names, subsidiary references
Dates — specific dates, event timing references, and tenure durations that could narrow identification
Other — demographic details, physical descriptions, and other context-dependent identifiers

The AI flags instances across all six categories and presents them for review. It does not make decisions automatically: you stay in control throughout.
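If it helps to picture what a flag looks like while it waits for your review, here is a minimal sketch in Python. It is not Skimle's data model: the `Category`, `Flag`, and `pending` names, the file name, and the decision labels are all assumptions made for illustration. The only thing taken from the text above is the list of six categories and the idea that nothing changes until the researcher decides.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # The six categories listed above; the enum itself is illustrative.
    NAME = "Names"
    TITLE_ROLE = "Titles and roles"
    LOCATION = "Locations"
    ORGANISATION = "Organisations"
    DATE = "Dates"
    OTHER = "Other"


@dataclass
class Flag:
    transcript: str              # file the flag was found in (hypothetical name)
    text: str                    # the flagged span as it appears in the transcript
    category: Category
    decision: str = "pending"    # assumed labels: "pending", "replace", "keep"


def pending(flags):
    """Return the flags the researcher has not yet reviewed."""
    return [f for f in flags if f.decision == "pending"]


flags = [
    Flag("interview_01.docx", "David Chen", Category.NAME),
    Flag("interview_01.docx", "CHRO", Category.TITLE_ROLE),
    Flag("interview_01.docx", "Tampere", Category.LOCATION),
]
flags[0].decision = "replace"    # the researcher accepts the first flag

print([f.text for f in pending(flags)])
```

Keeping the decision on the flag itself, rather than editing the transcript directly, is what makes the later review and audit steps possible.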

Step 3: Review the detected identifiers

This is the step where human judgement matters most. AI is excellent at spotting the obvious cases — a name, a job title, a city — but qualitative transcripts are full of context that only a researcher familiar with the sample can properly interpret.

Take a passage like: "As the CHRO, I find this transformation difficult personally." The role ("CHRO") is an identifier if there is only one CHRO in the organisation, or if participants are drawn from a single company. Skimle flags it; you decide whether to pseudonymise it (replacing "CHRO" with "senior HR leader" or "Participant 01's role") or leave it, depending on the level of protection required.

Or consider: "As the only female on the team, I sometimes feel my perspective is dismissed." There is no name here, no title. But in any team of known composition, the demographic detail alone is enough to single out the speaker. Skimle's sixth category, "Other", exists precisely to catch these cases. Researchers reviewing the flag can decide whether to replace the phrase with something like "as someone from an underrepresented group" or to remove it entirely if it is not analytically essential.

This review stage is the moment to use your knowledge of the sample. If you interviewed 20 people and three of them mentioned the same internal project by name, the project name may be innocuous. If only one person could plausibly know about it, it is an identifier.
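The rule of thumb above can be roughed out as a count of how many transcripts mention a term. The sketch below is a hypothetical helper, not a Skimle feature, and the project names and transcript snippets are invented. Mention frequency is only a proxy for "who could plausibly know about it", so treat the result as a list of things to look at, not a verdict.

```python
def mention_counts(transcripts, terms):
    """Count how many distinct transcripts mention each term (case-insensitive)."""
    counts = {}
    for term in terms:
        needle = term.lower()
        counts[term] = sum(1 for text in transcripts.values() if needle in text.lower())
    return counts


def likely_identifiers(transcripts, terms, threshold=1):
    """Terms mentioned in `threshold` or fewer transcripts deserve a closer look."""
    counts = mention_counts(transcripts, terms)
    return [term for term, n in counts.items() if 0 < n <= threshold]


# Invented example data: three participants, two internal project names.
transcripts = {
    "P01": "Project Aurora slipped twice before the board stepped in.",
    "P02": "We heard about Project Aurora in the town hall.",
    "P03": "Project Aurora and the Helsinki pilot were run by different teams.",
}

print(likely_identifiers(transcripts, ["Project Aurora", "Helsinki pilot"]))
```

Here the term mentioned by all three participants passes, while the one that appears in a single transcript is surfaced for a decision.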

Step 4: Choose the right anonymisation level

Skimle Anonymise offers three protection levels, and choosing the right one depends on your use case.

Level 1 handles direct identifiers only. "David Chen, CHRO" becomes "Participant 01". Names, explicit titles, organisation names, and specific locations are replaced or pseudonymised. This level is appropriate when transcripts will be analysed by a closed research team and quotes will only appear in internal reports with limited distribution.

Level 2 extends pseudonymisation to role-based and contextual identifiers. "As the CHRO, I find this difficult" becomes "As a senior leader, I find this difficult" — the functional role is generalised. This is the appropriate level for most corporate research where quotes will appear in presentations, where the client audience might recognise individuals by role, or where the research covers a single organisation.

Level 3 applies the most aggressive transformation, including demographic details, temporal references, and any contextual clue that could contribute to re-identification. Use this level when research covers particularly sensitive topics — redundancies, misconduct, health issues — or when participants are highly distinctive (a sole founder, a sole person in a specific role) and the findings will be shared publicly or with a broad audience.

For the transformation programme example, Level 2 is probably the right choice for most transcripts, with a manual check of any passage that touches on particularly sensitive details.
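If you want to write the choice down as a team rule, it could look something like the sketch below. This is one reading of the guidance above, not logic taken from the product: the `suggest_level` function and its parameter names are assumptions, and the Level 3 condition is interpreted as "sensitive topic, or distinctive participant combined with a broad audience".

```python
def suggest_level(sensitive_topic=False, distinctive_participant=False,
                  broad_audience=False, quotes_in_presentations=False,
                  single_organisation=False):
    """Suggest a protection level (1-3) from a few yes/no questions.

    An illustrative reading of the guidance in this article; a researcher
    should still confirm the choice per transcript.
    """
    if sensitive_topic or (distinctive_participant and broad_audience):
        return 3
    if quotes_in_presentations or single_organisation:
        return 2
    # Closed research team, internal reports with limited distribution.
    return 1


# The transformation programme: one organisation, quotes going into decks.
print(suggest_level(quotes_in_presentations=True, single_organisation=True))
```

A rule like this is mainly useful as a default. Individual transcripts can still be moved up a level during review.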

Step 5: Handle edge cases

Real transcripts are messier than clean examples suggest. A few scenarios that come up regularly:

The same person appears under different names across interviews. In a 20-interview project, "David", "David Chen", "the CHRO", "Dave", and "the head of HR" might all refer to the same individual. If each is replaced by a different pseudonym, the translation table becomes a tangle and subsequent analysis is confusing. Skimle allows you to merge entities — confirming that these five references are the same person and assigning them a single consistent pseudonym — before exporting.
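To see why merging matters, here is a small sketch of the idea in Python: every alias of one person maps to one pseudonym, and longer aliases are matched first so that "David Chen" is not half-replaced via "David". The `build_replacer` helper is hypothetical and only illustrates the principle. It is not how Skimle implements entity merging.

```python
import re


def build_replacer(entities):
    """entities maps one pseudonym to all aliases of the same person."""
    alias_to_pseudonym = {}
    for pseudonym, aliases in entities.items():
        for alias in aliases:
            alias_to_pseudonym[alias.lower()] = pseudonym

    # Longest aliases first, so "David Chen" wins over "David".
    ordered = sorted(alias_to_pseudonym, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(alias) for alias in ordered) + r")\b",
        re.IGNORECASE,
    )

    def replace(text):
        return pattern.sub(lambda m: alias_to_pseudonym[m.group(0).lower()], text)

    return replace


entities = {
    "Participant 01": ["David Chen", "David", "Dave", "the CHRO", "the head of HR"],
}
replace = build_replacer(entities)

print(replace("Dave said it first, but David Chen, the CHRO, signed it off."))
# Participant 01 said it first, but Participant 01, Participant 01, signed it off.
```

The output reads awkwardly when several aliases sit in one sentence, which is one more reason a human pass over the result is worth the time.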

An identifier appears in a quote that is analytically essential. Sometimes the specificity of a role is the whole point. If the finding is that senior leaders and frontline workers diverge sharply on a question, you need some indication of seniority in the output. Level 2 pseudonymisation handles this by replacing specific titles with generalised seniority markers ("senior leader", "frontline employee") rather than removing role information entirely.

A participant mentions a third party who has not consented. "My manager, Sarah Thompson, specifically told me to ignore the policy" contains a third party's full name and a potentially damaging statement. The participant was promised anonymity; Sarah Thompson was not even in the room. Skimle flags third-party names and you can redact or pseudonymise them independently of the participant's own identifier.

Dates create a timeline that identifies someone. "I've been here three months, having joined after the redundancy round" may identify a recent hire in a small team. Level 2 and Level 3 both address temporal identifiers, replacing specific durations and event references with vaguer markers.
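As a rough illustration of what "vaguer markers" can mean for durations, the sketch below buckets tenure phrases into coarse ranges. The bucket boundaries, the wording of the markers, and the `generalise_durations` helper are assumptions chosen for the example, not Skimle's rules. Note that it only handles the duration: the event reference ("after the redundancy round") still needs a human decision.

```python
import re

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# Matches phrases such as "three months" or "18 years".
DURATION = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+(month|year)s?\b",
    re.IGNORECASE,
)


def bucket(months):
    """Map an exact duration in months to a coarse, assumed marker."""
    if months < 12:
        return "less than a year"
    if months < 36:
        return "a year or two"
    return "several years"


def generalise_durations(text):
    def repl(match):
        raw, unit = match.group(1).lower(), match.group(2).lower()
        n = int(raw) if raw.isdigit() else NUMBER_WORDS[raw]
        months = n * 12 if unit == "year" else n
        return bucket(months)

    return DURATION.sub(repl, text)


print(generalise_durations(
    "I've been here three months, having joined after the redundancy round."
))
# I've been here less than a year, having joined after the redundancy round.
```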

Step 6: Export the outputs

Once you are satisfied with the review, Skimle Anonymise generates three outputs:

Anonymised DOCX files — one per original transcript, with all identified entities replaced according to your settings. These are the files you share with analysts, use as the basis for coding, and from which you draw quotes for reports. They can be imported directly into Skimle's analysis environment for thematic analysis and qualitative coding.

Excel translation table — the re-identification key, mapping each pseudonym back to the original identifier. This file should be stored separately from the anonymised transcripts, with access restricted to the core research team or a named data controller. If a participant later exercises their right of access or erasure under GDPR, you need this key to locate and remove their data.
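If you ever handle a re-identification key yourself, the separation rule can be enforced in code. The sketch below is a hypothetical helper that writes a key to CSV (a stand-in for the Excel file described above) and refuses to save it anywhere inside the folder that holds the shared transcripts. The folder names and the `write_translation_table` function are assumptions.

```python
import csv
import tempfile
from pathlib import Path


def write_translation_table(mapping, key_path, transcripts_dir):
    """Write pseudonym -> original pairs, but never inside the shared folder."""
    key_path = Path(key_path).resolve()
    transcripts_dir = Path(transcripts_dir).resolve()
    if transcripts_dir in key_path.parents:
        raise ValueError("store the key outside the shared transcripts folder")

    key_path.parent.mkdir(parents=True, exist_ok=True)
    with key_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["pseudonym", "original"])
        for pseudonym, original in sorted(mapping.items()):
            writer.writerow([pseudonym, original])
    return key_path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    shared = Path(root) / "shared"
    shared.mkdir()

    saved = write_translation_table(
        {"Participant 01": "David Chen"},
        Path(root) / "restricted" / "key.csv",
        shared,
    )
    with saved.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

    try:
        write_translation_table({}, shared / "key.csv", shared)
        rejected = False
    except ValueError:
        rejected = True

print(rows, rejected)
```

A path check is no substitute for access controls on the restricted location, but it does catch the most common accident: saving both files to the same shared drive.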

PDF audit report — a record of what was detected, what was changed, which level was applied, and which flags were accepted or overridden by the researcher. This is your documentation trail. If a participant, an ethics board, or a regulator ever asks how anonymisation was conducted, this report is your answer.
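To make the contents of such a record concrete, here is a minimal sketch of the kind of summary an audit trail might hold. The field names, the action labels, and the `audit_summary` function are illustrative assumptions and do not describe the format of Skimle's PDF report.

```python
import json
from collections import Counter


def audit_summary(decisions, level):
    """Summarise reviewed flags: what was detected and what was done with it."""
    return {
        "level": level,
        "flags_detected": len(decisions),
        "by_action": dict(Counter(d["action"] for d in decisions)),
        "by_category": dict(Counter(d["category"] for d in decisions)),
    }


# Invented review decisions for one transcript.
decisions = [
    {"text": "David Chen", "category": "Names", "action": "replaced"},
    {"text": "CHRO", "category": "Titles and roles", "action": "replaced"},
    {"text": "Project Aurora", "category": "Other", "action": "kept"},
]

summary = audit_summary(decisions, level=2)
print(json.dumps(summary, indent=2))
```

Recording the flags you chose to keep matters as much as recording the replacements. Those are the decisions you are most likely to be asked about later.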

A few points worth being explicit about, because they come up in almost every corporate research context.

Interview data is personal data. Even if you never intend to publish names, the fact that you hold transcripts about identifiable individuals means GDPR applies from the moment of collection. This means having a lawful basis for processing (typically legitimate interests or consent), storing data securely, limiting access, and being able to respond to subject access requests.

Pseudonymised data is still personal data under GDPR. It is a common misconception that pseudonymising takes data out of scope. Unless re-identification is genuinely not possible, which is a high bar, pseudonymised data remains within scope. The regulation does acknowledge that pseudonymisation reduces risk, and Article 32 lists it among the security measures a controller can apply, but it does not take the data outside GDPR's reach.

Promises of anonymity create obligations. When participants agree to be interviewed on the basis that they cannot be identified, a failure to de-identify properly is a breach of trust that can damage your relationship with the organisation, deter future research participation, and in some contexts expose the research commissioning organisation to legal liability.

Sharing raw transcripts with AI tools requires care. Uploading unredacted transcripts to a general-purpose AI tool or a shared cloud service may constitute a transfer of personal data to a third-party processor, which requires a data processing agreement and potentially raises questions about data residency. Our AI qualitative data analysis checklist covers this in detail, and our guide on transparency in AI tools discusses what questions to ask before processing sensitive data with an AI system. Please refer to Skimle's Terms of Service and privacy policy to ensure they match your needs. For our organisational customers we also have separate Data Processing Agreements and even private cloud options.

Fitting anonymisation into the broader research workflow

Anonymisation is one stage in a larger process. For the corporate transformation programme, the full workflow looks roughly like this:

Prepare your interview guide — see our guide on how to write an interview guide and how to conduct effective business interviews
Record and transcribe — automated transcription setup is covered in our practical interview setup guide
Anonymise — using the workflow described in this article
Analyse — coding, theming, and synthesis on the anonymised transcripts; see how to analyse interview transcripts
Synthesise and present — turning findings into a clear story; see how to synthesise user research and presenting qualitative findings to executives

Anonymisation is a step to place carefully. If you start coding or sharing raw transcripts before de-identifying them, you spread personal data into your analysis files and presentations, all of which then inherit the same compliance obligations. Build anonymisation into the workflow before analysis begins.

For HR professionals running employee research specifically, this is especially important. Employees are generally more identifiable than external interview participants, employment relationships create additional legal sensitivities, and the consequences of a breach — real or perceived — can affect employee trust across the whole organisation. Our HR and people teams use case covers how Skimle supports HR research teams in detail.

For consultants running due diligence or transformation research, client confidentiality requirements often layer on top of GDPR obligations. Participants may be employees of a company that is the subject of an acquisition or restructuring, and any leak of identifiable interview content can have serious commercial consequences. See our consultants and investors use case for more on how Skimle supports these workflows.

What not to do

A few patterns that seem reasonable but create problems:

Do not rely on "removing names is enough." As the examples above show, job titles, demographic details, and contextual clues can be as identifying as names — sometimes more so in small or distinctive samples.

Do not treat manual find-and-replace as sufficient. Spreadsheet or word-processor replacements miss synonyms, nicknames, and indirect references. A manual pass will catch "David Chen" but may miss "the CHRO", "David", and "the person who joined from the Singapore office."

Do not anonymise after analysis. Once personal data has been embedded in codebooks, synthesis notes, and presentation decks, the compliance obligation follows it to every downstream document.

Do not share the translation key with the same audience as the anonymised transcripts. If the key and the pseudonymised transcripts are both on the same shared drive, pseudonymisation provides essentially no protection.

Do not use a general-purpose AI chat tool for anonymisation. Uploading sensitive interview transcripts to a consumer AI service to "ask it to remove names" creates data transfer and retention risks, and the output is not auditable.
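On the find-and-replace point above, a quick residual check makes the gap visible. The sketch below runs a single manual replacement and then scans the result for any other known reference to the same person. The `residual_identifiers` helper and the sample sentence are invented for illustration, and the check only works for identifiers you already know to look for.

```python
import re


def residual_identifiers(text, known_identifiers):
    """Return the known identifiers that still appear in the text."""
    found = []
    for identifier in known_identifiers:
        if re.search(r"\b" + re.escape(identifier) + r"\b", text, re.IGNORECASE):
            found.append(identifier)
    return found


original = "David Chen pushed back. Everyone knows the CHRO as Dave."

# The manual pass: one literal find-and-replace on the full name.
after_manual_pass = original.replace("David Chen", "Participant 01")

leftovers = residual_identifiers(
    after_manual_pass, ["David Chen", "David", "Dave", "the CHRO"]
)
print(leftovers)
# ['Dave', 'the CHRO']
```

The full name is gone, but two references to the same person survive, which is exactly the failure mode a name-only replacement produces.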

Ready to anonymise your interview transcripts with a proper audit trail? Try Skimle Anonymise for free and process your first batch of transcripts with detection across six identifier categories, researcher-controlled review, and a PDF audit report.

Continuing your research workflow? Read our guides on how to analyse interview transcripts, thematic analysis methodology, and how to synthesise user research.

About the authors

Henri Schildt is a Professor of Strategy at Aalto University School of Business and co-founder of Skimle. He has published over a dozen peer-reviewed articles using qualitative methods, including work in Academy of Management Journal, Organisation Science, and Strategic Management Journal. His research focuses on organisational strategy, innovation, and qualitative methodology. Google Scholar profile

Olli Salo is a former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets and themselves, develop winning strategies, and improve their operating models. He has conducted over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile
