Anonymisation tools for qualitative research in 2026: De-ID, Textwash, Presidio, MAXQDA and Skimle compared

Comparing the best anonymisation and pseudonymisation tools for qualitative researchers in 2026 — features, pricing, GDPR compliance, and which tool fits which research context.


The best anonymisation tool for qualitative research depends on how much rigour you need. For researchers who need a full audit trail, cross-file consistency, and GDPR-defensible re-identification key management, Skimle Anonymise is the most complete solution and the only one that integrates directly with qualitative analysis. For researchers on a tight budget without ethics board requirements, Textwash (free, open-source) or De-ID (pay-per-use from $0.0015 (€0.0014) per word) are reasonable standalone options. Generic tools like Microsoft Presidio and MAXQDA's basic anonymise function are not built for qualitative research and miss the indirect identifiers that matter most.

Why anonymisation is harder than it looks for qualitative data

The distinction between pseudonymisation and anonymisation is not just semantic — it has real legal consequences under GDPR and HIPAA, and real practical consequences for research ethics.

Pseudonymisation replaces direct identifiers with stand-ins (names with codes, organisations with labels) but retains a re-identification key. The data is still personal data under GDPR. Subject rights still apply. The key must be stored securely.

Anonymisation goes further: when done properly, re-identification is not reasonably possible even with additional information. Truly anonymised data falls outside GDPR's scope entirely.
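The difference is easiest to see in a few lines of code. The sketch below is a minimal illustration, not any tool's actual implementation: the function names, the `[PERSON_n]` code format, and the sample transcript are all made up for the example. It replaces known names with codes and returns the mapping, which is exactly the re-identification key that keeps the output inside GDPR's scope.

```python
import re

def pseudonymise(text, names):
    """Replace each known name with a stable code; return the text plus the key."""
    key = {}  # code -> original name: this mapping IS the re-identification key
    for i, name in enumerate(names, start=1):
        code = f"[PERSON_{i}]"
        key[code] = name
        text = re.sub(rf"\b{re.escape(name)}\b", code, text)
    return text, key

def reidentify(text, key):
    """Reverse the substitution; only possible while the key still exists."""
    for code, name in key.items():
        text = text.replace(code, name)
    return text

# Hypothetical sample sentence, not real research data.
transcript = "Anna Virtanen said that Mark Jones approved the budget."
masked, key = pseudonymise(transcript, ["Anna Virtanen", "Mark Jones"])
print(masked)
print(reidentify(masked, key))
```

As long as `key` is stored anywhere, the masked text can be reversed. Deleting the key is necessary for anonymisation but not sufficient on its own, because indirect identifiers left in the text may still point to a person.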

The ICO's guidance on anonymisation makes clear that many researchers who believe they have anonymised data have only pseudonymised it — and have often missed the indirect identifiers that still point to specific individuals. A job title, a location, a specific project reference, a combination of demographic details: any of these can be identifying even without a name attached.

Most anonymisation tools handle direct identifiers reasonably well. The hard problems are indirect identifiers, cross-file consistency (the same person appearing under different descriptions across 20 transcripts), and producing the audit trail that ethics boards and compliance officers actually require.

How the tools compare

| Tool | Type | Pricing | Indirect identifiers | Cross-file consistency | Audit trail | Re-ID key mgmt | GDPR/HIPAA | Best for |
|---|---|---|---|---|---|---|---|---|
| Skimle Anonymise | Integrated research platform | Included in all plans | ✓ Full (6 categories) | ✓ Entity registry | ✓ PDF report | ✓ 3-tier (keep/separate/destroy) | ✓ EU-hosted | Qualitative researchers, HR, consultants |
| De-ID | Dedicated anonymisation | $67.50 (€62) / 50k credits | ✓ Good | ✓ Yes | ✓ Yes | ✓ Yes | ✓ HIPAA + GDPR | Academic IRB compliance |
| Textwash | Open-source CLI tool | Free | ✗ Limited | ✗ No | ✗ None | ✗ None | ✓ Local processing | Budget-conscious, EN/NL only |
| Microsoft Presidio | Generic PII framework | Free (open-source) | ✗ Direct IDs only | ✗ No | ✗ None | ✗ None | ✗ Basic only | Technical teams, structured data |
| MAXQDA Anonymise | Basic built-in feature | Part of MAXQDA (€270–€1,090+/year, ~$295–$1,195+) | ✗ None | ✗ No | ✗ None | ✗ None | ✗ None | Very basic internal masking only |
| ChatGPT / general AI | General purpose | Subscription varies | ✗ Inconsistent | ✗ None | ✗ None | ✗ None | ✗ None | Not suitable for rigorous research |

Skimle Anonymise

Skimle Anonymise is purpose-built for qualitative research data: interview transcripts, expert call notes, focus group recordings, open-text survey responses. Unlike the other tools in this comparison, it is not a standalone anonymisation product. It sits inside Skimle's analysis platform, which means anonymised documents feed directly into qualitative analysis without any export-import step.

How it works: Upload your documents and the AI scans across all files simultaneously for six identifier categories — names, titles and roles, locations, organisations, dates and times, and other contextually sensitive information. Each detected instance is highlighted, categorised, and listed in a panel. You can review and adjust each one. Critically, the tool maintains a shared entity registry across the entire project, so the same person appearing under different names or descriptions in different transcripts is merged into a single entity and pseudonymised consistently throughout.
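To make the idea of a shared entity registry concrete, here is a minimal sketch. It is not Skimle's implementation: the `EntityRegistry` class, its methods, and the file names are hypothetical, and the aliases are merged by hand rather than detected by AI. What it shows is the core property: every alias of one entity maps to the same pseudonym in every file.

```python
class EntityRegistry:
    """Maps every known alias of an entity to one project-wide pseudonym."""

    def __init__(self):
        self._alias_to_pseudonym = {}
        self._counts = {}  # per-category counter used to number pseudonyms

    def register(self, category, aliases):
        """Register all aliases of one entity and return their shared pseudonym."""
        n = self._counts.get(category, 0) + 1
        self._counts[category] = n
        pseudonym = f"[{category}_{n}]"
        for alias in aliases:
            self._alias_to_pseudonym[alias] = pseudonym
        return pseudonym

    def apply(self, text):
        """Replace every registered alias in one document."""
        # Longest aliases first, so "Anna Virtanen" is replaced before "Anna".
        # Plain substring replacement is a simplification; real text needs
        # word-boundary handling.
        for alias in sorted(self._alias_to_pseudonym, key=len, reverse=True):
            text = text.replace(alias, self._alias_to_pseudonym[alias])
        return text

registry = EntityRegistry()
registry.register("PERSON", ["Anna Virtanen", "Anna", "the Helsinki site lead"])

# Hypothetical two-file project in which one person appears under two descriptions.
files = {
    "interview_01.txt": "Anna Virtanen joined in 2019.",
    "interview_07.txt": "I reported to the Helsinki site lead back then.",
}
anonymised = {name: registry.apply(text) for name, text in files.items()}
for name, text in anonymised.items():
    print(name, "->", text)
```

Both files end up referring to `[PERSON_1]`, which is what per-file or per-conversation tools cannot guarantee.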

Protection levels: Three presets map to common compliance scenarios. Level 1 (light pseudonymisation) handles direct identifiers and retains the re-identification key. Level 2 (strong pseudonymisation) extends to indirect identifiers and retains the key. Level 3 (full anonymisation) applies maximum transformation and permanently destroys the key on export, producing HIPAA Safe Harbor-equivalent output. Per-category transformation modes — pseudonymise, neutralise, generalise, approximate, redact — give researchers granular control.

The audit report: This is what separates Skimle Anonymise from every other tool in this list for research contexts. The export package includes a PDF audit report documenting every transformation applied across the project, with timestamps. This is the artefact that an ethics board, IRB, or GDPR compliance officer needs to see. A researcher who can produce it is in a fundamentally different position from one who cannot.
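For readers wondering what sits underneath such a report, the sketch below shows one plausible shape for a timestamped transformation log. The field names and helper functions are assumptions made for illustration; the actual contents of Skimle's PDF report are not specified here. The category and mode labels are borrowed from the descriptions above.

```python
from datetime import datetime, timezone

def record(log, file, category, mode, replacement, offset):
    """Append one transformation to the audit log (hypothetical structure)."""
    log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "file": file,
        "category": category,        # e.g. names, locations, organisations
        "mode": mode,                # e.g. pseudonymise, generalise, redact
        "replacement": replacement,  # what now appears in the text
        "offset": offset,            # character position in the source file
    })

def summarise(log):
    """Count transformations per (file, category) for a human-readable report."""
    summary = {}
    for entry in log:
        k = (entry["file"], entry["category"])
        summary[k] = summary.get(k, 0) + 1
    return summary

audit_log = []
record(audit_log, "interview_01.txt", "names", "pseudonymise", "[PERSON_1]", 0)
record(audit_log, "interview_01.txt", "locations", "generalise", "[CITY]", 42)
record(audit_log, "interview_07.txt", "names", "pseudonymise", "[PERSON_1]", 15)
print(summarise(audit_log))
```

One design choice in this sketch: the log deliberately omits the original text of each identifier, so the audit record does not itself become a second re-identification key.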

Pricing: Included in all Skimle plans. No per-document or per-word cost. See pricing.

Best for: Academic researchers with ethics board requirements, HR teams conducting exit interviews and engagement surveys, consultants sharing anonymised client data, any researcher who needs to demonstrate rigour to a third party.

De-ID

De-ID is a dedicated qualitative research anonymisation tool. It is the closest purpose-built competitor to Skimle Anonymise and handles qualitative data (transcripts, field notes, interview notes) considerably better than generic PII tools.

Strengths: Built specifically for qualitative research, not structured data. Detects both direct and some indirect identifiers. Supports HIPAA Safe Harbor and GDPR workflows. Cross-file consistency is available. Audit trail generation for ethics documentation. Colour-coded suggestions distinguish essential removals from discretionary ones.

Limitations: Pricing is per word, so cost scales with corpus size. A typical 60-minute interview transcript (around 8,000-10,000 words) costs roughly $12-15 (€11-14) to process, and a project with 30 interviews runs to $360-450 (€330-415) before analysis has started. The Starter Pack at $67.50 (€62) for 50,000 credits covers approximately 5-6 interviews; the Project Pack at $153 (€140) for 120,000 credits covers 12-15. There is no integration with qualitative analysis tools, so anonymised documents must be exported and re-imported elsewhere.
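The cost figures above can be reproduced with simple arithmetic. The sketch below uses the $0.0015 per-word rate quoted in this article and assumes one credit corresponds to one word, which is what the pack coverage figures imply but is not confirmed here. Function names are illustrative.

```python
PRICE_PER_WORD_USD = 0.0015  # pay-per-use rate quoted in this article

def cost_range(words_low, words_high, interviews=1, rate=PRICE_PER_WORD_USD):
    """Return the (low, high) processing cost in USD for a number of interviews."""
    return (round(words_low * interviews * rate, 2),
            round(words_high * interviews * rate, 2))

def interviews_covered(credits, words_low, words_high):
    """Return (fewest, most) interviews a credit pack covers.

    Assumes one credit per word, which is an assumption of this sketch.
    """
    return (credits / words_high, credits / words_low)

print(cost_range(8_000, 10_000))                  # one 60-minute interview
print(cost_range(8_000, 10_000, interviews=30))   # a 30-interview project
print(interviews_covered(50_000, 8_000, 10_000))  # Starter Pack
print(interviews_covered(120_000, 8_000, 10_000)) # Project Pack
```

Changing `words_low` and `words_high` to match your own transcripts gives a quick estimate before committing to a pack.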

Best for: Academic researchers who need dedicated IRB-compliant anonymisation and are processing smaller corpora. A reasonable standalone option when you will analyse the data in NVivo, MAXQDA, or another QDA tool rather than Skimle.

Textwash

Textwash is an open-source text anonymisation tool developed at Tilburg University and University College London, with support from the Dutch Research Council. It is the only genuinely free option with decent accuracy for qualitative text.

Strengths: Free. Locally deployable, meaning data never leaves your machine — important for sensitive research. Reasonably accurate NER-based detection for English and Dutch text. The research behind it is published, so the methodology is transparent.

Limitations: English and Dutch only (base version). The command-line interface requires technical comfort — there is no consumer-friendly GUI in the free version (Textwash Pro offers a GUI but is a commercial product). No cross-file consistency. No re-identification key management. No audit trail. Limited indirect identifier detection. Not suitable for contexts where a documented, auditable process is required.

Best for: Technically capable researchers working in English or Dutch who need basic anonymisation and have no ethics board documentation requirements. Good for low-stakes pre-processing before analysis. Not appropriate for IRB submissions, GDPR compliance reviews, or research that will be shared with external parties.

Find it at: github.com/ben-aaron188/textwash

Microsoft Presidio

Presidio is an open-source PII detection and anonymisation framework from Microsoft, designed as a general infrastructure layer for enterprise applications. It is not designed for qualitative research.

Strengths: Free and highly customisable. Supports a wide range of direct identifier types: names, emails, phone numbers, credit card numbers, social security numbers, addresses. Deployable locally or in the cloud. Technically sophisticated with regex, NER, and context-aware detection.

Limitations: Not built for the indirect identifiers found in qualitative research. Presidio will reliably detect that "John Smith" is a name. It will not reliably detect that "the former director of the Helsinki division who moved to the advisory board in 2022" is identifying. Cross-file consistency is not built in. No re-identification key management. No audit trail. Requires significant technical implementation, because it is a framework, not a finished tool. High false-positive and false-negative rates on the nuanced indirect identifiers that matter most in interview data.

Best for: Technical teams building enterprise applications that need to strip PII from structured or semi-structured data. Not suitable for qualitative research without substantial custom development.

MAXQDA's anonymise feature

MAXQDA includes a basic anonymisation function that works as follows: researchers manually code text passages with a specific code; when saving an anonymised copy of the project, all text tagged with that code is replaced with "XXX". That is the entirety of the feature.

There is no AI detection. No indirect identifier handling. No cross-file consistency. No re-identification key. No audit trail. No transformation modes. It is a manual find-and-mark workflow that produces redacted output.
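The behaviour described above amounts to a fixed-mask substitution over manually marked spans. The sketch below mimics that behaviour in plain Python; it is not MAXQDA's code or API, and the function name, the character-offset representation of coded passages, and the sample sentence are assumptions for illustration.

```python
def redact_coded_spans(text, spans, mask="XXX"):
    """Replace each manually coded (start, end) character span with a fixed mask.

    Assumes the spans do not overlap.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        out.append(text[cursor:start])  # keep the text before the coded span
        out.append(mask)                # drop the coded span itself
        cursor = end
    out.append(text[cursor:])           # keep whatever follows the last span
    return "".join(out)

text = "Anna moved from Helsinki to the advisory board."
coded = [(0, 4), (16, 24)]  # spans the researcher marked by hand: "Anna", "Helsinki"
print(redact_coded_spans(text, coded))
```

Note what survives: "the advisory board" was never coded, so it stays in the output. Anything the researcher does not mark by hand is left untouched, which is why this approach cannot address indirect identifiers systematically.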

For researchers who want to share a version of their MAXQDA project without specific text passages, this is occasionally useful. For researchers who need to demonstrate to an ethics board that their anonymisation process was rigorous and systematic, it is not sufficient. The MAXQDA documentation confirms this is essentially a redaction convenience feature rather than a compliance workflow.

NVivo and ATLAS.ti have even less built-in anonymisation capability — both recommend anonymising data externally before importing into the platform.

What about ChatGPT?

A researcher who uploads a transcript to ChatGPT and asks it to "remove all names and identifying information" will get something that looks anonymised. But this approach has several fundamental problems for research use:

  • No cross-file consistency — the same person may receive different pseudonyms in different conversations
  • No re-identification key — there is no way to reverse the process or trace back to source
  • No audit trail — there is no documentation of what was detected and what was changed
  • Data processing concerns — sending sensitive interview data to an external LLM raises GDPR issues
  • No indirect identifier detection — the model applies no structured analysis for combinations that are identifying

The output is plausible-looking but not verifiably rigorous. For more on why general-purpose AI tools are not appropriate for serious qualitative analysis, see "Hallucinations, limited context and black boxes: the three problems of AI qualitative analysis".

When to use which tool

You need IRB approval or ethics board documentation: Skimle Anonymise (audit report built in) or De-ID (dedicated IRB-compliant tool).

You are processing EU data and need to demonstrate GDPR compliance: Skimle Anonymise (EU-hosted, re-ID key management, audit trail). Textwash for local processing only.

You have a large corpus (30+ interviews) and cost is a concern: Skimle Anonymise (flat rate included in subscription) is significantly cheaper than De-ID at scale.

You are a technical researcher working in English or Dutch with no compliance requirements: Textwash (free, local, open-source).

You already use MAXQDA and need basic redaction for internal sharing: MAXQDA's built-in anonymise function — but do not use it for external sharing or compliance documentation.

You want to anonymise and then immediately analyse without re-importing: Skimle Anonymise, which feeds directly into Skimle's qualitative analysis environment.

You are building enterprise infrastructure for PII stripping at scale: Microsoft Presidio (open-source framework, customisable, not research-specific).

Frequently asked questions

What is the difference between anonymisation and pseudonymisation?

Pseudonymisation replaces identifying information with stand-ins but retains a re-identification key. The data is still personal data under GDPR. Anonymisation goes further: when the key is destroyed and indirect identifiers are sufficiently transformed, re-identification is not reasonably possible and the data falls outside GDPR's scope entirely. Most "anonymised" research data is actually only pseudonymised, usually because indirect identifiers have not been fully addressed. See "How to anonymise qualitative research data for IRB compliance" for the full treatment.

What are indirect identifiers in qualitative research?

Direct identifiers include names, contact details, and account numbers — obviously identifying. Indirect identifiers are details that, in combination or context, can still identify a person even without a name: a job title unusual enough to point to one person, a specific city combined with a seniority level, a named project or event, a distinctive phrase or viewpoint. Manual anonymisation and generic PII tools miss indirect identifiers far more often than direct ones. The legal standard for anonymisation under GDPR requires that re-identification not be reasonably possible — which means indirect identifiers must be addressed.
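A small worked example shows why combinations matter. The sketch below is an illustrative check, not a feature of any tool compared here: the participant records are invented and the function name is hypothetical. It counts how many participants share each combination of attributes and flags the combinations that point to a single person.

```python
from collections import Counter

def risky_combinations(participants, fields, min_group_size=2):
    """Return attribute combinations shared by fewer than min_group_size people."""
    combos = Counter(tuple(p[f] for f in fields) for p in participants)
    return [combo for combo, n in combos.items() if n < min_group_size]

# Invented sample: six participants described only by city and seniority.
participants = [
    {"city": "Helsinki", "seniority": "director"},
    {"city": "Helsinki", "seniority": "analyst"},
    {"city": "Helsinki", "seniority": "analyst"},
    {"city": "Tampere", "seniority": "director"},
    {"city": "Tampere", "seniority": "analyst"},
    {"city": "Tampere", "seniority": "analyst"},
]

print(risky_combinations(participants, ["city"]))
print(risky_combinations(participants, ["seniority"]))
print(risky_combinations(participants, ["city", "seniority"]))
```

Neither attribute is identifying on its own in this sample, but city and seniority together single out both directors. Free-text transcripts make this harder still, because the attributes first have to be found in the prose.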

Is anonymising data before analysis required under GDPR?

Not always required, but often the most practical approach. Unprocessed interview data is personal data and must be handled accordingly (lawful basis, data subject rights, storage limits, access controls). Anonymising or pseudonymising the data before wider sharing, analysis by third parties, or long-term retention is often the cleanest way to manage these obligations. The ICO's anonymisation guidance is the authoritative reference for UK researchers and a useful one for EU researchers, who should also check the guidance of their national data protection authority and the EDPB.

How long does anonymising a corpus of interviews take with AI tools?

With Skimle Anonymise, a corpus of 20-30 one-hour interview transcripts (roughly 200,000-300,000 words) can be processed in under an hour, with review and confirmation of detected identifiers taking 1-3 additional hours depending on complexity. Manual anonymisation of the same corpus typically takes 2-4 analyst-weeks and produces less consistent results.

Do I need to anonymise data before uploading to qualitative analysis tools like NVivo?

NVivo and ATLAS.ti both recommend anonymising data externally before importing. Neither has built-in de-identification. If you use Skimle, the anonymisation and analysis steps are integrated — you anonymise within the platform and move directly to analysis with no re-import step.


Ready to anonymise your research data with a full audit trail? Try Skimle Anonymise for free — upload your transcripts, review the detected identifiers across all six categories, and export anonymised documents with a timestamped audit report.



About the authors

Henri Schildt is a Professor of Strategy at Aalto University School of Business and co-founder of Skimle. He has published over a dozen peer-reviewed articles using qualitative methods, including work in Academy of Management Journal, Organization Science, and Strategic Management Journal. His research focuses on organisational strategy, innovation, and qualitative methodology. Google Scholar profile

Olli Salo is a former Partner at McKinsey & Company, where he spent 18 years helping clients understand their markets and themselves, develop winning strategies, and improve their operating models. He has done over 1,000 client interviews and published over 10 articles on McKinsey.com and beyond. LinkedIn profile

