> For the complete documentation index, see [llms.txt](https://docs.helvia.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.helvia.ai/security/data-privacy-and-handling.md).

# Data Privacy & Handling

Conversation data moves through a clear lifecycle: collection, anonymization, storage, and deletion. The platform gives you direct control over the two stages that vary most between organizations: how personal information (PII) is identified and replaced, and how long conversation data is kept before it is deleted.

{% hint style="info" %}
For certifications, the security FAQ, and how Helvia.ai protects your data end-to-end, see the [Security Overview](/security/overview.md) page.
{% endhint %}

### Anonymization and PII Handling

The platform automatically detects personal information in conversations and replaces it before that data is stored or sent to third parties. Anonymization is activated and configured **per agent** under **Designer > Settings > Privacy**, so each agent can carry rules appropriate to its domain.

Three mechanisms work together, each addressing a different sensitivity level:

| Mechanism | Description |
| --- | --- |
| Entity-Based Detection | Recognize named entities like people, locations, dates, and money, or match custom regex patterns |
| Custom Anonymization Service | Plug in your own detection endpoint when the built-in entities are not enough |
| Full Message Obfuscation | Replace the entire user message before storage for the highest-sensitivity scenarios |


#### Entity-Based Anonymization

Entity-based anonymization uses Named Entity Recognition to detect Personally Identifiable Information (PII) in free text and substitute it with placeholder values. The detector recognizes a fixed set of categories, such as names, locations, dates, and monetary values.

When configured, detection runs on every incoming user message before it enters the agent's processing pipeline, so the data is replaced before it ever reaches the language model or persistent storage. It can also be applied to the full conversation transcript before export to external systems. Contact [support](/resources/support.md) for special configurations beyond the defaults.

To add a detection-and-replacement rule:

{% stepper %}
{% step %}
#### Open Privacy Settings for the Agent

Go to **Designer > Settings > Privacy**.
{% endstep %}

{% step %}
#### Add a Detection Rule

Under **Anonymization Settings**, select **Add Data Type** to insert a new row. You can add as many rows as you need, one per entity type or regex pattern you want to detect.
{% endstep %}

{% step %}
#### Choose What to Detect

Pick the **Data Type** category. The full set of supported categories is listed below.
{% endstep %}

{% step %}
#### Set the Replacement

Enter the text that will replace matches in **Replacement Text**. Leave it blank to fall back to the **Default Replacement** value set for the section.
{% endstep %}

{% step %}
#### Provide a Regex Pattern

Required only when **Custom Regex** is selected as the data type. Enter the **Regex pattern** and the **Replacement Text**. Use this for identifiers specific to your domain, such as account numbers or internal case IDs.
{% endstep %}

{% step %}
#### Save the Rule

Select **Save Changes** to apply.
{% endstep %}
{% endstepper %}

The full list of supported entities is:

| Entity | What It Covers |
| --- | --- |
| `PERSON` | People, including fictional |
| `GPE` | Countries, cities, states |
| `NORP` | Nationalities, religious or political groups |
| `FAC` | Buildings, airports, highways, bridges |
| `ORG` | Companies, agencies, institutions |
| `LOC` | Non-GPE locations, mountain ranges, bodies of water |
| `PRODUCT` | Objects, vehicles, foods (not services) |
| `EVENT` | Named hurricanes, battles, wars, sports events |
| `WORK_OF_ART` | Titles of books, songs, and other works |
| `LAW` | Named documents made into laws |
| `LANGUAGE` | Any named language |
| `DATE` | Absolute or relative dates and periods |
| `TIME` | Times smaller than a day |
| `PERCENT` | Percentages |
| `MONEY` | Monetary values, including currency |
| `QUANTITY` | Measurements such as weight or distance |
| `ORDINAL` | First, second, third, and so on |
| `CARDINAL` | Numerals that do not fit another type |
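
To illustrate how **Custom Regex** rows and the **Default Replacement** fallback might interact, here is a minimal Python sketch. It is not the platform's implementation: the `Rule` class, the `apply_rules` function, the `[REDACTED]` default, and both identifier patterns are hypothetical names chosen for this example, and the built-in entity detector is not reproduced.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical default; in practice this is whatever you set as Default Replacement.
DEFAULT_REPLACEMENT = "[REDACTED]"


@dataclass
class Rule:
    """One Custom Regex row: a pattern plus optional Replacement Text."""
    pattern: str
    replacement: str = ""  # blank means "fall back to the default"


def apply_rules(message: str, rules: list[Rule], default: str = DEFAULT_REPLACEMENT) -> str:
    """Apply each rule in order and return the anonymized message."""
    for rule in rules:
        replacement = rule.replacement or default
        # A callable replacement inserts the text literally (no backslash escapes).
        message = re.sub(rule.pattern, lambda _m, r=replacement: r, message)
    return message


rules = [
    Rule(pattern=r"\bACC-\d{6}\b", replacement="<ACCOUNT>"),  # assumed account-number format
    Rule(pattern=r"\bCASE-\d{4}\b"),                          # blank: uses the default
]

print(apply_rules("Account ACC-123456 is linked to CASE-0042.", rules))
# prints: Account <ACCOUNT> is linked to [REDACTED].
```

Anchoring patterns with `\b` keeps a rule from matching inside a longer token, which is worth checking before saving a rule that targets short numeric identifiers.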

{% hint style="info" %}
**Tune detection to your data:** Entity recognition is model-based, so unusual names or domain-specific identifiers can slip past the default categories. Add **Custom Regex** rules for the most important patterns.
{% endhint %}

#### Custom Anonymization Service

If the built-in detector does not cover a category specific to your domain, point the agent at your own service instead. In the agent privacy settings, enable **Service Configuration**, supply the endpoint **URL**, and add any HTTP headers your service requires for authentication.

{% hint style="info" %}
**When to use a custom service:** industry-specific identifiers (medical record numbers, account numbers, internal case IDs) or jurisdictions where you need detection beyond the standard entity set.
{% endhint %}

#### Full Message Obfuscation

For the highest-sensitivity scenarios, enable **Obfuscate User Input** to replace the entire user message with an obfuscated string before it is stored or sent downstream. Use this when entity-level redaction is not enough and no portion of the original message should be preserved.
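
As a rough illustration of what full obfuscation implies, the sketch below replaces the whole message rather than parts of it. The function name `prepare_user_message` and its flag are hypothetical; `_censored_` is the placeholder this page reports the language model sees when the setting is on.

```python
OBFUSCATED_PLACEHOLDER = "_censored_"  # placeholder string cited on this page


def prepare_user_message(message: str, obfuscate_user_input: bool) -> str:
    """Return the text that would be stored and sent downstream (illustrative only)."""
    if obfuscate_user_input:
        # Nothing from the original message survives, unlike entity-level redaction.
        return OBFUSCATED_PLACEHOLDER
    return message


original = "My card ends in 4421 and I live at 12 Elm Street."
print(prepare_user_message(original, obfuscate_user_input=True))   # prints: _censored_
print(prepare_user_message(original, obfuscate_user_input=False))  # prints the original text
```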

{% hint style="danger" %}
With full obfuscation on, the language model only sees `_censored_` in place of the user message and cannot respond to the original content.
{% endhint %}

### Data Retention

The platform retains conversation transcripts for a configurable window and deletes them automatically when that window expires. Data retention has two levels:

{% tabs %}
{% tab title="Workspace Default" %}
Set under **Workspace > Settings > Configuration** in the **Data Retention** field. Required at Workspace creation.

* Range: 1 to 24 months
* New Workspaces default to 3 months
* Applies to every agent in the Workspace unless overridden
{% endtab %}

{% tab title="Per-Agent Override" %}
Set under **Designer > Settings > Privacy** in the **Data Retention** field for an agent, and selectable when creating a new agent.

* Range: 1 to 24 months
* Overrides the Workspace default for this agent
* Select **Inherit from Workspace (N months)** to drop the per-agent override
{% endtab %}
{% endtabs %}

When the retention period expires, the data is completely removed from production databases. Archived copies remain in backup storage for up to two years for disaster recovery, unless a shorter window is agreed in your contract.

Retention covers stored conversation transcripts (chat sessions). Audit logs, knowledge base content, and Workspace configuration follow separate retention rules.

{% hint style="danger" %}
**Data Loss Risk:** Setting or shortening retention permanently deletes data older than the new window on the next cleanup run.
{% endhint %}

### Where Your Data Lives

All Helvia.ai Agent Platform services and databases operate within the European Union, on managed cloud infrastructure with geographic redundancy for disaster recovery.

| Feature | Description |
| --- | --- |
| EU-only Hosting | All processing and storage happen inside European Union data centers |
| Cloud-Native | Hosted on AWS under its ISO 27001 and SOC 2 certified programs |
| Geographic Redundancy | Backups and replicas distributed across availability zones for continuity |


### Encryption

All customer data is encrypted in transit and at rest. The same encryption applies to backups and to data flowing between platform services.

| State of Data | Protection |
| --- | --- |
| In transit | TLS/SSL across all internet communications, including traffic to LLM providers |
| At rest | AES-256 encryption applied at the storage layer |
| Database-level | Sensitive fields encrypted inside the database so data stays protected even on direct access |
| Passwords | Hashed with modern algorithms; never visible to administrators |
| Backups | Encrypted with the same standards as primary storage and held in secure, segregated locations |

### Data Minimization

The platform collects only what each processing purpose requires and removes data when that purpose ends. These principles apply across the platform, from the data your agents receive to the records kept in Observatory.

| Principle | Description |
| --- | --- |
| Purpose-Bound Collection | Each data point is tied to a defined processing purpose and not collected for hypothetical uses |
| Least-Privilege Access | Role-based access ensures users see only the data their role requires |
| Routine Review | Stored data is reviewed regularly and removed if no longer necessary |
| Automatic Deletion | Retention windows enforce removal without relying on manual cleanup |
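
Automatic deletion depends on the retention window described under Data Retention earlier on this page. The following Python sketch shows one way an effective window and a deletion cutoff could be resolved from a Workspace default and an optional per-agent override. The function names and the calendar-month arithmetic are assumptions made for illustration; the page does not document how the platform computes the cutoff or when the cleanup run executes.

```python
import calendar
from datetime import datetime, timezone
from typing import Optional

MIN_MONTHS, MAX_MONTHS = 1, 24  # range stated for both retention levels


def effective_retention_months(workspace_months: int, agent_override: Optional[int] = None) -> int:
    """Per-agent override wins; None stands for 'Inherit from Workspace'."""
    months = workspace_months if agent_override is None else agent_override
    if not MIN_MONTHS <= months <= MAX_MONTHS:
        raise ValueError(f"retention must be {MIN_MONTHS}-{MAX_MONTHS} months, got {months}")
    return months


def deletion_cutoff(now: datetime, months: int) -> datetime:
    """Return the timestamp before which transcripts fall outside the window."""
    # Step back whole calendar months, clamping the day to the target month's length.
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return now.replace(year=year, month=month_index + 1, day=min(now.day, last_day))


now = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(deletion_cutoff(now, effective_retention_months(3)))      # Workspace default of 3 months
print(deletion_cutoff(now, effective_retention_months(3, 12)))  # agent override of 12 months
```

Under these assumptions, shortening an agent's window from 12 to 3 months moves the cutoff nine months forward, so everything between the old and new cutoff becomes eligible for deletion. That is the behavior the Data Loss Risk warning describes.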

### Data Flows to Third Parties

Conversations sometimes need data to leave the platform, whether to generate a response with a language model or to update an external system. Every transfer is protected by the same controls applied across the platform.

* **Encrypted in transit:** All data leaving the platform travels over TLS-encrypted channels, so it stays protected end-to-end between Helvia.ai and the destination.
* **Anonymization available before export:** Personal data can be detected and replaced before any of it is sent to a language model or downstream system, using the same anonymization rules configured per agent.
* **Vetted providers only:** Every third-party provider Helvia.ai integrates with is reviewed against recognized security standards such as ISO 27001 and SOC 2 before being added.

{% hint style="info" %}
All third-party integrations, LLM providers included, run through your own provider account and credentials. This means two things:

* The relationship is governed directly by your contract with that provider, including any data-use and training terms.
* The processing region follows the credentials you supply, which can be configured to be inside or outside the EU.
{% endhint %}

### Best Practices

* **Configure anonymization per agent:** match the rules to the data each agent actually handles, rather than applying one set across every agent
* **Use a custom service for domain identifiers:** the built-in entities are broad; plug in your own service when you need medical IDs, account numbers, or other domain-specific patterns to be recognized
* **Tune data retention per agent:** adjust retention for agents that handle more sensitive conversations, in line with your contractual obligations
* **Audit which provider your LLM calls use:** customer-owned accounts give you direct control over training opt-outs and data-use terms

{% hint style="success" %}
You now know where your data lives, how the platform anonymizes and retains it, and what reaches third parties at each step of the lifecycle.
{% endhint %}
