Documentation in progress. New content is added regularly.

AI Safety

How Helvia keeps AI agents grounded, bounded, and accountable

Conversational AI introduces a category of risk that traditional software does not have. Agents can invent answers, follow instructions hidden in user input, drift across long conversations, and produce content nobody can trace back to a source. This page covers the controls Helvia uses to address each of these risks.

Our Responsible AI Principles

These principles shape every product decision and every safeguard described on this page.

Grounded Answers

Agents speak from your verified sources, with citations on every response

Bounded Behavior

Agents stay within their defined role, topic, and audience

Privacy by Design

Personal data is anonymized, encrypted, and retained only as long as configured

Observable Model Calls

Every LLM input and output is logged and reviewable in Observatory

Human Oversight

Agents can escalate or hand off when they should not answer alone

Automated Testing

Synthetic conversations stress-test agent behavior at scale

Risks We Address

AI risk is different from traditional software risk because the output is generated, not retrieved. That single property opens up failure modes that do not exist elsewhere: invented facts, hijacked instructions, unsafe content, and leaked data. Each has a specific control on the platform.

| Risk | How Helvia Handles It |
| --- | --- |
| Hallucinated or out-of-scope answers | RAG+C workflow keeps answers tied to retrieved sources, with citations |
| Prompt injection and jailbreak attempts | Input guardrails inspect user messages before they reach the model |
| Inappropriate or unsafe model outputs | Output guardrails filter responses before delivery |
| Sensitive data leaking to the model | User messages are optionally anonymized per agent, before reaching storage or the model |
| Unbounded behavior in multi-step agents | Low-code workflow design, automated end-to-end testing, and human escalation |

Grounded Answers With RAG+C

The single largest source of AI risk in production is the model inventing a confident but incorrect answer. Helvia addresses this with a Retrieval Augmented Generation with Citation (RAG+C) toolkit: a knowledge base of indexed sources, a semantic search node that retrieves the relevant ones, and LLM nodes that rewrite queries and write cited answers from the retrieved context.

How those tools come together is the builder's call, with the agent template's workflow as a starting point. The builder connects the sources, tunes how queries are rewritten, adds ranking or filtering, and decides how citations are presented. Grounding is a workflow you shape, not a fixed pipeline.

1. Sources Become Citable Chunks

When a knowledge source is connected to an agent, the platform parses it into self-contained, human-readable segments. Each segment is small enough to serve as a citation and large enough to carry its own meaning, so every answer can point back to the exact passage that produced it.
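As a rough illustration of the idea, the sketch below packs whole paragraphs into segments and gives each one a citation key. It is a minimal sketch, not Helvia's parser: the `Segment` type, the `split_into_segments` function, the `max_chars` limit, and the `source#n` key format are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    segment_id: str  # hypothetical citation key, e.g. "faq.md#2"
    source: str
    text: str


def split_into_segments(source: str, document: str, max_chars: int = 600) -> list[Segment]:
    """Pack whole paragraphs into segments, closing one when max_chars would be exceeded.

    A single paragraph longer than max_chars is kept whole rather than cut mid-thought.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    segments: list[Segment] = []
    buffer = ""
    for paragraph in paragraphs:
        candidate = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if buffer and len(candidate) > max_chars:
            # The buffer is full: close it so the segment stays self-contained.
            segments.append(Segment(f"{source}#{len(segments) + 1}", source, buffer))
            buffer = paragraph
        else:
            buffer = candidate
    if buffer:
        segments.append(Segment(f"{source}#{len(segments) + 1}", source, buffer))
    return segments
```

Splitting on paragraph boundaries rather than at a fixed character count is one simple way to keep each segment readable on its own when it is shown as a citation.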

2. Retrieval Filters the Context

Before any text reaches the language model, a semantic search node shortlists the relevant segments from the connected knowledge base. The LLM only ever sees content selected by retrieval, never the full corpus and never arbitrary internet content.
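The shortlisting step can be pictured as a top-k similarity search with a relevance floor. The sketch below is illustrative only: it uses word counts as a toy stand-in for an embedding model, and `embed`, `cosine`, `shortlist`, `top_k`, and `min_score` are hypothetical names, not part of the Helvia platform.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def shortlist(
    query: str,
    segments: dict[str, str],
    top_k: int = 2,
    min_score: float = 0.1,
) -> list[tuple[str, float]]:
    """Return up to top_k (segment_id, score) pairs that score at least min_score."""
    query_vector = embed(query)
    scored = [(sid, cosine(query_vector, embed(text))) for sid, text in segments.items()]
    relevant = [pair for pair in scored if pair[1] >= min_score]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)[:top_k]
```

The `min_score` floor matters as much as `top_k`: when no segment is relevant, the shortlist comes back empty instead of padding the context with unrelated text.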

3. Selection Is Enforced

The workflow's prompt and selection logic keep the model answering strictly from the retrieved segments. If nothing relevant was retrieved, the agent says so rather than improvising.
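One way to express this rule in code is to build the prompt only from retrieved segments and to skip the model call entirely when retrieval returns nothing. This is a minimal sketch under stated assumptions: `build_grounded_prompt`, `answer`, the `NO_ANSWER` text, and the `call_llm` callable are hypothetical, and the prompt wording is an example rather than Helvia's template.

```python
from typing import Callable, Optional

NO_ANSWER = "I could not find this in the connected sources."


def build_grounded_prompt(question: str, segments: list[tuple[str, str]]) -> Optional[str]:
    """Return a prompt restricted to the retrieved segments, or None if there are none."""
    if not segments:
        return None
    context = "\n".join(f"[{segment_id}] {text}" for segment_id, text in segments)
    return (
        "Answer using only the context below and cite segment ids in square brackets.\n"
        f"If the context does not contain the answer, reply exactly: {NO_ANSWER}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    segments: list[tuple[str, str]],
    call_llm: Callable[[str], str],
) -> str:
    """Answer from retrieved segments, declining when retrieval found nothing."""
    prompt = build_grounded_prompt(question, segments)
    if prompt is None:
        # Nothing relevant was retrieved: decline without calling the model.
        return NO_ANSWER
    return call_llm(prompt)
```

Handling the empty case before the model call means the refusal does not depend on the model following instructions.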

Two additional mechanisms reinforce grounding:

  • Citations on every response: users can see which source produced each answer and verify it

  • Query rewriting: ambiguous or under-specified questions are rewritten before retrieval, so the right context reaches the model
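A builder who wants to verify citations mechanically could add a check like the one below after the answering node. It is a hedged sketch: the `[segment-id]` marker format and the `cited_ids` and `citations_are_valid` helpers are assumptions for illustration, not a documented Helvia feature.

```python
import re

# Assumed marker format: segment ids in square brackets, e.g. "[faq.md#1]".
CITATION = re.compile(r"\[([^\[\]]+)\]")


def cited_ids(answer: str) -> list[str]:
    """Extract every bracketed citation marker from an answer."""
    return CITATION.findall(answer)


def citations_are_valid(answer: str, retrieved_ids: set[str]) -> bool:
    """True when the answer cites at least one segment and only retrieved ones."""
    ids = cited_ids(answer)
    return bool(ids) and all(segment_id in retrieved_ids for segment_id in ids)
```

An answer that fails the check can be regenerated or replaced with a "not found" reply, depending on how the workflow is designed.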

Workflow Guardrails

Guardrails in Helvia are built into the workflow. Place an LLM node upstream of the main LLM to inspect user messages, or downstream to check responses before they reach the user. You define what the LLM flags, blocks, or rewrites.

For an input guardrail, give the upstream LLM node a prompt that inspects the user's message before it reaches the main model. Depending on what the node returns, the workflow:

  • Blocks the message and returns a safe refusal

  • Forwards it untouched

  • Forwards a sanitized version

Common uses include screening for prompt injection patterns, jailbreak attempts, or instructions that conflict with the agent's defined role.
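The three outcomes above can be sketched as a small routing function. Everything here is illustrative: the `Verdict` shape, the action names, the `REFUSAL` text, and `keyword_inspector` (a keyword check standing in for the guard LLM node) are assumptions, not Helvia's node API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

REFUSAL = "Sorry, I can't help with that request."


@dataclass
class Verdict:
    action: str                      # "block", "forward", or "sanitize"
    sanitized: Optional[str] = None  # only used when action == "sanitize"


def apply_input_guardrail(message: str, inspect: Callable[[str], Verdict]) -> tuple[bool, str]:
    """Return (call_main_llm, text): text is the refusal or the message to forward."""
    verdict = inspect(message)
    if verdict.action == "block":
        return False, REFUSAL
    if verdict.action == "sanitize" and verdict.sanitized is not None:
        return True, verdict.sanitized
    if verdict.action == "forward":
        return True, message
    # Unknown or malformed verdict: fail closed rather than pass text through.
    return False, REFUSAL


def keyword_inspector(message: str) -> Verdict:
    """Stand-in for the guard LLM node; a real node would prompt a model instead."""
    if "ignore previous instructions" in message.lower():
        return Verdict("block")
    if re.search(r"</?script>", message, flags=re.IGNORECASE):
        return Verdict("sanitize", re.sub(r"</?script>", "", message, flags=re.IGNORECASE))
    return Verdict("forward")
```

Failing closed on an unrecognized verdict is a design choice in this sketch: if the guard node returns something unexpected, the message is refused instead of reaching the main model unchecked.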

Safe LLM Integration

Every LLM call the platform makes runs through the same controls:

| Control | How It Works |
| --- | --- |
| Encrypted transmission | All traffic to LLM providers is encrypted in transit with TLS |
| Anonymization before invocation | Optional PII detection runs on user messages and replaces sensitive entities before they reach the model |
| Customer-owned accounts | Calls run through your own provider credentials, so data-use terms are governed by your contract |
| Full input and output logging | Every prompt sent and response received is logged at session level and reviewable in Observatory |
| Vetted providers | Each LLM provider is reviewed against recognized standards (ISO 27001, SOC 2) before integration |
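To make "anonymization before invocation" concrete, the sketch below replaces detected entities with placeholders before a message would be sent to a model. It is a simplified illustration, not Helvia's detector: the two regular expressions, the placeholder format, and the `anonymize` function are assumptions, and a production PII detector would cover far more entity types than email addresses and phone numbers.

```python
import re

# Hypothetical patterns for two entity types only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def anonymize(message: str) -> tuple[str, dict[str, str]]:
    """Replace detected entities with numbered placeholders and return the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        message = pattern.sub(replace, message)
    return message, mapping


safe_text, entities = anonymize("Mail jane.doe@example.com or call +30 210 1234567.")
print(safe_text)
```

Only the placeholder text would go to the model. Whether the mapping is kept, and where, is a separate decision that this sketch does not make.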

Testing and Evaluation

Agents can respond differently to the same input on every run, so a single manual test only tells you what happened once. Helvia's testing framework runs synthetic conversations at scale and judges each one against your success criteria, so non-deterministic failures surface as a pass rate instead of slipping through.

Every test involves three roles:

  • Synthetic user: an LLM-powered persona that simulates a real user, following the scenario and goals you define in its prompt

  • Agent under test: the agent being evaluated, talking to the synthetic user as it would to a real one

  • Evaluator: a separate LLM that reads the full transcript and returns a pass-or-fail verdict with an explanation, judged against your success criteria

Each test runs across many sessions, so behavior you would otherwise see only intermittently surfaces as a statistical pattern. Configure and run tests to cover hallucination prevention, scenario handling, tone, adversarial inputs, and any other behavior worth validating before deployment.
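The way repeated sessions turn intermittent failures into a pass rate can be sketched as follows. This is not Helvia's testing framework: `run_test`, `flaky_session` (standing in for the synthetic user and the agent under test), and `require_citation` (standing in for the evaluator LLM) are hypothetical names used only to show the loop.

```python
import random
from typing import Callable

Transcript = list[str]


def run_test(
    run_session: Callable[[random.Random], Transcript],
    evaluate: Callable[[Transcript], tuple[bool, str]],
    sessions: int = 50,
    seed: int = 0,
) -> dict:
    """Run many sessions and summarize the evaluator verdicts as a pass rate."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    failures = []
    for index in range(sessions):
        transcript = run_session(rng)
        passed, explanation = evaluate(transcript)
        if not passed:
            failures.append((index, explanation))
    return {
        "sessions": sessions,
        "pass_rate": (sessions - len(failures)) / sessions,
        "failures": failures,
    }


def flaky_session(rng: random.Random) -> Transcript:
    """Stand-in for synthetic user plus agent: the agent misbehaves about 1 time in 10."""
    reply = "Refunds take 14 days [faq.md#1]." if rng.random() > 0.1 else "Refunds are instant."
    return ["user: How long do refunds take?", f"agent: {reply}"]


def require_citation(transcript: Transcript) -> tuple[bool, str]:
    """Stand-in for the evaluator LLM: pass only if the last agent turn cites a source."""
    cited = "[" in transcript[-1]
    return cited, "cited a source" if cited else "answered without a citation"


report = run_test(flaky_session, require_citation, sessions=200)
print(f"pass rate: {report['pass_rate']:.0%}, failures: {len(report['failures'])}")
```

A single manual run of `flaky_session` would usually look fine. The failure only becomes visible as a rate once many sessions are evaluated, which is the point of running tests at scale.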

Agent Behavior Boundaries

The builder sets several behavior boundaries while constructing the agent; the platform does not apply them automatically.

Personality and role variable

Each agent has a configurable text variable that describes how it should respond and what role it plays. Update it to refine the agent's behavior without touching the rest of the workflow.
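Conceptually, the variable behaves like a placeholder in the prompt that the rest of the workflow leaves alone. The sketch below illustrates that separation with Python's standard `string.Template`. The `personality` variable name, the fixed instruction text, and `render_system_prompt` are assumptions for this example, not Helvia's configuration format.

```python
from string import Template

# The fixed part of the prompt stays the same; only $personality changes.
SYSTEM_PROMPT = Template(
    "$personality\n"
    "Answer only from the provided context and cite the segments you use."
)


def render_system_prompt(personality: str) -> str:
    """Insert the builder-defined role text into the otherwise fixed prompt."""
    return SYSTEM_PROMPT.substitute(personality=personality.strip())


print(render_system_prompt("You are a concise HR assistant for employees."))
```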

Escalation paths

Helvia agents do not escalate automatically. The builder decides when and how a conversation should hand off to a human, and wires those conditions into the workflow as explicit nodes. Common triggers include user intent, sentiment, or repeated unanswered questions.
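The three common triggers can be written as one explicit rule, which is roughly what the builder wires into the workflow as nodes. The sketch is illustrative: `ConversationState`, the intent labels, the sentiment scale, and the thresholds are hypothetical values a builder would choose, not platform defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationState:
    intent: str
    sentiment: float          # hypothetical scale: -1.0 (negative) to 1.0 (positive)
    unanswered_in_a_row: int  # consecutive questions the agent could not answer


def should_escalate(
    state: ConversationState,
    handoff_intents: frozenset = frozenset({"speak_to_human", "complaint"}),
    min_sentiment: float = -0.5,
    max_unanswered: int = 2,
) -> Optional[str]:
    """Return the reason for a hand-off, or None to let the agent continue."""
    if state.intent in handoff_intents:
        return f"intent:{state.intent}"
    if state.sentiment < min_sentiment:
        return "negative_sentiment"
    if state.unanswered_in_a_row >= max_unanswered:
        return "repeated_unanswered"
    return None
```

Returning a reason rather than a bare boolean lets the hand-off carry context to the human who picks up the conversation.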
