The Privacy Layer for your AI Stack.

Sanitize prompts in real time with our open-source SDK. Run PII detection locally via WebAssembly, or deploy the Sidecar to your VPC for centralized governance.

import { createAnonymizer } from 'rehydra';

// Quantized NER model with semantic enrichment enabled
const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  semantic: { enabled: true }
});

// First run downloads and caches the model; later runs load from the local cache
await anonymizer.initialize();

const result = await anonymizer.anonymize(
  'Hello John Smith from Acme Corp in Berlin!'
);

// Anonymized output:
// Hello <PII type="PERSON" gender="male" id="1" /> from <PII type="ORG" id="2" /> in <PII type="LOCATION" scope="city" country="Germany" id="3" />

Frequently Asked Questions

Does the SDK require an internet connection?

An internet connection is required only on the first run, to download the NER model and semantic datasets (if enabled) from the Hugging Face Hub. Once downloaded, these files are cached locally:

- Node.js: in standard cache directories (e.g., ~/.cache/rehydra on Linux).
- Browser: in the Origin Private File System (OPFS) for large model files, with metadata in IndexedDB.

Can I detect custom identifiers specific to my domain (e.g., Order Numbers)?

You can define custom regex-based recognizers using createCustomIdRecognizer. You provide a pattern, a name, and a PII type (usually CASE_ID or CUSTOMER_ID). Example:
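A minimal sketch of such a recognizer. createCustomIdRecognizer and the CUSTOMER_ID type come from the answer above; the exact argument shape, the recognizers registration option, and the ORD-XXXXXXXX order-number format are assumptions for illustration:

import { createAnonymizer, createCustomIdRecognizer } from 'rehydra';

// Hypothetical order-number format: "ORD-" followed by 8 digits
const orderRecognizer = createCustomIdRecognizer({
  name: 'order_number',
  pattern: /ORD-\d{8}/g,
  type: 'CUSTOMER_ID'
});

// Assumption: custom recognizers are registered via a `recognizers` option
const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  recognizers: [orderRecognizer]
});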

Is the PII mapping data secure?

Yes. The PII map (which links placeholder IDs to original values) is encrypted using AES-256-GCM.

- Development: `InMemoryKeyProvider` generates a random key on startup (not suitable for persistence).
- Production: use `ConfigKeyProvider` to inject a secure, persistent 32-byte key from your environment variables or secrets manager.
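A hedged sketch of production key wiring. `ConfigKeyProvider` and the 32-byte requirement come from the answer above; the environment-variable name, the hex encoding, and the `keyProvider` option are assumptions:

import { createAnonymizer, ConfigKeyProvider } from 'rehydra';

// Assumption: the 32-byte key is supplied as 64 hex characters in an env var
const key = Buffer.from(process.env.REHYDRA_PII_KEY ?? '', 'hex');

const anonymizer = createAnonymizer({
  ner: { mode: 'quantized' },
  // Assumption: key providers are injected via a `keyProvider` option
  keyProvider: new ConfigKeyProvider(key)
});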

What is the difference between the "standard" and "quantized" NER models?

The SDK supports two modes for the ONNX-powered Named Entity Recognition (NER) model:

- Quantized (~280 MB): the default and recommended model. It uses int8 quantization for a smaller footprint and faster inference, with minimal accuracy loss.
- Standard (~1.1 GB): the full-precision FP32 model. It provides the highest accuracy but requires significantly more memory and bandwidth.

Both models are downloaded and cached locally on first initialization.
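Switching to the full-precision model is a one-line configuration change; the mode names come from this answer, and the rest mirrors the snippet at the top of the page:

import { createAnonymizer } from 'rehydra';

const anonymizer = createAnonymizer({
  // 'quantized' (default, ~280 MB, int8) or 'standard' (~1.1 GB, FP32)
  ner: { mode: 'standard' }
});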

How does rehydra improve Machine Translation (MT) context for anonymized entities?

You can enable Semantic Enrichment by setting { semantic: { enabled: true } } in the configuration. This downloads additional datasets (~12 MB) used to infer attributes for detected entities:

- Person names: adds a gender attribute (e.g., <PII type="PERSON" gender="female" .../>) to help MT engines preserve grammatical agreement.
- Locations: adds a scope attribute (city, country, or region) to help select correct prepositions (e.g., "in Berlin" vs. "in Germany").
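For contrast with the city example at the top of the page, a country-level mention would carry scope="country". The tag shape follows the output format shown above; the exact attributes produced for this particular sentence are illustrative:

const result = await anonymizer.anonymize('Our HQ moved to Germany last year.');

// Illustrative output (attribute values depend on the enrichment datasets):
// Our HQ moved to <PII type="LOCATION" scope="country" id="1" /> last year.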
