# Asymmetric Embedding Configuration

LightRAG keeps embedding behavior symmetric by default. Query/document asymmetric
embedding is enabled only when `EMBEDDING_ASYMMETRIC=true` is explicitly set.

This avoids accidental retrieval changes when prefix variables are present in an
environment but the user did not intentionally enable asymmetric embeddings.

Before enabling asymmetric embeddings for any model, check the model's current
model card or provider documentation. Do not infer the right behavior from the
API binding alone: an `openai`-compatible endpoint can serve instruction-free
models, prefix-based models, or provider-specific models behind the same API
shape.

## Reindexing Requirement

Changing asymmetric embedding settings changes the vectors produced for stored
documents and for future queries. After enabling, disabling, or changing any of
these settings, clear the existing LightRAG data for the workspace and re-index
the source files:

- `EMBEDDING_ASYMMETRIC`
- `EMBEDDING_QUERY_PREFIX`
- `EMBEDDING_DOCUMENT_PREFIX`
- Provider task behavior such as Jina `task`, Gemini `task_type`, or VoyageAI
  `input_type`

Do not reuse an existing vector store across asymmetric embedding configuration
changes. Mixing vectors generated with different query/document behavior can
make retrieval quality unpredictable.
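One way to avoid reusing a stale vector store by accident is to record which settings were active at index time. The following is a minimal Python sketch and is not part of LightRAG. `embedding_fingerprint`, `needs_reindex`, and the `provider_task` key are hypothetical names. The sketch also assumes you store the fingerprint alongside the workspace data yourself.

```python
import hashlib
import json

# Settings from the list above whose change invalidates stored vectors.
# "provider_task" is a hypothetical key standing in for Jina `task`,
# Gemini `task_type`, or VoyageAI `input_type` behavior.
REINDEX_KEYS = (
    "EMBEDDING_ASYMMETRIC",
    "EMBEDDING_QUERY_PREFIX",
    "EMBEDDING_DOCUMENT_PREFIX",
    "provider_task",
)


def embedding_fingerprint(settings: dict) -> str:
    """Return a stable hash of the settings that affect stored vectors."""
    # Missing keys are recorded as None (JSON null), so "unset" differs from "".
    relevant = {key: settings.get(key) for key in REINDEX_KEYS}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reindex(stored_fingerprint: str, settings: dict) -> bool:
    """True when the current settings differ from those used at index time."""
    return stored_fingerprint != embedding_fingerprint(settings)


old = embedding_fingerprint({"EMBEDDING_ASYMMETRIC": "false"})
new_settings = {
    "EMBEDDING_ASYMMETRIC": "true",
    "EMBEDDING_QUERY_PREFIX": "search_query: ",
    "EMBEDDING_DOCUMENT_PREFIX": "search_document: ",
}
print(needs_reindex(old, new_settings))  # True
```

The sketch treats an unset `EMBEDDING_ASYMMETRIC` and `false` as different values on purpose. It errs toward re-indexing rather than silently mixing vectors.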

## Binding Types

LightRAG distinguishes two asymmetric embedding styles:

| Style | Bindings | How asymmetric behavior is applied |
| --- | --- | --- |
| Provider task parameters | `jina`, `gemini`, `voyageai` | LightRAG passes query/document context to the provider-specific `task`, `task_type`, or `input_type` parameter. |
| Text task prefixes | `openai`, `azure_openai`, `ollama` | LightRAG prepends configured text prefixes before calling the embedding API. Use this only when the model card explicitly requires separate query/document prefixes. |

Other server embedding bindings do not currently support
`EMBEDDING_ASYMMETRIC=true`.
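The split in the table above works like a simple lookup from binding name to style. This Python sketch is only an illustration. `AsymmetricStyle` and `asymmetric_style` are hypothetical names and do not reflect how LightRAG organizes its binding code.

```python
from enum import Enum


class AsymmetricStyle(Enum):
    PROVIDER_TASK = "provider_task"  # query/document sent as an API parameter
    TEXT_PREFIX = "text_prefix"      # query/document encoded as a text prefix


# Binding names taken from the table above.
PROVIDER_TASK_BINDINGS = frozenset({"jina", "gemini", "voyageai"})
TEXT_PREFIX_BINDINGS = frozenset({"openai", "azure_openai", "ollama"})


def asymmetric_style(binding: str) -> AsymmetricStyle:
    """Map an EMBEDDING_BINDING value to its asymmetric embedding style."""
    if binding in PROVIDER_TASK_BINDINGS:
        return AsymmetricStyle.PROVIDER_TASK
    if binding in TEXT_PREFIX_BINDINGS:
        return AsymmetricStyle.TEXT_PREFIX
    # Any other binding cannot be combined with EMBEDDING_ASYMMETRIC=true.
    raise ValueError(
        f"EMBEDDING_BINDING={binding!r} does not support EMBEDDING_ASYMMETRIC=true"
    )
```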

## Default: Symmetric Embeddings

When `EMBEDDING_ASYMMETRIC` is unset, LightRAG does not enable asymmetric
embedding behavior, even if prefix variables exist:

```env
# EMBEDDING_ASYMMETRIC is unset
# EMBEDDING_QUERY_PREFIX="search_query: "
# EMBEDDING_DOCUMENT_PREFIX="search_document: "
```

The prefixes are ignored and a warning is logged.

The same is true when the flag is explicitly false:

```env
EMBEDDING_ASYMMETRIC=false
```
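The default behavior can be sketched as follows, assuming the environment is available as a plain dictionary. `asymmetric_enabled` is a hypothetical helper. Its case-insensitive comparison against `true` is also an assumption, because LightRAG's own parser may accept other spellings.

```python
import logging

logger = logging.getLogger("embedding_config_sketch")

PREFIX_VARS = ("EMBEDDING_QUERY_PREFIX", "EMBEDDING_DOCUMENT_PREFIX")


def asymmetric_enabled(env: dict) -> bool:
    """Return True only for an explicit EMBEDDING_ASYMMETRIC=true."""
    # Unset, "false", or anything else keeps symmetric mode (assumed parsing).
    enabled = env.get("EMBEDDING_ASYMMETRIC", "").strip().lower() == "true"
    if not enabled:
        ignored = [name for name in PREFIX_VARS if name in env]
        if ignored:
            # Prefixes are present but unused: warn instead of applying them.
            logger.warning(
                "EMBEDDING_ASYMMETRIC is not true; ignoring %s", ", ".join(ignored)
            )
    return enabled


env = {"EMBEDDING_QUERY_PREFIX": "search_query: "}
print(asymmetric_enabled(env))  # False, and a warning is logged
```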

## Instruction-Free Models: Keep Symmetric

Some embedding models are instruction-free, sometimes described as using
implicit intent. They are trained to handle query/document matching from the raw
text itself and do not require query/document prefixes or provider task
parameters. For these models, do not set `EMBEDDING_ASYMMETRIC=true`; leave it
unset or set it to `false`, and do not configure `EMBEDDING_QUERY_PREFIX` or
`EMBEDDING_DOCUMENT_PREFIX`.

Common examples that should normally stay in symmetric mode:

| Model family | Example model IDs | Notes |
| --- | --- | --- |
| BGE-M3 | `BAAI/bge-m3` | Use plain text input. Do not add `search_query:` / `search_document:` unless the specific serving wrapper's model card says otherwise. |
| OpenAI Text Embedding 3 | `text-embedding-3-small`, `text-embedding-3-large` | The OpenAI embeddings API uses text input plus the model name; it does not expose a query/document task parameter. |
| Mistral Embed | `mistral-embed` | Use the provider's plain embedding input. Do not invent task prefixes. |
| Alibaba GTE base models | `gte-large`, `gte-large-zh` | Base GTE models use plain text for normal retrieval. This does not apply to newer `instruct` variants such as `gte-Qwen2-1.5B-instruct`; check that model card. |
| Jina Embeddings v2 | `jina-embeddings-v2-base-en`, `jina-embeddings-v2-base-zh` | Jina v2 is plain-text input. Jina v3/v4 are different and use the `task` parameter for retrieval tasks. |

If a model is instruction-free, enabling LightRAG's asymmetric mode feeds it input
that differs from what it was trained or documented to expect. That can reduce
retrieval quality even though the server starts successfully.

## Provider Task Parameter Bindings

Use this mode for providers that expose separate query/document embedding tasks.
Do not configure prefix variables for these bindings.

Jina example:

```env
EMBEDDING_BINDING=jina
EMBEDDING_ASYMMETRIC=true
EMBEDDING_MODEL=jina-embeddings-v4
```

Gemini example:

```env
EMBEDDING_BINDING=gemini
EMBEDDING_ASYMMETRIC=true
EMBEDDING_MODEL=gemini-embedding-001
```

VoyageAI example:

```env
EMBEDDING_BINDING=voyageai
EMBEDDING_ASYMMETRIC=true
EMBEDDING_MODEL=voyage-3
```

If `EMBEDDING_QUERY_PREFIX` or `EMBEDDING_DOCUMENT_PREFIX` is also configured
for these bindings, LightRAG logs a warning and ignores the prefixes.
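For the provider task bindings, the query/document distinction becomes one extra request parameter. The values below are the retrieval values each provider documents: Jina `retrieval.query`/`retrieval.passage`, Gemini `RETRIEVAL_QUERY`/`RETRIEVAL_DOCUMENT`, and VoyageAI `query`/`document`. Two details are assumptions for illustration: whether LightRAG sends exactly these values, and the helper name `provider_task_kwargs`.

```python
from typing import Literal

Side = Literal["query", "document"]

# Provider-documented retrieval values; LightRAG's exact mapping is assumed.
PROVIDER_TASK_VALUES = {
    # Jina v3/v4 `task` parameter
    "jina": ("task", {"query": "retrieval.query", "document": "retrieval.passage"}),
    # Gemini `task_type` parameter
    "gemini": ("task_type", {"query": "RETRIEVAL_QUERY", "document": "RETRIEVAL_DOCUMENT"}),
    # VoyageAI `input_type` parameter
    "voyageai": ("input_type", {"query": "query", "document": "document"}),
}


def provider_task_kwargs(binding: str, side: Side) -> dict:
    """Return the extra request parameter for a query or document embedding."""
    param, values = PROVIDER_TASK_VALUES[binding]
    return {param: values[side]}


print(provider_task_kwargs("gemini", "query"))  # {'task_type': 'RETRIEVAL_QUERY'}
```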

## Text Task Prefix Bindings

Use this mode for embedding models that expect task instructions in the input
text, such as models whose card documents prefixes like `search_query:`,
`search_document:`, `query:`, or `passage:`. Do not enable this mode just
because the model is served through `openai`, `azure_openai`, or `ollama`.

Both prefix variables must be explicitly configured:

```env
EMBEDDING_ASYMMETRIC=true
EMBEDDING_QUERY_PREFIX="search_query: "
EMBEDDING_DOCUMENT_PREFIX="search_document: "
```
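In prefix mode, LightRAG prepends the configured strings to each text before the embedding call. The following is a minimal sketch that uses a hypothetical `apply_prefix` helper rather than LightRAG's internal code:

```python
def apply_prefix(texts: list[str], prefix: str) -> list[str]:
    """Prepend a configured text prefix; an empty prefix leaves texts unchanged."""
    return [prefix + text for text in texts]


query_prefix = "search_query: "        # EMBEDDING_QUERY_PREFIX
document_prefix = "search_document: "  # EMBEDDING_DOCUMENT_PREFIX

docs = apply_prefix(["LightRAG builds a knowledge graph."], document_prefix)
queries = apply_prefix(["What does LightRAG build?"], query_prefix)
print(docs[0])     # search_document: LightRAG builds a knowledge graph.
print(queries[0])  # search_query: What does LightRAG build?
```

The quotes around the `.env` values keep the trailing space after the colon. Some `.env` parsers trim unquoted trailing whitespace, which would produce input such as `search_query:What does LightRAG build?`.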

If one side should intentionally have no prefix, use the sentinel `NO_PREFIX`:

```env
EMBEDDING_ASYMMETRIC=true
EMBEDDING_QUERY_PREFIX="search_query: "
EMBEDDING_DOCUMENT_PREFIX=NO_PREFIX
```

`NO_PREFIX` is converted to an empty string internally. It is different from an
unset variable: it means the side was reviewed and intentionally left without a
prefix.

At least one side must have a non-empty prefix. This is invalid:

```env
EMBEDDING_ASYMMETRIC=true
EMBEDDING_QUERY_PREFIX=NO_PREFIX
EMBEDDING_DOCUMENT_PREFIX=NO_PREFIX
```
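A small validator can express these rules: both variables are required, `NO_PREFIX` maps to an empty string, and at least one side must be non-empty. This is a hypothetical sketch, not LightRAG's startup code. It also rejects a literal empty value, matching the rule in the next section.

```python
NO_PREFIX = "NO_PREFIX"
QUERY_VAR = "EMBEDDING_QUERY_PREFIX"
DOCUMENT_VAR = "EMBEDDING_DOCUMENT_PREFIX"


def resolve_prefix(env: dict, name: str) -> str:
    """Read one prefix variable in prefix mode; NO_PREFIX becomes ''."""
    if name not in env:
        raise ValueError(f"{name} is missing; set a real prefix or {NO_PREFIX}")
    raw = env[name]
    if raw == "":
        # A literal empty value is ambiguous, so it is rejected outright.
        raise ValueError(f"{name} is empty; use {NO_PREFIX} instead")
    return "" if raw == NO_PREFIX else raw


def resolve_prefix_pair(env: dict) -> tuple[str, str]:
    """Return (query_prefix, document_prefix) or raise on an invalid setup."""
    query = resolve_prefix(env, QUERY_VAR)
    document = resolve_prefix(env, DOCUMENT_VAR)
    if not query and not document:
        # Both sides NO_PREFIX: asymmetric mode would change nothing.
        raise ValueError("at least one prefix must be non-empty")
    return query, document


print(resolve_prefix_pair({QUERY_VAR: "search_query: ", DOCUMENT_VAR: NO_PREFIX}))
# ('search_query: ', '')
```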

## Invalid Empty Prefixes

Do not use an empty environment value for an intentional empty prefix:

```env
EMBEDDING_DOCUMENT_PREFIX=
```

Use `NO_PREFIX` instead. Empty values are rejected because shell, `.env`, and
Docker Compose handling can make empty strings indistinguishable from accidental
missing configuration.

## Validation Summary

| Configuration | Result |
| --- | --- |
| `EMBEDDING_ASYMMETRIC` unset | Symmetric mode; prefixes ignored with a warning. |
| `EMBEDDING_ASYMMETRIC=false` | Symmetric mode; prefixes ignored with a warning. |
| Instruction-free model such as `BAAI/bge-m3`, `text-embedding-3-small`, `mistral-embed`, base GTE, or Jina v2 | Keep symmetric mode; do not configure prefixes or provider tasks unless the model card says to. |
| `EMBEDDING_ASYMMETRIC=true` with `jina`/`gemini`/`voyageai` | Provider task mode; prefixes ignored with a warning. |
| `EMBEDDING_ASYMMETRIC=true` with `openai`/`azure_openai`/`ollama` and both prefix variables configured | Prefix mode. |
| Prefix mode with a missing prefix variable | Startup error; use a real prefix or `NO_PREFIX`. |
| Prefix mode with both sides `NO_PREFIX` | Startup error; no asymmetric behavior would occur. |
| Prefix variable set to an empty value | Startup error; use `NO_PREFIX`. |

Any change to these settings still requires clearing the workspace data and
re-indexing the source files, even when both the old and new configurations are
valid. This includes switching between symmetric and asymmetric mode.