# Role-Specific LLM/VLM Configuration Guide

LightRAG supports configuring different LLMs or VLMs for different processing stages. This mechanism is useful when using a lower-cost model for extraction, a stronger model for final answers, or a dedicated vision-language model for multimodal analysis.

## Role Overview

Four roles are currently supported:

| Role | Purpose |
| --- | --- |
| `EXTRACT` | Entity/relation extraction and entity/relation description summarization. |
| `KEYWORD` | Query keyword extraction for high-level / low-level keyword generation before retrieval. |
| `QUERY` | Final QA, regular queries, bypass queries, and the query path of the Ollama-compatible API. |
| `VLM` | Multimodal analysis stage for VLM analysis of images, tables, formulas, and similar content. |

If a role has no dedicated configuration, LightRAG uses the base `LLM_*` configuration.

## Base LLM Configuration

The base configuration defines the default LLM provider, model, service endpoint, authentication information, and concurrency control:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

# Default timeout for all LLM requests
LLM_TIMEOUT=180

# Default maximum concurrency for all LLM calls
MAX_ASYNC=4
```

Common fields:

| Variable | Description |
| --- | --- |
| `LLM_BINDING` | Base LLM provider. Supported values are `openai`, `ollama`, `lollms`, `azure_openai`, `bedrock`, and `gemini`. |
| `LLM_MODEL` | Base model name. For Azure OpenAI, this is usually the deployment name. |
| `LLM_BINDING_HOST` | Base provider endpoint. For SDK default endpoints, use the corresponding sentinel, such as `DEFAULT_GEMINI_ENDPOINT` or `DEFAULT_BEDROCK_ENDPOINT`. |
| `LLM_BINDING_API_KEY` | Base API key. Bedrock does not use this field. |
| `LLM_TIMEOUT` | Base LLM timeout. A role inherits it when no role timeout is set. |
| `MAX_ASYNC` | Base maximum LLM concurrency. A role inherits it when `{ROLE}_MAX_ASYNC_LLM` is not set. |

## Role Override Variables

Each role can override the binding, model, endpoint, API key, concurrency, and timeout:

```env
QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_query_api_key
QUERY_MAX_ASYNC_LLM=2
QUERY_LLM_TIMEOUT=240
```

Variable format:

| Variable | Description |
| --- | --- |
| `{ROLE}_LLM_BINDING` | Overrides the role provider. `ROLE` can be `EXTRACT`, `KEYWORD`, `QUERY`, or `VLM`. |
| `{ROLE}_LLM_MODEL` | Overrides the role model name. |
| `{ROLE}_LLM_BINDING_HOST` | Overrides the role endpoint. |
| `{ROLE}_LLM_BINDING_API_KEY` | Overrides the role API key. Bedrock does not support it. |
| `{ROLE}_MAX_ASYNC_LLM` | Overrides the role maximum concurrency. Inherits `MAX_ASYNC` when unset. |
| `{ROLE}_LLM_TIMEOUT` | Overrides the role timeout. Inherits `LLM_TIMEOUT` when unset. |

## Provider Option Overrides

Provider-specific options use the following format:

```env
{ROLE}_{PROVIDER_PREFIX}_{FIELD}
```

Examples:

```env
# Override only the OpenAI reasoning effort for the QUERY role
QUERY_OPENAI_LLM_REASONING_EFFORT=medium

# Override only Bedrock generation parameters for the EXTRACT role
EXTRACT_BEDROCK_LLM_TEMPERATURE=0.0
EXTRACT_BEDROCK_LLM_MAX_TOKENS=2048

# Override only Gemini generation parameters for the VLM role
VLM_GEMINI_LLM_MAX_OUTPUT_TOKENS=4096
VLM_GEMINI_LLM_TEMPERATURE=0.2
```

Common provider prefixes:

| Provider | Base option prefix | Role option example |
| --- | --- | --- |
| `openai` / `azure_openai` | `OPENAI_LLM_*` | `QUERY_OPENAI_LLM_REASONING_EFFORT` |
| `ollama` | `OLLAMA_LLM_*` | `EXTRACT_OLLAMA_LLM_NUM_PREDICT` |
| `lollms` | Uses the Ollama-compatible option set | `QUERY_OLLAMA_LLM_TEMPERATURE` |
| `bedrock` | `BEDROCK_LLM_*` | `EXTRACT_BEDROCK_LLM_MAX_TOKENS` |
| `gemini` | `GEMINI_LLM_*` | `VLM_GEMINI_LLM_THINKING_CONFIG` |

## Inheritance Rules

### Overrides Within the Same Provider

If a role does not set `{ROLE}_LLM_BINDING`, or sets it to the same value as the base `LLM_BINDING`, the role inherits the base configuration:

- Inherits `LLM_MODEL` when `{ROLE}_LLM_MODEL` is not set.
- Inherits `LLM_BINDING_HOST` when `{ROLE}_LLM_BINDING_HOST` is not set.
- Inherits `LLM_BINDING_API_KEY` when `{ROLE}_LLM_BINDING_API_KEY` is not set.
- Inherits `LLM_TIMEOUT` when `{ROLE}_LLM_TIMEOUT` is not set.
- Inherits `MAX_ASYNC` when `{ROLE}_MAX_ASYNC_LLM` is not set.
- Provider options first inherit the base provider options, then apply role-specific provider options.

Therefore, when you only want to change the model within the same provider, you only need to set the model name:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal

# QUERY inherits host, API key, timeout, concurrency, and OPENAI_LLM_REASONING_EFFORT
QUERY_LLM_MODEL=gpt-5
```

### Cross-Provider Overrides

If a role's `{ROLE}_LLM_BINDING` differs from the base `LLM_BINDING`, it is a cross-provider configuration. The current rules are:

- `{ROLE}_LLM_MODEL` must be set.
- Non-Bedrock providers must set `{ROLE}_LLM_BINDING_API_KEY`.
- If `{ROLE}_LLM_BINDING_HOST` is not set, LightRAG tries to use that provider's default host.
- Provider options do not inherit base provider options. They start empty and only apply role-specific provider options.

Example: use Ollama as the base for local extraction, then use OpenAI for final answers:

```env
LLM_BINDING=ollama
LLM_MODEL=qwen3.5:9b
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_LLM_NUM_CTX=32768

QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5-mini
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_openai_api_key
QUERY_OPENAI_LLM_REASONING_EFFORT=minimal
```

For cross-provider configurations, explicitly setting `{ROLE}_LLM_BINDING_HOST` is recommended to avoid confusion between the default host and the base provider endpoint.

### Bedrock Authentication Rules

Bedrock does not use `LLM_BINDING_API_KEY` and does not support `{ROLE}_LLM_BINDING_API_KEY`. Available authentication methods are:

- Global SigV4: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, and `AWS_REGION`.
- Role-level SigV4: `{ROLE}_AWS_ACCESS_KEY_ID`, `{ROLE}_AWS_SECRET_ACCESS_KEY`, `{ROLE}_AWS_SESSION_TOKEN`, and `{ROLE}_AWS_REGION`.
- Process-level bearer token: `AWS_BEARER_TOKEN_BEDROCK`. This is an AWS SDK process-level setting and cannot be overridden per role.

Role-level Bedrock example:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key

EXTRACT_LLM_BINDING=bedrock
EXTRACT_LLM_MODEL=us.amazon.nova-lite-v1:0
EXTRACT_LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT
EXTRACT_AWS_REGION=us-west-2
EXTRACT_AWS_ACCESS_KEY_ID=your_extract_access_key
EXTRACT_AWS_SECRET_ACCESS_KEY=your_extract_secret_key
EXTRACT_AWS_SESSION_TOKEN=your_optional_session_token
EXTRACT_BEDROCK_LLM_TEMPERATURE=0.0
EXTRACT_BEDROCK_LLM_MAX_TOKENS=2048
```

## Provider Behavior Matrix

| Provider | Role-level host/base_url | Role-level API key | Authentication limitations |
| --- | --- | --- | --- |
| `openai` | Supported, passed to the OpenAI-compatible client through `{ROLE}_LLM_BINDING_HOST`. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base `LLM_BINDING_API_KEY`. | Currently mainly API key / Bearer mode. |
| `ollama` | Supported, passed to the Ollama client through `{ROLE}_LLM_BINDING_HOST`. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base key. If no key reaches the lower layer, it falls back to `OLLAMA_API_KEY`. | Bearer header. |
| `lollms` | Supported, using `{ROLE}_LLM_BINDING_HOST` as `base_url`. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base key. | Bearer header. |
| `azure_openai` | Supported, using `{ROLE}_LLM_BINDING_HOST` as the Azure endpoint. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base key and may also fall back to `AZURE_OPENAI_API_KEY`. | `AZURE_OPENAI_API_VERSION` is a global environment variable and does not support role-level overrides. |
| `bedrock` | Supported, using `{ROLE}_LLM_BINDING_HOST` as `endpoint_url`; `DEFAULT_BEDROCK_ENDPOINT` means letting the AWS SDK choose. | Generic API keys are not supported. | Uses global or role-level SigV4. `AWS_BEARER_TOKEN_BEDROCK` is process-level and cannot be overridden per role. |
| `gemini` | Supported, passed to the Google GenAI client through `{ROLE}_LLM_BINDING_HOST`; `DEFAULT_GEMINI_ENDPOINT` means using the SDK default endpoint. | AI Studio mode supports `{ROLE}_LLM_BINDING_API_KEY`. | Vertex AI is controlled by `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`; all are process-level settings. |

## Recommended Configuration Patterns

### 1. Same Provider, Only Change the Model

Suitable when using the same OpenAI key and endpoint, but using a stronger model for final answers:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal

QUERY_LLM_MODEL=gpt-5
QUERY_MAX_ASYNC_LLM=2
```

`QUERY` inherits the base host, API key, and `OPENAI_LLM_REASONING_EFFORT`.

### 2. Same Provider, Change the Model and Tune Options

Suitable when the base model is used for extraction and final answers use a higher reasoning effort:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal
OPENAI_LLM_MAX_COMPLETION_TOKENS=4096

QUERY_LLM_MODEL=gpt-5
QUERY_OPENAI_LLM_REASONING_EFFORT=medium
QUERY_OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
QUERY_LLM_TIMEOUT=240
```

### 3. Same Provider with Different Endpoints and API Keys

Suitable when all roles use the `openai` binding, but some roles access the official OpenAI API while others access a local vLLM, SGLang, OpenRouter, or another OpenAI-compatible endpoint. In the example below:

- `EXTRACT` uses the official OpenAI `gpt-5-mini`.
- `QUERY` uses the official OpenAI `gpt-5.4` with a separate OpenAI key.
- `KEYWORD` uses `Qwen3.5-35B-A3B` deployed by local vLLM.

```env
###########################################################################
# Base LLM fallback. Keep it aligned with EXTRACT so unspecified roles still
# have a valid OpenAI configuration.
###########################################################################
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_extract_openai_api_key
LLM_TIMEOUT=180
MAX_ASYNC=4

###########################################################################
# IMPORTANT:
# Do not set global OPENAI_LLM_REASONING_EFFORT here if any same-provider role
# points to a local OpenAI-compatible server that does not support it.
# Use role-specific OPENAI options instead.
###########################################################################
# OPENAI_LLM_REASONING_EFFORT=none

###########################################################################
# EXTRACT: OpenAI official API, gpt-5-mini
###########################################################################
EXTRACT_LLM_BINDING=openai
EXTRACT_LLM_MODEL=gpt-5-mini
EXTRACT_LLM_BINDING_HOST=https://api.openai.com/v1
EXTRACT_LLM_BINDING_API_KEY=your_extract_openai_api_key
EXTRACT_OPENAI_LLM_REASONING_EFFORT=low
EXTRACT_OPENAI_LLM_MAX_COMPLETION_TOKENS=4096
EXTRACT_MAX_ASYNC_LLM=4
EXTRACT_LLM_TIMEOUT=180

###########################################################################
# QUERY: OpenAI official API, gpt-5.4, separate API key
###########################################################################
QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5.4
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_query_openai_api_key
QUERY_OPENAI_LLM_REASONING_EFFORT=medium
QUERY_OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
QUERY_MAX_ASYNC_LLM=2
QUERY_LLM_TIMEOUT=240

###########################################################################
# KEYWORD: local vLLM OpenAI-compatible endpoint, Qwen3.5-35B-A3B
###########################################################################
KEYWORD_LLM_BINDING=openai
KEYWORD_LLM_MODEL=Qwen3.5-35B-A3B
KEYWORD_LLM_BINDING_HOST=http://localhost:8000/v1
# If vLLM was started with --api-key, use the same value here.
# If vLLM has no auth, still set a non-empty dummy value to avoid falling
# back to the official OpenAI key.
KEYWORD_LLM_BINDING_API_KEY=local-vllm-api-key
KEYWORD_OPENAI_LLM_MAX_TOKENS=2048
# Optional for Qwen-style models served by vLLM when you want to disable thinking.
KEYWORD_OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'
KEYWORD_MAX_ASYNC_LLM=4
KEYWORD_LLM_TIMEOUT=180
```

This pattern is not cross-provider because all three roles use the `openai` binding. LightRAG passes each role's `*_LLM_BINDING_HOST` and `*_LLM_BINDING_API_KEY` to the OpenAI-compatible client separately.

Note: provider options within the same provider inherit the base `OPENAI_LLM_*`. If the local vLLM server does not support official OpenAI parameters such as `reasoning_effort`, do not set the global `OPENAI_LLM_REASONING_EFFORT`; use role-level variables such as `EXTRACT_OPENAI_LLM_REASONING_EFFORT` and `QUERY_OPENAI_LLM_REASONING_EFFORT` instead.

### 4. One Role Crosses Provider

Suitable when the base uses an official OpenAI model and only keyword extraction uses local Ollama:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key
OPENAI_LLM_REASONING_EFFORT=medium

KEYWORD_LLM_BINDING=ollama
KEYWORD_LLM_MODEL=qwen3.5:9b
KEYWORD_LLM_BINDING_HOST=http://localhost:11434
KEYWORD_LLM_BINDING_API_KEY=ollama-local-key
KEYWORD_OLLAMA_LLM_NUM_CTX=32768
```

For cross-provider configurations, Ollama options do not inherit OpenAI options. For local Ollama, `KEYWORD_LLM_BINDING_API_KEY` can usually use a placeholder value; the current cross-provider validation requires non-Bedrock roles to explicitly provide a role-level API key.

### 5. Specify a Dedicated Multimodal Model for VLM

Suitable when text tasks use a cheaper model and multimodal analysis uses a vision-language model:

```env
VLM_PROCESS_ENABLE=true

LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

VLM_LLM_BINDING=openai
VLM_LLM_MODEL=gpt-4o
VLM_OPENAI_LLM_MAX_TOKENS=4096
VLM_MAX_ASYNC_LLM=2
VLM_LLM_TIMEOUT=240
```

If VLM uses the same provider and key, `VLM_LLM_BINDING_HOST` and `VLM_LLM_BINDING_API_KEY` can be omitted.

`VLM_PROCESS_ENABLE` is the master switch for multimodal analysis. When `false`, the pipeline emits a warning and skips every multimodal item without invoking the VLM. When `true`, the effective VLM binding (`VLM_LLM_BINDING` if set, otherwise `LLM_BINDING`) must support image inputs. The following providers are vision-capable: `openai`, `azure_openai`, `gemini`, `bedrock`, `ollama`, `anthropic`. `lollms` is rejected at startup because it cannot accept image inputs.

### 6. Bedrock Role-Level SigV4 Credentials

Suitable when only one role accesses Bedrock and uses independent IAM/STS credentials:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key

QUERY_LLM_BINDING=bedrock
QUERY_LLM_MODEL=us.amazon.nova-lite-v1:0
QUERY_LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT
QUERY_AWS_REGION=us-east-1
QUERY_AWS_ACCESS_KEY_ID=your_query_access_key
QUERY_AWS_SECRET_ACCESS_KEY=your_query_secret_key
QUERY_AWS_SESSION_TOKEN=your_optional_session_token
QUERY_BEDROCK_LLM_MAX_TOKENS=4096
QUERY_BEDROCK_LLM_TEMPERATURE=0.2
```

Do not set `QUERY_LLM_BINDING_API_KEY`; Bedrock rejects that configuration.

## Caveats

- Within the same provider, provider options such as `OPENAI_LLM_REASONING_EFFORT`, `OPENAI_LLM_MAX_TOKENS`, `OLLAMA_LLM_NUM_CTX`, and `GEMINI_LLM_THINKING_CONFIG` are inherited automatically.
- There is currently no clean role-level semantic for "unsetting an inherited provider option". If a model in a same-provider role does not support a base option, explicitly override that option for the role with a supported value, or configure the role as cross-provider and set only the role-specific provider options it supports.
- `AZURE_OPENAI_DEPLOYMENT` and `AZURE_OPENAI_API_VERSION` for `azure_openai` are global environment variables. If `AZURE_OPENAI_DEPLOYMENT` is set, it may take precedence over the role model name.
- Gemini Vertex AI mode is controlled by process-level Google environment variables. In the same LightRAG process, some roles cannot use Vertex AI while others use AI Studio API keys.
- In Docker/Compose, `LLM_BINDING_HOST` usually needs to use a container-reachable address such as `host.docker.internal`; role-level hosts follow the same principle.
- Restart LightRAG Server after modifying `.env`. Some IDE terminals preload `.env`, so opening a new terminal session is recommended to confirm that environment variables take effect.