# Role-Specific LLM/VLM Configuration Guide

LightRAG supports configuring different LLMs or VLMs for different processing stages. This mechanism is useful when using a lower-cost model for extraction, a stronger model for final answers, or a dedicated vision-language model for multimodal analysis.

## Role Overview

Four roles are currently supported:

| Role | Purpose |
| --- | --- |
| `EXTRACT` | Entity/relation extraction and entity/relation description summarization. |
| `KEYWORD` | Query keyword extraction for high-level / low-level keyword generation before retrieval. |
| `QUERY` | Final QA, regular queries, bypass queries, and the query path of the Ollama-compatible API. |
| `VLM` | Multimodal analysis stage for VLM analysis of images, tables, formulas, and similar content. |

If a role has no dedicated configuration, LightRAG uses the base `LLM_*` configuration.

## Base LLM Configuration

The base configuration defines the default LLM provider, model, service endpoint, authentication information, and concurrency control:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

# Default timeout for all LLM requests
LLM_TIMEOUT=180

# Default maximum concurrency for all LLM calls
MAX_ASYNC=4
```

Common fields:

| Variable | Description |
| --- | --- |
| `LLM_BINDING` | Base LLM provider. Supported values are `openai`, `ollama`, `lollms`, `azure_openai`, `bedrock`, and `gemini`. |
| `LLM_MODEL` | Base model name. For Azure OpenAI, this is usually the deployment name. |
| `LLM_BINDING_HOST` | Base provider endpoint. For SDK default endpoints, use the corresponding sentinel, such as `DEFAULT_GEMINI_ENDPOINT` or `DEFAULT_BEDROCK_ENDPOINT`. |
| `LLM_BINDING_API_KEY` | Base API key. Bedrock does not use this field. |
| `LLM_TIMEOUT` | Base LLM timeout. A role inherits it when no role timeout is set. |
| `MAX_ASYNC` | Base maximum LLM concurrency. A role inherits it when `{ROLE}_MAX_ASYNC_LLM` is not set. |

## Role Override Variables

Each role can override the binding, model, endpoint, API key, concurrency, and timeout:

```env
QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_query_api_key
QUERY_MAX_ASYNC_LLM=2
QUERY_LLM_TIMEOUT=240
```

Variable format:

| Variable | Description |
| --- | --- |
| `{ROLE}_LLM_BINDING` | Overrides the role provider. `ROLE` can be `EXTRACT`, `KEYWORD`, `QUERY`, or `VLM`. |
| `{ROLE}_LLM_MODEL` | Overrides the role model name. |
| `{ROLE}_LLM_BINDING_HOST` | Overrides the role endpoint. |
| `{ROLE}_LLM_BINDING_API_KEY` | Overrides the role API key. Bedrock does not support it. |
| `{ROLE}_MAX_ASYNC_LLM` | Overrides the role maximum concurrency. Inherits `MAX_ASYNC` when unset. |
| `{ROLE}_LLM_TIMEOUT` | Overrides the role timeout. Inherits `LLM_TIMEOUT` when unset. |

## Provider Option Overrides

Provider-specific options use the following format:

```env
{ROLE}_{PROVIDER_PREFIX}_{FIELD}
```

Examples:

```env
# Override only the OpenAI reasoning effort for the QUERY role
QUERY_OPENAI_LLM_REASONING_EFFORT=medium

# Override only Bedrock generation parameters for the EXTRACT role
EXTRACT_BEDROCK_LLM_TEMPERATURE=0.0
EXTRACT_BEDROCK_LLM_MAX_TOKENS=2048

# Override only Gemini generation parameters for the VLM role
VLM_GEMINI_LLM_MAX_OUTPUT_TOKENS=4096
VLM_GEMINI_LLM_TEMPERATURE=0.2
```

Common provider prefixes:

| Provider | Base option prefix | Role option example |
| --- | --- | --- |
| `openai` / `azure_openai` | `OPENAI_LLM_*` | `QUERY_OPENAI_LLM_REASONING_EFFORT` |
| `ollama` | `OLLAMA_LLM_*` | `EXTRACT_OLLAMA_LLM_NUM_PREDICT` |
| `lollms` | Uses the Ollama-compatible option set | `QUERY_OLLAMA_LLM_TEMPERATURE` |
| `bedrock` | `BEDROCK_LLM_*` | `EXTRACT_BEDROCK_LLM_MAX_TOKENS` |
| `gemini` | `GEMINI_LLM_*` | `VLM_GEMINI_LLM_THINKING_CONFIG` |

## Inheritance Rules

### Overrides Within the Same Provider

If a role does not set `{ROLE}_LLM_BINDING`, or sets it to the same value as the base `LLM_BINDING`, the role inherits the base configuration:

- Inherits `LLM_MODEL` when `{ROLE}_LLM_MODEL` is not set.
- Inherits `LLM_BINDING_HOST` when `{ROLE}_LLM_BINDING_HOST` is not set.
- Inherits `LLM_BINDING_API_KEY` when `{ROLE}_LLM_BINDING_API_KEY` is not set.
- Inherits `LLM_TIMEOUT` when `{ROLE}_LLM_TIMEOUT` is not set.
- Inherits `MAX_ASYNC` when `{ROLE}_MAX_ASYNC_LLM` is not set.
- Provider options first inherit the base provider options, then apply role-specific provider options.

Therefore, when you only want to change the model within the same provider, you only need to set the model name:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal

# QUERY inherits host, API key, timeout, concurrency, and OPENAI_LLM_REASONING_EFFORT
QUERY_LLM_MODEL=gpt-5
```

### Cross-Provider Overrides

If a role's `{ROLE}_LLM_BINDING` differs from the base `LLM_BINDING`, it is a cross-provider configuration. The current rules are:

- `{ROLE}_LLM_MODEL` must be set.
- Non-Bedrock providers must set `{ROLE}_LLM_BINDING_API_KEY`.
- If `{ROLE}_LLM_BINDING_HOST` is not set, LightRAG tries to use that provider's default host.
- Provider options do not inherit base provider options. They start empty and only apply role-specific provider options.

Example: use Ollama as the base for local extraction, then use OpenAI for final answers:

```env
LLM_BINDING=ollama
LLM_MODEL=qwen3.5:9b
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_LLM_NUM_CTX=32768

QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5-mini
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_openai_api_key
QUERY_OPENAI_LLM_REASONING_EFFORT=minimal
```

For cross-provider configurations, explicitly setting `{ROLE}_LLM_BINDING_HOST` is recommended to avoid confusion between the default host and the base provider endpoint.

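
The same-provider and cross-provider rules above can be sketched as a small resolver. This is only an illustrative sketch, not LightRAG's actual implementation: `resolve_role`, `_options`, and `OPTION_PREFIX` are hypothetical names, option fields are lowercased purely for readability, and the sketch assumes that timeout and concurrency always fall back to the base values, as the override table suggests. A `host` of `None` stands in for "use the provider's default host".

```python
from typing import Mapping

# Option prefixes copied from the provider prefix table in this guide.
OPTION_PREFIX = {
    "openai": "OPENAI_LLM_",
    "azure_openai": "OPENAI_LLM_",
    "ollama": "OLLAMA_LLM_",
    "lollms": "OLLAMA_LLM_",
    "bedrock": "BEDROCK_LLM_",
    "gemini": "GEMINI_LLM_",
}


def _options(env: Mapping[str, str], prefix: str) -> dict:
    """Collect `{prefix}{FIELD}` variables into a {field: value} dict."""
    return {k[len(prefix):].lower(): v for k, v in env.items() if k.startswith(prefix)}


def resolve_role(env: Mapping[str, str], role: str) -> dict:
    """Hypothetical helper: compute the effective settings for one role."""
    base_binding = env["LLM_BINDING"]
    binding = env.get(f"{role}_LLM_BINDING", base_binding)
    same_provider = binding == base_binding

    def pick(suffix: str):
        # The role value wins; the base value is a fallback only within the same provider.
        fallback = env.get(f"LLM_{suffix}") if same_provider else None
        return env.get(f"{role}_LLM_{suffix}", fallback)

    model, host, api_key = pick("MODEL"), pick("BINDING_HOST"), pick("BINDING_API_KEY")

    if not same_provider:
        if model is None:
            raise ValueError(f"{role}_LLM_MODEL must be set for a cross-provider role")
        if binding != "bedrock" and api_key is None:
            raise ValueError(f"{role}_LLM_BINDING_API_KEY must be set for a cross-provider role")

    # Same provider: start from the base options. Cross-provider: start empty.
    prefix = OPTION_PREFIX[binding]
    options = _options(env, prefix) if same_provider else {}
    options.update(_options(env, f"{role}_{prefix}"))

    return {
        "binding": binding,
        "model": model,
        "host": host,  # None means "use the provider's default host" in this sketch
        "api_key": api_key,
        "timeout": env.get(f"{role}_LLM_TIMEOUT", env.get("LLM_TIMEOUT")),
        "max_async": env.get(f"{role}_MAX_ASYNC_LLM", env.get("MAX_ASYNC")),
        "options": options,
    }


env = {
    "LLM_BINDING": "ollama",
    "LLM_MODEL": "qwen3.5:9b",
    "LLM_BINDING_HOST": "http://localhost:11434",
    "OLLAMA_LLM_NUM_CTX": "32768",
    "QUERY_LLM_BINDING": "openai",
    "QUERY_LLM_MODEL": "gpt-5-mini",
    "QUERY_LLM_BINDING_API_KEY": "your_openai_api_key",
    "QUERY_OPENAI_LLM_REASONING_EFFORT": "minimal",
}
query = resolve_role(env, "QUERY")      # cross-provider: options start empty
extract = resolve_role(env, "EXTRACT")  # no overrides: inherits the base
```

In this sketch, `query["options"]` contains only the role-specific `reasoning_effort`, while `extract` picks up the base model, host, and `num_ctx` unchanged.
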
### Bedrock Authentication Rules

Bedrock does not use `LLM_BINDING_API_KEY` and does not support `{ROLE}_LLM_BINDING_API_KEY`. Available authentication methods are:

- Global SigV4: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, and `AWS_REGION`.
- Role-level SigV4: `{ROLE}_AWS_ACCESS_KEY_ID`, `{ROLE}_AWS_SECRET_ACCESS_KEY`, `{ROLE}_AWS_SESSION_TOKEN`, and `{ROLE}_AWS_REGION`.
- Process-level bearer token: `AWS_BEARER_TOKEN_BEDROCK`. This is an AWS SDK process-level setting and cannot be overridden per role.

Role-level Bedrock example:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key

EXTRACT_LLM_BINDING=bedrock
EXTRACT_LLM_MODEL=us.amazon.nova-lite-v1:0
EXTRACT_LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT
EXTRACT_AWS_REGION=us-west-2
EXTRACT_AWS_ACCESS_KEY_ID=your_extract_access_key
EXTRACT_AWS_SECRET_ACCESS_KEY=your_extract_secret_key
EXTRACT_AWS_SESSION_TOKEN=your_optional_session_token
EXTRACT_BEDROCK_LLM_TEMPERATURE=0.0
EXTRACT_BEDROCK_LLM_MAX_TOKENS=2048
```

## Provider Behavior Matrix

| Provider | Role-level host/base_url | Role-level API key | Authentication limitations |
| --- | --- | --- | --- |
| `openai` | Supported, passed to the OpenAI-compatible client through `{ROLE}_LLM_BINDING_HOST`. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base `LLM_BINDING_API_KEY`. | Currently mainly API key / Bearer mode. |
| `ollama` | Supported, passed to the Ollama client through `{ROLE}_LLM_BINDING_HOST`. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base key. If no key reaches the lower layer, it falls back to `OLLAMA_API_KEY`. | Bearer header. |
| `lollms` | Supported, using `{ROLE}_LLM_BINDING_HOST` as `base_url`. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base key. | Bearer header. |
| `azure_openai` | Supported, using `{ROLE}_LLM_BINDING_HOST` as the Azure endpoint. | Supports `{ROLE}_LLM_BINDING_API_KEY`; when unset within the same provider, it inherits the base key and may also fall back to `AZURE_OPENAI_API_KEY`. | `AZURE_OPENAI_API_VERSION` is a global environment variable and does not support role-level overrides. |
| `bedrock` | Supported, using `{ROLE}_LLM_BINDING_HOST` as `endpoint_url`; `DEFAULT_BEDROCK_ENDPOINT` means letting the AWS SDK choose. | Generic API keys are not supported. | Uses global or role-level SigV4. `AWS_BEARER_TOKEN_BEDROCK` is process-level and cannot be overridden per role. |
| `gemini` | Supported, passed to the Google GenAI client through `{ROLE}_LLM_BINDING_HOST`; `DEFAULT_GEMINI_ENDPOINT` means using the SDK default endpoint. | AI Studio mode supports `{ROLE}_LLM_BINDING_API_KEY`. | Vertex AI is controlled by `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`; all are process-level settings. |

## Recommended Configuration Patterns

### 1. Same Provider, Only Change the Model

Suitable when using the same OpenAI key and endpoint, but using a stronger model for final answers:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal

QUERY_LLM_MODEL=gpt-5
QUERY_MAX_ASYNC_LLM=2
```

`QUERY` inherits the base host, API key, and `OPENAI_LLM_REASONING_EFFORT`.

### 2. Same Provider, Change the Model and Tune Options

Suitable when the base model is used for extraction and final answers use a higher reasoning effort:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key
OPENAI_LLM_REASONING_EFFORT=minimal
OPENAI_LLM_MAX_COMPLETION_TOKENS=4096

QUERY_LLM_MODEL=gpt-5
QUERY_OPENAI_LLM_REASONING_EFFORT=medium
QUERY_OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
QUERY_LLM_TIMEOUT=240
```

### 3. Same Provider with Different Endpoints and API Keys

Suitable when all roles use the `openai` binding, but some roles access the official OpenAI API while others access a local vLLM, SGLang, OpenRouter, or another OpenAI-compatible endpoint.

In the example below:

- `EXTRACT` uses the official OpenAI `gpt-5-mini`.
- `QUERY` uses the official OpenAI `gpt-5.4` with a separate OpenAI key.
- `KEYWORD` uses `Qwen3.5-35B-A3B` deployed by local vLLM.

```env
###########################################################################
# Base LLM fallback. Keep it aligned with EXTRACT so unspecified roles still
# have a valid OpenAI configuration.
###########################################################################
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_extract_openai_api_key
LLM_TIMEOUT=180
MAX_ASYNC=4

###########################################################################
# IMPORTANT:
# Do not set global OPENAI_LLM_REASONING_EFFORT here if any same-provider role
# points to a local OpenAI-compatible server that does not support it.
# Use role-specific OPENAI options instead.
###########################################################################
# OPENAI_LLM_REASONING_EFFORT=none

###########################################################################
# EXTRACT: OpenAI official API, gpt-5-mini
###########################################################################
EXTRACT_LLM_BINDING=openai
EXTRACT_LLM_MODEL=gpt-5-mini
EXTRACT_LLM_BINDING_HOST=https://api.openai.com/v1
EXTRACT_LLM_BINDING_API_KEY=your_extract_openai_api_key
EXTRACT_OPENAI_LLM_REASONING_EFFORT=low
EXTRACT_OPENAI_LLM_MAX_COMPLETION_TOKENS=4096
EXTRACT_MAX_ASYNC_LLM=4
EXTRACT_LLM_TIMEOUT=180

###########################################################################
# QUERY: OpenAI official API, gpt-5.4, separate API key
###########################################################################
QUERY_LLM_BINDING=openai
QUERY_LLM_MODEL=gpt-5.4
QUERY_LLM_BINDING_HOST=https://api.openai.com/v1
QUERY_LLM_BINDING_API_KEY=your_query_openai_api_key
QUERY_OPENAI_LLM_REASONING_EFFORT=medium
QUERY_OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
QUERY_MAX_ASYNC_LLM=2
QUERY_LLM_TIMEOUT=240

###########################################################################
# KEYWORD: local vLLM OpenAI-compatible endpoint, Qwen3.5-35B-A3B
###########################################################################
KEYWORD_LLM_BINDING=openai
KEYWORD_LLM_MODEL=Qwen3.5-35B-A3B
KEYWORD_LLM_BINDING_HOST=http://localhost:8000/v1
# If vLLM was started with --api-key, use the same value here.
# If vLLM has no auth, still set a non-empty dummy value to avoid falling
# back to the official OpenAI key.
KEYWORD_LLM_BINDING_API_KEY=local-vllm-api-key
KEYWORD_OPENAI_LLM_MAX_TOKENS=2048
# Optional for Qwen-style models served by vLLM when you want to disable thinking.
KEYWORD_OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'
KEYWORD_MAX_ASYNC_LLM=4
KEYWORD_LLM_TIMEOUT=180
```

This pattern is not cross-provider because all three roles use the `openai` binding.
LightRAG passes each role's `*_LLM_BINDING_HOST` and `*_LLM_BINDING_API_KEY` to the OpenAI-compatible client separately.

Note: provider options within the same provider inherit the base `OPENAI_LLM_*`. If the local vLLM server does not support official OpenAI parameters such as `reasoning_effort`, do not set the global `OPENAI_LLM_REASONING_EFFORT`; use role-level variables such as `EXTRACT_OPENAI_LLM_REASONING_EFFORT` and `QUERY_OPENAI_LLM_REASONING_EFFORT` instead.

### 4. One Role Crosses Provider

Suitable when the base uses an official OpenAI model and only keyword extraction uses local Ollama:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key
OPENAI_LLM_REASONING_EFFORT=medium

KEYWORD_LLM_BINDING=ollama
KEYWORD_LLM_MODEL=qwen3.5:9b
KEYWORD_LLM_BINDING_HOST=http://localhost:11434
KEYWORD_LLM_BINDING_API_KEY=ollama-local-key
KEYWORD_OLLAMA_LLM_NUM_CTX=32768
```

For cross-provider configurations, Ollama options do not inherit OpenAI options. For local Ollama, `KEYWORD_LLM_BINDING_API_KEY` can usually use a placeholder value; the current cross-provider validation requires non-Bedrock roles to explicitly provide a role-level API key.

### 5. Specify a Dedicated Multimodal Model for VLM

Suitable when text tasks use a cheaper model and multimodal analysis uses a vision-language model:

```env
VLM_PROCESS_ENABLE=true

LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

VLM_LLM_BINDING=openai
VLM_LLM_MODEL=gpt-4o
VLM_OPENAI_LLM_MAX_TOKENS=4096
VLM_MAX_ASYNC_LLM=2
VLM_LLM_TIMEOUT=240
```

If VLM uses the same provider and key, `VLM_LLM_BINDING_HOST` and `VLM_LLM_BINDING_API_KEY` can be omitted.

`VLM_PROCESS_ENABLE` is the master switch for multimodal analysis. When `false`, the pipeline emits a warning and skips every multimodal item without invoking the VLM.
When `true`, the effective VLM binding (`VLM_LLM_BINDING` if set, otherwise `LLM_BINDING`) must support image inputs. The following providers are vision-capable: `openai`, `azure_openai`, `gemini`, `bedrock`, `ollama`, `anthropic`. `lollms` is rejected at startup because it cannot accept image inputs.

### 6. Bedrock Role-Level SigV4 Credentials

Suitable when only one role accesses Bedrock and uses independent IAM/STS credentials:

```env
LLM_BINDING=openai
LLM_MODEL=gpt-5-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_openai_api_key

QUERY_LLM_BINDING=bedrock
QUERY_LLM_MODEL=us.amazon.nova-lite-v1:0
QUERY_LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT
QUERY_AWS_REGION=us-east-1
QUERY_AWS_ACCESS_KEY_ID=your_query_access_key
QUERY_AWS_SECRET_ACCESS_KEY=your_query_secret_key
QUERY_AWS_SESSION_TOKEN=your_optional_session_token
QUERY_BEDROCK_LLM_MAX_TOKENS=4096
QUERY_BEDROCK_LLM_TEMPERATURE=0.2
```

Do not set `QUERY_LLM_BINDING_API_KEY`; Bedrock rejects that configuration.

## Caveats

- Within the same provider, provider options such as `OPENAI_LLM_REASONING_EFFORT`, `OPENAI_LLM_MAX_TOKENS`, `OLLAMA_LLM_NUM_CTX`, and `GEMINI_LLM_THINKING_CONFIG` are inherited automatically.
- There is currently no clean role-level way to unset an inherited provider option. If a model in a same-provider role does not support a base option, explicitly override that option for the role with a supported value, or configure the role as cross-provider and set only the role-specific provider options it supports.
- `AZURE_OPENAI_DEPLOYMENT` and `AZURE_OPENAI_API_VERSION` for `azure_openai` are global environment variables. If `AZURE_OPENAI_DEPLOYMENT` is set, it may take precedence over the role model name.
- Gemini Vertex AI mode is controlled by process-level Google environment variables. Within a single LightRAG process, it is not possible for some roles to use Vertex AI while others use AI Studio API keys.
- In Docker/Compose, `LLM_BINDING_HOST` usually needs to use a container-reachable address such as `host.docker.internal`; role-level hosts follow the same principle.
- Restart LightRAG Server after modifying `.env`. Some IDE terminals preload `.env`, so opening a new terminal session is recommended to confirm that environment variables take effect.
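
To check the last caveat in practice, one option is to print the base and role-level variables that a fresh process actually sees. The snippet below is a minimal sketch and not part of LightRAG: `visible_llm_vars` is a hypothetical helper, it covers only the base fields and role prefixes named in this guide, and it masks values whose names end in the key and token suffixes used above.

```python
import os
import re

# Role prefixes and base fields as named in this guide.
ROLE_VAR = re.compile(r"^(EXTRACT|KEYWORD|QUERY|VLM)_")
BASE_VARS = {
    "LLM_BINDING", "LLM_MODEL", "LLM_BINDING_HOST",
    "LLM_BINDING_API_KEY", "LLM_TIMEOUT", "MAX_ASYNC",
}
# Suffix match, so that the KEYWORD_ role prefix is not mistaken for a secret.
SECRET_SUFFIXES = ("_API_KEY", "_ACCESS_KEY_ID", "_SECRET_ACCESS_KEY", "_SESSION_TOKEN")


def visible_llm_vars(environ=os.environ) -> dict:
    """Hypothetical helper: list base and role-level LLM variables, masking secrets."""
    found = {}
    for name, value in environ.items():
        if name in BASE_VARS or ROLE_VAR.match(name):
            found[name] = "***" if name.endswith(SECRET_SUFFIXES) else value
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, value in visible_llm_vars().items():
        print(f"{name}={value}")
```

Running it from the same terminal that launches the server shows whether a stale, preloaded value is still shadowing an edited `.env` entry.
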