### All configurable environment variable must show up in this sample file in active or comment out status ### Setup tool `make env-*` uses this file to generate final .env file ### Target environment of this env file: host/compose (compose is for Dokcer or Kubernetes) # LIGHTRAG_RUNTIME_TARGET=host ########################### ### Server Configuration ########################### HOST=0.0.0.0 PORT=9621 WEBUI_TITLE='My Graph KB' WEBUI_DESCRIPTION='Simple and Fast Graph Based RAG System' # WORKERS=2 ### gunicorn worker timeout(as default LLM request timeout if LLM_TIMEOUT is not set) # TIMEOUT=150 # CORS_ORIGINS=http://localhost:3000,http://localhost:8080 ### Path Prefix Configuration (Optional) ### Used to host multiple LightRAG instances on one host behind a reverse ### proxy that routes by site prefix. Leave unset (or empty) for a ### single-instance deployment. ### ### - LIGHTRAG_API_PREFIX : reverse-proxy prefix the upstream proxy strips ### before forwarding (passed to FastAPI as root_path). ### ### See docs/MultiSiteDeployment.md for end-to-end examples. # LIGHTRAG_API_PREFIX=/site01 ### Optional SSL Configuration ### Docker note: generated compose files mount staged certs at /app/data/certs/ inside the container # SSL=true # SSL_CERTFILE=/path/to/cert.pem # SSL_KEYFILE=/path/to/key.pem ### Directory Configuration (defaults to current working directory) ### Default value is: ./inputs ./rag_storage # INPUT_DIR= # WORKING_DIR= ### Tiktoken cache directory (Store cached files in this folder for offline deployment) # TIKTOKEN_CACHE_DIR=/app/data/tiktoken ### Ollama Emulating Model and Tag # OLLAMA_EMULATING_MODEL_NAME=lightrag OLLAMA_EMULATING_MODEL_TAG=latest ### Max nodes for graph retrieval (Ensure WebUI local settings are also updated, which is limited to this value) # MAX_GRAPH_NODES=1000 ### Logging level # LOG_LEVEL=INFO # VERBOSE=False # LOG_MAX_BYTES=10485760 # LOG_BACKUP_COUNT=5 ### Logfile location (defaults to current working directory) # LOG_DIR=/path/to/log/directory # LIGHTRAG_PERFORMANCE_TIMING_LOGS=false ##################################### ### Login and API-Key Configuration ##################################### # AUTH_ACCOUNTS='admin:admin123,user1:{bcrypt}$2b$12$S8Yu.gCbuAbNTJFB.231gegTwr5pgrFxc8H9kXQ4/sduFBHkhM8Ka' # TOKEN_SECRET=lightrag-jwt-default-secret-key! # JWT_ALGORITHM=HS256 # TOKEN_EXPIRE_HOURS=48 # GUEST_TOKEN_EXPIRE_HOURS=24 ### Token Auto-Renewal Configuration (Sliding Window Expiration) ### Enable automatic token renewal to prevent active users from being logged out ### When enabled, tokens will be automatically renewed when remaining time < threshold # TOKEN_AUTO_RENEW=true ### Token renewal threshold (0.0 - 1.0) ### Renew token when remaining time < (total time * threshold) ### Default: 0.5 (renew when 50% time remaining) ### Examples: ### 0.5 = renew when 24h token has 12h left ### 0.25 = renew when 24h token has 6h left # TOKEN_RENEW_THRESHOLD=0.5 ### Note: Token renewal is automatically skipped for certain endpoints: ### - /health: Health check endpoint (no authentication required) ### - /documents/paginated: Frequently polled by client (5-30s interval) ### - /documents/pipeline_status: Very frequently polled by client (2s interval) ### - Rate limit: Minimum 60 seconds between renewals for same user ### API-Key to access LightRAG Server API ### Use this key in HTTP requests with the 'X-API-Key' header ### Example: curl -H "X-API-Key: your-secure-api-key-here" http://localhost:9621/query # LIGHTRAG_API_KEY=your-secure-api-key-here # WHITELIST_PATHS=/health,/api/* ###################################################################################### ### Query Configuration ### ### How to control the context length sent to LLM: ### MAX_ENTITY_TOKENS + MAX_RELATION_TOKENS < MAX_TOTAL_TOKENS ### Chunk_Tokens = MAX_TOTAL_TOKENS - Actual_Entity_Tokens - Actual_Relation_Tokens ###################################################################################### # LLM response cache for query (Not valid for streaming response) # ENABLE_LLM_CACHE=true # COSINE_THRESHOLD=0.2 ### Number of entities or relations retrieved from KG # TOP_K=40 ### Maximum number or chunks for naive vector search # CHUNK_TOP_K=20 ### control the actual entities send to LLM # MAX_ENTITY_TOKENS=6000 ### control the actual relations send to LLM # MAX_RELATION_TOKENS=8000 ### control the maximum tokens send to LLM (include entities, relations and chunks) # MAX_TOTAL_TOKENS=30000 ### chunk selection strategies ### VECTOR: Pick KG chunks by vector similarity, delivered chunks to the LLM aligning more closely with naive retrieval ### WEIGHT: Pick KG chunks by entity and chunk weight, delivered more solely KG related chunks to the LLM ### If reranking is enabled, the impact of chunk selection strategies will be diminished. # KG_CHUNK_PICK_METHOD=VECTOR ### maximum number of related chunks per source entity or relation ### The chunk picker uses this value to determine the total number of chunks selected from KG(knowledge graph) ### Higher values increase re-ranking time # RELATED_CHUNK_NUMBER=5 ######################################################### ### Reranking configuration ### RERANK_BINDING type: null, cohere, jina, aliyun ### For rerank model deployed by vLLM use cohere binding ### If LightRAG deployed in Docker: ### uses host.docker.internal instead of localhost in RERANK_BINDING_HOST ######################################################### RERANK_BINDING=null # RERANK_MODEL=BAAI/bge-reranker-v2-m3 # RERANK_BINDING_HOST=http://localhost:8000/rerank # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### rerank score chunk filter(set to 0.0 to keep all chunks, 0.6 or above if LLM is not strong enough) # MIN_RERANK_SCORE=0.0 ### Enable rerank by default in query params when RERANK_BINDING is not null # RERANK_BY_DEFAULT=True ### Rerank concurrency and timeout (independent from base LLM settings) ### MAX_ASYNC_RERANK falls back to MAX_ASYNC when unset. ### RERANK_TIMEOUT has its own default (30s) since reranker calls are ### typically much shorter than full LLM generation. # MAX_ASYNC_RERANK=4 # RERANK_TIMEOUT=30 ### Cohere AI # # RERANK_MODEL=rerank-v3.5 # # RERANK_BINDING_HOST=https://api.cohere.com/v2/rerank # # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### Cohere rerank chunking configuration (useful for models with token limits like ColBERT) # RERANK_ENABLE_CHUNKING=true # RERANK_MAX_TOKENS_PER_DOC=480 ### Aliyun Dashscope # # RERANK_MODEL=gte-rerank-v2 # # RERANK_BINDING_HOST=https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank # # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### Jina AI # # RERANK_MODEL=jina-reranker-v2-base-multilingual # # RERANK_BINDING_HOST=https://api.jina.ai/v1/rerank # # RERANK_BINDING_API_KEY=your_rerank_api_key_here ### For local deployment Embedding and Reranker with vLLM (OpenAI-compatible API) ### Wizard metadata used to preserve the chosen deployment provider across setup reruns # LIGHTRAG_SETUP_EMBEDDING_PROVIDER=vllm # LIGHTRAG_SETUP_RERANK_PROVIDER=vllm # VLLM_EMBED_MODEL=BAAI/bge-m3 # VLLM_EMBED_PORT=8001 # VLLM_EMBED_DEVICE=cpu ### VLLM_EMBED_API_KEY is passed as --api-key to vLLM; synced to EMBEDDING_BINDING_API_KEY; auto-generated if blank # VLLM_EMBED_API_KEY= # VLLM_EMBED_EXTRA_ARGS= # VLLM_RERANK_MODEL=BAAI/bge-reranker-v2-m3 # VLLM_RERANK_PORT=8000 # VLLM_RERANK_DEVICE=cuda ### VLLM_RERANK_API_KEY is passed as --api-key to vLLM; synced to RERANK_BINDING_API_KEY; auto-generated if blank # VLLM_RERANK_API_KEY= ### Use float16 for GPU mode. CPU mode uses the official vLLM CPU image. # VLLM_USE_CPU=1 ### Set to 1 for CPU mode, unset for GPU mode # CUDA_VISIBLE_DEVICES=-1 ### Set to -1 to disable CUDA (CPU mode), or specific GPU IDs for GPU mode # NVIDIA_VISIBLE_DEVICES=0 ### Optional Docker runtime equivalent; generated GPU compose honors either variable. # VLLM_RERANK_EXTRA_ARGS= ######################################## ### Document processing configuration ######################################## ### Document processing output language: English, Chinese, French, German ... SUMMARY_LANGUAGE=English ### Enable JSON-structured output for entity extraction ### Default behavior: JSON output is disabled when ENTITY_EXTRACTION_USE_JSON is unset ### JSON output incurs higher latency but delivers improved reliability ENTITY_EXTRACTION_USE_JSON=true ### Optional external YAML profile for entity type guidance and extraction examples ### Profiles are loaded from PROMPT_DIR/entity_type (PROMPT_DIR defaults to ./prompts). ### A reference template is shipped at prompts/samples/entity_type_prompt.sample.yml; # ENTITY_TYPE_PROMPT_FILE=entity_type_prompt.yml # PROMPT_DIR= ### Multimodal parsing/analyze integration ### Optional parser routing rules. Example for VLM & MinerU enabled configuration: ### LIGHTRAG_PARSER=*:native-iteP,*:mineru-iteP,*:legacy-R ### Rules may be separated with commas or semicolons. Rules match file suffixes ### (pdf), not full names (*.pdf), and are checked left-to-right. ### If mineru/docling appears in LIGHTRAG_PARSER, the corresponding endpoint ### below must be configured before server startup. ### See docs/FileProcessingPipeline.md for detail LIGHTRAG_PARSER=*:native-teP,*:legacy-R ### Async parser service protocol (optional) ### Configure these when using remote MinerU/Docling async services ### ---- MinerU shared parameters (both local and official modes) ---- ### MinerU API protocol. Choose one active mode. ### - official: MinerU precision API v4. Requires MINERU_API_TOKEN. ### - local: self-hosted mineru-api / mineru-router base URL. MINERU_API_MODE=local # MINERU_POLL_INTERVAL_SECONDS=2 # MINERU_MAX_POLLS=180 # MINERU_LANGUAGE=ch # MINERU_ENABLE_TABLE=true # MINERU_ENABLE_FORMULA=true # MINERU_PAGE_RANGES= ### MINERU_PAGE_RANGES semantics differ by mode: ### - official: forwarded verbatim, supports e.g. "1-3,5,7-9". ### - local: only a single page ("3") or simple range ("1-10"); comma ### lists are rejected at startup. ### When switching modes, double-check this constraint. ### ---- MinerU local-only (MINERU_API_MODE=local) ---- MINERU_LOCAL_ENDPOINT=http://127.0.0.1:8000 ### MINERU_LOCAL_BACKEND: which mineru-api backend handles the parse. ### Accepted values (per mineru-api POST /tasks form parameter `backend`): ### hybrid-auto-engine - pipeline + VLM combo with auto-selected local ### engine (mineru-api's default). GPU required. ### pipeline - CPU-friendly traditional pipeline; no VLM step. ### vlm-auto-engine - VLM with auto-selected local inference engine ### (sglang-engine / vllm-engine if GPU is available); ### requires the matching engine extra preinstalled ### on the mineru-api side, plus model weights. ### We ship `hybrid-auto-engine` -- requires the target mineru-api ### deployment to have a GPU plus the matching inference engine ### (sglang / vllm) and model weights installed. Switch to `pipeline` ### for CPU-only deployments without those dependencies. MINERU_LOCAL_BACKEND=hybrid-auto-engine ### MINERU_LOCAL_PARSE_METHOD: parsing strategy for the pipeline component. ### Accepted values: ### auto - auto-detect embedded text-layer vs OCR per page (default). ### txt - extract text from the embedded text layer only; fastest, ### but yields empty output on scanned PDFs without a text layer. ### ocr - force OCR on every page regardless of text-layer quality; ### slowest, reliable on scanned or low-quality PDFs. ### Only consumed when MINERU_LOCAL_BACKEND is `pipeline` or ### `hybrid-auto-engine` (the pipeline arm of the hybrid pipeline). ### Pure VLM backends (`vlm-auto-engine`, `vlm-http-client`) ignore this ### parameter -- the VLM model handles layout/OCR natively. MINERU_LOCAL_PARSE_METHOD=auto ### MINERU_LOCAL_IMAGE_ANALYSIS: enable VLM image/chart analysis pass for ### better caption an footnote recognition. ### Only consumed by `vlm-auto-engine`, `vlm-http-client`, ### `hybrid-auto-engine`, `hybrid-http-client`. The `pipeline` backend ### silently drops this flag -- its `_process_pipeline` does not accept ### the kwarg, so setting `false` under pipeline does NOT speed parsing ### up; pipeline never invokes the VLM image pass to begin with. ### Disable (`false`) on VLM / hybrid backends to skip the extra VLM ### round, trading image / chart semantic descriptions for faster parsing ### and lower GPU cost. MINERU_LOCAL_IMAGE_ANALYSIS=true # MINERU_LOCAL_START_PAGE_ID=0 # MINERU_LOCAL_END_PAGE_ID=99999 ### ---- MinerU official-only (MINERU_API_MODE=official) ---- # MINERU_API_TOKEN=your-api-key # MINERU_OFFICIAL_ENDPOINT=https://mineru.net # MINERU_MODEL_VERSION=vlm # MINERU_IS_OCR=false ### Force re-upload of file to MinerU on every retry after failure ### Disables caching of result outcomes # LIGHTRAG_FORCE_REPARSE_MINERU=false ### Docling parser (docling-serve v1 / async API). ### ### Endpoint: base URL only — the client appends /v1/convert/file/async, ### /v1/status/poll/{task_id}?wait=, ### /v1/result/{task_id} itself. ### Pipeline shape (pipeline=standard, target_type=zip, ### to_formats=[json,md], image_export_mode=referenced) is fixed in ### code so the sidecar flow stays self-consistent — flipping any of ### these would break the adapter and is therefore not exposed as env. ### ### OCR tunables: ### - DOCLING_DO_OCR: master switch; when false the engine relies only on ### text-layer extraction. ### - DOCLING_FORCE_OCR: when true, OCR every page regardless of text-layer ### quality (slower, useful for scanned PDFs with bad text layers). ### - DOCLING_OCR_ENGINE: explicit engine selection (DEPRECATED in the ### docling-serve OpenAPI but still honored for older deployments). ### - DOCLING_OCR_PRESET: recommended replacement for DOCLING_OCR_ENGINE. ### - DOCLING_OCR_LANG: JSON array (e.g. ["en","zh"]) or comma-separated ### list. Empty (default) lets the OCR engine pick its default. ### - DOCLING_DO_FORMULA_ENRICHMENT: when true, the code-formula model runs ### and `texts[*].label="formula"` items carry LaTeX in `text`. Default ### false because the model may not be present on every deployment; ### adapter falls back to plain-text formulas when disabled. ### ### Polling budget (server-side long-poll; client does NOT add extra sleep): ### - DOCLING_POLL_INTERVAL_SECONDS: ``?wait=N`` value sent to ### /v1/status/poll/{task_id}. Larger N = fewer round trips per parse; ### bound by your reverse-proxy idle timeout. Default 5. ### - DOCLING_MAX_POLLS: max polling rounds before raising TimeoutError. ### Worst-case wall-clock budget ≈ ### DOCLING_POLL_INTERVAL_SECONDS × DOCLING_MAX_POLLS. Default 240 ### (≈ 20 minutes at wait=5s); raise for very large PDFs. ### ### Bundle cache controls: ### - DOCLING_ENGINE_VERSION: recorded in .docling_raw/_manifest.json. ### Mismatch with the recorded value forces a cache miss → re-download. ### Leave empty to skip this check. ### - LIGHTRAG_FORCE_REPARSE_DOCLING: when truthy ("1"/"true"), bypass the ### docling raw cache and re-upload on every parse_docling call. ### - DOCLING_BBOX_ATTRIBUTES: override the doc-level bbox_attributes ### written into .blocks.jsonl meta. Default ### {"origin":"LEFTBOTTOM"} matches docling's default coordinate system. DOCLING_ENDPOINT=http://localhost:5001 DOCLING_DO_OCR=true DOCLING_FORCE_OCR=true DOCLING_DO_FORMULA_ENRICHMENT=false # DOCLING_OCR_ENGINE=auto # DOCLING_OCR_PRESET=auto # DOCLING_OCR_LANG= # DOCLING_POLL_INTERVAL_SECONDS=5 # DOCLING_MAX_POLLS=240 # DOCLING_BBOX_ATTRIBUTES={"origin":"LEFTBOTTOM"} ### Force re-upload of file to Docling on every retry after failure ### Disables caching of result outcomes # LIGHTRAG_FORCE_REPARSE_DOCLING=false ### File upload size limit (in bytes) ### Default: 104857600 (100MB) ### Set to 0 or None for unlimited upload size ### Examples: ### 52428800 = 50MB ### 104857600 = 100MB (default) ### 209715200 = 200MB ### Note: If using Nginx as reverse proxy, also configure client_max_body_size # MAX_UPLOAD_SIZE=104857600 ### Global chunk size, 500~1500 is recommended. ### Chunker inherits the global value here only when its own var is unset. ### Exception: P never inherits CHUNK_SIZE — it uses CHUNK_P_SIZE (default 2000). # CHUNK_SIZE=1200 # CHUNK_OVERLAP_SIZE=100 ### Fixed-token chunker (process_options=F, default) settings ### CHUNK_F_SIZE: per-strategy chunk_token_size override; falls back to CHUNK_SIZE when unset ### CHUNK_F_OVERLAP_SIZE: token overlap; falls back to CHUNK_OVERLAP_SIZE when unset ### CHUNK_F_SPLIT_BY_CHARACTER: optional separator string; pre-segment before token windowing ### CHUNK_F_SPLIT_BY_CHARACTER_ONLY: when true, raise on oversize segment instead of token re-split # CHUNK_F_SIZE=1200 # CHUNK_F_OVERLAP_SIZE=100 # CHUNK_F_SPLIT_BY_CHARACTER= # CHUNK_F_SPLIT_BY_CHARACTER_ONLY=false ### Recursive character chunker (process_options=R) settings ### CHUNK_R_SIZE: per-strategy chunk_token_size override; falls back to CHUNK_SIZE when unset ### CHUNK_R_OVERLAP_SIZE: token overlap between adjacent chunks; falls back to CHUNK_OVERLAP_SIZE when unset ### CHUNK_R_SEPARATORS: JSON array of cascaded separators tried by RecursiveCharacterTextSplitter. ### Default includes CJK sentence-ending punctuation so Chinese / mixed-language ### documents split at semantic boundaries. Order: paragraph (\n\n) > line (\n) > ### Chinese sentence-end (。!?) > Chinese semi-clause (;,) > space > char. ### English ".?!" are intentionally omitted (literal match would split "0.95" / ### "e.g."); the English path falls through space / char as before. # CHUNK_R_SIZE=1200 # CHUNK_R_OVERLAP_SIZE=100 # CHUNK_R_SEPARATORS=["\n\n","\n","。","!","?",";",","," ",""] ### Semantic vector chunker (process_options=V) settings ### CHUNK_V_SIZE: per-strategy chunk_token_size hard cap (oversized pieces are ### re-split via R before being emitted); falls back to CHUNK_SIZE when unset ### CHUNK_V_BREAKPOINT_THRESHOLD_TYPE: percentile | standard_deviation | interquartile | gradient ### CHUNK_V_BREAKPOINT_THRESHOLD_AMOUNT: leave empty to use the LangChain per-type default (e.g. 95 for percentile) ### CHUNK_V_BUFFER_SIZE: number of adjacent sentences combined when computing distances ### CHUNK_V_SENTENCE_SPLIT_REGEX: regex fed to LangChain SemanticChunker for the ### initial sentence split. Default extends the upstream English-only pattern ### with CJK sentence-end punctuation (。?!). Override if you need a ### different language mix. Note: env value is the raw regex string, no JSON ### quoting. # CHUNK_V_SIZE=1200 # CHUNK_V_BREAKPOINT_THRESHOLD_TYPE=percentile # CHUNK_V_BREAKPOINT_THRESHOLD_AMOUNT= # CHUNK_V_BUFFER_SIZE=1 # CHUNK_V_SENTENCE_SPLIT_REGEX=(?<=[.?!])\s+|(?<=[。?!]) ### Paragraph semantic chunker (process_options=P) settings ### CHUNK_P_SIZE: per-strategy chunk_token_size override; defaults to 2000 when unset ### (does NOT fall back to CHUNK_SIZE — paragraph-semantic merging needs more ### headroom than the global default to keep related paragraphs together). ### CHUNK_P_OVERLAP_SIZE: overlap for prose fallback and table-bridge context; ### falls back to CHUNK_OVERLAP_SIZE when unset # CHUNK_P_SIZE=2000 # CHUNK_P_OVERLAP_SIZE=100 ### Number of summary segments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommended) # FORCE_LLM_SUMMARY_ON_MERGE=8 ### Max description token size to trigger LLM summary # SUMMARY_MAX_TOKENS = 1200 ### Recommended LLM summary output length in tokens # SUMMARY_LENGTH_RECOMMENDED=600 ### Maximum context size sent to LLM for description summary # SUMMARY_CONTEXT_SIZE=12000 ### Maximum token size allowed for entity extraction input context # MAX_EXTRACT_INPUT_TOKENS=20480 ### Multimodal surrounding-context budget (per-half token cap for the ### `leading` / `trailing` text injected into VLM and extract prompts). ### Computed at analyze_multimodal entry; the two halves are independent ### so deployments can bias context forward or backward as needed. # SURROUNDING_LEADING_MAX_TOKENS=2000 # SURROUNDING_TRAILING_MAX_TOKENS=2000 ### Per-response cap on total entity+relationship rows/records emitted by the LLM # MAX_EXTRACTION_RECORDS=100 ### Per-response cap on entity rows/objects emitted by the LLM # MAX_EXTRACTION_ENTITIES=40 ### control the maximum chunk_ids stored in vector and graph db # MAX_SOURCE_IDS_PER_ENTITY=300 # MAX_SOURCE_IDS_PER_RELATION=300 ### control chunk_ids limitation method: FIFO, KEEP ### FIFO: First in first out ### KEEP: Keep oldest (less merge action and faster) # SOURCE_IDS_LIMIT_METHOD=FIFO ### Maximum number of file paths stored in entity/relation file_path field ### For displayed only, does not affect query performance # MAX_FILE_PATHS=100 ### PDF decryption password for protected PDF files # PDF_DECRYPT_PASSWORD=your_pdf_password_here ### LLM cache for entity/relation extract is enable by default ### Disabling it will prevent graph reconstruction after document deletion # ENABLE_LLM_CACHE_FOR_EXTRACT=true ######################################## ### Pipeline Concurrency Configuration ######################################## ### Number of parallel processing documents(between 2~10, MAX_ASYNC/3 is recommended) MAX_PARALLEL_INSERT=2 ### Optional per-stage document pipeline concurrency # MAX_PARALLEL_PARSE_NATIVE=5 # MAX_PARALLEL_PARSE_MINERU=1 # MAX_PARALLEL_PARSE_DOCLING=1 # MAX_PARALLEL_ANALYZE=5 ### Optional queue sizes for staged pipeline workers # QUEUE_SIZE_DEFAULT=100 # QUEUE_SIZE_INSERT=4 ### Max concurrency requests for Embedding # EMBEDDING_FUNC_MAX_ASYNC=8 ### Num of chunks send to Embedding in single request (default is 10) EMBEDDING_BATCH_NUM=32 ########################################################################### ### Gloabal LLM Configuration ### LLM_BINDING type: openai, ollama, lollms, azure_openai, bedrock, gemini ### LLM_BINDING_HOST: Service endpoint (left empty if using the provider SDK default endpoint) ### LLM_BINDING_API_KEY: api key ### If LightRAG deployed in Docker: ### uses host.docker.internal instead of localhost in LLM_BINDING_HOST ########################################################################### ### LLM request timeout setting for all llm (0 means no timeout for Ollma) # LLM_TIMEOUT=180 LLM_BINDING=openai LLM_BINDING_HOST=https://api.openai.com/v1 LLM_BINDING_API_KEY=your_api_key LLM_MODEL=gpt-5.4-mini ### Max concurrency requests of LLM MAX_ASYNC=4 ########################################################################### ### Role-specific LLM/VLM overrides ### Available roles: EXTRACT, KEYWORD, QUERY, VLM ### If unset, each role falls back to gloabal LLM configuration above. ### For detail information, refer to: ### docs/RoleSpecificLLMConfiguration.md ### docs/RoleSpecificLLMConfiguration-zh.md ########################################################################### # KEYWORD_LLM_MODEL=gpt-5.4-nano # KEYWORD_MAX_ASYNC_LLM # KEYWORD_LLM_TIMEOUT=180 # KEYWORD_LLM_BINDING=openai # KEYWORD_LLM_BINDING_HOST=https://api.openai.com/v1 # KEYWORD_LLM_BINDING_API_KEY=your_api_key # QUERY_LLM_MODEL=gpt-5.4 # QUERY_MAX_ASYNC_LLM # QUERY_LLM_TIMEOUT=180 # QUERY_LLM_BINDING=openai # QUERY_LLM_BINDING_HOST=https://api.openai.com/v1 # QUERY_LLM_BINDING_API_KEY=your_api_key # VLM_LLM_MODEL=gpt-5.4-mini # VLM_MAX_ASYNC_LLM=4 # VLM_LLM_TIMEOUT=180 # VLM_LLM_BINDING=openai # VLM_LLM_BINDING_HOST=https://api.example.com/v1 # VLM_LLM_BINDING_API_KEY=your_vlm_api_key ### Master switch for VLM multimodal analysis (i/t/e items). ### When false, multimodal item is skipped regardless of document process_options ### When true, VLM_LLM_BINDING (or the base LLM_BINDING) must be vision-capable ### lollms is rejected at startup VLM_PROCESS_ENABLE=false ### Maximum image bytes sent to VLM (5242880=5MB) VLM_MAX_IMAGE_BYTES=5242880 ########################################################################### ### Provider sepecific LLM options ### Increasing the temperature setting may help mitigate infinite inference ### loops during entity/elation extraction, particularly when using ### models with more limited capabilities, such as Qwen3-30B ### Set a max output token limit to prevent endless output from certain LLMs, ### which may trigger timeout errors during entity and relation extraction. ### max_output_token < LLM_TIMEOUT * llm_tokens_per_second ### i.e. max_output_token = 9000 = 180s * 50 tokens/s ### Sample commands to list all supported options specific LLM_BINDING: ### lightrag-server --llm-binding openai --help ### lightrag-server --llm-binding bedrock --help ### lightrag-server --llm-binding gemini --help ########################################################################### ### OpenAI Specific Parameters (Openrouter of other OpenAI compatible API): ### LLM_BINDING=openai ### LLM_BINDING_HOST=https://openrouter.ai/api/v1 ### LLM_MODEL=google/gemini-2.5-flash # OPENAI_LLM_TEMPERATURE=0.9 ### For vLLM/SGLang and most of OpenAI compatible API provider # OPENAI_LLM_MAX_TOKENS=9000 ### For OpenAI o1-mini or newer modles utilizes max_completion_tokens instead of max_tokens # OPENAI_LLM_MAX_COMPLETION_TOKENS=9000 ### For OpenAI reason control # OPENAI_LLM_REASONING_EFFORT=minimal ### For OpenRouter reasoning control # OPENAI_LLM_EXTRA_BODY='{"reasoning": {"enabled": false}}' ### For Qwen3 reasoning control deploy by vLLM # OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}' ### Azure OpenAI Specific Parameters: ### LLM_BINDING=azure_openai ### LLM_BINDING_HOST=https://xxxx.openai.azure.com/ ### LLM_BINDING_API_KEY=your_api_key ### LLM_MODEL=my-gpt-mini-deployment ### You may use deployment name for LLM_MODEL or set AZURE_OPENAI_DEPLOYMENT instead # AZURE_OPENAI_DEPLOYMEN=my—deplyment-name # AZURE_OPENAI_API_VERSION=2024-08-01-preview ### Google AI Studio Gemini Specific Parameters: ### DEFAULT_GEMINI_ENDPOINT means selecting endpoit by SDK automatically ### LLM_BINDING=gemini ### LLM_BINDING_HOST=DEFAULT_GEMINI_ENDPOINT ### LLM_BINDING_API_KEY=your_gemini_api_key ### LLM_MODEL=gemini-flash-latest # GEMINI_LLM_TEMPERATURE=0.7 # GEMINI_LLM_MAX_OUTPUT_TOKENS=9000 ### Enable or disable thinking ### GEMINI_LLM_THINKING_CONFIG='{"thinking_budget": -1, "include_thoughts": true}' ### GEMINI_LLM_THINKING_CONFIG='{"thinking_budget": 0, "include_thoughts": false}' # GEMINI_LLM_THINKING_CONFIG='{"thinking_budget": 0, "include_thoughts": false}' ### Google Vertex AI Gemini Specific Parameters: ### Vertex AI use GOOGLE_APPLICATION_CREDENTIALS instead of API-KEY for authentication # GOOGLE_GENAI_USE_VERTEXAI=true # GOOGLE_CLOUD_PROJECT='your-project-id' # GOOGLE_CLOUD_LOCATION='us-central1' # GOOGLE_APPLICATION_CREDENTIALS='/Users/xxxxx/your-service-account-credentials-file.json' ### Bedrock Specific Parameters: ### LLM_BINDING=bedrock ### LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT ### LLM_MODEL=us.amazon.nova-lite-v1:0 ### Region is required for all three modes (Bedrock endpoints are regional). # AWS_REGION=us-west-1 ### Bedrock Authentication (choose ONE of the following three approaches): ### Bedrock API key (bearer token). Bedrock ignores LLM_BINDING_API_KEY; ### set AWS_BEARER_TOKEN_BEDROCK directly before startup. This is a ### process-level AWS SDK setting and cannot be overridden per role. # AWS_BEARER_TOKEN_BEDROCK=your_bedrock_api_key ### SigV4 credentials (classic IAM user / STS / instance profile). # AWS_ACCESS_KEY_ID=your_aws_access_key_id # AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key # AWS_SESSION_TOKEN=your_optional_aws_session_token ### Ambient credentials (AWS SDK default credential chain). ### To use this mode, leave AWS_BEARER_TOKEN_BEDROCK, AWS_ACCESS_KEY_ID, ### AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN above commented out — the ### AWS SDK will then resolve credentials from ~/.aws/credentials, IAM role, ### instance profile, SSO, or environment variables outside .env. ### Activating any of the lines above forces that explicit mode and bypasses ### the credential chain. # BEDROCK_LLM_TEMPERATURE=1.0 # BEDROCK_LLM_MAX_TOKENS=9000 # BEDROCK_LLM_TOP_P=1.0 # BEDROCK_LLM_STOP_SEQUENCES='[""]' ### Bedrock model reasoning control # BEDROCK_LLM_EXTRA_FIELDS='{"reasoningConfig": {"type": "enabled", "maxReasoningEffort": "low"}}' ### Ollama Specific Parameters: ### LLM_BINDING=ollama ### LLM_BINDING_HOST=http://localhost:11434 ### LLM_MODEL=qwen3.5:9b ### OLLAMA_LLM_NUM_CTX must be provided, and should at least larger than MAX_TOTAL_TOKENS + 2000 OLLAMA_LLM_NUM_CTX=32768 # OLLAMA_LLM_NUM_PREDICT=9000 # OLLAMA_LLM_TEMPERATURE=0.85 # OLLAMA_LLM_STOP='["", "<|EOT|>"]' ####################################################################################### ### Embedding Configuration (Should not be changed after the first file processed) ### EMBEDDING_BINDING: ollama, openai, azure_openai, jina, lollms, bedrock ### EMBEDDING_BINDING_HOST: Service endpoint (left empty if using default endpoint provided by openai or gemini SDK) ### EMBEDDING_BINDING_API_KEY: api key ### If LightRAG deployed in Docker: ### uses host.docker.internal instead of localhost in EMBEDDING_BINDING_HOST ### Control whether to send embedding_dim parameter to embedding API ### For OpenAI: Set EMBEDDING_SEND_DIM=true to enable dynamic dimension adjustment ### For OpenAI: Set EMBEDDING_SEND_DIM=false (default) to disable sending dimension parameter ### For Gemini: Allways set EMBEDDING_SEND_DIM=true ### Control whether to use base64 encoding format for embeddings (improves performance for OpenAI) ### For OpenAI: Set EMBEDDING_USE_BASE64=true (default) to use base64 encoding ### For Yandex Cloud and other providers that don't support it: Set EMBEDDING_USE_BASE64=false ####################################################################################### # EMBEDDING_TIMEOUT=30 ### OpenAI compatible embedding EMBEDDING_BINDING=openai EMBEDDING_BINDING_HOST=https://api.openai.com/v1 EMBEDDING_BINDING_API_KEY=your_api_key EMBEDDING_MODEL=text-embedding-3-large EMBEDDING_DIM=3072 EMBEDDING_TOKEN_LIMIT=8192 EMBEDDING_SEND_DIM=false EMBEDDING_USE_BASE64=true ### Optional: asymmetric embeddings (query/document behavior split) ### Leave EMBEDDING_ASYMMETRIC unset or set false to keep symmetric behavior. ### Set true only when the selected embedding backend supports asymmetric mode. # EMBEDDING_ASYMMETRIC=true ### Provider-task bindings such as Jina/Gemini/VoyageAI use provider parameters ### and should not configure the prefix variables below. ### Prefix-based models such as BGE/E5/GTE require both prefix variables. ### Wrap non-empty values with quotes if there are trailing spaces. # EMBEDDING_DOCUMENT_PREFIX="search_document: " ### Use NO_PREFIX for a side that should intentionally have no prefix. ### EMBEDDING_DOCUMENT_PREFIX=NO_PREFIX # EMBEDDING_QUERY_PREFIX="search_query: " ########################################################################### ### Provider sepecific Embedding options ### Increasing the temperature setting may help mitigate infinite inference ### loops during entity/elation extraction, particularly when using ### models with more limited capabilities, such as Qwen3-30B ### Set a max output token limit to prevent endless output from certain LLMs, ### which may trigger timeout errors during entity and relation extraction. ### max_output_token < LLM_TIMEOUT * llm_tokens_per_second ### i.e. max_output_token = 9000 = 180s * 50 tokens/s ### Sample commands to list all supported options specific EMBEDDING_BINDING: ### lightrag-server --embedding-binding openai --help ### lightrag-server --embedding-binding ollama --help ### lightrag-server --embedding-binding bedrock --help ########################################################################### ### Azure Embedding Specific Parameters: ### Use deployment name as model name or set AZURE_EMBEDDING_DEPLOYMENT instead ### EMBEDDING_BINDING=azure_openai ### EMBEDDING_BINDING_HOST=https://xxxx.openai.azure.com/ ### EMBEDDING_API_KEY=your_api_key ### EMBEDDING_MODEL==my-text-embedding-3-large-deployment ### EMBEDDING_DIM=3072 # AZURE_EMBEDDING_API_VERSION=2024-08-01-preview ### Ollama Embedding Specific Parameters: ### EMBEDDING_BINDING=ollama ### EMBEDDING_BINDING_HOST=http://localhost:11434 ### EMBEDDING_BINDING_API_KEY=your_api_key ### EMBEDDING_MODEL=qwen3-embedding:4b ### EMBEDDING_DIM=2560 ### Ollama should set num_ctx option inaddition to EMBEDDING_TOKEN_LIMIT OLLAMA_EMBEDDING_NUM_CTX=8192 ### Gemini Embedding Specific Parameters: ### DEFAULT_GEMINI_ENDPOINT means selecting endpoit by SDK automatically ### Gemini embedding requires sending dimension to server ### EMBEDDING_BINDING=gemini ### EMBEDDING_BINDING_HOST=DEFAULT_GEMINI_ENDPOINT ### EMBEDDING_BINDING_API_KEY=your_api_key ### EMBEDDING_MODEL=gemini-embedding-001 ### EMBEDDING_DIM=1536 ### EMBEDDING_TOKEN_LIMIT=2048 ### EMBEDDING_SEND_DIM=true ### Bedrock Embedding Specific Parameters: ### EMBEDDING_BINDING=bedrock ### EMBEDDING_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT ### EMBEDDING_MODEL=amazon.titan-embed-text-v2:0 ### EMBEDDING_DIM=1024 ### Share the same region and authentication settings as LLMs, no reconfiguration here ### AWS_REGION=us-west-1 ### AWS_BEARER_TOKEN_BEDROCK=your_bedrock_api_key ### AWS_ACCESS_KEY_ID=your_aws_access_key_id ### AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key ### AWS_SESSION_TOKEN=your_optional_aws_session_token ### Jina AI Embedding Specific Parameters: ### EMBEDDING_BINDING=jina ### EMBEDDING_BINDING_HOST=https://api.jina.ai/v1/embeddings ### EMBEDDING_MODEL=jina-embeddings-v4 ### EMBEDDING_DIM=2048 ### EMBEDDING_BINDING_API_KEY=your_api_key #################################################################### ### WORKSPACE sets workspace name for all storage types ### for the purpose of isolating data from LightRAG instances. ### Valid workspace name constraints: a-z, A-Z, 0-9, and _ #################################################################### # WORKSPACE= ############################ ### Data storage selection ############################ ### Default storage: JSON/Nano/NetworkX (Recommended for test deployment) LIGHTRAG_KV_STORAGE=JsonKVStorage LIGHTRAG_DOC_STATUS_STORAGE=JsonDocStatusStorage LIGHTRAG_GRAPH_STORAGE=NetworkXStorage LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage ### Wizard metadata used to preserve env-storage Docker deployment defaults across setup reruns # LIGHTRAG_SETUP_POSTGRES_DEPLOYMENT=docker # LIGHTRAG_SETUP_NEO4J_DEPLOYMENT=docker # LIGHTRAG_SETUP_MONGODB_DEPLOYMENT=docker # LIGHTRAG_SETUP_MONGODB_DEPLOYMENT=atlas-capable # LIGHTRAG_SETUP_REDIS_DEPLOYMENT=docker # LIGHTRAG_SETUP_MILVUS_DEPLOYMENT=docker # LIGHTRAG_SETUP_QDRANT_DEPLOYMENT=docker # LIGHTRAG_SETUP_MEMGRAPH_DEPLOYMENT=docker # LIGHTRAG_SETUP_OPENSEARCH_DEPLOYMENT=docker ### PostgreSQL Configuration POSTGRES_HOST=localhost POSTGRES_PORT=5432 POSTGRES_USER=your_username POSTGRES_PASSWORD='your_password' POSTGRES_DATABASE=rag POSTGRES_MAX_CONNECTIONS=25 ### DB specific workspace should not be set, keep for compatible only # POSTGRES_WORKSPACE=forced_workspace_name ### Use HNSW_HALFVEC for large embeddings (2000+ dim). ### Requires pgvector extension >= 0.7.0. ### Vector storage type: HNSW, HNSW_HALFVEC, IVFFlat, VCHORDRQ POSTGRES_VECTOR_INDEX_TYPE=HNSW POSTGRES_HNSW_M=16 POSTGRES_HNSW_EF=200 POSTGRES_IVFFLAT_LISTS=100 POSTGRES_VCHORDRQ_BUILD_OPTIONS= POSTGRES_VCHORDRQ_PROBES= POSTGRES_VCHORDRQ_EPSILON=1.9 ### PostgreSQL Connection Retry Configuration (Network Robustness) ### NEW DEFAULTS (v1.4.10+): Optimized for HA deployments with ~30s switchover time ### These defaults provide out-of-the-box support for PostgreSQL High Availability setups ### ### Number of retry attempts (1-100, default: 10) ### - Default 10 attempts allows ~225s total retry time (sufficient for most HA scenarios) ### - For extreme cases: increase up to 20-50 ### Initial retry backoff in seconds (0.1-300.0, default: 3.0) ### - Default 3.0s provides reasonable initial delay for switchover detection ### - For faster recovery: decrease to 1.0-2.0 ### Maximum retry backoff in seconds (must be >= backoff, max: 600.0, default: 30.0) ### - Default 30.0s matches typical switchover completion time ### - For longer switchovers: increase to 60-90 ### Connection pool close timeout in seconds (1.0-30.0, default: 5.0) # POSTGRES_CONNECTION_RETRIES=10 # POSTGRES_CONNECTION_RETRY_BACKOFF=3.0 # POSTGRES_CONNECTION_RETRY_BACKOFF_MAX=30.0 # POSTGRES_POOL_CLOSE_TIMEOUT=5.0 ### PostgreSQL SSL Configuration (Optional) # POSTGRES_SSL_MODE=require # POSTGRES_SSL_CERT=/path/to/client-cert.pem # POSTGRES_SSL_KEY=/path/to/client-key.pem # POSTGRES_SSL_ROOT_CERT=/path/to/ca-cert.pem # POSTGRES_SSL_CRL=/path/to/crl.pem ### PostgreSQL Server Settings (for Supabase Supavisor) # Use this to pass extra options to the PostgreSQL connection string. # For Supabase, you might need to set it like this: # POSTGRES_SERVER_SETTINGS='options=reference%3D[project-ref]' # Default is 100 set to 0 to disable # POSTGRES_STATEMENT_CACHE_SIZE=100 ### Neo4j Configuration NEO4J_URI=neo4j+s://xxxxxxxx.databases.neo4j.io NEO4J_USERNAME=neo4j NEO4J_PASSWORD='your_password' NEO4J_DATABASE=neo4j NEO4J_MAX_CONNECTION_POOL_SIZE=100 NEO4J_CONNECTION_TIMEOUT=30 NEO4J_CONNECTION_ACQUISITION_TIMEOUT=30 NEO4J_MAX_TRANSACTION_RETRY_TIME=30 NEO4J_MAX_CONNECTION_LIFETIME=300 NEO4J_LIVENESS_CHECK_TIMEOUT=30 NEO4J_KEEP_ALIVE=true ### DB specific workspace should not be set, keep for compatible only # NEO4J_WORKSPACE=forced_workspace_name ### MongoDB Configuration # For MongoVectorDBStorage, MONGO_URI must point to a MongoDB endpoint with # Atlas Search / Vector Search support, such as MongoDB Atlas or Atlas local. MONGO_URI=mongodb://localhost:27017/ MONGO_DATABASE=LightRAG ### DB specific workspace should not be set, keep for compatible only # MONGODB_WORKSPACE=forced_workspace_name # Community/local Docker MongoDB example for KV, graph, or doc-status storage only: # MONGO_URI=mongodb://localhost:27017/ ### OpenSearch Configuration ### OpenSearch can be used for all storage types: KV, Vector, Graph, DocStatus ### Connection settings (comma-separated host:port entries; do not include http:// or https://) ### This setup wizard supports authenticated OpenSearch clusters only. ### OPENSEARCH_USE_SSL controls whether those hosts are reached over TLS. OPENSEARCH_HOSTS=localhost:9200 OPENSEARCH_USER=admin OPENSEARCH_PASSWORD=LightRAG2026_!@ OPENSEARCH_USE_SSL=true OPENSEARCH_VERIFY_CERTS=false # OPENSEARCH_TIMEOUT=30 # OPENSEARCH_MAX_RETRIES=3 ### Index Settings (for 3-AZ Amazon OpenSearch Service, set replicas to 2) # OPENSEARCH_NUMBER_OF_SHARDS=1 # OPENSEARCH_NUMBER_OF_REPLICAS=0 ### k-NN Settings for Vector Storage (HNSW algorithm) # OPENSEARCH_KNN_EF_CONSTRUCTION=200 # OPENSEARCH_KNN_M=16 # OPENSEARCH_KNN_EF_SEARCH=100 ### PPL graphlookup for server-side graph traversal (auto-detected if not set) # OPENSEARCH_USE_PPL_GRAPHLOOKUP=true ### DB specific workspace should not be set, keep for compatible only # OPENSEARCH_WORKSPACE=forced_workspace_name ### Milvus Configuration MILVUS_URI=http://localhost:19530 MILVUS_DB_NAME=lightrag # MILVUS_DEVICE=cpu # MILVUS_USER=root # MILVUS_PASSWORD=your_password # MILVUS_TOKEN=your_token # Required for the bundled Docker Milvus stack; may come from .env or exported shell variables. # MINIO_ACCESS_KEY_ID=minioadmin # MINIO_SECRET_ACCESS_KEY=minioadmin ### DB specific workspace should not be set, keep for compatible only # MILVUS_WORKSPACE=forced_workspace_name ### Milvus Vector Index Configuration ### Index type: AUTOINDEX (default), HNSW, HNSW_SQ, HNSW_PQ, IVF_FLAT, IVF_SQ8, DISKANN # MILVUS_INDEX_TYPE=AUTOINDEX ### Metric type: COSINE (default), L2, IP # MILVUS_METRIC_TYPE=COSINE ### HNSW / HNSW_SQ / HNSW_PQ Parameters (aligned with Milvus 2.4+ defaults) ### M: Maximum number of connections per node [2-2048], default 16 # MILVUS_HNSW_M=16 ### efConstruction: Size of dynamic candidate list during build [8-512], default 360 # MILVUS_HNSW_EF_CONSTRUCTION=360 ### ef: Size of dynamic candidate list during search, default 200 # MILVUS_HNSW_EF=200 ### HNSW_SQ Specific Parameters (requires Milvus 2.6.8+) ### sq_type: Scalar quantization type - SQ4U, SQ6, SQ8 (default), BF16, FP16 # MILVUS_HNSW_SQ_TYPE=SQ8 ### refine: Enable refinement step for higher precision, default false # MILVUS_HNSW_SQ_REFINE=false ### refine_type: Refinement precision (must be higher than sq_type) - SQ6, SQ8, BF16, FP16, FP32 # MILVUS_HNSW_SQ_REFINE_TYPE=FP32 ### refine_k: Refinement expansion factor, default 10 # MILVUS_HNSW_SQ_REFINE_K=10 ### IVF_FLAT / IVF_SQ8 Parameters ### nlist: Number of cluster units [1-65536], recommended sqrt(n) for n>1M, default 1024 # MILVUS_IVF_NLIST=1024 ### nprobe: Number of units to query [1-nlist], default 16 # MILVUS_IVF_NPROBE=16 ### Qdrant QDRANT_URL=http://localhost:6333 # QDRANT_DEVICE=cpu # QDRANT_API_KEY=your-api-key ### Qdrant upsert batching (enabled by default) ### Split large upserts by estimated JSON payload size and point count ### Default 16MB keeps safe headroom below common 32MB gateway/request limits # QDRANT_UPSERT_MAX_PAYLOAD_BYTES=16777216 # QDRANT_UPSERT_MAX_POINTS_PER_BATCH=128 ### DB specific workspace should not be set, keep for compatible only # QDRANT_WORKSPACE=forced_workspace_name ### Redis REDIS_URI=redis://localhost:6379 REDIS_SOCKET_TIMEOUT=30 REDIS_CONNECT_TIMEOUT=10 REDIS_MAX_CONNECTIONS=100 REDIS_RETRY_ATTEMPTS=3 ### DB specific workspace should not be set, keep for compatible only # REDIS_WORKSPACE=forced_workspace_name ### Memgraph Configuration MEMGRAPH_URI=bolt://localhost:7687 MEMGRAPH_USERNAME= MEMGRAPH_PASSWORD= MEMGRAPH_DATABASE=memgraph ### DB specific workspace should not be set, keep for compatible only # MEMGRAPH_WORKSPACE=forced_workspace_name ########################################################### ### Langfuse Observability Configuration ### Only works with LLM provided by OpenAI compatible API ### Install with: pip install lightrag-hku[observability] ### Sign up at: https://cloud.langfuse.com or self-host ########################################################### # LANGFUSE_SECRET_KEY='' # LANGFUSE_PUBLIC_KEY='' # LANGFUSE_HOST='https://cloud.langfuse.com' # LANGFUSE_ENABLE_TRACE=true ############################ ### Evaluation Configuration ############################ ### RAGAS evaluation models (used for RAG quality assessment) ### ⚠️ IMPORTANT: Both LLM and Embedding endpoints MUST be OpenAI-compatible ### Default uses OpenAI models for evaluation ### LLM Configuration for Evaluation # EVAL_LLM_MODEL=gpt-4o-mini ### API key for LLM evaluation (fallback to OPENAI_API_KEY if not set) # EVAL_LLM_BINDING_API_KEY=your_api_key ### Custom OpenAI-compatible endpoint for LLM evaluation (optional) # EVAL_LLM_BINDING_HOST=https://api.openai.com/v1 ### Embedding Configuration for Evaluation # EVAL_EMBEDDING_MODEL=text-embedding-3-large ### API key for embeddings (fallback: EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY) # EVAL_EMBEDDING_BINDING_API_KEY=your_embedding_api_key ### Custom OpenAI-compatible endpoint for embeddings (fallback: EVAL_LLM_BINDING_HOST) # EVAL_EMBEDDING_BINDING_HOST=https://api.openai.com/v1 ### Performance Tuning ### Number of concurrent test case evaluations ### Lower values reduce API rate limit issues but increase evaluation time # EVAL_MAX_CONCURRENT=2 ### TOP_K query parameter of LightRAG (default: 10) ### Number of entities or relations retrieved from KG # EVAL_QUERY_TOP_K=10 ### LLM request retry and timeout settings for evaluation # EVAL_LLM_MAX_RETRIES=5 # EVAL_LLM_TIMEOUT=180 ########################################################################## ### ----- Preserved custom environment variables from previous .env ----- ### ----- Comments in this session will persist across regenerations ----- ### (This must be the final session; ensure the preceding lines unchanged) ########################################################################## ### The "make env*" wizard will leave the following lines unchanged ### You may add additional env vars or commnets here for your own purpose ########################################################################## ### AWS Bedrock # LLM_BINDING=bedrock # LLM_BINDING_HOST=DEFAULT_BEDROCK_ENDPOINT # LLM_MODEL=us.anthropic.claude-haiku-4-5-20251001-v1:0 ### ----- Extra setting from previous .env -----