Configuration Guide

Environment Configuration

WaterCrawl reads all of its configuration from a single .env file in the docker directory. The file holds the environment variables for the application, database, frontend, and supporting services.

⚠️ Security Warning: In production, always change the default passwords, secrets, and API keys. Leaving default values in place creates significant security vulnerabilities.
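
As a rough illustration of the warning above, the following sketch flags settings that still carry the default values listed in this guide. The function name `find_insecure_defaults` and the idea of passing the settings in as a plain dictionary are assumptions made for this example; WaterCrawl does not necessarily ship such a check.

```python
# Default values copied from the tables in this guide.
# The checker itself is a hypothetical helper, not part of WaterCrawl.
INSECURE_DEFAULTS = {
    "POSTGRES_PASSWORD": "postgres",
    "MINIO_ACCESS_KEY": "minio",
    "MINIO_SECRET_KEY": "minio123",
    "PLAYWRIGHT_API_KEY": "your-secret-api-key",
}


def find_insecure_defaults(env):
    """Return the sorted names of settings that still use a documented default."""
    problems = [
        name for name, default in INSECURE_DEFAULTS.items()
        if env.get(name) == default
    ]
    # The documented default SECRET_KEY starts with Django's "django-insecure-" prefix.
    if env.get("SECRET_KEY", "").startswith("django-insecure-"):
        problems.append("SECRET_KEY")
    return sorted(problems)


if __name__ == "__main__":
    sample = {"POSTGRES_PASSWORD": "postgres", "MINIO_SECRET_KEY": "changed"}
    print(find_insecure_defaults(sample))
```

The check only looks at values that are explicitly present in the dictionary, so a variable missing from the file is not reported even if the service would fall back to its default.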

Core Settings

General Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `VERSION` | `v0.7.1` | Application version. Always check GitHub Releases for the latest version |
| `NGINX_PORT` | `80` | Port for the Nginx service |

Django Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `SECRET_KEY` | `django-insecure-el4wo4a4--=f0+ag#omp@^w4eq^8v4(scda&1a(td_y2@=sh6&` | Secret key for Django application. Generate a new one using `openssl rand -base64 32`. MUST be changed in production! |
| `DEBUG` | `True` | Debug mode (set to False in production) |
| `ALLOWED_HOSTS` | `*` | List of allowed hosts (comma-separated) |
| `LANGUAGE_CODE` | `en-us` | Default language code |
| `TIME_ZONE` | `UTC` | Server time zone |
| `USE_I18N` | `True` | Enable internationalization |
| `USE_TZ` | `True` | Enable timezone support |
| `STATIC_ROOT` | `storage/static/` | Static files directory |
| `MEDIA_ROOT` | `storage/media/` | Media files directory |
| `LOG_LEVEL` | `INFO` | Application log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| `FRONTEND_URL` | `http://localhost` | URL for the frontend application (used for CORS and redirects) |
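
If openssl is not available, a comparable key can be produced with the Python standard library. This is a minimal sketch that mirrors `openssl rand -base64 32` (32 random bytes, Base64-encoded); the helper name `generate_secret_key` is made up for this example.

```python
import base64
import secrets


def generate_secret_key(num_bytes=32):
    """Return num_bytes of cryptographically strong randomness, Base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed line into the .env file.
    print(f"SECRET_KEY={generate_secret_key()}")
```

The output can contain `+`, `/`, and `=` characters, just as the openssl output can.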

Database Configuration

| Variable | Default Value | Description |
| --- | --- | --- |
| `POSTGRES_HOST` | `db` | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | PostgreSQL port |
| `POSTGRES_PASSWORD` | `postgres` | PostgreSQL password. MUST be changed in production! |
| `POSTGRES_USER` | `postgres` | PostgreSQL username |
| `POSTGRES_DB` | `postgres` | PostgreSQL database name |
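
When connecting to the database from an external tool, these five variables can be combined into a standard PostgreSQL connection URI. The sketch below is an illustration only: `postgres_dsn` is a hypothetical helper, and WaterCrawl itself may assemble its connection differently. Percent-encoding matters once the password contains characters such as `@` or `/`.

```python
from urllib.parse import quote


def postgres_dsn(env):
    """Build a postgresql:// URI from the POSTGRES_* variables (documented defaults as fallback)."""
    user = quote(env.get("POSTGRES_USER", "postgres"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", "postgres"), safe="")
    host = env.get("POSTGRES_HOST", "db")
    port = env.get("POSTGRES_PORT", "5432")
    database = quote(env.get("POSTGRES_DB", "postgres"), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(postgres_dsn({"POSTGRES_PASSWORD": "p@ss/w:rd"}))
```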

Redis and Celery Configuration

| Variable | Default Value | Description |
| --- | --- | --- |
| `REDIS_URL` | `redis://redis:6379/1` | Redis URL for Django cache |
| `CELERY_BROKER_URL` | `redis://redis:6379/0` | Redis URL for Celery broker |
| `CELERY_RESULT_BACKEND` | `django-db` | Backend for storing Celery results |
| `REDIS_LOCKER_URL` | `redis://redis:6379/3` | Redis URL for Django locks |
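
By default, the cache, the Celery broker, and the locks live on the same Redis server but in different logical databases (1, 0, and 3). If you change these URLs and want to keep that separation, a small check like the one below can catch accidental overlap. The helper names are hypothetical and exist only in this sketch.

```python
from urllib.parse import urlparse

# Defaults copied from the table above.
DEFAULT_REDIS_URLS = {
    "REDIS_URL": "redis://redis:6379/1",
    "CELERY_BROKER_URL": "redis://redis:6379/0",
    "REDIS_LOCKER_URL": "redis://redis:6379/3",
}


def redis_db_index(url):
    """Return the database number from a redis:// URL path (0 if the path is empty)."""
    path = urlparse(url).path.lstrip("/")
    return int(path) if path else 0


def find_shared_databases(urls):
    """Map (host, port, db) to the variables using it, keeping only shared entries."""
    seen = {}
    for name, url in urls.items():
        parsed = urlparse(url)
        key = (parsed.hostname, parsed.port or 6379, redis_db_index(url))
        seen.setdefault(key, []).append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    print(find_shared_databases(DEFAULT_REDIS_URLS))
```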

MinIO Configuration

| Variable | Default Value | Description |
| --- | --- | --- |
| `MINIO_ENDPOINT` | `minio:9000` | MinIO server internal endpoint |
| `MINIO_EXTERNAL_ENDPOINT` | `localhost` | Public MinIO endpoint (domain or IP, no port) |
| `MINIO_REGION` | `us-east-1` | MinIO region (optional) |
| `MINIO_ACCESS_KEY` | `minio` | MinIO access key (username). MUST be changed in production! |
| `MINIO_SECRET_KEY` | `minio123` | MinIO secret key (password). MUST be changed in production! |
| `MINIO_USE_HTTPS` | `False` | Use HTTPS for internal endpoint |
| `MINIO_EXTERNAL_ENDPOINT_USE_HTTPS` | `False` | Use HTTPS for external endpoint |
| `MINIO_URL_EXPIRY_HOURS` | `7` | Signed URL expiration time in hours |
| `MINIO_CONSISTENCY_CHECK_ON_START` | `True` | Check bucket consistency on startup |
| `MINIO_PRIVATE_BUCKET` | `private` | Private bucket name |
| `MINIO_PUBLIC_BUCKET` | `public` | Public bucket name |
| `MINIO_BUCKET_CHECK_ON_SAVE` | `False` | Check bucket existence on save |
| `MINIO_BROWSER_REDIRECT_URL` | `http://localhost/minio-console/` | URL for MinIO browser console redirects |
| `MINIO_SERVER_URL` | `http://localhost/` | MinIO server URL for browser redirects |
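
The external endpoint and its HTTPS flag presumably combine into the base of the public file URLs, and the expiry setting bounds how long a signed URL stays valid. The sketch below shows that relationship under those assumptions; `external_base_url` and `signed_url_lifetime` are hypothetical names, and the IPv6 case is ignored for brevity.

```python
from datetime import timedelta


def external_base_url(endpoint="localhost", use_https=False):
    """Build scheme://host from MINIO_EXTERNAL_ENDPOINT and MINIO_EXTERNAL_ENDPOINT_USE_HTTPS."""
    # The table above asks for a bare domain or IP, so reject schemes, ports, and paths.
    if "/" in endpoint or ":" in endpoint:
        raise ValueError("MINIO_EXTERNAL_ENDPOINT should be a bare domain or IP without scheme or port")
    scheme = "https" if use_https else "http"
    return f"{scheme}://{endpoint}"


def signed_url_lifetime(hours=7):
    """Convert MINIO_URL_EXPIRY_HOURS into a timedelta."""
    return timedelta(hours=hours)


if __name__ == "__main__":
    print(external_base_url("example.com", use_https=True))
    print(signed_url_lifetime())
```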

CORS Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `CSRF_TRUSTED_ORIGINS` | (empty) | List of trusted origins for CSRF protection |
| `CORS_ALLOWED_ORIGINS` | (empty) | List of allowed CORS origins |
| `CORS_ALLOWED_ORIGIN_REGEXES` | (empty) | Regex patterns for allowed origins |
| `CORS_ALLOW_ALL_ORIGINS` | `False` | Allow all origins (not recommended for production) |
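
To make the interaction of these settings concrete, the sketch below evaluates an origin against an explicit list, a set of regex patterns, and the allow-all flag. It assumes the list values are comma-separated (as stated for ALLOWED_HOSTS) and that patterns are matched from the start of the origin; both are assumptions for illustration, and `origin_allowed` is a hypothetical helper rather than WaterCrawl code.

```python
import re


def split_list(raw):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def origin_allowed(origin, allowed_origins="", allowed_regexes="", allow_all=False):
    """Return True if the origin passes any of the three CORS settings."""
    if allow_all:
        return True
    if origin in split_list(allowed_origins):
        return True
    return any(re.match(pattern, origin) for pattern in split_list(allowed_regexes))


if __name__ == "__main__":
    print(origin_allowed("https://app.example.com",
                         allowed_regexes=r"^https://\w+\.example\.com$"))
```

Note that splitting on commas would break a pattern that itself contains a comma, such as a `{1,3}` quantifier.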

Authentication Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `IS_ENTERPRISE_MODE_ACTIVE` | `False` | Enable enterprise mode features |
| `IS_LOGIN_ACTIVE` | `True` | Enable login functionality |
| `IS_SIGNUP_ACTIVE` | `False` | Enable signup functionality |
| `IS_GITHUB_LOGIN_ACTIVE` | `False` | Enable GitHub OAuth login |
| `IS_GOOGLE_LOGIN_ACTIVE` | `False` | Enable Google OAuth login |
| `GITHUB_CLIENT_ID` | (empty) | GitHub OAuth client ID |
| `GITHUB_CLIENT_SECRET` | (empty) | GitHub OAuth client secret |
| `GOOGLE_CLIENT_ID` | (empty) | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | (empty) | Google OAuth client secret |
| `ACCESS_TOKEN_LIFETIME_MINUTES` | `5` | JWT access token lifetime in minutes |
| `REFRESH_TOKEN_LIFETIME_DAYS` | `30` | JWT refresh token lifetime in days |
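
Enabling an OAuth provider without supplying its client ID and secret will most likely leave that login option broken. The following sketch reports such gaps before deployment. The accepted truthy spellings and the helper `missing_oauth_credentials` are assumptions for this example.

```python
# Which credentials each OAuth flag depends on, based on the table above.
PROVIDERS = {
    "IS_GITHUB_LOGIN_ACTIVE": ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET"),
    "IS_GOOGLE_LOGIN_ACTIVE": ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET"),
}


def is_true(value):
    """Interpret common truthy spellings; the exact set accepted by WaterCrawl is assumed."""
    return str(value).strip().lower() in {"true", "1", "yes"}


def missing_oauth_credentials(env):
    """List credential variables that are empty although their provider is enabled."""
    missing = []
    for flag, required in PROVIDERS.items():
        if is_true(env.get(flag, "False")):
            missing.extend(name for name in required if not env.get(name, "").strip())
    return missing


if __name__ == "__main__":
    print(missing_oauth_credentials({"IS_GITHUB_LOGIN_ACTIVE": "True",
                                     "GITHUB_CLIENT_ID": "abc"}))
```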

Email Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `EMAIL_BACKEND` | `django.core.mail.backends.smtp.EmailBackend` | Django email backend |
| `EMAIL_HOST` | (empty) | SMTP server host |
| `EMAIL_PORT` | `587` | SMTP server port |
| `EMAIL_USE_TLS` | `True` | Use TLS for SMTP |
| `EMAIL_HOST_USER` | (empty) | SMTP username |
| `EMAIL_HOST_PASSWORD` | (empty) | SMTP password. MUST be changed if using email functionality! |
| `DEFAULT_FROM_EMAIL` | (empty) | Default sender email address |

Scrapy Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `SCRAPY_USER_AGENT` | `WaterCrawl/0.5.0 (+https://github.com/watercrawl/watercrawl)` | User agent for web scraping |
| `SCRAPY_ROBOTSTXT_OBEY` | `True` | Obey robots.txt rules |
| `SCRAPY_CONCURRENT_REQUESTS` | `16` | Maximum concurrent requests |
| `SCRAPY_DOWNLOAD_DELAY` | `0` | Delay between requests in seconds |
| `SCRAPY_CONCURRENT_REQUESTS_PER_DOMAIN` | `4` | Maximum concurrent requests per domain |
| `SCRAPY_CONCURRENT_REQUESTS_PER_IP` | `4` | Maximum concurrent requests per IP |
| `SCRAPY_COOKIES_ENABLED` | `False` | Enable cookies |
| `SCRAPY_HTTPCACHE_ENABLED` | `True` | Enable HTTP cache |
| `SCRAPY_HTTPCACHE_EXPIRATION_SECS` | `3600` | HTTP cache expiration time in seconds |
| `SCRAPY_HTTPCACHE_DIR` | `httpcache` | HTTP cache directory |
| `SCRAPY_LOG_LEVEL` | `ERROR` | Scrapy log level |
| `SCRAPY_GOOGLE_API_KEY` | (empty) | Google API key (for search integration) |
| `SCRAPY_GOOGLE_CSE_ID` | (empty) | Google Custom Search Engine ID |
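
Most of these variables share their name, minus the `SCRAPY_` prefix, with a standard Scrapy setting (for example `CONCURRENT_REQUESTS` or `HTTPCACHE_ENABLED`). The sketch below shows one plausible way such variables could be turned into a typed Scrapy settings dictionary. The prefix-stripping rule and the function `scrapy_settings_from_env` are assumptions; the actual WaterCrawl mapping may differ.

```python
def to_bool(value):
    """Interpret "True"/"False" style strings (the accepted spellings are assumed)."""
    return str(value).strip().lower() in {"true", "1", "yes"}


# Scrapy setting name -> type conversion. Only names that are real Scrapy settings are listed.
CASTS = {
    "USER_AGENT": str,
    "ROBOTSTXT_OBEY": to_bool,
    "CONCURRENT_REQUESTS": int,
    "DOWNLOAD_DELAY": float,
    "CONCURRENT_REQUESTS_PER_DOMAIN": int,
    "CONCURRENT_REQUESTS_PER_IP": int,
    "COOKIES_ENABLED": to_bool,
    "HTTPCACHE_ENABLED": to_bool,
    "HTTPCACHE_EXPIRATION_SECS": int,
    "HTTPCACHE_DIR": str,
    "LOG_LEVEL": str,
}


def scrapy_settings_from_env(env):
    """Strip the SCRAPY_ prefix and cast values; unknown names are skipped."""
    settings = {}
    for key, raw in env.items():
        if not key.startswith("SCRAPY_"):
            continue
        name = key[len("SCRAPY_"):]
        cast = CASTS.get(name)
        if cast is not None:
            settings[name] = cast(raw)
    return settings


if __name__ == "__main__":
    print(scrapy_settings_from_env({"SCRAPY_CONCURRENT_REQUESTS": "16",
                                    "SCRAPY_ROBOTSTXT_OBEY": "True"}))
```

Variables such as `SCRAPY_GOOGLE_API_KEY` are skipped here because they do not correspond to a built-in Scrapy setting.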

Playwright Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `PLAYWRIGHT_SERVER` | `http://playwright:8000` | Playwright service URL |
| `PLAYWRIGHT_API_KEY` | `your-secret-api-key` | Playwright API key for authentication. MUST be changed in production! |
| `PLAYWRIGHT_PORT` | `8000` | Playwright service port |
| `PLAYWRIGHT_HOST` | `0.0.0.0` | Playwright service host |

Integration Settings

| Variable | Default Value | Description |
| --- | --- | --- |
| `OPENAI_API_KEY` | (empty) | OpenAI API key |
| `STRIPE_SECRET_KEY` | (empty) | Stripe secret key |
| `STRIPE_WEBHOOK_SECRET` | (empty) | Stripe webhook secret |
| `GOOGLE_ANALYTICS_ID` | (empty) | Google Analytics tracking ID |

Feature Flags

| Variable | Default Value | Description |
| --- | --- | --- |
| `MAX_CRAWL_DEPTH` | `-1` | Maximum crawl depth (-1 for unlimited) |
| `CAPTURE_USAGE_HISTORY` | `True` | Capture user usage history |
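
The `-1` sentinel for MAX_CRAWL_DEPTH can be read as follows. This sketch assumes that depth is counted in link hops from the start URL (which sits at depth 0) and that the limit is inclusive; neither detail is confirmed by this guide, and `depth_allowed` is a hypothetical helper.

```python
UNLIMITED = -1  # Sentinel documented above: -1 means no depth limit.


def depth_allowed(depth, max_crawl_depth=UNLIMITED):
    """Return True if a page `depth` hops from the start URL may be crawled."""
    return max_crawl_depth == UNLIMITED or depth <= max_crawl_depth


if __name__ == "__main__":
    print(depth_allowed(10))      # unlimited
    print(depth_allowed(3, 2))    # beyond the limit
```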

Example Configuration

Below is a basic configuration example for the .env file:

# Core Settings
VERSION=v0.7.1 # Check https://github.com/watercrawl/WaterCrawl/releases for latest version
NGINX_PORT=80
SECRET_KEY=your_secure_secret_key_here # Generate with: openssl rand -base64 32
DEBUG=False
ALLOWED_HOSTS=example.com,www.example.com
FRONTEND_URL=https://example.com

# Database Settings
POSTGRES_PASSWORD=secure_database_password # Use a strong password!

# MinIO Settings (Important for production)
MINIO_EXTERNAL_ENDPOINT=example.com
MINIO_BROWSER_REDIRECT_URL=https://example.com/minio-console/
MINIO_SERVER_URL=https://example.com/
MINIO_ACCESS_KEY=secure_access_key # Use a strong username
MINIO_SECRET_KEY=secure_secret_key # Use a strong password!

# Email Settings (for user notifications)
EMAIL_HOST=smtp.example.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
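EMAIL_HOST_USER=your_smtp_username # SMTP username (often the full email address)
EMAIL_HOST_PASSWORD=secure_email_password # Use a strong password or app-specific password
DEFAULT_FROM_EMAIL=your_sender_address # Default sender email address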
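
Before deploying a file like the one above, it can help to confirm that the development defaults are gone. The sketch below parses unquoted KEY=VALUE lines, strips trailing `# ...` comments of the kind used in the example, and flags settings this guide marks as unsuitable for production. The parsing rules are a simplification and both function names are hypothetical; the tool that actually loads the file may treat quotes and comments differently.

```python
import re

# A "#" preceded by whitespace starts an inline comment (simplified rule).
_INLINE_COMMENT = re.compile(r"\s+#.*$")


def parse_env(text):
    """Parse unquoted KEY=VALUE lines, ignoring blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = _INLINE_COMMENT.sub("", value).strip()
    return values


def production_warnings(env):
    """Return names of settings still at values discouraged for production."""
    warnings = []
    if env.get("DEBUG", "True").lower() != "false":
        warnings.append("DEBUG")
    if env.get("ALLOWED_HOSTS", "*") == "*":
        warnings.append("ALLOWED_HOSTS")
    if env.get("CORS_ALLOW_ALL_ORIGINS", "False").lower() == "true":
        warnings.append("CORS_ALLOW_ALL_ORIGINS")
    return warnings


if __name__ == "__main__":
    sample = "DEBUG=False\nALLOWED_HOSTS=example.com,www.example.com\n"
    print(production_warnings(parse_env(sample)))
```

A value that legitimately contains a space followed by `#` would be truncated by this parser, so treat it as a quick lint rather than a faithful loader.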

⚠️ Security Warning: Never commit your .env file to version control or share it publicly. It contains sensitive information that should be kept private.