Configuration Guide
Environment Configuration
The application uses several environment files for configuration. Here's a detailed breakdown of each:
app.env
Contains the main application configuration including:
Django Settings
- SECRET_KEY: Secret key for the Django application
- DEBUG: Debug mode (True/False)
- ALLOWED_HOSTS: List of allowed hosts
- LANGUAGE_CODE: Default language code (e.g., en-us)
- TIME_ZONE: Server time zone (e.g., UTC)
- USE_I18N: Enable internationalization
- USE_TZ: Enable timezone support
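Environment values always arrive as strings, so flags such as DEBUG and lists such as ALLOWED_HOSTS have to be converted before use. The following is a minimal sketch of one way to do that, not the application's actual loader. The helper names `env_bool` and `env_list` are hypothetical, and the comma-separated list format is an assumption.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Interpret a value such as "True" or "false" as a bool (hypothetical helper)."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default=(), environ=os.environ):
    """Split a comma-separated value into a list, dropping empty items.

    Assumption: lists are written as comma-separated strings.
    """
    raw = environ.get(name)
    if raw is None:
        return list(default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# Example values only; a real deployment would read os.environ.
example = {"DEBUG": "True", "ALLOWED_HOSTS": "localhost, example.com"}
DEBUG = env_bool("DEBUG", environ=example)
ALLOWED_HOSTS = env_list("ALLOWED_HOSTS", environ=example)
```

Treating only an explicit set of strings as true means a typo such as `DEBUG=Flase` falls back to False rather than silently enabling debug mode.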
Database Configuration
- DATABASE_URL: PostgreSQL connection URL
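To illustrate what a PostgreSQL connection URL encodes, the sketch below splits one into the fields of a Django-style database entry using only the standard library. The function name `parse_database_url` is hypothetical, and the application may rely on a dedicated package for this instead.

```python
from urllib.parse import unquote, urlsplit


def parse_database_url(url):
    """Split a PostgreSQL URL into Django-style database fields (illustrative)."""
    parts = urlsplit(url)
    if parts.scheme not in {"postgres", "postgresql"}:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        # Credentials may be percent-encoded inside a URL, so decode them.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or 5432),
    }


config = parse_database_url("postgresql://postgres:postgres@postgres:5432/postgres")
```

The URL layout is `postgresql://USER:PASSWORD@HOST:PORT/NAME`, so a password containing characters such as `@` or `/` needs to be percent-encoded before it is placed in the URL.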
JWT Settings
- ACCESS_TOKEN_LIFETIME_MINUTES: Access token expiration time in minutes
- REFRESH_TOKEN_LIFETIME_DAYS: Refresh token expiration time in days
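As a rough illustration of how these two values could be turned into durations, the sketch below converts them to `timedelta` objects. The function name `token_lifetimes` is hypothetical, and the output keys follow the naming used by djangorestframework-simplejwt, which this guide does not confirm the application uses. The fallback values 5 and 30 are taken from the example configuration in this guide.

```python
from datetime import timedelta


def token_lifetimes(environ):
    """Convert the JWT lifetime variables into timedelta objects (illustrative)."""
    access = timedelta(minutes=int(environ.get("ACCESS_TOKEN_LIFETIME_MINUTES", "5")))
    refresh = timedelta(days=int(environ.get("REFRESH_TOKEN_LIFETIME_DAYS", "30")))
    # A refresh token that expires before the access token can never be used.
    if refresh <= access:
        raise ValueError("refresh token lifetime must exceed access token lifetime")
    return {"ACCESS_TOKEN_LIFETIME": access, "REFRESH_TOKEN_LIFETIME": refresh}


lifetimes = token_lifetimes(
    {"ACCESS_TOKEN_LIFETIME_MINUTES": "5", "REFRESH_TOKEN_LIFETIME_DAYS": "30"}
)
```

Short access tokens limit the damage of a leaked token, while the longer refresh lifetime controls how often users must sign in again.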
Celery Configuration
- CELERY_BROKER_URL: Redis URL for Celery
- CELERY_RESULT_BACKEND: Backend for storing results
Storage Configuration
- STATICFILES_STORAGE: Storage backend for static files
- DEFAULT_FILE_STORAGE: Storage backend for media files
MinIO Settings (Optional)
- MINIO_ENDPOINT: MinIO server endpoint
- MINIO_EXTERNAL_ENDPOINT: Public MinIO endpoint
- MINIO_EXTERNAL_ENDPOINT_USE_HTTPS: Use HTTPS for external endpoint
- MINIO_REGION: MinIO region
- MINIO_ACCESS_KEY: MinIO access key
- MINIO_SECRET_KEY: MinIO secret key
- MINIO_USE_HTTPS: Use HTTPS for internal endpoint
- MINIO_URL_EXPIRY_HOURS: Signed URL expiration time
- MINIO_PRIVATE_BUCKET: Private bucket name
- MINIO_PUBLIC_BUCKET: Public bucket name
CORS Settings
- CSRF_TRUSTED_ORIGINS: List of trusted origins for CSRF
- CORS_ALLOWED_ORIGINS: List of allowed CORS origins
- CORS_ALLOWED_ORIGIN_REGEXES: Regex patterns for allowed origins
- CORS_ALLOW_ALL_ORIGINS: Allow all origins (not recommended for production)
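The three CORS settings interact: an explicit allow-all flag overrides everything, otherwise an origin must appear in the allowed list or match one of the patterns. The sketch below is a simplified model of that decision, not the implementation of any CORS library. The function name `origin_allowed` and the full-match semantics for patterns are assumptions.

```python
import re


def origin_allowed(origin, allowed_origins=(), allowed_regexes=(), allow_all=False):
    """Decide whether a request origin is acceptable (simplified model)."""
    if allow_all:
        # Mirrors CORS_ALLOW_ALL_ORIGINS: every origin passes.
        return True
    if origin in allowed_origins:
        return True
    # Assumption: a pattern must match the whole origin string.
    return any(re.fullmatch(pattern, origin) for pattern in allowed_regexes)


allowed = ["http://localhost:8000"]
patterns = [r"https://\w+\.example\.com"]
```

For example, `origin_allowed("https://app.example.com", allowed, patterns)` passes through the pattern, while an unlisted origin is rejected unless `allow_all` is set, which is why that flag is best left off in production.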
Plugins Configuration
- WATER_CRAWL_PLUGINS: List of enabled plugins
OpenAI Settings (Optional)
- OPENAI_API_KEY: OpenAI API key for AI features
Scrapy Settings
- SCRAPY_USER_AGENT: Custom user agent string
- SCRAPY_ROBOTSTXT_OBEY: Respect robots.txt rules
- SCRAPY_CONCURRENT_REQUESTS: Maximum concurrent requests
- SCRAPY_DOWNLOAD_DELAY: Delay between requests
- SCRAPY_CONCURRENT_REQUESTS_PER_DOMAIN: Max requests per domain
- SCRAPY_CONCURRENT_REQUESTS_PER_IP: Max requests per IP
- SCRAPY_COOKIES_ENABLED: Enable cookie handling
- SCRAPY_HTTPCACHE_ENABLED: Enable HTTP caching
- SCRAPY_HTTPCACHE_EXPIRATION_SECS: Cache expiration time
- SCRAPY_HTTPCACHE_DIR: Cache directory
- SCRAPY_LOG_LEVEL: Logging level
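Each of these variables corresponds to a built-in Scrapy setting once the SCRAPY_ prefix is removed (for example, SCRAPY_ROBOTSTXT_OBEY becomes ROBOTSTXT_OBEY). The sketch below shows one way such a mapping could be built. How the application actually performs it is not documented here, so the helper names `coerce` and `scrapy_settings_from_env` and the type-coercion rules are assumptions.

```python
def coerce(value):
    """Turn an environment string into a bool, int, float, or leave it as str."""
    lowered = value.strip().lower()
    if lowered in {"true", "false"}:
        return lowered == "true"
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value


def scrapy_settings_from_env(environ, prefix="SCRAPY_"):
    """Collect prefixed variables into a dict keyed by Scrapy setting name."""
    return {
        key[len(prefix):]: coerce(value)
        for key, value in environ.items()
        if key.startswith(prefix)
    }


settings = scrapy_settings_from_env(
    {
        "SCRAPY_ROBOTSTXT_OBEY": "True",
        "SCRAPY_CONCURRENT_REQUESTS": "32",
        "SCRAPY_DOWNLOAD_DELAY": "0.5",
        "SCRAPY_LOG_LEVEL": "ERROR",
        "DEBUG": "True",  # no prefix, so it is ignored
    }
)
```

Variables without the prefix are skipped, so unrelated entries in the same environment file do not leak into the crawler settings.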
Authentication Settings
- IS_LOGIN_ACTIVE: Enable/disable login
- IS_SIGNUP_ACTIVE: Enable/disable signup
- IS_GITHUB_LOGIN_ACTIVE: Enable GitHub OAuth
- IS_GOOGLE_LOGIN_ACTIVE: Enable Google OAuth
OAuth Settings (Optional)
- GOOGLE_CLIENT_ID: Google OAuth client ID
- GOOGLE_CLIENT_SECRET: Google OAuth client secret
- GITHUB_CLIENT_ID: GitHub OAuth client ID
- GITHUB_CLIENT_SECRET: GitHub OAuth client secret
Email Settings (Optional)
- EMAIL_BACKEND: Email backend configuration
- EMAIL_HOST: SMTP server host
- EMAIL_PORT: SMTP server port
- EMAIL_USE_TLS: Use TLS for email
- EMAIL_HOST_USER: SMTP username
- EMAIL_HOST_PASSWORD: SMTP password
- DEFAULT_FROM_EMAIL: Default sender email
db.env
PostgreSQL database configuration:
- POSTGRES_USER: Database user
- POSTGRES_PASSWORD: Database password
- POSTGRES_DB: Database name
For more information, see the official documentation: https://hub.docker.com/_/postgres
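The credentials in db.env and the DATABASE_URL in app.env describe the same database, so they need to agree. As a hedged illustration, the sketch below assembles a connection URL from the three db.env values. The function name `build_database_url` is hypothetical, and the default host `postgres` and port 5432 are taken from the example DATABASE_URL in this guide and may differ in your deployment.

```python
from urllib.parse import quote


def build_database_url(user, password, db, host="postgres", port=5432):
    """Assemble a PostgreSQL URL, percent-encoding values that may contain symbols."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(db, safe='')}"
    )


url = build_database_url("postgres", "postgres", "watercrawl")
```

Percent-encoding matters once the default password is replaced: a password such as `p@ss/word` would otherwise be misread as part of the host or path.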
minio.env
MinIO configuration:
- MINIO_BROWSER_REDIRECT_URL: MinIO browser redirect URL
- MINIO_SERVER_URL: MinIO server URL
- MINIO_ROOT_USER: MinIO root user
- MINIO_ROOT_PASSWORD: MinIO root password
For more information, see the official documentation: https://hub.docker.com/r/minio/minio
Example Configurations
app.env
# Django settings
SECRET_KEY=django-insecure-example-key-change-this-in-production
DEBUG=True
ALLOWED_HOSTS=*
LANGUAGE_CODE=en-us
TIME_ZONE=UTC
USE_I18N=True
USE_TZ=True
# Database
DATABASE_URL=postgresql://postgres:postgres@postgres:5432/postgres
# JWT settings
ACCESS_TOKEN_LIFETIME_MINUTES=5
REFRESH_TOKEN_LIFETIME_DAYS=30
# Celery settings
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=django-db
# Enterprise mode
IS_ENTERPRISE_MODE_ACTIVE=False
# Storage settings
STATICFILES_STORAGE=django.contrib.staticfiles.storage.StaticFilesStorage
DEFAULT_FILE_STORAGE=django.core.files.storage.FileSystemStorage
# MinIO settings (optional)
# MINIO_ENDPOINT=minio:9000
# MINIO_EXTERNAL_ENDPOINT=localhost:9000
# MINIO_EXTERNAL_ENDPOINT_USE_HTTPS=False
# MINIO_REGION=us-east-1
# MINIO_ACCESS_KEY=minioadmin
# MINIO_SECRET_KEY=minioadmin
# MINIO_USE_HTTPS=False
# MINIO_URL_EXPIRY_HOURS=7
# MINIO_PRIVATE_BUCKET=private
# MINIO_PUBLIC_BUCKET=public
# CORS settings
CSRF_TRUSTED_ORIGINS=http://localhost:8000
CORS_ALLOWED_ORIGINS=http://localhost:8000
CORS_ALLOW_ALL_ORIGINS=True
# Plugins (optional)
# WATER_CRAWL_PLUGINS=watercrawl_openai.OpenAIPlugin
# OpenAI settings (optional)
# OPENAI_API_KEY=your-openai-api-key
# Scrapy settings
SCRAPY_USER_AGENT="WaterCrawl/0.1 (+https://github.com/watercrawl/watercrawl)"
SCRAPY_ROBOTSTXT_OBEY=True
SCRAPY_CONCURRENT_REQUESTS=32
SCRAPY_DOWNLOAD_DELAY=0
SCRAPY_CONCURRENT_REQUESTS_PER_DOMAIN=16
SCRAPY_CONCURRENT_REQUESTS_PER_IP=16
SCRAPY_COOKIES_ENABLED=False
SCRAPY_HTTPCACHE_ENABLED=True
SCRAPY_HTTPCACHE_EXPIRATION_SECS=3600
SCRAPY_HTTPCACHE_DIR=httpcache
SCRAPY_LOG_LEVEL=ERROR
# Authentication settings
IS_LOGIN_ACTIVE=True
IS_SIGNUP_ACTIVE=True
IS_GITHUB_LOGIN_ACTIVE=False
IS_GOOGLE_LOGIN_ACTIVE=False
# OAuth settings (optional)
# GOOGLE_CLIENT_ID=your-google-client-id
# GOOGLE_CLIENT_SECRET=your-google-client-secret
# GITHUB_CLIENT_ID=your-github-client-id
# GITHUB_CLIENT_SECRET=your-github-client-secret
# Email settings (optional)
# EMAIL_BACKEND=django.core.mail.backends.smtp.EmailBackend
# EMAIL_HOST=smtp.gmail.com
# EMAIL_PORT=587
# EMAIL_USE_TLS=True
# EMAIL_HOST_USER=your-email@example.com
# EMAIL_HOST_PASSWORD=your-app-specific-password
# DEFAULT_FROM_EMAIL=your-email@example.com
db.env
# Database settings
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=watercrawl
minio.env
# MinIO settings
MINIO_BROWSER_REDIRECT_URL=http://localhost:9001
MINIO_SERVER_URL=http://localhost:9000
MINIO_ROOT_USER=minio
MINIO_ROOT_PASSWORD=minio123
Configuration Tips
- Security
  - Always change default passwords in production
  - Use strong, unique passwords
  - Enable HTTPS in production
  - Restrict CORS origins
- Performance
  - Adjust Celery worker settings based on load
  - Configure proper database connection pools
  - Set appropriate cache settings
- Storage
  - Configure backup policies for MinIO
  - Set appropriate storage quotas
  - Monitor storage usage
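The security tips about default and weak passwords can be checked mechanically before a deployment. The sketch below uses Python's standard `secrets` module to generate replacement values and flags the placeholder values that appear in the example files in this guide. The helper names `is_weak_secret` and `generate_secret` and the 16-character minimum are assumptions, not project requirements.

```python
import secrets

# Placeholder values taken from the example files in this guide.
KNOWN_DEFAULTS = {"postgres", "minio123", "minioadmin"}


def is_weak_secret(value, min_length=16):
    """Flag example defaults, Django's insecure-key prefix, and short values."""
    return (
        value in KNOWN_DEFAULTS
        or value.startswith("django-insecure-")
        or len(value) < min_length
    )


def generate_secret(num_bytes=48):
    """Return a random URL-safe string suitable for a key or password."""
    return secrets.token_urlsafe(num_bytes)
```

A check like `is_weak_secret(os.environ["POSTGRES_PASSWORD"])` could run in a deployment script, with `generate_secret()` used to produce a separate value for each of SECRET_KEY, POSTGRES_PASSWORD, and MINIO_ROOT_PASSWORD.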