Services Documentation
WaterCrawl consists of several microservices working together. Here's a detailed overview of each service:
Core Services
Core (Django Application)
- Purpose: Main application server
- Tech Stack: Django with Gunicorn
- Default Port: 9000
- Dependencies: PostgreSQL, Redis
- Key Features:
- REST API endpoints
- User authentication
- Crawl job management
- Plugin system
- Data processing
- Configuration: Uses app.env
- Storage: Mounts /var/www/storage (required if you set up WaterCrawl without MinIO)
Frontend
- Purpose: Web interface
- Tech Stack: React
- Configuration: Uses frontend.env
- Dependencies: Core API
- Key Features:
- User interface
- Interactive dashboard
- Job management interface
Nginx
- Purpose: Web server and reverse proxy
- Tech Stack: Nginx Alpine 1.25
- Default Port: 80
- Dependencies: Frontend, Core
- Volumes:
- Configuration: ./nginx:/etc/nginx/conf.d:ro
- Storage: /var/www/storage
Celery (Task Queue)
- Purpose: Background task processing
- Tech Stack: Celery
- Dependencies: Redis
- Configuration: Uses app.env
- State Directory: .celery/worker.state
- Key Features:
- Asynchronous task execution
- Crawl job scheduling
- Data processing tasks
- Plugin execution
- Periodic tasks (beat)
Database Services
PostgreSQL
- Purpose: Primary database
- Version: 17.2 Alpine
- Default Port: 5432
- Configuration: Uses db.env
- Data Location: /var/lib/postgresql/data
- Health Check:
- Command: pg_isready -U postgres
- Interval: 5s
- Key Features:
- User data storage
- Crawl job metadata
- Plugin configurations
- System settings
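The PostgreSQL health check above can also be run by hand, for example in a deployment script that must wait for the database before running migrations. The following is a minimal sketch, not part of WaterCrawl itself. It assumes the Compose service is named postgres (a hypothetical name; check your compose file) and polls at the same 5s interval as the health check.

```shell
# Poll pg_isready inside the database container until it reports ready.
# ASSUMPTION: the Compose service is called "postgres"; pass another name if yours differs.
# Usage: wait_for_postgres [SERVICE] [RETRIES] [INTERVAL_SECONDS]
wait_for_postgres() {
  local service="${1:-postgres}"   # hypothetical service name
  local retries="${2:-12}"         # maximum number of attempts
  local interval="${3:-5}"         # seconds between attempts (health check uses 5s)
  local i=1
  while [ "$i" -le "$retries" ]; do
    # -T disables pseudo-TTY allocation so the call works from scripts and CI
    if docker compose exec -T "$service" pg_isready -U postgres >/dev/null 2>&1; then
      echo "postgres ready after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "postgres not ready after $retries attempts" >&2
  return 1
}
```

With the defaults, the function gives up after roughly one minute and returns a non-zero status, so it can gate later steps with `wait_for_postgres && <next command>`.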
Redis
- Purpose: Message broker and caching
- Version: Latest
- Default Port: 6379
- Key Features:
- Celery message broker
- Cache storage
- Real-time updates
- Session management
Optional Services
MinIO (Optional)
- Purpose: Object storage
- Version: RELEASE.2024-11-07T00-52-20Z
- Default Ports:
- API: 9000
- Console: 9001
- Configuration: Uses minio.env
- Data Location: /data
- Health Check:
- Command: curl -f http://localhost:9001/minio/health/live
- Interval: 30s
- Key Features:
- File storage
- Media storage
- Crawl data storage
- Plugin storage
Volumes
The following persistent volumes are used:
- postgres-db: PostgreSQL data
- storage_volume: Shared storage for app and nginx
- redis-data: Redis data
- minio-data: MinIO data (optional)
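On the Docker host, Compose normally prefixes each volume name with the project name (for example `<project>_postgres-db`) and labels it with the project. The sketch below uses that label to list the volumes of one project. The project name you pass in is an assumption about your setup; by default Compose derives it from the directory that holds the compose file.

```shell
# List the Docker volumes created by one Compose project.
# Compose labels the volumes it creates with com.docker.compose.project=<project>.
# Usage: list_project_volumes PROJECT
list_project_volumes() {
  local project="$1"
  if [ -z "$project" ]; then
    echo "usage: list_project_volumes PROJECT" >&2
    return 2
  fi
  docker volume ls \
    --filter "label=com.docker.compose.project=${project}" \
    --format '{{.Name}}'
}
```

To see the volume names as declared in the compose file, without the project prefix, `docker compose config --volumes` can be used instead.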
Service Management
Starting Services
# Start all services
docker compose up -d
# Start specific service
docker compose up -d [service_name]
Stopping Services
# Stop all services
docker compose down
# Stop specific service
docker compose stop [service_name]
Viewing Logs
# View all logs
docker compose logs
# View specific service logs
docker compose logs [service_name]
# Follow logs
docker compose logs -f [service_name]
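On a long-running stack the full log can be large, so it often helps to limit the output by time and line count. The helper below is a small sketch that wraps the `--since`, `--tail`, and `--timestamps` options of `docker compose logs`. The function name and its defaults (10 minutes, 100 lines) are illustrative choices, not WaterCrawl conventions.

```shell
# Print recent, timestamped log lines for one service.
# Usage: recent_logs SERVICE [SINCE] [TAIL]
recent_logs() {
  local service="$1"
  local since="${2:-10m}"   # relative duration (e.g. 10m) or timestamp for --since
  local tail="${3:-100}"    # maximum number of lines shown per container
  if [ -z "$service" ]; then
    echo "usage: recent_logs SERVICE [SINCE] [TAIL]" >&2
    return 2
  fi
  docker compose logs --timestamps --since "$since" --tail "$tail" "$service"
}
```

For example, `recent_logs celery 1h 500` would show up to 500 lines per container from the last hour of the celery service.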
Service Health Checks
# Check service status
docker compose ps
# Check service health
docker compose ps --format "table {{.Name}}\t{{.Status}}\t{{.Health}}"
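For scripting, the same template fields can be used to print only the containers whose health check is not passing. This is a sketch that assumes a Compose version with template support in `ps --format`, as the command above already does. Containers without a health check report an empty Health field and are skipped.

```shell
# Print the names of containers whose health status is set but not "healthy"
# (for example "starting" or "unhealthy").
unhealthy_services() {
  docker compose ps --format '{{.Name}} {{.Health}}' |
    awk '$2 != "" && $2 != "healthy" { print $1 }'
}
```

An empty result means every container that defines a health check currently reports healthy.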
Scaling Services
Horizontal Scaling
Celery workers can be scaled as needed:
# Scale celery workers
docker compose up -d --scale celery=3
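When the worker count comes from a variable or a CI parameter, it is worth validating it before passing it to Compose. The wrapper below is a hedged sketch: the function name is hypothetical, and it assumes the worker service is named celery, as in the command above.

```shell
# Scale the celery service to a given number of workers after validating the input.
# Usage: scale_celery REPLICAS
scale_celery() {
  local replicas="$1"
  case "$replicas" in
    '' | *[!0-9]*)
      # Reject empty values and anything that is not a plain non-negative integer
      echo "replica count must be a non-negative integer" >&2
      return 2
      ;;
  esac
  docker compose up -d --scale "celery=${replicas}"
}
```

Calling `scale_celery 3` runs the same command as shown above, while `scale_celery three` fails before Docker is invoked.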