# API Overview
Welcome to the WaterCrawl API documentation. This guide will help you understand and integrate with our API.
## Client Libraries

### Python

```bash
pip install watercrawl-py
```

```python
from watercrawl import WaterCrawlAPIClient
```

### Node.js

```bash
npm install @watercrawl/nodejs
```

```javascript
import { WaterCrawlAPIClient } from '@watercrawl/nodejs';
```
## Authentication

All API requests require authentication. Include your API key as a Bearer token in the Authorization header; the client libraries handle this for you:

### Python

```python
from watercrawl import WaterCrawlAPIClient

# The client handles authentication automatically
client = WaterCrawlAPIClient('your_api_key')
```

### cURL

```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.watercrawl.dev/api/v1/...
```

### Node.js

```javascript
import { WaterCrawlAPIClient } from '@watercrawl/nodejs';

// Initialize the client with your API key
const client = new WaterCrawlAPIClient('your-api-key');
```
## Status Values

Crawl requests can have the following status values:

- `new`: Crawl request created but not started
- `running`: Crawl is in progress
- `finished`: Crawl completed successfully
- `cancelling`: Crawl is being cancelled
- `canceled`: Crawl was cancelled
- `failed`: Crawl failed due to an error
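When monitoring a crawl, only three of these statuses are terminal. A minimal sketch of that distinction (the status strings come from the list above; the helper name is illustrative):

```python
# Statuses after which a crawl request will not change state again.
TERMINAL_STATUSES = {"finished", "canceled", "failed"}

def is_terminal(status: str) -> bool:
    """Return True once a crawl request has reached a final state."""
    return status in TERMINAL_STATUSES
```

Note that `cancelling` is not terminal: a request in that state will still transition to `canceled`.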
## API Endpoints

- Scrape URL: Scrape a single page
- Create Crawl Request: Start a new crawl
- List Crawl Requests: Get all crawls
- Get Crawl Request: Get crawl details
- Cancel Crawl Request: Stop a crawl
- Monitor Crawl Status: Track progress
- List Crawl Results: Get results
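A common integration pattern ties these endpoints together: create a crawl request, poll its status until it reaches a terminal state, then list the results. A sketch of the polling step, assuming a client whose `get_crawl_request` method returns a dict with a `status` field (both the method name and the response shape are assumptions inferred from the endpoint list, so check them against your client library):

```python
import time

def wait_for_crawl(client, crawl_id, poll_interval=2.0, timeout=300.0):
    """Poll a crawl request until it reaches a terminal status.

    `client.get_crawl_request` is an assumed accessor; substitute the
    actual method exposed by your client library.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        request = client.get_crawl_request(crawl_id)
        if request["status"] in ("finished", "canceled", "failed"):
            return request
        time.sleep(poll_interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish within {timeout}s")
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while polling.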
## Best Practices

- Rate Limiting: Implement appropriate rate limiting in your client applications to avoid overwhelming the target websites.
- Resource Management: Use the `page_limit` and `max_depth` options to control the scope of your crawls.
- Error Handling: Always check the status of your crawl requests and implement proper error handling.
- Content Extraction: Use `exclude_tags` and `include_tags` to precisely target the content you need.
- Domain Restrictions: Use `allowed_domains` to prevent the crawler from accessing unintended domains.