WaterCrawl API (1.0.0)
This documentation covers all the external APIs that can be accessed using a Team API Key. These APIs are designed for integration with external systems and services.
All endpoints in this documentation require authentication using an API Key. To authenticate your requests:
- Generate an API Key from the API Keys Dashboard
- Include the API Key in your requests using the X-API-Key header:
curl -H "X-API-Key: your_api_key_here" https://api.watercrawl.dev/v1/...
Important Notes:
- Keep your API Key secure and never share it publicly
- You can generate multiple API Keys for different purposes
- You can revoke API Keys at any time from the dashboard
- Each API Key is associated with a specific team
API requests are rate-limited based on your team's plan. Please refer to your plan details for specific limits.
List crawl requests
Retrieve a list of all crawl requests for your team.
The response includes:
- Request status (pending, running, completed, failed)
- Creation and completion timestamps
- Crawling configuration details
- Progress statistics
Results are paginated and can be filtered by status.
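A minimal paginated request sketch (the /v1/crawl-requests/ path is a placeholder, since this reference only shows the API base URL; the status filter parameter is not listed in the table below):

# List the second page of your team's crawl requests
curl -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/?page=2"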
Authorizations: API Key (X-API-Key header)
query Parameters
page | integer | A page number within the paginated result set.
Responses
Response samples (200, 500):
{
  "count": 123,
  "results": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "status": "new",
      "options": {
        "spider_options": {
          "max_depth": 1,
          "page_limit": 1,
          "allowed_domains": [],
          "exclude_paths": [],
          "include_paths": []
        },
        "page_options": {
          "exclude_tags": [],
          "include_tags": [],
          "wait_time": 100,
          "include_html": false,
          "only_main_content": true,
          "include_links": false,
          "timeout": 15000,
          "accept_cookies_selector": "string",
          "locale": "en-US",
          "extra_headers": {
            "property1": "string",
            "property2": "string"
          },
          "actions": []
        },
        "plugin_options": {
          "property1": "string",
          "property2": "string"
        }
      },
      "created_at": "2019-08-24T14:15:22Z",
      "updated_at": "2019-08-24T14:15:22Z",
      "duration": "string",
      "number_of_documents": "string"
    }
  ]
}
Start a new crawl request
Start a new web crawling task with specified configuration.
This endpoint allows you to:
- Start a new crawl with custom settings
- Configure crawling depth and scope
- Set specific data extraction rules
- Define crawling boundaries and limits
The crawl request will be queued and processed based on your team's plan limits.
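A minimal sketch of starting a crawl (the endpoint path is a placeholder; the body fields follow the schema and request sample below):

# Queue a new crawl of example.com, two levels deep, up to 10 pages
curl -X POST "https://api.watercrawl.dev/v1/crawl-requests/" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://example.com",
        "options": {
          "spider_options": { "max_depth": 2, "page_limit": 10 },
          "page_options": { "only_main_content": true, "include_links": true }
        }
      }'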
Authorizations: API Key (X-API-Key header)
Request Body schema (required):
url (required) | string <uri>, <= 255 characters
options (required) | object (CrawlOption)
Responses
Request sample (payload):
{
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  }
}
Response samples (201, 400, 500):
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  },
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
Get crawl request
Get detailed information about a specific crawl request.
Returns comprehensive information including:
- Current status and progress
- Configuration settings used
- Error details (if any)
- Resource usage statistics
- Associated results and downloads
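A simple polling sketch, assuming a placeholder endpoint path and using the status values listed earlier (completed, failed):

# Poll the crawl request every 10 seconds until it reaches a terminal status
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
while true; do
  STATUS=$(curl -s -H "X-API-Key: your_api_key_here" \
    "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/" | jq -r '.status')
  echo "status: $STATUS"
  case "$STATUS" in
    completed|failed) break ;;
  esac
  sleep 10
done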
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (200, 404, 500):
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  },
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
Cancel a running crawl
Cancel an active crawling task.
This will:
- Stop the crawler immediately
- Save any data collected so far
- Free up crawling resources
- Mark the request as cancelled
Note: Cancelled requests cannot be resumed.
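A cancellation sketch; the HTTP method and endpoint path are not shown in this excerpt, so both are assumptions here:

# Cancel a running crawl (method and path are assumptions; check the OpenAPI spec)
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -X DELETE "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/" \
  -H "X-API-Key: your_api_key_here"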
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (404, 500):
{
  "message": "Not found.",
  "errors": null,
  "code": 404
}
Download crawl result
Download the collected data from a completed crawl request.
Supports multiple formats:
- JSON (structured data)
- CSV (tabular data)
- ZIP (compressed with attachments)
The response includes:
- Extracted data points
- Metadata and timestamps
- Error logs (if any)
- Downloaded resources (based on configuration)
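A download sketch that saves the JSON payload to a local file; the endpoint path is a placeholder, and the mechanism for selecting CSV or ZIP output is not shown in this excerpt:

# Save the collected data for a completed crawl to crawl-result.json
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/download/" \
  -o crawl-result.json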
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (200, 404, 500):
{
  "property1": null,
  "property2": null
}
Check crawl status
Real-time status monitoring using Server-Sent Events (SSE).
The endpoint streams updates every second with:
- Current crawling status
- Pages crawled/remaining
- Data extracted
- Error counts
- Resource usage
Message Types:
- 'state': Contains crawl request status updates
- 'result': Contains new crawl results as they arrive
Connection remains open until:
- Crawl completes
- Error occurs
- Client disconnects
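A streaming sketch using curl; the endpoint path is a placeholder, and the Accept header is an assumption for SSE endpoints:

# Stream status events until the crawl completes or the connection is closed;
# -N disables buffering so events are printed as they arrive.
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -N -H "X-API-Key: your_api_key_here" \
  -H "Accept: text/event-stream" \
  "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/status/"
# Each message carries a type of either 'state' (status update) or
# 'result' (a newly available crawl result).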
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (200, 404, 500):
"string"
List crawl results
List all crawl results associated with your team's requests.
The response includes:
- Extracted data
- Timestamps
- Success/failure status
- Resource statistics
- Associated attachments
Results are paginated and can be filtered by crawl request.
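A listing sketch with the crawl request UUID as a path parameter (the path layout is a placeholder):

# List results for one crawl request, first page
CRAWL_REQUEST_UUID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/$CRAWL_REQUEST_UUID/results/?page=1"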
Authorizations: API Key (X-API-Key header)
path Parameters
crawl_request_uuid (required) | string, pattern: ^[0-9a-fA-F-]{36}$
query Parameters
page | integer | A page number within the paginated result set.
Responses
Response samples (200, 500):
{
  "count": 123,
  "results": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachments": [
        {
          "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
          "attachment_type": "pdf",
          "filename": "string"
        }
      ],
      "created_at": "2019-08-24T14:15:22Z",
      "updated_at": "2019-08-24T14:15:22Z"
    }
  ]
}
Get crawl result
Get detailed information about a specific crawl result.
Returns:
- Complete extracted data
- Crawling metadata
- Error details (if any)
- Performance metrics
- Resource usage statistics
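A sketch that fetches one result and extracts the attachment filenames with jq (the path layout is a placeholder):

# Fetch a single crawl result and print the filenames of its attachments
CRAWL_REQUEST_UUID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
RESULT_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -s -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/$CRAWL_REQUEST_UUID/results/$RESULT_ID/" \
  | jq -r '.attachments[].filename'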
Authorizations: API Key (X-API-Key header)
path Parameters
crawl_request_uuid (required) | string, pattern: ^[0-9a-fA-F-]{36}$
id (required) | string
Responses
Response samples (200, 404, 500):
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "attachments": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachment_type": "pdf",
      "filename": "string"
    }
  ],
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}