WaterCrawl API (1.0.0)
This documentation covers all the external APIs that can be accessed using a Team API Key. These APIs are designed for integration with external systems and services.
All endpoints in this documentation require authentication using an API Key. To authenticate your requests:
- Generate an API Key from the API Keys Dashboard
- Include the API Key in your requests using the X-API-Key header:
curl -H "X-API-Key: your_api_key_here" https://api.watercrawl.dev/v1/...
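The same header can be attached from Python. The following is a minimal sketch using only the standard library. It builds the request without sending it. The path `crawl-requests/` is a hypothetical placeholder, because the URL in the curl command above elides the real path.

```python
import urllib.request

# Base URL as it appears in the curl example; the path after it is elided there.
BASE_URL = "https://api.watercrawl.dev/v1/"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the X-API-Key header."""
    return urllib.request.Request(
        BASE_URL + path.lstrip("/"),
        headers={"X-API-Key": api_key},
        method="GET",
    )


# "crawl-requests/" is a hypothetical path used only for illustration.
req = build_request("crawl-requests/", "your_api_key_here")
print(req.full_url)
# urllib stores header names capitalized as "X-api-key"; HTTP header names
# are case-insensitive, so the server sees the same header.
print(req.get_header("X-api-key"))
```

Passing `req` to `urllib.request.urlopen` would send it. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.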
Important Notes:
- Keep your API Key secure and never share it publicly
- You can generate multiple API Keys for different purposes
- You can revoke API Keys at any time from the dashboard
- Each API Key is associated with a specific team
API requests are rate-limited based on your team's plan. Please refer to your plan details for specific limits.
List crawl requests
Retrieve a list of all crawl requests for your team.
The response includes:
- A list of crawl requests
- Request status (new, running, paused, finished, cancelling, canceled, failed)
- Creation and completion timestamps
- Crawling configuration details
- Progress statistics
Query parameters:
- page: Page number (default: 1)
- page_size: Number of items per page (default: 25, maximum: 100)
- uuid: Filter requests by UUID
- url: Filter requests by start URL
- status: Filter requests by status
- created_at: Filter requests by date
Extra filters:
- url__contains: Filter requests by URL containing a specific string
- url__startswith: Filter requests by URL starting with a specific string
- created_at__gt: Filter requests by creation date (greater than)
- created_at__lt: Filter requests by creation date (less than)
Authorizations: API Key (X-API-Key header)
query Parameters
created_at | string <date-time> | Filter crawl requests by date.
page | integer | A page number within the paginated result set.
page_size | integer | Number of results to return per page.
status | string | Filter crawl requests by status.
url | string | Filter crawl requests by start URL.
uuid | string | Filter crawl requests by UUID.
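As an illustration of how the pagination parameters and filters combine, the sketch below encodes them into a query string. The helper name `list_query` is hypothetical. The 1 to 100 bound on `page_size` follows the maximum stated above.

```python
from urllib.parse import urlencode


def list_query(page=1, page_size=25, **filters):
    """Encode pagination and filter parameters as a URL query string."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    params = {"page": page, "page_size": page_size}
    # Drop filters that were left unset so they are not sent at all.
    params.update({k: v for k, v in filters.items() if v is not None})
    return urlencode(params)


query = list_query(page=2, status="finished", url__startswith="https://example.com")
print(query)
# page=2&page_size=25&status=finished&url__startswith=https%3A%2F%2Fexample.com
```

The resulting string is appended to the endpoint URL after a `?`.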
Responses
Response samples
- 200
- 500
{
  "count": 123,
  "results": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "status": "new",
      "options": {
        "spider_options": {
          "max_depth": 1,
          "page_limit": 1,
          "allowed_domains": [],
          "exclude_paths": [],
          "include_paths": []
        },
        "page_options": {
          "exclude_tags": [],
          "include_tags": [],
          "wait_time": 100,
          "include_html": false,
          "only_main_content": true,
          "include_links": false,
          "timeout": 15000,
          "accept_cookies_selector": "string",
          "locale": "en-US",
          "extra_headers": {
            "property1": "string",
            "property2": "string"
          },
          "actions": []
        },
        "plugin_options": {
          "property1": "string",
          "property2": "string"
        }
      },
      "created_at": "2019-08-24T14:15:22Z",
      "updated_at": "2019-08-24T14:15:22Z",
      "duration": "string",
      "number_of_documents": "string"
    }
  ]
}
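Assuming `count` in the response above is the total number of matching items across all pages (the source does not say so explicitly), the number of pages to fetch can be estimated as follows. The helper name `total_pages` is hypothetical.

```python
import math


def total_pages(count: int, page_size: int = 25) -> int:
    """Number of pages needed to read `count` items at `page_size` per page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return math.ceil(count / page_size)


print(total_pages(123))        # default page_size of 25
print(total_pages(123, 100))   # maximum page_size
```

With the sample `count` of 123, the default page size needs 5 requests and the maximum page size needs 2.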
Start a new crawl request
Start a new web crawling task with specified configuration.
This endpoint allows you to:
- Start a new crawl with custom settings
- Configure crawling depth and scope
- Set specific data extraction rules
- Define crawling boundaries and limits
The crawl request is processed asynchronously. You will receive a task ID that you can use to track the progress of the crawl.
Authorizations: API Key (X-API-Key header)
Request Body schema: required
url (required) | string <uri>, <= 255 characters
options (required) | object (CrawlOption)
Responses
Request samples
- Payload
{
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  }
}
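A request body can be assembled in Python before it is posted. The sketch below combines the required `url` field from the schema with a subset of the options shown in the sample payload. The helper name `build_crawl_payload` is hypothetical, and it is an assumption that omitted option fields fall back to server-side defaults.

```python
import json


def build_crawl_payload(url: str, max_depth: int = 1, page_limit: int = 1) -> str:
    """Return a JSON request body for starting a crawl."""
    # The schema limits url to 255 characters.
    if len(url) > 255:
        raise ValueError("url must be at most 255 characters")
    payload = {
        "url": url,
        "options": {
            "spider_options": {"max_depth": max_depth, "page_limit": page_limit},
            # Values copied from the sample payload above.
            "page_options": {"only_main_content": True, "include_html": False},
        },
    }
    return json.dumps(payload)


body = build_crawl_payload("https://example.com", max_depth=2, page_limit=10)
print(body)
```

The body would be sent with a `Content-Type: application/json` header alongside the `X-API-Key` header.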
Response samples
- 201
- 400
- 500
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  },
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
Get crawl request
Get detailed information about a specific crawl request.
Returns comprehensive information including:
- Current status and progress
- Configuration settings used
- Resource usage statistics
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples
- 200
- 404
- 500
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  },
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
Cancel a running crawl
Cancel an active crawling task.
This will:
- Stop the crawler immediately
- Save any data collected so far
- Free up crawling resources
- Mark the request as canceled
Note: Canceled requests cannot be resumed.
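Because cancellation cannot be undone, a client may want to check the request status before calling this endpoint. The sketch below groups the status values listed under "List crawl requests". The grouping is an assumption: the source does not define which statuses count as active.

```python
# Assumption: these statuses describe a crawl that can still be canceled.
ACTIVE_STATUSES = {"new", "running", "paused"}
# Assumption: these statuses are final and will not change again.
TERMINAL_STATUSES = {"finished", "canceled", "failed"}


def can_cancel(status: str) -> bool:
    """Return True if a crawl in this status is worth sending a cancel for."""
    return status in ACTIVE_STATUSES


for status in ("running", "cancelling", "finished"):
    print(status, can_cancel(status))
```

The transitional `cancelling` status is deliberately left out of both sets, since a cancel is already in progress.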
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples
- 404
- 500
{
  "message": "Not found.",
  "errors": null,
  "code": 404
}
Download crawl result
Download the results of a completed crawl request as a file. The format is JSON by default, or Markdown when requested through output_format.
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
query Parameters
output_format | string | Enum: "json", "markdown". Format of the download file. Default: json.
Responses
Response samples
- 200
- 404
- 500
{
  "property1": null,
  "property2": null
}
Check crawl status
Real-time status monitoring using Server-Sent Events (SSE).
The endpoint streams updates every second with:
- Current crawling status
- Pages crawled
- Data extracted
Message Types:
- 'state': Contains crawl request status updates
- 'result': Contains new crawl results as they arrive
Connection remains open until:
- Crawl completes
- Error occurs
- Client disconnects
Query Parameters:
- prefetched: When set to True, the stream delivers the result JSON inline instead of a download link. Default: False.
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
query Parameters
prefetched | boolean | Prefetch crawl results. Default: False.
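To make the streaming behavior concrete, here is a minimal SSE parsing sketch. The source does not show the wire format, so two things are assumed: standard SSE framing (`data:` lines terminated by a blank line), and that each data payload is a JSON object whose `type` member is 'state' or 'result'. The sample lines are invented for illustration.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data_string) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content; the real payload shape may differ.
sample = [
    'data: {"type": "state", "data": {"status": "running"}}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "result", "data": {"url": "https://example.com"}}\n',
    "\n",
]
messages = [json.loads(payload) for _, payload in parse_sse(sample)]
print([m["type"] for m in messages])
```

In a real client the lines would come from a streaming HTTP response, and the loop would end once a 'state' message reports that the crawl has completed.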
Responses
Response samples
- 200
- 404
- 500
"string"
List crawl results
List all crawl results associated with your team's crawl requests.
The response includes:
- Extracted data
- Timestamps
- Success/failure status
- Resource statistics
- Associated attachments
Query Parameters:
- page: Page number
- page_size: Number of results per page (maximum: 100)
- url: Filter results by URL
- created_at: Filter results by creation date (exact match)
- prefetched: When set to True, each result includes the result JSON inline instead of a download link. Default: False.
Extra filters:
- url__contains: Filter results by URL containing a specific string
- url__startswith: Filter results by URL starting with a specific string
- created_at__gt: Filter results by creation date (greater than)
- created_at__lt: Filter results by creation date (less than)
Results are paginated and can be filtered by crawl request.
Authorizations: API Key (X-API-Key header)
path Parameters
crawl_request_uuid (required) | string, pattern ^[0-9a-fA-F-]{36}$
query Parameters
created_at | string <date-time> | Filter crawl results by date.
page | integer | A page number within the paginated result set.
page_size | integer | Number of results to return per page.
prefetched | boolean | Prefetch crawl results. Default: False.
url | string | Filter crawl results by URL.
Responses
Response samples
- 200
- 500
{
  "count": 123,
  "results": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachments": [
        {
          "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
          "attachment_type": "pdf",
          "filename": "string"
        }
      ],
      "created_at": "2019-08-24T14:15:22Z",
      "updated_at": "2019-08-24T14:15:22Z"
    }
  ]
}
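Once a page of results has been decoded from JSON, the attachments can be collected by type. The sketch below relies only on the fields shown in the sample response above. The helper name `attachments_by_type` and the sample file name are hypothetical.

```python
from collections import defaultdict


def attachments_by_type(page: dict) -> dict:
    """Group attachment filenames from one page of results by attachment_type."""
    grouped = defaultdict(list)
    for result in page.get("results", []):
        for attachment in result.get("attachments", []):
            grouped[attachment["attachment_type"]].append(attachment["filename"])
    return dict(grouped)


# Shaped like the sample response, with an invented filename.
sample_page = {
    "count": 1,
    "results": [
        {
            "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
            "attachments": [
                {
                    "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
                    "attachment_type": "pdf",
                    "filename": "page.pdf",
                }
            ],
            "created_at": "2019-08-24T14:15:22Z",
            "updated_at": "2019-08-24T14:15:22Z",
        }
    ],
}
print(attachments_by_type(sample_page))
```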
Get crawl result
Get detailed information about a specific crawl result.
Returns:
- Complete extracted data
- Crawling metadata
- Performance metrics
- Resource usage statistics
Authorizations: API Key (X-API-Key header)
path Parameters
crawl_request_uuid (required) | string, pattern ^[0-9a-fA-F-]{36}$
id (required) | string
Responses
Response samples
- 200
- 404
- 500
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "attachments": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachment_type": "pdf",
      "filename": "string"
    }
  ],
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}
List team subscriptions
Returns a list of all active subscriptions for the team.
Authorizations: API Key (X-API-Key header)
Responses
Response samples
- 200
- 500
[
  {
    "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
    "plan": {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "name": "string",
      "label": "string",
      "group": "yearly",
      "description": "string",
      "price_before_discount": "string",
      "price": "string",
      "number_of_users": -2147483648,
      "page_credit": -2147483648,
      "daily_page_credit": -2147483648,
      "crawl_max_depth": -2147483648,
      "crawl_max_limit": -2147483648,
      "max_concurrent_crawl": -2147483648,
      "is_default": true,
      "features": [
        {
          "title": "string",
          "help_text": "string",
          "icon": "info"
        }
      ]
    },
    "status": "string",
    "remain_page_credit": -2147483648,
    "remain_daily_page_credit": -2147483648,
    "start_at": "2019-08-24T14:15:22Z",
    "current_period_start_at": "2019-08-24T14:15:22Z",
    "current_period_end_at": "2019-08-24T14:15:22Z",
    "cancel_at": "2019-08-24T14:15:22Z",
    "created_at": "2019-08-24T14:15:22Z",
    "updated_at": "2019-08-24T14:15:22Z"
  }
]
Get subscription details
Returns detailed information about a specific subscription of the team, identified by its id.
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples
- 200
- 404
- 500
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "plan": {
    "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
    "name": "string",
    "label": "string",
    "group": "yearly",
    "description": "string",
    "price_before_discount": "string",
    "price": "string",
    "number_of_users": -2147483648,
    "page_credit": -2147483648,
    "daily_page_credit": -2147483648,
    "crawl_max_depth": -2147483648,
    "crawl_max_limit": -2147483648,
    "max_concurrent_crawl": -2147483648,
    "is_default": true,
    "features": [
      {
        "title": "string",
        "help_text": "string",
        "icon": "info"
      }
    ]
  },
  "status": "string",
  "remain_page_credit": -2147483648,
  "remain_daily_page_credit": -2147483648,
  "start_at": "2019-08-24T14:15:22Z",
  "current_period_start_at": "2019-08-24T14:15:22Z",
  "current_period_end_at": "2019-08-24T14:15:22Z",
  "cancel_at": "2019-08-24T14:15:22Z",
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}
Get current subscription details
Returns detailed information about the team's current subscription.
Authorizations: API Key (X-API-Key header)
Responses
Response samples
- 200
- 500
{
  "plan_name": "string",
  "status": "string",
  "plan_page_credit": 0,
  "plan_daily_page_credit": 0,
  "plan_number_users": 0,
  "remain_number_users": 0,
  "remaining_page_credit": 0,
  "remaining_daily_page_credit": 0,
  "max_depth": 0,
  "max_concurrent_crawl": 0,
  "start_at": "2019-08-24T14:15:22Z",
  "current_period_start_at": "2019-08-24T14:15:22Z",
  "current_period_end_at": "2019-08-24T14:15:22Z",
  "cancel_at": "2019-08-24T14:15:22Z",
  "is_default": true
}
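The credit fields in this response can be combined into a simple usage summary. The sketch below assumes that `remaining_page_credit` counts down from `plan_page_credit` over the current period, which the source does not state. The helper name `credit_usage` and the sample numbers are hypothetical.

```python
def credit_usage(subscription: dict) -> dict:
    """Summarize page-credit consumption from a current-subscription response."""
    total = subscription["plan_page_credit"]
    used = total - subscription["remaining_page_credit"]
    # Guard against division by zero for plans that report no credit.
    fraction = used / total if total else 0.0
    return {"used": used, "fraction_used": fraction}


# Invented numbers in the shape of the sample response.
sample_subscription = {"plan_page_credit": 1000, "remaining_page_credit": 250}
print(credit_usage(sample_subscription))
```

A client could use `fraction_used` to warn before a large crawl exhausts the remaining credit.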