WaterCrawl API (1.0.0)
This documentation covers all the external APIs that can be accessed using a Team API Key. These APIs are designed for integration with external systems and services.
All endpoints in this documentation require authentication using an API Key. To authenticate your requests:
- Generate an API Key from the API Keys Dashboard
- Include the API Key in your requests using the X-API-Key header:
curl -H "X-API-Key: your_api_key_here" https://api.watercrawl.dev/v1/...
Important Notes:
- Keep your API Key secure and never share it publicly
- You can generate multiple API Keys for different purposes
- You can revoke API Keys at any time from the dashboard
- Each API Key is associated with a specific team
API requests are rate-limited based on your team's plan. Please refer to your plan details for specific limits.
List crawl requests
Retrieve a list of all crawl requests for your team.
The response includes:
- Request status (pending, running, completed, failed)
- Creation and completion timestamps
- Crawling configuration details
- Progress statistics
Results are paginated and can be filtered by status.
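A minimal paginated request sketch (the /v1/crawl-requests/ path is a placeholder, since this reference only shows the API base URL; the status filter parameter is not listed in the table below):

# List the second page of your team's crawl requests
curl -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/?page=2"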
Authorizations: API Key (X-API-Key header)
query Parameters
page | integer | A page number within the paginated result set.
Responses
Response samples (200, 500):
{
  "count": 123,
  "results": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "status": "new",
      "options": {
        "spider_options": {
          "max_depth": 1,
          "page_limit": 1,
          "allowed_domains": [],
          "exclude_paths": [],
          "include_paths": []
        },
        "page_options": {
          "exclude_tags": [],
          "include_tags": [],
          "wait_time": 100,
          "include_html": false,
          "only_main_content": true,
          "include_links": false,
          "timeout": 15000,
          "accept_cookies_selector": "string",
          "locale": "en-US",
          "extra_headers": {
            "property1": "string",
            "property2": "string"
          },
          "actions": []
        },
        "plugin_options": {
          "property1": "string",
          "property2": "string"
        }
      },
      "created_at": "2019-08-24T14:15:22Z",
      "updated_at": "2019-08-24T14:15:22Z",
      "duration": "string",
      "number_of_documents": "string"
    }
  ]
}
Start a new crawl request
Start a new web crawling task with specified configuration.
This endpoint allows you to:
- Start a new crawl with custom settings
- Configure crawling depth and scope
- Set specific data extraction rules
- Define crawling boundaries and limits
The crawl request will be queued and processed based on your team's plan limits.
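A minimal sketch of starting a crawl (the endpoint path is a placeholder; the body fields follow the schema and request sample below):

# Queue a new crawl of example.com, two levels deep, up to 10 pages
curl -X POST "https://api.watercrawl.dev/v1/crawl-requests/" \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://example.com",
        "options": {
          "spider_options": { "max_depth": 2, "page_limit": 10 },
          "page_options": { "only_main_content": true, "include_links": true }
        }
      }'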
Authorizations: API Key (X-API-Key header)
Request Body schema (required):
url (required) | string <uri>, <= 255 characters
options (required) | object (CrawlOption)
Responses
Request sample (payload):
{
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  }
}
Response samples (201, 400, 500):
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  },
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
Get crawl request
Get detailed information about a specific crawl request.
Returns comprehensive information including:
- Current status and progress
- Configuration settings used
- Error details (if any)
- Resource usage statistics
- Associated results and downloads
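A simple polling sketch, assuming a placeholder endpoint path and using the status values listed earlier (completed, failed):

# Poll the crawl request every 10 seconds until it reaches a terminal status
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
while true; do
  STATUS=$(curl -s -H "X-API-Key: your_api_key_here" \
    "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/" | jq -r '.status')
  echo "status: $STATUS"
  case "$STATUS" in
    completed|failed) break ;;
  esac
  sleep 10
done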
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (200, 404, 500):
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {
    "spider_options": {
      "max_depth": 1,
      "page_limit": 1,
      "allowed_domains": [],
      "exclude_paths": [],
      "include_paths": []
    },
    "page_options": {
      "exclude_tags": [],
      "include_tags": [],
      "wait_time": 100,
      "include_html": false,
      "only_main_content": true,
      "include_links": false,
      "timeout": 15000,
      "accept_cookies_selector": "string",
      "locale": "en-US",
      "extra_headers": {
        "property1": "string",
        "property2": "string"
      },
      "actions": []
    },
    "plugin_options": {
      "property1": "string",
      "property2": "string"
    }
  },
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
Cancel a running crawl
Cancel an active crawling task.
This will:
- Stop the crawler immediately
- Save any data collected so far
- Free up crawling resources
- Mark the request as cancelled
Note: Cancelled requests cannot be resumed.
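A cancellation sketch; the HTTP method and endpoint path are not shown in this excerpt, so both are assumptions here:

# Cancel a running crawl (method and path are assumptions; check the OpenAPI spec)
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -X DELETE "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/" \
  -H "X-API-Key: your_api_key_here"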
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (404, 500):
{
  "message": "Not found.",
  "errors": null,
  "code": 404
}
Download crawl result
Download the collected data from a completed crawl request.
Supports multiple formats:
- JSON (structured data)
- CSV (tabular data)
- ZIP (compressed with attachments)
The response includes:
- Extracted data points
- Metadata and timestamps
- Error logs (if any)
- Downloaded resources (based on configuration)
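A download sketch that saves the JSON payload to a local file; the endpoint path is a placeholder, and the mechanism for selecting CSV or ZIP output is not shown in this excerpt:

# Save the collected data for a completed crawl to crawl-result.json
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/download/" \
  -o crawl-result.json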
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (200, 404, 500):
{
  "property1": null,
  "property2": null
}
Check crawl status
Real-time status monitoring using Server-Sent Events (SSE).
The endpoint streams updates every second with:
- Current crawling status
- Pages crawled/remaining
- Data extracted
- Error counts
- Resource usage
Message Types:
- 'state': Contains crawl request status updates
- 'result': Contains new crawl results as they arrive
Connection remains open until:
- Crawl completes
- Error occurs
- Client disconnects
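A streaming sketch using curl; the endpoint path is a placeholder, and the Accept header is an assumption for SSE endpoints:

# Stream status events until the crawl completes or the connection is closed;
# -N disables buffering so events are printed as they arrive.
REQUEST_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -N -H "X-API-Key: your_api_key_here" \
  -H "Accept: text/event-stream" \
  "https://api.watercrawl.dev/v1/crawl-requests/$REQUEST_ID/status/"
# Each message carries a type of either 'state' (status update) or
# 'result' (a newly available crawl result).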
Authorizations: API Key (X-API-Key header)
path Parameters
id (required) | string
Responses
Response samples (200, 404, 500):
"string"
List crawl results
List all crawl results associated with your team's requests.
The response includes:
- Extracted data
- Timestamps
- Success/failure status
- Resource statistics
- Associated attachments
Results are paginated and can be filtered by crawl request.
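A listing sketch with the crawl request UUID as a path parameter (the path layout is a placeholder):

# List results for one crawl request, first page
CRAWL_REQUEST_UUID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/$CRAWL_REQUEST_UUID/results/?page=1"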
Authorizations: API Key (X-API-Key header)
path Parameters
crawl_request_uuid (required) | string, pattern: ^[0-9a-fA-F-]{36}$
query Parameters
page | integer | A page number within the paginated result set.
Responses
Response samples (200, 500):
{
  "count": 123,
  "results": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachments": [
        {
          "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
          "attachment_type": "pdf",
          "filename": "string"
        }
      ],
      "created_at": "2019-08-24T14:15:22Z",
      "updated_at": "2019-08-24T14:15:22Z"
    }
  ]
}
Get crawl result
Get detailed information about a specific crawl result.
Returns:
- Complete extracted data
- Crawling metadata
- Error details (if any)
- Performance metrics
- Resource usage statistics
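A sketch that fetches one result and extracts the attachment filenames with jq (the path layout is a placeholder):

# Fetch a single crawl result and print the filenames of its attachments
CRAWL_REQUEST_UUID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
RESULT_ID="095be615-a8ad-4c33-8e9c-c7612fbf6c9f"
curl -s -H "X-API-Key: your_api_key_here" \
  "https://api.watercrawl.dev/v1/crawl-requests/$CRAWL_REQUEST_UUID/results/$RESULT_ID/" \
  | jq -r '.attachments[].filename'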
Authorizations: API Key (X-API-Key header)
path Parameters
crawl_request_uuid (required) | string, pattern: ^[0-9a-fA-F-]{36}$
id (required) | string
Responses
Response samples (200, 404, 500):
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "attachments": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachment_type": "pdf",
      "filename": "string"
    }
  ],
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}