WaterCrawl API (1.0.0)

Introduction

This documentation covers all the external APIs that can be accessed using a Team API Key. These APIs are designed for integration with external systems and services.

Authentication

All endpoints in this documentation require authentication using an API Key. To authenticate your requests:

  1. Generate an API Key from the API Keys Dashboard
  2. Include the API Key in your requests using the X-API-Key header:
curl -H "X-API-Key: your_api_key_here" https://api.watercrawl.dev/v1/...

Important Notes:

  • Keep your API Key secure and never share it publicly
  • You can generate multiple API Keys for different purposes
  • You can revoke API Keys at any time from the dashboard
  • Each API Key is associated with a specific team
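
As a rough Python counterpart to the curl command above, the sketch below attaches the X-API-Key header to a request object using only the standard library. The helper name authed_request is hypothetical, and the endpoint URL is left to the caller because full paths are not shown in this section.

```python
from urllib.request import Request


def authed_request(url: str, api_key: str, method: str = "GET") -> Request:
    """Build (but do not send) a request carrying the X-API-Key header.

    `authed_request` is an illustrative helper, not part of any official client.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return Request(url, headers={"X-API-Key": api_key}, method=method)
```

The returned object can be passed to urllib.request.urlopen to send it. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.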

Rate Limiting

API requests are rate-limited based on your team's plan. Please refer to your plan details for specific limits.
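
This section does not say how a rate limit is signalled. Assuming the conventional HTTP 429 status code (an assumption, not confirmed here), a client might retry with capped exponential backoff. The function names and default values below are illustrative only.

```python
from typing import List


def is_rate_limited(status_code: int) -> bool:
    """Assumption: the API reports rate limiting with HTTP 429."""
    return status_code == 429


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Return the wait (in seconds) before each retry: base * 2**attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]
```

A caller would sleep for each delay in turn while is_rate_limited(...) stays true, and give up once the list is exhausted.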

Crawl Requests

List crawl requests

Retrieve a list of all crawl requests for your team.

The response includes:

  • A list of crawl requests
  • Request status (new, running, paused, finished, cancelling, canceled, failed)
  • Creation and completion timestamps
  • Crawling configuration details
  • Progress statistics

Query parameters:

  • page: Page number (default: 1)
  • page_size: Number of items per page (default: 25, maximum: 100)
  • uuid: Filter requests by UUID
  • url: Filter requests by start URL
  • status: Filter requests by status
  • created_at: Filter requests by date

Extra filters:

  • url__contains: Filter requests by URL containing a specific string
  • url__startswith: Filter requests by URL starting with a specific string
  • created_at__gt: Filter requests by creation date (greater than)
  • created_at__lt: Filter requests by creation date (less than)
Authorizations:
ApiKeyAuth
query Parameters
created_at
string <date-time>

Filter crawl requests by date

page
integer

A page number within the paginated result set.

page_size
integer

Number of results to return per page.

status
string

Filter crawl requests by status.

url
string

Filter crawl requests by start URL.

uuid
string

Filter crawl requests by UUID.
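
The query parameters and extra filters listed for this endpoint can be assembled into a query string as sketched below. The helper name build_list_query is hypothetical; the parameter names, the default of 25, and the maximum of 100 come from the description above.

```python
from urllib.parse import urlencode

# Filter names taken from the "Query parameters" and "Extra filters" lists.
ALLOWED_FILTERS = {
    "uuid", "url", "status", "created_at",
    "url__contains", "url__startswith",
    "created_at__gt", "created_at__lt",
}
MAX_PAGE_SIZE = 100


def build_list_query(page: int = 1, page_size: int = 25, **filters: str) -> str:
    """Return a URL-encoded query string for the list endpoint."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = {"page": page, "page_size": page_size}
    params.update(filters)
    return urlencode(params)
```

The result is appended to the endpoint URL after a "?". Rejecting unknown filter names locally catches typos such as url_contains before a request is sent.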

Responses

Response samples

Content type
application/json
{
  "count": 123,
  "results": []
}
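
If "count" is the total number of matching crawl requests across all pages (an assumption; the sample shows only the field name), the number of pages to fetch follows from the page size. A minimal sketch with a hypothetical helper name:

```python
import math


def total_pages(count: int, page_size: int = 25) -> int:
    """Number of pages needed to cover `count` items at `page_size` per page."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    return math.ceil(count / page_size)
```

With the sample count of 123 and the default page size of 25, this gives 5 pages.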

Start a new crawl request

Start a new web crawling task with specified configuration.

This endpoint allows you to:

  • Start a new crawl with custom settings
  • Configure crawling depth and scope
  • Set specific data extraction rules
  • Define crawling boundaries and limits

The crawl request is processed asynchronously. You will receive a task ID that you can use to track the progress of the crawl.

Authorizations:
ApiKeyAuth
Request Body schema:
required
url
required
string <uri> <= 255 characters
options
required
object (CrawlOption)

Responses

Request samples

Content type
application/json
{
  "options": {}
}
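
The request body can be built as sketched below. The 255-character limit on url comes from the schema above. The contents of the options object (CrawlOption) are not described in this section, so the sketch passes them through unchanged. The helper name build_crawl_payload is hypothetical.

```python
import json
from typing import Any, Dict

MAX_URL_LENGTH = 255  # from the request body schema


def build_crawl_payload(url: str, options: Dict[str, Any]) -> str:
    """Serialize the JSON body for starting a crawl request."""
    if not url:
        raise ValueError("url is required")
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"url must be at most {MAX_URL_LENGTH} characters")
    # `options` is forwarded as-is; its fields are not documented here.
    return json.dumps({"url": url, "options": options})
```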

Response samples

Content type
application/json
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {},
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string",
  "sitemap": "http://example.com"
}

Get crawl request

Get detailed information about a specific crawl request.

Returns comprehensive information including:

  • Current status and progress
  • Configuration settings used
  • Resource usage statistics
Authorizations:
ApiKeyAuth
path Parameters
id
required
string

Responses

Response samples

Content type
application/json
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {},
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string",
  "sitemap": "http://example.com"
}
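
Because crawls run asynchronously, a client that does not use the streaming status endpoint may poll this endpoint until the request stops changing. The sketch below assumes that finished, canceled, and failed are the terminal statuses (an inference from the status names, not stated here). fetch_status stands in for whatever call retrieves the current "status" field.

```python
import time
from typing import Callable

STATUSES = ("new", "running", "paused", "finished", "cancelling", "canceled", "failed")
# Assumption: these three statuses never change again.
TERMINAL_STATUSES = frozenset({"finished", "canceled", "failed"})


def is_terminal(status: str) -> bool:
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status in TERMINAL_STATUSES


def wait_for_completion(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    max_checks: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status is seen; raise TimeoutError otherwise."""
    for _ in range(max_checks):
        status = fetch_status()
        if is_terminal(status):
            return status
        sleep(interval)
    raise TimeoutError("crawl request did not reach a terminal status")
```

The sleep parameter is injectable so the loop can be exercised in tests without waiting.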

Cancel a running crawl

Cancel an active crawling task.

This will:

  • Stop the crawler immediately
  • Save any data collected so far
  • Free up crawling resources
  • Mark the request as canceled

Note: Canceled requests cannot be resumed.

Authorizations:
ApiKeyAuth
path Parameters
id
required
string

Responses

Response samples

Content type
application/json
{
  "message:": "Not found.",
  "errors": null,
  "code": 404
}

Download crawl result

Download the results of a completed crawl request as a file, in JSON (the default) or markdown format.

Authorizations:
ApiKeyAuth
path Parameters
id
required
string
query Parameters
output_format
string
Enum: "json" "markdown"

Format of the download file. Default: json. Available formats: markdown, json.

Responses

Response samples

Content type
application/json
{
  "property1": null,
  "property2": null
}

Get sitemap graph

Get a graph representation of a crawl request.

Authorizations:
ApiKeyAuth
path Parameters
id
required
string

Responses

Response samples

Content type
application/json
{
  "property1": null,
  "property2": null
}

Get markdown report

Get a markdown representation of a crawl request.

Authorizations:
ApiKeyAuth
path Parameters
id
required
string

Responses

Response samples

Content type
application/json
"string"

Check crawl status

Monitor the status of a crawl request in real time using Server-Sent Events (SSE).

The endpoint streams updates every second with:

  • Current crawling status
  • Pages crawled
  • Data extracted

Message Types:

  1. 'state': Contains crawl request status updates
  2. 'result': Contains new crawl results as they arrive

Connection remains open until:

  • Crawl completes
  • Error occurs
  • Client disconnects

Query Parameters:

  • prefetched: If set to True, the result JSON is returned inline instead of a download link. Default: False.
Authorizations:
ApiKeyAuth
path Parameters
id
required
string
query Parameters
prefetched
boolean

Prefetch crawl results. Default: False.
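
A client consuming this stream needs to split it into individual events. The sketch below follows the standard SSE framing ("event:" and "data:" fields, blank line as terminator). How the 'state' and 'result' message types are encoded on the wire is not specified in this section, so the parser simply yields the event name and the raw data string and leaves decoding to the caller. The function name parse_sse is illustrative.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event = "message"  # default event name in the SSE standard
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

If the payloads turn out to be JSON, each data string can be passed to json.loads before dispatching on the message type.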

Responses

Response samples

Content type
application/json
"string"

Crawl Results

List crawl results

List all crawl results associated with your team's crawl requests.

The response includes:

  • Extracted data
  • Timestamps
  • Success/failure status
  • Resource statistics
  • Associated attachments

Query Parameters:

  • page: Page number
  • page_size: Number of results per page (maximum: 100)
  • url: Filter results by URL
  • created_at: Filter results by creation date (exact match)
  • prefetched: If set to True, the result JSON is returned inline instead of a download link. Default: False.

Extra filters:

  • url__contains: Filter results by URL containing a specific string
  • url__startswith: Filter results by URL starting with a specific string
  • created_at__gt: Filter results by creation date (greater than or equal to)
  • created_at__lt: Filter results by creation date (less than or equal to)

Results are paginated and can be filtered by crawl request.

Authorizations:
ApiKeyAuth
path Parameters
crawl_request_uuid
required
string^[0-9a-fA-F-]{36}$
query Parameters
created_at
string <date-time>

Filter crawl results by date

page
integer

A page number within the paginated result set.

page_size
integer

Number of results to return per page.

prefetched
boolean

Prefetch crawl results. Default: False.

url
string

Filter crawl results by URL.

Responses

Response samples

Content type
application/json
{}

Get crawl result

Get detailed information about a specific crawl result.

Returns:

  • Complete extracted data
  • Crawling metadata
  • Performance metrics
  • Resource usage statistics
Authorizations:
ApiKeyAuth
path Parameters
crawl_request_uuid
required
string^[0-9a-fA-F-]{36}$
id
required
string

Responses

Response samples

Content type
application/json
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "result": "http://example.com",
  "attachments": [],
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}
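
In the sample above, "result" holds a download link. With prefetched set to True, the listing endpoints are described as returning the result JSON inline instead. A client can handle both shapes as sketched below. The sketch assumes the link points at a JSON document, which is not confirmed here. fetch_text is a caller-supplied function that downloads a URL and returns its body as text.

```python
import json
from typing import Any, Callable, Dict


def load_result(item: Dict[str, Any], fetch_text: Callable[[str], str]) -> Any:
    """Return the result data, following the download link when needed."""
    result = item["result"]
    if isinstance(result, str):
        # Treated as a download link (assumed to serve JSON).
        return json.loads(fetch_text(result))
    # Already inline data (prefetched case).
    return result
```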

Subscriptions

List team subscriptions

Returns a list of all active subscriptions for the team.

Authorizations:
ApiKeyAuth

Responses

Response samples

Content type
application/json
[
  {}
]

Get subscription details

Returns detailed information about the subscription identified by id.

Authorizations:
ApiKeyAuth
path Parameters
id
required
string

Responses

Response samples

Content type
application/json
{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "plan": {},
  "status": "string",
  "remain_page_credit": -2147483648,
  "remain_daily_page_credit": -2147483648,
  "start_at": "2019-08-24T14:15:22Z",
  "current_period_start_at": "2019-08-24T14:15:22Z",
  "current_period_end_at": "2019-08-24T14:15:22Z",
  "cancel_at": "2019-08-24T14:15:22Z",
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}

Get current subscription details

Returns detailed information about the team's current subscription.

Authorizations:
ApiKeyAuth

Responses

Response samples

Content type
application/json
{
  "plan_name": "string",
  "status": "string",
  "plan_page_credit": 0,
  "plan_daily_page_credit": 0,
  "plan_number_users": 0,
  "remain_number_users": 0,
  "remaining_page_credit": 0,
  "remaining_daily_page_credit": 0,
  "max_depth": 0,
  "max_concurrent_crawl": 0,
  "start_at": "2019-08-24T14:15:22Z",
  "current_period_start_at": "2019-08-24T14:15:22Z",
  "current_period_end_at": "2019-08-24T14:15:22Z",
  "cancel_at": "2019-08-24T14:15:22Z",
  "is_default": true
}
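
The credit fields in this response can be combined to estimate how much of the plan has been used. The sketch assumes that remaining_page_credit counts down from plan_page_credit, which the field names suggest but this section does not state. The helper name is hypothetical.

```python
from typing import Any, Dict


def page_credit_usage(subscription: Dict[str, Any]) -> Dict[str, float]:
    """Compute used page credits and the fraction of the plan consumed."""
    plan = subscription["plan_page_credit"]
    remaining = subscription["remaining_page_credit"]
    used = plan - remaining
    # Guard against plans that report zero credits.
    fraction = used / plan if plan > 0 else 0.0
    return {"used": used, "fraction_used": fraction}
```

For a plan with 1000 page credits and 250 remaining, this reports 750 used, or a fraction of 0.75.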