WaterCrawl API (1.0.0)

Introduction

This documentation covers all the external APIs that can be accessed using a Team API Key. These APIs are designed for integration with external systems and services.

Authentication

All endpoints in this documentation require authentication using an API Key. To authenticate your requests:

  1. Generate an API Key from the API Keys Dashboard
  2. Include the API Key in your requests using the X-API-Key header:

     curl -H "X-API-Key: your_api_key_here" https://api.watercrawl.dev/v1/...

Important Notes:

  • Keep your API Key secure and never share it publicly
  • You can generate multiple API Keys for different purposes
  • You can revoke API Keys at any time from the dashboard
  • Each API Key is associated with a specific team
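
The header requirement above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the base URL is the prefix from the curl example, while build_request and the "crawl-requests/" path are hypothetical names chosen for the example.

```python
import urllib.request

# Prefix taken from the curl example; the rest of each path is not shown there.
BASE_URL = "https://api.watercrawl.dev/v1"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request carrying the X-API-Key header; nothing is sent yet."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method=method)


# "crawl-requests/" is a hypothetical path used only for illustration.
req = build_request("crawl-requests/", "your_api_key_here")
print(req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. Reading the key from an environment variable instead of hard-coding it helps keep it out of source control.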

Rate Limiting

API requests are rate-limited based on your team's plan. Please refer to your plan details for specific limits.
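
This documentation does not state how a rate-limited request is reported. The sketch below assumes the conventional HTTP 429 status and retries with exponential backoff; call_with_backoff and its parameters are hypothetical, and the caller supplies the function that actually performs the request.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a status other than 429 or attempts run out.

    Assumption: the API signals rate limiting with HTTP 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait after every rate-limited attempt.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The sleep function is injectable so the retry schedule can be checked without waiting.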

Crawl Requests

List crawl requests

Retrieve a list of all crawl requests for your team.

The response includes:

  • Request status (pending, running, completed, failed)
  • Creation and completion timestamps
  • Crawling configuration details
  • Progress statistics

Results are paginated and can be filtered by status.

Authorizations: ApiKeyAuth

Query parameters:

  • page (integer): A page number within the paginated result set.

Response sample (application/json):

{
  "count": 123,
  "results": []
}
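
Walking the paginated list can be sketched as follows. The sketch assumes that page numbering starts at 1 and that count is the total number of items across all pages; neither is stated here. fetch_page is a hypothetical callable that requests one page and returns the decoded JSON object.

```python
from typing import Any, Callable, Dict, Iterator


def iter_crawl_requests(fetch_page: Callable[[int], Dict[str, Any]]) -> Iterator[Any]:
    """Yield every item from the paginated list, one page at a time."""
    page = 1  # assumption: the first page is numbered 1
    seen = 0
    while True:
        payload = fetch_page(page)
        results = payload.get("results", [])
        if not results:
            return
        yield from results
        seen += len(results)
        # Assumption: "count" is the total across all pages.
        if seen >= payload.get("count", 0):
            return
        page += 1
```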

Start a new crawl request

Start a new web crawling task with specified configuration.

This endpoint allows you to:

  • Start a new crawl with custom settings
  • Configure crawling depth and scope
  • Set specific data extraction rules
  • Define crawling boundaries and limits

The crawl request will be queued and processed based on your team's plan limits.

Authorizations: ApiKeyAuth

Request body schema (required):

  • url (string <uri>, required): at most 255 characters
  • options (object (CrawlOption), required)

Request sample:

{
  "options": {}
}

Response sample (application/json):

{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {},
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
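
A request body that satisfies the schema can be assembled as in the sketch below. It checks the 255-character limit on url before building a POST request. The endpoint argument is left to the caller because the path is not shown in this documentation, the fields inside options are likewise not shown, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

MAX_URL_LENGTH = 255  # limit taken from the request body schema


def build_crawl_body(url: str, options: Dict[str, Any]) -> bytes:
    """Serialize the two required fields, enforcing the documented URL limit."""
    if not url or len(url) > MAX_URL_LENGTH:
        raise ValueError(f"url must be 1 to {MAX_URL_LENGTH} characters long")
    if not isinstance(options, dict):
        raise TypeError("options must be a JSON object")
    return json.dumps({"url": url, "options": options}).encode("utf-8")


def build_start_request(
    endpoint: str, api_key: str, url: str, options: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a crawl."""
    return urllib.request.Request(
        endpoint,
        data=build_crawl_body(url, options),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```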

Get crawl request

Get detailed information about a specific crawl request.

Returns comprehensive information including:

  • Current status and progress
  • Configuration settings used
  • Error details (if any)
  • Resource usage statistics
  • Associated results and downloads
Authorizations:
ApiKeyAuth
path Parameters
id
required
string

Response sample (application/json):

{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "status": "new",
  "options": {},
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z",
  "duration": "string",
  "number_of_documents": "string"
}
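
A response shaped like the sample can be turned into a typed object as sketched below. The CrawlRequest class is a hypothetical container, not part of the API. The "Z" suffix on the timestamps is rewritten to "+00:00" because datetime.fromisoformat only accepts "Z" from Python 3.11 onward.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class CrawlRequest:
    uuid: str
    status: str
    options: Dict[str, Any]
    created_at: datetime
    updated_at: datetime
    duration: Optional[str]
    number_of_documents: Optional[str]


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2019-08-24T14:15:22Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_crawl_request(payload: Dict[str, Any]) -> CrawlRequest:
    """Build a CrawlRequest from a decoded JSON response body."""
    return CrawlRequest(
        uuid=payload["uuid"],
        status=payload["status"],
        options=payload.get("options", {}),
        created_at=parse_timestamp(payload["created_at"]),
        updated_at=parse_timestamp(payload["updated_at"]),
        duration=payload.get("duration"),
        number_of_documents=payload.get("number_of_documents"),
    )
```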

Cancel a running crawl

Cancel an active crawling task.

This will:

  • Stop the crawler immediately
  • Save any data collected so far
  • Free up crawling resources
  • Mark the request as cancelled

Note: Cancelled requests cannot be resumed.

Authorizations: ApiKeyAuth

Path parameters:

  • id (string, required)

Response sample (application/json):

{
  "message": "Not found.",
  "errors": null,
  "code": 404
}
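
An error body like the 404 sample can be converted into an exception as sketched below. The sketch assumes the keys are message, errors and code, and that any status of 400 or above carries this shape; ApiError and raise_for_error are hypothetical names.

```python
import json
from typing import Any


class ApiError(Exception):
    """Raised when the API answers with an error body."""

    def __init__(self, code: int, message: str, errors: Any = None) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.errors = errors


def raise_for_error(status: int, body: bytes) -> None:
    """Raise ApiError for HTTP statuses of 400 and above; otherwise do nothing."""
    if status < 400:
        return
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}  # body was empty or not JSON
    if not isinstance(payload, dict):
        payload = {}
    raise ApiError(
        payload.get("code", status),
        payload.get("message", "Unknown error"),
        payload.get("errors"),
    )
```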

Download crawl result

Download the collected data from a completed crawl request.

Supports multiple formats:

  • JSON (structured data)
  • CSV (tabular data)
  • ZIP (compressed with attachments)

The response includes:

  • Extracted data points
  • Metadata and timestamps
  • Error logs (if any)
  • Downloaded resources (based on configuration)

Authorizations: ApiKeyAuth

Path parameters:

  • id (string, required)

Response sample (application/json):

{
  "property1": null,
  "property2": null
}
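
How a format is selected is not described here. The sketch below only shows one way to handle a downloaded body once it arrives, assuming the server labels JSON, CSV and ZIP downloads with the usual Content-Type values; decode_download is a hypothetical helper.

```python
import csv
import io
import json
import zipfile
from typing import Any


def decode_download(content_type: str, body: bytes) -> Any:
    """Decode a download according to its Content-Type header.

    Assumption: the three formats arrive as application/json, text/csv
    and application/zip respectively.
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/json":
        return json.loads(body)
    if mime == "text/csv":
        return list(csv.reader(io.StringIO(body.decode("utf-8"))))
    if mime == "application/zip":
        # Return only the member names; extraction is left to the caller.
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return archive.namelist()
    raise ValueError(f"Unsupported content type: {content_type!r}")
```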

Check crawl status

Real-time status monitoring using Server-Sent Events (SSE).

The endpoint streams updates every second with:

  • Current crawling status
  • Pages crawled/remaining
  • Data extracted
  • Error counts
  • Resource usage

Message Types:

  1. 'state': Contains crawl request status updates
  2. 'result': Contains new crawl results as they arrive

The connection remains open until:

  • The crawl completes
  • An error occurs
  • The client disconnects

Authorizations: ApiKeyAuth

Path parameters:

  • id (string, required)
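
Reading the stream can be sketched with a small parser for the Server-Sent Events wire format. Whether 'state' and 'result' arrive in the SSE event field or inside the data payload is not stated here, so the sketch returns both the event name and the raw data and leaves that decision to the caller. iter_sse_events is a hypothetical helper that accepts any iterable of decoded text lines.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from lines of an SSE stream."""
    event = "message"  # default event name in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

With urllib, the lines could come from decoding each line of the open response object. Each data string can then be passed to json.loads if the payload turns out to be JSON.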

Response sample (application/json):

"string"

Crawl Results

List crawl results

List all crawl results associated with your team's requests.

The response includes:

  • Extracted data
  • Timestamps
  • Success/failure status
  • Resource statistics
  • Associated attachments

Results are paginated and can be filtered by crawl request.

Authorizations: ApiKeyAuth

Path parameters:

  • crawl_request_uuid (string, required): must match ^[0-9a-fA-F-]{36}$

Query parameters:

  • page (integer): A page number within the paginated result set.

Response sample (application/json):

{}
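
The crawl_request_uuid pattern can be checked on the client before a request is made. The pattern accepts any 36 characters drawn from hex digits and hyphens, which is looser than a real UUID, so the sketch below adds an optional stricter check. Both function names are hypothetical.

```python
import re
import uuid

# Pattern copied from the crawl_request_uuid path parameter.
CRAWL_REQUEST_UUID = re.compile(r"^[0-9a-fA-F-]{36}$")


def matches_documented_pattern(value: str) -> bool:
    """True when value is exactly 36 hex digits or hyphens."""
    return CRAWL_REQUEST_UUID.fullmatch(value) is not None


def is_strict_uuid(value: str) -> bool:
    """Stricter, optional check: value must also parse as a UUID."""
    if not matches_documented_pattern(value):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True
```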

Get crawl result

Get detailed information about a specific crawl result.

Returns:

  • Complete extracted data
  • Crawling metadata
  • Error details (if any)
  • Performance metrics
  • Resource usage statistics

Authorizations: ApiKeyAuth

Path parameters:

  • crawl_request_uuid (string, required): must match ^[0-9a-fA-F-]{36}$
  • id (string, required)

Response sample (application/json):

{
  "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
  "result": "http://example.com",
  "attachments": [],
  "created_at": "2019-08-24T14:15:22Z",
  "updated_at": "2019-08-24T14:15:22Z"
}
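
In the sample the result field holds a URL, which suggests the extracted data is fetched from that address in a second request. That reading is an assumption. The sketch below only validates the URL before it is used; result_url is a hypothetical helper.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def result_url(payload: Dict[str, Any]) -> str:
    """Return the 'result' URL after checking that it is an http(s) address.

    Assumption: 'result' points at the stored crawl output.
    """
    url = payload["result"]
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Unexpected result URL: {url!r}")
    return url
```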