Scrape Single URL

scrape_url is a convenience function in the client libraries for quickly scraping a single URL. It combines creating a crawl request and monitoring its progress into a single synchronous operation.

note

This function is only available in the client libraries and is not a direct API endpoint.

Request Examples

from watercrawl import WaterCrawlAPIClient

# Initialize client
client = WaterCrawlAPIClient('your_api_key')

# Quick scrape a single URL with default options
result = client.scrape_url('https://example.com')

# Scrape with custom page options
result = client.scrape_url(
    url="https://example.com",
    page_options={
        "exclude_tags": ["nav", "footer", "aside"],
        "include_tags": ["article", "main"],
        "wait_time": 100,
        "include_html": False,
        "only_main_content": True,
        "include_links": False
    }
)

# Print the extracted content
print(result['result']['markdown'])

Response Example

{
  "uuid": "123e4567-e89b-12d3-a456-426614174000",
  "url": "https://example.com/page",
  "status": "success",
  "result": {
    "markdown": "Extracted content...",
    "html": "<html>...</html>",
    "metadata": {
      "title": "Page Title",
      "description": "Page Description",
      "language": "en",
      "word_count": 1500
    },
    "links": ["https://example.com/link1", "https://example.com/link2"]
  },
  "attachments": [
    {
      "uuid": "095be615-a8ad-4c33-8e9c-c7612fbf6c9f",
      "attachment": "https://storage.watercrawl.dev/123e4567-e89b-12d3-a456-426614174000.pdf",
      "attachment_type": "pdf",
      "filename": "screenshot.pdf"
    }
  ],
  "created_at": "2024-01-01T00:00:00Z"
}
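
The response is a plain dictionary, so nested fields can be read directly. A short sketch; the .get fallback for links is a defensive assumption, since links may be absent unless include_links is enabled:

result = client.scrape_url('https://example.com')

# Page metadata sits under result['result']['metadata']
metadata = result['result']['metadata']
print(metadata['title'], metadata['word_count'])

# Fall back to an empty list in case 'links' was not requested
for link in result['result'].get('links', []):
    print(link)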

Function Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The URL to scrape |
| page_options | object | No | Options for page processing |
| wait_for_completion | boolean | No | If false, returns immediately with crawl request info instead of waiting for results (Node.js only) |

Page Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| exclude_tags | string[] | [] | HTML tags to exclude from content extraction |
| include_tags | string[] | [] | HTML tags to include in content extraction |
| wait_time | number | 0 | Time to wait after page load (ms) |
| include_html | boolean | false | Include raw HTML in result |
| only_main_content | boolean | true | Extract only main content |
| include_links | boolean | false | Include list of links in result |
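
For example, to give a script-heavy page time to render and collect its outbound links (the URL and wait time here are illustrative):

result = client.scrape_url(
    url='https://example.com/dynamic-page',
    page_options={
        'wait_time': 2000,       # allow 2s for client-side rendering
        'include_links': True,   # request the extracted link list
        'only_main_content': True
    }
)

print(result['result']['links'])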

How It Works

The scrape_url function is a convenience wrapper that:

  1. Creates a crawl request for a single URL
  2. Monitors the crawl progress
  3. Returns the final result when complete

Under the hood, it:

  • calls create_crawl_request with depth=0 and page_limit=1
  • uses monitor_crawl_request to wait for completion
  • returns the first (and only) result

This provides a simpler interface when you only need to scrape a single URL and want to wait for the result.
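
A minimal sketch of the equivalent manual flow, assuming the spider option names and the event shape yielded by monitor_crawl_request; see the Create Crawl Request docs for the authoritative signatures:

from watercrawl import WaterCrawlAPIClient

client = WaterCrawlAPIClient('your_api_key')

# Create a single-page crawl request (option names are assumptions)
crawl_request = client.create_crawl_request(
    url='https://example.com',
    spider_options={'max_depth': 0, 'page_limit': 1},
)

# Wait for completion, collecting result events as they arrive
results = []
for event in client.monitor_crawl_request(crawl_request['uuid']):
    if event['type'] == 'result':  # event shape is an assumption
        results.append(event['data'])

# scrape_url returns the first (and only) result
result = results[0] if results else None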

tip

For scraping multiple pages or more complex crawling needs, use the Create Crawl Request endpoint directly.