Scrapeless API

Crawl multiple URLs based on options

POST
/api/v2/crawler/crawl
Last modified: 2025-07-25 01:11:05

Request

Authorization
Add parameter in header
x-api-token
Example:
x-api-token: ********************
Body Params application/json
url
string <uri>
required
The base URL to start crawling from
limit
integer 
optional
Maximum number of pages to crawl. Default limit is 10000.
Default:
10000
excludePaths
array[string]
optional
URL pathname regex patterns that exclude matching URLs from the crawl. For example, if you set "excludePaths": ["blog/.*"] for the base URL scrapeless.com, any URL whose pathname matches that pattern is excluded from the results, such as https://www.scrapeless.com/blog/firecrawl-launch-week-1-recap.
includePaths
array[string]
optional
URL pathname regex patterns that include matching URLs in the crawl. Only paths that match the specified patterns are included in the response. For example, if you set "includePaths": ["blog/.*"] for the base URL scrapeless.com, only results whose pathname matches that pattern are included, such as https://www.scrapeless.com/blog/firecrawl-launch-week-1-recap.
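The path-filtering behavior described above can be modeled client-side. A minimal sketch, assuming the patterns are applied as regular-expression searches against the URL pathname (the server's exact matching rules may differ, and the function name here is hypothetical):

```python
import re

def is_crawled(pathname, include_paths=None, exclude_paths=None):
    """Illustrative model of includePaths/excludePaths selection:
    excludes win over includes; with includePaths set, only matching
    pathnames survive; with neither set, everything is crawled."""
    if exclude_paths and any(re.search(p, pathname) for p in exclude_paths):
        return False
    if include_paths:
        return any(re.search(p, pathname) for p in include_paths)
    return True

# A pattern like "blog/.*" keeps only blog pages:
print(is_crawled("blog/firecrawl-launch-week-1-recap", include_paths=["blog/.*"]))  # True
print(is_crawled("pricing", include_paths=["blog/.*"]))  # False
```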
maxDepth
integer 
optional
Maximum depth to crawl relative to the base URL. In practice, this is the maximum number of slashes the pathname of a scraped URL may contain.
Default:
10
maxDiscoveryDepth
integer 
optional
Maximum depth to crawl based on discovery order. The root site and sitemapped pages have a discovery depth of 0. For example, if you set this to 1 and also set ignoreSitemap, you will only crawl the entered URL and the URLs linked on that page.
ignoreSitemap
boolean 
optional
Ignore the website sitemap when crawling
Default:
false
ignoreQueryParameters
boolean 
optional
Do not re-scrape the same path with different (or no) query parameters
Default:
false
deduplicateSimilarURLs
boolean 
optional
Controls whether similar URLs should be deduplicated.
regexOnFullURL
boolean 
optional
Controls whether the regular expression should be applied to the full URL.
allowBackwardLinks
boolean 
optional
By default, the crawl skips sublinks that aren’t part of the URL hierarchy you specify. For example, crawling https://example.com/products/ wouldn’t capture pages under https://example.com/promotions/deal-567. To include such links, enable the allowBackwardLinks parameter.
Default:
false
allowExternalLinks
boolean 
optional
Allows the crawler to follow links to external websites.
Default:
false
delay
number 
optional
Delay in seconds between scrapes. This helps respect website rate limits.
scrapeOptions
object (ScrapeOptions) 
optional
formats
array[string]
optional
Formats to include in the output.
Allowed values:
markdown, html, rawHtml, links, screenshot, screenshot@fullPage, json
Default:
markdown
onlyMainContent
boolean 
optional
Only return the main content of the page excluding headers, navs, footers, etc.
Default:
true
includeTags
array[string]
optional
Tags to include in the output.
excludeTags
array[string]
optional
Tags to exclude from the output.
headers
object 
optional
Headers to send with the request. Can be used to send cookies, user-agent, etc.
waitFor
integer 
optional
Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.
Default:
0
timeout
integer 
optional
Timeout in milliseconds for the request
Default:
30000
browserOptions
object 
optional
sessionName
string 
optional
Set a name for your session to facilitate searching and viewing in the historical session list.
sessionTTL
string 
optional
Controls the session duration; the browser instance is closed automatically when the timeout is reached. Measured in seconds (s). Defaults to 180 seconds (3 minutes); customizable between 60 seconds (1 minute) and 900 seconds (a recommended maximum of 15 minutes, though longer values can be set). Once the specified TTL is reached, the session expires and Scraping Browser closes the browser instance to free resources.
sessionRecording
string 
optional
Whether to enable session recording. When enabled, the entire browser session is recorded automatically; after the session completes, it can be replayed from the historical session list details. Defaults to false.
proxyCountry
string 
optional
Sets the target country/region for the proxy, sending requests via an IP address from that region. You can specify a country code (e.g., US for the United States, GB for the United Kingdom, ANY for any country). See country codes for all supported options.
proxyURL
string 
optional
Used to set the browser’s proxy URL, for example: http://user:pass@ip:port. If this parameter is set, all other proxy_* parameters will be ignored.
💡 Custom proxy functionality is currently only available to Enterprise and Enterprise Enhanced subscription users.
💡 Enterprise-level custom users can contact us to use custom proxies.
fingerprint
string 
optional
A browser fingerprint is a nearly unique "digital fingerprint" built from your browser and device configuration, which can be used to track your online activity even without cookies. Configuring fingerprints in Scraping Browser is optional. We offer deep customization of core fingerprint parameters such as the browser user agent, time zone, language, and screen resolution, and support extending functionality through custom launch parameters. This suits multi-account management, data collection, and privacy-protection scenarios; using Scrapeless's own Chromium browser helps avoid detection. By default, the Scraping Browser service generates a random fingerprint for each session.
Example
{
    "url": "http://example.com",
    "limit": 10000,
    "excludePaths": [
        "string"
    ],
    "includePaths": [
        "string"
    ],
    "maxDepth": 10,
    "maxDiscoveryDepth": 0,
    "ignoreSitemap": false,
    "ignoreQueryParameters": false,
    "deduplicateSimilarURLs": true,
    "regexOnFullURL": true,
    "allowBackwardLinks": false,
    "allowExternalLinks": false,
    "delay": 0,
    "scrapeOptions": {
        "formats": [
            "markdown"
        ],
        "onlyMainContent": true,
        "includeTags": [
            "string"
        ],
        "excludeTags": [
            "string"
        ],
        "headers": {},
        "waitFor": 0,
        "timeout": 30000
    },
    "browserOptions": {
        "sessionName": "string",
        "sessionTTL": "string",
        "sessionRecording": "string",
        "proxyCountry": "string",
        "proxyURL": "string",
        "fingerprint": "string"
    }
}

Request samples

Shell
curl --location --request POST 'https://api.scrapeless.com/api/v2/crawler/crawl' \
--header 'Content-Type: application/json' \
--header 'x-api-token: ********************' \
--data-raw '{
    "url": "http://example.com",
    "limit": 10000,
    "excludePaths": [
        "string"
    ],
    "includePaths": [
        "string"
    ],
    "maxDepth": 10,
    "maxDiscoveryDepth": 0,
    "ignoreSitemap": false,
    "ignoreQueryParameters": false,
    "deduplicateSimilarURLs": true,
    "regexOnFullURL": true,
    "allowBackwardLinks": false,
    "allowExternalLinks": false,
    "delay": 0,
    "scrapeOptions": {
        "formats": [
            "markdown"
        ],
        "onlyMainContent": true,
        "includeTags": [
            "string"
        ],
        "excludeTags": [
            "string"
        ],
        "headers": {},
        "waitFor": 0,
        "timeout": 30000
    },
    "browserOptions": {
        "sessionName": "string",
        "sessionTTL": "string",
        "sessionRecording": "string",
        "proxyCountry": "string",
        "proxyURL": "string",
        "fingerprint": "string"
    }
}'
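The same request can be built in Python with only the standard library. A sketch, assuming YOUR_API_TOKEN is replaced with your real token; the payload shows a representative subset of the parameters above:

```python
import json
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: substitute your real token

payload = {
    "url": "http://example.com",
    "limit": 100,
    "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
}

# POST /api/v2/crawler/crawl with the x-api-token header, as documented above
req = urllib.request.Request(
    "https://api.scrapeless.com/api/v2/crawler/crawl",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "x-api-token": API_TOKEN},
    method="POST",
)

# Uncomment to actually send the request; on success the response body
# looks like {"success": true, "id": "..."}.
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)
```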

Responses

🟢 200 OK
application/json
Successful response
Body
success
boolean 
optional
id
string 
optional
Example
{
    "success": true,
    "id": "string"
}
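The returned id identifies the crawl job; you typically poll the "Get the status of a crawl job" endpoint with it until the job finishes. A minimal polling sketch — the fetch_status callable stands in for a wrapper around that endpoint, and the "completed"/"failed" terminal values are illustrative assumptions, not the documented response schema:

```python
import time

def wait_for_crawl(fetch_status, job_id, interval=2.0, max_attempts=30):
    """Poll a status-fetching callable until the crawl job reaches a
    terminal state. fetch_status(job_id) should return the job's status
    payload as a dict; the status values checked here are assumptions."""
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"crawl job {job_id} did not finish in time")
```

Injecting the fetch function keeps the retry logic separate from HTTP details and makes it easy to test with a stub.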
🟠 402 Payment Required
🟠 429 Too Many Requests
🔴 500 Server Error