Scrapeless API

Crawl multiple URLs based on options

POST
/api/v2/crawler/crawl
Last modified: 2025-07-25 01:11:05

Request

Authorization
Add parameter in header
x-api-token
Example:
x-api-token: ********************
Body Params application/json
url
string <uri>
required
The base URL to start crawling from
limit
integer 
optional
Maximum number of pages to crawl. Default limit is 10000.
Default:
10000
excludePaths
array[string]
optional
URL pathname regex patterns that exclude matching URLs from the crawl. For example, if you set "excludePaths": ["blog/.*"] for the base URL scrapeless.com, any URL whose pathname matches that pattern is excluded from the results, such as https://www.scrapeless.com/blog/firecrawl-launch-week-1-recap.
includePaths
array[string]
optional
URL pathname regex patterns that include matching URLs in the crawl. Only paths that match the specified patterns are included in the response. For example, if you set "includePaths": ["blog/.*"] for the base URL scrapeless.com, only results whose pathname matches that pattern are included, such as https://www.scrapeless.com/blog/firecrawl-launch-week-1-recap.
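The path-filtering behavior described above can be modeled client-side. A minimal sketch, assuming the patterns are applied as regular-expression searches against the URL pathname (the server's exact matching rules may differ, and the function name here is hypothetical):

```python
import re

def is_crawled(pathname, include_paths=None, exclude_paths=None):
    """Illustrative model of includePaths/excludePaths selection:
    excludes win over includes; with includePaths set, only matching
    pathnames survive; with neither set, everything is crawled."""
    if exclude_paths and any(re.search(p, pathname) for p in exclude_paths):
        return False
    if include_paths:
        return any(re.search(p, pathname) for p in include_paths)
    return True

# A pattern like "blog/.*" keeps only blog pages:
print(is_crawled("blog/firecrawl-launch-week-1-recap", include_paths=["blog/.*"]))  # True
print(is_crawled("pricing", include_paths=["blog/.*"]))  # False
```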
maxDepth
integer 
optional
Maximum depth to crawl relative to the base URL. In practice, this is the maximum number of slashes the pathname of a scraped URL may contain.
Default:
10
maxDiscoveryDepth
integer 
optional
Maximum depth to crawl based on discovery order. The root site and sitemapped pages have a discovery depth of 0. For example, if you set this to 1 and also set ignoreSitemap, you will only crawl the entered URL and the URLs linked on that page.
ignoreSitemap
boolean 
optional
Ignore the website sitemap when crawling
Default:
false
ignoreQueryParameters
boolean 
optional
Do not re-scrape the same path with different (or no) query parameters
Default:
false
deduplicateSimilarURLs
boolean 
optional
Controls whether similar URLs should be deduplicated.
regexOnFullURL
boolean 
optional
Controls whether the regular expression should be applied to the full URL.
allowBackwardLinks
boolean 
optional
By default, the crawl skips sublinks that aren’t part of the URL hierarchy you specify. For example, crawling https://example.com/products/ wouldn’t capture pages under https://example.com/promotions/deal-567. To include such links, enable the allowBackwardLinks parameter.
Default:
false
allowExternalLinks
boolean 
optional
Allows the crawler to follow links to external websites.
Default:
false
delay
number 
optional
Delay in seconds between scrapes. This helps respect website rate limits.
scrapeOptions
object (ScrapeOptions) 
optional
formats
array[string]
optional
Formats to include in the output.
Allowed values:
markdown, html, rawHtml, links, screenshot, screenshot@fullPage, json
Default:
markdown
onlyMainContent
boolean 
optional
Only return the main content of the page excluding headers, navs, footers, etc.
Default:
true
includeTags
array[string]
optional
Tags to include in the output.
excludeTags
array[string]
optional
Tags to exclude from the output.
headers
object 
optional
Headers to send with the request. Can be used to send cookies, user-agent, etc.
waitFor
integer 
optional
Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load.
Default:
0
timeout
integer 
optional
Timeout in milliseconds for the request
Default:
30000
browserOptions
object 
optional
sessionName
string 
optional
Set a name for your session to facilitate searching and viewing in the historical session list.
sessionTTL
string 
optional
Controls the session duration; the browser instance is closed automatically when the timeout is reached. Measured in seconds (s). Defaults to 180 seconds (3 minutes); customizable between 60 seconds (1 minute) and 900 seconds (a recommended maximum of 15 minutes, though longer values can be set). Once the specified TTL is reached, the session expires and Scraping Browser closes the browser instance to free resources.
sessionRecording
string 
optional
Whether to enable session recording. When enabled, the entire browser session is recorded automatically; after the session completes, it can be replayed from the historical session list details. Defaults to false.
proxyCountry
string 
optional
Sets the target country/region for the proxy, sending requests via an IP address from that region. You can specify a country code (e.g., US for the United States, GB for the United Kingdom, ANY for any country). See country codes for all supported options.
proxyURL
string 
optional
Used to set the browser’s proxy URL, for example: http://user:pass@ip:port. If this parameter is set, all other proxy_* parameters will be ignored.
💡 Custom proxy functionality is currently only available to Enterprise and Enterprise Enhanced subscription users.
💡 Enterprise-level custom users can contact us to use custom proxies.
fingerprint
string 
optional
A browser fingerprint is a nearly unique "digital fingerprint" built from your browser and device configuration, which can be used to track your online activity even without cookies. Configuring fingerprints in Scraping Browser is optional. We offer deep customization of core fingerprint parameters such as the browser user agent, time zone, language, and screen resolution, and support extending functionality through custom launch parameters. This suits multi-account management, data collection, and privacy-protection scenarios; using Scrapeless's own Chromium browser helps avoid detection. By default, the Scraping Browser service generates a random fingerprint for each session.
Example
{
    "url": "http://example.com",
    "limit": 10000,
    "excludePaths": [
        "string"
    ],
    "includePaths": [
        "string"
    ],
    "maxDepth": 10,
    "maxDiscoveryDepth": 0,
    "ignoreSitemap": false,
    "ignoreQueryParameters": false,
    "deduplicateSimilarURLs": true,
    "regexOnFullURL": true,
    "allowBackwardLinks": false,
    "allowExternalLinks": false,
    "delay": 0,
    "scrapeOptions": {
        "formats": [
            "markdown"
        ],
        "onlyMainContent": true,
        "includeTags": [
            "string"
        ],
        "excludeTags": [
            "string"
        ],
        "headers": {},
        "waitFor": 0,
        "timeout": 30000
    },
    "browserOptions": {
        "sessionName": "string",
        "sessionTTL": "string",
        "sessionRecording": "string",
        "proxyCountry": "string",
        "proxyURL": "string",
        "fingerprint": "string"
    }
}

Request samples

Shell
curl --location --request POST 'https://api.scrapeless.com/api/v2/crawler/crawl' \
--header 'Content-Type: application/json' \
--header 'x-api-token: ********************' \
--data-raw '{
    "url": "http://example.com",
    "limit": 10000,
    "excludePaths": [
        "string"
    ],
    "includePaths": [
        "string"
    ],
    "maxDepth": 10,
    "maxDiscoveryDepth": 0,
    "ignoreSitemap": false,
    "ignoreQueryParameters": false,
    "deduplicateSimilarURLs": true,
    "regexOnFullURL": true,
    "allowBackwardLinks": false,
    "allowExternalLinks": false,
    "delay": 0,
    "scrapeOptions": {
        "formats": [
            "markdown"
        ],
        "onlyMainContent": true,
        "includeTags": [
            "string"
        ],
        "excludeTags": [
            "string"
        ],
        "headers": {},
        "waitFor": 0,
        "timeout": 30000
    },
    "browserOptions": {
        "sessionName": "string",
        "sessionTTL": "string",
        "sessionRecording": "string",
        "proxyCountry": "string",
        "proxyURL": "string",
        "fingerprint": "string"
    }
}'
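The same request can be built in Python with only the standard library. A sketch, assuming YOUR_API_TOKEN is replaced with your real token; the payload shows a representative subset of the parameters above:

```python
import json
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: substitute your real token

payload = {
    "url": "http://example.com",
    "limit": 100,
    "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
}

# POST /api/v2/crawler/crawl with the x-api-token header, as documented above
req = urllib.request.Request(
    "https://api.scrapeless.com/api/v2/crawler/crawl",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "x-api-token": API_TOKEN},
    method="POST",
)

# Uncomment to actually send the request; on success the response body
# looks like {"success": true, "id": "..."}.
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)
```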

Responses

🟢 200 OK
application/json
Successful response
Body
success
boolean 
optional
id
string 
optional
Example
{
    "success": true,
    "id": "string"
}
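The returned id identifies the crawl job; you typically poll the "Get the status of a crawl job" endpoint with it until the job finishes. A minimal polling sketch — the fetch_status callable stands in for a wrapper around that endpoint, and the "completed"/"failed" terminal values are illustrative assumptions, not the documented response schema:

```python
import time

def wait_for_crawl(fetch_status, job_id, interval=2.0, max_attempts=30):
    """Poll a status-fetching callable until the crawl job reaches a
    terminal state. fetch_status(job_id) should return the job's status
    payload as a dict; the status values checked here are assumptions."""
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"crawl job {job_id} did not finish in time")
```

Injecting the fetch function keeps the retry logic separate from HTTP details and makes it easy to test with a stub.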
🟠 402 Payment Required
🟠 429 Too Many Requests
🔴 500 Server Error