D4Vinci/Scrapling: 🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

Effortless Web Scraping for the Modern Web


Selection methods · Fetchers · Spiders · Proxy Rotation · CLI · MCP

Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.

Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation — all in a few lines of Python. One library, zero compromises.

Crawls are blazing fast, with real-time stats and streaming. Built by Web Scrapers for Web Scrapers and regular users alike, there’s something for everyone.

from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
StealthyFetcher.adaptive = True
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)  # Fetch website under the radar!
products = p.css('.product', auto_save=True)                                        # Scrape data that survives website design changes!
products = p.css('.product', adaptive=True)                                         # Later, if the website structure changes, pass `adaptive=True` to find them!
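Conceptually, adaptive relocation can be pictured as similarity matching: save a lightweight fingerprint of the element you found, and when the page changes, score new candidates against it. A minimal stdlib sketch of the idea (hypothetical; not Scrapling's actual algorithm):

```python
# Hypothetical sketch of similarity-based element relocation: score candidate
# elements against a saved "fingerprint" of the original and pick the best match.

def fingerprint(tag, attrs, text):
    """Reduce an element to a few comparable features."""
    return {"tag": tag, "classes": set(attrs.get("class", "").split()), "text": text.strip()}

def similarity(a, b):
    """Crude score across tag name, class overlap (Jaccard), and text content."""
    score = 0.0
    if a["tag"] == b["tag"]:
        score += 1.0
    all_classes = a["classes"] | b["classes"]
    if all_classes:
        score += len(a["classes"] & b["classes"]) / len(all_classes)
    if a["text"] and a["text"] == b["text"]:
        score += 1.0
    return score

def relocate(saved, candidates):
    """Return the candidate most similar to the saved fingerprint."""
    return max(candidates, key=lambda c: similarity(saved, c))

saved = fingerprint("div", {"class": "product card"}, "Widget")
candidates = [
    fingerprint("div", {"class": "item"}, "Gadget"),
    fingerprint("div", {"class": "product tile"}, "Widget"),  # redesigned markup
]
best = relocate(saved, candidates)
print(sorted(best["classes"]))  # → ['product', 'tile']
```

Even though the class list changed after the redesign, the matching tag, shared `product` class, and identical text make the second candidate the clear winner.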

Or scale up to full crawls:

from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com/"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}

MySpider().start()

At DataImpulse, we specialize in developing custom proxy services for your business. Make requests from anywhere, collect data, and enjoy fast connections with our premium proxies.

Platinum Sponsors

Scrapling handles Cloudflare Turnstile. For enterprise-grade protection, Hyper Solutions provides API endpoints that generate valid antibot tokens for Akamai, DataDome, Kasada, and Incapsula. Simple API calls, no browser automation required.
Hey, we built BirdProxies because proxies shouldn’t be complicated or overpriced. Fast residential and ISP proxies in 195+ locations, fair pricing, and real support. Try our FlappyBird game on the landing page for free data!
Evomi: residential proxies from $0.49/GB. Scraping browser with fully spoofed Chromium, residential IPs, auto CAPTCHA solving, and anti-bot bypass. Scraper API for hassle-free results. MCP and N8N integrations are available.
TikHub.io provides 900+ stable APIs across 16+ platforms including TikTok, X, YouTube & Instagram, with 40M+ datasets. Also offers DISCOUNTED AI models — Claude, GPT, GEMINI & more up to 71% off.
Nsocks provides fast Residential and ISP proxies for developers and scrapers. Global IP coverage, high anonymity, smart rotation, and reliable performance for automation and data extraction. Use Xcrawl to simplify large-scale web crawling.
Close your laptop. Your scrapers keep running. PetroSky VPS - cloud servers built for nonstop automation. Windows and Linux machines with full control. From €6.99/mo.
Read a full review of Scrapling on The Web Scraping Club (Nov 2025), the #1 newsletter dedicated to Web Scraping.
Proxy-Seller provides reliable proxy infrastructure for web scraping, offering IPv4, IPv6, ISP, Residential, and Mobile proxies with stable performance, broad geo coverage, and flexible plans for business-scale data collection.

Do you want to show your ad here? Click here

Sponsors

Do you want to show your ad here? Click here and choose the tier that suits you!


Key Features

Spiders — A Full Crawling Framework

Advanced Websites Fetching with Session Support

Adaptive Scraping & AI Integration

High-Performance & Battle-Tested Architecture

Developer/Web Scraper Friendly Experience

Getting Started

Here’s a quick glimpse of what Scrapling can do, without diving deep.

Basic Usage

HTTP requests with session support

from scrapling.fetchers import Fetcher, FetcherSession

with FetcherSession(impersonate='chrome') as session:  # Use latest version of Chrome's TLS fingerprint
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text').getall()

# Or use one-off requests
page = Fetcher.get('https://quotes.toscrape.com/')
quotes = page.css('.quote .text::text').getall()

Advanced stealth mode

from scrapling.fetchers import StealthyFetcher, StealthySession

with StealthySession(headless=True, solve_cloudflare=True) as session:  # Keep the browser open until you finish
    page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
    data = page.css('#padded_content a').getall()

# Or use the one-off request style: it opens the browser for this request, then closes it when finished
page = StealthyFetcher.fetch('https://nopecha.com/demo/cloudflare')
data = page.css('#padded_content a').getall()

Full browser automation

from scrapling.fetchers import DynamicFetcher, DynamicSession

with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:  # Keep the browser open until you finish
    page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
    data = page.xpath('//span[@class="text"]/text()').getall()  # XPath selector if you prefer it

# Or use the one-off request style: it opens the browser for this request, then closes it when finished
page = DynamicFetcher.fetch('https://quotes.toscrape.com/')
data = page.css('.quote .text::text').getall()

Spiders

Build full crawlers with concurrent requests, multiple session types, and pause/resume:

from scrapling.spiders import Spider, Request, Response

class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]
    concurrent_requests = 10
    
    async def parse(self, response: Response):
        for quote in response.css('.quote'):
            yield {
                "text": quote.css('.text::text').get(),
                "author": quote.css('.author::text').get(),
            }
            
        next_page = response.css('.next a')
        if next_page:
            yield response.follow(next_page[0].attrib['href'])

result = QuotesSpider().start()
print(f"Scraped {len(result.items)} quotes")
result.items.to_json("quotes.json")

Use multiple session types in a single spider:

from scrapling.spiders import Spider, Request, Response
from scrapling.fetchers import FetcherSession, AsyncStealthySession

class MultiSessionSpider(Spider):
    name = "multi"
    start_urls = ["https://example.com/"]
    
    def configure_sessions(self, manager):
        manager.add("fast", FetcherSession(impersonate="chrome"))
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)
    
    async def parse(self, response: Response):
        for link in response.css('a::attr(href)').getall():
            # Route protected pages through the stealth session
            if "protected" in link:
                yield Request(link, sid="stealth")
            else:
                yield Request(link, sid="fast", callback=self.parse)  # explicit callback

Pause and resume long crawls with checkpoints by running the spider like this:

QuotesSpider(crawldir="./crawl_data").start()

Press Ctrl+C to pause gracefully — progress is saved automatically. Later, when you start the spider again, pass the same crawldir, and it will resume from where it stopped.
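In general terms, checkpointed resume amounts to persisting the frontier and the set of completed URLs between runs. A minimal stdlib sketch of the pattern (hypothetical; not Scrapling's on-disk format):

```python
import json
import tempfile
from pathlib import Path

class Checkpoint:
    """Persist a crawl frontier so an interrupted run can resume later."""

    def __init__(self, crawldir):
        self.path = Path(crawldir) / "state.json"
        if self.path.exists():  # resuming: reload the saved frontier
            state = json.loads(self.path.read_text())
            self.pending = state["pending"]
            self.done = set(state["done"])
        else:  # fresh run
            self.pending, self.done = [], set()

    def save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        payload = {"pending": self.pending, "done": sorted(self.done)}
        self.path.write_text(json.dumps(payload))

crawldir = tempfile.mkdtemp()

# First run: seed the frontier, finish one URL, then "pause" by saving state.
cp = Checkpoint(crawldir)
cp.pending = ["https://example.com/page1", "https://example.com/page2"]
cp.done.add(cp.pending.pop(0))
cp.save()

# A later run with the same crawldir picks up where the first one stopped.
resumed = Checkpoint(crawldir)
print(resumed.pending)  # → ['https://example.com/page2']
```

Saving on every graceful shutdown (e.g. a Ctrl+C handler) and reloading on startup is all that "pass the same crawldir" requires.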

from scrapling.fetchers import Fetcher

# Rich element selection and navigation
page = Fetcher.get('https://quotes.toscrape.com/')

# Get quotes with multiple selection methods
quotes = page.css('.quote')  # CSS selector
quotes = page.xpath('//div[@class="quote"]')  # XPath
quotes = page.find_all('div', {'class': 'quote'})  # BeautifulSoup-style
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote')  # and so on...
# Find element by text content
quotes = page.find_by_text('quote', tag='div')

# Advanced navigation
quote_text = page.css('.quote')[0].css('.text::text').get()
quote_text = page.css('.quote').css('.text::text').getall()  # Chained selectors
first_quote = page.css('.quote')[0]
author = first_quote.next_sibling.css('.author::text')
parent_container = first_quote.parent

# Element relationships and similarity
similar_elements = first_quote.find_similar()
below_elements = first_quote.below_elements()

You can also use the parser directly, without fetching anything:

from scrapling.parser import Selector

page = Selector("<html>...</html>")

And it works precisely the same way!

Async Session Management Examples

import asyncio
from scrapling.fetchers import FetcherSession, AsyncStealthySession, AsyncDynamicSession

async with FetcherSession(http3=True) as session:  # `FetcherSession` is context-aware and can work in both sync/async patterns
    page1 = session.get('https://quotes.toscrape.com/')
    page2 = session.get('https://quotes.toscrape.com/', impersonate='firefox135')

# Async session usage
async with AsyncStealthySession(max_pages=2) as session:
    tasks = []
    urls = ['https://example.com/page1', 'https://example.com/page2']
    
    for url in urls:
        task = session.fetch(url)
        tasks.append(task)
    
    print(session.get_pool_stats())  # Optional - The status of the browser tabs pool (busy/free/error)
    results = await asyncio.gather(*tasks)
    print(session.get_pool_stats())
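A `max_pages` pool is essentially bounded concurrency. As a general illustration of the idea (stdlib only; not Scrapling's internals), an `asyncio.Semaphore` caps how many fetches are in flight at once:

```python
import asyncio

async def fetch(url, sem, active, peak):
    """Simulated fetch that records how many tasks run concurrently."""
    async with sem:  # at most `limit` tasks are inside this block at a time
        active[0] += 1
        peak[0] = max(peak[0], active[0])
        await asyncio.sleep(0.01)  # stand-in for network I/O
        active[0] -= 1
        return url

async def crawl(urls, limit=2):
    sem = asyncio.Semaphore(limit)
    active, peak = [0], [0]
    results = await asyncio.gather(*(fetch(u, sem, active, peak) for u in urls))
    return results, peak[0]

urls = [f"https://example.com/page{i}" for i in range(6)]
results, peak = asyncio.run(crawl(urls))
print(peak)  # never exceeds the limit
```

Excess tasks simply wait at the semaphore, which is the same back-pressure a busy/free tab pool provides.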

CLI & Interactive Shell

Scrapling includes a powerful command-line interface:


Launch the interactive Web Scraping shell

scrapling shell

Extract pages to a file directly, without any programming (the content inside the body tag is extracted by default). If the output file ends with .txt, the target's text content is extracted; if it ends with .md, a Markdown representation of the HTML content is written; if it ends with .html, the raw HTML content itself is saved.

scrapling extract get 'https://example.com' content.md
scrapling extract get 'https://example.com' content.txt --css-selector '#fromSkipToProducts' --impersonate 'chrome'  # All elements matching the CSS selector '#fromSkipToProducts'
scrapling extract fetch 'https://example.com' content.md --css-selector '#fromSkipToProducts' --no-headless
scrapling extract stealthy-fetch 'https://nopecha.com/demo/cloudflare' captchas.html --css-selector '#padded_content a' --solve-cloudflare
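The extension-based dispatch described above can be sketched in a few lines (an illustrative toy, not the CLI's implementation; the regex "conversions" are stand-ins for real HTML processing):

```python
import re
from pathlib import Path

def render(html, out_file):
    """Choose an output format from the file extension, as the extract
    command does: .txt -> text content, .md -> Markdown, .html -> raw HTML."""
    suffix = Path(out_file).suffix
    if suffix == ".txt":
        return re.sub(r"<[^>]+>", "", html).strip()  # toy tag stripping
    if suffix == ".md":
        return re.sub(r"<h1>(.*?)</h1>", r"# \1", html).strip()  # toy Markdown conversion
    if suffix == ".html":
        return html
    raise ValueError(f"unsupported extension: {suffix}")

print(render("<h1>Example Domain</h1>", "content.md"))  # → # Example Domain
```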

[!note] There are many additional features not covered here, including the MCP server and the interactive Web Scraping shell, to keep this page concise. Check out the full documentation here

Performance Benchmarks

Scrapling isn’t just powerful; it’s also blazing fast. The following benchmarks compare Scrapling’s parser with the latest versions of other popular libraries.

Text Extraction Speed Test (5000 nested elements)

| # | Library | Time (ms) | vs Scrapling |
|---|---------|-----------|--------------|
| 1 | Scrapling | 2.02 | 1.0x |
| 2 | Parsel/Scrapy | 2.04 | 1.01x |
| 3 | Raw Lxml | 2.54 | 1.257x |
| 4 | PyQuery | 24.17 | ~12x |
| 5 | Selectolax | 82.63 | ~41x |
| 6 | MechanicalSoup | 1549.71 | ~767.1x |
| 7 | BS4 with Lxml | 1584.31 | ~784.3x |
| 8 | BS4 with html5lib | 3391.91 | ~1679.1x |

Element Similarity & Text Search Performance

Scrapling’s adaptive element finding capabilities significantly outperform alternatives:

| Library | Time (ms) | vs Scrapling |
|---------|-----------|--------------|
| Scrapling | 2.39 | 1.0x |
| AutoScraper | 12.45 | 5.209x |

All benchmarks represent averages of 100+ runs. See benchmarks.py for methodology.
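Averaged micro-benchmarks like these can be reproduced with the stdlib `timeit` module (a generic sketch; see the repository's benchmarks.py for the actual methodology):

```python
import timeit

def bench(fn, runs=100):
    """Average wall-clock time per call, in milliseconds, over `runs` calls."""
    total_seconds = timeit.timeit(fn, number=runs)
    return total_seconds / runs * 1000

def parse_stub():
    # stand-in for the parser call being measured
    "".join(str(i) for i in range(100))

avg_ms = bench(parse_stub)
print(f"{avg_ms:.4f} ms per run")
```

Averaging over many runs smooths out interpreter warm-up and scheduler noise, which is why single-run timings are rarely comparable.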

Installation

Scrapling requires Python 3.10 or higher:

pip install scrapling

This installation includes only the parser engine and its dependencies, without any fetchers or command-line dependencies.

Optional Dependencies

  1. If you plan to use any of the extra features below, the fetchers, or their classes, you will need to install the fetchers’ dependencies and their browser dependencies as follows:
    pip install "scrapling[fetchers]"
    scrapling install           # normal install
    scrapling install --force   # force reinstall
    This downloads all browsers, along with their system dependencies and fingerprint-manipulation dependencies. Or you can install them from code instead of running a command:
    from scrapling.cli import install
    install([], standalone_mode=False)          # normal install
    install(["--force"], standalone_mode=False) # force reinstall
  2. Extra features:
    • Install the MCP server feature:
      pip install "scrapling[ai]"
    • Install shell features (the Web Scraping shell and the extract command):
      pip install "scrapling[shell]"
    • Install everything:
      pip install "scrapling[all]"
    Remember that you still need to install the browser dependencies with scrapling install after any of these extras (if you haven’t already).

Docker

You can also pull a Docker image with all extras and browsers from DockerHub with the following command:

docker pull pyd4vinci/scrapling

Or download it from the GitHub registry:

docker pull ghcr.io/d4vinci/scrapling:latest

This image is automatically built and pushed using GitHub Actions and the repository’s main branch.

Contributing

We welcome contributions! Please read our contributing guidelines before getting started.

Disclaimer

[!caution] Caution This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international data scraping and privacy laws. The authors and contributors are not responsible for any misuse of this software. Always respect the terms of service of websites and robots.txt files.

🎓 Citations

If you have used our library for research purposes, please cite us with the following reference:

@misc{scrapling,
    author = {Karim Shoair},
    title = {Scrapling},
    year = {2024},
    url = {https://github.com/D4Vinci/Scrapling},
    note = {An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!}
}