Web scraping & data pipelines
Reliable data from any source, on a schedule.
If a decision in your business depends on data that lives on someone else's website, I turn that into a reliable feed. Prices, stock, listings, public registers, competitor data: collected, cleaned, deduplicated, and delivered where you need it.
The hard part is rarely fetching a page. It is doing it reliably at scale, behind logins and anti-bot defences, and knowing the moment a source breaks so you never make decisions on stale or wrong numbers.
What you get
- A complete dataset instead of a hand-checked sample
- Fresh data on a schedule (daily, hourly, or on demand)
- Validation and dedup so the numbers can be trusted
- Alerts when a source changes or a run fails
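As a rough illustration of what "validation and dedup" can mean in practice, here is a minimal Python sketch. The field names (`source`, `product_id`, `price`) and the rules (a positive, finite price; the last valid row wins per product) are assumptions made for the example, not a fixed schema; real projects define these per source.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class Record:
    source: str       # hypothetical: which site the row came from
    product_id: str   # hypothetical: the source's own identifier
    price: Decimal


def validate(raw: dict) -> Optional[Record]:
    """Return a clean Record, or None if the row fails basic checks."""
    try:
        # Accept both "10,50" and "10.50" style prices.
        price = Decimal(str(raw["price"]).replace(",", ".").strip())
    except (KeyError, InvalidOperation):
        return None
    product_id = str(raw.get("product_id", "")).strip()
    source = str(raw.get("source", "")).strip()
    if not product_id or not source:
        return None
    if not price.is_finite() or price <= 0:
        return None
    return Record(source, product_id, price)


def clean(rows: list) -> list:
    """Validate rows and keep the last valid record per (source, product_id)."""
    latest = {}
    for raw in rows:
        record = validate(raw)
        if record is not None:
            latest[(record.source, record.product_id)] = record
    return list(latest.values())
```

Rows that fail validation are dropped here for brevity; a production pipeline would more likely count and log them so that a spike in rejects can trigger an alert.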
Deliverables
- Custom scraper with login + anti-bot handling
- Normalisation into one clean schema
- Output to Google Sheets, Excel, a database, or an API
- Scheduling, monitoring, and error notifications
Common questions
- Can you scrape sites that require a login?
  - Yes. Authenticated sessions, pagination, and anti-bot protection are the usual case, not the exception.
- How do you handle a site that changes its layout?
  - Monitoring flags a broken run within a day, and maintenance is part of the Run phase, so the pipeline does not silently go stale.
- Is web scraping legal?
  - I focus on publicly available data and respect each site's terms and rate limits. For anything sensitive, we scope it together up front.
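To make the layout-change answer more concrete: one simple way to flag a broken run is to compare each run's basic statistics with the previous run. The sketch below assumes a hypothetical `RunStats` summary and illustrative thresholds (a 50% drop in rows, more than 10% of rows missing a price); actual checks and limits depend on the source.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    rows: int            # records extracted in this run
    missing_price: int   # records where the price field came back empty


def check_run(current: RunStats, previous_rows: int,
              max_drop: float = 0.5, max_missing: float = 0.1) -> list:
    """Return human-readable problems; an empty list means the run looks healthy."""
    problems = []
    if current.rows == 0:
        # A selector that no longer matches usually shows up as an empty run.
        problems.append("no rows extracted")
        return problems
    if previous_rows > 0 and current.rows < previous_rows * (1 - max_drop):
        problems.append(f"row count fell from {previous_rows} to {current.rows}")
    missing_share = current.missing_price / current.rows
    if missing_share > max_missing:
        problems.append(f"{missing_share:.0%} of rows have no price")
    return problems
```

Any non-empty result would be sent as a notification (email, chat, or similar), so a layout change shows up as an alert rather than as quietly wrong numbers.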
Related work
This service in the real world.

Online-pharmacy price monitoring (Gemini.pl)
Real-time tracking of 100k+ products, trend analysis, and change alerts. Historical price analysis cut purchasing costs by 25%.
Python · Web Scraping · FastAPI
Case
Drug price & availability monitoring (PGF, Neuca24)
A system pulls prices and stock daily from Poland’s two largest pharmaceutical wholesalers, runs a comparison, and exports to Excel, supporting purchasing decisions.
Python · SQLite · FastAPI
Case
Real-estate & vehicle market data (OLX, Otodom, Otomoto)
Listing scrapers across Poland’s biggest property and vehicle marketplaces, deduplicated into one clean feed with price-history and new-listing alerts.
Python · Playwright · PostgreSQL
Case
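The price-change and new-listing alerts mentioned in these projects come down to comparing two snapshots of the same feed. A minimal sketch in Python, assuming each snapshot is a mapping from a listing ID to its price (the IDs and the function name are hypothetical, not taken from the projects above):

```python
from decimal import Decimal


def diff_snapshots(previous: dict, current: dict) -> tuple:
    """Compare two {listing_id: price} snapshots.

    Returns (new_listings, price_changes), where price_changes maps
    a listing ID to its (old_price, new_price) pair.
    """
    new_listings = sorted(key for key in current if key not in previous)
    price_changes = {
        key: (previous[key], current[key])
        for key in current
        if key in previous and previous[key] != current[key]
    }
    return new_listings, price_changes


yesterday = {"flat-1": Decimal("450000"), "flat-2": Decimal("520000")}
today = {"flat-1": Decimal("440000"), "flat-2": Decimal("520000"),
         "flat-3": Decimal("610000")}
new, changed = diff_snapshots(yesterday, today)
```

Storing every snapshot, rather than only the latest one, is what makes price history and trend analysis possible later.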