# Headless Browser Scraping on Sprites.dev with Lightpanda
If you've tried running Puppeteer or Playwright on Sprites.dev, you've probably hit a wall. Chrome crashes with cryptic permission errors. The zygote process fails. Shared libraries won't load. It's frustrating because Sprites would otherwise be perfect for web scraping—persistent environments, pay-per-use pricing, and checkpoint/restore capabilities.
After extensive testing, I found a solution: Lightpanda.
## Why Chrome Fails on Sprites
Sprites.dev runs on Firecracker microVMs—the same technology powering AWS Lambda. These VMs are incredibly secure and lightweight, but that security comes with restrictions. Chrome expects certain kernel capabilities that Firecracker doesn't provide.
When you try to launch Chrome, you'll see errors like:
```
Check failed: . : Permission denied (13)
Failed to load GLES library: libGLESv2.so: cannot open shared object file
Exiting GPU process due to errors during initialization
```
Even with `--no-sandbox`, `--disable-gpu`, and `--single-process` flags, Chrome's zygote process (which handles sandboxing) crashes. I tried `@sparticuz/chromium` (the AWS Lambda Chromium package), `chromedp/headless-shell`, and various Docker-based solutions. Nothing worked.
## Enter Lightpanda
Lightpanda is a headless browser built from scratch in Zig, specifically designed for automation and AI workloads. Unlike Chrome—which is a desktop browser with the display turned off—Lightpanda was purpose-built for headless operation.
The key differences:
- No rendering engine: Lightpanda doesn't render pixels, so it doesn't need GPU libraries
- Minimal dependencies: Single ~100MB binary with no system requirements
- Chrome DevTools Protocol support: Works with Puppeteer and Playwright
- 11x faster, 9x less memory: these are Lightpanda's own benchmark numbers, but they reflect a design built for efficiency, not compatibility
Most importantly: it works on Sprites.
## Setting Up Lightpanda on Sprites
First, create a sprite and install Lightpanda:
```shell
sprite create scraper
sprite use scraper

sprite exec -- bash -c "
  curl -L -o /home/sprite/lightpanda \
    https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux
  chmod +x /home/sprite/lightpanda
"
```
Install puppeteer-core (not the full puppeteer—we don't need bundled Chrome):
```shell
sprite exec -- bash -c "cd /home/sprite && npm install puppeteer-core"
```
That's it. You now have a working headless browser on Sprites.
## Basic Usage
Lightpanda runs as a CDP (Chrome DevTools Protocol) server. Start it, then connect with Puppeteer:
```javascript
import puppeteer from "puppeteer-core";
import { spawn } from "child_process";

// Start Lightpanda CDP server
const server = spawn("/home/sprite/lightpanda", ["serve", "--host", "127.0.0.1", "--port", "9222"], {
  detached: true,
  stdio: "ignore",
});
server.unref();

// Wait for server to start
await new Promise((r) => setTimeout(r, 2000));

// Connect with Puppeteer
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222/",
});

const page = await browser.newPage();
await page.goto("https://example.com");
const title = await page.title();
console.log(title);

await browser.disconnect();
```
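The fixed two-second sleep works, but it wastes time on fast starts and can race on slow ones. A small polling helper, sketched here with Node's built-in `net` module (nothing Lightpanda-specific; `waitForPort` is my own name), waits only as long as needed:

```javascript
import net from "node:net";

// Retry a TCP connection to the CDP port until it succeeds or the
// deadline passes. Resolves as soon as the server accepts connections.
function waitForPort(port, host = "127.0.0.1", timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const attempt = () => {
      const sock = net.createConnection({ port, host });
      sock.once("connect", () => {
        sock.end();
        resolve();
      });
      sock.once("error", () => {
        sock.destroy();
        if (Date.now() > deadline) {
          reject(new Error(`port ${host}:${port} not ready after ${timeoutMs}ms`));
        } else {
          // Server not up yet; try again shortly
          setTimeout(attempt, 200);
        }
      });
    };
    attempt();
  });
}
```

With this in place, replace the `setTimeout` sleep with `await waitForPort(9222);` after spawning the server.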
## A Complete Scraping Example
Here's a real-world example that scrapes article metadata from a website:
```javascript
import puppeteer from "puppeteer-core";
import { spawn } from "child_process";
import { writeFileSync } from "fs";

const BASE_URL = "https://helgesver.re";

async function startLightpanda() {
  const proc = spawn("/home/sprite/lightpanda", ["serve", "--host", "127.0.0.1", "--port", "9222"], {
    detached: true,
    stdio: "ignore",
  });
  proc.unref();
  await new Promise((r) => setTimeout(r, 2000));
  return proc;
}

async function scrapeArticles() {
  await startLightpanda();

  const browser = await puppeteer.connect({
    browserWSEndpoint: "ws://127.0.0.1:9222/",
  });

  const page = await browser.newPage();
  await page.goto(BASE_URL, { waitUntil: "networkidle0" });

  // Extract article links from homepage
  const articleUrls = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('a[href*="/articles/"]'))
      .map((a) => a.href)
      .filter((url) => !url.endsWith("/articles/"));
  });

  await page.close();

  const articles = [];

  // Visit each article (use a new page per URL to avoid frame issues)
  for (const url of [...new Set(articleUrls)]) {
    const articlePage = await browser.newPage();
    try {
      await articlePage.goto(url, { waitUntil: "networkidle0" });
      const data = await articlePage.evaluate(() => ({
        title: document.querySelector("h1")?.textContent?.trim(),
        date: document.querySelector("time")?.getAttribute("datetime"),
        description: document.querySelector('meta[name="description"]')?.getAttribute("content"),
        wordCount: document.querySelector("article")?.textContent?.split(/\s+/).length || 0,
      }));
      articles.push({ url, ...data });
    } finally {
      await articlePage.close();
    }
  }

  await browser.disconnect();
  return articles;
}

const results = await scrapeArticles();
writeFileSync("articles.json", JSON.stringify(results, null, 2));
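Because Lightpanda is still in beta, individual navigations occasionally fail mid-run. A small retry helper makes long scraping jobs more robust; `withRetry` is my own illustrative name, not a Puppeteer or Lightpanda API:

```javascript
// Run an async operation, retrying on failure with a growing backoff.
async function withRetry(fn, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait a little longer after each failure before retrying
      await new Promise((r) => setTimeout(r, delayMs * (i + 1)));
    }
  }
  throw lastError;
}
```

In the article loop above, you could wrap the flaky step, e.g. `await withRetry(() => articlePage.goto(url, { waitUntil: "networkidle0" }))`.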
## Running on Sprites
Upload your script and run it:
```shell
sprite exec \
  -file "scraper.js:/home/sprite/scraper.js" \
  -- bash -c "cd /home/sprite && node scraper.js"
```
Output:
```
[
  {
    "url": "https://helgesver.re/articles/share-html-prototypes-sprites-dev",
    "title": "Share HTML Prototypes Instantly with Sprites.dev",
    "date": "2026-01-21",
    "wordCount": 841
  },
  ...
]
```
## What Works and What Doesn't
Lightpanda implements the core CDP methods but not everything. Here's what I tested:
| Feature | Status |
|---|---|
| `page.goto()` | Works |
| `page.evaluate()` | Works |
| `page.title()` | Works |
| `page.content()` | Works |
| Form interaction | Works |
| JavaScript execution | Works |
| Cookie handling | Works |
| `page.screenshot()` | Does not work |
| `page.pdf()` | Does not work |
Screenshots and PDFs fail because Lightpanda doesn't have a rendering engine. If you need visual output, you'll need Chrome on a regular VM (not a Sprite).
## Tips and Gotchas
**Use a new page for each URL.** Lightpanda is in beta, and I found that reusing the same page for multiple navigations sometimes causes frame detachment errors. Creating a fresh page for each URL avoids this.
**Close pages when done.** Memory is limited. Clean up after yourself.
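Both tips can be folded into one helper that opens a fresh page, runs your work, and guarantees the page is closed even when scraping throws. `withPage` is a hypothetical helper name, not part of Puppeteer's API:

```javascript
// Open a fresh page, pass it to fn, and always close it afterwards.
async function withPage(browser, fn) {
  const page = await browser.newPage();
  try {
    return await fn(page);
  } finally {
    await page.close();
  }
}
```

The per-article loop then shrinks to something like `const data = await withPage(browser, (page) => scrapeOne(page, url));` with no cleanup left to forget.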
**The fetch command works too.** For simple cases, Lightpanda has a built-in fetch mode:

```shell
sprite exec -- /home/sprite/lightpanda fetch --dump https://example.com
```

This outputs the full HTML to stdout—useful for quick tests or simple parsing jobs.
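Since the HTML lands on stdout, you can pipe it straight into standard tools. For example, pulling out the page title with `sed` (assuming the `<title>` tag fits on one line):

```shell
sprite exec -- /home/sprite/lightpanda fetch --dump https://example.com \
  | sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
```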
**Create a checkpoint.** Once you have Lightpanda and puppeteer-core installed, save a checkpoint:

```shell
sprite checkpoint create
```

This lets you restore to a known-good state instantly (~300ms).
## Cost Comparison
Sprites charge only for active compute time. A typical scraping session:
| Scenario | Time | Approx. Cost |
|---|---|---|
| Scrape 100 pages | ~5 min | $0.02 |
| Daily 1000-page job | ~45 min | $0.15 |
| Hourly monitoring | 24 × 2 min | $0.10/day |
Compare this to running a dedicated server or paying per-request API fees.
## When to Use This
Sprites + Lightpanda is great for:
- JavaScript-heavy sites: SPAs, React apps, anything that requires JS execution
- Burst workloads: Pay nothing when idle, scale up when needed
- Persistent sessions: Cookies and state survive between runs
- AI agent sandboxing: Safe environment for automated browsing
It's not ideal for:
- Screenshot generation: Use a regular VM with Chrome
- PDF export: Same—need actual rendering
- High-volume continuous scraping: Fixed-price options may be cheaper
## Conclusion
Chrome doesn't work on Sprites, but Lightpanda does. It's faster, lighter, and purpose-built for exactly this use case. The setup takes five minutes, and you get a fully functional headless browser for web scraping on Fly.io's sandboxed VMs.
If you're building scrapers, automation tools, or AI agents that need to browse the web, give Lightpanda a try. It's open source, actively developed, and—most importantly—it actually works where Chrome can't.