Headless Browser Scraping on Sprites.dev with Lightpanda

If you've tried running Puppeteer or Playwright on Sprites.dev, you've probably hit a wall. Chrome crashes with cryptic permission errors. The zygote process fails. Shared libraries won't load. It's frustrating because Sprites would otherwise be perfect for web scraping—persistent environments, pay-per-use pricing, and checkpoint/restore capabilities.

After extensive testing, I found a solution: Lightpanda.

Why Chrome Fails on Sprites

Sprites.dev runs on Firecracker microVMs—the same technology powering AWS Lambda. These VMs are incredibly secure and lightweight, but that security comes with restrictions. Chrome expects certain kernel capabilities that Firecracker doesn't provide.

When you try to launch Chrome, you'll see errors like:

Check failed: . : Permission denied (13)
Failed to load GLES library: libGLESv2.so: cannot open shared object file
Exiting GPU process due to errors during initialization

Even with --no-sandbox, --disable-gpu, and --single-process flags, Chrome's zygote process (which handles sandboxing) crashes. I tried @sparticuz/chromium (the AWS Lambda Chromium package), chromedp/headless-shell, and various Docker-based solutions. Nothing worked.

Enter Lightpanda

Lightpanda is a headless browser built from scratch in Zig, specifically designed for automation and AI workloads. Unlike Chrome—which is a desktop browser with the display turned off—Lightpanda was purpose-built for headless operation.

The key differences:

  • No rendering engine: Lightpanda doesn't render pixels, so it doesn't need GPU libraries
  • Minimal dependencies: Single ~100MB binary with no system requirements
  • Chrome DevTools Protocol support: Works with Puppeteer and Playwright
  • 11x faster, 9x less memory: Built for efficiency, not compatibility

Most importantly: it works on Sprites.

Setting Up Lightpanda on Sprites

First, create a sprite and install Lightpanda:

sprite create scraper
sprite use scraper

sprite exec -- bash -c "
  curl -L -o /home/sprite/lightpanda \
    https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux
  chmod +x /home/sprite/lightpanda
"

Install puppeteer-core (not the full puppeteer—we don't need bundled Chrome):

sprite exec -- bash -c "cd /home/sprite && npm install puppeteer-core"

That's it. You now have a working headless browser on Sprites.

Basic Usage

Lightpanda runs as a CDP (Chrome DevTools Protocol) server. Start it, then connect with Puppeteer:

import puppeteer from "puppeteer-core";
import { spawn } from "child_process";

// Start Lightpanda CDP server
const server = spawn("/home/sprite/lightpanda", ["serve", "--host", "127.0.0.1", "--port", "9222"], {
  detached: true,
  stdio: "ignore",
});
server.unref();

// Wait for server to start
await new Promise((r) => setTimeout(r, 2000));

// Connect with Puppeteer
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222/",
});

const page = await browser.newPage();
await page.goto("https://example.com");
const title = await page.title();
console.log(title);

await browser.disconnect();
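One fragile spot in the snippet above is the fixed two-second sleep: a slow cold start can exceed it, and a fast one wastes time. A more robust approach (a sketch using Node's built-in net module; the host and port match the example above) is to poll the TCP port until Lightpanda accepts connections:

```javascript
import net from "net";

// Poll a TCP port until something accepts a connection, or give up.
async function waitForPort(host, port, timeoutMs = 10000) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const ok = await new Promise((resolve) => {
      const sock = net.connect({ host, port });
      sock.once("connect", () => { sock.destroy(); resolve(true); });
      sock.once("error", () => resolve(false));
    });
    if (ok) return;
    await new Promise((r) => setTimeout(r, 100)); // brief pause before retrying
  }
  throw new Error(`Port ${host}:${port} not ready after ${timeoutMs}ms`);
}

// Then replace the fixed sleep with:
// await waitForPort("127.0.0.1", 9222);
```

This connects a throwaway socket and immediately destroys it, so it works against any TCP server without speaking the CDP protocol.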

A Complete Scraping Example

Here's a real-world example that scrapes article metadata from a website:

import puppeteer from "puppeteer-core";
import { spawn } from "child_process";
import { writeFileSync } from "fs";

const BASE_URL = "https://helgesver.re";

async function startLightpanda() {
  const proc = spawn("/home/sprite/lightpanda", ["serve", "--host", "127.0.0.1", "--port", "9222"], {
    detached: true,
    stdio: "ignore",
  });
  proc.unref();
  await new Promise((r) => setTimeout(r, 2000));
  return proc;
}

async function scrapeArticles() {
  await startLightpanda();

  const browser = await puppeteer.connect({
    browserWSEndpoint: "ws://127.0.0.1:9222/",
  });

  const page = await browser.newPage();
  await page.goto(BASE_URL, { waitUntil: "networkidle0" });

  // Extract article links from homepage
  const articleUrls = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('a[href*="/articles/"]'))
      .map((a) => a.href)
      .filter((url) => !url.endsWith("/articles/"));
  });

  await page.close();

  const articles = [];

  // Visit each article (use new page per URL to avoid frame issues)
  for (const url of [...new Set(articleUrls)]) {
    const articlePage = await browser.newPage();

    try {
      await articlePage.goto(url, { waitUntil: "networkidle0" });

      const data = await articlePage.evaluate(() => ({
        title: document.querySelector("h1")?.textContent?.trim(),
        date: document.querySelector("time")?.getAttribute("datetime"),
        description: document.querySelector('meta[name="description"]')?.getAttribute("content"),
        wordCount: document.querySelector("article")?.textContent?.split(/\s+/).length || 0,
      }));

      articles.push({ url, ...data });
    } finally {
      await articlePage.close();
    }
  }

  await browser.disconnect();
  return articles;
}

const results = await scrapeArticles();
writeFileSync("articles.json", JSON.stringify(results, null, 2));
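Since Lightpanda is beta software, individual navigations occasionally fail. Rather than letting one bad URL abort the whole run, you can wrap each goto in a small retry helper. This is a generic sketch, not part of the Lightpanda or Puppeteer API:

```javascript
// Retry an async operation a few times with a linear backoff.
async function withRetry(fn, { attempts = 3, delayMs = 500 } = {}) {
  let lastErr;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts) await new Promise((r) => setTimeout(r, delayMs * i));
    }
  }
  throw lastErr; // all attempts failed
}

// In the loop above:
// await withRetry(() => articlePage.goto(url, { waitUntil: "networkidle0" }));
```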

Running on Sprites

Upload your script and run it:

sprite exec \
  -file "scraper.js:/home/sprite/scraper.js" \
  -- bash -c "cd /home/sprite && node scraper.js"

Output:

[
  {
    "url": "https://helgesver.re/articles/share-html-prototypes-sprites-dev",
    "title": "Share HTML Prototypes Instantly with Sprites.dev",
    "date": "2026-01-21",
    "wordCount": 841
  },
  ...
]

What Works and What Doesn't

Lightpanda implements the core CDP methods, but not the full protocol. Here's what I tested:

Feature                 Status
page.goto()             Works
page.evaluate()         Works
page.title()            Works
page.content()          Works
Form interaction        Works
JavaScript execution    Works
Cookie handling         Works
page.screenshot()       Does not work
page.pdf()              Does not work

Screenshots and PDFs fail because Lightpanda doesn't have a rendering engine. If you need visual output, you'll need Chrome on a regular VM (not a Sprite).

Tips and Gotchas

Use a new page for each URL. Lightpanda is in beta, and I found that reusing the same page for multiple navigations sometimes causes frame detachment errors. Creating a fresh page for each URL avoids this.

Close pages when done. Memory is limited. Clean up after yourself.

The fetch command works too. For simple cases, Lightpanda has a built-in fetch mode:

sprite exec -- /home/sprite/lightpanda fetch --dump https://example.com

This outputs the full HTML to stdout—useful for quick tests or simple parsing jobs.
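For those quick tests, you don't need Puppeteer at all: pipe the dump into a short Node script. A minimal sketch (the regex-based extraction below is fine for throwaway scripts, not a substitute for a real HTML parser):

```javascript
// Quick-and-dirty <title> extraction for dumped HTML.
function extractTitle(html) {
  const m = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return m ? m[1].trim() : null;
}

console.log(extractTitle("<html><head><title>Example Domain</title></head></html>"));
// → "Example Domain"
```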

Create a checkpoint. Once you have Lightpanda and puppeteer-core installed, save a checkpoint:

sprite checkpoint create

This lets you restore to a known-good state instantly (~300ms).

Cost Comparison

Sprites charge only for active compute time. A typical scraping session:

Scenario               Time          Approx. cost
Scrape 100 pages       ~5 min        $0.02
Daily 1000-page job    ~45 min       $0.15
Hourly monitoring      24 × 2 min    $0.10/day

Compare this to running a dedicated server or paying per-request API fees.

When to Use This

Sprites + Lightpanda is great for:

  • JavaScript-heavy sites: SPAs, React apps, anything that requires JS execution
  • Burst workloads: Pay nothing when idle, scale up when needed
  • Persistent sessions: Cookies and state survive between runs
  • AI agent sandboxing: Safe environment for automated browsing

It's not ideal for:

  • Screenshot generation: Use a regular VM with Chrome
  • PDF export: Same—need actual rendering
  • High-volume continuous scraping: Fixed-price options may be cheaper

Conclusion

Chrome doesn't work on Sprites, but Lightpanda does. It's faster, lighter, and purpose-built for exactly this use case. The setup takes five minutes, and you get a fully functional headless browser for web scraping on Fly.io's sandboxed VMs.

If you're building scrapers, automation tools, or AI agents that need to browse the web, give Lightpanda a try. It's open source, actively developed, and—most importantly—it actually works where Chrome can't.
