[Bug]: Browser tool discards successfully-fetched page content when screenshot fails #149

@abcd123-xyz

Description

Affected Component

Core Services (Frontend UI/Backend API), AI Agents (Researcher/Developer/...)

Describe the bug

Summary

In backend/pkg/tools/browser.go, the ContentMD, ContentHTML, and Links methods run two concurrent operations: fetching page content and capturing a screenshot. If the screenshot request fails for any reason (scraper service unavailable, rate-limited, timeout, returned image too small), the function returns an error even though the page content was successfully retrieved, discarding the content entirely.

The AI agent then receives:

browser tool 'markdown' handled with error: failed to fetch screenshot by url 'https://...': ...

…instead of the page content. The agent interprets this as the URL being unreachable and either retries, uses a fallback approach, or reports failure — all while the content was sitting in memory and then thrown away.

Root Cause

The screenshot is a non-critical side-effect — its only purpose is to write a PNG file to disk and log it via b.scp.PutScreenshot. The caller (wrapCommandResult) already ignores the return value of PutScreenshot with _, _ = b.scp.PutScreenshot(...). This means the screenshot result is already treated as optional at the call-site, yet the function that produces it can abort the entire operation.
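The buggy pattern can be sketched in isolation with stand-in fetchers (the real `getMD`/`getScreenshot` live on `*browser` in `browser.go`; the exact way the real code combines the two errors may differ):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errShot = errors.New("failed to fetch screenshot by url")

// contentMDBuggy mirrors the failure mode described above: two concurrent
// fetches joined with a WaitGroup, where EITHER error aborts the call.
func contentMDBuggy(fetchContent, fetchScreenshot func() (string, error)) (string, string, error) {
	var (
		wg                        sync.WaitGroup
		content, screenshotName   string
		errContent, errScreenshot error
	)
	wg.Add(2)
	go func() { defer wg.Done(); content, errContent = fetchContent() }()
	go func() { defer wg.Done(); screenshotName, errScreenshot = fetchScreenshot() }()
	wg.Wait()

	// The bug: a screenshot failure discards content that was fetched successfully.
	if errContent != nil || errScreenshot != nil {
		return "", "", errors.Join(errContent, errScreenshot)
	}
	return content, screenshotName, nil
}

func main() {
	_, _, err := contentMDBuggy(
		func() (string, error) { return "# page content", nil }, // content fetch succeeds
		func() (string, error) { return "", errShot },           // screenshot fails
	)
	fmt.Println(err) // the caller only sees the screenshot error; content is gone
}
```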

Screenshot failures can occur due to:

  • The scraper service being temporarily unavailable or restarting
  • Network timeout (the callScraper client has a hard 65-second timeout for all requests)
  • The target page returning a very small or empty screenshot response (minImgContentSize = 2048 bytes — any minimal or nearly style-less page triggers this)
  • Disk write failure in writeScreenshotToFile
  • A URL that renders content but cannot be screenshotted (certain headers, login pages, etc.)

All of these are common in normal operation, especially during active penetration testing where the scraper service may be under load or targets may serve unusual responses.

Impact

  • AI agent browsing fails unnecessarily: The agent is told the browser tool encountered an error, with no indication that the content was actually available. The agent may waste several tool calls retrying or trying alternative approaches.
  • Reliable tools appear broken: The browser tool is critical for agents researching vulnerabilities, reading documentation (e.g., HackTricks, CVE advisories), and performing OSINT. Having it fail due to a screenshot side-effect makes the system appear far less reliable than it is.
  • Affects all three browser actions: markdown, html, and links are all broken by this.
  • Silent data loss: Successfully-fetched content is garbage-collected silently, with no log entry indicating this happened.

Steps to Reproduce

  1. Start PentAGI with the scraper service down or misconfigured (e.g., SCRAPER_PUBLIC_URL pointing to an unreachable address).
  2. Trigger a pentest or research task that causes the pentester/searcher agent to call the browser tool on any public URL.
  3. Observe that the agent receives an error response rather than page content, even though the URL itself is perfectly accessible via the scraper's /markdown endpoint.

Alternatively, any URL that responds to content requests but returns a screenshot smaller than 2 KB (e.g., a short API response page) will trigger this even in a fully functional setup.

Expected Behavior

The browser tool should return the successfully-fetched content regardless of whether the screenshot succeeds. Screenshot failure should be logged as a warning and the operation should continue with an empty screenshotName.

Suggested Fix

func (b *browser) ContentMD(url string) (string, string, error) {
    var (
        wg                        sync.WaitGroup
        content, screenshotName   string
        errContent, errScreenshot error
    )
    wg.Add(2)

    go func() {
        defer wg.Done()
        content, errContent = b.getMD(url)
    }()

    go func() {
        defer wg.Done()
        screenshotName, errScreenshot = b.getScreenshot(url)
    }()

    wg.Wait()

    if errContent != nil {
        return "", "", errContent
    }

    // Screenshot is non-critical: log the failure but do not discard valid content
    if errScreenshot != nil {
        logrus.WithError(errScreenshot).Warnf("failed to capture screenshot for %s, continuing without it", url)
    }

    return content, screenshotName, nil
}

The same fix should be applied to ContentHTML and Links.

System Configuration

Scraper service (Docker — docker-compose.yml)

| Setting | Default |
| --- | --- |
| Docker image | `vxcontrol/scraper:latest` |
| Container port | 443/tcp |
| Host binding | `SCRAPER_LISTEN_IP` (default `127.0.0.1`) : `SCRAPER_LISTEN_PORT` (default `9443`) |
| `MAX_CONCURRENT_SESSIONS` | `LOCAL_SCRAPER_MAX_CONCURRENT_SESSIONS` (default 10) |
| `USERNAME` | `LOCAL_SCRAPER_USERNAME` (default `someuser`) |
| `PASSWORD` | `LOCAL_SCRAPER_PASSWORD` (default `somepass`) |
| Shared memory | `shm_size: 2g` |

Backend environment variables (config.go)

| Variable | Description |
| --- | --- |
| `SCRAPER_PUBLIC_URL` | URL used for internet/public targets (fed as `scPubURL`) |
| `SCRAPER_PRIVATE_URL` | URL used for local-zone targets (`.htb`, `.local`, `.lan`, etc.) (fed as `scPrvURL`) |

URL selection logic in resolveUrl(): private URL is preferred when the hostname matches any local zone suffix; public URL is used otherwise. Either can fall back to the other if one is unset.
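The selection logic can be sketched as follows. The zone list, parameter shape, and fallback order are assumptions reconstructed from the description above, not the actual `resolveUrl()` source:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Local-zone suffixes named in the issue; the real list may be longer.
var localZones = []string{".htb", ".local", ".lan"}

// resolveScraperURL is a sketch of resolveUrl(): prefer the private scraper
// URL for local-zone hostnames, the public one otherwise, with each falling
// back to the other when unset.
func resolveScraperURL(target, scPubURL, scPrvURL string) string {
	if u, err := url.Parse(target); err == nil {
		host := u.Hostname()
		for _, zone := range localZones {
			if strings.HasSuffix(host, zone) && scPrvURL != "" {
				return scPrvURL
			}
		}
	}
	if scPubURL != "" {
		return scPubURL
	}
	return scPrvURL // fall back when the public URL is unset
}

func main() {
	fmt.Println(resolveScraperURL("http://app.htb/login", "https://pub:9443", "https://prv:9443"))
	fmt.Println(resolveScraperURL("https://example.com", "https://pub:9443", "https://prv:9443"))
}
```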

Browser tool constants (browser.go)

| Constant | Value | Purpose |
| --- | --- | --- |
| `minMdContentSize` | 50 bytes | Minimum acceptable Markdown response |
| `minHtmlContentSize` | 300 bytes | Minimum acceptable HTML response |
| `minImgContentSize` | 2048 bytes | Minimum acceptable screenshot PNG size |
| HTTP client timeout | 65 seconds | Applies to every `callScraper` request (both content fetch and screenshot) |

The minImgContentSize = 2048 threshold is a practical trigger for this bug: any URL that returns a short or CSS-less page (API endpoints, minimal status pages, login redirects) will produce a screenshot smaller than 2 KB and cause getScreenshot to return an error, discarding valid content even when content fetch succeeded.
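The size gate itself is trivial, which is what makes it such a cheap trigger; a minimal sketch (the helper name is illustrative — in `browser.go` the check sits inside `getScreenshot`):

```go
package main

import "fmt"

const minImgContentSize = 2048 // threshold from browser.go, per the issue

// checkScreenshot sketches the size gate: any screenshot response under
// 2 KiB is rejected, which today turns small pages into hard failures.
func checkScreenshot(png []byte) error {
	if len(png) < minImgContentSize {
		return fmt.Errorf("screenshot too small: %d bytes (min %d)", len(png), minImgContentSize)
	}
	return nil
}

func main() {
	fmt.Println(checkScreenshot(make([]byte, 1024))) // a minimal page's screenshot fails
	fmt.Println(checkScreenshot(make([]byte, 4096))) // a typical page passes
}
```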

Logs and Artifacts

No response

Screenshots or Recordings

No response

Verification

  • I have checked that this issue hasn't been already reported
  • I have provided all relevant configuration files (with sensitive data removed)
  • I have included relevant logs and error messages
  • I am running the latest version of PentAGI

Metadata

Labels

bug (Something isn't working)
