feat(docs): add SEO and GEO optimization #2210

Open
michaelmagan wants to merge 7 commits into main from magan/docs-seo-geo

@michaelmagan
Contributor

Summary

  • Add JSON-LD structured data (Organization, SoftwareApplication, WebSite, TechArticle schemas)
  • Enhance per-page metadata with Open Graph, Twitter Cards, and canonical URLs
  • Add AI bot rules to robots.txt (GPTBot, PerplexityBot, ClaudeBot, etc.)
  • Fix sitemap to exclude invalid paths and use proper timestamps

Why

GEO (Generative Engine Optimization) improves visibility in AI search engines like ChatGPT, Perplexity, and Claude.

🤖 Generated with Claude Code

@vercel

vercel bot commented Feb 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
--- | --- | --- | ---
tambo-docs | Ready | Preview, Comment | Feb 6, 2026 0:43am

2 Skipped Deployments

Project | Deployment | Actions | Updated (UTC)
--- | --- | --- | ---
cloud | Skipped | | Feb 6, 2026 0:43am
showcase | Skipped | | Feb 6, 2026 0:43am
@charliecreates charliecreates bot requested a review from CharlieHelps February 5, 2026 02:29
@github-actions github-actions bot added area: config Changes to repository configuration files area: documentation Improvements or additions to documentation status: in progress Work is currently being done contributor: tambo-team Created by a Tambo team member change: feat New feature labels Feb 5, 2026

@charliecreates charliecreates bot left a comment

Main concerns are sitemap correctness (excluded routes can be reintroduced via additionalPaths) and JSON-LD discoverability (using next/script with strategy="afterInteractive" can prevent some crawlers from seeing structured data). There are also SEO correctness risks around using a relative canonical where an absolute URL is already computed, and data integrity concerns from using new Date() as a dateModified fallback in JSON-LD. Addressing these will make the SEO/GEO changes more reliable and deterministic.

Additional notes (1)
  • Maintainability | docs/next-sitemap.config.js:112-143
    exclude + transform are now doing overlapping filtering, but additionalPaths still unconditionally emits all enumerated URLs. If enumerateRoutes() ever includes any of the excluded/special routes (e.g. /robots.txt, /llms.txt, /llms-full.txt), they can still be reintroduced via additionalPaths, defeating the exclusion/transform logic and potentially producing invalid sitemap entries.
Summary of changes

What changed

SEO / GEO metadata

  • Added global metadata in docs/src/app/layout.tsx including keywords, authors, creator/publisher, Open Graph/Twitter fields, and category.
  • Added per-doc-page metadata in docs/src/app/(docs)/[[...slug]]/page.tsx:
    • alternates.canonical
    • openGraph and twitter cards

JSON-LD structured data

  • Introduced docs/src/components/json-ld.tsx with:
    • global schemas: Organization, SoftwareApplication, WebSite
    • helpers for TechArticle, FAQPage, and HowTo
    • renderers: GlobalJsonLd and PageJsonLd
  • Wired JSON-LD into:
    • root layout via <GlobalJsonLd />
    • doc pages via <PageJsonLd schema={createDocPageSchema(...)} />
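A minimal sketch of the TechArticle helper pattern described above, assuming a simplified input shape. `DocPageInput` and `createDocPageSchemaSketch` are hypothetical names; the real `createDocPageSchema` signature isn't shown in this thread. It illustrates the conditional-spread approach the PR uses so that an unknown `dateModified` is left out of the JSON-LD instead of being filled with a placeholder date.

```typescript
// Hypothetical input shape; not the PR's actual createDocPageSchema parameters.
interface DocPageInput {
  title: string;
  url: string; // absolute page URL
  description?: string;
  dateModified?: string; // ISO 8601 timestamp, absent when unknown
}

type JsonLdSchema = Record<string, unknown>;

function createDocPageSchemaSketch(input: DocPageInput): JsonLdSchema {
  return {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    headline: input.title,
    url: input.url,
    mainEntityOfPage: input.url,
    ...(input.description != null ? { description: input.description } : {}),
    // Omit the key entirely rather than advertising a made-up freshness date.
    ...(input.dateModified != null ? { dateModified: input.dateModified } : {}),
  };
}
```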

Robots + sitemap

  • Expanded docs/src/app/robots.ts to emit multiple bot-specific rules (GPTBot, PerplexityBot, ClaudeBot, etc.) and added host.
  • Updated docs/next-sitemap.config.js to:
    • exclude more non-page/duplicate paths
    • skip certain paths in transform
    • change lastmod fallback from epoch (new Date(0)) to new Date()
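For the bot-specific robots rules, a rough sketch of what a rules array with `sitemap` and `host` could look like. The local `RobotsConfig` type is a simplified stand-in for Next.js's `MetadataRoute.Robots` (assumption: the real `robots.ts` returns that type), and the bot list and disallowed path are illustrative, not copied from the PR.

```typescript
// Simplified stand-in for Next.js's MetadataRoute.Robots (assumption).
interface RobotsRule {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
}

interface RobotsConfig {
  rules: RobotsRule[];
  sitemap: string;
  host: string;
}

// Illustrative subset of AI crawlers; the PR's actual list may differ.
const AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot"];

function buildRobots(baseUrl: string): RobotsConfig {
  return {
    rules: [
      // Default rule for every crawler; "/api/" is an illustrative disallow.
      { userAgent: "*", allow: "/", disallow: "/api/" },
      // One explicit rule per AI crawler so each can be tuned independently.
      ...AI_CRAWLERS.map((bot): RobotsRule => ({ userAgent: bot, allow: "/" })),
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
    host: baseUrl,
  };
}
```

In an actual `docs/src/app/robots.ts`, the default export would return this object typed as `MetadataRoute.Robots` from `next`.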
@charliecreates charliecreates bot removed the request for review from CharlieHelps February 5, 2026 02:33
@michaelmagan
Contributor Author

@CharlieHelps fix all the code review comments.

@charliecreates
Contributor

charliecreates bot commented Feb 5, 2026

  • Confirmed the new commit fix(docs): harden sitemap and doc metadata (70f6a0f0) on branch magan/docs-seo-geo is pushed and reflected on PR #2210, updating docs/next-sitemap.config.js and docs/src/app/(docs)/[[...slug]]/page.tsx.
  • Verified that docs/next-sitemap.config.js now has clearer sitemap handling and metadata logic aligned with prior review comments, while keeping isExcludedSitemapPath behavior intact and clarifying getLastModForFile’s undefined behavior via comments and conditional spreads like ...(lastmod != null ? { lastmod } : {}).
  • Re-checked docs/src/app/(docs)/[[...slug]]/page.tsx to confirm the runtime export and docLastModifiedCache semantics match the requested robustness and type-safety changes for doc metadata.
  • Introduced a JsonLdSchema type in docs/src/components/json-ld.tsx and wired it into JsonLd/JsonLdProps to narrow schema away from unknown while still supporting arrays of records, plus added JSDoc explaining JsonLd’s server-rendered JSON-LD purpose.
  • Updated createDocPageSchema in docs/src/components/json-ld.tsx to add an inline comment documenting why dateModified is omitted when unknown and kept the conditional spread for dateModified so schema.org freshness isn’t implied incorrectly.
  • Changed PageJsonLd in docs/src/components/json-ld.tsx so its schema prop is typed as JsonLdSchema instead of unknown, aligning it with the new JsonLdSchema helper type.
  • Re-ran npm run test -- --filter=@tambo-ai/docs and confirmed all tests pass, with only expected console.error output from existing @tambo-ai/react tests and no new failures introduced by the docs changes.
  • Re-ran npm run check-types -- --filter=@tambo-ai/docs and confirmed TypeScript still passes with the tightened JSON-LD types in docs/src/components/json-ld.tsx.
  • Re-ran npm run lint -- --filter=@tambo-ai/docs and confirmed lint succeeds with only existing warnings (e.g., nested ternaries in docs/src/components/ai-actions/llm-copy-button.tsx, no-img-element in docs/src/app/layout.config.tsx, and an unused collectDocPaths in docs/next-sitemap.config.js).
  • Validated that git status --porcelain is clean on magan/docs-seo-geo, npm run build:sdk (building @tambo-ai/react) completes successfully, and gh pr edit 2210 --add-reviewer CharlieHelps re-requests review with the latest commits like aa16e2b9 fix(docs): clarify sitemap and JSON-LD contracts and 70f6a0f0 fix(docs): harden sitemap and doc metadata visible in git log.
@charliecreates
Contributor

charliecreates bot commented Feb 5, 2026

#2210 (comment)

Addressed the review feedback around sitemap determinism/exclusions, JSON-LD crawlability, canonical URL correctness, and “modified now” fallbacks. Updates are pushed to magan/docs-seo-geo.

Changes
  • Sitemap (docs/next-sitemap.config.js)
    • Stopped using new Date() as a lastmod fallback; lastmod is now omitted when it can’t be determined.
    • Ensured excluded paths can’t be reintroduced via additionalPaths / include (shared isExcludedSitemapPath predicate).
  • JSON-LD (docs/src/components/json-ld.tsx)
    • Switched from next/script with strategy="afterInteractive" to a plain inline <script type="application/ld+json"> so JSON-LD is present in initial HTML.
    • Made dateModified optional (omitted when unknown) to avoid advertising fake freshness.
  • Doc page metadata (docs/src/app/(docs)/[[...slug]]/page.tsx)
    • Canonical is now the absolute pageUrl (not the relative page.url).
    • dateModified is derived from the MDX file’s mtime (stable) and the route is explicitly runtime = "nodejs" since it uses node:fs.
  • Typecheck fix (unrelated but required for docs typecheck)
    • Updated react-sdk/src/v1/hooks/use-tambo-v1-thread-list.test.tsx to pass limit: "10" to match ThreadListParams.limit (string) from @tambo-ai/typescript-sdk.
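The canonical fix and the later social-image fix both come down to resolving root-relative paths against the site origin. A small sketch, assuming the same fallback origin and logo path that appear later in this thread; `toAbsoluteUrl`, `buildDocSocialMetadata`, and the `summary_large_image` card type are illustrative choices, not the PR's code.

```typescript
// Hypothetical helper; the PR computes pageUrl/logoUrl inline.
function toAbsoluteUrl(pathOrUrl: string, siteUrl: string): string {
  // Root-relative paths resolve against siteUrl; absolute URLs pass through unchanged.
  return new URL(pathOrUrl, siteUrl).toString();
}

interface DocSocialMetadata {
  alternates: { canonical: string };
  openGraph: { url: string; images: { url: string }[] };
  twitter: { card: "summary_large_image"; images: string[] };
}

function buildDocSocialMetadata(pagePath: string, siteUrl: string): DocSocialMetadata {
  const pageUrl = toAbsoluteUrl(pagePath, siteUrl);
  const logoUrl = toAbsoluteUrl("/logo/lockup/Tambo-Lockup.png", siteUrl);
  return {
    alternates: { canonical: pageUrl },
    openGraph: { url: pageUrl, images: [{ url: logoUrl }] },
    twitter: { card: "summary_large_image", images: [logoUrl] },
  };
}
```

One caveat of this design: `new URL("/path", base)` resolves against the origin, so if `siteUrl` ever included a subpath (for example `https://example.com/docs`), that subpath would be dropped.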
Verification
# Build React SDK types (for downstream typechecking)
$ npm run build:sdk
# turbo: 1 task successful

# Lint (scoped)
$ npm run lint -- --filter=@tambo-ai/docs
# turbo: 4 tasks successful (warnings only)

# TypeScript (scoped)
$ npm run check-types -- --filter=@tambo-ai/docs
# turbo: 3 tasks successful

# Tests (scoped)
$ npm run test -- --filter=@tambo-ai/docs
# @tambo-ai/react: 65 test suites passed (1019 tests)
# @tambo-ai/ui-registry: 13 test suites passed (165 tests)

Self-review notes intentionally not addressed:

  • react-sdk/...use-tambo-v1-thread-list...: kept limit as a string to match the upstream SDK type (follow-up ergonomics change to accept number | string is out of scope for this PR).

@charliecreates charliecreates bot left a comment

Overall this addresses the prior SEO review points well: sitemap entries are now deterministic and JSON-LD is server-rendered. Remaining risks are (1) a potential filesystem path traversal via getDocPageLastModified(page.path) unless page.path is strictly sanitized by the content layer, and (2) exclusion logic drift where additionalPaths filtering doesn’t fully mirror glob-based exclusions. The React SDK test change also suggests the public API may have become less ergonomic by requiring string limit values.

Additional notes (3)
  • Compatibility | docs/src/app/(docs)/[[...slug]]/page.tsx:101-131
    openGraph.images[0].url and twitter.images are set to a root-relative path (/logo/lockup/Tambo-Lockup.png). While metadataBase is set globally, per-page metadata can be merged in ways that still lead to relative URLs being emitted depending on how Next resolves these fields. Using an absolute URL here is safer and avoids broken previews in clients that don’t resolve relative URLs consistently.

  • Maintainability | docs/next-sitemap.config.js:108-126
    exclude includes "/llms.mdx" (an exact path) even though MDX source files typically aren’t routable URLs. In contrast, you also exclude "/llms.mdx/*" and use path.startsWith("/llms.mdx/") in isExcludedSitemapPath, which suggests you’re treating it as a URL prefix.

    This is minor, but it’s easy for sitemap configs to accumulate confusing/contradictory entries that later get “fixed” in multiple places. Consider tightening the contract: either (a) treat llms as a routable path (e.g. /llms) or (b) keep exclusions purely in URL-space and drop file-extension-ish entries unless you know they’re emitted by Next routes.

  • Maintainability | react-sdk/src/v1/hooks/use-tambo-v1-thread-list.test.tsx:81-81
    The test change forces limit to be a string ("10"). If the underlying API expects a number semantically, this can hide regressions (e.g., "10" vs 10 sorting/validation) and makes call sites less ergonomic. If the real contract is "string because query params", consider asserting the hook is responsible for coercion rather than requiring consumers to pass strings.
Summary of changes

🗺️ Sitemap generation hardening (docs/next-sitemap.config.js)

  • Made getLastModForFile() return undefined when a stable timestamp can’t be determined (instead of new Date()), so lastmod can be omitted.
  • Centralized excluded sitemap paths into:
    • excludedSitemapExactPathList
    • excludedSitemapGlobPatterns
    • isExcludedSitemapPath() helper
  • Improved determinism/perf by precomputing enumeratedByUrl (Map) and using conditional spreads: ...(lastmod != null ? { lastmod } : {}).
  • Prevented excluded routes from being reintroduced via additionalPaths and include by filtering enumerated routes.
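The `transform` changes above (Map lookup plus conditional `lastmod`) can be sketched as a plain function that builds one sitemap entry per path. `makeEntryBuilder`, `EnumeratedRoute`, and `SitemapEntry` are hypothetical names; the sketch is synchronous for simplicity, whereas next-sitemap's `transform` is async in the config shown in this thread.

```typescript
// Hypothetical types; next-sitemap's own types are not imported here.
interface EnumeratedRoute {
  url: string;
  lastmod?: string; // ISO timestamp, absent when no stable value exists
}

interface SitemapEntry {
  loc: string;
  changefreq: string;
  priority: number;
  lastmod?: string;
}

function makeEntryBuilder(
  siteUrl: string,
  routes: EnumeratedRoute[],
  isExcluded: (path: string) => boolean,
): (path: string) => SitemapEntry | null {
  // Precompute O(1) lookups instead of scanning `routes` for every path.
  const lastmodByUrl = new Map(routes.map((r) => [r.url, r.lastmod] as const));
  return (path) => {
    if (isExcluded(path)) return null;
    const lastmod = lastmodByUrl.get(path);
    return {
      loc: `${siteUrl}${path}`,
      changefreq: "weekly",
      priority: path === "/" ? 1.0 : 0.8,
      // Omit lastmod entirely instead of falling back to "now" or the epoch.
      ...(lastmod != null ? { lastmod } : {}),
    };
  };
}
```

Returning `null` for excluded paths mirrors the config's `transform`; the same `isExcluded` predicate would also need to filter `additionalPaths` and `include` so all three emitters agree.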

📄 Doc pages SEO + JSON-LD (docs/src/app/(docs)/[[...slug]]/page.tsx)

  • Added TechArticle JSON-LD per doc page via createDocPageSchema() + <PageJsonLd />.
  • Added per-page metadata: alternates.canonical, openGraph, and twitter fields.
  • Added a Node.js runtime and a per-process cache to compute dateModified from MDX file mtime.

🌐 Global metadata + structured data (docs/src/app/layout.tsx)

  • Expanded global metadata with keywords, authors, creator, publisher, Open Graph images/locale, Twitter fields, and category.
  • Injected global JSON-LD in the document <head> via <GlobalJsonLd />.

🤖 Robots rules (docs/src/app/robots.ts)

  • Switched from a single rule object to an array of bot-specific rules.
  • Added host and kept sitemap pointing to ${baseUrl}/sitemap.xml.

✅ React SDK test adjustment (react-sdk/src/v1/hooks/use-tambo-v1-thread-list.test.tsx)

  • Updated the test to pass limit as a string ("10") and assert the API receives the same.
Comment on lines 27 to 38
const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const filePath = join(process.cwd(), "content", "docs", contentPath);
    const lastModified = statSync(filePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }

getDocPageLastModified(page.path) relies on page.path being a stable, safe relative path under content/docs. If page.path ever contains .. segments or a leading slash (even accidentally via content tooling), join(process.cwd(), "content", "docs", contentPath) could escape the intended directory and stat arbitrary files on the server. Even if today source guarantees safety, this is a fragile security boundary and worth hardening at the call site since it touches the filesystem.

Suggestion

Harden the path resolution to ensure the computed file path stays within the docs content root.

For example:

import { resolve, relative, sep } from "node:path";

const docsRoot = resolve(process.cwd(), "content", "docs");

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const candidate = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidate);
    if (rel.startsWith("..") || rel.startsWith(sep)) return undefined;

    const lastModified = statSync(candidate).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return undefined;
  }
};

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this hardening.

Comment on lines 19 to 39
const siteUrl = process.env.NEXT_PUBLIC_SITE_URL || "https://docs.tambo.co";

export const runtime = "nodejs";

// Best-effort cache (per Node.js process) to avoid re-statting the same MDX file.
// This is used only for structured data metadata and doesn't need to be real-time.
const docLastModifiedCache = new Map<string, string>();

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const filePath = join(process.cwd(), "content", "docs", contentPath);
    const lastModified = statSync(filePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};

runtime = "nodejs" forces this route onto the Node runtime. That may be intentional for statSync, but it’s a meaningful platform constraint for a docs page: it prevents running on the Edge runtime and can affect deployment strategies.

Also, getDocPageLastModified(page.path) implicitly assumes page.path maps 1:1 to a file under content/docs/. If page.path ever changes format (or includes nested segments with unexpected values), you’ll silently omit dateModified—which is okay for correctness, but can make metadata flaky across environments without visibility.

Suggestion

If Node runtime is required, document it explicitly in a short comment near export const runtime = "nodejs" (e.g., “required for filesystem mtime used by JSON-LD”). If you’d prefer to avoid pinning runtime, consider sourcing dateModified from page.data/frontmatter (or the same route enumeration logic used for the sitemap) so the page can remain runtime-agnostic.

Reply with "@CharlieHelps yes please" if you’d like me to add a commit that derives dateModified from the existing sitemap enumeration (removing the need for statSync in the page).
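If the page should stay runtime-agnostic, `dateModified` could come from frontmatter instead of `statSync`. A sketch, assuming a hypothetical `lastModified` frontmatter field (the PR does not define one). Some YAML parsers turn unquoted dates into `Date` objects, so `pickDateModified` (also a hypothetical name) accepts both strings and `Date` values and rejects anything unparseable.

```typescript
// Hypothetical: assumes a `lastModified` frontmatter field that the PR does not define.
function pickDateModified(frontmatter: Record<string, unknown>): string | undefined {
  const raw = frontmatter["lastModified"];
  let date: Date;
  if (raw instanceof Date) {
    date = new Date(raw.getTime());
  } else if (typeof raw === "string") {
    date = new Date(raw);
  } else {
    return undefined; // missing or unexpected type: omit dateModified
  }
  // Reject unparseable values instead of emitting "Invalid Date".
  return Number.isNaN(date.getTime()) ? undefined : date.toISOString();
}
```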

Comment on lines 108 to 163
const excludedSitemapExactPathList = [
  "/llms.txt",
  "/llms-full.txt",
  "/robots.txt",
  "/llms.mdx",
];

const excludedSitemapGlobPatterns = ["/_next/*", "/api/*", "/llms.mdx/*"];

const excludedSitemapGlobs = [
  ...excludedSitemapGlobPatterns,
  ...excludedSitemapExactPathList,
];

const excludedSitemapExactPaths = new Set(excludedSitemapExactPathList);

const isExcludedSitemapPath = (path) =>
  excludedSitemapExactPaths.has(path) || path.startsWith("/llms.mdx/");

// Enumerate routes once at config load for deterministic, build-time-only sitemap generation
const enumerated = enumerateRoutes();
const enumeratedByUrl = new Map(enumerated.map((e) => [e.url, e.lastmod]));

module.exports = {
  siteUrl,
  generateRobotsTxt: false,
  exclude: ["/_next/*", "/api/*"], // (removed)
  // Exclude non-page routes and duplicates
  exclude: excludedSitemapGlobs,
  changefreq: "weekly",
  priority: 0.8,
  transform: async (config, path) => { // (removed)
    const found = enumerated.find((e) => e.url === path); // (removed)
  transform: async (_config, path) => {
    // Skip paths that shouldn't be in sitemap
    if (isExcludedSitemapPath(path)) {
      return null;
    }
    const lastmod = enumeratedByUrl.get(path);
    return {
      loc: `${siteUrl}${path}`,
      changefreq: "weekly",
      priority: path === "/" ? 1.0 : 0.8,
      lastmod: found?.lastmod ?? new Date(0).toISOString(), // (removed)
      ...(lastmod != null ? { lastmod } : {}),
      alternateRefs: [],
    };
  },
  // explicitly include all enumerated paths for determinism
  additionalPaths: async () =>
    enumerated.map((e) => ({ loc: `${siteUrl}${e.url}`, lastmod: e.lastmod })), // (removed)
    enumerated
      .filter((e) => !isExcludedSitemapPath(e.url))
      .map((e) => ({
        loc: `${siteUrl}${e.url}`,
        ...(e.lastmod != null ? { lastmod: e.lastmod } : {}),
      })),
  // include ensures index and known root routes are emitted
  include: enumerated.map((e) => e.url), // (removed)
  include: enumerated
    .filter((e) => !isExcludedSitemapPath(e.url))
    .map((e) => e.url),

The isExcludedSitemapPath helper only checks exact matches plus path.startsWith("/llms.mdx/"), but your exclude list also contains glob patterns like "/_next/*" and "/api/*". Those are likely handled by the sitemap tool’s exclude, but note that additionalPaths is now the main path emitter and it only filters via isExcludedSitemapPath. If enumerateRoutes() ever adds /_next/... or /api/... entries (by mistake or via future routing changes), they could still leak into the sitemap via additionalPaths because isExcludedSitemapPath doesn’t mirror the glob exclusions.

Suggestion

Make isExcludedSitemapPath cover all exclusions that matter for additionalPaths/include (at least the obvious prefixes), or reuse the same source of truth.

For example:

const isExcludedSitemapPath = (path) =>
  excludedSitemapExactPaths.has(path) ||
  path.startsWith("/llms.mdx/") ||
  path.startsWith("/_next/") ||
  path.startsWith("/api/");

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this alignment.

@charliecreates charliecreates bot removed the request for review from CharlieHelps February 5, 2026 03:25
@michaelmagan
Contributor Author

@CharlieHelps can you fix these issues, but first rebase on main, then make sure all the tests, lint, and builds work before committing each time.

@charliecreates
Contributor

charliecreates bot commented Feb 6, 2026

  • Fetched latest origin/main, confirmed PR #2210 was based on an out-of-date magan/docs-seo-geo branch (mergeable: CONFLICTING), and verified a clean working tree before rebasing.
  • Rebased magan/docs-seo-geo onto origin/main, resolved the conflict in docs/src/app/layout.tsx by merging the new SEO metadata fields with the existing config (including a derived logoUrl for absolute Open Graph/Twitter images), and completed the rebase cleanly.
  • Confirmed all conflict markers were removed from docs/src/app/layout.tsx and that the updated commit history cleanly covers docs/src/components/json-ld.tsx and related docs files.
  • Reviewed docs/src/app/(docs)/[[...slug]]/page.tsx to understand getDocPageLastModified, docLastModifiedCache, and how createDocPageSchema/PageJsonLd integrate with generateMetadata for Open Graph/Twitter tags and TechArticle structured data.
  • Analyzed docs/next-sitemap.config.js plus references to llms.mdx in docs/src/middleware.ts and docs/next.config.mjs to ensure sitemap behavior matches MDX routing and middleware exclusions.
  • Updated docs/src/app/(docs)/[[...slug]]/page.tsx so getDocPageLastModified uses a docsRoot with resolve/relative for safer path handling and generateMetadata now emits absolute social image URLs derived from siteUrl/logoUrl.
  • Adjusted docs/next-sitemap.config.js so isExcludedSitemapPath also skips /_next, /api, and /llms.mdx/* paths in line with the exclude globs and MDX rewrites, tightening sitemap output around the docs surface area.
  • Ran npm run lint -- --filter=@tambo-ai/docs --filter=@tambo-ai/react and confirmed lint passes for the affected packages with only warnings in react-sdk tests, docs/next-sitemap.config.js, docs/src/app/layout.config.tsx, docs/src/components/ai-actions/llm-copy-button.tsx, docs/src/components/header-bar.tsx, and docs/src/lib/thread-hooks.ts (no errors).
  • Completed npm ci, npm run check-types -- --filter=@tambo-ai/docs --filter=@tambo-ai/react, and npm run build -- --filter=@tambo-ai/docs --filter=@tambo-ai/react, all exiting 0 while successfully building @tambo-ai/docs, @tambo-ai/react, @tambo-ai/react-ui-base, and @tambo-ai/vite-config without TypeScript or build errors in the updated code.
  • Committed the doc/SEO and test tweaks as fix(docs,react-sdk): harden metadata and thread list limit, confirmed a clean working tree via git status, force-pushed magan/docs-seo-geo (now mergeable: MERGEABLE) to update PR #2210, and re-requested review from @CharlieHelps.
@vercel vercel bot temporarily deployed to Preview – cloud February 6, 2026 00:40 Inactive
@vercel vercel bot temporarily deployed to Preview – showcase February 6, 2026 00:40 Inactive
@charliecreates charliecreates bot requested a review from CharlieHelps February 6, 2026 00:40
@github-actions github-actions bot removed the area: react-sdk Changes to the React SDK label Feb 6, 2026
@charliecreates
Contributor

charliecreates bot commented Feb 6, 2026

#2210 (comment)

Rebased magan/docs-seo-geo onto main (force-pushed) and then fixed the remaining review items around sitemap exclusion drift, JSON-LD dateModified path hardening, and absolute social preview URLs. Also corrected the React SDK thread list test to match the current SDK limit type.

Latest fixes are in 5736b529.

Changes
  • Doc page filesystem hardening: getDocPageLastModified() now ensures page.path can’t escape docs/content/docs (prevents path traversal). (docs/src/app/(docs)/[[...slug]]/page.tsx)
  • Preview URL correctness: doc-page Open Graph / Twitter image URLs are now absolute (same change also applied globally in the layout during the rebase conflict resolution). (docs/src/app/(docs)/[[...slug]]/page.tsx, docs/src/app/layout.tsx)
  • Sitemap exclusion alignment: isExcludedSitemapPath() now mirrors the exclude glob intent so additionalPaths can’t reintroduce /_next/* or /api/*. (docs/next-sitemap.config.js)
  • React SDK test: thread list limit is back to a number to match ThreadListParams.limit?: number. (react-sdk/src/v1/hooks/use-tambo-v1-thread-list.test.tsx)
Verification
$ npm ci

$ npm run lint -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# turbo: 4 tasks successful (warnings only)

$ npm run check-types -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# turbo: 7 tasks successful

$ npm run test -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# @tambo-ai/react: 66 test suites passed (1042 tests)
# @tambo-ai/react-ui-base: 7 test suites passed (118 tests)
# @tambo-ai/ui-registry: 8 test suites passed (113 tests)

$ npm run build -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# turbo: 4 tasks successful

@charliecreates charliecreates bot left a comment

Main remaining concerns are (1) JSON-LD script-breakout risk from embedding JSON.stringify output directly into dangerouslySetInnerHTML without escaping </script> sequences, and (2) potential runtime performance impact from statSync on requests plus an unbounded cache in the doc page. The sitemap exclusions are significantly improved, but the exclusion contract is still split between exclude globs and isExcludedSitemapPath, which is prone to drift. Addressing these will make the SEO/GEO changes safer and more robust under load.
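One way to remove the split contract is to derive both the `exclude` globs and `isExcludedSitemapPath` from a single list. A sketch, assuming every glob has the simple `/prefix/*` form used in the excerpt and that a trailing `*` should match any suffix; `makeExclusionPredicate` is a hypothetical helper, and it throws on any pattern it can't express so that new globs can't silently drift out of sync.

```typescript
// Assumption: globs are only of the form "/prefix/*", as in the PR's excerpt.
const excludedSitemapExactPathList = ["/llms.txt", "/llms-full.txt", "/robots.txt", "/llms.mdx"];
const excludedSitemapGlobPatterns = ["/_next/*", "/api/*", "/llms.mdx/*"];

function makeExclusionPredicate(
  exact: readonly string[],
  globs: readonly string[],
): (path: string) => boolean {
  const exactSet = new Set(exact);
  const prefixes = globs.map((glob) => {
    // Fail loudly on patterns this predicate cannot mirror faithfully.
    if (!glob.endsWith("/*") || glob.slice(0, -2).includes("*")) {
      throw new Error(`Unsupported glob pattern: ${glob}`);
    }
    return glob.slice(0, -1); // "/api/*" -> "/api/"
  });
  return (path) => exactSet.has(path) || prefixes.some((prefix) => path.startsWith(prefix));
}

// Both consumers are derived from the same two lists.
const isExcludedSitemapPath = makeExclusionPredicate(
  excludedSitemapExactPathList,
  excludedSitemapGlobPatterns,
);
const excludedSitemapGlobs = [...excludedSitemapGlobPatterns, ...excludedSitemapExactPathList];
```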

Additional notes (2)
  • Performance | docs/src/app/(docs)/[[...slug]]/page.tsx:27-34
    The per-process docLastModifiedCache is unbounded. In long-lived Node processes (or if a lot of doc paths are crawled), this can grow without limit. Even if the docs set is finite, it’s safer to enforce a reasonable cap or cache only known params (e.g., from generateStaticParams).

  • Compatibility | docs/src/app/layout.tsx:84-84
    Injecting <GlobalJsonLd /> inside a manually-declared <head> works, but in the Next.js App Router this can be fragile because Next manages <head> merging. It’s easy to accidentally introduce duplicate head tags or unexpected ordering with other head contributors.

    Since JSON-LD is static and global, the more robust pattern is to render it as part of the layout return without explicitly adding a <head> element (or to use the metadata / generateMetadata pipeline where possible).
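A capped alternative to the unbounded `docLastModifiedCache`, relying on the fact that a JavaScript `Map` iterates in insertion order to evict the least recently used entry. `BoundedCache` is a hypothetical helper, and the cap is an assumption to tune (for example, comfortably above the number of doc pages).

```typescript
// Hypothetical replacement for the unbounded docLastModifiedCache Map.
class BoundedCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so this key becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

It would be a drop-in for the existing `get`/`set` calls, e.g. `const docLastModifiedCache = new BoundedCache<string>(1000);` (the 1000 is illustrative).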

Summary of changes

🗺️ Sitemap generation hardening (docs/next-sitemap.config.js)

  • Made getLastModForFile() return undefined (omit lastmod) when a stable timestamp can’t be determined.
  • Added centralized exclusion configuration via:
    • excludedSitemapExactPathList
    • excludedSitemapGlobPatterns
    • isExcludedSitemapPath()
  • Improved determinism/performance by precomputing enumeratedByUrl: Map and using conditional spreads (...(lastmod != null ? { lastmod } : {})).
  • Ensured additionalPaths and include filter out excluded routes so they can’t be reintroduced.

📄 Per-doc JSON-LD + metadata (docs/src/app/(docs)/[[...slug]]/page.tsx)

  • Added per-page TechArticle JSON-LD via createDocPageSchema() and <PageJsonLd />.
  • Added per-page alternates.canonical, Open Graph, and Twitter card metadata using absolute URLs (pageUrl, logoUrl).
  • Implemented dateModified derivation using filesystem mtime with a per-process cache and path traversal hardening (resolve + relative).
  • Explicitly pinned route to export const runtime = "nodejs" to allow node:fs usage.

🌐 Global SEO metadata + global JSON-LD (docs/src/app/layout.tsx)

  • Expanded global metadata with keywords, authors, creator, publisher, category, and richer Open Graph/Twitter settings.
  • Added logoUrl and set global Open Graph/Twitter images to absolute URLs.
  • Injected <GlobalJsonLd /> into <head>.

🤖 Robots rules (docs/src/app/robots.ts)

  • Changed rules from a single rule to an array of bot-specific rules (GPTBot, ClaudeBot, PerplexityBot, etc.).
  • Added host and kept sitemap pointing to ${baseUrl}/sitemap.xml.

🧩 New JSON-LD component (docs/src/components/json-ld.tsx)

  • Added server-rendered inline <script type="application/ld+json"> renderer (JsonLd) and typed JsonLdSchema.
  • Added global schemas (Organization, SoftwareApplication, WebSite) plus helpers for TechArticle, FAQPage, and HowTo.
  • Made dateModified optional in TechArticle (omitted when unknown) to avoid fake freshness.
Comment on lines +14 to +48
import { statSync } from "node:fs";
import { relative, resolve, sep } from "node:path";
import { notFound } from "next/navigation";
import { Suspense } from "react";

const siteUrl = process.env.NEXT_PUBLIC_SITE_URL || "https://docs.tambo.co";
const logoUrl = new URL("/logo/lockup/Tambo-Lockup.png", siteUrl).toString();

const docsRoot = resolve(process.cwd(), "content", "docs");

// Requires Node.js runtime for filesystem `mtime` reads for JSON-LD `dateModified`.
export const runtime = "nodejs";

// Best-effort cache (per Node.js process) to avoid re-statting the same MDX file.
// This is used only for structured data metadata and doesn't need to be real-time.
const docLastModifiedCache = new Map<string, string>();

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel.startsWith(sep)) {
      return;
    }

    const lastModified = statSync(candidatePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};

getDocPageLastModified() correctly hardens against path traversal using resolve + relative, but it still uses statSync, which blocks the Node.js event loop for every uncached page path. This route is a docs page that can be hit frequently; even with the per-process cache, cold starts and cache misses can stack up under load.

Since this value is only used for metadata/JSON-LD, it doesn’t need to be synchronous.

Suggestion

Switch to async fs.promises.stat (or stat from node:fs/promises) and make getDocPageLastModified async to avoid blocking:

import { stat } from "node:fs/promises";

const getDocPageLastModified = async (contentPath: string) => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel.startsWith(sep)) return undefined;

    const lastModified = (await stat(candidatePath)).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return undefined;
  }
};

Then dateModified: await getDocPageLastModified(page.path). Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this refactor.

Comment on lines +22 to +48
const docsRoot = resolve(process.cwd(), "content", "docs");

// Requires Node.js runtime for filesystem `mtime` reads for JSON-LD `dateModified`.
export const runtime = "nodejs";

// Best-effort cache (per Node.js process) to avoid re-statting the same MDX file.
// This is used only for structured data metadata and doesn't need to be real-time.
const docLastModifiedCache = new Map<string, string>();

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel.startsWith(sep)) {
      return;
    }

    const lastModified = statSync(candidatePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};

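getDocPageLastModified() correctly uses resolve + relative to keep reads under content/docs, but the rel.startsWith(sep) guard does little, because relative() normally returns a path without a leading separator. On POSIX, where both arguments are absolute after resolve(), the result is either "" or a relative path, so the check can never match; on Windows, a target on another drive comes back as an absolute path such as D:\secrets.txt, which it does not catch either. The .. traversal check is the one doing the real work, and the separator check could give a false sense of safety.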

Also, you’re using the raw contentPath as the cache key. If the content layer ever varies path normalization (e.g., a/../b.mdx vs b.mdx), you could cache duplicate entries for the same file (minor, but avoidable).

Suggestion

Tighten the traversal guard and normalize the cache key to the resolved candidate path:

const getDocPageLastModified = (contentPath: string): string | undefined => {
  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel === "") return undefined; // keep or adjust as desired

    const cached = docLastModifiedCache.get(candidatePath);
    if (cached) return cached;

    const lastModified = statSync(candidatePath).mtime.toISOString();
    docLastModifiedCache.set(candidatePath, lastModified);
    return lastModified;
  } catch {
    return undefined;
  }
};

If you want a stricter guard, also reject absolute contentPath early via path.isAbsolute(contentPath). Reply with "@CharlieHelps yes please" if you’d like me to add a commit with these changes.
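To see the difference concretely, the sketch below runs the guard as written and a stricter variant against node:path's POSIX and Windows implementations (path.posix and path.win32 are available on every platform). legacyGuard and isInsideRoot are illustrative names, not code from the PR, and the sketch assumes the goal is to confine reads to the docs root. The stricter variant adds isAbsolute(rel), which catches the cross-drive case where win32 relative() returns the target unchanged.

```typescript
import { posix, win32 } from "node:path";

type PathApi = typeof posix;

// The guard as written in the PR: reject `..` traversal or a leading separator.
function legacyGuard(path: PathApi, root: string, contentPath: string): boolean {
  const rel = path.relative(root, path.resolve(root, contentPath));
  return !(rel.startsWith("..") || rel.startsWith(path.sep));
}

// Stricter variant: also reject an absolute `rel` (what win32 relative()
// returns for a target on another drive) and `rel === ""` (the root itself).
function isInsideRoot(path: PathApi, root: string, contentPath: string): boolean {
  const rel = path.relative(root, path.resolve(root, contentPath));
  return rel !== "" && !rel.startsWith("..") && !path.isAbsolute(rel);
}
```

On POSIX the two helpers disagree only for the docs root itself (rel === ""), which matches the review: there the .. check does all the work.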

Comment on lines +18 to +26
export function JsonLd({ id, schema }: JsonLdProps) {
  return (
    <script
      id={id}
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(schema) }}
    />
  );
}

Contributor

JsonLd renders JSON.stringify(schema) directly into dangerouslySetInnerHTML. While JSON-LD is intended to be JSON, it’s still possible for schema values (especially createFAQSchema answers or other future content-derived fields) to contain </script> sequences, which can prematurely terminate the script tag and enable HTML injection.

This is a known gotcha when embedding JSON in <script> tags.

Suggestion

Harden the JSON serialization for safe embedding by escaping the </script sequence (and optionally <!--):

const toJsonLd = (value: unknown) =>
  JSON.stringify(value).replace(/<\//g, "<\\/");

export function JsonLd({ id, schema }: JsonLdProps) {
  return (
    <script
      id={id}
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: toJsonLd(schema) }}
    />
  );
}

This keeps JSON valid while preventing script-breakout. Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this hardening.
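Both claims in that last sentence can be checked directly: the escaped string no longer contains </script, and it still parses back to the same value, since \/ is a valid JSON escape for /. The FAQ-style schema below, including its hostile answer text, is made up for illustration.

```typescript
// Same helper as in the suggestion: replace every `</` with `<\/`.
const toJsonLd = (value: unknown): string =>
  JSON.stringify(value).replace(/<\//g, "<\\/");

// Hypothetical FAQ schema whose answer would break out of a raw <script>.
const schema = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  mainEntity: [
    {
      "@type": "Question",
      name: "Can answers contain HTML?",
      acceptedAnswer: {
        "@type": "Answer",
        text: "Yes </script><script>alert(1)</script>",
      },
    },
  ],
};

const naive = JSON.stringify(schema);
const escaped = toJsonLd(schema);

console.log(naive.includes("</script")); // true: would close the tag early
console.log(escaped.includes("</script")); // false
console.log(JSON.stringify(JSON.parse(escaped)) === naive); // true: same data
```

Escaping every < as \u003c is a common, slightly broader variant that also covers <!--.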

Comment on lines 5 to +58
 export default function robots(): MetadataRoute.Robots {
   return {
-    rules: {
-      userAgent: "*",
-      allow: "/",
-      disallow: ["/api/"],
-    },
+    rules: [
+      // Default rule for all bots
+      {
+        userAgent: "*",
+        allow: "/",
+        disallow: ["/api/"],
+      },
+      // Explicitly allow OpenAI's GPTBot (used for ChatGPT training and browsing)
+      {
+        userAgent: "GPTBot",
+        allow: "/",
+      },
+      // Explicitly allow ChatGPT's browsing feature
+      {
+        userAgent: "ChatGPT-User",
+        allow: "/",
+      },
+      // Explicitly allow Perplexity AI
+      {
+        userAgent: "PerplexityBot",
+        allow: "/",
+      },
+      // Explicitly allow Claude/Anthropic
+      {
+        userAgent: "ClaudeBot",
+        allow: "/",
+      },
+      {
+        userAgent: "anthropic-ai",
+        allow: "/",
+      },
+      // Explicitly allow Google's AI bot
+      {
+        userAgent: "Google-Extended",
+        allow: "/",
+      },
+      // Explicitly allow Bing/Microsoft Copilot
+      {
+        userAgent: "Bingbot",
+        allow: "/",
+      },
+      // Explicitly allow Meta AI
+      {
+        userAgent: "Meta-ExternalAgent",
+        allow: "/",
+      },
+      // Explicitly allow Cohere
+      {
+        userAgent: "cohere-ai",
+        allow: "/",
+      },
+    ],

Contributor

In robots.ts, you're explicitly allowing a set of AI crawlers, but you're not applying the same disallow: ["/api/"] restriction to them that you apply to "*". Crawlers that follow RFC 9309 (Google's included) obey only the group(s) matching their own user agent and fall back to the "*" group only when no specific group matches, so these bot-specific blocks replace the generic one and effectively allow crawling of /api/ for those bots.

If the intent is to keep /api/ blocked for everyone, each explicit rule should include the same disallow list (or you should avoid adding bot-specific blocks unless you’re changing behavior).
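To illustrate, here is a simplified sketch of group selection as RFC 9309 describes it: a crawler merges the groups whose user agent matches its product token (case-insensitively) and uses "*" only when none match. The RobotsRule type mirrors the object shape used in robots.ts but is defined locally, effectiveRules is a hypothetical helper, and the sketch ignores path matching and allow/disallow precedence, which real crawlers also implement.

```typescript
interface RobotsRule {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
}

const asList = (v: string | string[] | undefined): string[] =>
  v === undefined ? [] : Array.isArray(v) ? v : [v];

// Returns the merged allow/disallow lists a crawler with `token` would obey.
function effectiveRules(rules: RobotsRule[], token: string) {
  const wanted = token.toLowerCase();
  const matches = (r: RobotsRule, ua: string) =>
    asList(r.userAgent).some((a) => a.toLowerCase() === ua);

  // Specific groups win; the `*` group applies only if nothing else matched.
  let groups = rules.filter((r) => matches(r, wanted));
  if (groups.length === 0) groups = rules.filter((r) => matches(r, "*"));

  return {
    allow: groups.flatMap((r) => asList(r.allow)),
    disallow: groups.flatMap((r) => asList(r.disallow)),
  };
}

// Two of the rules from the PR, as written.
const prRules: RobotsRule[] = [
  { userAgent: "*", allow: "/", disallow: ["/api/"] },
  { userAgent: "GPTBot", allow: "/" },
];

console.log(effectiveRules(prRules, "SomeOtherBot").disallow); // [ '/api/' ]
console.log(effectiveRules(prRules, "GPTBot").disallow); // []
```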

Suggestion

Add disallow: ["/api/"] to each bot-specific rule (or remove bot-specific rules entirely if they aren’t intended to differ from the default).

Example:

const apiDisallow = ["/api/"];

{
  userAgent: "GPTBot",
  allow: "/",
  disallow: apiDisallow,
}

Reply with "@CharlieHelps yes please" if you’d like me to add a commit applying this consistently across the rules.
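One way to apply the list consistently without repeating it is to generate the bot-specific groups from an array of user agents. This is a sketch: aiUserAgents and buildRobotsRules are hypothetical names, the list copies the user agents from the diff above, and the local RobotsRule type stands in for the element type of Next.js's MetadataRoute.Robots["rules"] so the snippet runs without Next installed.

```typescript
interface RobotsRule {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
}

// Shared restriction so every group blocks the API routes.
const apiDisallow = ["/api/"];

// User agents copied from the PR's robots.ts.
const aiUserAgents = [
  "GPTBot",
  "ChatGPT-User",
  "PerplexityBot",
  "ClaudeBot",
  "anthropic-ai",
  "Google-Extended",
  "Bingbot",
  "Meta-ExternalAgent",
  "cohere-ai",
];

function buildRobotsRules(userAgents: string[]): RobotsRule[] {
  return [
    // Default group for every crawler without a more specific match.
    { userAgent: "*", allow: "/", disallow: apiDisallow },
    // One explicit group per bot, each carrying the same disallow list.
    ...userAgents.map((userAgent) => ({
      userAgent,
      allow: "/",
      disallow: apiDisallow,
    })),
  ];
}

const rules = buildRobotsRules(aiUserAgents);
console.log(rules.length); // 10
console.log(rules.every((r) => r.disallow === apiDisallow)); // true
```

In robots.ts the function would then return { rules: buildRobotsRules(aiUserAgents) }. Next.js's robots rules also accept an array for userAgent, so the bots could alternatively share a single group with one disallow list.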


Labels

area: config Changes to repository configuration files area: documentation Improvements or additions to documentation change: feat New feature contributor: ai AI-assisted contribution status: in progress Work is currently being done
