Main concerns are sitemap correctness (excluded routes can be reintroduced via additionalPaths) and JSON-LD discoverability (using next/script with strategy="afterInteractive" can prevent some crawlers from seeing structured data). There are also SEO correctness risks around using a relative canonical where an absolute URL is already computed, and data integrity concerns from using new Date() as a dateModified fallback in JSON-LD. Addressing these will make the SEO/GEO changes more reliable and deterministic.
Additional notes (1)
- Maintainability | `docs/next-sitemap.config.js:112-143`
  `exclude` + `transform` are now doing overlapping filtering, but `additionalPaths` still unconditionally emits all enumerated URLs. If `enumerateRoutes()` ever includes any of the excluded/special routes (e.g. `/robots.txt`, `/llms.txt`, `/llms-full.txt`), they can still be reintroduced via `additionalPaths`, defeating the exclusion/transform logic and potentially producing invalid sitemap entries.
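To make the risk concrete, here is a minimal, self-contained sketch of the failure mode and the fix (the route list and exclusion set are hypothetical stand-ins for `enumerateRoutes()` and the config's exclusions):

```typescript
// Hypothetical enumerated routes, including one that should be excluded.
const enumerated = [
  { url: "/", lastmod: "2024-01-01T00:00:00.000Z" },
  { url: "/docs/getting-started", lastmod: "2024-02-01T00:00:00.000Z" },
  { url: "/robots.txt", lastmod: undefined }, // should never reach the sitemap
];

const excluded = new Set(["/robots.txt", "/llms.txt", "/llms-full.txt"]);

// Unfiltered additionalPaths reintroduces excluded routes...
const naive = enumerated.map((e) => e.url);

// ...so the same exclusion predicate must be applied here too.
const filtered = enumerated
  .filter((e) => !excluded.has(e.url))
  .map((e) => e.url);

console.log(naive.includes("/robots.txt"));    // true: the bug
console.log(filtered.includes("/robots.txt")); // false: excluded consistently
```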
Summary of changes
What changed
SEO / GEO metadata
- Added global metadata in `docs/src/app/layout.tsx`, including `keywords`, `authors`, `creator`/`publisher`, Open Graph/Twitter fields, and `category`.
- Added per-doc-page metadata in `docs/src/app/(docs)/[[...slug]]/page.tsx`: `alternates.canonical`, plus `openGraph` and `twitter` cards.
JSON-LD structured data
- Introduced `docs/src/components/json-ld.tsx` with:
  - global schemas: `Organization`, `SoftwareApplication`, `WebSite`
  - helpers for `TechArticle`, `FAQPage`, and `HowTo`
  - renderers: `GlobalJsonLd` and `PageJsonLd`
- Wired JSON-LD into:
  - root layout via `<GlobalJsonLd />`
  - doc pages via `<PageJsonLd schema={createDocPageSchema(...)} />`
Robots + sitemap
- Expanded `docs/src/app/robots.ts` to emit multiple bot-specific `rules` (GPTBot, PerplexityBot, ClaudeBot, etc.) and added `host`.
- Updated `docs/next-sitemap.config.js` to:
  - exclude more non-page/duplicate paths
  - skip certain paths in `transform`
  - change the `lastmod` fallback from epoch (`new Date(0)`) to `new Date()`
@CharlieHelps fix all the code review comments.
Addressed the review feedback around sitemap determinism/exclusions, JSON-LD crawlability, canonical URL correctness, and “modified now” fallbacks. Updates are pushed.

Changes
Verification

```shell
# Build React SDK types (for downstream typechecking)
$ npm run build:sdk
# turbo: 1 task successful

# Lint (scoped)
$ npm run lint -- --filter=@tambo-ai/docs
# turbo: 4 tasks successful (warnings only)

# TypeScript (scoped)
$ npm run check-types -- --filter=@tambo-ai/docs
# turbo: 3 tasks successful

# Tests (scoped)
$ npm run test -- --filter=@tambo-ai/docs
# @tambo-ai/react: 65 test suites passed (1019 tests)
# @tambo-ai/ui-registry: 13 test suites passed (165 tests)
```

Self-review notes intentionally not addressed:
Overall this addresses the prior SEO review points well: sitemap entries are now deterministic and JSON-LD is server-rendered. Remaining risks are (1) a potential filesystem path traversal via getDocPageLastModified(page.path) unless page.path is strictly sanitized by the content layer, and (2) exclusion logic drift where additionalPaths filtering doesn’t fully mirror glob-based exclusions. The React SDK test change also suggests the public API may have become less ergonomic by requiring string limit values.
Additional notes (3)
- Compatibility | `docs/src/app/(docs)/[[...slug]]/page.tsx:101-131`
  `openGraph.images[0].url` and `twitter.images` are set to a root-relative path (`/logo/lockup/Tambo-Lockup.png`). While `metadataBase` is set globally, per-page metadata can be merged in ways that still lead to relative URLs being emitted, depending on how Next resolves these fields. Using an absolute URL here is safer and avoids broken previews in clients that don't resolve relative URLs consistently.
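One way to sidestep the merge ambiguity is to compute the absolute URL once and reuse it everywhere; a minimal sketch, reusing the `NEXT_PUBLIC_SITE_URL` fallback pattern already present in the quoted code:

```typescript
// Resolve the root-relative asset path against the site origin once, then
// reuse the absolute string for both openGraph.images and twitter.images.
const siteUrl = process.env.NEXT_PUBLIC_SITE_URL || "https://docs.tambo.co";
const logoUrl = new URL("/logo/lockup/Tambo-Lockup.png", siteUrl).toString();

console.log(logoUrl); // e.g. "https://docs.tambo.co/logo/lockup/Tambo-Lockup.png"
```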
- Maintainability | `docs/next-sitemap.config.js:108-126`
  `exclude` includes `"/llms.mdx"` (an exact path) even though MDX source files typically aren't routable URLs. In contrast, you also exclude `"/llms.mdx/*"` and use `path.startsWith("/llms.mdx/")` in `isExcludedSitemapPath`, which suggests you're treating it as a URL prefix.
  This is minor, but it's easy for sitemap configs to accumulate confusing/contradictory entries that later get “fixed” in multiple places. Consider tightening the contract: either (a) treat llms as a routable path (e.g. `/llms`) or (b) keep exclusions purely in URL-space and drop file-extension-ish entries unless you know they're emitted by Next routes.
- Maintainability | `react-sdk/src/v1/hooks/use-tambo-v1-thread-list.test.tsx:81`
  The test change forces `limit` to be a string (`"10"`). If the underlying API expects a number semantically, this can hide regressions (e.g., `"10"` vs `10` sorting/validation) and makes call sites less ergonomic. If the real contract is "string because query params", consider asserting that the hook is responsible for coercion rather than requiring consumers to pass strings.
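If the contract really is "string because query params", the coercion can live inside the hook so consumers keep passing numbers; a sketch with illustrative names (not the actual @tambo-ai/react API):

```typescript
// Sketch: accept a numeric limit at the hook boundary and coerce to the
// string form the transport layer needs. Names are illustrative only.
interface ThreadListParams {
  limit?: number;
}

function toQueryParams({ limit }: ThreadListParams): Record<string, string> {
  const params: Record<string, string> = {};
  if (limit !== undefined) {
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new RangeError(`limit must be a positive integer, got ${limit}`);
    }
    params.limit = String(limit); // coercion happens here, not at call sites
  }
  return params;
}

console.log(toQueryParams({ limit: 10 })); // { limit: "10" }
```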
Summary of changes
🗺️ Sitemap generation hardening (docs/next-sitemap.config.js)
- Made `getLastModForFile()` return `undefined` when a stable timestamp can't be determined (instead of `new Date()`), so `lastmod` can be omitted.
- Centralized excluded sitemap paths into `excludedSitemapExactPathList`, `excludedSitemapGlobPatterns`, and the `isExcludedSitemapPath()` helper.
- Improved determinism/perf by precomputing `enumeratedByUrl` (a `Map`) and using conditional spreads: `...(lastmod != null ? { lastmod } : {})`.
- Prevented excluded routes from being reintroduced via `additionalPaths` and `include` by filtering enumerated routes.
📄 Doc pages SEO + JSON-LD (docs/src/app/(docs)/[[...slug]]/page.tsx)
- Added `TechArticle` JSON-LD per doc page via `createDocPageSchema()` + `<PageJsonLd />`.
- Added per-page metadata: `alternates.canonical`, `openGraph`, and `twitter` fields.
- Added a Node.js runtime pin and a per-process cache to compute `dateModified` from MDX file `mtime`.
🌐 Global metadata + structured data (docs/src/app/layout.tsx)
- Expanded global `metadata` with `keywords`, `authors`, `creator`, `publisher`, Open Graph images/locale, Twitter fields, and `category`.
- Injected global JSON-LD into the document `<head>` via `<GlobalJsonLd />`.
🤖 Robots rules (docs/src/app/robots.ts)
- Switched from a single rule object to an array of bot-specific `rules`.
- Added `host` and kept `sitemap` pointing to `${baseUrl}/sitemap.xml`.
✅ React SDK test adjustment (react-sdk/src/v1/hooks/use-tambo-v1-thread-list.test.tsx)
- Updated the test to pass `limit` as a string (`"10"`) and assert the API receives the same.
```ts
const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const filePath = join(process.cwd(), "content", "docs", contentPath);
    const lastModified = statSync(filePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};
```
getDocPageLastModified(page.path) relies on page.path being a stable, safe relative path under content/docs. If page.path ever contains .. segments or a leading slash (even accidentally via content tooling), join(process.cwd(), "content", "docs", contentPath) could escape the intended directory and stat arbitrary files on the server. Even if today source guarantees safety, this is a fragile security boundary and worth hardening at the call site since it touches the filesystem.
Suggestion
Harden the path resolution to ensure the computed file path stays within the docs content root.
For example:
```ts
import { resolve, relative, sep } from "node:path";

const docsRoot = resolve(process.cwd(), "content", "docs");

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;
  try {
    const candidate = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidate);
    if (rel.startsWith("..") || rel.startsWith(sep)) return undefined;
    const lastModified = statSync(candidate).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return undefined;
  }
};
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this hardening.
```ts
const siteUrl = process.env.NEXT_PUBLIC_SITE_URL || "https://docs.tambo.co";

export const runtime = "nodejs";

// Best-effort cache (per Node.js process) to avoid re-statting the same MDX file.
// This is used only for structured data metadata and doesn't need to be real-time.
const docLastModifiedCache = new Map<string, string>();

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const filePath = join(process.cwd(), "content", "docs", contentPath);
    const lastModified = statSync(filePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};
```
runtime = "nodejs" forces this route onto the Node runtime. That may be intentional for statSync, but it’s a meaningful platform constraint for a docs page: it prevents running on the Edge runtime and can affect deployment strategies.
Also, getDocPageLastModified(page.path) implicitly assumes page.path maps 1:1 to a file under content/docs/. If page.path ever changes format (or includes nested segments with unexpected values), you’ll silently omit dateModified—which is okay for correctness, but can make metadata flaky across environments without visibility.
Suggestion
If Node runtime is required, document it explicitly in a short comment near export const runtime = "nodejs" (e.g., “required for filesystem mtime used by JSON-LD”). If you’d prefer to avoid pinning runtime, consider sourcing dateModified from page.data/frontmatter (or the same route enumeration logic used for the sitemap) so the page can remain runtime-agnostic.
Reply with "@CharlieHelps yes please" if you’d like me to add a commit that derives dateModified from the existing sitemap enumeration (removing the need for statSync in the page).
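For reference, a runtime-agnostic alternative would read `dateModified` from authored frontmatter instead of the filesystem; a minimal sketch (the `lastModified` frontmatter field and the data shape are hypothetical, not the actual content-layer API):

```typescript
// Sketch: prefer an explicit frontmatter field over filesystem mtime, so the
// page doesn't need the Node runtime. The `lastModified` field and DocPageData
// shape are hypothetical stand-ins for the real content layer.
interface DocPageData {
  title: string;
  lastModified?: string; // ISO 8601, authored in frontmatter
}

const toDateModified = (data: DocPageData): string | undefined => {
  if (!data.lastModified) return undefined; // omit rather than fake freshness
  const parsed = new Date(data.lastModified);
  return Number.isNaN(parsed.getTime()) ? undefined : parsed.toISOString();
};

console.log(toDateModified({ title: "Quickstart", lastModified: "2024-05-01" }));
// "2024-05-01T00:00:00.000Z"
```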
```js
const excludedSitemapExactPathList = [
  "/llms.txt",
  "/llms-full.txt",
  "/robots.txt",
  "/llms.mdx",
];

const excludedSitemapGlobPatterns = ["/_next/*", "/api/*", "/llms.mdx/*"];

const excludedSitemapGlobs = [
  ...excludedSitemapGlobPatterns,
  ...excludedSitemapExactPathList,
];

const excludedSitemapExactPaths = new Set(excludedSitemapExactPathList);

const isExcludedSitemapPath = (path) =>
  excludedSitemapExactPaths.has(path) || path.startsWith("/llms.mdx/");

// Enumerate routes once at config load for deterministic, build-time-only sitemap generation
const enumerated = enumerateRoutes();
const enumeratedByUrl = new Map(enumerated.map((e) => [e.url, e.lastmod]));

module.exports = {
  siteUrl,
  generateRobotsTxt: false,
  // Exclude non-page routes and duplicates
  exclude: excludedSitemapGlobs,
  changefreq: "weekly",
  priority: 0.8,
  transform: async (_config, path) => {
    // Skip paths that shouldn't be in sitemap
    if (isExcludedSitemapPath(path)) {
      return null;
    }
    const lastmod = enumeratedByUrl.get(path);
    return {
      loc: `${siteUrl}${path}`,
      changefreq: "weekly",
      priority: path === "/" ? 1.0 : 0.8,
      ...(lastmod != null ? { lastmod } : {}),
      alternateRefs: [],
    };
  },
  // explicitly include all enumerated paths for determinism
  additionalPaths: async () =>
    enumerated
      .filter((e) => !isExcludedSitemapPath(e.url))
      .map((e) => ({
        loc: `${siteUrl}${e.url}`,
        ...(e.lastmod != null ? { lastmod: e.lastmod } : {}),
      })),
  // include ensures index and known root routes are emitted
  include: enumerated
    .filter((e) => !isExcludedSitemapPath(e.url))
    .map((e) => e.url),
};
```
The isExcludedSitemapPath helper only checks exact matches plus path.startsWith("/llms.mdx/"), but your exclude list also contains glob patterns like "/_next/*" and "/api/*". Those are likely handled by the sitemap tool’s exclude, but note that additionalPaths is now the main path emitter and it only filters via isExcludedSitemapPath. If enumerateRoutes() ever adds /_next/... or /api/... entries (by mistake or via future routing changes), they could still leak into the sitemap via additionalPaths because isExcludedSitemapPath doesn’t mirror the glob exclusions.
Suggestion
Make isExcludedSitemapPath cover all exclusions that matter for additionalPaths/include (at least the obvious prefixes), or reuse the same source of truth.
For example:
```js
const isExcludedSitemapPath = (path) =>
  excludedSitemapExactPaths.has(path) ||
  path.startsWith("/llms.mdx/") ||
  path.startsWith("/_next/") ||
  path.startsWith("/api/");
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this alignment.
@CharlieHelps can you fix these issues, but first rebase on main, then make sure all the tests, lint, and builds work before committing each time.

aa16e2b to 5736b52
Rebased. Latest fixes are pushed.

Changes
Verification

```shell
$ npm ci
$ npm run lint -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# turbo: 4 tasks successful (warnings only)
$ npm run check-types -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# turbo: 7 tasks successful
$ npm run test -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# @tambo-ai/react: 66 test suites passed (1042 tests)
# @tambo-ai/react-ui-base: 7 test suites passed (118 tests)
# @tambo-ai/ui-registry: 8 test suites passed (113 tests)
$ npm run build -- --filter=@tambo-ai/docs --filter=@tambo-ai/react
# turbo: 4 tasks successful
```
Main remaining concerns are (1) JSON-LD script-breakout risk from embedding JSON.stringify output directly into dangerouslySetInnerHTML without escaping </script> sequences, and (2) potential runtime performance impact from statSync on requests plus an unbounded cache in the doc page. The sitemap exclusions are significantly improved, but the exclusion contract is still split between exclude globs and isExcludedSitemapPath, which is prone to drift. Addressing these will make the SEO/GEO changes safer and more robust under load.
Additional notes (2)
- Performance | `docs/src/app/(docs)/[[...slug]]/page.tsx:27-34`
  The per-process `docLastModifiedCache` is unbounded. In long-lived Node processes (or if a lot of doc paths are crawled), it can grow without limit. Even if the docs set is finite, it's safer to enforce a reasonable cap or cache only known params (e.g., from `generateStaticParams`).
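A minimal sketch of a size-capped variant (the cap value and FIFO eviction policy are illustrative choices, not a recommendation from the review):

```typescript
// Sketch: cap the per-process cache so it can't grow without bound. A Map
// iterates in insertion order, so deleting the first key gives cheap FIFO
// eviction; the cap of 1024 is an arbitrary illustrative choice.
const MAX_CACHE_ENTRIES = 1024;
const docLastModifiedCache = new Map<string, string>();

function cacheSet(key: string, value: string): void {
  if (docLastModifiedCache.size >= MAX_CACHE_ENTRIES) {
    const oldestKey = docLastModifiedCache.keys().next().value;
    if (oldestKey !== undefined) docLastModifiedCache.delete(oldestKey);
  }
  docLastModifiedCache.set(key, value);
}

// Exercise the helper; the cache size can never exceed the cap.
for (let i = 0; i < 5; i++) cacheSet(`page-${i}`, `ts-${i}`);
console.log(docLastModifiedCache.size <= MAX_CACHE_ENTRIES); // true
```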
- Compatibility | `docs/src/app/layout.tsx:84`
  Injecting `<GlobalJsonLd />` inside a manually-declared `<head>` works, but in the Next.js App Router this can be fragile because Next manages `<head>` merging. It's easy to accidentally introduce duplicate head tags or unexpected ordering with other head contributors.
  Since JSON-LD is static and global, the more robust pattern is to render it as part of the layout return without explicitly adding a `<head>` element (or to use the metadata / `generateMetadata` pipeline where possible).
Summary of changes
🗺️ Sitemap generation hardening (docs/next-sitemap.config.js)
- Made `getLastModForFile()` return `undefined` (omit `lastmod`) when a stable timestamp can't be determined.
- Added centralized exclusion configuration via `excludedSitemapExactPathList`, `excludedSitemapGlobPatterns`, and `isExcludedSitemapPath()`.
- Improved determinism/performance by precomputing `enumeratedByUrl: Map` and using conditional spreads (`...(lastmod != null ? { lastmod } : {})`).
- Ensured `additionalPaths` and `include` filter out excluded routes so they can't be reintroduced.
📄 Per-doc JSON-LD + metadata (docs/src/app/(docs)/[[...slug]]/page.tsx)
- Added per-page `TechArticle` JSON-LD via `createDocPageSchema()` and `<PageJsonLd />`.
- Added per-page `alternates.canonical`, Open Graph, and Twitter card metadata using absolute URLs (`pageUrl`, `logoUrl`).
- Implemented `dateModified` derivation using filesystem `mtime` with a per-process cache and path traversal hardening (`resolve` + `relative`).
- Explicitly pinned the route to `export const runtime = "nodejs"` to allow `node:fs` usage.
🌐 Global SEO metadata + global JSON-LD (docs/src/app/layout.tsx)
- Expanded global `metadata` with `keywords`, `authors`, `creator`, `publisher`, `category`, and richer Open Graph/Twitter settings.
- Added `logoUrl` and set global Open Graph/Twitter images to absolute URLs.
- Injected `<GlobalJsonLd />` into `<head>`.
🤖 Robots rules (docs/src/app/robots.ts)
- Changed `rules` from a single rule to an array of bot-specific rules (GPTBot, ClaudeBot, PerplexityBot, etc.).
- Added `host` and kept `sitemap` pointing to `${baseUrl}/sitemap.xml`.
🧩 New JSON-LD component (docs/src/components/json-ld.tsx)
- Added a server-rendered inline `<script type="application/ld+json">` renderer (`JsonLd`) and a typed `JsonLdSchema`.
- Added global schemas (`Organization`, `SoftwareApplication`, `WebSite`) plus helpers for `TechArticle`, `FAQPage`, and `HowTo`.
- Made `dateModified` optional in `TechArticle` (omitted when unknown) to avoid fake freshness.
```ts
import { statSync } from "node:fs";
import { relative, resolve, sep } from "node:path";
import { notFound } from "next/navigation";
import { Suspense } from "react";

const siteUrl = process.env.NEXT_PUBLIC_SITE_URL || "https://docs.tambo.co";
const logoUrl = new URL("/logo/lockup/Tambo-Lockup.png", siteUrl).toString();

const docsRoot = resolve(process.cwd(), "content", "docs");

// Requires Node.js runtime for filesystem `mtime` reads for JSON-LD `dateModified`.
export const runtime = "nodejs";

// Best-effort cache (per Node.js process) to avoid re-statting the same MDX file.
// This is used only for structured data metadata and doesn't need to be real-time.
const docLastModifiedCache = new Map<string, string>();

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel.startsWith(sep)) {
      return;
    }

    const lastModified = statSync(candidatePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};
```
getDocPageLastModified() correctly hardens against path traversal using resolve + relative, but it still uses statSync, which blocks the Node.js event loop for every uncached page path. This route is a docs page that can be hit frequently; even with the per-process cache, cold starts and cache misses can stack up under load.
Since this value is only used for metadata/JSON-LD, it doesn’t need to be synchronous.
Suggestion
Switch to async fs.promises.stat (or stat from node:fs/promises) and make getDocPageLastModified async to avoid blocking:
```ts
import { stat } from "node:fs/promises";

const getDocPageLastModified = async (contentPath: string) => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;
  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel.startsWith(sep)) return undefined;
    const lastModified = (await stat(candidatePath)).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return undefined;
  }
};
```

Then `dateModified: await getDocPageLastModified(page.path)`. Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this refactor.
```ts
const docsRoot = resolve(process.cwd(), "content", "docs");

// Requires Node.js runtime for filesystem `mtime` reads for JSON-LD `dateModified`.
export const runtime = "nodejs";

// Best-effort cache (per Node.js process) to avoid re-statting the same MDX file.
// This is used only for structured data metadata and doesn't need to be real-time.
const docLastModifiedCache = new Map<string, string>();

const getDocPageLastModified = (contentPath: string): string | undefined => {
  const cached = docLastModifiedCache.get(contentPath);
  if (cached) return cached;

  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel.startsWith(sep)) {
      return;
    }

    const lastModified = statSync(candidatePath).mtime.toISOString();
    docLastModifiedCache.set(contentPath, lastModified);
    return lastModified;
  } catch {
    return;
  }
};
```
getDocPageLastModified() correctly uses resolve + relative to keep reads under content/docs, but the guard rel.startsWith(sep) is ineffective on most platforms because relative() typically returns a path without a leading separator. The important check is the .. traversal; the separator check doesn’t add real protection and could give a false sense of safety.
Also, you’re using the raw contentPath as the cache key. If the content layer ever varies path normalization (e.g., a/../b.mdx vs b.mdx), you could cache duplicate entries for the same file (minor, but avoidable).
Suggestion
Tighten the traversal guard and normalize the cache key to the resolved candidate path:
```ts
const getDocPageLastModified = (contentPath: string): string | undefined => {
  try {
    const candidatePath = resolve(docsRoot, contentPath);
    const rel = relative(docsRoot, candidatePath);
    if (rel.startsWith("..") || rel === "") return undefined; // keep or adjust as desired
    const cached = docLastModifiedCache.get(candidatePath);
    if (cached) return cached;
    const lastModified = statSync(candidatePath).mtime.toISOString();
    docLastModifiedCache.set(candidatePath, lastModified);
    return lastModified;
  } catch {
    return undefined;
  }
};
```

If you want a stricter guard, also reject an absolute `contentPath` early via `path.isAbsolute(contentPath)`. Reply with "@CharlieHelps yes please" if you'd like me to add a commit with these changes.
```tsx
export function JsonLd({ id, schema }: JsonLdProps) {
  return (
    <script
      id={id}
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(schema) }}
    />
  );
}
```
JsonLd renders JSON.stringify(schema) directly into dangerouslySetInnerHTML. While JSON-LD is intended to be JSON, it’s still possible for schema values (especially createFAQSchema answers or other future content-derived fields) to contain </script> sequences, which can prematurely terminate the script tag and enable HTML injection.
This is a known gotcha when embedding JSON in <script> tags.
Suggestion
Harden the JSON serialization for safe embedding by escaping the </script sequence (and optionally <!--):
```tsx
const toJsonLd = (value: unknown) =>
  JSON.stringify(value).replace(/<\//g, "<\\/");

export function JsonLd({ id, schema }: JsonLdProps) {
  return (
    <script
      id={id}
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: toJsonLd(schema) }}
    />
  );
}
```

This keeps JSON valid while preventing script-breakout. Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this hardening.
```ts
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Default rule for all bots
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/"],
      },
      // Explicitly allow OpenAI's GPTBot (used for ChatGPT training and browsing)
      {
        userAgent: "GPTBot",
        allow: "/",
      },
      // Explicitly allow ChatGPT's browsing feature
      {
        userAgent: "ChatGPT-User",
        allow: "/",
      },
      // Explicitly allow Perplexity AI
      {
        userAgent: "PerplexityBot",
        allow: "/",
      },
      // Explicitly allow Claude/Anthropic
      {
        userAgent: "ClaudeBot",
        allow: "/",
      },
      {
        userAgent: "anthropic-ai",
        allow: "/",
      },
      // Explicitly allow Google's AI bot
      {
        userAgent: "Google-Extended",
        allow: "/",
      },
      // Explicitly allow Bing/Microsoft Copilot
      {
        userAgent: "Bingbot",
        allow: "/",
      },
      // Explicitly allow Meta AI
      {
        userAgent: "Meta-ExternalAgent",
        allow: "/",
      },
      // Explicitly allow Cohere
      {
        userAgent: "cohere-ai",
        allow: "/",
      },
    ],
```
In robots.ts, you’re explicitly allowing a set of AI crawlers, but you’re not applying the same disallow: ["/api/"] restriction to them that you apply to "*". Depending on the robots parser, the more specific user-agent blocks may override the generic one and therefore allow crawling of /api/ for those bots.
If the intent is to keep /api/ blocked for everyone, each explicit rule should include the same disallow list (or you should avoid adding bot-specific blocks unless you’re changing behavior).
Suggestion
Add disallow: ["/api/"] to each bot-specific rule (or remove bot-specific rules entirely if they aren’t intended to differ from the default).
Example:
```ts
const apiDisallow = ["/api/"];

{
  userAgent: "GPTBot",
  allow: "/",
  disallow: apiDisallow,
}
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit applying this consistently across the rules.
Summary
Why
GEO (Generative Engine Optimization) improves visibility in AI search engines like ChatGPT, Perplexity, and Claude.
🤖 Generated with Claude Code