Support PDF inputs for BYOK endpoints #323836
Open
AntonioLujanoLuna wants to merge 5 commits into
Pull request overview
Adds first-class PDF file-input capability signaling and end-to-end PDF payload preservation for BYOK models (Responses/Messages APIs), including conversions to Anthropic document blocks and coverage across capability propagation, endpoint gating, and prompt rendering.
Changes:
- Introduces `fileInputMimeTypes` as a proposed language model capability and wires it through VS Code model metadata and Copilot endpoint abstractions.
- Preserves PDF payloads across raw↔VS Code message conversions and converts PDFs into Anthropic `document` blocks (including in tool results).
- Adds regression tests for PDF capability propagation, prompt rendering, conversion correctness, and disabling for Chat Completions.
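The PDF-to-`document` conversion listed above can be sketched as follows. This is a minimal illustration, not the PR's converter: `DataPart`, `toBase64`, and `pdfPartToDocumentBlock` are hypothetical names, and `DataPart` only stands in for a VS Code data part that carries a MIME type and raw bytes. The output shape follows Anthropic's base64 `document` content block.

```typescript
// Hypothetical stand-in for a data part carrying a MIME type and raw bytes.
interface DataPart {
  mimeType: string;
  data: Uint8Array;
}

// Anthropic base64 document content block.
interface DocumentBlock {
  type: 'document';
  source: { type: 'base64'; media_type: 'application/pdf'; data: string };
}

const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Small dependency-free base64 encoder so the sketch runs without Node or DOM typings.
function toBase64(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += has1 ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '=';
    out += has2 ? B64[b2 & 63] : '=';
  }
  return out;
}

// Returns a document block for PDF parts and undefined for anything else.
function pdfPartToDocumentBlock(part: DataPart): DocumentBlock | undefined {
  if (part.mimeType !== 'application/pdf') {
    return undefined;
  }
  return {
    type: 'document',
    source: { type: 'base64', media_type: 'application/pdf', data: toBase64(part.data) },
  };
}
```

Returning `undefined` for non-PDF parts leaves the caller free to route images and text through their existing paths.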
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/vscode-dts/vscode.proposed.languageModelCapabilities.d.ts | Adds proposed fileInputMimeTypes on language model capabilities surface. |
| src/vscode-dts/vscode.proposed.chatProvider.d.ts | Adds proposed fileInputMimeTypes to LanguageModelChatCapabilities for providers. |
| src/vs/workbench/contrib/chat/common/languageModels.ts | Extends workbench-side chat model metadata to carry fileInputMimeTypes. |
| src/vs/workbench/api/common/extHostLanguageModels.ts | Forwards fileInputMimeTypes between extension host and main thread model representations. |
| extensions/copilot/src/platform/networking/common/networking.ts | Adds optional fileInputMimeTypes to Copilot endpoint interface. |
| extensions/copilot/src/platform/endpoint/vscode-node/test/extChatEndpoint.spec.ts | Tests raw→VS Code conversion for PDF document parts. |
| extensions/copilot/src/platform/endpoint/vscode-node/extChatEndpoint.ts | Converts raw PDF document parts into LanguageModelDataPart and exposes endpoint capability. |
| extensions/copilot/src/platform/endpoint/test/node/chatModelCapabilities.spec.ts | Tests precedence rules for explicit PDF support vs legacy family/vision heuristics. |
| extensions/copilot/src/platform/endpoint/common/chatModelCapabilities.ts | Implements modelSupportsPDFDocuments using explicit MIME types, with legacy fallback. |
| extensions/copilot/src/extension/prompts/node/panel/test/fileVariable.spec.ts | Adds coverage ensuring PDF rendering works for explicitly configured custom models. |
| extensions/copilot/src/extension/prompts/node/panel/fileVariable.tsx | Switches PDF gating to rely on modelSupportsPDFDocuments rather than supportsVision directly. |
| extensions/copilot/src/extension/conversation/vscode-node/test/languageModelAccess.test.ts | Tests prompt rendering preserves PDF parts into raw document content parts. |
| extensions/copilot/src/extension/conversation/vscode-node/languageModelAccessPrompt.tsx | Renders PDF LanguageModelDataPart as prompt-tsx <Document> blocks. |
| extensions/copilot/src/extension/byok/vscode-node/test/customEndpointProvider.spec.ts | Tests file-input advertisement behavior across API types (Responses/Messages vs Chat Completions). |
| extensions/copilot/src/extension/byok/vscode-node/test/byokModelInfo.spec.ts | Tests BYOK model info includes configured fileInputMimeTypes. |
| extensions/copilot/src/extension/byok/vscode-node/customEndpointProvider.ts | Gates configured file inputs based on inferred/declared API type. |
| extensions/copilot/src/extension/byok/vscode-node/anthropicProvider.ts | Advertises PDF input support for Anthropic BYOK models. |
| extensions/copilot/src/extension/byok/common/test/anthropicMessageConverter.spec.ts | Adds test for PDF data part → Anthropic document block conversion. |
| extensions/copilot/src/extension/byok/common/byokProvider.ts | Extends BYOK capabilities/model-info mapping to include fileInputMimeTypes. |
| extensions/copilot/src/extension/byok/common/anthropicMessageConverter.ts | Converts PDF parts to Anthropic document blocks (including inside tool results). |
| extensions/copilot/package.json | Adds fileInputMimeTypes to Custom Endpoint model configuration schema (PDF only). |
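The precedence described for `chatModelCapabilities.ts` (explicit MIME types first, legacy heuristics as fallback) might look roughly like the sketch below. Everything here is assumed for illustration: the `ModelCaps` shape, the `supportsPdf` name, and the placeholder family list are hypothetical, and the actual `modelSupportsPDFDocuments` may differ.

```typescript
// Hypothetical capability shape; field names are assumptions for this sketch.
interface ModelCaps {
  fileInputMimeTypes?: readonly string[];
  supportsVision?: boolean;
  family?: string;
}

// Placeholder family list; the real legacy heuristic is not shown in this PR page.
const LEGACY_PDF_FAMILIES = new Set<string>(['example-family']);

function legacyPdfHeuristic(caps: ModelCaps): boolean {
  return caps.supportsVision === true && LEGACY_PDF_FAMILIES.has(caps.family ?? '');
}

// An explicit declaration wins, even when it is an empty list; otherwise fall back.
function supportsPdf(caps: ModelCaps): boolean {
  if (caps.fileInputMimeTypes !== undefined) {
    return caps.fileInputMimeTypes.includes('application/pdf');
  }
  return legacyPdfHeuristic(caps);
}
```

Checking `!== undefined` rather than truthiness keeps an explicit empty list meaningful: it says "no file inputs" instead of deferring to the heuristic.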
Comments suppressed due to low confidence (1)
src/vs/workbench/api/common/extHostLanguageModels.ts:253
`fileInputMimeTypes` is forwarded only when truthy, so an explicit empty array is lost. That prevents providers from explicitly advertising “no supported file inputs” and can break endpoint-type gating where an empty list is meaningful.
configurationSchema: m.configurationSchema as IJSONSchema | undefined,
capabilities: m.capabilities ? {
vision: m.capabilities.imageInput,
...(m.capabilities.fileInputMimeTypes ? { fileInputMimeTypes: m.capabilities.fileInputMimeTypes } : {}),
editTools: m.capabilities.editTools,
toolCalling: !!m.capabilities.toolCalling,
agentMode: !!m.capabilities.toolCalling
} : undefined,
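The conditional-spread pattern in this snippet can be checked in isolation. The sketch below uses a hypothetical `Caps` type and two hypothetical helpers, none of which come from the PR. Note that an empty array is truthy in JavaScript, so the truthiness check keeps `[]` and omits the key only for `undefined`; an explicit `!== undefined` check makes that intent easier to read.

```typescript
// Hypothetical capability shape for this sketch only.
interface Caps {
  fileInputMimeTypes?: string[];
}

// Mirrors the truthiness-based conditional spread from the snippet.
function forwardTruthy(caps: Caps): Caps {
  return {
    ...(caps.fileInputMimeTypes ? { fileInputMimeTypes: caps.fileInputMimeTypes } : {}),
  };
}

// Same behaviour for arrays, with the intent spelled out.
function forwardDefined(caps: Caps): Caps {
  return {
    ...(caps.fileInputMimeTypes !== undefined ? { fileInputMimeTypes: caps.fileInputMimeTypes } : {}),
  };
}

// True when the forwarded object still carries the key.
function hasKey(caps: Caps): boolean {
  return 'fileInputMimeTypes' in caps;
}
```

With either helper, `hasKey` is true for `{ fileInputMimeTypes: [] }` and false for `{}`.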
Comment on lines 165 to 170
capabilities: {
toolCalling: capabilities.toolCalling,
imageInput: capabilities.vision,
...(capabilities.fileInputMimeTypes ? { fileInputMimeTypes: capabilities.fileInputMimeTypes } : {}),
editTools: capabilities.editTools,
},
Comment on lines 462 to 467
capabilities: {
supportsImageToText: model.metadata.capabilities?.vision ?? false,
...(model.metadata.capabilities?.fileInputMimeTypes ? { fileInputMimeTypes: model.metadata.capabilities.fileInputMimeTypes } : {}),
supportsToolCalling: !!model.metadata.capabilities?.toolCalling,
editToolsHint: model.metadata.capabilities?.editTools,
},
Author
@microsoft-github-policy-service agree
Summary
Adds explicit PDF input support for BYOK models using the Responses or
Messages APIs.
Previously, PDF attachments could be represented in chat prompts but were
lost while converting between raw prompt messages and VS Code language-model
messages. Custom models also had no way to advertise PDF support independently
of vision support.
This change:
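- Adds `fileInputMimeTypes` to the proposed language-model capabilities and Custom Endpoint model configuration.
- Supports `application/pdf` for configured Responses and Messages endpoints.
- Preserves PDF payloads during raw↔VS Code message conversion.
- Converts PDFs to Anthropic `document` blocks, including tool-result content.
- Adds tests for capability propagation, prompt rendering, and endpoint-type gating.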
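The endpoint-type gating can be illustrated with a small sketch. This is an assumption-laden outline rather than the PR's code: `gateFileInputs` is a hypothetical helper, and the `'chatCompletions'` value is a placeholder for whatever identifier the Chat Completions API type actually uses. Only `responses` and `messages` endpoints advertise file inputs, and only `application/pdf` is accepted.

```typescript
// 'chatCompletions' is a placeholder name; 'responses' and 'messages' appear in the PR text.
type ApiType = 'responses' | 'messages' | 'chatCompletions';

const SUPPORTED_FILE_INPUTS: readonly string[] = ['application/pdf'];

// Returns the MIME types an endpoint may advertise, or undefined when none apply.
function gateFileInputs(
  apiType: ApiType,
  configured: readonly string[] | undefined
): string[] | undefined {
  // Chat Completions endpoints never advertise file inputs.
  if (apiType !== 'responses' && apiType !== 'messages') {
    return undefined;
  }
  // Keep only the MIME types this sketch treats as supported.
  const allowed = (configured ?? []).filter(mime => SUPPORTED_FILE_INPUTS.includes(mime));
  return allowed.length > 0 ? allowed : undefined;
}
```

For example, `gateFileInputs('messages', ['application/pdf'])` yields `['application/pdf']`, while the same configuration on a Chat Completions endpoint yields `undefined`.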
How to test
Configure a BYOK Custom Endpoint model with
`"apiType": "responses"` or `"apiType": "messages"` and: