Python + AI Weekly Office Hours: Recordings & Resources #280
Replies: 133 comments 3 replies
2026/01/06: How do you set up Entra OBO (On-Behalf-Of) flow for Python MCP servers? 📹 5:48 The demo showed how to use the Graph API with the OBO flow to find out the groups of a signed-in user and use that to decide whether to allow access to a particular tool. The flow works as follows:
For the authentication dance, FastMCP handles the DCR (Dynamic Client Registration) flow since Entra itself doesn't support DCR natively. To test from scratch:
Links shared: |
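The group-based gating from the demo can be sketched in plain Python. This is a hypothetical sketch: the group IDs and tool names are made up, and in a real server the payload would come from a Graph API `/me/memberOf` call made with the token acquired via the OBO flow.

```python
# Hypothetical sketch: gate MCP tools on the signed-in user's Entra group
# membership. Group IDs and tool names are illustrative placeholders.

TOOL_ALLOWED_GROUPS = {
    # tool name -> Entra group object IDs allowed to call it
    "query_sales_data": {"grp-analysts", "grp-admins"},
    "echo": set(),  # empty set = open to any authenticated user
}

def extract_group_ids(member_of_payload: dict) -> set:
    """Pull group object IDs out of a Graph /me/memberOf-style payload."""
    return {
        obj["id"]
        for obj in member_of_payload.get("value", [])
        if obj.get("@odata.type") == "#microsoft.graph.group"
    }

def is_tool_allowed(tool: str, user_groups: set) -> bool:
    """Deny unknown tools; allow open tools; otherwise require a shared group."""
    allowed = TOOL_ALLOWED_GROUPS.get(tool)
    if allowed is None:
        return False
    if not allowed:
        return True
    return bool(allowed & user_groups)

payload = {"value": [
    {"@odata.type": "#microsoft.graph.group", "id": "grp-analysts"},
    {"@odata.type": "#microsoft.graph.directoryRole", "id": "role-x"},
]}
user_groups = extract_group_ids(payload)
```

Note that `memberOf` also returns directory roles, so the sketch filters on the `@odata.type` discriminator to keep only groups.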
2026/01/06: Which MCP inspector should I use for testing servers with Entra authentication? 📹 20:24 The standard MCP Inspector doesn't work well with Entra authentication because it doesn't do the DCR (Dynamic Client Registration) dance properly. MCP Jam is recommended instead because it properly handles the OAuth flow with DCR. To set it up:
MCP Jam also has nice features like:
One note: enum values in tools don't yet show as dropdowns in MCP Jam (issue to be filed). Links shared:

What's the difference between MCP Jam and LM Studio? 📹 34:19 LM Studio is primarily for playing around with LLMs locally. MCP Jam has some overlap since it includes a chat interface with access to models, but its main purpose is to help you develop MCP servers and apps. It's focused on the development workflow rather than just chatting with models.
2026/01/06: How do you track LLM usage tokens and costs? 📹 28:04 For basic tracking, Azure portal shows metrics for token usage in your OpenAI accounts. You can see input tokens and output tokens in the metrics section. You can also:
If you use multiple providers, you need a way to consolidate the tracking. OpenTelemetry metrics could work but you'd need a way to hook into each system. |
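One lightweight way to consolidate usage across providers is to record the token counts from each response and price them per model. A minimal sketch; the model names and per-1K prices are made-up illustrations, since real list prices change and differ per deployment:

```python
# Consolidated token/cost tracking sketch. Prices are illustrative only.
PRICES_PER_1K = {  # model -> (input USD per 1K tokens, output USD per 1K tokens)
    "gpt-4.1-mini": (0.0004, 0.0016),
    "claude-haiku": (0.0008, 0.004),
}

class UsageTracker:
    def __init__(self):
        self.totals = {}  # model -> [input_tokens, output_tokens]

    def record(self, model, input_tokens, output_tokens):
        """Add one response's usage numbers to the running totals."""
        t = self.totals.setdefault(model, [0, 0])
        t[0] += input_tokens
        t[1] += output_tokens

    def cost(self):
        """Price the accumulated totals across all models."""
        total = 0.0
        for model, (inp, out) in self.totals.items():
            pi, po = PRICES_PER_1K[model]
            total += inp / 1000 * pi + out / 1000 * po
        return total

tracker = UsageTracker()
tracker.record("gpt-4.1-mini", 1000, 500)
tracker.record("claude-haiku", 2000, 1000)
```

The `record` call is the piece you would hook into each provider's client (or into OpenTelemetry metrics, as mentioned above).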
2026/01/06: How do you keep yourself updated with all the new changes related to AI? 📹 30:32 Several sources recommended:
Particularly recommended:
Links shared: |
2026/01/06: How do you build a Microsoft Copilot agent in Python with custom API calls? 📹 36:30 For building agents that work with Microsoft 365 Copilot (which appears in Windows Copilot and other Microsoft surfaces):
The agent framework team is responsive if there are issues. Links shared: |
2026/01/06: As a backend developer with a non-CS background, how do I learn about AI from scratch? 📹 46:39 Recommended approach:
Links shared: |
2026/01/06: What's new with the RAG demo (azure-search-openai-demo) after the SharePoint data source was added? 📹 49:50 The main work is around improving ACL (Access Control List) support. The cloud ingestion feature was added recently, but it doesn't yet support ACLs. The team is working on making ACLs compatible with all features including:
A future feature idea: adding an MCP server to the RAG repo for internal documentation use cases, leveraging the Entra OBO flow for access control. |
2026/01/06: Do you think companies will create internal MCP servers for AI apps to connect to? 📹 53:53 Yes, this is already happening quite a bit. Common use cases include:
A particularly valuable use case is data science/engineering teams creating MCP servers that enable less technical folks (marketing, PMs, bizdev) to pull data safely without needing to write SQL. The pattern often starts with an engineer building an MCP server for themselves, sharing it with colleagues, adding features based on their needs, and growing from there. Links shared: |
2026/01/13: What advantages do other formats have over .txt for prompts? How do you improve prompts with DSPy and evals? 📹 4:55 Prompty is a template format that mixes Jinja and YAML together. The YAML goes at the top for metadata, and the rest is Jinja templating. Jinja is the most common templating system for Python (used by Flask, etc.). The nice thing about Jinja is you can pass in template variables—useful for customization, passing in citations, etc. Prompty turns the file into a Python list of chat messages with roles and contents. However, we're moving from Prompty to plain Jinja files because:
Recommendation: Keep prompts separate from code when possible, especially long system prompts. Use plain .txt or .md if you don't need variables, or Jinja if you want to render variables. With agents and tools, some LLM-facing text (like tool descriptions in docstrings) will inevitably live in your code—that's fine. For iterating on prompts: Run evaluations, change the prompt, and see whether it improves things. There are tools like DSPy and Agent Framework's Lightning that do automated prompt optimization/fine-tuning. Lightning says it "fine-tunes agents" but may actually be doing prompt changes. Most of the time, prompt changes don't make a huge difference, but sometimes they might. Links shared: |
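The "separate prompt file plus rendered variables" pattern can be sketched with just the standard library. `string.Template` stands in for Jinja here (Jinja adds loops, conditionals, and filters on top of this kind of substitution); the prompt text and message shape are illustrative:

```python
# Sketch: render template variables (citations, question) into the chat-message
# list an LLM client expects. In practice the template would live in its own
# .txt/.jinja file rather than inline.
from string import Template

PROMPT_TEMPLATE = Template(
    "Answer using only these sources:\n$citations\n\nQuestion: $question"
)

def build_messages(question: str, citations: list) -> list:
    """Substitute variables into the prompt and wrap it as chat messages."""
    user_content = PROMPT_TEMPLATE.substitute(
        citations="\n".join(f"- {c}" for c in citations),
        question=question,
    )
    return [
        {"role": "system", "content": "You are a helpful RAG assistant."},
        {"role": "user", "content": user_content},
    ]

messages = build_messages("What is the refund policy?", ["policy.md#refunds"])
```

Swapping in Jinja means loading the template with `jinja2.Template` and calling `.render(...)` instead of `.substitute(...)`; the surrounding message-building code stays the same.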
2026/01/13: What is the future of AI and which specialization should I pursue? 📹 11:54 If you enjoy software engineering and full-stack engineering, it's more about understanding the models so you understand why they do what they do, but it's really about how you're building on top of those models. There's lots of interesting stuff to learn, and it really depends on you and what you're most interested in doing. |
2026/01/13: Which livestream series should I follow to build a project using several tools and agents, and should I use a framework? 📹 13:33 Everyone should understand tool calling before moving on to agents. From the original 9-part Python + AI series, start with tool calling, then watch the high-level agents overview. The upcoming six-part series in February will dive deeper into each topic, especially how to use Agent Framework. At the bare minimum, you should understand LLMs, tool calling, and agents. Then you can decide whether to do everything yourself with plain tool calling (using an LLM that supports it) or adopt an agent framework like LangChain or Agent Framework if you think it offers enough benefits for you. It's important to understand that agents are built on tool calling—it's the foundation of agents, and their success or failure comes down to how well LLMs can use tools. Links shared:
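To make "agents are based on tool calling" concrete, here is a minimal dispatch loop in plain Python. The model is stubbed out; in a real app the tool-call dict would come from the tool-call entries in an LLM response, but the dispatch step looks the same:

```python
# Minimal sketch of the tool-calling core that agents are built on.
import json

def get_weather(city: str) -> str:
    """A toy tool the 'model' can ask to invoke."""
    data = {"Seattle": "rainy", "Cairo": "sunny"}
    return data.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

def run_tool_call(call: dict) -> str:
    """Dispatch one model-issued tool call to the matching Python function."""
    fn = TOOLS[call["name"]]
    args = json.loads(call["arguments"])  # LLM APIs send arguments as JSON text
    return fn(**args)

# Pretend the model asked to call get_weather for Seattle:
result = run_tool_call({"name": "get_weather", "arguments": '{"city": "Seattle"}'})
```

An agent is essentially this loop run repeatedly: send messages, execute any requested tool calls, append the results, and ask the model again until it stops requesting tools.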
2026/01/13: How does Azure manage the context window? How do I maintain a long conversation with a small context window? 📹 15:21 There are three general approaches:
With today's large context windows (128K, 256K), it's often easier to just wait for an error and tell the user to start a new chat, or do summarization when the error occurs. This approach is most likely to work across models since every model should throw an error when you're over the context window. Links shared: |
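The trimming variant of these approaches can be sketched as follows. The `len(text) // 4` token estimate and the message shapes are rough assumptions; a real app would use a tokenizer (e.g. tiktoken) and the model's documented context limit:

```python
# Sketch: keep the system prompt plus the most recent turns that fit in budget.
def approx_tokens(message: dict) -> int:
    return len(message["content"]) // 4  # crude chars-to-tokens heuristic

def trim_to_fit(messages: list, max_tokens: int) -> list:
    """Drop the oldest non-system turns until the estimate fits."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - approx_tokens(system)
    kept = []
    for msg in reversed(rest):          # walk newest-first
        cost = approx_tokens(msg)
        if cost > budget:
            break                        # everything older is dropped too
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))

system = {"role": "system", "content": "a" * 40}   # ~10 tokens
turns = [{"role": "user", "content": "b" * 40} for _ in range(5)]
trimmed = trim_to_fit([system] + turns, max_tokens=35)
```

The same skeleton works for the summarize-on-error approach: catch the context-length error from the API, trim (or summarize) the history, and retry once.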
2026/01/13: How do we deal with context rot and how do we summarize context using progressive disclosure techniques? 📹 19:17 Read through Kelly Hong's (Chroma researcher) blog post on context rot. The key point is that even with a 1 million token context window, you don't have uniform performance across that context window. She does various tests to see when performance starts getting worse, including tests on ambiguity, distractors, and implications. A general tip for coding agents with long-running tasks: use a main agent that breaks the task into subtasks and spawns sub-agents for each one, where each sub-agent has its own focused context. This is the approach used by the LangChain Deep Agents repo. You can also look at how different projects implement summarization. LangChain's summarization middleware is open source—you can see their summary prompt and approach. They do approximate token counting and trigger summarization when 80% of the context is reached. Links shared:
How do I deal with context issues when using the Foundry SDK with a single agent? 📹 25:03 If you're using the Foundry SDK with a single agent (hosted agent), you can implement something like middleware through hooks or events. Another approach is the LangChain Deep Agents pattern: implement sub-agents as tools where each tool has a limited context and reports back a summary of its results to the main agent. For the summarization approach with Foundry agents, you'd need to figure out what events, hooks, or middleware systems they have available. |
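The 80%-threshold trigger described above can be sketched like this. `summarize()` is stubbed with a constant string here, whereas LangChain's middleware makes an LLM call with its summary prompt; the token heuristic is also approximate:

```python
# Sketch: replace older turns with a summary once 80% of the context is used.
def approx_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic, like approximate token counting

def maybe_summarize(messages, context_limit, keep_recent=2,
                    summarize=lambda msgs: "summary of earlier turns"):
    """No-op under the threshold; otherwise compact all but the recent turns."""
    used = sum(approx_tokens(m["content"]) for m in messages)
    if used < 0.8 * context_limit:
        return messages                       # under threshold: leave as-is
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent

history = [{"role": "user", "content": "x" * 40} for _ in range(5)]  # ~50 tokens
untouched = maybe_summarize(history, context_limit=100)  # 50 < 80: no-op
compacted = maybe_summarize(history, context_limit=60)   # 50 >= 48: summarize
```

Checking the threshold on every turn (rather than waiting for an error) is what makes this a middleware-style approach.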
2026/01/13: Have you seen or implemented anything related to AG-UI or A2UI? 📹 29:02 AG-UI (Agent User Interaction Protocol) is an open standard introduced by the CopilotKit team that standardizes how front-end applications communicate with AI agents. Both Pydantic AI and Microsoft Agent Framework support AG-UI—they provide adapters to convert messages to the AG-UI format. The advantage of standardization is that if people agree on a protocol between backend and frontend, you can build reusable front-end components that know how to talk to that backend. Agent Framework also supports other UI event stream protocols, including Vercel AI (though Vercel is a competitor, so support may be limited). These are adapters—you can always adapt output into another format if needed, but it's nice when it's built in. A2UI was created by Google together with CopilotKit and relates to A2A (Agent-to-Agent). A2UI appears to be newer, with less support currently in Agent Framework, though A2A is supported. Links shared:
2026/04/13: Are there good resources to dig deeper on AI agents deployment? 📹 52:41 For Foundry hosted agents specifically, Pamela recommends waiting about two weeks for the upcoming live stream series (Host your agents on Foundry, Apr 27-30), since the SDKs are actively being redone and things are changing rapidly. In the meantime, she recommends starting with the seattle-hotel-agent AZD example repo and the corresponding blog post about azd AI agent debugging. That's what she used as the basis for her own hosted agent, and what she had a colleague use to get started. If you run into issues, file them in the AZD repo. Links shared: |
2026/04/13: Announcements 📹 00:57
- Responses API migration: The azure-openai-to-responses migration agent is live. Pamela has now migrated pretty much every sample over to the Azure Responses API, including the large RAG sample. The Responses API enables easy access to built-in code interpreter and web search tools.
- Copilot CLI remote control from mobile: You can now monitor and steer a running Copilot CLI session from your phone.
- Copilot CLI multi-model reflection (rubber duck): The new rubber duck feature has Copilot CLI use a different model family to provide a second opinion and critique on plans and implementations.
- VS Code agent customizations: A new VS Code window shows all your agent customizations in one place — AGENTS.md files, custom instructions, skills, and more.
- Agent-first development video series: Gwen's introduction to agent-first development covers building apps with VS Code and Copilot using an agent-first approach.
- ParseBench document parsing benchmark: ParseBench is a new benchmark with 2,000+ human-verified pages and 167K test rules for evaluating document OCR quality across tables, charts, formatting, and more.
- DSPy meetup and talks: A recent DSPy meetup featured talks on reasoning models and the GEPA optimize_anything approach. Dropbox also presented on search relevance with DSPy.
- MCP conformance suite: A tool for testing whether your MCP server complies with the MCP specification.
- Review PR comments skill: Pamela built a review-pr-comments Copilot CLI skill that reviews comments on an active pull request and decides whether to accept, iterate, or reject the suggested changes.
- Personal projects:
- Anthropic — Project Glasswing: Discussed Project Glasswing, Anthropic's new initiative to secure critical software for the AI era.
2026/04/20: What's a good workflow for pulling entities out of PDFs? Is MarkItDown a good library? 📹 4:02 Pamela demonstrated several approaches for extracting data from PDFs, starting with MarkItDown — a Microsoft open-source library that converts documents (DOCX, PDF, etc.) to Markdown. She showed an entity extraction example where a Word document was converted to Markdown and then sent to an LLM to extract fields like title, author, and headings. She then compared MarkItDown vs. PyMuPDF. For documents with images, Pamela demonstrated MarkItDown's OCR plugin, which uses an LLM to describe images found in documents. The best results came from Azure Document Intelligence, which she demonstrated through her RAG application. Document Intelligence extracted far more figures and structural information from the PDF. Combined with an LLM for image descriptions (using a prompt like "describe the image with no more than five sentences"), this approach produced the richest output — including both text content and detailed figure descriptions that go beyond simple OCR text extraction. She also mentioned Azure Content Understanding as a newer alternative hosted service worth exploring, and noted that Pablo shared an Azure Content Understanding MCP server (built in .NET) for quick experimentation. Links shared:
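A hedged sketch of the "convert to Markdown, then extract fields" step. The demo sent the Markdown to an LLM for extraction; this stand-in pulls the title and headings with a regex instead, and the Markdown is inlined here rather than produced by MarkItDown:

```python
# Sketch: extract outline fields from Markdown (the kind of output a
# document-to-Markdown converter like MarkItDown produces).
import re

def extract_outline(markdown: str) -> dict:
    """Pull the title (first H1) and section headings out of Markdown text."""
    headings = re.findall(r"^(#{1,6})\s+(.*)$", markdown, flags=re.MULTILINE)
    title = next((text for level, text in headings if len(level) == 1), None)
    return {
        "title": title,
        "headings": [text for level, text in headings if len(level) > 1],
    }

doc = """# Quarterly Report
Some intro text.
## Revenue
## Outlook
"""
outline = extract_outline(doc)
```

The advantage of converting to Markdown first is exactly this: downstream extraction (regex or LLM) works on one predictable text format regardless of the source document type.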
How do you access PDFs stored in SharePoint? 📹 35:39 If you just want to ask questions about a document, you could use Work IQ (which can query SharePoint content). But if you need the full document — for example, to run your own extraction pipeline — you'll need to use the Microsoft Graph API. In the future, Work IQ may add more Graph API functionality, but currently it's limited for full document retrieval. |
2026/04/20: Bug report: Sporadic 400 errors from the Azure AI Search vectorization endpoint 📹 30:43 A community member reported intermittent 400 (Bad Request) errors when using text-embedding-3-small with Azure AI Search's integrated vectorization. Pamela first suggested checking RBAC permissions — specifically, the search service's managed identity needs the Cognitive Services OpenAI User role assigned to it. She showed this setup in her Bicep templates. However, since the error was intermittent (working sometimes, failing other times), Pamela suspected it might be a rate limit error that isn't surfacing clearly. She messaged the Azure AI Search PM directly and asked the community member to share their search service ID, subscription, and timestamp of the error so the team could check the logs. Links shared:
2026/04/20: Bug report: Authentication succeeds but tool calls fail with the Foundry Atlassian MCP server 📹 36:35 A community member reported that OAuth authentication succeeds for the Atlassian MCP server added from the Foundry catalog to a prompt agent, but tool calls fail. Pamela acknowledged there are known issues with remote MCP servers on Foundry, showing a similar internal server error she encountered. For debugging, she recommended:
The new version of Foundry hosted agents is expected to ship this week or the following week. |
2026/04/20: Any tips for the Vancouver Web Summit hackathon? 📹 42:25 A community member based in Vancouver mentioned they planned to submit to the Microsoft Vancouver Web Summit GitHub Copilot SDK Hackathon. Pamela suggested looking at the recently announced Agents League hackathon winners for inspiration on what judges look for. Links shared: |
2026/04/20: Announcements 📹 0:42
- GitHub Copilot pricing changes: New signups for GitHub Copilot Pro, Pro+, and Student plans are paused due to high demand. The free tier (with rate limits) is still available, and Business/Enterprise plans are unaffected. Additionally, Opus models have been removed from Pro — only Pro+ gets Opus 4.7.
- VS Code 1.116 updates: Copilot is now built into VS Code, Claude Opus 4.7 is GA in Copilot, thinking effort is configurable in Copilot CLI, and the Agent Host Protocol now supports subagents and teams.
- Foundry hosted agents livestream series: Starting April 27-30, covering Agent Framework agents on Foundry, LangChain/LangGraph agents on Foundry, and evaluation/safety.
- New Microsoft certifications: Two new AI certifications were shared — AI Agent Builder Associate (Copilot Studio focused) and Azure AI Apps and Agents Developer Associate (Python/Azure AI focused, with an 80% discount for the first 300 people before May 7th).
- PyCon US 2026: Pamela will be giving an MCP tutorial on Wednesday, a tutorial at the Edu Summit on Thursday, and a sponsored session on Thursday or Friday. The Microsoft booth will be open Thursday evening through Saturday.
- Upcoming events:
2026/04/28: Update: Do Foundry evaluations stay in your tenant? 📹 1:03 Pamela followed up on a question from the previous day's office hours. She confirmed with the Foundry evaluation team that you must bring your own storage container if you want evaluation data to stay in your tenant. This is a hard requirement, not just a recommendation. Links shared: |
2026/04/28: How could we use GraphRAG from Cosmos DB in a hosted agent for memory and knowledge? 📹 4:52 The term "graph RAG" gets used in different ways — the Microsoft Research GraphRAG project versus any RAG approach that does a graph query. The Cosmos DB Conf session covered the approach described in this Cazton blog post, which benchmarks four AI agent memory strategies (including an entity graph approach using Cosmos DB and OpenAI) across recall, token cost, and latency — the entity graph strategy achieved 100% recall. For integrating any Azure service (including Cosmos DB) into a hosted agent using keyless auth:
For the memory use case, implement a custom context provider for Agent Framework — context providers are called on every agent invocation to inject memory. Look at the existing Redis or Mem0 context provider implementations as a starting point and ask GitHub Copilot to adapt one for Cosmos DB. For the knowledge retrieval use case, implement a tool instead — tools are better for knowledge because you typically want the agent to decide when to query, whereas memory should always be checked. Links shared: |
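The context-provider-vs-tool distinction can be sketched in plain Python. `MemoryStore` is an in-memory stand-in for Cosmos DB, and the function signatures are illustrative, not Agent Framework's actual interfaces:

```python
# Sketch: memory as an always-injected context provider vs. knowledge as an
# on-demand tool. The store is an in-memory stand-in for Cosmos DB.

class MemoryStore:
    """Per-user facts, keyed by user ID."""
    def __init__(self):
        self._facts = {}

    def add(self, user_id, fact):
        self._facts.setdefault(user_id, []).append(fact)

    def all_facts(self, user_id):
        return list(self._facts.get(user_id, []))

    def search(self, user_id, query):
        return [f for f in self._facts.get(user_id, []) if query.lower() in f.lower()]

def memory_context(store, user_id):
    """Context-provider shape: runs on every invocation and always injects."""
    facts = store.all_facts(user_id)
    return "Known about user:\n" + "\n".join(facts) if facts else ""

def lookup_knowledge(store, user_id, query):
    """Tool shape: the agent decides when to call this, and with what query."""
    return store.search(user_id, query)

store = MemoryStore()
store.add("u1", "Prefers aisle seats")
```

The key design difference is visible in the signatures: the provider takes no query (it injects unconditionally on every turn), while the tool takes a query the agent chose to issue.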
2026/04/28: Which model is best for RAG-based chatbots? 📹 19:17 Avoid GPT-4o. The GPT-5.5 prompting guide recommends treating GPT-5.5 as an entirely new model family to tune for, not just an incremental upgrade — worth reading if you're planning to migrate. Links shared:
2026/04/28: How come I can't deploy the Mistral OCR model anymore? 📹 21:52 There is a known open issue where Mistral models are not showing up in the Foundry catalog — the Foundry team is actively working on it. This is not a deprecation. The model should reappear once the issue is resolved. Links shared: |
2026/04/28: I'm getting 408 timeouts when asking the model to query multiple tools at once — is it a prompt issue or a model issue? 📹 35:08 The "The operation was timeout." error message is a known Azure OpenAI error. A few things to investigate:
The key is to gather more data before guessing at the root cause — look at token counts, check whether the timeout happens before or after a response is received, and narrow down which tool or model call is failing. Links shared: |
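One way to gather that data is to wrap every tool and model call so its duration and outcome get logged; the failing call then stands out in the log. A sketch in which the call names and the simulated timeout are made up:

```python
# Sketch: instrument each call to narrow down which one is timing out.
import time

call_log = []  # one entry per call: name, outcome, duration in seconds

def timed_call(name, fn, *args, **kwargs):
    """Run fn, recording how long it took and whether it timed out."""
    start = time.perf_counter()
    status = "error"
    try:
        result = fn(*args, **kwargs)
        status = "ok"
        return result
    except TimeoutError:
        status = "timeout"
        raise
    finally:
        call_log.append({"call": name, "status": status,
                         "seconds": round(time.perf_counter() - start, 3)})

def failing_search():
    raise TimeoutError("The operation was timeout.")  # simulated Azure 408

timed_call("lookup_order", lambda: "order-123")
try:
    timed_call("slow_search", failing_search)
except TimeoutError:
    pass
```

With token counts added to each log entry, the same wrapper also answers whether the timeout correlates with unusually large requests.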
2026/04/28: Any inputs on PageIndex vs. vector RAG? 📹 39:59 Based on feedback from Pamela's colleague who specializes in retrieval: PageIndex does work, but it's document-type dependent. It tends to perform best on long documents where traditional chunking and vector search struggle. It may not be a universal improvement. The recommendation is to set up your own evaluations with your actual data and compare retrieval quality. There is no formal Microsoft support for PageIndex in any of the current RAG demos, but it's worth experimenting with if you have long-document use cases. Links shared: |
2026/04/28: Announcements 📹 1:56
- Foundry Hosted Agents public preview launched: The new hosted agents platform (with fast microVM infrastructure) launched last week. It's in public preview and still stabilizing — some roughness expected.
- GitHub Copilot moving to usage-based billing: Starting June 1, Copilot usage will consume GitHub AI Credits. Pamela noted she's been trying to use smaller models (Sonnet, Haiku, GPT-4.1) as a result. Strategies to manage costs: choose models intentionally, use auto mode (VS Code is improving task-based routing), or bring your own API keys.
- GPT-5.5 now in Azure Foundry: Available as of April 23rd.
- MAI-Image-2: Microsoft's new in-house text-to-image model, available in Foundry. Pamela demonstrated it generating a photorealistic Jedi costume image from a photo of her face — impressive facial likeness quality.
- Pydantic Monty $5K sandbox escape bounty: Pydantic is running a competition to find exploits in the Monty Python sandbox. A good example of open-source security hardening through incentivized bug hunting.
- FastMCP 3.2: Full MCP Apps support released.
- GitHub merge queues deep-dive: A useful blog post for maintainers who merge many PRs concurrently — merge queues test PRs in order before merge to ensure compatibility.
- Deploying Anthropic (Claude) to Foundry: Pamela showed a demo repo with Bicep for deploying Anthropic models to Foundry. The Bicep is similar to OpenAI model deployments but requires an organization name, country code, and industry. Not available on internal Microsoft accounts, but works on personal/customer subscriptions.
- presentation-skills repo: Pamela published a new repo collecting all her Copilot skills for working with presentations (creating and writing up talks).
- Azure Cosmos DB Conf: Happening April 28 (today), live stream available.
- Upcoming events:
For teams moving from single-agent Python prototypes to multi-agent production, the Python patterns that seem natural become anti-patterns at scale. A few common issues:
- Sharing OpenAI/Anthropic clients across agents — fine in prototypes, breaks in production. Each agent should have its own client (or at least its own rate limit tracking), because a single starved agent can exhaust the shared client's rate limits, silently degrading all other agents.
- Using Python asyncio for agent concurrency — the event loop becomes a bottleneck. Agent LLM calls are I/O-bound, but the response processing (context assembly, memory updates) can be CPU-bound. Better: use separate worker processes with message queues between them.
- Context = conversation history — the natural pattern in Python AI code is to append every turn to a list and pass it as messages. At 100+ turns, this becomes a context explosion. You need progressive compaction: summarize older turns, keep recent ones verbatim.
- No budget tracking = production accidents — Python makes it easy to …
- Naive retry logic amplifies costs — …

These patterns are from production: https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents
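The budget-tracking point in the comment above can be made concrete with a small guard that refuses a call once a per-run spend limit would be exceeded. A sketch with illustrative prices and limits:

```python
# Sketch: hard-stop an agent run before it exceeds a spend limit.
class BudgetExceeded(RuntimeError):
    pass

class BudgetGuard:
    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens, output_tokens,
               in_price_per_1k, out_price_per_1k):
        """Raise instead of spending past the limit; otherwise record the cost."""
        cost = (input_tokens / 1000 * in_price_per_1k
                + output_tokens / 1000 * out_price_per_1k)
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(f"run would exceed ${self.limit_usd:.2f} budget")
        self.spent_usd += cost

guard = BudgetGuard(limit_usd=0.01)
guard.charge(2000, 1000, 0.0004, 0.0016)        # ~$0.0024: within budget
try:
    guard.charge(20000, 10000, 0.0004, 0.0016)  # ~$0.024: would blow the limit
    blocked = False
except BudgetExceeded:
    blocked = True
```

Checking before spending (rather than after) is what turns a cost dashboard into an actual safeguard, and the same check naturally caps naive retry loops too.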
Each week, we hold office hours about all things Python + AI in the Foundry Discord.
Join the Discord here: http://aka.ms/aipython/oh
This thread lists the recordings of each office hours session, along with any other resources that come out of the sessions. The questions and answers are automatically posted (based on the transcript) as comments in this thread.
April 28, 2026
Topics covered:
April 20, 2026
Topics covered:
April 13, 2026
Topics covered:
April 7, 2026
Topics covered:
March 31, 2026
Topics covered:
March 24, 2026
Topics covered:
March 17, 2026
Topics covered:
February 17, 2026
Topics covered:
February 10, 2026
Topics covered:
February 3, 2026
Topics covered:
January 27, 2026
Topics covered:
January 20, 2026
Topics covered:
January 13, 2026
Topics covered:
January 6, 2026
Topics covered: