Knowledge Library MCP: Turning Data Chaos into Actionable Insights with Azure AI

Bots and code unlock precise insights from complex data
By Constantine Vassilev
March 2025

This report introduces Knowledge Library MCP (KL MCP), a custom tool I developed to tame disorganized data: Machine Learning (ML) files, thousands of SEC filings such as Tesla’s 10-Ks and 10-Qs, process workflows, and scattered charts. Built with Azure AI Agent Service, which addresses the retrieval, freshness, and scale problems behind most Retrieval-Augmented Generation (RAG) failures, and inspired by Anthropic’s Model Context Protocol (MCP), KL MCP quickly locates documents and enables chat-based insights. It’s a practical, code-driven solution for professionals who need clarity from complexity.

The Problem: Data Overload
I was drowning in documentation—PDFs of ML pipelines, financial filings, deployment steps, and visuals, all mixed with code. My first attempt stored chunked, vectorized data in Cosmos DB NoSQL, but as files multiplied, finding specifics became inefficient. Inspired by MCP’s focus on context, I envisioned a scalable system to locate and chat with diverse files—PDFs, Word, Excel, PowerPoint, text, HTML, and images—using live data and dynamic code responsibly. Azure AI Agent Service made it possible.

How I Built KL MCP
KL MCP organizes data like a library, powered by my code and Azure’s tools: Knowledge Tools, Azure Functions, and Code Interpreter. Because each domain gets its own bot and vector store, it retrieves more precisely than a flat index such as Azure AI Search. Here’s the breakdown:
- Knowledge Tools with MCP Bots: I coded specialized bots (DocBot for ML and workflows, SECBot for filings) to manage separate domains. Upload scripts build vector stores for PDFs, Office documents, text, HTML, and images (via OCR and labels), while metadata such as file type and date is stored in Cosmos DB NoSQL and vectorized for semantic search. The bots retrieve precise content, whether text, tables, or visuals, closing RAG’s retrieval gaps with a scalable, hierarchical approach; see the sketch after this list.
- Azure Functions: My scripts pull live data, like stock prices or process updates, via APIs, keeping chats current—unlike static RAG setups.
- Code Interpreter: Python scripts analyze data on the fly—think profit trends or workflow bottlenecks—delivering insights Search can’t replicate.
- Azure AI Studio Agent: A single GPT-4o agent, coded to integrate the bots, functions, and interpreter, drives each chat end to end.
- C# Client: A simple app I built to handle uploads, queries, and threads via the Azure AI Agents SDK.
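
To make the list concrete, here is a minimal sketch of how a bot like SECBot could be wired up, assuming the preview Python azure-ai-projects SDK. The file name, store name, and instructions are placeholders of mine, and method or parameter names may shift between preview releases:

```python
import os
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import CodeInterpreterTool, FilePurpose, FileSearchTool

# Connect to the Azure AI project that hosts the agents
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

# Upload a filing and index it into a vector store for file search
sec_file = project_client.agents.upload_file_and_poll(
    file_path="tesla_10k_2024.pdf",  # illustrative file name
    purpose=FilePurpose.AGENTS,
)
vector_store = project_client.agents.create_vector_store_and_poll(
    file_ids=[sec_file.id], name="sec-filings"
)

# SECBot: file search over the filings store, plus Code Interpreter for analysis
file_search = FileSearchTool(vector_store_ids=[vector_store.id])
code_interpreter = CodeInterpreterTool()

sec_bot = project_client.agents.create_agent(
    model="gpt-4o",
    name="SECBot",
    instructions="Answer questions about SEC filings and cite your sources.",
    tools=file_search.definitions + code_interpreter.definitions,
    tool_resources=file_search.resources,
)
```

DocBot follows the same pattern over its own vector store, which is what lets the hierarchy scale: adding a domain means adding a store and a bot, not re-indexing everything.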
Ask, “Find Tesla’s profit trends,” and SECBot returns: “Q4 10-Ks show a 15% rise, $2 billion in text—ready to chat.” Query, “Locate deployment steps,” and DocBot replies: “PDFs list five steps, bottleneck at three—chat enabled.”
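
The client side stays small. My app is C#, but the same thread flow looks like this in Python against the same SDK (again a sketch; preview method names may vary):

```python
# Create a conversation thread, post the user's question, and run SECBot on it
thread = project_client.agents.create_thread()
project_client.agents.create_message(
    thread_id=thread.id, role="user", content="Find Tesla's profit trends"
)
run = project_client.agents.create_and_process_run(
    thread_id=thread.id, agent_id=sec_bot.id
)

# Print the assistant's replies
messages = project_client.agents.list_messages(thread_id=thread.id)
for text_message in messages.text_messages:
    print(text_message.text.value)
```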

How It Operates
KL MCP scales to 10,000 files (512 MB/5M tokens each). Bots search vector stores and metadata, fetch live data via Functions, and compute insights with Code Interpreter. It rewrites queries, runs parallel searches, and attaches relevant files for chat—solving RAG’s scale and staleness issues.
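
The fan-out itself is simple to picture. A hypothetical sketch, with stub search_docbot and search_secbot functions standing in for the real vector-store queries:

```python
import asyncio

async def rewrite_query(query: str) -> str:
    # Stub: a real rewriter would expand terms, e.g. "profit trends" ->
    # "net income growth, quarterly, 10-K/10-Q"
    return query

async def search_docbot(query: str) -> list[str]:
    return [f"DocBot hit for: {query}"]  # stub for the ML/workflow store

async def search_secbot(query: str) -> list[str]:
    return [f"SECBot hit for: {query}"]  # stub for the SEC filings store

async def parallel_search(query: str) -> list[str]:
    # Rewrite once, then query every domain store concurrently
    rewritten = await rewrite_query(query)
    doc_hits, sec_hits = await asyncio.gather(
        search_docbot(rewritten), search_secbot(rewritten)
    )
    return doc_hits + sec_hits

print(asyncio.run(parallel_search("Find Tesla's profit trends")))
```

Running the domain searches concurrently keeps latency roughly flat as bots are added.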

From Chaos to Clarity
Before, I wasted hours sifting files. Now, KL MCP finds them instantly. For ML, “Find pipeline docs” preps files; “What’s the bottleneck?” pinpoints step three with a code estimate. For filings, “Locate trends” pulls 10-Qs; “Revenue outlook?” blends text, charts, and live data. For workflows, “Get deployment docs” sets up a chat; “Optimize this” yields a Python efficiency plot—all streamlined.
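
For the plot in that last example, the client pulls the generated image out of the thread once the run completes. A sketch, assuming the same preview SDK:

```python
# After a run where Code Interpreter generated a chart, save the image files
messages = project_client.agents.list_messages(thread_id=thread.id)
for image_content in messages.image_contents:
    file_id = image_content.image_file.file_id
    project_client.agents.save_file(
        file_id=file_id, file_name=f"{file_id}_efficiency_plot.png"
    )
```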

Built Responsibly
I coded KL MCP for fairness (diverse data), transparency (source logs), privacy (Azure Blob Storage, Cosmos DB), and reliability (rigorous testing).

Why It Matters
KL MCP handles text, images, live data, and code responsibly, outscaling Azure AI Search with its library-like system. It’s for engineers with ML files, analysts with filings, or teams with workflows: anyone who needs chat-ready insights from data clutter.

Copyright © 2025 AITrailblazer, LLC.