AI Technology · May 19, 2025 · 10 min read

Feeding LLMs Without Leaking Secrets: A Guide for Companies On How To Add Your Company Data

Len Debets
CTO & co-founder

In my previous article 9 Things I Really Hate About AI, I mentioned how everyone suddenly seems to be an AI expert and how that creates a lot of noise. In the next few posts, I’ll break down some key AI concepts specifically for business professionals. Why? Because even if you’re actively looking for information, much of what’s out there is either inaccurate or way too technical for the average manager to make sense of.

My goal is to make these complex topics understandable for anyone who needs to make smarter business decisions. This week, I’m kicking things off with one of the most important ones: how (and why) you should add your company’s data to large language models.

Let’s start with the underlying question:

Why should you add (company) data to a large language model?

Your data transforms a general LLM into a powerful tool that understands your specific needs and context, ultimately leading to better insights, automation, and competitive advantages.

Imagine a general LLM is like a very smart person who knows a lot about the world from reading countless books and articles. However, they don’t know anything specific about your business. Adding your data is like giving that smart person your company’s internal documents, customer conversations, product information, and industry-specific reports. This focused information allows the LLM to understand your unique context and provide much more valuable results.

Whether you’ve added context to a question or uploaded a document to tools like ChatGPT, you’ve already experienced the power of providing data to these models. This personal approach works well for individual use. However, when building solutions for your customers or employees, a more robust strategy for integrating your company’s data is essential. Let’s explore the various methods currently being used to effectively feed your data to large language models at scale.

The most commonly used methods today are:

  1. Adding information directly to your question
  2. Adding a file to the model (uploading a document)
  3. Function calling (connecting an API)
  4. Retrieval Augmented Generation (RAG)
  5. Cache Augmented Generation (CAG)
  6. Fine-tuning an existing model (create your own LLM)
  7. Training a new foundational model (compete with OpenAI, Anthropic, etc.)

These are ranked from easy to hard. Let’s break them down.

1. Adding information directly to your question

This is what most people do: you ask the model your question, followed by extra context (or background) to help it answer better. This works. It’s also quick and dirty. If you need something done fast and you’re not worried about security or long-term reuse, it’s fine.
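To make this concrete, here’s a minimal sketch of what “context in the prompt” looks like when you call a model through an API rather than a chat window. It assumes the OpenAI Python client and an API key; the policy text and question are made-up placeholders.

```python
# A minimal sketch of "adding information directly to your question".
# Assumes the openai Python package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

context = (
    "Return policy (internal): customers can return items within 30 days. "
    "Refunds go back to the original payment method within 5 business days."
)
question = "A customer bought a laptop 20 days ago. Can they still return it?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The context is simply pasted in front of the question.
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"}
    ],
)
print(response.choices[0].message.content)
```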

But this method has limits. You can only add so much information before the model starts to ignore parts of it. With very long prompts, models tend to pay the most attention to the beginning and the end and gloss over the middle. So, if you throw in 20 pages of data and ask a question at the end, chances are a good chunk of it will be overlooked.

Also, you’re sending potentially sensitive data to a third party (yes, even if they say they don’t store it). So this method is fine for brainstorming or playing around, but you don’t want your board reports or customer data in here.

This method, and the next three, are all limited by the model’s context window. That’s the maximum number of tokens (think of tokens as chunks of words or characters) the model can process in a single request. Depending on which model you’re using, that window can be quite small, which means you can’t simply dump all your company data in at once.

On top of that, you’re charged per token. So every time you send a large prompt, you’re paying more. If you try to scale this up across hundreds or thousands of requests, it quickly becomes expensive and inefficient. That’s why methods like RAG and CAG exist.
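To get a feel for what “paying per token” actually means, you can count the tokens in a prompt before sending it. Here’s a small sketch assuming the tiktoken package; the file name and the price figure are placeholders, so check your own provider’s price list.

```python
# Rough estimate of input tokens (and cost) for a large prompt.
import tiktoken

prompt = open("company_docs.txt").read() + "\n\nSummarize the key risks."

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many OpenAI models
num_tokens = len(enc.encode(prompt))

price_per_million = 2.50  # placeholder: dollars per million input tokens
print(f"{num_tokens} input tokens, roughly ${num_tokens / 1_000_000 * price_per_million:.4f} per request")
```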

2. Adding a file to the model (uploading a document)

This is the method most people try after adding context directly to the prompt. You upload a document, like a PDF, a policy document, or a user manual, and then ask the model questions about it. Tools like ChatGPT (with Pro or Enterprise plans) and Claude make this easy. You upload the document in the chat interface, and the model appears to “read” it and answer your questions.

But here’s the catch: LLMs don’t actually “read” documents like humans do. Instead, they break the text into chunks (usually 200–500 words at a time), embed those chunks into a vector format, and then retrieve the most relevant ones when you ask a question. This is often invisible to you, but it’s happening behind the scenes.
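For the curious, here’s a deliberately naive sketch of that chunking step. Real tools split more carefully (on sentences, headings, or sections), and the numbers here are just illustrative.

```python
# Naive illustration of splitting a document into ~300-word chunks with overlap,
# before each chunk is embedded and indexed behind the scenes.
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

document = open("user_manual.txt").read()  # hypothetical file
print(f"{len(chunk_text(document))} chunks ready to be embedded")
```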

This feels safer, but it’s not. The same risks as above apply. If you’re using ChatGPT or any third-party tool, your data goes through their servers. Unless you pay for enterprise-level privacy controls (and read the fine print), this is not where confidential company documents belong.

This method is perfect for quick document review or summarizing files. But if you want to build a company-wide solution (like a smart assistant or internal knowledge bot), you’ll need something more robust: function calling or RAG.

3. Function Calling (Connecting an API)

Function calling is one of the most promising recent developments in LLMs. Instead of trying to make the model guess everything from natural language, you give it structured access to your systems and tools. That means the model doesn’t just answer questions; it can trigger real actions.

Think of it like this: the LLM becomes the brain, and your APIs become the hands. You describe what functions are available (like “get customer order history” or “calculate monthly revenue”), and the model learns when and how to call them.

You don’t need to train the model to know your backend logic. Instead, you define the interface. Then, when someone asks: “What’s the weather in New York City?” the LLM knows it should trigger a weather function to get the temperature for that location. The same logic applies when you connect your company’s systems.
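Here’s a minimal sketch of that weather example using the OpenAI Python client. The get_weather function is a made-up placeholder for whatever API you expose; Gemini and Claude offer equivalent mechanisms with slightly different syntax.

```python
# Minimal function-calling sketch: the model decides *when* to call your function,
# your code actually executes it, and the result goes back into the conversation.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function you expose
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in New York City?"}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
# Your code would now run the real weather lookup and send the result back
# to the model so it can phrase the final answer.
```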

OpenAI’s GPT, Google’s Gemini, and Anthropic’s Claude models support function calling out of the box. You can use it to connect the LLM with your CRM, ERP, or support systems. Microsoft Copilot uses similar techniques to integrate with Excel, Outlook, and Teams.

Doing this with open-source models requires a lot more engineering, but can be done with popular meta-frameworks like LangChain or LlamaIndex.

4. Retrieval Augmented Generation (RAG)

RAG is the industry standard for adding company data to LLMs at scale. It’s how most enterprise AI systems are built today.

Here’s how it works:

  1. You take your company’s documents, policies, manuals, and data
  2. You break them into chunks (typically 200–500 words)
  3. Each chunk gets converted into a vector embedding (a numerical representation)
  4. These vectors are stored in a vector database (like Pinecone, Weaviate, or Qdrant)
  5. When a user asks a question, their question also gets converted to a vector
  6. The system searches the database for the most similar vectors
  7. Those relevant chunks get fed to the LLM as context
  8. The LLM generates an answer based on that context

The beauty of RAG is that it keeps your data separate from the model. You’re not training anything. You’re just giving the model real-time access to the information it needs. This means you can update your data without retraining, and you maintain full control.

RAG is what powers most enterprise chatbots, document Q&A systems, and internal knowledge assistants. It’s secure (you control the data), scalable (you can add millions of documents), and relatively affordable.
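To make those eight steps concrete, here’s a stripped-down sketch that uses the OpenAI embeddings API and a plain in-memory list instead of a real vector database. The policy snippets are made up; in production you’d swap the list for Pinecone, Weaviate, or Qdrant.

```python
# Minimal RAG sketch: embed chunks, retrieve the most similar one to a question,
# and hand it to the model as context.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

chunks = [
    "Employees accrue 25 vacation days per year.",
    "Expenses above 500 euros need manager approval.",
    "The VPN must always be used on public Wi-Fi.",
]
chunk_vectors = embed(chunks)

question = "How many vacation days do I get?"
q_vec = embed([question])[0]

# Cosine similarity between the question and every chunk
scores = chunk_vectors @ q_vec / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec)
)
best_chunk = chunks[int(scores.argmax())]

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Context:\n{best_chunk}\n\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```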

5. Cache Augmented Generation (CAG)

CAG is newer and less common, but it’s gaining traction. The idea is to cache frequently used prompts and responses so the model doesn’t have to regenerate the same answer repeatedly.

Imagine your support team gets asked the same 50 questions every day. Instead of running those questions through the full LLM pipeline each time, you cache the answers. When a similar question comes in, you serve the cached response instantly.

This reduces costs (fewer API calls), improves speed (no generation time), and ensures consistency (same answer every time). But it requires careful cache management and invalidation strategies.
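A bare-bones sketch of that idea: an exact-match cache in front of the model call. Real implementations usually also match semantically similar questions and expire stale entries, which is where the cache management mentioned above comes in.

```python
# Simplistic response cache: repeated (normalized) questions skip the LLM call.
from openai import OpenAI

client = OpenAI()
cache: dict[str, str] = {}

def answer(question: str) -> str:
    key = question.strip().lower()
    if key in cache:          # cache hit: no API call, instant and consistent
        return cache[key]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    text = resp.choices[0].message.content
    cache[key] = text         # store it for the next time the question comes in
    return text

print(answer("How do I reset my password?"))
print(answer("How do I reset my password?"))  # served from the cache
```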

Some providers like Anthropic have started offering prompt caching as a built-in feature, making this easier to implement.

6. Fine-tuning an existing model

Fine-tuning means taking a pre-trained model (like GPT-4, Llama, or Mistral) and training it further on your specific data. This is more involved than RAG, but it can produce better results for specialized tasks.

There are different levels of fine-tuning:

a) Light fine-tuning (LoRA, QLoRA): You don’t retrain the entire model. Instead, you add small adapter layers that learn your specific patterns. This is much cheaper and faster.

b) Full fine-tuning: You retrain the entire model on your data. This requires significant compute power (think dozens of GPUs) and expertise. Most companies don’t need this.

c) Instruction tuning: You fine-tune the model to follow specific instruction formats or domain-specific patterns. This is common for customer service bots or internal tools.

d) Deep customization: This is close to building your own model. You start with a base model checkpoint (like LLaMA, Mistral, or DeepSeek) and train it further on massive datasets, potentially hundreds of millions of tokens or more.

At this point, you’re creating your own model variant. You need serious MLOps. Evaluation pipelines. Guardrails. This is what AI-native companies do. It’s powerful, but probably not what your company needs, unless AI is your product.
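To give a sense of what the lightest variant (option a, LoRA) looks like in practice, here’s a sketch using Hugging Face’s peft library. The base model name and hyperparameters are illustrative, and you’d still need a dataset and a training loop on top of this.

```python
# Sketch of light fine-tuning with LoRA adapters via Hugging Face peft.
# Only the small adapter layers are trained; the base model stays frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "mistralai/Mistral-7B-v0.1"          # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                    # adapter rank: small means cheap
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # which attention layers get adapters
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically well under 1% of all weights
# From here you'd train on your own examples with transformers' Trainer or trl's SFTTrainer.
```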

7. Training a new foundational model

Unless you are OpenAI, Google, Mistral, Meta or Anthropic, just don’t. This costs tens (or hundreds) of millions. It requires large GPU infrastructure, research, and talent that most companies don’t have. OpenAI pays its AI engineers more than some companies pay their CEOs.

It also demands vast amounts of data, which isn’t easily accessible without significant resources or a large budget. Bloomberg tackled this challenge by developing its foundational model: BloombergGPT. To train it, they compiled a dataset of 363 billion finance-specific tokens from their proprietary database, along with an additional 345 billion general-purpose tokens from public online sources such as Wikipedia.

Some companies say they’ve built their own models. Most haven’t. They’ve fine-tuned open ones. Which is fine. But let’s not confuse that with building from scratch.

Breakdown of these methods and when to use them

It’s not easy to decide what you need for your use case, but here’s a quick comparison:

Quick Takeaways:

  • If you want speed and low cost, go with prompt-based context or document uploads, but accept low safety and limited quality.
  • If you want enterprise-grade quality and safety, start with RAG and function calling.
  • If you’re AI-native or working in a specialized domain, consider fine-tuning.
  • If you’re not OpenAI, DeepMind, or Meta, avoid creating your own model.

Final Thoughts

If you want any more info on this, let me know in the comments, or just keep following my regular tech updates. I try to break down complex topics like this in a way that’s actually useful, especially for business folks trying to make sense of all the AI noise.

My next article will dive into image and video generation: how it works, what’s possible today, and what’s just hype.

Len Debets
CTO & co-founder
Published on May 19, 2025

