How Our RAG Pipeline Works

What is RAG?

RAG stands for Retrieval Augmented Generation. It's a technique that combines two things: a search system that finds relevant content, and a language model that generates answers from that content.

Instead of asking an AI to answer from its training data, RAG makes the AI answer from a specific set of documents you provide. Think of it as giving the AI a closed-book exam with only your notes allowed.

Step 1 — You upload a document

When you upload a PDF, paste text, or add a URL, Kognix reads the full content and breaks it into overlapping chunks of roughly 500 words each, with 50 words of overlap between chunks so context isn't lost at boundaries.

Step 2 — We convert chunks to embeddings

Each chunk is passed through an embedding model (we use OpenAI's text-embedding-3-small) which converts the text into a list of numbers — a vector — that represents the semantic meaning of that chunk. Similar chunks produce similar vectors.

These vectors are stored in Pinecone, a vector database, under your personal namespace so your documents never mix with another user's.

Step 3 — You ask a question

When you type a question, Kognix converts your question into an embedding using the same model. It then searches Pinecone for the 5 chunks whose vectors are most similar to your question's vector.

This is called cosine similarity search — it finds chunks that are semantically similar to your question, not just keyword-matching.

Step 4 — Claude answers from your chunks

The top 5 retrieved chunks are injected into a structured prompt sent to Claude. The prompt instructs Claude to answer only from the provided context and to cite which chunk each part of the answer came from.

Claude then streams the answer back token by token — which is why you see it appear word by word in real time.

The entire pipeline — upload to answer — takes under 3 seconds for most documents. Embedding runs in parallel with chunking to minimize latency.

Why this matters for accuracy

Because Claude only sees your chunks, it cannot hallucinate from training data. If the answer isn't in your documents, Claude will say so. The citation trail means every answer is verifiable — you can check the exact passage it came from.