
Building RAG: Why My First Three Attempts Failed

December 27, 2024 · 3 min read · By Amey Lokare

🎯 The Goal

I wanted to build a RAG (Retrieval Augmented Generation) system for my documentation. Users could ask questions, and the system would find relevant docs and generate answers.

Simple, right? Wrong. It took me three attempts to get it right.

The journey: Three attempts, three failures, one success. Here's what I learned.

❌ Attempt #1: Naive Vector Search

What I did: Chunked documents, embedded them, stored in a vector database. Simple similarity search.

What went wrong:

  • Chunks were too small (100 tokens) - lost context
  • No metadata filtering - returned irrelevant results
  • Simple cosine similarity - poor ranking
  • No re-ranking - first result wasn't always best

Result: Answers were generic, often wrong, missing important context.
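For reference, this first attempt boiled down to plain cosine similarity over chunk embeddings. A minimal sketch of that ranking step, using toy vectors in place of a real embedding model:

```python
import numpy as np

def cosine_top_k(query_vec, chunk_vecs, k=3):
    """Rank chunks by cosine similarity to the query -- attempt #1 in a nutshell."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                        # cosine similarity per chunk
    return np.argsort(sims)[::-1][:k]   # indices of best matches, best first

# Toy 2-d vectors standing in for real embeddings
chunks = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
query = np.array([0.9, 0.1])
print(cosine_top_k(query, chunks, k=2))  # → [0 1]
```

This is all attempt #1 did: no metadata, no re-ranking, first cosine match wins. Everything that follows is about what got layered on top of (or swapped out from) this core.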

❌ Attempt #2: Over-Engineering

What I did: Added complex reranking, multiple retrieval strategies, hybrid search.

What went wrong:

  • Too complex - hard to debug
  • Slow - multiple retrieval passes
  • Overfitting - worked for test cases, failed in production
  • Maintenance nightmare

Result: System was slow, unreliable, and hard to maintain.

❌ Attempt #3: Wrong Embedding Model

What I did: Used a general-purpose embedding model.

What went wrong:

  • Model wasn't trained on technical docs
  • Poor understanding of code snippets
  • Didn't capture domain-specific concepts

Result: Poor retrieval quality, especially for technical content.

✅ What Finally Worked

1. Smart Chunking

Chunk by semantic boundaries, not just size:

def chunk_document(text, max_size=500):
    # Split by paragraphs first
    paragraphs = text.split('\n\n')
    
    chunks = []
    current_chunk = []
    current_size = 0
    
    for para in paragraphs:
        para_size = len(para.split())
        if current_size + para_size > max_size and current_chunk:
            chunks.append('\n\n'.join(current_chunk))
            current_chunk = [para]
            current_size = para_size
        else:
            current_chunk.append(para)
            current_size += para_size
    
    if current_chunk:
        chunks.append('\n\n'.join(current_chunk))
    
    return chunks

2. Domain-Specific Embeddings

Switched to an embedding model better matched to the content, and fine-tuned it on my docs:

from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is a general-purpose baseline, not a technical-docs model.
# Start here, then pick a model benchmarked on code/docs retrieval --
# or better: fine-tune on your own documentation
model = SentenceTransformer('all-MiniLM-L6-v2')

3. Metadata Filtering

Add metadata to chunks and filter during retrieval:

chunk_metadata = {
    'document_id': doc_id,
    'section': section_name,
    'doc_type': 'api',  # one of: 'api', 'tutorial', 'reference'
    'tags': ['authentication', 'api', 'security']
}

# Filter during retrieval (exact call depends on your vector DB)
results = vector_db.search(
    query_embedding,
    filter={'doc_type': 'api', 'tags': 'authentication'},
    top_k=5
)
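The filter-then-rank step can be sketched in plain Python without committing to a particular vector DB (`filtered_search` and the chunk dicts below are hypothetical stand-ins; real APIs like Chroma's `where` or Pinecone's `filter` differ in syntax but not in spirit):

```python
import numpy as np

def filtered_search(query_emb, chunks, top_k=5, **filters):
    """Keep only chunks whose metadata matches every filter, then rank by cosine similarity."""
    def matches(meta):
        for key, value in filters.items():
            field = meta.get(key)
            # list-valued fields (e.g. tags) match if they contain the value
            if isinstance(field, list):
                if value not in field:
                    return False
            elif field != value:
                return False
        return True

    candidates = [c for c in chunks if matches(c['metadata'])]
    q = query_emb / np.linalg.norm(query_emb)
    scored = sorted(
        candidates,
        key=lambda c: float(np.dot(c['embedding'] / np.linalg.norm(c['embedding']), q)),
        reverse=True,
    )
    return scored[:top_k]

chunks = [
    {'text': 'OAuth flow', 'embedding': np.array([1.0, 0.0]),
     'metadata': {'doc_type': 'api', 'tags': ['authentication']}},
    {'text': 'Install guide', 'embedding': np.array([0.9, 0.1]),
     'metadata': {'doc_type': 'tutorial', 'tags': ['setup']}},
]
hits = filtered_search(np.array([1.0, 0.0]), chunks, doc_type='api')
print([h['text'] for h in hits])  # → ['OAuth flow']
```

The point is that filtering happens *before* similarity scoring, so a near-perfect vector match in the wrong doc type never reaches the ranked list.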

4. Simple Re-ranking

Use a cross-encoder for final ranking:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Score the top 10 retrieved chunks against the query
scores = reranker.predict([
    (query, chunk) for chunk in top_10_chunks
])

# Sort chunks by cross-encoder score, best first
reranked = [chunk for _, chunk in
            sorted(zip(scores, top_10_chunks), reverse=True)]

📊 Results

| Attempt        | Accuracy | Speed | Status  |
|----------------|----------|-------|---------|
| Attempt 1      | 45%      | Fast  | Failed  |
| Attempt 2      | 65%      | Slow  | Failed  |
| Attempt 3      | 55%      | Fast  | Failed  |
| Final Solution | 85%      | Fast  | Success |

💡 Key Lessons

  • Chunking matters: Semantic chunking beats size-based
  • Embeddings matter: Use domain-specific models
  • Metadata matters: Filtering improves relevance
  • Simplicity matters: Don't over-engineer
  • Iteration matters: Learn from each failure

🎯 Conclusion

Building RAG is harder than it looks. My first three attempts failed because I focused on the wrong things. The solution that worked was simpler, but it required understanding the fundamentals: good chunking, right embeddings, and smart filtering.

Don't give up after the first failure. Each attempt teaches you something valuable.
