Complete RAG Workflow Guide

This guide demonstrates a complete Retrieval-Augmented Generation (RAG) workflow using embapi as your vector database.

Overview

A typical RAG workflow involves:

  1. Generate embeddings from your text content (using an external LLM service)
  2. Upload embeddings to embapi
  3. Search for similar documents based on a query
  4. Retrieve the relevant context
  5. Use the context with an LLM to generate responses

Prerequisites

  • Access to embapi API with a valid API key
  • An external LLM service for generating embeddings (e.g., OpenAI, Cohere)
  • Text content you want to process

Step 1: Generate Embeddings Externally

First, use your chosen LLM service to generate embeddings for your text content. Here’s an example using OpenAI’s API:

import openai

# Initialize OpenAI client
client = openai.OpenAI(api_key="your-openai-key")

# Generate embeddings for your text
text = "The quick brown fox jumps over the lazy dog"
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=text,
    dimensions=3072
)

embedding_vector = response.data[0].embedding

Step 2: Create LLM Service Instance

Before uploading embeddings, create an LLM service instance in embapi that matches your embedding configuration:

curl -X PUT "https://api.example.com/v1/llm-services/alice/my-openai" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint": "https://api.openai.com/v1/embeddings",
    "api_standard": "openai",
    "model": "text-embedding-3-large",
    "dimensions": 3072,
    "description": "OpenAI large embedding model",
    "api_key_encrypted": "sk-proj-your-openai-key"
  }'

Response:

{
  "instance_id": 123,
  "instance_handle": "my-openai",
  "owner": "alice",
  "endpoint": "https://api.openai.com/v1/embeddings",
  "model": "text-embedding-3-large",
  "dimensions": 3072
}
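
The same call from Python with the requests library, as a sketch (the endpoint and payload mirror the curl example above; replace the placeholder key with your own):

import requests

resp = requests.put(
    "https://api.example.com/v1/llm-services/alice/my-openai",
    headers={
        "Authorization": "Bearer alice_api_key",
        "Content-Type": "application/json"
    },
    json={
        "endpoint": "https://api.openai.com/v1/embeddings",
        "api_standard": "openai",
        "model": "text-embedding-3-large",
        "dimensions": 3072,
        "description": "OpenAI large embedding model",
        "api_key_encrypted": "sk-proj-your-openai-key"
    }
)
resp.raise_for_status()
instance_id = resp.json()["instance_id"]  # e.g., 123, as in the response above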

Step 3: Create a Project

Create a project to organize your embeddings:

curl -X PUT "https://api.example.com/v1/projects/alice/my-documents" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "project_handle": "my-documents",
    "description": "Document embeddings for RAG",
    "instance_id": 123
  }'
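
The same request from Python, reusing instance_id from Step 2 (a sketch):

import requests

requests.put(
    "https://api.example.com/v1/projects/alice/my-documents",
    headers={
        "Authorization": "Bearer alice_api_key",
        "Content-Type": "application/json"
    },
    json={
        "project_handle": "my-documents",
        "description": "Document embeddings for RAG",
        "instance_id": instance_id
    }
).raise_for_status()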

Step 4: Upload Embeddings to embapi

Upload your pre-generated embeddings along with metadata and optional text content:

curl -X POST "https://api.example.com/v1/embeddings/alice/my-documents" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "embeddings": [{
      "text_id": "doc001",
      "instance_handle": "my-openai",
      "vector": [0.021, -0.015, 0.043, ...],
      "vector_dim": 3072,
      "text": "The quick brown fox jumps over the lazy dog",
      "metadata": {
        "source": "example.txt",
        "author": "Alice",
        "category": "animals"
      }
    }]
  }'

Tip: Upload multiple embeddings in batches for efficiency (see Batch Operations Guide).
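
A minimal batching sketch, assuming the embeddings array accepts multiple items per request (the chunk size here is illustrative; see the Batch Operations Guide for recommended limits):

import requests

def upload_batch(items, batch_size=500):
    """Upload embedding payloads in chunks; each item has the shape shown above."""
    for i in range(0, len(items), batch_size):
        resp = requests.post(
            "https://api.example.com/v1/embeddings/alice/my-documents",
            headers={"Authorization": "Bearer alice_api_key"},
            json={"embeddings": items[i:i + batch_size]}
        )
        resp.raise_for_status()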

Step 5: Search for Similar Documents

When you need to retrieve relevant context for a query:

Option A: Search by Stored Document ID

If you already have a document in your database that represents your query:

curl -X GET "https://api.example.com/v1/similars/alice/my-documents/doc001?count=5&threshold=0.7" \
  -H "Authorization: Bearer alice_api_key"

Option B: Search with Raw Query Embedding

Generate an embedding for your query and search without storing it:

# Generate query embedding
query = "What animals are mentioned?"
query_response = client.embeddings.create(
    model="text-embedding-3-large",
    input=query,
    dimensions=3072
)
query_vector = query_response.data[0].embedding

Then submit the query vector to the similars endpoint (the vector below is truncated for display):

curl -X POST "https://api.example.com/v1/similars/alice/my-documents?count=5&threshold=0.7" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.032, -0.018, 0.056, ...]
  }'

Response:

{
  "user_handle": "alice",
  "project_handle": "my-documents",
  "results": [
    {
      "id": "doc001",
      "similarity": 0.95
    },
    {
      "id": "doc042",
      "similarity": 0.87
    },
    {
      "id": "doc103",
      "similarity": 0.82
    }
  ]
}

Step 6: Retrieve Context Documents

Retrieve the full content and metadata for the most similar documents:

curl -X GET "https://api.example.com/v1/embeddings/alice/my-documents/doc001" \
  -H "Authorization: Bearer alice_api_key"

Response:

{
  "text_id": "doc001",
  "text": "The quick brown fox jumps over the lazy dog",
  "metadata": {
    "source": "example.txt",
    "author": "Alice",
    "category": "animals"
  },
  "vector_dim": 3072
}

Step 7: Use Context with LLM

Combine the retrieved context with your original query to generate an informed response:

# Collect context from similar documents
context_docs = []
for result in similarity_results['results'][:3]:  # response from the Step 5 search
    doc = get_document(result['id'])  # your function to fetch the document (see Step 6)
    context_docs.append(doc['text'])

# Build context string
context = "\n\n".join(context_docs)

# Generate response with context
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
    ]
)

answer = response.choices[0].message.content

Complete Python Example

Here’s a complete example combining all steps:

import openai
import requests

# Configuration
EMBAPI_URL = "https://api.example.com"
EMBAPI_KEY = "your-embapi-key"
OPENAI_KEY = "your-openai-key"

# Initialize OpenAI
client = openai.OpenAI(api_key=OPENAI_KEY)

def embed_and_store(text_id, text, metadata=None):
    """Generate embedding and store in embapi"""
    # Generate embedding
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
        dimensions=3072
    )
    vector = response.data[0].embedding
    
    # Upload to embapi (raise on HTTP errors so failures are not silent)
    resp = requests.post(
        f"{EMBAPI_URL}/v1/embeddings/alice/my-documents",
        headers={
            "Authorization": f"Bearer {EMBAPI_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "embeddings": [{
                "text_id": text_id,
                "instance_handle": "my-openai",
                "vector": vector,
                "vector_dim": 3072,
                "text": text,
                "metadata": metadata or {}
            }]
        }
    )
    resp.raise_for_status()

def search_similar(query, count=5):
    """Search for similar documents using query text"""
    # Generate query embedding
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=query,
        dimensions=3072
    )
    query_vector = response.data[0].embedding
    
    # Search in embapi
    result = requests.post(
        f"{EMBAPI_URL}/v1/similars/alice/my-documents?count={count}",
        headers={
            "Authorization": f"Bearer {EMBAPI_KEY}",
            "Content-Type": "application/json"
        },
        json={"vector": query_vector}
    )
    result.raise_for_status()
    return result.json()['results']

def retrieve_context(doc_ids):
    """Retrieve full document content"""
    docs = []
    for doc_id in doc_ids:
        response = requests.get(
            f"{DHAMPS_API}/v1/embeddings/alice/my-documents/{doc_id}",
            headers={"Authorization": f"Bearer {EMBAPI_KEY}"}
        )
        docs.append(response.json())
    return docs

def rag_query(query):
    """Complete RAG workflow"""
    # Search for similar documents
    similar = search_similar(query, count=3)
    
    # Retrieve context
    context_docs = retrieve_context([r['id'] for r in similar])
    context = "\n\n".join([doc['text'] for doc in context_docs])
    
    # Generate answer with LLM
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer based on the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ]
    )
    
    return response.choices[0].message.content

# Usage
embed_and_store("doc001", "The quick brown fox jumps over the lazy dog", 
                {"category": "animals"})
answer = rag_query("What animals are mentioned?")
print(answer)

Best Practices

  1. Batch Upload: Upload embeddings in batches of 100-1000 for better performance
  2. Use Metadata: Include rich metadata for better filtering and organization
  3. Set Thresholds: Use similarity thresholds (e.g., 0.7) to filter low-quality matches
  4. Cache Embeddings: Cache generated embeddings to avoid redundant API calls (see the sketch after this list)
  5. Monitor Dimensions: Ensure all embeddings use consistent dimensions (3072 for text-embedding-3-large)
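
For practice 4, a minimal in-process cache built on functools.lru_cache, as a sketch (for persistence across runs you would swap in a key-value store):

import openai
from functools import lru_cache

client = openai.OpenAI(api_key="your-openai-key")

@lru_cache(maxsize=1024)
def embed_cached(text):
    """Compute the embedding for text at most once per process."""
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
        dimensions=3072
    )
    # Return a tuple so cached values cannot be mutated by callers
    return tuple(response.data[0].embedding)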

Advanced Features

Metadata Filtering

Exclude certain documents from search results using metadata filters:

# Exclude documents from the same author as the query
curl -X GET "https://api.example.com/v1/similars/alice/my-documents/doc001?metadata_path=author&metadata_value=Alice" \
  -H "Authorization: Bearer alice_api_key"

See the Metadata Filtering Guide for more details.

Metadata Validation

Enforce consistent metadata structure using JSON Schema validation:

curl -X PATCH "https://api.example.com/v1/projects/alice/my-documents" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "metadataScheme": "{\"type\":\"object\",\"properties\":{\"author\":{\"type\":\"string\"},\"category\":{\"type\":\"string\"}},\"required\":[\"author\"]}"
  }'
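
To catch violations before upload, you can mirror the same schema client-side with the jsonschema package (a sketch; requires pip install jsonschema):

import json
from jsonschema import validate, ValidationError

# The same schema registered with the project above
schema = json.loads(
    '{"type":"object","properties":{"author":{"type":"string"},'
    '"category":{"type":"string"}},"required":["author"]}'
)

metadata = {"category": "animals"}  # missing the required "author" field
try:
    validate(instance=metadata, schema=schema)
except ValidationError as err:
    print(f"Metadata rejected: {err.message}")  # "'author' is a required property"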

See the Metadata Validation Guide for more details.

Troubleshooting

Dimension Mismatch Error

{
  "title": "Bad Request",
  "status": 400,
  "detail": "dimension validation failed: vector dimension mismatch"
}

Solution: Ensure the vector_dim field matches the dimensions configured in your LLM service instance.
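
A cheap client-side guard before upload (sketch):

def check_dims(vector, expected_dim=3072):
    """Fail fast instead of waiting for the API round-trip."""
    # expected_dim must match the dimensions on the LLM service instance
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")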

No Similar Results

If searches return no results, try:

  • Lowering the similarity threshold (e.g., from 0.8 to 0.5; see the sketch after this list)
  • Increasing the count parameter
  • Verifying embeddings are uploaded correctly
  • Checking that query embeddings use the same model and dimensions
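
A simple diagnostic that retries the search with progressively lower thresholds (a sketch using the similars endpoint documented above):

import requests

def search_with_fallback(query_vector, thresholds=(0.8, 0.7, 0.5)):
    """Retry the similarity search, relaxing the threshold until results appear."""
    for t in thresholds:
        resp = requests.post(
            "https://api.example.com/v1/similars/alice/my-documents",
            headers={"Authorization": "Bearer alice_api_key"},
            params={"count": 5, "threshold": t},
            json={"vector": query_vector}
        )
        resp.raise_for_status()
        results = resp.json().get("results", [])
        if results:
            return results
    return []  # genuinely nothing similar; verify the upload and model settings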