Complete RAG Workflow Guide#
This guide demonstrates a complete Retrieval Augmented Generation (RAG) workflow using embapi as your vector database.
Overview#
A typical RAG workflow involves:
- Generating embeddings from your text content (using an external LLM service)
- Uploading the embeddings to embapi
- Searching for similar documents based on a query
- Retrieving the relevant context
- Using the context with an LLM to generate responses
Prerequisites#
- Access to embapi API with a valid API key
- An external LLM service for generating embeddings (e.g., OpenAI, Cohere)
- Text content you want to process
Step 1: Generate Embeddings Externally#
First, use your chosen LLM service to generate embeddings for your text content. Here’s an example using OpenAI’s API:
import openai

# Initialize OpenAI client
client = openai.OpenAI(api_key="your-openai-key")

# Generate embeddings for your text
text = "The quick brown fox jumps over the lazy dog"
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=text,
    dimensions=3072
)
embedding_vector = response.data[0].embedding
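If you have many documents, note that the OpenAI embeddings endpoint also accepts a list of strings, so several texts can be embedded in one request. A minimal sketch, continuing from the client above (the sample texts are illustrative):

```python
# Embed several texts in a single API call; results come back in input order
texts = [
    "The quick brown fox jumps over the lazy dog",
    "Pack my box with five dozen liquor jugs",
]
batch_response = client.embeddings.create(
    model="text-embedding-3-large",
    input=texts,
    dimensions=3072
)
vectors = [item.embedding for item in batch_response.data]
```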
Step 2: Create LLM Service Instance#
Before uploading embeddings, create an LLM service instance in embapi that matches your embedding configuration:
curl -X PUT "https://api.example.com/v1/llm-services/alice/my-openai" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "endpoint": "https://api.openai.com/v1/embeddings",
    "api_standard": "openai",
    "model": "text-embedding-3-large",
    "dimensions": 3072,
    "description": "OpenAI large embedding model",
    "api_key_encrypted": "sk-proj-your-openai-key"
  }'
Response:
{
  "instance_id": 123,
  "instance_handle": "my-openai",
  "owner": "alice",
  "endpoint": "https://api.openai.com/v1/embeddings",
  "model": "text-embedding-3-large",
  "dimensions": 3072
}
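If you prefer to script this step, the same request can be issued with Python's requests library. This sketch simply mirrors the curl call above; the user handle, instance handle, and keys are the placeholders from that example:

```python
import requests

# Create (or update) the LLM service instance via PUT, mirroring the curl example
resp = requests.put(
    "https://api.example.com/v1/llm-services/alice/my-openai",
    headers={"Authorization": "Bearer alice_api_key"},
    json={
        "endpoint": "https://api.openai.com/v1/embeddings",
        "api_standard": "openai",
        "model": "text-embedding-3-large",
        "dimensions": 3072,
        "description": "OpenAI large embedding model",
        "api_key_encrypted": "sk-proj-your-openai-key"
    },
)
print(resp.json())  # should echo the instance details shown above
```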
Step 3: Create a Project#
Create a project to organize your embeddings:
curl -X PUT "https://api.example.com/v1/projects/alice/my-documents" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "project_handle": "my-documents",
    "description": "Document embeddings for RAG",
    "instance_id": 123
  }'
Step 4: Upload Embeddings to embapi#
Upload your pre-generated embeddings along with metadata and optional text content:
curl -X POST "https://api.example.com/v1/embeddings/alice/my-documents" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "embeddings": [{
      "text_id": "doc001",
      "instance_handle": "my-openai",
      "vector": [0.021, -0.015, 0.043, ...],
      "vector_dim": 3072,
      "text": "The quick brown fox jumps over the lazy dog",
      "metadata": {
        "source": "example.txt",
        "author": "Alice",
        "category": "animals"
      }
    }]
  }'
Tip: Upload multiple embeddings in batches for efficiency (see Batch Operations Guide).
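As the tip above suggests, the embeddings array can carry many items per request. A sketch of chunked uploads, assuming your data is a list of (text_id, vector, text, metadata) tuples and an illustrative batch size of 100:

```python
import requests

def upload_in_batches(items, batch_size=100):
    """Upload pre-generated embeddings in chunks of batch_size per POST."""
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        payload = {
            "embeddings": [
                {
                    "text_id": text_id,
                    "instance_handle": "my-openai",
                    "vector": vector,
                    "vector_dim": 3072,
                    "text": text,
                    "metadata": metadata,
                }
                for text_id, vector, text, metadata in batch
            ]
        }
        resp = requests.post(
            "https://api.example.com/v1/embeddings/alice/my-documents",
            headers={"Authorization": "Bearer alice_api_key"},
            json=payload,
        )
        resp.raise_for_status()  # fail fast if a batch is rejected
```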
Step 5: Search for Similar Documents#
When you need to retrieve relevant context for a query, you have two options:
Option A: Search by Stored Document ID#
If you already have a document in your database that represents your query:
curl -X GET "https://api.example.com/v1/similars/alice/my-documents/doc001?count=5&threshold=0.7" \
  -H "Authorization: Bearer alice_api_key"
Option B: Search with Raw Query Embedding#
Generate an embedding for your query and search without storing it:
# Generate query embedding
query = "What animals are mentioned?"
query_response = client.embeddings.create(
    model="text-embedding-3-large",
    input=query,
    dimensions=3072
)
query_vector = query_response.data[0].embedding
Then POST the query vector to the similarity search endpoint:
curl -X POST "https://api.example.com/v1/similars/alice/my-documents?count=5&threshold=0.7" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.032, -0.018, 0.056, ...]
  }'
Response:
{
  "user_handle": "alice",
  "project_handle": "my-documents",
  "results": [
    {
      "id": "doc001",
      "similarity": 0.95
    },
    {
      "id": "doc042",
      "similarity": 0.87
    },
    {
      "id": "doc103",
      "similarity": 0.82
    }
  ]
}
Step 6: Retrieve Context Documents#
Retrieve the full content and metadata for the most similar documents:
curl -X GET "https://api.example.com/v1/embeddings/alice/my-documents/doc001" \
  -H "Authorization: Bearer alice_api_key"
Response:
{
  "text_id": "doc001",
  "text": "The quick brown fox jumps over the lazy dog",
  "metadata": {
    "source": "example.txt",
    "author": "Alice",
    "category": "animals"
  },
  "vector_dim": 3072
}
Step 7: Use Context with LLM#
Combine the retrieved context with your original query to generate an informed response:
# similarity_results is the parsed JSON response from Step 5
# Collect context from similar documents
context_docs = []
for result in similarity_results['results'][:3]:
    doc = get_document(result['id'])  # Your function to fetch the document
    context_docs.append(doc['text'])
# Build context string
context = "\n\n".join(context_docs)
# Generate response with context
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
    ]
)
answer = response.choices[0].message.content
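The snippet above leaves get_document to you; a minimal implementation against the Step 6 endpoint might look like this (user handle, project handle, and key are the placeholders from earlier examples):

```python
import requests

def get_document(doc_id):
    """Fetch a stored embedding record (text + metadata) by its text_id."""
    resp = requests.get(
        f"https://api.example.com/v1/embeddings/alice/my-documents/{doc_id}",
        headers={"Authorization": "Bearer alice_api_key"},
    )
    resp.raise_for_status()
    return resp.json()
```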
Complete Python Example#
Here’s a complete example combining all steps:
import openai
import requests

# Configuration
EMBAPI_URL = "https://api.example.com"
EMBAPI_KEY = "your-embapi-key"
OPENAI_KEY = "your-openai-key"

# Initialize OpenAI
client = openai.OpenAI(api_key=OPENAI_KEY)

def embed_and_store(text_id, text, metadata=None):
    """Generate embedding and store in embapi"""
    # Generate embedding
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
        dimensions=3072
    )
    vector = response.data[0].embedding
    # Upload to embapi
    requests.post(
        f"{EMBAPI_URL}/v1/embeddings/alice/my-documents",
        headers={
            "Authorization": f"Bearer {EMBAPI_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "embeddings": [{
                "text_id": text_id,
                "instance_handle": "my-openai",
                "vector": vector,
                "vector_dim": 3072,
                "text": text,
                "metadata": metadata or {}
            }]
        }
    )

def search_similar(query, count=5):
    """Search for similar documents using query text"""
    # Generate query embedding
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=query,
        dimensions=3072
    )
    query_vector = response.data[0].embedding
    # Search in embapi
    result = requests.post(
        f"{EMBAPI_URL}/v1/similars/alice/my-documents?count={count}",
        headers={
            "Authorization": f"Bearer {EMBAPI_KEY}",
            "Content-Type": "application/json"
        },
        json={"vector": query_vector}
    )
    return result.json()['results']

def retrieve_context(doc_ids):
    """Retrieve full document content"""
    docs = []
    for doc_id in doc_ids:
        response = requests.get(
            f"{EMBAPI_URL}/v1/embeddings/alice/my-documents/{doc_id}",
            headers={"Authorization": f"Bearer {EMBAPI_KEY}"}
        )
        docs.append(response.json())
    return docs

def rag_query(query):
    """Complete RAG workflow"""
    # Search for similar documents
    similar = search_similar(query, count=3)
    # Retrieve context
    context_docs = retrieve_context([r['id'] for r in similar])
    context = "\n\n".join([doc['text'] for doc in context_docs])
    # Generate answer with LLM
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer based on the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ]
    )
    return response.choices[0].message.content

# Usage
embed_and_store("doc001", "The quick brown fox jumps over the lazy dog",
                {"category": "animals"})
answer = rag_query("What animals are mentioned?")
print(answer)
Best Practices#
- Batch Upload: Upload embeddings in batches of 100-1000 for better performance
- Use Metadata: Include rich metadata for better filtering and organization
- Set Thresholds: Use similarity thresholds (e.g., 0.7) to filter low-quality matches
- Cache Embeddings: Cache generated embeddings to avoid redundant API calls (a sketch follows this list)
- Monitor Dimensions: Ensure all embeddings use consistent dimensions (3072 for text-embedding-3-large)
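For the caching point, even an in-memory dictionary keyed by text plus model settings avoids re-embedding unchanged content. A minimal sketch, assuming the OpenAI client configuration used throughout this guide:

```python
import hashlib
import openai

client = openai.OpenAI(api_key="your-openai-key")
_embedding_cache = {}

def cached_embedding(text, model="text-embedding-3-large", dimensions=3072):
    """Return a cached embedding; only call the API for unseen text/model/dimensions."""
    key = hashlib.sha256(f"{model}:{dimensions}:{text}".encode()).hexdigest()
    if key not in _embedding_cache:
        response = client.embeddings.create(model=model, input=text, dimensions=dimensions)
        _embedding_cache[key] = response.data[0].embedding
    return _embedding_cache[key]
```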
Advanced Features#
Metadata Filtering#
Exclude certain documents from search results using metadata filters:
# Exclude documents from the same author as the query
curl -X GET "https://api.example.com/v1/similars/alice/my-documents/doc001?metadata_path=author&metadata_value=Alice" \
  -H "Authorization: Bearer alice_api_key"
See the Metadata Filtering Guide for more details.
Metadata Validation#
Enforce consistent metadata structure using JSON Schema validation:
curl -X PATCH "https://api.example.com/v1/projects/alice/my-documents" \
  -H "Authorization: Bearer alice_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "metadataScheme": "{\"type\":\"object\",\"properties\":{\"author\":{\"type\":\"string\"},\"category\":{\"type\":\"string\"}},\"required\":[\"author\"]}"
  }'
See the Metadata Validation Guide for more details.
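Hand-escaping the schema string is error-prone; building the schema as a Python dict and serializing it with json.dumps is safer. A sketch, assuming metadataScheme accepts a JSON Schema serialized as a string, as in the curl call above:

```python
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "author": {"type": "string"},
        "category": {"type": "string"},
    },
    "required": ["author"],
}
resp = requests.patch(
    "https://api.example.com/v1/projects/alice/my-documents",
    headers={"Authorization": "Bearer alice_api_key"},
    json={"metadataScheme": json.dumps(schema)},  # the schema travels as a string
)
```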
Related Documentation#
- Batch Operations Guide - Efficiently upload large datasets
- Metadata Filtering Guide - Advanced search filtering
- Metadata Validation Guide - Schema validation
- Instance Management Guide - Managing LLM service instances
Troubleshooting#
Dimension Mismatch Error#
{
  "title": "Bad Request",
  "status": 400,
  "detail": "dimension validation failed: vector dimension mismatch"
}
Solution: Ensure the vector_dim field matches the dimensions configured in your LLM service instance.
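A cheap guard is to check the vector length client-side before uploading. This sketch assumes the 3072-dimension configuration used throughout this guide:

```python
EXPECTED_DIM = 3072  # must match the "dimensions" of the LLM service instance

def check_dimensions(vector):
    """Raise early instead of getting a 400 from the API."""
    if len(vector) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vector)}")
```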
No Similar Results#
If searches return no results, try:
- Lowering the similarity threshold (e.g., from 0.8 to 0.5)
- Increasing the count parameter
- Verifying embeddings are uploaded correctly (a quick check is sketched below)
- Checking that query embeddings use the same model and dimensions
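For the verification point, fetching a known text_id right after upload confirms the record exists and that the stored dimension is what you expect; the endpoint is the one from Step 6 (handles and key are the placeholders from earlier examples):

```python
import requests

# Fetch one uploaded record and sanity-check its stored dimension
resp = requests.get(
    "https://api.example.com/v1/embeddings/alice/my-documents/doc001",
    headers={"Authorization": "Bearer alice_api_key"},
)
doc = resp.json()
print(doc["text_id"], doc["vector_dim"])  # expect: doc001 3072
```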