Performance Optimization Guide#
This guide covers performance optimization strategies for embapi, including query optimization, indexing, caching, and performance testing.
Query Optimization#
GetAllAccessibleInstances Query#
Problem: The original implementation uses a LEFT JOIN with OR conditions, which can result in inefficient query execution.
Current Implementation:
SELECT instances.*,
COALESCE(instances_shared_with.role, 'owner') as role,
(instances.owner = $1) as is_owner
FROM instances
LEFT JOIN instances_shared_with
ON instances.instance_id = instances_shared_with.instance_id
WHERE instances.owner = $1
OR instances_shared_with.user_handle = $1
ORDER BY instances.owner ASC, instances.instance_handle ASC
LIMIT $2 OFFSET $3;
Issue: The query planner may struggle to use indexes effectively with LEFT JOIN combined with OR conditions in the WHERE clause.
Recommended: UNION ALL Pattern#
Use UNION ALL to separate owned instances from shared instances:
-- Get owned instances
SELECT instances.*,
'owner' as role,
true as is_owner
FROM instances
WHERE instances.owner = $1
UNION ALL
-- Get shared instances
SELECT instances.*,
instances_shared_with.role,
false as is_owner
FROM instances
INNER JOIN instances_shared_with
ON instances.instance_id = instances_shared_with.instance_id
WHERE instances_shared_with.user_handle = $1
AND instances.owner != $1 -- Avoid duplicates
ORDER BY owner ASC, instance_handle ASC
LIMIT $2 OFFSET $3;
Benefits:
- Separate index scans: Query planner can use a different index for each UNION branch
- Owned instances: Can use an index on (owner)
- Shared instances: Can use an index on (user_handle)
- Clearer execution plan: Easier to understand and optimize
- Better performance: Especially noticeable with large datasets
Trade-offs:
- Slightly more complex SQL
- Rows must be deduplicated if a user somehow has an instance both owned and shared (an unlikely scenario, handled by the owner != $1 filter)
- Both branches must return the same column structure
Implementation Example#
Before (queries.sql):
-- name: GetAllAccessibleInstances :many
SELECT instances.*, ...
FROM instances
LEFT JOIN instances_shared_with ON ...
WHERE instances.owner = $1 OR instances_shared_with.user_handle = $1
After (queries.sql):
-- name: GetAllAccessibleInstances :many
SELECT instances.*, 'owner' as role, true as is_owner
FROM instances
WHERE instances.owner = $1
UNION ALL
SELECT instances.*, isw.role, false as is_owner
FROM instances
INNER JOIN instances_shared_with isw ON instances.instance_id = isw.instance_id
WHERE isw.user_handle = $1 AND instances.owner != $1
ORDER BY owner, instance_handle
LIMIT $2 OFFSET $3;
When to optimize:
- The current implementation is correct and works well for small-to-medium datasets
- Consider optimization if performance becomes an issue with:
- Large numbers of instances (>1000)
- Many shared relationships (>100 shares per user)
- Query time consistently >100ms
Always profile first:
EXPLAIN ANALYZE
SELECT instances.*, ...
FROM instances
LEFT JOIN instances_shared_with ON ...
WHERE instances.owner = 'alice' OR instances_shared_with.user_handle = 'alice';
Index Optimization#
Current Indexes#
From migration 004_refactor_llm_services_architecture.sql:
-- API Standards / Definitions
CREATE INDEX idx_definitions_handle
ON definitions(definition_handle);
CREATE INDEX idx_definitions_owner_handle
ON definitions(owner, definition_handle);
-- Instances
CREATE INDEX idx_instances_handle
ON instances(instance_handle);
-- Sharing (implicit from PRIMARY KEY)
-- instances_shared_with(instance_id, user_handle)
Recommended Additional Indexes#
1. Owner Lookups#
-- For queries: WHERE instances.owner = ?
CREATE INDEX idx_instances_owner
ON instances(owner);
Benefit: Fast retrieval of user’s owned instances
Use case:
SELECT * FROM instances WHERE owner = 'alice';
2. Shared Instance Lookups#
-- For queries: WHERE user_handle = ?
CREATE INDEX idx_instances_shared_user
ON instances_shared_with(user_handle);
Benefit: Fast retrieval of instances shared with user
Use case:
SELECT i.* FROM instances i
INNER JOIN instances_shared_with isw ON i.instance_id = isw.instance_id
WHERE isw.user_handle = 'bob';
3. Composite Owner+Handle Index#
-- For unique constraint and lookups
CREATE UNIQUE INDEX idx_instances_owner_handle
ON instances(owner, instance_handle);
Benefit: Enforces uniqueness and enables index-only scans
Use case:
SELECT * FROM instances
WHERE owner = 'alice' AND instance_handle = 'my-service';
4. Embedding Dimension Filtering#
-- Critical for similarity search performance
CREATE INDEX idx_embeddings_project_dim
ON embeddings(project_id, vector_dim);
Benefit: Filters embeddings by dimension before vector comparison
Use case:
SELECT * FROM embeddings
WHERE project_id = 123
AND vector_dim = 1536
ORDER BY vector <=> $1::vector
LIMIT 10;
5. Project Ownership#
CREATE INDEX idx_projects_owner
ON projects(owner);
CREATE UNIQUE INDEX idx_projects_owner_handle
ON projects(owner, project_handle);
Index Analysis#
Check if indexes are being used:
-- Analyze query plan
EXPLAIN ANALYZE
SELECT * FROM instances WHERE owner = 'alice';
-- Check index usage statistics
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname = 'public';
Index Maintenance#
-- Update statistics for query planner
ANALYZE instances;
ANALYZE instances_shared_with;
ANALYZE embeddings;
-- Rebuild index if fragmented
REINDEX INDEX idx_instances_owner;
-- Check index size
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
Vector Index Optimization#
HNSW vs IVFFlat#
Current implementation uses HNSW:
CREATE INDEX embedding_vector_idx
ON embeddings
USING hnsw (vector vector_cosine_ops);
HNSW (Hierarchical Navigable Small World)#
Pros:
- Better recall (finds more similar results)
- Better query performance
- No training required
Cons:
- Slower index build time
- Higher memory usage
- Larger index size
Configuration options:
-- Default: m=16, ef_construction=64
CREATE INDEX embedding_vector_idx
ON embeddings
USING hnsw (vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Higher quality (slower build): m=32, ef_construction=128
CREATE INDEX embedding_vector_idx_hq
ON embeddings
USING hnsw (vector vector_cosine_ops)
WITH (m = 32, ef_construction = 128);
Parameters:
- m: Number of connections per layer (default 16, range 2-100)
- ef_construction: Size of the candidate list during index build (default 64, range 4-1000)
- Higher values = better recall, but slower builds and more memory
IVFFlat (Inverted File with Flat compression)#
Alternative for very large datasets:
CREATE INDEX embedding_vector_idx_ivf
ON embeddings
USING ivfflat (vector vector_cosine_ops)
WITH (lists = 100);
Pros:
- Faster index build
- Lower memory usage
- Smaller index size
Cons:
- Requires representative data before the index is built (k-means training runs at CREATE INDEX time)
- Lower recall than HNSW
- Need to tune the lists parameter
When to use:
- Dataset >1M embeddings
- Build time is critical
- Memory constrained environment
Configuration:
-- Rule of thumb: lists = sqrt(total_rows)
-- For 100K embeddings: lists = 316
-- For 1M embeddings: lists = 1000
-- Populate the table first: IVFFlat picks its k-means centroids
-- from existing rows at CREATE INDEX time. ANALYZE refreshes planner stats.
ANALYZE embeddings;
-- Create with appropriate lists
CREATE INDEX embedding_vector_idx_ivf
ON embeddings
USING ivfflat (vector vector_cosine_ops)
WITH (lists = 1000);
-- Set probes at query time
SET ivfflat.probes = 10; -- Default is 1, higher = better recall
Query-Time Optimization#
For HNSW, set hnsw.ef_search:
-- Default: ef_search = 40
-- Higher = better recall but slower queries
SET hnsw.ef_search = 100;
SELECT vector <=> $1::vector AS distance
FROM embeddings
WHERE project_id = 123
ORDER BY distance
LIMIT 10;
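Pooled connections are shared, so a bare SET hnsw.ef_search leaks the value to whichever request uses the connection next. A minimal Go sketch, assuming pgx v5 and the table layout above, that scopes the setting to a single transaction with SET LOCAL (function name and parameters are illustrative):
import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// searchWithRecall runs one similarity query with a transaction-scoped
// ef_search; SET LOCAL reverts when the transaction ends, so the setting
// never leaks to other requests sharing the pooled connection.
// vec is a pgvector literal such as "[0.1,0.2,...]".
func searchWithRecall(ctx context.Context, pool *pgxpool.Pool, projectID int64, vec string, efSearch int) ([]string, error) {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return nil, err
	}
	defer tx.Rollback(ctx) // no-op after a successful Commit

	// GUCs cannot be bound as $1; format the validated int directly.
	if _, err := tx.Exec(ctx, fmt.Sprintf("SET LOCAL hnsw.ef_search = %d", efSearch)); err != nil {
		return nil, err
	}
	rows, err := tx.Query(ctx, `
		SELECT text_id
		FROM embeddings
		WHERE project_id = $1
		ORDER BY vector <=> $2::vector
		LIMIT 10`, projectID, vec)
	if err != nil {
		return nil, err
	}
	ids, err := pgx.CollectRows(rows, pgx.RowTo[string])
	if err != nil {
		return nil, err
	}
	return ids, tx.Commit(ctx)
}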
Dimension-Specific Indexes#
For multi-dimensional projects:
-- Separate indexes per common dimension
CREATE INDEX idx_embeddings_768_vector
ON embeddings USING hnsw (vector vector_cosine_ops)
WHERE vector_dim = 768;
CREATE INDEX idx_embeddings_1536_vector
ON embeddings USING hnsw (vector vector_cosine_ops)
WHERE vector_dim = 1536;
CREATE INDEX idx_embeddings_3072_vector
ON embeddings USING hnsw (vector vector_cosine_ops)
WHERE vector_dim = 3072;
Benefit: Smaller indexes = faster queries for specific dimensions
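One caveat: the planner can only match a partial index when it can prove the vector_dim predicate at plan time. A hedged Go sketch that validates the dimension against the indexed set and inlines it as a literal; the helper name and query shape are assumptions, not the embapi implementation:
// similarByDim inlines the validated dimension as a literal so the
// WHERE vector_dim = N predicate provably matches one partial index.
func similarByDim(ctx context.Context, pool *pgxpool.Pool, projectID int64, vec string, dim int) (pgx.Rows, error) {
	switch dim {
	case 768, 1536, 3072: // dimensions with a partial index above
	default:
		return nil, fmt.Errorf("no partial index for dimension %d", dim)
	}
	query := fmt.Sprintf(`
		SELECT text_id, vector <=> $2::vector AS distance
		FROM embeddings
		WHERE project_id = $1 AND vector_dim = %d
		ORDER BY distance
		LIMIT 10`, dim)
	return pool.Query(ctx, query, projectID, vec)
}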
Caching Strategies#
1. System Definitions Cache#
System definitions rarely change:
var (
systemDefsCache []models.Definition
systemDefsCacheMu sync.RWMutex
systemDefsCacheTTL = 5 * time.Minute
systemDefsCacheExp time.Time
)
func GetSystemDefinitions(ctx context.Context, pool *pgxpool.Pool) ([]models.Definition, error) {
systemDefsCacheMu.RLock()
if time.Now().Before(systemDefsCacheExp) && systemDefsCache != nil {
defer systemDefsCacheMu.RUnlock()
return systemDefsCache, nil
}
systemDefsCacheMu.RUnlock()
// Fetch from database
defs, err := db.GetDefinitionsByOwner(ctx, "_system")
if err != nil {
return nil, err
}
// Update cache
systemDefsCacheMu.Lock()
systemDefsCache = defs
systemDefsCacheExp = time.Now().Add(systemDefsCacheTTL)
systemDefsCacheMu.Unlock()
return defs, nil
}
2. User Instances Cache#
Cache a user’s instance list with a short TTL:
type InstanceCache struct {
data map[string][]models.Instance
mu sync.RWMutex
ttl time.Duration
exp map[string]time.Time
}
func (c *InstanceCache) Get(userHandle string) ([]models.Instance, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
if exp, ok := c.exp[userHandle]; ok && time.Now().Before(exp) {
return c.data[userHandle], true
}
return nil, false
}
func (c *InstanceCache) Set(userHandle string, instances []models.Instance) {
c.mu.Lock()
defer c.mu.Unlock()
c.data[userHandle] = instances
c.exp[userHandle] = time.Now().Add(c.ttl)
}
func (c *InstanceCache) Invalidate(userHandle string) {
c.mu.Lock()
defer c.mu.Unlock()
delete(c.data, userHandle)
delete(c.exp, userHandle)
}
Usage:
var instanceCache = &InstanceCache{
data: make(map[string][]models.Instance),
exp: make(map[string]time.Time),
ttl: 30 * time.Second,
}
func GetUserInstances(ctx context.Context, pool *pgxpool.Pool, userHandle string) ([]models.Instance, error) {
// Check cache
if instances, ok := instanceCache.Get(userHandle); ok {
return instances, nil
}
// Query database
instances, err := db.GetAccessibleInstances(ctx, userHandle)
if err != nil {
return nil, err
}
// Cache results
instanceCache.Set(userHandle, instances)
return instances, nil
}
3. Project Metadata Cache#
Cache project metadata including schema:
type ProjectCache struct {
projects map[string]*models.Project // key: "owner/handle"
mu sync.RWMutex
ttl time.Duration
}
func (c *ProjectCache) Get(owner, handle string) (*models.Project, bool) {
key := fmt.Sprintf("%s/%s", owner, handle)
c.mu.RLock()
defer c.mu.RUnlock()
project, ok := c.projects[key]
return project, ok
}
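The sketch above shows only the read path; a minimal completion, assuming ProjectCache also gains an exp map[string]time.Time field (mirroring InstanceCache) that Get checks before returning a hit:
func (c *ProjectCache) Set(owner, handle string, project *models.Project) {
	key := fmt.Sprintf("%s/%s", owner, handle)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.projects[key] = project
	c.exp[key] = time.Now().Add(c.ttl) // assumes an exp map[string]time.Time field
}

func (c *ProjectCache) Invalidate(owner, handle string) {
	key := fmt.Sprintf("%s/%s", owner, handle)
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.projects, key)
	delete(c.exp, key)
}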
4. Redis-Based Caching#
For distributed deployments:
import "github.com/go-redis/redis/v8"
type RedisCache struct {
client *redis.Client
ttl time.Duration
}
func (c *RedisCache) GetProject(ctx context.Context, owner, handle string) (*models.Project, error) {
key := fmt.Sprintf("project:%s:%s", owner, handle)
data, err := c.client.Get(ctx, key).Bytes()
if err == redis.Nil {
return nil, nil // Not in cache
} else if err != nil {
return nil, err
}
var project models.Project
err = json.Unmarshal(data, &project)
return &project, err
}
func (c *RedisCache) SetProject(ctx context.Context, project *models.Project) error {
key := fmt.Sprintf("project:%s:%s", project.Owner, project.Handle)
data, err := json.Marshal(project)
if err != nil {
return err
}
return c.client.Set(ctx, key, data, c.ttl).Err()
}
Cache Invalidation#
Always invalidate cache on updates:
func UpdateProject(ctx context.Context, pool *pgxpool.Pool, project *models.Project) error {
// Update database
err := db.UpdateProject(ctx, project)
if err != nil {
return err
}
// Invalidate cache
projectCache.Invalidate(project.Owner, project.Handle)
return nil
}
Connection Pool Optimization#
Pool Configuration#
func InitDB(opts *models.Options) *pgxpool.Pool {
	// connString is assembled from opts (details elided)
	config, err := pgxpool.ParseConfig(connString)
	if err != nil {
		log.Fatal(err)
	}
	// Connection pool settings
	config.MaxConns = 20                     // Max concurrent connections
	config.MinConns = 5                      // Keep-alive connections
	config.MaxConnLifetime = time.Hour       // Recycle connections
	config.MaxConnIdleTime = 5 * time.Minute // Close idle connections
	config.HealthCheckPeriod = time.Minute   // Health check frequency
	// Statement cache (pgx v5)
	config.ConnConfig.StatementCacheCapacity = 100
	pool, err := pgxpool.NewWithConfig(context.Background(), config)
	if err != nil {
		log.Fatal(err)
	}
	return pool
}
Pool Sizing#
General rule:
MaxConns = (available_cores * 2) + effective_spindle_count
Example scenarios (a Go sketch of the rule follows the list):
- 4-core CPU, SSD: MaxConns = 10-20
- 8-core CPU, SSD: MaxConns = 20-40
- Under heavy load: Start conservative, increase based on monitoring
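A minimal Go sketch of the rule, assuming runtime.NumCPU() reflects the cores available to the database and an effective spindle count of 1 for SSDs:
import "runtime"

// poolSize applies the (cores * 2) + spindles rule of thumb.
// Treat the result as a starting point and tune from pool monitoring.
func poolSize() int32 {
	const effectiveSpindles = 1 // SSD approximation
	n := int32(runtime.NumCPU())*2 + effectiveSpindles
	if n < 4 {
		n = 4 // small floor for bursty workloads
	}
	return n
}

// Usage: config.MaxConns = poolSize()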
Monitor Pool Usage#
func monitorPool(pool *pgxpool.Pool) {
	// Run in a goroutine: go monitorPool(pool)
	ticker := time.NewTicker(30 * time.Second)
	for range ticker.C {
		stat := pool.Stat()
		log.Printf("Pool stats: total=%d, idle=%d, acquired=%d, max=%d",
			stat.TotalConns(),
			stat.IdleConns(),
			stat.AcquiredConns(),
			stat.MaxConns(),
		)
	}
}
Performance Testing#
Load Testing Setup#
Use vegeta for HTTP load testing:
# Install vegeta
go install github.com/tsenart/vegeta/v12@latest
# Create targets file
cat > targets.txt <<EOF
GET http://localhost:8880/v1/projects/alice
Authorization: Bearer alice_api_key
POST http://localhost:8880/v1/similars/alice/project1/doc1
Authorization: Bearer alice_api_key
Content-Type: application/json
GET http://localhost:8880/v1/embeddings/alice/project1?limit=100
Authorization: Bearer alice_api_key
EOF
# Run load test against the targets file
vegeta attack -targets=targets.txt -duration=60s -rate=100 | vegeta report
# Or attack a single ad-hoc target
echo "GET http://localhost:8880/v1/projects/alice" | \
vegeta attack -duration=60s -rate=100 -header="Authorization: Bearer key" | \
vegeta report
Benchmark Tests#
Create Go benchmarks:
func BenchmarkGetAccessibleInstances(b *testing.B) {
pool := setupBenchmarkDB(b)
defer pool.Close()
// Create test data
createTestInstances(b, pool, 1000)
ctx := context.Background()
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, err := db.GetAllAccessibleInstances(ctx, "testuser", 10, 0)
if err != nil {
b.Fatal(err)
}
}
}
func BenchmarkSimilaritySearch(b *testing.B) {
pool := setupBenchmarkDB(b)
defer pool.Close()
// Create embeddings
createTestEmbeddings(b, pool, 10000)
ctx := context.Background()
queryVector := generateRandomVector(1536)
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, err := SearchSimilar(ctx, pool, "alice", "project1", queryVector, 10, 0.5)
if err != nil {
b.Fatal(err)
}
}
}
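setupBenchmarkDB and createTestInstances above are assumed helpers; a minimal sketch of the latter, with column names inferred from the SQL examples earlier in this guide:
// createTestInstances seeds n instances owned by "testuser".
func createTestInstances(b *testing.B, pool *pgxpool.Pool, n int) {
	b.Helper()
	ctx := context.Background()
	for i := 0; i < n; i++ {
		_, err := pool.Exec(ctx,
			`INSERT INTO instances (owner, instance_handle) VALUES ($1, $2)`,
			"testuser", fmt.Sprintf("bench-instance-%d", i))
		if err != nil {
			b.Fatal(err)
		}
	}
}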
Run benchmarks:
# Run all benchmarks
go test -bench=. -benchmem ./...
# Run specific benchmark
go test -bench=BenchmarkSimilaritySearch -benchmem ./internal/handlers
# Compare before/after
go test -bench=. -benchmem ./... > old.txt
# Make changes
go test -bench=. -benchmem ./... > new.txt
benchstat old.txt new.txt  # benchstat (golang.org/x/perf) supersedes the older benchcmp
Database Performance Testing#
Test with realistic data:
-- Generate test data
INSERT INTO embeddings (project_id, text_id, vector, vector_dim, metadata)
SELECT
1,
'doc_' || generate_series,
array_fill(random()::real, ARRAY[1536])::vector, -- one random value repeated per row: fine for plan/volume tests, not recall tests
1536,
'{"author": "test"}'::jsonb
FROM generate_series(1, 100000);
-- Test query performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT text_id, vector <=> $1::vector AS distance
FROM embeddings
WHERE project_id = 1 AND vector_dim = 1536
ORDER BY distance
LIMIT 10;
Metrics and Monitoring#
Application Metrics#
Track key metrics:
type Metrics struct {
QueryDuration prometheus.Histogram
QueryCount prometheus.Counter
CacheHits prometheus.Counter
CacheMisses prometheus.Counter
PoolWaitDuration prometheus.Histogram
}
func recordQueryMetrics(start time.Time, query string) {
duration := time.Since(start).Seconds()
metrics.QueryDuration.Observe(duration)
metrics.QueryCount.Inc()
}
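The Metrics struct must be constructed and registered once at startup; a sketch using the standard prometheus Go client (metric names are illustrative):
import "github.com/prometheus/client_golang/prometheus"

func newMetrics(reg prometheus.Registerer) *Metrics {
	m := &Metrics{
		QueryDuration: prometheus.NewHistogram(prometheus.HistogramOpts{
			Name: "embapi_query_duration_seconds", Help: "Database query latency.",
			Buckets: prometheus.DefBuckets,
		}),
		QueryCount:  prometheus.NewCounter(prometheus.CounterOpts{Name: "embapi_queries_total", Help: "Total queries."}),
		CacheHits:   prometheus.NewCounter(prometheus.CounterOpts{Name: "embapi_cache_hits_total", Help: "Cache hits."}),
		CacheMisses: prometheus.NewCounter(prometheus.CounterOpts{Name: "embapi_cache_misses_total", Help: "Cache misses."}),
		PoolWaitDuration: prometheus.NewHistogram(prometheus.HistogramOpts{
			Name: "embapi_pool_wait_seconds", Help: "Wait time to acquire a pooled connection.",
		}),
	}
	reg.MustRegister(m.QueryDuration, m.QueryCount, m.CacheHits, m.CacheMisses, m.PoolWaitDuration)
	return m
}

// Usage: metrics := newMetrics(prometheus.DefaultRegisterer)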
PostgreSQL Metrics#
Monitor database performance:
-- Slow queries
SELECT
query,
calls,
total_exec_time, -- total_time before PostgreSQL 13
mean_exec_time,  -- mean_time before PostgreSQL 13
max_exec_time    -- max_time before PostgreSQL 13
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
-- Table statistics
SELECT
schemaname,
tablename,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup
FROM pg_stat_user_tables;
-- Index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Performance Targets#
Based on typical usage:
| Operation | Target | Acceptable | Action Required |
|---|---|---|---|
| Single instance lookup | < 10ms | < 50ms | > 50ms |
| List accessible instances (<100) | < 50ms | < 100ms | > 100ms |
| Create/update instance | < 100ms | < 200ms | > 200ms |
| Similarity search (10 results) | < 50ms | < 100ms | > 100ms |
| Similarity search (100 results) | < 100ms | < 200ms | > 200ms |
| Embedding insert (single) | < 50ms | < 100ms | > 100ms |
| Embedding batch (100) | < 500ms | < 1000ms | > 1000ms |
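These targets can double as a coarse regression guard in CI; a hedged Go sketch using the similarity-search threshold from the table (assumes the benchmark helpers accept a testing.TB; single-sample latency is noisy, so prefer benchstat over repeated runs for real decisions):
func TestSimilaritySearchLatency(t *testing.T) {
	pool := setupBenchmarkDB(t) // assumed to accept testing.TB
	defer pool.Close()

	ctx := context.Background()
	start := time.Now()
	if _, err := SearchSimilar(ctx, pool, "alice", "project1", generateRandomVector(1536), 10, 0.5); err != nil {
		t.Fatal(err)
	}
	if elapsed := time.Since(start); elapsed > 100*time.Millisecond {
		t.Errorf("similarity search took %v; table target is <100ms", elapsed)
	}
}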
Implementation Priority#
High Priority#
- Profile current performance with realistic data
- Add the dimension filtering index idx_embeddings_project_dim
- Monitor slow queries with pg_stat_statements
Medium Priority#
- Implement the UNION ALL optimization if GetAllAccessibleInstances consistently exceeds 100ms
- Add caching for system definitions
- Optimize connection pool settings based on load
Low Priority#
- Add Redis caching layer for high-traffic deployments
- Implement application metrics with Prometheus
- Add additional indexes based on actual query patterns
- Tune HNSW parameters for specific use cases
General Best Practices#
1. Measure Before Optimizing#
# Profile in production
EXPLAIN ANALYZE your_query;
# Load test
vegeta attack -duration=60s -rate=100
# Benchmark
go test -bench=. -benchmem
2. Start Conservative#
- Don’t optimize prematurely
- Use simple queries first
- Add complexity only when needed
3. Monitor Continuously#
- Track query performance
- Monitor connection pool
- Watch index usage
- Alert on slow queries
4. Document Optimizations#
- Note why optimization was needed
- Include before/after metrics
- Document trade-offs made