Architecture#
embapi is a vector database API designed for RAG (Retrieval Augmented Generation) workflows in Digital Humanities research.
System Overview#
┌─────────────┐
│   Client    │
│ Application │
└──────┬──────┘
       │ HTTP/REST
       │
┌──────▼──────────────────────────┐
│     embapi API Server           │
│  ┌──────────────────────────┐   │
│  │  Authentication Layer    │   │
│  └────────┬─────────────────┘   │
│  ┌────────▼─────────────────┐   │
│  │   Request Handlers       │   │
│  │   (Users, Projects, etc) │   │
│  └────────┬─────────────────┘   │
│  ┌────────▼─────────────────┐   │
│  │   Validation Layer       │   │
│  │   (Dimensions, Metadata) │   │
│  └────────┬─────────────────┘   │
│  ┌────────▼─────────────────┐   │
│  │   SQLC Queries           │   │
│  │   (Type-safe SQL)        │   │
│  └────────┬─────────────────┘   │
└───────────┼─────────────────────┘
            │
    ┌───────▼──────────────┐
    │  PostgreSQL 16+      │
    │  with pgvector 0.7   │
    │                      │
    │  ┌────────────────┐  │
    │  │  Vector Index  │  │
    │  │ (HNSW/IVFFlat) │  │
    │  └────────────────┘  │
    └──────────────────────┘
Core Components#
API Layer#
Built with the Huma framework on top of Go's http.ServeMux:
- OpenAPI documentation generation
- Automatic request/response validation
- JSON schema support
- REST endpoint routing
Authentication#
Token-based authentication using API keys:
- Admin key: For administrative operations (user creation, system management)
- User keys: SHA-256 hashed, unique per user
- Bearer token: Transmitted in the Authorization header
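As a rough illustration of the scheme above, the sketch below extracts a bearer token from an Authorization header value and hashes it with SHA-256. The function names `bearerToken` and `hashKey` are hypothetical, and hex encoding of the digest is an assumption; embapi's actual handler code and storage format may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// bearerToken extracts the token from an Authorization header value of the
// form "Bearer <token>". (Hypothetical helper, not embapi's actual code.)
func bearerToken(header string) (string, error) {
	scheme, token, ok := strings.Cut(header, " ")
	token = strings.TrimSpace(token)
	if !ok || !strings.EqualFold(scheme, "Bearer") || token == "" {
		return "", errors.New("missing or malformed bearer token")
	}
	return token, nil
}

// hashKey returns the hex-encoded SHA-256 digest of an API key, so that only
// the digest needs to be stored and compared. Hex encoding is an assumption.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	token, err := bearerToken("Bearer my-secret-key")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The digest, not the key itself, would be looked up in the database.
	fmt.Println(hashKey(token))
}
```

When comparing digests in application code rather than in a database lookup, a constant-time comparison such as `crypto/subtle.ConstantTimeCompare` is the usual choice.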
Data Storage#
PostgreSQL with pgvector extension:
- Vector storage: Native pgvector support for embeddings
- Vector search: Cosine similarity search using pgvector's <=> (cosine distance) operator
- ACID compliance: Transactional consistency
- Relational integrity: Foreign keys and constraints
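For reference, pgvector's `<=>` operator returns the cosine distance, which is 1 minus the cosine similarity. The sketch below computes the same quantity in plain Go to make the relationship concrete; `cosineDistance` is a hypothetical helper for illustration only, since in embapi the calculation happens inside PostgreSQL.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(a, b), the quantity that pgvector's <=>
// operator computes. (Illustrative helper, not part of embapi.)
func cosineDistance(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine distance is undefined for zero vectors")
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB)), nil
}

func main() {
	// Orthogonal vectors have cosine similarity 0, so distance 1.
	d, err := cosineDistance([]float64{1, 0}, []float64{0, 1})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d)
}
```

A distance of 0 means the vectors point in the same direction and 2 means they point in opposite directions, so a similarity score can be recovered as 1 minus the distance.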
Code Generation#
Uses sqlc for type-safe database queries:
- SQL queries → Go functions
- Compile-time type checking
- No ORM overhead
- Direct PostgreSQL integration
Data Model#
Core Entities#
users
├── projects (1:many)
│ ├── embeddings (1:many)
│ └── instance (1:1)
│
└── instances (1:many)
└── definition (many:1, optional)
_system (special user)
    └── definitions (1:many)
Key Relationships#
Users → Projects
- One user owns many projects
- Projects can be shared with other users (reader/editor roles)
- Projects can be public (unauthenticated read access)
Projects → Instances
- Each project references exactly one LLM service instance
- Instance defines embedding dimensions and configuration
Projects → Embeddings
- One project contains many embeddings
- Each embedding has a unique text_id within the project
- Embeddings store vector, metadata, and optional text
Users → Instances
- Users own their instances
- Instances can be shared with other users
- Instances store encrypted API keys
Instances → Definitions
- Instances can optionally reference a definition (template)
- System definitions (_system owner) provide defaults
- User definitions allow custom templates
Request Flow#
1. Create Embedding#
Client Request
↓
Authentication Middleware
↓
Authorization Check (owner/editor?)
↓
Dimension Validation (vector_dim matches instance?)
↓
Metadata Validation (matches project schema?)
↓
Database Insert (with transaction)
↓
Response
2. Similarity Search#
Client Request (text_id or vector)
↓
Authentication Middleware (or public check)
↓
Authorization Check (owner/reader/public?)
↓
Dimension Validation (if raw vector)
↓
Vector Similarity Query
├── Cosine distance calculation
├── Threshold filtering
├── Metadata filtering (exclude matches)
└── Limit/offset pagination
↓
Results (sorted by similarity)
↓
Response
Storage Architecture#
Vector Index#
pgvector supports multiple index types:
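- IVFFlat: Faster build and lower memory use, but a weaker speed/recall trade-off
- HNSW: Slower build and higher memory use, but a better speed/recall trade-off
Both index types perform approximate nearest-neighbor search. The current implementation uses HNSW for its better recall.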
Vector Storage Format#
CREATE TABLE embeddings (
    embedding_id SERIAL PRIMARY KEY,
    text_id      TEXT NOT NULL,
    project_id   INT REFERENCES projects,
    vector       vector(3072), -- Dimension varies
    vector_dim   INT NOT NULL,
    metadata     JSONB,
    text         TEXT,
    ...
)
Index Strategy#
CREATE INDEX embedding_vector_idx
    ON embeddings
    USING hnsw (vector vector_cosine_ops);
Optimized for cosine similarity searches.
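To show how a query can take advantage of this index, here is a minimal sketch of a similarity search statement and a helper that renders a Go slice in pgvector's text input format (`[x,y,z]`). The query text, `searchSQL`, and `vectorLiteral` are illustrative assumptions based on the table definition above, not embapi's actual sqlc queries.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// searchSQL is an illustrative nearest-neighbor query (an assumption, not
// embapi's actual query). Ordering by the distance expression in ascending
// order with a LIMIT is the form that lets pgvector use the index.
const searchSQL = `
SELECT text_id, 1 - (vector <=> $1::vector) AS similarity
FROM embeddings
WHERE project_id = $2
ORDER BY vector <=> $1::vector
LIMIT $3`

// vectorLiteral renders a slice in pgvector's text input format,
// for example "[0.25,-1,3]", suitable for passing as parameter $1.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	fmt.Println(vectorLiteral([]float32{0.25, -1, 3}))
	fmt.Println(searchSQL)
}
```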
Security Architecture#
API Key Encryption#
- Algorithm: AES-256-GCM
- Key Source: ENCRYPTION_KEY environment variable
- Key Derivation: SHA-256 hash of the variable's value, which yields the 32-byte key AES-256 requires
- Storage: Binary (BYTEA) in database
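The scheme above can be sketched with Go's standard library as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and storing the random nonce as a prefix of the ciphertext is a common convention that embapi may or may not follow.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// newGCM derives a 32-byte key from the secret via SHA-256 and returns an
// AES-256-GCM AEAD. (Hypothetical helper, not embapi's actual code.)
func newGCM(secret string) (cipher.AEAD, error) {
	key := sha256.Sum256([]byte(secret))
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext || tag, suitable for a BYTEA column.
// Prefixing the nonce is an assumed layout.
func encrypt(secret string, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(secret)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext to its first argument, so the nonce ends
	// up at the front of the returned slice.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and authenticates and decrypts the rest.
func decrypt(secret string, data []byte) ([]byte, error) {
	gcm, err := newGCM(secret)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	// In a real deployment the secret would come from os.Getenv("ENCRYPTION_KEY").
	blob, err := encrypt("example-secret", []byte("sk-example-api-key"))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt("example-secret", blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

Because GCM is an authenticated mode, decryption with the wrong key or a tampered ciphertext returns an error rather than garbage plaintext.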
Access Control#
Three-tier access model:
- Owner: Full control (read, write, delete, share, transfer)
- Editor: Read and write embeddings
- Reader: Read-only access to embeddings and search
Special access:
- Admin: System-wide operations (user management, sanity checks)
- Public: Unauthenticated read access (if public_read=true)
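One way to picture the access model is as an ordered set of roles with a minimum role per action. The sketch below is hypothetical: the type names, the grouping of delete, share, and transfer into a single `ActionManage`, and the `allowed` function are assumptions made for illustration, not embapi's actual authorization code.

```go
package main

import "fmt"

// Role is ordered so that a higher value implies all lower privileges.
type Role int

const (
	RoleNone Role = iota // no relationship to the project
	RoleReader
	RoleEditor
	RoleOwner
)

// Action groups the operations named in the access model.
type Action int

const (
	ActionRead   Action = iota // read embeddings, run searches
	ActionWrite                // create or update embeddings
	ActionManage               // delete, share, transfer (assumed grouping)
)

// minRole maps each action to the lowest role allowed to perform it.
var minRole = map[Action]Role{
	ActionRead:   RoleReader,
	ActionWrite:  RoleEditor,
	ActionManage: RoleOwner,
}

// allowed reports whether a caller with the given role may perform the
// action. Public projects grant read access to everyone.
func allowed(role Role, action Action, publicRead bool) bool {
	if action == ActionRead && publicRead {
		return true
	}
	need, ok := minRole[action]
	return ok && role >= need
}

func main() {
	fmt.Println(allowed(RoleEditor, ActionWrite, false)) // editor may write
	fmt.Println(allowed(RoleNone, ActionRead, true))     // public project
	fmt.Println(allowed(RoleReader, ActionWrite, false)) // reader may not write
}
```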
Data Isolation#
- Users can only access their own resources or shared resources
- Cross-user queries are prevented at the database level
- Project ownership enforced via foreign keys
Migration System#
Uses tern for database migrations:
migrations/
├── 001_create_initial_scheme.sql
├── 002_create_emb_index.sql
├── 003_add_public_read_flag.sql
└── 004_refactor_llm_services_architecture.sql
Migrations run automatically on startup with rollback support.
Performance Characteristics#
Vector Search Performance#
- Small datasets (<10K embeddings): <10ms per query
- Medium datasets (10K-100K): 10-50ms per query
- Large datasets (>100K): 50-200ms per query
Performance depends on:
- Vector dimensions
- Index type and parameters
- Hardware (CPU, RAM, disk)
- Number of results requested
Scaling Considerations#
Vertical Scaling:
- More RAM = faster searches (more vectors in memory)
- Faster CPUs = faster vector comparisons
- SSD storage = faster index scans
Horizontal Scaling:
- Read replicas for search queries
- Separate write/read workloads
- Connection pooling for concurrent requests
Technology Stack#
Core Technologies#
- Language: Go 1.21+
- Web Framework: Huma 2.x
- Database: PostgreSQL 16+
- Vector Extension: pgvector 0.7.4
- Query Generator: sqlc 1.x
- Migration Tool: tern 2.x
Development Tools#
- Testing: Go standard library + testcontainers
- Documentation: OpenAPI 3.0 (auto-generated)
- Building: Docker multi-stage builds
- Deployment: Docker Compose
Design Principles#
1. Type Safety#
- sqlc generates type-safe Go code from SQL
- Parameterized queries prevent SQL injection
- Compile-time validation of queries
2. Simplicity#
- REST API (not GraphQL)
- Straightforward URL patterns
- Standard HTTP methods
3. Security#
- API key encryption at rest
- No API keys in responses
- Role-based access control
4. Validation#
- Automatic dimension validation
- Optional metadata schema validation
- Request/response validation via OpenAPI
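As an illustration of the dimension check, the sketch below compares the declared vector_dim, the actual vector length, and the dimensions configured on the project's instance. The function name, signature, and error messages are assumptions for illustration; embapi's actual validation layer may be structured differently.

```go
package main

import "fmt"

// validateDimensions checks that the vector's actual length matches the
// declared vector_dim, and that vector_dim matches the dimensions of the
// project's instance. (Hypothetical helper, not embapi's actual code.)
func validateDimensions(vector []float32, vectorDim, instanceDim int) error {
	if len(vector) != vectorDim {
		return fmt.Errorf("vector has %d elements but vector_dim is %d",
			len(vector), vectorDim)
	}
	if vectorDim != instanceDim {
		return fmt.Errorf("vector_dim %d does not match instance dimensions %d",
			vectorDim, instanceDim)
	}
	return nil
}

func main() {
	// A 3-element vector submitted to a project whose instance expects 3072.
	err := validateDimensions([]float32{0.1, 0.2, 0.3}, 3, 3072)
	fmt.Println(err)
}
```

Rejecting mismatched vectors before the insert keeps the error message specific, instead of surfacing a lower-level database error from the vector column.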
5. Extensibility#
- User-defined metadata schemas
- Custom LLM service configurations
- Flexible sharing model
Limitations#
Current Constraints#
- No multi-tenancy: Each installation is single-tenant
- No replication: Manual setup required for HA
- No caching: All queries hit database
- Synchronous API: No async/batch upload endpoints
Future Enhancements#
See Roadmap for planned improvements.