Database Setup#
embapi requires PostgreSQL 11 or later with the pgvector extension.
Requirements#
- PostgreSQL: Version 11 or higher
- pgvector: Extension for vector similarity search
- Storage: Depends on your embeddings volume (estimate: 4 bytes × dimensions × embeddings count)
Installing PostgreSQL with pgvector#
Using Docker (Recommended)#
The easiest way to get PostgreSQL with pgvector:
docker run -d \
--name postgres-pgvector \
-p 5432:5432 \
-e POSTGRES_PASSWORD=secure_password \
-v postgres_data:/var/lib/postgresql/data \
pgvector/pgvector:0.7.4-pg16On Ubuntu/Debian#
# Add PostgreSQL APT repository
sudo apt install curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
https://www.postgresql.org/media/keys/ACCC4CF8.asc
echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | \
sudo tee /etc/apt/sources.list.d/pgdg.list
# Install PostgreSQL
sudo apt update
sudo apt install postgresql-16 postgresql-contrib-16
# Install pgvector
sudo apt install postgresql-16-pgvectorOn macOS#
# Using Homebrew
brew install postgresql@16
brew install pgvector
# Start PostgreSQL
brew services start postgresql@16On RHEL/CentOS/Fedora#
# Install PostgreSQL
sudo dnf install postgresql16-server postgresql16-contrib
# Install pgvector (build from source)
sudo dnf install postgresql16-devel git gcc make
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install PG_CONFIG=/usr/pgsql-16/bin/pg_configDatabase Configuration#
Step 1: Create Database#
Connect to PostgreSQL as superuser:
# Local connection
sudo -u postgres psql
# Remote connection
psql -h db.example.com -U postgresCreate the database:
-- Create database
CREATE DATABASE embapi;Step 2: Create User#
Create a dedicated user for embapi:
-- Create user with password
CREATE USER embapi_user WITH PASSWORD 'secure_password_here';Best practices:
- Use strong, randomly generated password (e.g.,
openssl rand -base64 32) - Store password securely (password manager, secrets management)
- Rotate passwords regularly
Step 3: Grant Privileges#
Grant necessary permissions:
-- Grant database privileges
GRANT ALL PRIVILEGES ON DATABASE embapi TO embapi_user;
-- Connect to the database
\c embapi
-- Grant schema privileges
GRANT ALL ON SCHEMA public TO embapi_user;
-- For PostgreSQL 15+, also grant:
GRANT CREATE ON DATABASE embapi TO embapi_user;Step 4: Enable pgvector Extension#
Still connected to the embapi database:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify extension is installed
\dxExpected output should include:
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------
vector | 0.7.4 | public | vector data type and ivfflat and hnsw access methodsStep 5: Verify Setup#
Test the setup:
-- Test vector creation
SELECT '[1,2,3]'::vector;
-- Should return: [1,2,3]
-- Test vector distance
SELECT '[1,2,3]'::vector <=> '[4,5,6]'::vector AS distance;
-- Should return a float (distance value)Exit psql:
\qConnection String Format#
embapi connects using these environment variables:
SERVICE_DBHOST=localhost # Database hostname
SERVICE_DBPORT=5432 # Database port
SERVICE_DBUSER=embapi_user # Database username
SERVICE_DBPASSWORD=password # Database password
SERVICE_DBNAME=embapi # Database nameThe connection string format used internally:
postgresql://username:password@host:port/database?sslmode=disableProduction Configuration#
PostgreSQL Tuning#
Edit postgresql.conf for better vector search performance:
# Memory settings
shared_buffers = 4GB # 25% of RAM
effective_cache_size = 12GB # 75% of RAM
maintenance_work_mem = 1GB
work_mem = 50MB
# Parallel query settings
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
# Connection settings
max_connections = 100
# For pgvector specifically
shared_preload_libraries = 'vector' # Add if not present
# Write-ahead log
wal_level = replica # For replication
max_wal_senders = 3 # For replicasRestart PostgreSQL after changes:
# On systemd systems
sudo systemctl restart postgresql
# On Docker
docker restart postgres-containerConnection Pooling#
For high-load scenarios, use connection pooling with PgBouncer:
# Install PgBouncer
sudo apt install pgbouncer
# Configure /etc/pgbouncer/pgbouncer.ini
[databases]
embapi = host=localhost port=5432 dbname=embapi
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20Connect embapi to PgBouncer instead of PostgreSQL directly.
SSL/TLS Encryption#
Enable SSL in postgresql.conf:
ssl = on
ssl_cert_file = '/etc/postgresql/server.crt'
ssl_key_file = '/etc/postgresql/server.key'
ssl_ca_file = '/etc/postgresql/ca.crt'Update connection string:
# Update embapi configuration
SERVICE_DBHOST=db.example.com
# Change connection to use SSL (requires code modification or PostgreSQL parameter)Network Security#
Restrict access in pg_hba.conf:
# TYPE DATABASE USER ADDRESS METHOD
# Allow embapi_user from application server only
host embapi embapi_user 10.0.1.0/24 md5
# Allow localhost connections
local all all peer
host all all 127.0.0.1/32 md5
host all all ::1/128 md5Reload PostgreSQL:
sudo systemctl reload postgresqlBackup and Recovery#
Automated Backups#
Daily backup script:
#!/bin/bash
# /usr/local/bin/backup-embapi.sh
BACKUP_DIR="/backups/embapi"
DATE=$(date +%Y%m%d_%H%M%S)
DB_NAME="embapi"
# Create backup
pg_dump -U embapi_user -h localhost $DB_NAME | gzip > "$BACKUP_DIR/embapi-$DATE.sql.gz"
# Keep last 30 days
find $BACKUP_DIR -name "embapi-*.sql.gz" -mtime +30 -delete
# Verify backup
if [ $? -eq 0 ]; then
echo "Backup successful: $DATE"
else
echo "Backup failed: $DATE" >&2
exit 1
fiSet up cron job:
# Run daily at 2 AM
0 2 * * * /usr/local/bin/backup-embapi.shRestore from Backup#
# Decompress and restore
gunzip -c /backups/embapi/embapi-20240208.sql.gz | \
psql -U embapi_user -h localhost embapiPoint-in-Time Recovery (PITR)#
Enable WAL archiving in postgresql.conf:
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'Monitoring#
Check Database Size#
-- Database size
SELECT pg_size_pretty(pg_database_size('embapi'));
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;Check Connection Status#
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'embapi';
-- Connection details
SELECT
pid,
usename,
application_name,
client_addr,
state,
query
FROM pg_stat_activity
WHERE datname = 'embapi';Monitor Performance#
-- Enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- View slow queries
SELECT
query,
calls,
total_time,
mean_time,
max_time
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat_statements%'
ORDER BY mean_time DESC
LIMIT 10;Troubleshooting#
Cannot Connect to Database#
# Check if PostgreSQL is running
sudo systemctl status postgresql
# Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-16-main.log
# Test connection
psql -h localhost -U embapi_user -d embapipgvector Extension Not Found#
-- Check available extensions
SELECT * FROM pg_available_extensions WHERE name = 'vector';
-- If not listed, install pgvector package
-- See installation section abovePermission Denied#
-- Reconnect as superuser
\c embapi postgres
-- Re-grant privileges
GRANT ALL PRIVILEGES ON DATABASE embapi TO embapi_user;
GRANT ALL ON SCHEMA public TO embapi_user;
GRANT ALL ON ALL TABLES IN SCHEMA public TO embapi_user;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO embapi_user;
-- For future objects
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO embapi_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO embapi_user;Out of Disk Space#
# Check disk usage
df -h
# Check PostgreSQL data directory
du -sh /var/lib/postgresql/16/main/
# Clean up old WAL files (if safe)
# Check archive status firstSlow Queries#
-- Check missing indexes
SELECT
schemaname,
tablename,
attname,
n_distinct,
correlation
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY n_distinct DESC;
-- Analyze tables
ANALYZE;
-- Vacuum tables
VACUUM ANALYZE;Migration from Other Databases#
embapi is designed specifically for PostgreSQL with pgvector. Migration from other databases requires:
- Export data from source database
- Set up PostgreSQL with pgvector
- Transform data to match embapi schema
- Import using embapi API or direct SQL