Database Optimization: Performance Strategies for Modern Applications
Database performance is often the bottleneck in web applications. As data grows and user bases expand, queries that once performed adequately can become significant pain points. This post explores proven strategies for optimizing database performance in modern applications.
Understanding Performance Bottlenecks
Before optimizing, it's crucial to identify where your database is struggling:
Common Performance Issues
- Slow queries: Individual queries taking too long to execute
- High concurrency: Multiple users causing lock contention
- Resource constraints: CPU, memory, or disk I/O limitations
- Poor schema design: Inefficient table structures and relationships
Monitoring and Profiling
Effective optimization starts with good monitoring:
```sql
-- PostgreSQL: Enable query logging
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Log queries > 1 second

-- MySQL: Enable the slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
```
Indexing Strategies
Indexes are your first line of defense against slow queries:
Primary and Unique Indexes
```sql
-- Ensure primary keys are optimally chosen
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
```
Composite Indexes
For queries filtering on multiple columns:
```sql
-- Index for common query patterns
CREATE INDEX idx_user_activity
    ON user_activity (user_id, activity_date, activity_type);

-- Query that benefits from the composite index
SELECT COUNT(*)
FROM user_activity
WHERE user_id = 123
  AND activity_date >= '2024-01-01'
  AND activity_type = 'login';
```
Partial Indexes
Save space and improve performance with conditional indexes:
```sql
-- Index only active users
CREATE INDEX idx_active_users
    ON users (last_login_at)
    WHERE status = 'active';

-- Index only recent orders
CREATE INDEX idx_recent_orders
    ON orders (created_at, customer_id)
    WHERE created_at >= '2024-01-01';
```
Query Optimization Techniques
Avoiding N+1 Queries
One of the most common performance killers:
```javascript
// Bad: N+1 query problem — one query for the users,
// then one additional query per user
const users = await User.findAll();
for (const user of users) {
  const posts = await Post.findAll({ where: { userId: user.id } });
  user.posts = posts;
}

// Good: use joins or eager loading — a single query
const usersWithPosts = await User.findAll({
  include: [{ model: Post, as: 'posts' }]
});
```
Efficient Pagination
Large offset values can kill performance:
```sql
-- Bad: A large offset is expensive — the database still
-- scans and discards the first 10,000 rows
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- Good: Cursor-based (keyset) pagination
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
```
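On the application side, the cursor is typically an opaque token that encodes the last value seen by the client. Here is a minimal sketch of the idea over an in-memory list; the `posts` shape and the base64 cursor encoding are illustrative assumptions, and in a real app the filter would be the SQL `WHERE` clause shown above:

```javascript
// Keyset (cursor) pagination sketch: the cursor encodes the
// last-seen created_at value so the next page picks up after it.
const encodeCursor = (createdAt) =>
  Buffer.from(String(createdAt)).toString('base64');

const decodeCursor = (cursor) =>
  Buffer.from(cursor, 'base64').toString('utf8');

function paginatePosts(posts, { cursor = null, limit = 20 } = {}) {
  // Newest first, mirroring ORDER BY created_at DESC
  const sorted = [...posts].sort((a, b) =>
    b.createdAt.localeCompare(a.createdAt)
  );

  // Apply the cursor, mirroring WHERE created_at < $cursor
  const after = cursor ? decodeCursor(cursor) : null;
  const filtered = after
    ? sorted.filter((p) => p.createdAt < after)
    : sorted;

  const page = filtered.slice(0, limit);
  const last = page[page.length - 1];
  return {
    items: page,
    nextCursor: last ? encodeCursor(last.createdAt) : null,
  };
}
```

Because each page is located by value rather than by position, fetching page 500 costs the same as fetching page 1.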
Query Plan Analysis
Understanding how your database executes queries:
```sql
-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id, u.name;

-- MySQL
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id, u.name;
```
Schema Design Best Practices
Normalization vs. Denormalization
Finding the right balance:
```sql
-- Normalized approach (good for writes, complex for reads)
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    order_date DATE NOT NULL
);

CREATE TABLE order_items (
    id BIGSERIAL PRIMARY KEY,
    order_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    quantity INTEGER NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

-- Denormalized approach (faster reads, more storage)
CREATE TABLE order_summary (
    id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(12,2) NOT NULL,
    item_count INTEGER NOT NULL,
    top_category VARCHAR(100)
);
```
Data Types and Storage
Choosing efficient data types:
```sql
-- Use appropriately sized integer types
user_id    BIGINT,    -- for large user bases
status_id  SMALLINT,  -- for a limited set of options
is_active  BOOLEAN,   -- instead of CHAR(1)

-- Optimize string storage
country_code CHAR(2),       -- fixed length
email        VARCHAR(255),  -- variable length with a limit
description  TEXT           -- for large text
```
Caching Strategies
Application-Level Caching
```javascript
// Redis caching example (cache-aside pattern)
const getUser = async (userId) => {
  const cacheKey = `user:${userId}`;

  // Try the cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Fall back to the database and populate the cache
  const user = await User.findByPk(userId);
  if (user) {
    await redis.setex(cacheKey, 3600, JSON.stringify(user)); // 1-hour TTL
  }
  return user;
};
```
Database Query Caching
```sql
-- PostgreSQL: Track per-query execution statistics with
-- pg_stat_statements (requires a server restart)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- MySQL: Enable the query cache
-- (MySQL 5.7 and earlier; removed in MySQL 8.0)
SET GLOBAL query_cache_type = ON;
SET GLOBAL query_cache_size = 268435456;  -- 256MB
```
Connection and Resource Management
Connection Pooling
Proper connection management is crucial:
```javascript
// Node.js with PostgreSQL (node-postgres)
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'myapp',
  user: 'dbuser',
  password: 'password',
  max: 20,                        // maximum connections in the pool
  idleTimeoutMillis: 30000,       // close idle clients after 30 seconds
  connectionTimeoutMillis: 2000,  // fail fast if no connection is free
});
```
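The `max` setting above is what bounds concurrency: once every connection is checked out, further requests wait rather than opening new connections. To make those mechanics concrete, here is a toy pool in plain JavaScript — purely illustrative of the idea, not how pg's `Pool` is actually implemented:

```javascript
// Minimal illustration of what a connection pool does: hand out a
// bounded number of "connections" and queue callers when none are
// free. Real pools also handle idle timeouts, validation, and errors.
class SimplePool {
  constructor(createConn, max = 2) {
    this.createConn = createConn;
    this.max = max;
    this.created = 0;
    this.idle = [];     // connections ready for reuse
    this.waiters = [];  // callers waiting for a free connection
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.created < this.max) {
      this.created += 1;
      return this.createConn();
    }
    // Pool exhausted: wait until another caller releases
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(conn) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn);   // hand directly to a waiting caller
    else this.idle.push(conn);  // otherwise keep it for reuse
  }
}
```

The key property is that releasing a connection hands it straight to a waiting caller instead of closing it, which is what saves the per-request cost of establishing a new database connection.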
Read Replicas
Distribute read load across multiple database instances:
```javascript
// Database routing based on operation type
const dbConfig = {
  write: { host: 'primary-db.example.com' },
  read: [
    { host: 'replica1-db.example.com' },
    { host: 'replica2-db.example.com' }
  ]
};

// Round-robin load balancing for reads
let nextReplica = 0;
const getConnection = (operation = 'read') => {
  if (operation === 'write') {
    return dbConfig.write;
  }
  const replicas = dbConfig.read;
  const connection = replicas[nextReplica % replicas.length];
  nextReplica += 1;
  return connection;
};
```
Advanced Optimization Techniques
Partitioning
For very large tables:
```sql
-- Range partitioning by date
CREATE TABLE sales (
    id BIGSERIAL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2),
    customer_id BIGINT
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2024_q1 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE sales_2024_q2 PARTITION OF sales
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
```
Materialized Views
Pre-compute expensive aggregations:
```sql
-- Create a materialized view for complex reporting
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
    DATE(created_at) AS sale_date,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY DATE(created_at);

-- Refresh periodically
REFRESH MATERIALIZED VIEW daily_sales_summary;
```
Monitoring and Maintenance
Regular Maintenance Tasks
```sql
-- PostgreSQL: Update table statistics
ANALYZE users;

-- PostgreSQL: Rebuild a fragmented index
REINDEX INDEX idx_user_email;

-- MySQL: Optimize tables
OPTIMIZE TABLE users;
```
Performance Metrics to Track
- Query response times: 95th percentile latency
- Throughput: Queries per second
- Connection usage: Active vs. maximum connections
- Cache hit ratios: Buffer pool efficiency
- Lock contention: Waiting queries and deadlocks
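As a concrete example of the first metric, the 95th percentile can be computed from a window of recorded query times. This is a minimal sketch using the nearest-rank method, not tied to any particular monitoring tool:

```javascript
// Compute a percentile from a window of query latencies (ms),
// using the nearest-rank method: sort ascending and take the
// value at ceil(p/100 * n) - 1.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[rank];
}

// One slow outlier dominates p95 even when the average looks fine:
percentile([5, 7, 12, 8, 250, 6, 9, 11, 10, 13], 95); // 250
```

This is why tail latency, rather than the mean, is what users actually feel: nine fast queries do not hide the one that took 250 ms.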
Database-Specific Optimizations
PostgreSQL
```sql
-- Adjust configuration for better performance
ALTER SYSTEM SET shared_buffers = '256MB';  -- requires a server restart
ALTER SYSTEM SET effective_cache_size = '1GB';
ALTER SYSTEM SET work_mem = '4MB';

-- Reload settings that do not require a restart
SELECT pg_reload_conf();
```
MySQL
```sql
-- InnoDB optimizations
-- Buffer pool size is dynamic as of MySQL 5.7
SET GLOBAL innodb_buffer_pool_size = 268435456;  -- 256MB

-- Redo log size is read-only at runtime; set it in my.cnf:
-- innodb_log_file_size = 67108864  -- 64MB

-- Trade some durability for write throughput
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```
Conclusion
Database optimization is an ongoing process that requires careful monitoring, analysis, and incremental improvements. Start with the basics—proper indexing and query optimization—before moving to more advanced techniques like partitioning and caching.
Remember that optimization is about trade-offs. What improves read performance might hurt write performance, and what works for your current scale might not work as you grow. Always measure the impact of your changes and be prepared to adapt your strategy as your application evolves.
The key to successful database optimization is understanding your specific workload patterns and optimizing for your most critical use cases while maintaining overall system health.