Event-Sourced Audit Logs for HR & Payroll at Scale
System design for immutable activity logs using event sourcing patterns, optimized for multi-tenant SaaS.
May 20, 2023 • 5 min read
TL;DR: A single-table JSONB schema with monthly partitioning gives you 10-100x faster audit queries. Use dual timestamps for async Kafka writes, defense-in-depth for immutability, and an outbox pattern with a DLQ for recovery when things go wrong.
Activity Logs
In HR & Payroll systems that manage sensitive employee data (compensation changes, benefit elections, tax withholdings, terminations), every action must be auditable:
- Who made the change (actor identification)
- What changed (before/after values)
- When the change occurred (timestamp with timezone)
- Why the change was made (business context)
- Immutability - audit logs cannot be modified or deleted
When you're managing 70,000+ employees per client in a multi-tenant SaaS where clients customize their HRIS, audit logging becomes far more complex. A single "salary adjustment" might:
- Touch multiple database tables (employee, compensation, tax_withholding)
- Trigger downstream effects (recalculate benefits, adjust payroll deductions)
- Involve custom fields defined by the client
- Need to be traced across distributed microservices
This design uses a single event table with JSONB payloads, composite partitioning (time + tenant), and time-series optimizations to achieve 10-100x better query performance through partition pruning.
System Architecture
Flow:
- Business service executes the operation and writes the event to an outbox table in the same transaction (sketched below)
- Outbox poller publishes events to Kafka topic
- Consumer group writes to PostgreSQL with batch inserts
- Failed events route to DLQ for investigation
- Monthly archival moves old partitions to S3 as Parquet files
- Reconciliation jobs verify event completeness
Event-Sourced Activity Log Architecture
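A minimal sketch of the outbox step, assuming PostgreSQL 13+ and a hypothetical `activity_outbox` table (the `compensation` table and the values are illustrative); the business write and the audit event are committed in one transaction, so neither can succeed without the other:

```sql
-- Hypothetical outbox table polled by the Kafka publisher
CREATE TABLE activity_outbox (
    outbox_id     BIGSERIAL   PRIMARY KEY,
    event_id      UUID        NOT NULL,
    tenant_id     VARCHAR     NOT NULL,
    payload       JSONB       NOT NULL,      -- full event, serialized once
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at  TIMESTAMPTZ                -- NULL until the poller publishes to Kafka
);

-- Business operation and audit event committed atomically
BEGIN;
UPDATE compensation SET salary = 85000 WHERE employee_id = 'EMP001';
INSERT INTO activity_outbox (event_id, tenant_id, payload)
VALUES (gen_random_uuid(), 'ABC', '{"operation": {"name": "update_compensation"}}'::jsonb);
COMMIT;
```

The poller then reads rows where `published_at IS NULL`, publishes them to the Kafka topic, and marks them as published.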
The system uses an event sourcing pattern with a simplified, scalable schema optimized for time-series workloads and multi-tenant isolation.
Optimized Schema Design
Single Activity Event Table:
```
activity_events
├─ event_id (UUID)             -- Globally unique event identifier
├─ created_at (TIMESTAMPTZ)    -- When the event actually occurred
├─ inserted_at (TIMESTAMPTZ)   -- When written to the DB (for reliable ordering)
├─ tenant_id (VARCHAR)         -- Multi-tenant isolation (partition key)
├─ entity_type (VARCHAR)       -- Target entity: 'employee', 'payroll', 'benefits'
├─ entity_id (VARCHAR)         -- Specific entity identifier
├─ actor_id (VARCHAR)          -- Who performed the action
├─ actor_type (VARCHAR)        -- 'user', 'service', 'api_key'
├─ operation (VARCHAR)         -- Operation name: 'update_compensation'
├─ trace_id (VARCHAR)          -- Distributed tracing correlation
├─ schema_version (SMALLINT)   -- Event schema version for evolution
├─ metadata (JSONB)            -- Request context, IP, user agent
├─ payload (JSONB)             -- Operation arguments + field changes
└─ PRIMARY KEY (tenant_id, inserted_at, event_id)
```
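For concreteness, a DDL sketch of that table, assuming PostgreSQL declarative range partitioning on `inserted_at` (the exact types and NOT NULL choices beyond what the diagram shows are assumptions):

```sql
CREATE TABLE activity_events (
    event_id        UUID        NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL,                 -- when the event actually occurred
    inserted_at     TIMESTAMPTZ NOT NULL DEFAULT now(),   -- when written to the DB
    tenant_id       VARCHAR     NOT NULL,
    entity_type     VARCHAR     NOT NULL,
    entity_id       VARCHAR     NOT NULL,
    actor_id        VARCHAR     NOT NULL,
    actor_type      VARCHAR     NOT NULL,
    operation       VARCHAR     NOT NULL,
    trace_id        VARCHAR,
    schema_version  SMALLINT    NOT NULL DEFAULT 1,
    metadata        JSONB,
    payload         JSONB       NOT NULL,
    PRIMARY KEY (tenant_id, inserted_at, event_id)
) PARTITION BY RANGE (inserted_at);
```

Note that PostgreSQL requires the partition key (`inserted_at`) to be part of the primary key, which the composite key above already satisfies.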
Benefits:
- Single-table design eliminates JOINs and relationship complexity
- JSONB payload stores all event data efficiently with native compression
- Dual timestamps solve ordering challenges: `created_at` captures event occurrence time, `inserted_at` provides reliable ordering even with async writes via Kafka
- Composite primary key `(tenant_id, inserted_at, event_id)` optimizes for time-range queries
- Event schema versioning allows evolution without migrations
- Structured columns (`entity_type`, `entity_id`, `actor_id`, `operation`) eliminate the need for expensive JSONB indexing
- Monthly time-based partitioning keeps partition count manageable (12 partitions/year vs 365 for daily) and aligns with the S3 archival structure
- Tenant filtering via indexed WHERE clauses avoids partition explosion (adding tenant as a partition key would mean 1,000 tenants × 12 months = 12K partitions per year, or 365K with daily partitioning)
- Time-based partitions allow zero-downtime archival (DROP vs DELETE)
- TimescaleDB-compatible for advanced time-series features
Event Payload Structure
Consolidated JSONB Payload:
```json
{
  "operation": {
    "name": "update_compensation",
    "args": {
      "employee_id": "EMP001",
      "new_salary": 85000,
      "effective_date": "2024-02-01",
      "reason": "merit_increase"
    }
  },
  "changes": [
    {
      "path": "salary",
      "old": 80000,
      "new": 85000,
      "type": "update"
    },
    {
      "path": "pay_frequency",
      "old": "monthly",
      "new": "biweekly",
      "type": "update"
    }
  ]
}
```
Event Capture Pattern
The system follows a standard pattern across all operations:
1. Pre-Operation Snapshot → 2. Execute Business Logic → 3. Post-Operation Snapshot → 4. Compute Diff → 5. Write Event (Async)
Change Detection:
- Deep object comparison captures all modifications (nested objects, arrays)
- Generates dot-notation paths: `address.city`, `benefits[2]`
- Supports all change types: update, insert, delete
- Automatic - no manual field tracking required
Example:
```
Before: { salary: 80000, address: { city: "SF" } }
After:  { salary: 85000, address: { city: "NYC" } }

Changes: [
  { path: "salary", old: 80000, new: 85000, type: "update" },
  { path: "address.city", old: "SF", new: "NYC", type: "update" }
]
```
Enforcing Immutability - Defense-in-Depth
Application Layer: ORM entities marked as non-updatable (prevents accidental modifications)
Database Layer: Triggers block UPDATE/DELETE operations, raising exceptions for any modification attempts
Access Control: Separate database roles - event_writer (INSERT-only) and event_reader (SELECT-only)
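A sketch of the database and access-control layers in plain PostgreSQL (the role names match those above; the trigger and function names are illustrative, and row-level triggers on partitioned tables assume PostgreSQL 13+):

```sql
-- Database layer: reject any UPDATE or DELETE on audit events
CREATE OR REPLACE FUNCTION reject_event_mutation() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'activity_events is append-only: % not allowed', TG_OP;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_events_immutable
    BEFORE UPDATE OR DELETE ON activity_events
    FOR EACH ROW EXECUTE FUNCTION reject_event_mutation();

-- Access control: separate writer and reader roles
CREATE ROLE event_writer;
CREATE ROLE event_reader;
GRANT INSERT ON activity_events TO event_writer;
GRANT SELECT ON activity_events TO event_reader;
```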
Tiered Storage:
- Hot (180 days): PostgreSQL for fast operational queries
- Cold (Archive): S3 with Object Lock (WORM), organized as `s3://events/{shard_id}/{year}/{month}/events.parquet` for efficient access and compliance retention (archival flow sketched below)
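A sketch of the zero-downtime archival path for one month under the partition layout above; the Parquet export itself runs outside the database (e.g., a batch job) between the two statements:

```sql
-- Detach January so it can be exported without blocking writes to current partitions
ALTER TABLE activity_events DETACH PARTITION activity_events_2024_01;

-- ...export the detached table to s3://events/{shard_id}/2024/01/events.parquet...

-- Once the export is verified, drop the local copy (a metadata operation, not a row-by-row DELETE)
DROP TABLE activity_events_2024_01;
```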
Actor Identification & Request Context
Every event captures complete context through middleware injection:
Context Components:
- `actor_id`: User ID (JWT), service account, or API key
- `actor_type`: Distinguishes human vs. system actors
- `trace_id`: Distributed tracing across services
- `metadata`: IP address, user agent, custom fields
Fail-fast policy: No anonymous events allowed.
Multi-Tenant Isolation
Large Clients (>10K employees): Dedicated database shards with monthly time-only partitioning. No tenant_id column needed (implicit from shard). Partition count: 12/year.
Small/Medium Clients (<10K employees): Shared database with monthly time-based partitioning. Tenant filtering via indexed WHERE tenant_id clauses. Partition count: 12/year (not multiplied by tenant count).
Partition Management:
- Automated partition creation using the `pg_partman` extension (creates 3 months ahead); see the sketch after this list
- A weekly cron job maintains partitions and archives old data (sufficient for monthly partition boundaries)
- Monthly alignment matches the S3 archival structure: `s3://events/{shard_id}/{year}/{month}/events.parquet`
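A sketch of that setup, assuming pg_partman 5.x with native range partitioning (parameter names differ slightly across versions):

```sql
-- Register activity_events for monthly partitioning, pre-creating 3 months ahead
SELECT partman.create_parent(
    p_parent_table => 'public.activity_events',
    p_control      => 'inserted_at',
    p_interval     => '1 month',
    p_premake      => 3
);

-- Called from the weekly cron job: creates upcoming partitions, retires old ones
SELECT partman.run_maintenance();
```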
Example Partition Structure:
```sql
-- Shared database for small/medium clients
activity_events_2024_01   -- January 2024 (all tenants)
activity_events_2024_02   -- February 2024 (all tenants)
...
activity_events_2024_12   -- December 2024 (all tenants)

-- Single-partition query
SELECT * FROM activity_events
WHERE tenant_id = 'ABC'
  AND inserted_at BETWEEN '2024-01-15' AND '2024-01-18';
-- Scans: 1 partition (2024_01), filters via (tenant_id, inserted_at) index

-- Multi-partition query (spans a month boundary)
SELECT * FROM activity_events
WHERE tenant_id = 'ABC'
  AND inserted_at BETWEEN '2024-01-30' AND '2024-02-02';
-- Scans: 2 partitions (2024_01 + 2024_02), no delay or manual work needed
```
Large clients get dedicated resources and independent scaling; small clients share infrastructure cost-effectively.
Querying & Analytics
User Dashboard Filters:
Audit dashboards provide self-service filtering powered by optimized SQL queries:
- Entity filters: Employee, operation type, date range
- Actor filters: User, service account, API key
- Trace filters: Distributed request correlation across services
All queries benefit from partition pruning (tenant + time) and B-tree indexes on structured columns (entity_id, actor_id, operation). JSONB payloads are retrieved as-is without querying internal fields, avoiding expensive GIN index overhead.
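As an illustration, a typical dashboard filter ('all compensation changes for one employee in a date range') compiles down to something like this; partition pruning limits the scan to the matching monthly partitions and a B-tree index handles the rest:

```sql
SELECT event_id, created_at, actor_id, operation, payload
FROM activity_events
WHERE tenant_id   = 'ABC'
  AND entity_type = 'employee'
  AND entity_id   = 'EMP001'
  AND operation   = 'update_compensation'
  AND inserted_at >= '2024-01-01'
  AND inserted_at <  '2024-07-01'
ORDER BY inserted_at DESC
LIMIT 100;
```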
Operational Use Cases:
Beyond compliance reporting, activity logs power critical business operations:
- Anomaly detection: Flag unusual patterns before payroll runs (e.g., mass salary changes, off-hours modifications); see the sketch after this list
- Fraud prevention: Detect suspicious access patterns or unauthorized data exports
- Incident response: Reconstruct system state during outages or data corruption
- User behavior analytics: Identify training needs based on common error patterns
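For example, the anomaly-detection case might be a scheduled query along these lines (the 24-hour window, the off-hours definition, and the threshold are illustrative, not part of the original design):

```sql
-- Flag actors who changed many salaries outside business hours in the last 24 hours
SELECT actor_id, count(*) AS off_hours_changes
FROM activity_events
WHERE tenant_id   = 'ABC'
  AND operation   = 'update_compensation'
  AND inserted_at >= now() - INTERVAL '24 hours'
  AND EXTRACT(HOUR FROM created_at) NOT BETWEEN 8 AND 18
GROUP BY actor_id
HAVING count(*) > 20;   -- illustrative threshold
```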
Performance at Scale
Write Throughput Optimizations:
- Async writes via Kafka: Decouple event logging from user operations (durable queues prevent data loss)
- Batch inserts: 500-1000 records per batch where applicable (e.g., bulk operations) achieve a 10-20x throughput improvement
- Separate database resources: Dedicated connection pools or shards prevent contention with main application
- Strategic indexing: Limit to 4-6 B-tree indexes on structured columns only (`tenant_id`, `inserted_at`, `entity_id`, `actor_id`, `operation`, `trace_id`); see the sketch after this list. No JSONB indexes - this avoids 15-20% write overhead with minimal query impact
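A sketch of that index set for the shared-database layout (index names are illustrative); each secondary index leads with `tenant_id` so tenant filtering and the column filter hit the same B-tree:

```sql
-- The primary key (tenant_id, inserted_at, event_id) already covers pure time-range queries
CREATE INDEX idx_events_entity    ON activity_events (tenant_id, entity_id, inserted_at);
CREATE INDEX idx_events_actor     ON activity_events (tenant_id, actor_id, inserted_at);
CREATE INDEX idx_events_operation ON activity_events (tenant_id, operation, inserted_at);
CREATE INDEX idx_events_trace     ON activity_events (trace_id);
-- Deliberately no GIN index on payload or metadata JSONB
```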
TimescaleDB (setup sketched after this list):
- Automatic hypertable partitioning by time with chunk compression
- Continuous aggregates for real-time dashboards without expensive queries
- 10-100x compression on historical data (>30 days old)
- Native PostgreSQL compatibility maintains existing tooling and queries
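If you take the TimescaleDB route instead of vanilla declarative partitions, the setup might look like this sketch (TimescaleDB 2.x API; the chunk interval and compression age follow the numbers above):

```sql
-- Convert the table into a hypertable with monthly chunks
SELECT create_hypertable('activity_events', 'inserted_at',
                         chunk_time_interval => INTERVAL '1 month');

-- Compress chunks older than 30 days, segmented by tenant for better ratios
ALTER TABLE activity_events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tenant_id'
);
SELECT add_compression_policy('activity_events', INTERVAL '30 days');

-- Continuous aggregate powering dashboard counts without scanning raw events
CREATE MATERIALIZED VIEW daily_event_counts
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', inserted_at) AS day, tenant_id, operation, count(*) AS events
FROM activity_events
GROUP BY time_bucket('1 day', inserted_at), tenant_id, operation;
```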
Recovery & Durability
What happens when Kafka goes down or events get lost? For audit logs, you need guarantees.
Kafka Producer Settings:
Configure producers for maximum durability:
- `acks=all`: Wait for all in-sync replicas to acknowledge
- `min.insync.replicas=2`: Require at least 2 replicas before acknowledging (a topic/broker-level setting that pairs with `acks=all`)
- `retries=MAX_INT`: Keep retrying failed sends
- `enable.idempotence=true`: Prevent duplicate writes on retry
With these settings, Kafka won't acknowledge a write until events are safely replicated. If Kafka itself is unavailable, events remain in the outbox table (and buffer at the producer) until it recovers.
Dead Letter Queue (DLQ):
Events that fail to write to PostgreSQL after N retries go to a DLQ topic. Common failure reasons:
- Schema validation errors (malformed payload)
- Constraint violations (missing required fields)
- Transient database errors that exceed retry budget
DLQ events need manual investigation. Set up alerts when DLQ size exceeds threshold. Most teams aim for zero DLQ events in normal operation.
Reconciliation Jobs:
Run periodic checks to catch silent failures:
- Entity coverage: Every employee record should have a CREATE event. Query: `SELECT employee_id FROM employees WHERE employee_id NOT IN (SELECT entity_id FROM activity_events WHERE operation = 'create_employee')`
- Event continuity: Check for time gaps in event streams that might indicate lost data (see the sketch below)
- Cross-service correlation: Verify trace_ids appear in all expected services
- Cross-service correlation: Verify trace_ids appear in all expected services
Weekly reconciliation catches issues that slip through other safeguards. In practice, with proper Kafka settings, reconciliation usually finds zero discrepancies.
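A sketch of the event-continuity check using a window function; here a "gap" is an unusually long quiet period per tenant, and the one-hour threshold and seven-day window are illustrative:

```sql
-- Find suspicious gaps between consecutive events for each tenant over the last week
WITH ordered AS (
    SELECT tenant_id,
           inserted_at,
           lag(inserted_at) OVER (PARTITION BY tenant_id ORDER BY inserted_at) AS prev_inserted_at
    FROM activity_events
    WHERE inserted_at >= now() - INTERVAL '7 days'
)
SELECT tenant_id, prev_inserted_at, inserted_at,
       inserted_at - prev_inserted_at AS gap
FROM ordered
WHERE inserted_at - prev_inserted_at > INTERVAL '1 hour'
ORDER BY gap DESC;
```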
Conclusion
This event-sourced audit log design delivers:
Technical: Single-table JSONB schema with composite partitioning (time + tenant) enables 10-100x query speedup through partition pruning. Multi-layered immutability (application, triggers, RBAC) ensures compliance. Async writes with message queues provide durability at scale.
Business: When a client asks, "Show me every change to this employee's compensation over the last 5 years," you can deliver a complete, verifiable audit trail in milliseconds. That's not just compliance; it's a competitive advantage.
Treating activity logs as first-class architectural citizens builds a foundation for trust, recovery, and operational excellence that scales with your platform.
Thoughts? Hit me up at [email protected]