Event-Sourced Audit Logs for HR & Payroll at Scale
System design for immutable activity logs using event sourcing patterns, optimized for multi-tenant SaaS.
May 20, 2023 • 5 min read
TL;DR: A single-table JSONB schema with monthly partitioning gives you 10-100x faster audit queries. Use dual timestamps for async Kafka writes, defense-in-depth for immutability, and an outbox pattern with a DLQ for recovery when things go wrong.
Activity Logs
In HR & Payroll systems that manage sensitive employee data (compensation changes, benefit elections, tax withholdings, terminations), every action must be auditable:
- Who made the change (actor identification)
- What changed (before/after values)
- When the change occurred (timestamp with timezone)
- Why the change was made (business context)
- Immutability - audit logs cannot be modified or deleted
When you're managing 70,000+ employees per client in a multi-tenant SaaS where clients customize their HRIS, audit logging becomes far more complex. A single "salary adjustment" might:
- Touch multiple database tables (employee, compensation, tax_withholding)
- Trigger downstream effects (recalculate benefits, adjust payroll deductions)
- Involve custom fields defined by the client
- Need to be traced across distributed microservices
This design uses a single event table with JSONB payloads, composite partitioning (time + tenant), and time-series optimizations to achieve 10-100x better query performance through partition pruning.
System Architecture
Flow:
- Business service executes the operation and writes the event to an outbox table in the same transaction (sketched below)
- Outbox poller publishes events to Kafka topic
- Consumer group writes to PostgreSQL with batch inserts
- Failed events route to DLQ for investigation
- Monthly archival moves old partitions to S3 as Parquet files
- Reconciliation jobs verify event completeness
Event-Sourced Activity Log Architecture
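A minimal sketch of the outbox step, assuming PostgreSQL 13+ and a hypothetical `activity_outbox` table (the `compensation` table and the values are illustrative); the business write and the audit event are committed in one transaction, so neither can succeed without the other:

```sql
-- Hypothetical outbox table polled by the Kafka publisher
CREATE TABLE activity_outbox (
    outbox_id     BIGSERIAL   PRIMARY KEY,
    event_id      UUID        NOT NULL,
    tenant_id     VARCHAR     NOT NULL,
    payload       JSONB       NOT NULL,      -- full event, serialized once
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at  TIMESTAMPTZ                -- NULL until the poller publishes to Kafka
);

-- Business operation and audit event committed atomically
BEGIN;
UPDATE compensation SET salary = 85000 WHERE employee_id = 'EMP001';
INSERT INTO activity_outbox (event_id, tenant_id, payload)
VALUES (gen_random_uuid(), 'ABC', '{"operation": {"name": "update_compensation"}}'::jsonb);
COMMIT;
```

The poller then reads rows where `published_at IS NULL`, publishes them to the Kafka topic, and marks them as published.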
The system uses an event sourcing pattern with a simplified, scalable schema optimized for time-series workloads and multi-tenant isolation.
Optimized Schema Design
Single Activity Event Table:
```
activity_events
├─ event_id (UUID)             -- Globally unique event identifier
├─ created_at (TIMESTAMPTZ)    -- When the event actually occurred
├─ inserted_at (TIMESTAMPTZ)   -- When written to the DB (for reliable ordering)
├─ tenant_id (VARCHAR)         -- Multi-tenant isolation (partition key)
├─ entity_type (VARCHAR)       -- Target entity: 'employee', 'payroll', 'benefits'
├─ entity_id (VARCHAR)         -- Specific entity identifier
├─ actor_id (VARCHAR)          -- Who performed the action
├─ actor_type (VARCHAR)        -- 'user', 'service', 'api_key'
├─ operation (VARCHAR)         -- Operation name: 'update_compensation'
├─ trace_id (VARCHAR)          -- Distributed tracing correlation
├─ schema_version (SMALLINT)   -- Event schema version for evolution
├─ metadata (JSONB)            -- Request context, IP, user agent
├─ payload (JSONB)             -- Operation arguments + field changes
└─ PRIMARY KEY (tenant_id, inserted_at, event_id)
```
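For concreteness, a DDL sketch of that table, assuming PostgreSQL declarative range partitioning on `inserted_at` (the exact types and NOT NULL choices beyond what the diagram shows are assumptions):

```sql
CREATE TABLE activity_events (
    event_id        UUID        NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL,                 -- when the event actually occurred
    inserted_at     TIMESTAMPTZ NOT NULL DEFAULT now(),   -- when written to the DB
    tenant_id       VARCHAR     NOT NULL,
    entity_type     VARCHAR     NOT NULL,
    entity_id       VARCHAR     NOT NULL,
    actor_id        VARCHAR     NOT NULL,
    actor_type      VARCHAR     NOT NULL,
    operation       VARCHAR     NOT NULL,
    trace_id        VARCHAR,
    schema_version  SMALLINT    NOT NULL DEFAULT 1,
    metadata        JSONB,
    payload         JSONB       NOT NULL,
    PRIMARY KEY (tenant_id, inserted_at, event_id)
) PARTITION BY RANGE (inserted_at);
```

Note that PostgreSQL requires the partition key (`inserted_at`) to be part of the primary key, which the composite key above already satisfies.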
Benefits:
- Single-table design eliminates JOINs and relationship complexity
- JSONB payload stores all event data efficiently with native compression
- Dual timestamps solve ordering challenges: `created_at` captures event occurrence time, `inserted_at` provides reliable ordering even with async writes via Kafka
- Composite primary key `(tenant_id, inserted_at, event_id)` optimizes for time-range queries
- Event schema versioning allows evolution without migrations
- Structured columns (`entity_type`, `entity_id`, `actor_id`, `operation`) eliminate the need for expensive JSONB indexing
- Monthly time-based partitioning keeps partition count manageable (12 partitions/year vs 365 for daily) and aligns with the S3 archival structure
- Tenant filtering via indexed WHERE clauses avoids partition explosion (adding tenant as a partition key would mean 1,000 tenants × 12 months = 12K partitions per year, or 365K with daily partitioning)
- Time-based partitions allow zero-downtime archival (DROP vs DELETE)
- TimescaleDB-compatible for advanced time-series features
Event Payload Structure
Consolidated JSONB Payload:
```json
{
  "operation": {
    "name": "update_compensation",
    "args": {
      "employee_id": "EMP001",
      "new_salary": 85000,
      "effective_date": "2024-02-01",
      "reason": "merit_increase"
    }
  },
  "changes": [
    {
      "path": "salary",
      "old": 80000,
      "new": 85000,
      "type": "update"
    },
    {
      "path": "pay_frequency",
      "old": "monthly",
      "new": "biweekly",
      "type": "update"
    }
  ]
}
```
Event Capture Pattern
The system follows a standard pattern across all operations:
1. Pre-Operation Snapshot → 2. Execute Business Logic → 3. Post-Operation Snapshot → 4. Compute Diff → 5. Write Event (Async)
Change Detection:
- Deep object comparison captures all modifications (nested objects, arrays)
- Generates dot-notation paths: `address.city`, `benefits[2]`
- Supports all change types: update, insert, delete
- Automatic - no manual field tracking required
Example:
```
Before: { salary: 80000, address: { city: "SF" } }
After:  { salary: 85000, address: { city: "NYC" } }

Changes: [
  { path: "salary", old: 80000, new: 85000, type: "update" },
  { path: "address.city", old: "SF", new: "NYC", type: "update" }
]
```
Enforcing Immutability - Defense-in-Depth
Application Layer: ORM entities marked as non-updatable (prevents accidental modifications)
Database Layer: Triggers block UPDATE/DELETE operations, raising exceptions for any modification attempts
Access Control: Separate database roles - event_writer (INSERT-only) and event_reader (SELECT-only)
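A sketch of the database and access-control layers in plain PostgreSQL (the role names match those above; the trigger and function names are illustrative, and row-level triggers on partitioned tables assume PostgreSQL 13+):

```sql
-- Database layer: reject any UPDATE or DELETE on audit events
CREATE OR REPLACE FUNCTION reject_event_mutation() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'activity_events is append-only: % not allowed', TG_OP;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_events_immutable
    BEFORE UPDATE OR DELETE ON activity_events
    FOR EACH ROW EXECUTE FUNCTION reject_event_mutation();

-- Access control: separate writer and reader roles
CREATE ROLE event_writer;
CREATE ROLE event_reader;
GRANT INSERT ON activity_events TO event_writer;
GRANT SELECT ON activity_events TO event_reader;
```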
Tiered Storage:
- Hot (180 days): PostgreSQL for fast operational queries
- Cold (Archive): S3 with Object Lock (WORM), organized as `s3://events/{shard_id}/{year}/{month}/events.parquet` for efficient access and compliance retention (archival flow sketched below)
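A sketch of the zero-downtime archival path for one month under the partition layout above; the Parquet export itself runs outside the database (e.g., a batch job) between the two statements:

```sql
-- Detach January so it can be exported without blocking writes to current partitions
ALTER TABLE activity_events DETACH PARTITION activity_events_2024_01;

-- ...export the detached table to s3://events/{shard_id}/2024/01/events.parquet...

-- Once the export is verified, drop the local copy (a metadata operation, not a row-by-row DELETE)
DROP TABLE activity_events_2024_01;
```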
Actor Identification & Request Context
Every event captures complete context through middleware injection:
Context Components:
- `actor_id`: User ID (JWT), service account, or API key
- `actor_type`: Distinguishes human vs. system actors
- `trace_id`: Distributed tracing across services
- `metadata`: IP address, user agent, custom fields
Fail-fast policy: No anonymous events allowed.
Multi-Tenant Isolation
Large Clients (>10K employees): Dedicated database shards with monthly time-only partitioning. No tenant_id column needed (implicit from shard). Partition count: 12/year.
Small/Medium Clients (<10K employees): Shared database with monthly time-based partitioning. Tenant filtering via indexed WHERE tenant_id clauses. Partition count: 12/year (not multiplied by tenant count).
Partition Management:
- Automated partition creation using the `pg_partman` extension (creates 3 months ahead); see the sketch after this list
- A weekly cron job maintains partitions and archives old data (sufficient for monthly partition boundaries)
- Monthly alignment matches the S3 archival structure: `s3://events/{shard_id}/{year}/{month}/events.parquet`
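A sketch of that setup, assuming pg_partman 5.x with native range partitioning (parameter names differ slightly across versions):

```sql
-- Register activity_events for monthly partitioning, pre-creating 3 months ahead
SELECT partman.create_parent(
    p_parent_table => 'public.activity_events',
    p_control      => 'inserted_at',
    p_interval     => '1 month',
    p_premake      => 3
);

-- Called from the weekly cron job: creates upcoming partitions, retires old ones
SELECT partman.run_maintenance();
```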
Example Partition Structure:
```sql
-- Shared database for small/medium clients
activity_events_2024_01   -- January 2024 (all tenants)
activity_events_2024_02   -- February 2024 (all tenants)
...
activity_events_2024_12   -- December 2024 (all tenants)

-- Single-partition query
SELECT * FROM activity_events
WHERE tenant_id = 'ABC'
  AND inserted_at BETWEEN '2024-01-15' AND '2024-01-18';
-- Scans: 1 partition (2024_01), filters via (tenant_id, inserted_at) index

-- Multi-partition query (spans a month boundary)
SELECT * FROM activity_events
WHERE tenant_id = 'ABC'
  AND inserted_at BETWEEN '2024-01-30' AND '2024-02-02';
-- Scans: 2 partitions (2024_01 + 2024_02), no delay or manual work needed
```
Large clients get dedicated resources and independent scaling; small clients share infrastructure cost-effectively.
Querying & Analytics
User Dashboard Filters:
Audit dashboards provide self-service filtering powered by optimized SQL queries:
- Entity filters: Employee, operation type, date range
- Actor filters: User, service account, API key
- Trace filters: Distributed request correlation across services
All queries benefit from partition pruning (tenant + time) and B-tree indexes on structured columns (entity_id, actor_id, operation). JSONB payloads are retrieved as-is without querying internal fields, avoiding expensive GIN index overhead.
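As an illustration, a typical dashboard filter ('all compensation changes for one employee in a date range') compiles down to something like this; partition pruning limits the scan to the matching monthly partitions and a B-tree index handles the rest:

```sql
SELECT event_id, created_at, actor_id, operation, payload
FROM activity_events
WHERE tenant_id   = 'ABC'
  AND entity_type = 'employee'
  AND entity_id   = 'EMP001'
  AND operation   = 'update_compensation'
  AND inserted_at >= '2024-01-01'
  AND inserted_at <  '2024-07-01'
ORDER BY inserted_at DESC
LIMIT 100;
```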
Operational Use Cases:
Beyond compliance reporting, activity logs power critical business operations:
- Anomaly detection: Flag unusual patterns before payroll runs (e.g., mass salary changes, off-hours modifications); see the sketch after this list
- Fraud prevention: Detect suspicious access patterns or unauthorized data exports
- Incident response: Reconstruct system state during outages or data corruption
- User behavior analytics: Identify training needs based on common error patterns
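For example, the anomaly-detection case might be a scheduled query along these lines (the 24-hour window, the off-hours definition, and the threshold are illustrative, not part of the original design):

```sql
-- Flag actors who changed many salaries outside business hours in the last 24 hours
SELECT actor_id, count(*) AS off_hours_changes
FROM activity_events
WHERE tenant_id   = 'ABC'
  AND operation   = 'update_compensation'
  AND inserted_at >= now() - INTERVAL '24 hours'
  AND EXTRACT(HOUR FROM created_at) NOT BETWEEN 8 AND 18
GROUP BY actor_id
HAVING count(*) > 20;   -- illustrative threshold
```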
Performance at Scale
Write Throughput Optimizations:
- Async writes via Kafka: Decouple event logging from user operations (durable queues prevent data loss)
- Batch inserts: 500-1000 records per batch where applicable (e.g., bulk operations) achieve a 10-20x throughput improvement
- Separate database resources: Dedicated connection pools or shards prevent contention with main application
- Strategic indexing: Limit to 4-6 B-tree indexes on structured columns only (`tenant_id`, `inserted_at`, `entity_id`, `actor_id`, `operation`, `trace_id`); see the sketch after this list. No JSONB indexes - this avoids 15-20% write overhead with minimal query impact
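A sketch of that index set for the shared-database layout (index names are illustrative); each secondary index leads with `tenant_id` so tenant filtering and the column filter hit the same B-tree:

```sql
-- The primary key (tenant_id, inserted_at, event_id) already covers pure time-range queries
CREATE INDEX idx_events_entity    ON activity_events (tenant_id, entity_id, inserted_at);
CREATE INDEX idx_events_actor     ON activity_events (tenant_id, actor_id, inserted_at);
CREATE INDEX idx_events_operation ON activity_events (tenant_id, operation, inserted_at);
CREATE INDEX idx_events_trace     ON activity_events (trace_id);
-- Deliberately no GIN index on payload or metadata JSONB
```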
TimescaleDB (setup sketched after this list):
- Automatic hypertable partitioning by time with chunk compression
- Continuous aggregates for real-time dashboards without expensive queries
- 10-100x compression on historical data (>30 days old)
- Native PostgreSQL compatibility maintains existing tooling and queries
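If you take the TimescaleDB route instead of vanilla declarative partitions, the setup might look like this sketch (TimescaleDB 2.x API; the chunk interval and compression age follow the numbers above):

```sql
-- Convert the table into a hypertable with monthly chunks
SELECT create_hypertable('activity_events', 'inserted_at',
                         chunk_time_interval => INTERVAL '1 month');

-- Compress chunks older than 30 days, segmented by tenant for better ratios
ALTER TABLE activity_events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tenant_id'
);
SELECT add_compression_policy('activity_events', INTERVAL '30 days');

-- Continuous aggregate powering dashboard counts without scanning raw events
CREATE MATERIALIZED VIEW daily_event_counts
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', inserted_at) AS day, tenant_id, operation, count(*) AS events
FROM activity_events
GROUP BY time_bucket('1 day', inserted_at), tenant_id, operation;
```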
Recovery & Durability
What happens when Kafka goes down or events get lost? For audit logs, you need guarantees.
Kafka Producer Settings:
Configure producers for maximum durability:
- `acks=all`: Wait for all in-sync replicas to acknowledge
- `min.insync.replicas=2`: Require at least 2 replicas before acknowledging (a topic/broker-level setting that pairs with `acks=all`)
- `retries=MAX_INT`: Keep retrying failed sends
- `enable.idempotence=true`: Prevent duplicate writes on retry
With these settings, Kafka won't acknowledge a write until events are safely replicated. If Kafka itself is unavailable, events remain in the outbox table (and buffer at the producer) until it recovers.
Dead Letter Queue (DLQ):
Events that fail to write to PostgreSQL after N retries go to a DLQ topic. Common failure reasons:
- Schema validation errors (malformed payload)
- Constraint violations (missing required fields)
- Transient database errors that exceed retry budget
DLQ events need manual investigation. Set up alerts when DLQ size exceeds threshold. Most teams aim for zero DLQ events in normal operation.
Reconciliation Jobs:
Run periodic checks to catch silent failures:
- Entity coverage: Every employee record should have a CREATE event. Query: `SELECT employee_id FROM employees WHERE employee_id NOT IN (SELECT entity_id FROM activity_events WHERE operation = 'create_employee')`
- Event continuity: Check for time gaps in event streams that might indicate lost data (see the sketch below)
- Cross-service correlation: Verify trace_ids appear in all expected services
- Cross-service correlation: Verify trace_ids appear in all expected services
Weekly reconciliation catches issues that slip through other safeguards. In practice, with proper Kafka settings, reconciliation usually finds zero discrepancies.
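A sketch of the event-continuity check using a window function; here a "gap" is an unusually long quiet period per tenant, and the one-hour threshold and seven-day window are illustrative:

```sql
-- Find suspicious gaps between consecutive events for each tenant over the last week
WITH ordered AS (
    SELECT tenant_id,
           inserted_at,
           lag(inserted_at) OVER (PARTITION BY tenant_id ORDER BY inserted_at) AS prev_inserted_at
    FROM activity_events
    WHERE inserted_at >= now() - INTERVAL '7 days'
)
SELECT tenant_id, prev_inserted_at, inserted_at,
       inserted_at - prev_inserted_at AS gap
FROM ordered
WHERE inserted_at - prev_inserted_at > INTERVAL '1 hour'
ORDER BY gap DESC;
```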
Conclusion
This event-sourced audit log design delivers:
Technical: Single-table JSONB schema with composite partitioning (time + tenant) enables 10-100x query speedup through partition pruning. Multi-layered immutability (application, triggers, RBAC) ensures compliance. Async writes with message queues provide durability at scale.
Business: When a client asks, "Show me every change to this employee's compensation over the last 5 years," you can deliver a complete, verifiable audit trail in milliseconds. That's not just compliance; it's a competitive advantage.
Treating activity logs as first-class architectural citizens builds a foundation for trust, recovery, and operational excellence that scales with your platform.
Thoughts? Hit me up at [email protected]