Metadata-Driven Merge: A Declarative Approach to Data Integration
Building a lightweight alternative to GraphQL for hierarchical data merging using Go, with concurrent fetching and configurable merge strategies.
Krishna C
October 18, 2020
Modern applications often need to combine data from multiple sources into a unified, hierarchical structure. Consider an employee directory that needs to merge:
- Associate records from one database
- Email addresses (personal and business) from another system
- Phone numbers from a third source
- Position history linked to companies with office addresses
The traditional approach? Write custom code for every permutation of data sources. But what if users want different combinations at runtime? What if relationships change? You're stuck maintaining a tangled web of join logic.
The Solution: Metadata-Driven Merge
This Go-based project takes a fundamentally different approach: define relationships declaratively, let the engine figure out the rest.
Instead of writing imperative code like:
```go
// Don't do this for every combination...
associates := fetchAssociates()
for _, a := range associates {
	a.Emails = fetchEmails(a.ID)
	a.Positions = fetchPositions(a.ID)
	for _, p := range a.Positions {
		p.Company = fetchCompany(p.CompanyID)
	}
}
```
You define a merge template that describes the data hierarchy:
```json
{
  "type": "group",
  "alias": "associate",
  "mergeType": "nestedMerge",
  "children": [
    { "type": "data_contract", "alias": "associate" },
    { "type": "group", "alias": "email", "mergeType": "nestedMerge", "children": [...] },
    { "type": "group", "alias": "position", "mergeType": "nestedMerge",
      "children": [
        { "type": "data_contract", "alias": "position" },
        { "type": "group", "alias": "company", "children": [...] }
      ]
    }
  ]
}
```
Then simply request what you need:
```
GET /associates?datasources=associate,email,position,company
```
The engine handles everything else.
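As a rough illustration of the first step on the server side, the `datasources` parameter can be turned into a set of requested aliases. This is a minimal sketch, not the project's actual handler: the function name `requestedSources` and the error message are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// requestedSources (hypothetical helper) parses a raw query string such as
// "datasources=associate,email" into a set of requested aliases.
func requestedSources(rawQuery string) (map[string]bool, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	set := make(map[string]bool)
	for _, alias := range strings.Split(values.Get("datasources"), ",") {
		// Ignore empty entries such as a trailing comma.
		if alias = strings.TrimSpace(alias); alias != "" {
			set[alias] = true
		}
	}
	if len(set) == 0 {
		return nil, fmt.Errorf("no datasources requested")
	}
	return set, nil
}

func main() {
	u, err := url.Parse("/associates?datasources=associate,email,position,company")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	set, err := requestedSources(u.RawQuery)
	fmt.Println(len(set), err)
}
```

A set makes the later "was this alias requested?" check a single map lookup.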
How It Works
1. Template Shrinking
The first clever trick: the template is an AST (Abstract Syntax Tree) that gets pruned at runtime.
If you only request associate and email, the engine recursively removes the position and company branches. This means:
- No wasted fetches for unrequested data
- Validation that requested sources have valid lineage (you can't request company without position)
- Optimal query planning
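The pruning step can be sketched as a recursive walk over the template tree. This is an illustration under stated assumptions rather than the project's code: the `Node` type, the `shrink` and `sampleTemplate` functions, and the lineage rule (a group that keeps a nested group must also keep one of its own data contracts) are all hypothetical.

```go
package main

import "fmt"

// Node is a hypothetical template node; its fields mirror the JSON template
// above but are not the project's real types.
type Node struct {
	Type     string // "group" or "data_contract"
	Alias    string
	Children []*Node
}

// shrink returns a pruned copy of n that keeps only requested data contracts.
// Groups left with no children are dropped. Assumed lineage rule: a group that
// keeps a nested group must also keep at least one data contract of its own.
func shrink(n *Node, requested map[string]bool) (*Node, error) {
	if n.Type == "data_contract" {
		if requested[n.Alias] {
			return &Node{Type: n.Type, Alias: n.Alias}, nil
		}
		return nil, nil
	}
	out := &Node{Type: n.Type, Alias: n.Alias}
	hasContract, hasGroup := false, false
	for _, c := range n.Children {
		kept, err := shrink(c, requested)
		if err != nil {
			return nil, err
		}
		if kept == nil {
			continue
		}
		if kept.Type == "data_contract" {
			hasContract = true
		} else {
			hasGroup = true
		}
		out.Children = append(out.Children, kept)
	}
	if hasGroup && !hasContract {
		return nil, fmt.Errorf("invalid lineage: group %q has nested data but none of its own", n.Alias)
	}
	if len(out.Children) == 0 {
		return nil, nil
	}
	return out, nil
}

// sampleTemplate builds a simplified version of the associate template.
func sampleTemplate() *Node {
	dc := func(alias string) *Node { return &Node{Type: "data_contract", Alias: alias} }
	group := func(alias string, children ...*Node) *Node {
		return &Node{Type: "group", Alias: alias, Children: children}
	}
	return group("associate",
		dc("associate"),
		group("email", dc("email")),
		group("position", dc("position"), group("company", dc("company"))),
	)
}

func main() {
	tpl := sampleTemplate()
	pruned, err := shrink(tpl, map[string]bool{"associate": true, "email": true})
	if err != nil || pruned == nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range pruned.Children {
		fmt.Println(c.Type, c.Alias)
	}
	// Requesting company without position violates the assumed lineage rule.
	_, err = shrink(tpl, map[string]bool{"associate": true, "company": true})
	fmt.Println("error:", err)
}
```

Returning a pruned copy leaves the original template untouched, so one parsed template can serve many requests.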
2. Parallel Data Fetching
Once the template is shrunk, data contracts are resolved concurrently:
```go
go func(dc *dataContract) {
	defer wg.Done()
	data := processDataContract(dc)
	channel <- contractStoreItem{dc.Alias, data}
}(dataContract)
```
Seven data sources? Seven goroutines. The system only waits as long as the slowest source.
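For context, here is one way the surrounding fan-out and fan-in could look. It reuses the names from the snippet above, but the type definitions, the `fetchAll` function, and the stubbed `processDataContract` are assumptions made so the example runs on its own.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical shapes for the types named in the snippet above.
type dataContract struct{ Alias string }

type contractStoreItem struct {
	Alias string
	Data  []map[string]any
}

// processDataContract stands in for a real fetch; here it only sleeps briefly.
func processDataContract(dc *dataContract) []map[string]any {
	time.Sleep(10 * time.Millisecond)
	return []map[string]any{{"source": dc.Alias}}
}

// fetchAll resolves every contract in its own goroutine and collects the
// results into a map keyed by alias.
func fetchAll(contracts []*dataContract) map[string][]map[string]any {
	var wg sync.WaitGroup
	// Buffered to len(contracts) so no sender blocks before wg.Wait returns.
	channel := make(chan contractStoreItem, len(contracts))
	for _, dc := range contracts {
		wg.Add(1)
		go func(dc *dataContract) {
			defer wg.Done()
			channel <- contractStoreItem{dc.Alias, processDataContract(dc)}
		}(dc)
	}
	wg.Wait()
	close(channel)

	store := make(map[string][]map[string]any, len(contracts))
	for item := range channel {
		store[item.Alias] = item.Data
	}
	return store
}

func main() {
	store := fetchAll([]*dataContract{{Alias: "associate"}, {Alias: "email"}, {Alias: "position"}})
	fmt.Println(len(store), "sources fetched")
}
```

The buffered channel is a deliberate choice in this sketch: with an unbuffered channel, calling `wg.Wait()` before reading would deadlock.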
3. Four Merge Strategies
The real power is in how data gets combined. The system supports four merge types that handle different cardinalities:
| Merge Type | Cardinality | Result |
|---|---|---|
| flatMerge | 1-to-1 | Child fields added directly to parent |
| objectMerge | 1-to-1 | Child nested as single object property |
| nestedMerge | 1-to-Many | Children nested as array |
| arrayMerge | N/A | Combines data from multiple contracts at same level |
This means you can express:
- "Each associate has one username" → flatMerge
- "Each associate has one current address" → objectMerge
- "Each associate has many positions" → nestedMerge
- "Emails come from personal AND business systems" → arrayMerge
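To make the four shapes concrete, here is a minimal sketch of what each strategy might do to an in-memory record once the matching children are already known. The function signatures, the `record` alias, the conflict rule in `flatMerge`, and the sample address and email values are assumptions for illustration. The sketch also does not strip join keys from children.

```go
package main

import "fmt"

// record is a hypothetical in-memory row: field name to value.
type record = map[string]any

// flatMerge copies the single matched child's fields onto the parent.
// Assumption: on a name clash the parent's existing field wins.
func flatMerge(parent, child record) {
	for k, v := range child {
		if _, exists := parent[k]; !exists {
			parent[k] = v
		}
	}
}

// objectMerge nests the single matched child under alias as one object.
func objectMerge(parent record, alias string, child record) {
	parent[alias] = child
}

// nestedMerge nests all matched children under alias as an array.
func nestedMerge(parent record, alias string, children []record) {
	parent[alias] = children
}

// arrayMerge concatenates rows from several contracts at the same level.
func arrayMerge(sources ...[]record) []record {
	var out []record
	for _, s := range sources {
		out = append(out, s...)
	}
	return out
}

func main() {
	associate := record{"_id": "001", "firstname": "Krishna"}

	flatMerge(associate, record{"username": "inventivepotter"})
	objectMerge(associate, "address", record{"city": "Springfield"}) // placeholder value
	nestedMerge(associate, "position", []record{{"name": "Sr. Engineer"}})

	// Placeholder addresses; the two email contracts are combined first,
	// then nested under the associate.
	emails := arrayMerge(
		[]record{{"email": "personal@example.com"}},
		[]record{{"email": "business@example.com"}},
	)
	nestedMerge(associate, "email", emails)

	fmt.Println(associate)
}
```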
4. Bottom-Up Tree Traversal
The merge happens from the leaves up. Company merges into position, position merges into associate. Each level uses join keys defined in the template:
```json
{
  "parentKey": "company_id",
  "currentKey": "_id"
}
```
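The leaf-first join can be sketched with a hash index on the child's key. This assumes that `parentKey` names the field on the parent row and `currentKey` names the field on the child row, which matches the company example above. The helper names `indexBy` and `attach` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type record = map[string]any

// indexBy builds a hash map from a key's value to the rows carrying it,
// so each parent lookup is a single map access.
func indexBy(rows []record, key string) map[any][]record {
	idx := make(map[any][]record, len(rows))
	for _, r := range rows {
		if v, ok := r[key]; ok {
			idx[v] = append(idx[v], r)
		}
	}
	return idx
}

// attach joins children onto parents where parent[parentKey] == child[currentKey].
// many=true nests an array; many=false nests the first match as a single object.
func attach(parents, children []record, alias, parentKey, currentKey string, many bool) {
	idx := indexBy(children, currentKey)
	for _, p := range parents {
		v, ok := p[parentKey]
		if !ok {
			continue
		}
		matches := idx[v]
		if len(matches) == 0 {
			continue
		}
		if many {
			p[alias] = matches
		} else {
			p[alias] = matches[0]
		}
	}
}

func main() {
	companies := []record{{"_id": "c01", "name": "Acme Corp"}}
	positions := []record{{"associate_id": "001", "company_id": "c01", "name": "Sr. Engineer"}}
	associates := []record{{"_id": "001", "firstname": "Krishna"}}

	// Leaves first: company into position, then position into associate.
	attach(positions, companies, "company", "company_id", "_id", false)
	attach(associates, positions, "position", "_id", "associate_id", true)

	out, err := json.MarshalIndent(associates, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Merging leaves first means each parent receives children that are already complete, so every level needs only one pass.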
The Result
A single API call transforms scattered data:
Input (7 separate data sources, six of which are shown):
```
associate.json      → [{_id: "001", firstname: "Krishna"}]
username.json       → [{associate_id: "001", username: "inventivepotter"}]
personal-email.json → [{associate_id: "001", email: "[email protected]"}]
business-email.json → [{associate_id: "001", email: "[email protected]"}]
position.json       → [{associate_id: "001", company_id: "c01", name: "Sr. Engineer"}]
company.json        → [{_id: "c01", name: "Acme Corp"}]
```
Output (unified hierarchy):
```json
{
  "associates": [{
    "_id": "001",
    "firstname": "Krishna",
    "username": "inventivepotter",
    "email": [
      {"email": "[email protected]"},
      {"email": "[email protected]"}
    ],
    "position": [{
      "name": "Sr. Engineer",
      "company": {
        "name": "Acme Corp"
      }
    }]
  }]
}
```
Why This Matters
Flexibility Without Code Changes
Need to add a new data source? Add it to the template. Need a new relationship? Define the merge type and keys. No recompilation, no new endpoints.
GraphQL Vibes, Simpler Implementation
This achieves goals similar to GraphQL's (client-driven data selection, hierarchical responses) with a fraction of the complexity. No schema definitions, no resolvers, no query parsing.
Production-Ready Patterns
The architecture demonstrates patterns that scale:
- Concurrent processing with goroutines and channels
- Template-based configuration for operations teams
- Hash maps for average O(1) lookups during merge operations
- Memory cleanup after merge phases
What's Next?
The README hints at production features not included in this sample:
- Queryable datasources with parameterized inputs
- Batched data fetching for high-volume scenarios
- Distributed caching for the merge hashmap
- Data transformation rules and multiple output formats
- Scheduling and event-driven triggers
Conclusion
Metadata-Driven Merge demonstrates that complex data integration doesn't require complex code. By treating relationships as configuration rather than implementation, you get:
- Maintainability: Change behavior without changing code
- Performance: Parallel fetching, optimal pruning
- Flexibility: Any combination of data sources at runtime
Sometimes the best abstraction isn't a new query language. It's a well-designed metadata template and a smart engine to interpret it.
---
The full source code is available in Go, featuring clean separation between the HTTP layer, merge engine, and data resolution logic.