Metadata-Driven Merge: A Declarative Approach to Data Integration

Building a lightweight alternative to GraphQL for hierarchical data merging using Go, with concurrent fetching and configurable merge strategies.

Krishna C

October 18, 2020

4 min read

Modern applications often need to combine data from multiple sources into a unified, hierarchical structure. Consider an employee directory that needs to merge:

  • Associate records from one database
  • Email addresses (personal and business) from another system
  • Phone numbers from a third source
  • Position history linked to companies with office addresses

The traditional approach? Write custom code for every permutation of data sources. But what if users want different combinations at runtime? What if relationships change? You're stuck maintaining a tangled web of join logic.

The Solution: Metadata-Driven Merge

This Go-based project takes a fundamentally different approach: define relationships declaratively, let the engine figure out the rest.

Instead of writing imperative code like:

    // Don't do this for every combination...
    associates := fetchAssociates()
    for _, a := range associates {
        a.Emails = fetchEmails(a.ID)
        a.Positions = fetchPositions(a.ID)
        for _, p := range a.Positions {
            p.Company = fetchCompany(p.CompanyID)
        }
    }

You define a merge template that describes the data hierarchy:

    {
      "type": "group",
      "alias": "associate",
      "mergeType": "nestedMerge",
      "children": [
        { "type": "data_contract", "alias": "associate" },
        { "type": "group", "alias": "email", "mergeType": "nestedMerge", "children": [...] },
        { "type": "group", "alias": "position", "mergeType": "nestedMerge",
          "children": [
            { "type": "data_contract", "alias": "position" },
            { "type": "group", "alias": "company", "children": [...] }
          ]
        }
      ]
    }
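
For readers who want to load such a template in Go, the following is a minimal sketch of one way to model it. The JSON field names (type, alias, mergeType, children, parentKey, currentKey) are the ones this article uses; the Node type, the ParseTemplate function, and the cut-down sampleTemplate are assumptions made for illustration, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node mirrors one entry of the merge template. The JSON field names come
// from the article; the Go type itself is an assumption.
type Node struct {
	Type       string  `json:"type"` // "group" or "data_contract"
	Alias      string  `json:"alias"`
	MergeType  string  `json:"mergeType,omitempty"`
	ParentKey  string  `json:"parentKey,omitempty"`
	CurrentKey string  `json:"currentKey,omitempty"`
	Children   []*Node `json:"children,omitempty"`
}

// ParseTemplate decodes a template document into a tree of nodes.
func ParseTemplate(raw string) (*Node, error) {
	var root Node
	if err := json.Unmarshal([]byte(raw), &root); err != nil {
		return nil, fmt.Errorf("parse template: %w", err)
	}
	return &root, nil
}

// sampleTemplate is a shortened, fully spelled-out template (hypothetical).
const sampleTemplate = `{
  "type": "group", "alias": "associate", "mergeType": "nestedMerge",
  "children": [
    {"type": "data_contract", "alias": "associate"},
    {"type": "group", "alias": "position", "mergeType": "nestedMerge",
     "children": [{"type": "data_contract", "alias": "position"}]}
  ]
}`

func main() {
	root, err := ParseTemplate(sampleTemplate)
	if err != nil {
		panic(err)
	}
	fmt.Println(root.Alias, len(root.Children))
}
```

Using pointers for the children keeps the tree cheap to prune and rewrite in place, which matters once the engine starts removing branches.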

Then simply request what you need:

    GET /associates?datasources=associate,email,position,company

The engine handles everything else.
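
As a rough illustration of the first thing the engine has to do with such a request, the sketch below turns the datasources query parameter into a set of requested aliases. The function name RequestedSources is hypothetical, and the real project may parse the request differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// RequestedSources extracts the comma-separated "datasources" query
// parameter from a request URL and returns it as a set of aliases.
// The function name is an assumption made for this sketch.
func RequestedSources(rawURL string) (map[string]bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	requested := make(map[string]bool)
	for _, alias := range strings.Split(u.Query().Get("datasources"), ",") {
		// Ignore empty entries such as a trailing comma.
		if alias = strings.TrimSpace(alias); alias != "" {
			requested[alias] = true
		}
	}
	return requested, nil
}

func main() {
	set, err := RequestedSources("/associates?datasources=associate,email,position,company")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(set), set["email"], set["phone"])
}
```

A set makes the later "was this alias requested?" checks constant-time.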

How It Works

1. Template Shrinking

The first clever trick: the template is an AST (Abstract Syntax Tree) that gets pruned at runtime.

If you only request associate and email, the engine recursively removes the position and company branches. This means:

  • No wasted fetches for unrequested data
  • Validation that requested sources have valid lineage (you can't request company without position)
  • A smaller tree for the fetch and merge phases to process
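
The pruning step can be sketched as a recursive walk that drops any group whose alias was not requested, then checks that every requested alias survived. This is a simplified sketch under stated assumptions: the Node type, the Shrink function, and the personal-email alias are hypothetical, and the real engine may validate lineage differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a simplified template node; the real engine's type may differ.
type Node struct {
	Type     string // "group" or "data_contract"
	Alias    string
	Children []*Node
}

// Shrink returns a pruned copy of the template that keeps only the groups
// whose alias was requested. It returns an error when a requested alias is
// unreachable, for example because its parent group was not requested.
func Shrink(root *Node, requested map[string]bool) (*Node, error) {
	kept := make(map[string]bool)
	pruned := shrink(root, requested, kept)
	if pruned == nil {
		return nil, errors.New("root group was not requested")
	}
	for alias := range requested {
		if !kept[alias] {
			return nil, fmt.Errorf("datasource %q is unknown or its parent group was not requested", alias)
		}
	}
	return pruned, nil
}

func shrink(n *Node, requested, kept map[string]bool) *Node {
	if n.Type == "group" {
		if !requested[n.Alias] {
			return nil // drops the whole branch, nested groups included
		}
		kept[n.Alias] = true
	}
	out := &Node{Type: n.Type, Alias: n.Alias}
	for _, child := range n.Children {
		if c := shrink(child, requested, kept); c != nil {
			out.Children = append(out.Children, c)
		}
	}
	return out
}

// sampleTemplate builds a small tree shaped like the article's template.
func sampleTemplate() *Node {
	return &Node{Type: "group", Alias: "associate", Children: []*Node{
		{Type: "data_contract", Alias: "associate"},
		{Type: "group", Alias: "email", Children: []*Node{
			{Type: "data_contract", Alias: "personal-email"}, // hypothetical alias
		}},
		{Type: "group", Alias: "position", Children: []*Node{
			{Type: "data_contract", Alias: "position"},
			{Type: "group", Alias: "company", Children: []*Node{
				{Type: "data_contract", Alias: "company"},
			}},
		}},
	}}
}

func main() {
	pruned, err := Shrink(sampleTemplate(), map[string]bool{"associate": true, "email": true})
	fmt.Println(len(pruned.Children), err)

	// company without position has no valid lineage, so this returns an error.
	_, err = Shrink(sampleTemplate(), map[string]bool{"associate": true, "company": true})
	fmt.Println(err)
}
```

Because an unrequested group removes its entire subtree, a request for company without position never reaches the company node, and the final check reports it.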

2. Parallel Data Fetching

Once the template is shrunk, data contracts are resolved concurrently:

    go func(dc *dataContract) {
        defer wg.Done()
        data := processDataContract(dc)
        channel <- contractStoreItem{dc.Alias, data}
    }(dataContract)

Seven data sources? Seven goroutines. The system only waits as long as the slowest source.
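
To show the surrounding plumbing that the snippet above leaves out, here is a self-contained sketch of the fan-out and collect pattern. The names dataContract, contractStoreItem, and processDataContract come from the snippet; their definitions here, the fetchAll helper, and the canned rows are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// dataContract and contractStoreItem are simplified stand-ins for the
// engine's types; only the Alias field is taken from the article.
type dataContract struct{ Alias string }

type contractStoreItem struct {
	Alias string
	Data  []map[string]any
}

// processDataContract stands in for the real fetch and returns canned rows.
func processDataContract(dc *dataContract) []map[string]any {
	return []map[string]any{{"source": dc.Alias}}
}

// fetchAll resolves every contract in its own goroutine and collects the
// results into a map keyed by alias.
func fetchAll(contracts []*dataContract) map[string][]map[string]any {
	var wg sync.WaitGroup
	// Buffered to len(contracts) so no sender blocks before we start reading.
	channel := make(chan contractStoreItem, len(contracts))
	for _, dc := range contracts {
		wg.Add(1)
		go func(dc *dataContract) {
			defer wg.Done()
			channel <- contractStoreItem{dc.Alias, processDataContract(dc)}
		}(dc)
	}
	wg.Wait()
	close(channel) // safe: all senders have finished

	store := make(map[string][]map[string]any, len(contracts))
	for item := range channel {
		store[item.Alias] = item.Data
	}
	return store
}

func main() {
	store := fetchAll([]*dataContract{{Alias: "associate"}, {Alias: "email"}, {Alias: "position"}})
	fmt.Println(len(store))
}
```

Sizing the channel buffer to the number of contracts lets the code wait on the WaitGroup first and drain afterwards. With an unbuffered channel, the reader would have to run concurrently with the senders.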

3. Four Merge Strategies

The real power is in how data gets combined. The system supports four merge types that handle different cardinalities:

    Merge Type     Cardinality    Result
    flatMerge      1-to-1         Child fields added directly to parent
    objectMerge    1-to-1         Child nested as single object property
    nestedMerge    1-to-Many      Children nested as array
    arrayMerge     N/A            Combines data from multiple contracts at same level

This means you can express:

  • "Each associate has one username" → flatMerge
  • "Each associate has one current address" → objectMerge
  • "Each associate has many positions" → nestedMerge
  • "Emails come from personal AND business systems" → arrayMerge
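
The four strategies can be sketched as small functions over generic rows. This is an illustration of the behaviour described in the table, not the project's code: the Row type, the function signatures, the conflict rule in flatMerge, and the sample address and email values are all assumptions. Matching children to parents by key is left out here.

```go
package main

import "fmt"

// Row is a generic record; using a map here is an assumption of this sketch.
type Row = map[string]any

// flatMerge copies the child's fields directly onto the parent (1-to-1).
// Assumption: on a name clash the parent's existing value wins.
func flatMerge(parent, child Row) {
	for k, v := range child {
		if _, exists := parent[k]; !exists {
			parent[k] = v
		}
	}
}

// objectMerge nests the child as a single object under alias (1-to-1).
func objectMerge(parent Row, alias string, child Row) {
	parent[alias] = child
}

// nestedMerge nests all matching children as an array under alias (1-to-many).
func nestedMerge(parent Row, alias string, children []Row) {
	parent[alias] = children
}

// arrayMerge concatenates rows from several contracts at the same level.
func arrayMerge(sources ...[]Row) []Row {
	var out []Row
	for _, s := range sources {
		out = append(out, s...)
	}
	return out
}

func main() {
	associate := Row{"_id": "001", "firstname": "Krishna"}

	flatMerge(associate, Row{"username": "inventivepotter"})
	objectMerge(associate, "address", Row{"city": "Springfield"}) // hypothetical row
	nestedMerge(associate, "position", []Row{{"name": "Sr. Engineer"}})

	// Placeholder addresses, not data from the article.
	personal := []Row{{"email": "personal@example.com"}}
	business := []Row{{"email": "business@example.com"}}
	nestedMerge(associate, "email", arrayMerge(personal, business))

	fmt.Println(associate)
}
```

Note that arrayMerge only combines sibling sources. Attaching the combined list to the parent is still a nestedMerge, which matches how email appears as an array in the output shown later.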

4. Bottom-Up Tree Traversal

The merge happens from the leaves up. Company merges into position, position merges into associate. Each level uses join keys defined in the template:

    {
      "parentKey": "company_id",
      "currentKey": "_id"
    }
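
A minimal sketch of the bottom-up pass follows, assuming a simplified node that already holds its fetched rows. The company join keys are the ones shown above; the keys for position (_id on the associate, associate_id on the position) are inferred from the sample data and are an assumption, as are the Node type and the mergeUp and keyOf helpers. For brevity every child is attached as an array, regardless of merge type.

```go
package main

import "fmt"

type Row = map[string]any

// Node is a simplified template node that already carries its fetched rows.
type Node struct {
	Alias      string
	ParentKey  string // field on the parent's rows
	CurrentKey string // field on this node's rows
	Rows       []Row
	Children   []*Node
}

// keyOf returns a row's join key as a string, or false if it is missing.
func keyOf(r Row, field string) (string, bool) {
	v, ok := r[field]
	if !ok || v == nil {
		return "", false
	}
	return fmt.Sprint(v), true
}

// mergeUp is a post-order walk: each child subtree is merged first, then
// the child's rows are attached to this node's rows through a hash index.
func mergeUp(n *Node) []Row {
	for _, child := range n.Children {
		childRows := mergeUp(child)

		// Index child rows by join key for O(1) lookups per parent row.
		index := make(map[string][]Row, len(childRows))
		for _, r := range childRows {
			if k, ok := keyOf(r, child.CurrentKey); ok {
				index[k] = append(index[k], r)
			}
		}
		for _, p := range n.Rows {
			if k, ok := keyOf(p, child.ParentKey); ok && len(index[k]) > 0 {
				p[child.Alias] = index[k] // always an array in this sketch
			}
		}
	}
	return n.Rows
}

// sampleTree uses the article's sample rows for associate, position, company.
func sampleTree() *Node {
	company := &Node{Alias: "company", ParentKey: "company_id", CurrentKey: "_id",
		Rows: []Row{{"_id": "c01", "name": "Acme Corp"}}}
	position := &Node{Alias: "position", ParentKey: "_id", CurrentKey: "associate_id",
		Rows:     []Row{{"associate_id": "001", "company_id": "c01", "name": "Sr. Engineer"}},
		Children: []*Node{company}}
	return &Node{Alias: "associate",
		Rows:     []Row{{"_id": "001", "firstname": "Krishna"}},
		Children: []*Node{position}}
}

func main() {
	rows := mergeUp(sampleTree())
	positions := rows[0]["position"].([]Row)
	fmt.Println(positions[0]["company"])
}
```

Merging leaves first means that by the time position is attached to associate, each position row already carries its company, so no level is visited twice.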

The Result

A single API call transforms scattered data:

Input (7 separate data sources, 6 of them shown here):

    associate.json      → [{_id: "001", firstname: "Krishna"}]
    username.json       → [{associate_id: "001", username: "inventivepotter"}]
    personal-email.json → [{associate_id: "001", email: "[email protected]"}]
    business-email.json → [{associate_id: "001", email: "[email protected]"}]
    position.json       → [{associate_id: "001", company_id: "c01", name: "Sr. Engineer"}]
    company.json        → [{_id: "c01", name: "Acme Corp"}]

Output (unified hierarchy):

    {
      "associates": [{
        "_id": "001",
        "firstname": "Krishna",
        "username": "inventivepotter",
        "email": [
          {"email": "[email protected]"},
          {"email": "[email protected]"}
        ],
        "position": [{
          "name": "Sr. Engineer",
          "company": {
            "name": "Acme Corp"
          }
        }]
      }]
    }

Why This Matters

Flexibility Without Code Changes

Need to add a new data source? Add it to the template. Need a new relationship? Define the merge type and keys. No recompilation, no new endpoints.

GraphQL Vibes, Simpler Implementation

This achieves goals similar to GraphQL's (client-driven data selection, hierarchical responses) with a much smaller surface area: no schema definition language, no per-field resolvers, and no query language to parse, just a template and a comma-separated list of datasources.

Production-Ready Patterns

The architecture demonstrates patterns that scale:

  • Concurrent processing with goroutines and channels
  • Template-based configuration for operations teams
  • Hash maps for O(1) lookups during merge operations
  • Memory cleanup after merge phases

What's Next?

The README hints at production features not included in this sample:

  • Queryable datasources with parameterized inputs
  • Batched data fetching for high-volume scenarios
  • Distributed caching for the merge hashmap
  • Data transformation rules and multiple output formats
  • Scheduling and event-driven triggers

Conclusion

Metadata-Driven Merge demonstrates that complex data integration doesn't require complex code. By treating relationships as configuration rather than implementation, you get:

  • Maintainability: Change behavior without changing code
  • Performance: Parallel fetching, optimal pruning
  • Flexibility: Any combination of data sources at runtime

Sometimes the best abstraction isn't a new query language—it's a well-designed metadata template and a smart engine to interpret it.

---

The full source code is available in Go, featuring clean separation between the HTTP layer, merge engine, and data resolution logic.
