North Highland Reference Architecture
Our point of view on enterprise AI architecture—and the working implementation that proves it.
North Highland believes enterprises must own their AI abstraction layer—not rent it from a cloud vendor.
Your application logic should never know which foundation model provider is executing a request. AI consumption and AI provision are separate architectural concerns.
The abstraction layer must be yours—deployed where you choose, governed by your policies, observable through your tools. Not a managed service you can't inspect.
Security, privacy, audit trails, and cost controls must be architectural constraints, not afterthoughts. Every AI request flows through policy enforcement.
A reference architecture is theory. A working implementation is proof. We built RegRiskIQ on these principles to demonstrate they're not just possible—they're practical.
You face a fundamental tension in AI adoption for regulatory compliance.
You need to leverage cutting-edge AI capabilities for compliance automation, risk assessment, and regulatory intelligence today. Waiting means falling behind competitors and increasing regulatory exposure.
You require the freedom to choose the best provider for each workload, switch providers as pricing and capabilities evolve, and avoid lock-in that constrains future technology decisions.
Your organization demands consistent security, privacy, and audit controls regardless of which AI provider processes requests. Compliance cannot be an afterthought.
You want to route workloads to the most cost-effective provider without sacrificing quality. Different tasks demand different models, and your architecture should support intelligent routing.
THE REFERENCE PATTERN
Applications never communicate directly with foundation model providers. All AI interactions flow through a Model Gateway the enterprise owns and controls.
The Model Gateway acts as an anti-corruption layer between your business logic and external AI providers. This separation delivers five strategic advantages, and a minimal code sketch of the pattern follows the comparison table:
| Without Gateway | With Gateway |
|---|---|
| Provider-specific code in apps | One API, any provider |
| Scattered governance | Centralized controls |
| Expensive provider switching | Configuration-based routing |
| Manual cost optimization | Intelligent auto-routing |
| Fragmented observability | Unified tracing and logs |
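To make the pattern concrete, here is a minimal Python sketch of the gateway abstraction. The names (ModelGateway, AIRequest, ProviderAdapter) and the task-to-provider routing table are illustrative assumptions, not the RegRiskIQ API; they only show how applications depend on one interface while providers are swapped through configuration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AIRequest:
    task: str        # the capability the application needs, e.g. "regulatory_analysis"
    tenant_id: str   # used for isolation, quotas, and cost attribution
    payload: str     # the prompt or document to process


@dataclass
class AIResponse:
    text: str
    provider: str    # which provider actually served the request
    tokens_used: int


class ProviderAdapter(Protocol):
    """Every provider sits behind the same narrow interface."""
    name: str
    def invoke(self, request: AIRequest) -> AIResponse: ...


class ModelGateway:
    """Applications depend on this class only; providers stay behind it."""

    def __init__(self, adapters: dict[str, ProviderAdapter], routing: dict[str, str]):
        self._adapters = adapters
        self._routing = routing  # task -> provider name, maintained as configuration

    def complete(self, request: AIRequest) -> AIResponse:
        provider = self._routing[request.task]          # configuration-based routing
        return self._adapters[provider].invoke(request)
```

Because applications only ever call `complete`, adding or replacing a provider changes the adapter map and the routing configuration, never the calling code.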
Building an abstraction layer is straightforward—most enterprises already have one. Governing it is where organizations struggle: consistent security policies, audit trails, compliance controls, and vendor management across all providers. That's what this architecture solves.
FROM THEORY TO PRACTICE
We didn't just design this architecture—we built it. RegRiskIQ is our working implementation of the Provider-Portable AI pattern, deployed in production for regulatory compliance workloads.
WHAT WE'VE BUILT
Our implementation delivers enterprise-grade AI governance through these integrated components.
Your applications specify what they need (regulatory analysis, risk scoring, document extraction) rather than which model to use. The gateway selects the optimal provider based on cost, latency, quality, and policy requirements.
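One way to express that selection, sketched here with assumed names (ProviderProfile, select_provider) and made-up numbers: each task carries a policy, and the gateway picks the cheapest provider that satisfies it.

```python
from dataclasses import dataclass


@dataclass
class ProviderProfile:
    name: str
    cost_per_1k_tokens: float
    p95_latency_ms: int
    quality_score: float   # from offline evaluation, 0.0 to 1.0
    regions: set[str]


def select_provider(task_policy: dict, candidates: list[ProviderProfile]) -> ProviderProfile:
    """Return the cheapest provider that meets the task's quality, latency, and residency policy."""
    eligible = [
        p for p in candidates
        if p.quality_score >= task_policy["min_quality"]
        and p.p95_latency_ms <= task_policy["max_latency_ms"]
        and task_policy["region"] in p.regions
    ]
    if not eligible:
        raise RuntimeError("No provider satisfies the task policy")
    return min(eligible, key=lambda p: p.cost_per_1k_tokens)


# Example: two made-up provider profiles and a policy for a residency-constrained task.
profiles = [
    ProviderProfile("provider-a", 0.50, 900, 0.92, {"eu-west-1"}),
    ProviderProfile("provider-b", 0.15, 1400, 0.81, {"eu-west-1", "us-east-1"}),
]
policy = {"min_quality": 0.80, "max_latency_ms": 2000, "region": "eu-west-1"}
print(select_provider(policy, profiles).name)   # "provider-b": cheapest option that meets the policy
```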
Prompts become versioned, deployable artifacts stored in a central registry. Update prompts without changing application code. Test new versions before production rollout. Maintain audit trails of prompt changes.
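A minimal sketch of what a registry entry and lookup could look like; the PromptVersion fields and the in-memory REGISTRY are illustrative stand-ins for a real versioned store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    prompt_id: str   # stable identifier referenced by applications
    version: str     # deployable artifact version, promoted like any other release
    template: str    # prompt text with named placeholders
    status: str      # e.g. "draft", "staged", "production"


REGISTRY = {
    ("regulatory_summary", "1.2.0"): PromptVersion(
        prompt_id="regulatory_summary",
        version="1.2.0",
        template="Summarize the following regulation for a compliance officer:\n{document}",
        status="production",
    ),
}


def resolve(prompt_id: str, version: str, **variables: str) -> str:
    """Applications reference prompts by id and version, never by inline text."""
    return REGISTRY[(prompt_id, version)].template.format(**variables)
```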
Enforce tenant isolation, data residency requirements, guardrails, and rate limits at the gateway level. Policies apply consistently across all AI interactions regardless of provider.
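As a sketch, a tenant policy might be expressed and enforced like this. The TenantPolicy fields and the checks in `enforce` are simplified examples of the controls named above, not the production policy engine.

```python
from dataclasses import dataclass, field


@dataclass
class TenantPolicy:
    tenant_id: str
    allowed_regions: set[str]      # data residency constraint
    allowed_providers: set[str]    # which providers this tenant's data may reach
    requests_per_minute: int       # rate limit
    blocked_terms: set[str] = field(default_factory=set)   # simple content guardrail


def enforce(policy: TenantPolicy, provider: str, region: str,
            prompt: str, requests_this_minute: int) -> None:
    """Raise on the first violated control; called for every AI request before routing."""
    if provider not in policy.allowed_providers:
        raise PermissionError(f"provider {provider} not permitted for {policy.tenant_id}")
    if region not in policy.allowed_regions:
        raise PermissionError(f"region {region} violates the residency policy")
    if requests_this_minute >= policy.requests_per_minute:
        raise RuntimeError("rate limit exceeded")
    if any(term in prompt.lower() for term in policy.blocked_terms):
        raise ValueError("prompt blocked by content guardrail")
```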
Each foundation model provider integrates through a dedicated adapter that normalizes request formats, response structures, error codes, and authentication patterns. Adding new providers requires only a new adapter.
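Continuing the gateway sketch above (it reuses the AIRequest and AIResponse types), a hypothetical adapter maps one provider's response shape onto the gateway's normalized types; the wire format shown is invented for illustration.

```python
class ExampleProviderAdapter:
    """Hypothetical adapter; a real one also normalizes errors, authentication, and streaming."""
    name = "example-provider"

    def invoke(self, request: AIRequest) -> AIResponse:
        raw = self._call_provider_api(request.payload)        # provider-specific wire format
        return AIResponse(
            text=raw["choices"][0]["message"]["content"],      # provider-specific shape...
            provider=self.name,                                # ...mapped onto the gateway's types
            tokens_used=raw["usage"]["total_tokens"],
        )

    def _call_provider_api(self, prompt: str) -> dict:
        # Stubbed here; a real adapter would call the provider's SDK or REST endpoint.
        return {"choices": [{"message": {"content": f"echo: {prompt}"}}],
                "usage": {"total_tokens": 12}}
```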
OpenTelemetry instrumentation provides end-to-end visibility across gateway, adapters, and providers. Track token usage, costs, latencies, and error rates per tenant, per provider, and per use case.
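A minimal illustration of that instrumentation using the OpenTelemetry Python SDK; the span and attribute names are our own illustrative choices, and a console exporter stands in for the real telemetry backend.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("model-gateway")


def traced_invoke(adapter, request):
    """Wrap every provider call in a span carrying the attributes the dashboards need."""
    with tracer.start_as_current_span("provider.invoke") as span:
        span.set_attribute("tenant.id", request.tenant_id)
        span.set_attribute("ai.provider", adapter.name)
        span.set_attribute("ai.task", request.task)
        response = adapter.invoke(request)
        span.set_attribute("ai.tokens_used", response.tokens_used)
        return response
```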
Your retrieval pipeline operates independently from model providers. Switch inference providers without re-indexing document stores or modifying retrieval logic. Your knowledge base stays portable.
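Sketched against the gateway types above, the separation looks like this: retrieval depends only on the document index, while generation goes through the gateway, so swapping inference providers never touches the retriever. The retriever interface and task names here are assumptions.

```python
def answer(question: str, retriever, gateway) -> str:
    # Retrieval depends only on the document index, not on any inference provider.
    passages = retriever.search(question, top_k=5)
    context = "\n\n".join(passages)
    # Generation goes through the gateway, which can route to any provider.
    request = AIRequest(
        task="compliance_qa",
        tenant_id="tenant-042",
        payload=f"Context:\n{context}\n\nQuestion: {question}",
    )
    return gateway.complete(request).text
```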
| Strategy | Optimizes For | Use Case |
|---|---|---|
| Cost | Minimize spend while meeting quality thresholds | High-volume, non-critical workloads |
| Performance | Minimize latency for interactive experiences | Real-time compliance Q&A |
| Quality | Maximize output quality for critical decisions | Regulatory filing review |
| Hybrid | Balance all factors dynamically | Default for most workloads |
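In configuration terms, each use case simply declares its strategy. The keys and thresholds below are hypothetical examples of how that mapping might be written, not the shipped configuration schema.

```python
# Hypothetical routing configuration: each use case declares a strategy,
# and the gateway applies the corresponding weighting when selecting a provider.
ROUTING_CONFIG = {
    "compliance_qa":       {"strategy": "performance", "max_latency_ms": 1500},
    "filing_review":       {"strategy": "quality", "min_quality": 0.90},
    "bulk_classification": {"strategy": "cost", "min_quality": 0.70},
    "default":             {"strategy": "hybrid"},
}
```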
A fully codified AI governance playbook with 63 controls across 14 domains. Each control is implemented as code and mapped to the corresponding regulatory frameworks: ISO 42001, the EU AI Act, and the NIST AI RMF.
| Domains | Controls | Scope |
|---|---|---|
| 5 | 19 | Enterprise-wide governance structure |
| 9 | 44 | Per-system governance controls |
What the provider-portable architecture enables for your organization.
Evaluate and adopt new providers without application changes. Your business logic stays stable while AI capabilities evolve.
Route each workload to the most cost-effective option. Use premium models where quality matters, economical models where speed is sufficient.
Apply uniform security, privacy, and audit controls across all AI interactions. Meet regulatory requirements once, regardless of provider.
Architectural readiness for emerging models and providers. When the next breakthrough arrives, you adopt it through configuration.
AWS, Azure, and Google each publish reference architectures for this exact pattern—because they're competing to be YOUR abstraction layer.
"Multi-Provider Generative AI Gateway" — Official AWS guidance for routing to Azure, OpenAI, and other providers through an AWS-hosted LiteLLM gateway on ECS/EKS.
AWS Solutions Library"AI Gateway" with native Bedrock support — Microsoft's answer: use Azure APIM to govern AWS Bedrock and non-Microsoft AI providers from your Azure control plane.
Microsoft LearnModel Garden with multi-provider serving — Google's unified platform supporting Anthropic Claude, Meta Llama, and partner models alongside Gemini.
Google CloudEach hyperscaler wants to be your gateway to all the others. Our architecture gives you this pattern without the platform lock-in—your gateway runs where YOU choose, not where your cloud vendor prefers.
Provider portability is real. But it requires intentional design and ongoing investment.
Tool calling semantics, JSON output reliability, token limits, streaming formats, and content safety features vary across providers. Our adapter layer handles this complexity so your applications stay clean.
Every abstraction has overhead. The gateway layer adds processing time for routing, policy evaluation, and request normalization. For latency-sensitive workloads, this impact must be measured and optimized for your specific use cases.
Proving portability demands evaluation harnesses, golden datasets, and quality metrics across providers. We build this infrastructure as a first-class capability.
Each adapter includes a compatibility test suite that validates behavior against provider-specific edge cases. New provider integrations pass this harness before production.
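A few representative checks from such a harness, again sketched against the types from the gateway example above; the real suite covers many more provider-specific edge cases.

```python
def check_adapter_contract(adapter) -> None:
    """Representative assertions; a real harness also exercises tool calls, streaming, and error codes."""
    request = AIRequest(task="regulatory_analysis", tenant_id="test-tenant", payload="ping")
    response = adapter.invoke(request)

    assert isinstance(response.text, str) and response.text, "empty completions must surface as errors"
    assert response.tokens_used >= 0, "token accounting must be populated for cost tracking"
    assert response.provider == adapter.name, "responses must identify the serving provider"
```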
Gateway components are designed with latency budgets in mind. We instrument each stage (policy evaluation, prompt resolution, routing) with OpenTelemetry tracing to identify and address bottlenecks. Specific targets are established during implementation based on measured baselines.
Automated quality regression tests run against all providers weekly. You get scorecards showing "can I switch to provider X" based on real data.
Your path to provider-portable AI compliance starts with a structured engagement.