INTERVIEW: Why is AI Still Struggling to Break Into Classified and High-Security Defence Environments?

In this interview with Impact Newswire, Noah Labs AI chief executive Murat Isik examines how artificial intelligence can be safely and effectively deployed inside some of the most secure defence and intelligence environments, where air-gapped systems, strict compliance regimes and entrenched legacy codebases continue to limit the adoption of mainstream AI development tools. He outlines the structural constraints that shape AI use in these settings, including disconnected networks, rigid data governance rules and long-standing security protocols that prevent the use of cloud-connected tools such as Copilot or Cursor in operational contexts. Against this backdrop, Isik argues that a widening sovereign AI gap is emerging between rapidly evolving commercial AI workflows and slower, constrained government systems, raising questions about whether defence organisations risk being left behind as AI becomes embedded in everyday software development.

Within national security and defence settings, AI adoption is constrained less by model capability than by infrastructure realities. Many systems operate in isolated, air-gapped environments with no external connectivity, while others are bound by long-established security protocols and compliance requirements that restrict data movement and third-party tooling. As a result, widely used AI coding assistants such as Copilot or Cursor are generally impractical in classified or high-assurance contexts, leaving defence organisations reliant on slower, manual development workflows.

This structural gap has become increasingly significant as commercial software development rapidly integrates large language models into everyday coding and engineering processes. In contrast, government and defence environments risk being left behind, creating what industry observers describe as a widening “sovereign AI gap” between commercial innovation and secure-system deployment.

California-based Noah Labs AI is seeking to address this challenge with Sentinel, an AI-native integrated development environment designed to operate fully offline. The system is built to support AI-assisted software development within secure, disconnected environments, enabling engineers to use advanced tooling without compromising air-gapped architectures or regulatory constraints.

The company tells Impact Newswire that it has established partnerships and collaborations spanning defence, government and research institutions, including NASA, Lockheed Martin, In-Q-Tel, Stanford University, Georgetown University and StartX, positioning itself at the intersection of frontier AI research and national security applications.

Editor Faustine Ngila spoke to Noah Labs AI chief executive Murat Isik, who was raised in a military family and contributed to neuromorphic chip design with support from the US Department of Defense and Department of Energy.

Here is the full interview:

1. Defense and national security environments are often described as structurally constrained by air-gapped systems and legacy infrastructure. What are the most underestimated barriers preventing modern AI adoption in these settings?

Everyone talks about air gaps and procurement cycles, but those are not actually the hardest part. The real killer is data quality. You can have the best model in the world, but if your system is indexed around one domain and someone queries it about something adjacent, you get garbage outputs. Analysts try it once, get a bad result, lose trust, and the project quietly dies. There is no shortcut around that. AI cannot manufacture good mission data where none exists.

The other thing people underestimate is governance infrastructure. Most agencies still do not track whether their AI is actually working. A 2025 GAO review found fewer than a third of agencies track basic KPIs like error rates or decision cycle time. They count pilots, not outcomes. And then there is the talent problem. The DoD genuinely cannot compete with private sector salaries for AI/ML people, so the folks who could actually implement and maintain these systems often are not sticking around.

2. Sentinel is positioned as an AI-native IDE for offline and secure environments. How does it technically differ from mainstream developer tools like Copilot or Cursor in terms of architecture and deployment?

The core difference is where the compute lives. Copilot requires an internet connection, full stop. The plugin has to talk to Microsoft’s servers to function. Cursor is the same way, entirely SaaS-bound. That immediately disqualifies both of them for any classified or air-gapped environment. It is not a configuration issue; it is an architectural one.

What an offline-first IDE does differently is run everything locally or on on-premises hardware — inference, context handling, all of it. Nothing leaves the network perimeter. And critically, it has to be model-agnostic, because different customers have different approved models. One agency might run a fine-tuned open-source model, another might have a proprietary one. The infrastructure needs to support whatever they bring, without forcing a specific vendor’s model ecosystem on them.
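
As a rough illustration of what "model-agnostic" can mean in practice, the sketch below has the IDE-side code depend only on a small interface, so any locally hosted model that satisfies it can be swapped in. This is not Sentinel's actual design; `LocalModel`, `EchoModel`, `UppercaseModel` and `Assistant` are hypothetical names, and the two "models" are trivial stand-ins for real local inference.

```python
from typing import Protocol


class LocalModel(Protocol):
    """Anything that can complete a prompt without leaving the host."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical stand-in for an agency-approved open-weight model."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class UppercaseModel:
    """Hypothetical stand-in for a second, proprietary model."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class Assistant:
    """IDE-side logic that only knows about the LocalModel interface."""

    def __init__(self, model: LocalModel) -> None:
        self.model = model

    def explain(self, code: str) -> str:
        # No network call is made here: the model object runs in-process.
        return self.model.complete(f"Explain: {code}")


if __name__ == "__main__":
    for backend in (EchoModel(), UppercaseModel()):
        print(Assistant(backend).explain("x = 1"))
```

Swapping the backend changes the output but not the calling code, which is the property the answer above describes.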

3. When operating in fully air-gapped systems, how do you handle model updates, fine-tuning, and security patching without introducing external connectivity risks?

It is operationally harder than people realize. You cannot just push an update over the wire. The standard approach is signed offline bundles. You package the update, sign it cryptographically, and physically transfer it into the environment through a controlled process. Every patch needs to be reversible too, because if something breaks inside a classified system you need to roll back fast.
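
The verify-then-apply flow with rollback can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of any real deployment: it uses an HMAC from Python's standard library as a stand-in for the asymmetric signatures a production system would normally use, and `Enclave`, `sign_bundle` and `verify_bundle` are hypothetical names.

```python
import hashlib
import hmac


def sign_bundle(payload: bytes, key: bytes) -> str:
    """Produce a hex tag for the bundle (HMAC stand-in for a real signature)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_bundle(payload: bytes, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_bundle(payload, key)
    return hmac.compare_digest(expected, signature)


class Enclave:
    """Hypothetical air-gapped host that keeps prior versions for rollback."""

    def __init__(self, key: bytes, initial: bytes) -> None:
        self.key = key
        self.history = [initial]

    @property
    def current(self) -> bytes:
        return self.history[-1]

    def apply(self, payload: bytes, signature: str) -> bool:
        # Reject anything whose tag does not match before touching state.
        if not verify_bundle(payload, signature, self.key):
            return False
        self.history.append(payload)
        return True

    def rollback(self) -> None:
        # Revert to the previous version; the initial version is never dropped.
        if len(self.history) > 1:
            self.history.pop()


if __name__ == "__main__":
    key = b"demo-key"  # illustrative only
    enclave = Enclave(key, b"v1")
    print(enclave.apply(b"v2", sign_bundle(b"v2", key)))  # accepted
    print(enclave.apply(b"v3", "not-a-valid-tag"))        # rejected
    enclave.rollback()
    print(enclave.current)
```

Keeping the previous versions on the host is what makes each patch reversible without a second physical transfer.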

For fine-tuning, you are not doing full retraining. That is too expensive and slow. The practical approach is LoRA or similar parameter-efficient methods, where you are adding lightweight adapters on top of the base model trained on mission-specific data inside the enclave. Los Alamos went down this road publicly. They decided self-hosting was the only way to get the compliance posture they needed across their full range of work, and that is a real proof point.
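
For readers unfamiliar with LoRA, the idea is that the base weight matrix W stays frozen and only two small matrices, B and A, are trained, giving an effective weight of W + (alpha / r) * B A. The sketch below shows that arithmetic with plain Python lists, plus the parameter count that makes the method cheap. It is a toy illustration, not a training loop; the function names and the example dimensions are chosen here for demonstration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A); W itself is never modified."""
    delta = matmul(b, a)  # b is d x r, a is r x k, so delta is d x k
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d, k, r):
    """Adapter parameters: B has d * r entries and A has r * k."""
    return r * (d + k)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
    b = [[1.0], [2.0]]            # adapter B (2 x 1)
    a = [[0.5, 0.0]]              # adapter A (1 x 2)
    print(lora_effective_weight(w, a, b, alpha=2, r=1))
    # For a 4096 x 4096 layer with rank 8, compare adapter size to full size.
    print(trainable_params(4096, 4096, 8), 4096 * 4096)
```

With those example dimensions the adapter holds 65,536 trainable values against 16,777,216 in the full matrix, which is why adapters are practical to train inside an enclave with limited hardware.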

4. Many sovereign AI initiatives emphasize control and independence, but often at the cost of performance or flexibility. Where do you see the real trade-offs between sovereignty, capability, and scalability?

The honest answer is that sovereign systems today do not match frontier commercial models on raw reasoning benchmarks. That gap is real. But for most defense and intelligence workflows, which are retrieval-heavy and document-analysis-heavy, the gap matters less than people think. You do not need frontier-level creative reasoning to extract key entities from a signals report. Retrieval accuracy and reliability matter more than generation fluency in those contexts.

The scalability trade-off is mostly upfront cost. You are buying hardware, standing up infrastructure, hiring people to manage it. No per-token fees after that, but the initial investment is significant. The smarter organizations are solving this by being model-agnostic at the LLM layer. They own the infrastructure and swap in better models as they become available. You get the security posture without being permanently locked to an underpowered model.

5. You reference partnerships with institutions like NASA, Lockheed Martin, and In-Q-Tel. How do the requirements of these organizations shape product design compared to commercial AI tools?

They push you toward things commercial tools never prioritize. Lockheed’s internal AI had to run on hardware ranging from a massive GPU cluster down to a four-node setup that fits in a briefcase, because edge deployment in the field is a real requirement, not a theoretical one. That kind of environment-agnosticism forces completely different architectural decisions than building for cloud-first enterprise customers.

The auditability and explainability requirements are also in a different league. Lockheed’s people are explicit that determinism and reliability are paramount — not just good enough outputs, but traceable and defensible ones. Lockheed built their own internal tool called LMText Navigator, runs it on-premises, and integrated Meta’s Llama into it for 40,000 employees, all inside their own hardened environment. The pattern is consistent: take proven commercial model weights, run them inside infrastructure you fully control. In-Q-Tel investments push in the same direction, favoring composability and integration with existing classified pipelines over wholesale replacement.

6. How do you address concerns that deploying powerful AI systems inside classified environments could increase operational risk if models behave unpredictably without external auditability?

The concern is valid. Stanford ran a wargaming exercise where LLMs, without human supervision, showed unpredictable escalatory behavior. The Air Force’s Venom program had an F-16 AI system execute maneuvers described as non-human, beyond its design boundaries. These are not hypotheticals anymore.

The Army’s response to this is a program called GUARD — Generative Unwanted Activity Recognition and Defense — specifically contracted to detect and analyze unpredictable AI behavior before and after deployment. It translates policies and behavioral rules into structured graphs so you can audit what the model is likely to do under different conditions.

For a product like Sentinel, the answer is partly architectural. Running fully local eliminates the data exfiltration risk class entirely. The behavioral unpredictability problem gets addressed through built-in audit logging and decision lineage. You know what inputs went in, what the model produced, and why. The auditability problem you face with a cloud tool becomes an advantage when you own the full stack.
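
One common way to make such a log trustworthy is to chain entries by hash, so that editing any past record invalidates everything after it. The sketch below is an assumption about how audit logging of this kind could be built, not Sentinel's implementation; `AuditLog` and its fields are hypothetical. It records what went in, what came out and which model produced it; capturing "why" would need additional context such as retrieved documents, which is out of scope here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor


def _digest(body: dict) -> str:
    """Hash a record deterministically by serialising with sorted keys."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries = []

    def record(self, prompt: str, output: str, model_id: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"prompt": prompt, "output": output, "model_id": model_id, "prev": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain links are intact."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.record("summarise module A", "Module A parses telemetry.", "local-model-1")
    log.record("list dependencies", "Depends on module B.", "local-model-1")
    print(log.verify())
    log.entries[0]["output"] = "tampered"
    print(log.verify())
```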

7. Legacy code modernization in defense is often described as one of the hardest technical challenges in government IT. Where does AI realistically add value today, and where is the hype still ahead of reality?

The real value today is in comprehension and documentation, not replacement. If you hand an AI a 5,000-line COBOL program with no documentation, it can tell you what it does, map its dependencies, and help a developer who has never touched COBOL understand it. That is genuinely useful and saves weeks of work. Targeted refactoring of isolated modules is also real. IBM’s Watson Code Assistant has had wide adoption specifically for this.

Where the hype breaks down is the jump from translating code to modernizing a system. IBM’s own SVP said it publicly: translating code and modernizing a platform are not the same thing, and the gap is where most organizations hit trouble. Replacing a COBOL program does not replace the 40 years of business logic, edge-case handling, and institutional knowledge baked into how that system runs. Every serious modernization program still needs human subject matter experts. AI compresses the timeline; it does not eliminate the complexity.

8. Looking ahead, do you see sovereign AI systems converging toward commercial LLM ecosystems over time, or remaining fundamentally separate due to security constraints?

Probably neither cleanly. What is actually happening is a split at the layer level. The infrastructure layer — meaning compute, networking, data, and audit — is staying sovereign, and that is unlikely to change given the geopolitical dynamics. If you are dependent on a foreign commercial API and that relationship breaks down, your entire operation is at risk overnight. That is not a trade governments are willing to make.

But the model layer is already partially converging. Agencies are running Meta’s Llama, Mistral, and similar open-weight models on their own hardware. They are getting access to frontier-quality model weights while keeping the inference infrastructure firmly inside their perimeter. The trajectory is a sovereign stack with commercial models ingested under controlled conditions. Not convergence, not permanent separation, but selective bridging. The infrastructure stays owned. The models get evaluated and swapped in when they are approved. That is probably where this lands for the next decade at least.

Faustine Ngila

Faustine Ngila is the AI Editor at Impact Newswire, based in Nairobi, Kenya. He is an award-winning journalist specializing in artificial intelligence, blockchain, and emerging technologies.

He previously worked as a global technology reporter at Quartz in New York and Digital Frontier in London, where he covered innovation, startups, and the global digital economy.

With years of experience reporting on cutting-edge technologies, Faustine focuses on AI developments, industry trends, and the impact of technology on society.