If every user runs their own model, don't we lose the economies of scale that make AI cheap in the first place?

Economies of scale apply to training and to unpredictable elastic spikes — and those loads stay in the cloud. But repeated, predictable inference has no economies of scale in its favor: each cloud call is a permanent charge, whereas local inference costs electricity once the hardware is paid for. For stable, sensitive volume, the edge is cheaper, not more expensive — the cloud's scale stops being an argument.

How does a company justify investing in local AI if open models go obsolete every few months?

It's exactly the opposite: rapid obsolescence is an argument for local, because swapping a model file in your runtime is trivial, whereas migrating off a deprecated cloud API breaks all your prompt engineering and evaluations. With local-first you decouple your application from the commercial lifecycle of a specific vendor. The model is a replaceable part, not a marriage.

Does local privacy actually solve anything, or is it security theater? My operating system and drivers already send telemetry all over the place.

It's a categorical difference, not a cosmetic one. 'We don't train on your data' is a promise that can be broken, reinterpreted, or punctured in a leak; 'the data doesn't leave the machine' is an architectural guarantee — there's no transmission, so there's nothing about that specific data to leak. OS telemetry is a real and separate problem, which you attack at the OS level; it doesn't negate the gain of keeping the sensitive content off of any inference server.

For regulated sectors (healthcare, legal, finance), is local-first a luxury or already a requirement?

For many cases it's already the only legal route, not a luxury. Sending patient data, a sealed court filing, or financial information to a third-party processor is frequently prohibited by LGPD, GDPR, or sector regulation, regardless of how good the vendor's promise is. In those sectors, local AI isn't the worse version — it's the only version that can even be used on the real data.

You talk about 'continuity no one can switch off.' In practice, how is that different from just backing up my conversations?

A backup preserves the record; continuity preserves the living relationship. The difference is that local AI memory is an executable asset of yours — it keeps learning, holds context, and runs even if the original vendor disappears, because the model and the memory are on your disk. A backup of cloud conversations gives you a dead archive if the service that gave it meaning is shut down; local continuity gives you a system that still works on its own.

If local AI is so inevitable, why are the big cloud companies investing billions in datacenters and not at the edge?

Because their business model depends on you consuming in the cloud, and they're racing against their own commoditization — embedding AI in everything and creating behavioral lock-in before the local alternative gets too good to ignore. Investing at the edge would weaken the per-token revenue that sustains the valuation. The inevitability of local doesn't come from them wanting it; it comes from physics and economics pushing in the direction opposite their interest.

What's the practical architecture for someone who wants to start now without rewriting everything?

Invert the default: local first, cloud by explicit exception. Start by routing to a local model every inference that is frequent, sensitive, or latency-critical, and keep a cloud fallback only for the tasks that demonstrably exceed the local silicon. The gain shows up early because those frequent tasks are most of the volume and the cost, and you start to know exactly which byte leaves your machine and why.

Essays

Technology·2026-06-18·16 min read

Computational sovereignty: why AI needs to come back to your machine

Intelligence has become a rented service. The next cycle is intelligence that runs on hardware you own — and that no one can switch off.

ShareX LinkedIn

Key points

01Cloud dependency isn't a technical accident — it's the business model. Whoever charges by the token has a structural incentive for you to never have a local alternative that's good enough. AI lock-in is behavioral (your prompts, your evaluations, your users who got used to it) and that's exactly why it's invisible until the moment the model is deprecated and everything turns to garbage overnight.
02Sovereignty is charged on four concrete axes: latency, marginal cost, privacy, and continuity. Network latency makes AI in the loop of action unviable; cost per token keeps you from letting the intelligence think freely; 'we don't train on your data' is a contractual promise, not an architectural guarantee; and continuity hosted on a third party's server evaporates the day of the pivot or the bankruptcy.
03The technical window has opened now. Three curves crossed — open models got genuinely good, consumer hardware became AI hardware (unified memory, idle NPUs), and the software layer matured. Anyone who still thinks local AI is a toy is looking at a photo from eighteen months ago.
04The future isn't local OR cloud — it's local by default, cloud by conscious exception. The inversion is the point: today the default is dependency and sovereignty takes effort; local-first architecture does the opposite. You know exactly which byte leaves your machine because leaving is the exception you authorized, not the invisible rule.
05AI sovereignty is cognitive sovereignty. AI introduces judgment into the machine — it decides what's relevant, what's appropriate, what to refuse. When that decision layer lives on a third party's server, you've outsourced part of your own discernment to an entity with its own agenda. Whoever owns the decision layer owns the future they build on top of it.

The contract you signed without reading

Every time you send a prompt to the cloud, you perform a small act of faith. You believe the company on the other end will still exist next week. You believe the model you use today will answer tomorrow the same way it does now. You believe the price won't triple once you're already dependent. You believe your data — the email you pasted, the contract you asked it to review, the medical diagnosis you dumped into the chat — won't train a competitor's model, leak in an incident, or become evidence in a subpoena. You didn't read the contract. Nobody reads it. And the contract can change at any moment, retroactively, without you being notified in any way that matters.

This is the real architecture of AI in 2026. Not the architecture of transformers and attention — that one is public, it's in the papers. The economic and political architecture: a handful of companies control the cognitive substrate that is being sewn inside of everything. Your text editor, your email client, your IDE, your medical record, your CRM. Intelligence has stopped being a feature and become infrastructure. And infrastructure is not something you rent from whoever can cut off the supply. Nobody builds a factory on top of an electrical grid that the neighbor can switch off on a whim.

The cloud sold us a genuine convenience and charged for it a price that only becomes visible later. The price is sovereignty. And sovereignty is one of those things you don't notice you've lost until you need it — until the day the API changes its usage policy, deprecates the model your entire company depends on, or simply decides that your use case violates terms of service that were rewritten in the small hours of a Tuesday.

The dependency isn't accidental, it's the business model

Let's be precise about what happened. The current generation of generative AI was born in the cloud for a legitimate technical reason: training and serving frontier models required GPU clusters that no one had at home. That makes sense. But what began as necessity became deliberate design. The dominant business model of AI is not selling intelligence — it's renting dependency metered by the token.

Think about what that means structurally. Every interaction you have is a billing event. Every productivity gain of yours becomes recurring revenue for someone else. Your success is their variable cost, and the optimal design for whoever charges by the token is to make you incapable of functioning without the token. OpenAI, Anthropic, Google — all of them have an economic interest in you never having a local alternative that runs well enough. It isn't malice. It's gravity. It's what any rational vendor does when the unit of billing is consumption and the moat is the impossibility of leaving.

And AI lock-in is deeper than traditional software lock-in. When you depended on AWS, you could, with pain, migrate to Google Cloud. The primitives were similar: a VM is a VM, a bucket is a bucket. But with AI, the lock-in wraps itself around behavior. You tuned your prompts to the temperament of a specific model. You built evaluations on top of one response pattern. Your users got used to one voice. When the vendor deprecates that model — and they do, regularly, because serving old versions is expensive — all of your prompt engineering becomes garbage overnight. You rewrite everything. You rerun the tests. You re-earn your users' trust. The switching cost isn't technical, it's behavioral, and that's exactly why it's invisible until the moment you hit it.

There was a moment, a few years ago, when entire startups were built as a thin wrapper over an OpenAI API. The market's joke was cruel and precise: "that's a feature, not a company." What no one said with the same candor is that most large companies also became wrappers. Just with more employees and more to lose. The difference between the wrapper startup and the corporation is that the corporation takes longer to discover that it doesn't control the most central component of its own product.

What you actually lose: latency, cost, privacy, continuity

Sovereignty sounds abstract, so let's land it on the four concrete axes where cloud dependency charges its price.

Latency. Every network call has a physical floor that no amount of money can buy: the speed of light and the topology of the internet. Your prompt leaves your machine, crosses the country or the ocean, waits in line at a datacenter, gets processed, and comes back. That's hundreds of milliseconds in the best case, seconds in the real case, and timeout in the bad case. For a chat, fine — you read slowly anyway. But AI is ceasing to be chat. It's becoming the layer that completes your code as you type, that transcribes your meeting in real time, that drives an agent making a hundred chained calls to solve a task. When intelligence needs to be inside the loop of action, network latency stops being an inconvenience and becomes an impossibility. A local model responds at the speed of the silicon in front of you, not at the speed of the transatlantic roundtrip. For everything that is genuinely interactive, this isn't an incremental improvement — it's the difference between viable and unviable.

Cost. The cost per token is falling, true, and cloud defenders point to that constantly. But the marginal cost per inference in the cloud never reaches zero — by construction it can't, because it's revenue. The marginal cost of an inference on your machine, once you've paid for the hardware, is the price of the electricity that chip burns for a few seconds. Close to zero. That difference completely changes which applications make economic sense. When each inference costs money, you ration. You don't let an agent think ten thousand times about a problem because the bill is frightening. When inference is practically free, you unlock entire classes of use that were prohibitively expensive: continuously indexing all of your files, running an assistant that thinks in the background all day long, letting models talk to each other for hours to refine an answer. The economics of zero marginal cost isn't "cheaper" — it's a different frontier of possibility.

Privacy. This is the axis where the hypocrisy of the cloud pitch is most naked. "We don't train on your data" is a contractual promise, not an architectural guarantee. The difference matters enormously. A promise can be broken, reinterpreted, voided by an acquisition, or simply punctured by a security incident. An architectural guarantee is when the data physically doesn't leave your machine — there's nothing to leak because there's no transmission. For an individual, this is the difference between trusting and not needing to trust. For a hospital, a law firm, a bank, a company under LGPD or GDPR, it's the difference between being able and not being able to use AI on sensitive data at all. There are entire sectors paralyzed today not for lack of good models, but because sending the data to a third party is legally impossible. Local AI isn't a worse version for those cases — it's the only version that exists.

Continuity. This is the one talked about least and the one that hurts most. AI is becoming memory. Not just a tool — memory. It accumulates your context, learns your patterns, holds the thread of your conversations, becomes an extension of your cognition that enriches itself over time. And that continuity, today, is hosted on a server you don't control. The day the company changes owners, pivots, goes bankrupt, or simply decides to shut that product down, your continuity evaporates. You don't lose an app. You lose a piece of your externalized mind. We've already seen this happen with cloud services that disappeared and took years of data with them. With AI, what disappears isn't just files — it's the continuity of the relationship. The intelligence that runs on your machine is the only one no one can switch off remotely. The continuity you own is the only real continuity.

Sovereignty isn't privacy — it's power over the decision layer

Some people reduce all of this to privacy, and privacy is the easiest argument to sell. But it's the least important argument. What's at stake is more fundamental: who controls the decision layer.

For decades, computing was neutral in the sense that software did exactly what you told it to. A dumb determinism, but predictable and yours. AI breaks that. It introduces judgment into the machine. It decides what is relevant, what is appropriate, what to refuse, how to frame. And that judgment is trained and tuned by whoever made the model, according to values, regulatory pressures, and commercial interests that are not yours. When that layer of judgment lives on a third party's server, you've outsourced part of your own discernment to an entity with its own agenda.

This is already concrete. Cloud models refuse legitimate tasks because a safety filter calibrated for the average case thinks it might cause trouble. They change behavior between versions in a way you neither control nor get warned about. They carry embedded political and cultural biases that reflect where they came from. For a casual chat, irrelevant. For a system in which AI is the layer that mediates your decisions — what you read, what you write, what gets filtered before it reaches you — the question of who tunes that judgment is the central political question of the coming decade. AI sovereignty is, at bottom, cognitive sovereignty. It's retaining the right to have the intelligence you use serve your interests, not the vendor's.

Countries understood this before individuals did. That's why there's a race for "sovereign AI" at the national level — France, India, the Emirates, everyone wanting their own models running on their own infrastructure. They realized that depending on another power's cognitive layer is a form of vassalage that makes oil dependency look mild. What holds for nations holds, at scale, for companies and individuals. AI sovereignty is fractal: the same argument repeats at every level where there's an agent who doesn't want its own discernment rented from a landlord.

The technical window has opened — and most people didn't notice

All of this would be pretty, useless philosophy if local AI didn't work. Two years ago, it really was unviable: the models that ran on a laptop were toys, and the ones that were any good required a datacenter. The sovereignty argument ran straight into the reality of hardware. That argument is dead.

Three curves crossed. First: open models got good. Not "good for being free" — good. Models that fit in the memory of a consumer machine today do what required the cloud frontier a year and a half ago. Distillation, quantization, and more efficient architectures compressed capability in a way no one predicted at the speed it happened. A quantized model occupying a few gigabytes reasons, writes code, and follows instructions at a level that would have seemed like fiction recently.

Second curve: consumer hardware became AI hardware. Apple's chips with unified memory let a laptop load models that used to require server cards, because the CPU, the GPU, and the neural engine share a large pool of fast memory. This wasn't originally designed for local AI, but it turned out to be the ideal architecture for it. And it's not just Apple — the entire PC industry is embedding dedicated NPUs. The hardware you buy to work already ships with inference silicon to spare, idle most of the time, waiting for software that knows how to use it.

Third curve: the software layer matured. Running a model locally stopped requiring a PhD in ML engineering. Runtimes package everything, model formats standardized, and the friction of installation dropped to the level of installing any old application. The combination of these three curves means that local-first AI left the "hobbyist experiment" category and entered the "defensible architecture decision" category. Anyone who still thinks local AI is a toy is looking at a photo from eighteen months ago.

The window is open now, and that's why this is the moment. The cloud companies know it better than anyone — that's why they're racing to embed AI everywhere, to create behavioral lock-in, to tie the developer to the API before the local alternative gets too good to ignore. It's a race against their own commoditization. And historically, when capability commoditizes, value migrates from the component to whoever controls the relationship with the user and the data — that is, back to the edge, to the machine the person owns.

The honest tension: the cloud won't die

I'm not going to sell you a Manichaean future where the cloud is evil and local is salvation. That would be dishonest, and dishonesty weakens the real argument. The economies of scale of the cloud are real and powerful, and there are entire classes of problem where it wins and will keep winning.

Training frontier models will keep being the business of those with billions in GPU. That doesn't move to the laptop, ever. The tasks that genuinely require the largest possible model — the deepest reasoning, the longest context, the absolute frontier of capability — will keep running in the datacenter, because the physics of computation favors concentration when the model is gigantic. The cloud also wins when you need brutal elasticity: unpredictable spikes, loads that go from zero to millions and back. Provisioning local hardware for your worst day is waste; renting the spike is rational.

The right question, therefore, isn't "local or cloud." It's "which inference lives where." And the answer that is emerging is a hybrid architecture with a clear principle of gravity: the default is local, and the cloud is the justified exception. The local model handles the volume — the code completion, the transcription, the semantic search across your files, the agent that thinks all day, everything that is frequent, sensitive, or latency-critical. The cloud comes in when, and only when, the specific task exceeds what the local silicon can do, and when the data for that task can legitimately leave. This inverts the current default, in which everything goes to the cloud out of architectural laziness and only stays local when someone fights for it.

That inversion is the whole point. Today the default is dependency and sovereignty is the special case that takes effort. Local-first architecture does the opposite: sovereignty by default, dependency by conscious exception. You know exactly which byte leaves your machine and why, because leaving is the exception you authorized, not the invisible rule. The economies of scale of the cloud still exist — they just stop being the place where your entire computational life lives out of inertia.

What changes when the intelligence is yours

Let me draw concretely what becomes possible when intelligence runs on hardware you own, because this is where the argument moves from defense to offense. Sovereignty isn't only about avoiding losses. It's about unlocking things that cloud dependency makes impossible.

An assistant that knows everything about you — all your files, emails, conversations, the entire history of your digital life — without any of it ever leaving your machine. In the cloud, that assistant is a privacy nightmare that no serious company would build and no cautious individual would use. Locally, it's trivial and safe, because the index of your life never touches someone else's server. The most intimate and most useful AI possible is precisely the one that cannot exist in the cloud.

Continuity that accumulates and no one can switch off. An AI memory that grows with you over years, that holds the context of everything, that becomes a layer of your cognition — and that lives in a file on your disk, which you back up, copy, carry to the next machine, bequeath after you're gone. Not a database on a server that can vanish in a corporate pivot. Your continuity becomes an asset of yours, not a balance in an account that can be closed.

Real offline operation, which seems like a detail and isn't. On the plane, in the field, in a bad-connectivity zone, in a crisis where the internet goes down. The AI that depends on the cloud is the AI that abandons you exactly when you're most isolated and need it most. The intelligence that lives on your machine works in the apocalypse, works in the subway, works when the undersea cable snaps. Resilience isn't a paranoid luxury — it's the basic property of any infrastructure you take seriously.

And perhaps the most important, the composable. When the intelligence is yours and local, you can tinker with it. Tune it, specialize it, connect it to your data, chain it with your systems, make it do exactly what you need without asking permission from a terms-of-service document. Cloud AI is a black box behind an API that defines what you can and can't do. Local AI is a piece of software under your control. The difference between renting a car with the engine sealed and owning a machine whose hood you can open and modify is the difference between using and owning. And whoever owns the decision layer owns the future they build on top of it.

The cycle that's coming won't be defined by whoever has the biggest model in the biggest datacenter — that's the cycle that's ending, the cycle of maximum centralization. The next one is the cycle of redistribution: intelligence good enough, running cheap enough, on the hardware that billions of people already carry in their pocket and their backpack. The history of computing is a pendulum between the mainframe and the personal, between the centralized and the edge, and AI is making exactly the same arc the mainframe made when it became the PC and the landline made when it became the device in your hand. It started central because it had to start that way. It won't end central. Intelligence will come back to your machine not because it's a noble cause, but because it's the equilibrium point toward which physics, economics, and the human desire for sovereignty push together. The question isn't whether this happens. It's whether you'll be building on the right side of the pendulum when it completes the arc — or still signing, every month, the lease on your own mind.

FAQ

Because the right question isn't 'what's the best model in the world,' it's 'what's the best inference for this specific task.' The overwhelming majority of what you do — completing code, transcribing, searching your files, classifying — doesn't need the frontier; it needs 'good enough, instant, private, and free at the margin.' You reserve the cloud for the few tasks that genuinely require the largest possible model, and you run the rest locally, which is the volume.

About the author

Andre Ambrósio

Founder. Systems builder. Signal reader. I spend my days understanding how technology, business, health and AI are reorganizing — and articulating what comes next.

Instagram ↗TikTok ↗YouTube ↗Facebook ↗

Keep reading

Technology

The End of Software: When the Interface Dissolves and the System Begins to Generate Itself

For decades, software was screen, button, and menu — a frozen machine the human operated. That contract is ending. The next software isn't operated: it's instructed, and it rewrites itself in real time for every person who touches it.

Artificial Intelligence

AI as a decision layer: the loop that separates those who built a system from those who bought a tool

Most companies have AI the way they have a vacuum cleaner: grab it, use it, put it away. The structural turn is something else entirely — it's when intelligence stops being an endpoint and becomes the tissue where every workflow reads context, decides, and learns.

— End of essay —

The next cycle, before the headline.

An occasional letter: one reading, one architecture, one signal. No noise, no rush.