TheMurrow

The Quiet Revolution: On-Device AI Moves In

AI is shifting from cloud tabs to local inference—reshaping privacy, speed, and what “AI-ready” hardware really means across Windows, Apple, and Android.

By TheMurrow Editorial
February 8, 2026

Key Points

  • Track the shift from cloud-first AI to local inference as NPUs hit 40+ TOPS and make everyday features faster and more private.
  • Treat TOPS as directional, not definitive: measurement varies by precision, thermals, bandwidth, and software support across Apple, AMD, Intel, and Qualcomm.
  • Demand clarity about data flow: on-device by default is rising, but hybrid designs like Apple’s PCC still escalate complex tasks to the cloud.

A few years ago, “AI” on your laptop meant a cloud tab somewhere: type a prompt, wait for a server, accept the result. Now the industry is betting on something less visible and more consequential—AI that runs where you are, not where the data center is.

The shift is measurable. Microsoft’s definition of a modern “AI PC” has a line in the sand: an NPU capable of 40+ TOPS (trillions of operations per second) for many Copilot+ class experiences. Apple markets the M4’s Neural Engine at 38 TOPS. AMD talks about 50+ NPU TOPS in Ryzen AI 300, while press coverage around Qualcomm and Intel centers on ~45 TOPS NPUs as table stakes.

Behind those numbers sits a bet about privacy, latency, cost, and control. If a model can process your text, voice, or images locally, your device can feel faster—and in some cases, safer—because the raw material never needs to leave your hands.

“The next privacy debate won’t be about what an app collects. It’ll be about where your AI thinks.”

— TheMurrow Editorial

What “on-device AI” actually means (and why it’s suddenly everywhere)

On-device AI is a simple promise with complicated engineering: model inference runs locally on your phone, tablet, or PC instead of sending your request to a cloud model. The practical effect is that tasks like summarizing text, processing audio, or analyzing an image can happen without uploading that content for remote processing.

The reason it’s having a moment is not purely philosophical. Hardware finally crossed a threshold that platform companies can market and developers can rely on. Microsoft’s documentation for Windows AI features describes many experiences as requiring an NPU capable of 40+ TOPS, a spec now treated as foundational for Copilot+ PCs. That’s not a vague aspiration; it’s a product category built around a measurable capability. (Microsoft’s Copilot+ messaging similarly foregrounds 40+ TOPS as a headline requirement.)

Apple has taken a parallel route, but with a different emphasis. In June 2024, Apple described on-device processing as the “cornerstone” of Apple Intelligence, framing local inference not as a performance trick but as a privacy posture. Google, for its part, has highlighted Gemini Nano powering Android features—especially security and accessibility functions—that can work locally and, in some cases, offline.

The consumer-facing benefits (when it works)

On-device AI can meaningfully improve day-to-day computing in a few clear ways:

- Speed: Local processing can reduce round trips to a server, making features feel immediate.
- Privacy: Sensitive inputs—audio snippets, screenshots, personal documents—don’t need to leave the device for every task.
- Reliability: Some features can function with limited connectivity, which matters more than people admit until they’re traveling.

The caveat is embedded in the phrase “when it works.” Smaller on-device models can’t always match the breadth of the largest cloud models. That limitation is shaping the next phase of AI product design.

The 40 TOPS era: NPUs become first-class citizens

For years, PCs sold on CPU cores and GPU performance, with AI treated as a side benefit. The Copilot+ push changes that hierarchy. Microsoft’s guidance makes the NPU—the dedicated neural processing unit—a primary buying criterion for modern Windows AI experiences, with 40+ TOPS framed as a baseline for many features.

That single figure has done something rare in consumer computing: it’s created a shared spec target across an ecosystem that usually resists standardization. OEMs can build a sticker around it. Developers can build features with the assumption that a certain class of silicon exists. Buyers get an easy metric, even if it’s an imperfect one.

Chipmakers have responded accordingly. AMD has positioned its Ryzen AI 300 and PRO 300 series as exceeding Copilot+ requirements, citing 50+ NPU TOPS in its CES announcements and positioning that performance as a reason to treat AI as a native workload. Press coverage around Qualcomm Snapdragon X laptops often repeats the same anchor—~45 TOPS NPUs—because it maps neatly onto Microsoft’s certification narrative. Reporting on Intel’s Lunar Lake has similarly emphasized ~45 TOPS from the NPU as part of “next-gen AI PC” expectations, even as the details often arrive via tech press rather than a single definitive consumer datasheet.

“TOPS became a spec because Microsoft needed a number the market could rally around.”

— TheMurrow Editorial

What the “AI PC” label really signals

The label matters less than the design decisions it enforces. An “AI PC” in this framing is a machine where:

- The NPU isn’t a checkbox; it’s a scheduling target for AI workloads.
- The OS and apps are expected to run some inference locally as a normal behavior.
- The device is marketed as more personal—because it can do more without a server.

That vision still has friction: software maturity varies, and cross-vendor performance comparisons remain messy. But the direction is set.

- 40+ TOPS: Microsoft’s line-in-the-sand baseline for many Copilot+ class Windows AI experiences—turning NPU throughput into a mainstream buying spec.
- 38 TOPS: Apple’s stated M4 Neural Engine throughput, positioned around efficient on-device inference and privacy-forward product messaging.
- 50+ NPU TOPS: AMD’s cited Ryzen AI 300 / PRO 300 positioning to exceed Copilot+ requirements and treat AI as a native workload class.
- ~45 TOPS: A frequently repeated NPU figure in press coverage of Qualcomm Snapdragon X systems and Intel Lunar Lake expectations for “next-gen AI PCs.”

TOPS is a headline spec—and an imperfect one

TOPS has the charm of simplicity: more seems better. Yet TOPS is not a universal yardstick in the way consumers might assume. Vendors can quote performance at different numerical precisions (for example, INT8 vs INT4), include or exclude different compute blocks, and cite peak performance that is hard to sustain under real workloads.
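To make the precision problem concrete, here is a toy sketch of why two identical headline numbers can describe different chips. The scaling rule and the chip figures below are illustrative assumptions, not vendor data: many MAC arrays roughly double throughput when operand width halves, so a figure quoted at INT4 flatters the chip relative to one quoted at INT8.

```python
# Illustrative only: why quoted TOPS depend on numeric precision.
# Assumption: throughput scales inversely with operand bit width,
# a common (but not universal) NPU design pattern.

def normalize_tops(quoted_tops: float, quoted_precision_bits: int,
                   target_precision_bits: int = 8) -> float:
    """Rescale a vendor TOPS figure to a common precision for rough
    comparison, under the inverse-bit-width scaling assumption."""
    return quoted_tops * quoted_precision_bits / target_precision_bits

# Two hypothetical chips with the same headline number:
chip_a = normalize_tops(45, quoted_precision_bits=8)  # quoted at INT8
chip_b = normalize_tops(45, quoted_precision_bits=4)  # quoted at INT4

print(chip_a, chip_b)  # 45.0 22.5 at a common INT8 baseline
```

Under this assumption, a “45 TOPS” chip quoted at INT4 delivers roughly half the INT8 throughput of one quoted at INT8—one reason footnotes matter more than headlines.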

AMD’s own footnotes acknowledge that TOPS varies by model, software, and configuration. That’s a polite way of saying two chips with similar TOPS figures may behave differently once thermal limits, memory bandwidth, and software optimization show up to the party.

Apple’s marketing demonstrates the comparison problem in a different way. Apple states the M4 Neural Engine delivers 38 TOPS and frames it around efficient, private inference. Meanwhile, the Windows ecosystem often treats 40+ TOPS as a categorical requirement. The numbers sit near each other, but the platforms are not symmetrical: they differ in model deployment strategies, OS integration, and how much work is expected from the NPU versus CPU/GPU.

How readers should interpret TOPS

A sensible reading of TOPS looks like this:

- TOPS tells you the device is aimed at local inference. It’s a signal of intent as much as capability.
- TOPS alone won’t predict your experience. Software support and OS-level integration can matter more than raw peak throughput.
- Cross-brand comparisons are hazardous. Treat them as directional, not definitive.

“TOPS can tell you a chip is trying. It can’t tell you whether the software will.”

— TheMurrow Editorial

The industry’s rush to advertise TOPS is understandable. The danger is that buyers treat it as the only number that matters, repeating a familiar mistake from the early days of megapixels and GHz.

Privacy becomes a product feature—by design, not accident

On-device AI is often sold as a privacy win, and in many cases it is. If inference happens locally, the content you feed into a feature does not need to travel to a server for processing. That changes the default risk profile, especially for personal data—messages, documents, voice, images, and whatever else your computer can see.

Apple has made privacy the central argument. In June 2024, Apple described on-device processing as the cornerstone of Apple Intelligence, claiming it can deliver personal intelligence without collecting users’ data (Apple’s phrasing). The company’s message is not subtle: the safest data is the data that never leaves your device.

But Apple also acknowledges the limits of local compute. Some requests are too large or complex for small models running on a phone or laptop. For those tasks, Apple routes processing to Private Cloud Compute (PCC)—a hybrid design meant to preserve privacy properties while expanding capability.

Apple’s Private Cloud Compute: a hybrid model with auditable claims

According to Apple, PCC works under a few strict rules:

- Only data relevant to the request is sent.
- Data is not stored and is not accessible to Apple.
- Independent experts can inspect and verify the code running on PCC servers.

Those claims are notable because they treat privacy as something that can be evaluated, not merely promised. Verification language—inspection by independent experts—signals that Apple expects skepticism and is trying to meet it with process.

A fair reading includes competing perspectives. Skeptics will point out that “privacy” is not a binary. Even a well-designed system can be misunderstood by users, misconfigured by developers, or undermined by future policy choices. Still, PCC represents a concrete attempt to reconcile powerful models with privacy expectations—a design pattern others are likely to borrow.

Key Insight

On-device inference changes the default risk profile: text, audio, images, and documents may be processed locally instead of uploaded for remote inference.

The new software bargain: local by default, cloud when it counts

The most realistic future is not purely on-device or purely cloud. It’s conditional: devices do what they can locally, then escalate to the cloud when the task is too big. Apple has articulated that logic explicitly: on-device by default with Private Cloud Compute for overflow. Google’s Gemini Nano messaging similarly suggests a tiered approach, with local models powering certain real-time protections and accessibility features.
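The escalation logic above can be sketched as a simple router. Everything here is hypothetical—the threshold, the field names, and the routing rule are this article's illustration of the pattern, not any vendor's actual API.

```python
# A minimal sketch of "local by default, cloud when it counts".
# The token budget and request fields are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    needs_long_context: bool = False  # e.g., reasoning over a large corpus

LOCAL_TOKEN_BUDGET = 2_000  # assumed capacity of a small on-device model

def route(request: Request) -> str:
    """Return 'local' when the small on-device model suffices,
    otherwise 'cloud' (analogous to Apple's PCC overflow path)."""
    too_long = len(request.prompt.split()) > LOCAL_TOKEN_BUDGET
    if too_long or request.needs_long_context:
        return "cloud"
    return "local"

print(route(Request("summarize this short note")))                    # local
print(route(Request("compare themes", needs_long_context=True)))      # cloud
```

The design choice worth noticing is that the decision happens per request, not per feature: the same feature can be private and offline one moment and cloud-backed the next, which is exactly why disclosure matters.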

Microsoft’s approach is ecosystem-driven. By anchoring the Copilot+ category to 40+ TOPS NPUs, Microsoft is encouraging developers to assume a baseline of local capability in new PCs. That matters because it shifts cost and latency away from cloud infrastructure, while also reducing how often personal context needs to be transmitted.

Practical implications for readers

For everyday users, this “local-first” bargain changes what’s worth paying attention to:

- Offline usefulness becomes real again. Features that work without a connection matter on planes, in rural areas, or during outages.
- Sensitive workflows feel less risky. Summarizing a document or processing a screenshot locally is a different privacy proposition than uploading it by default.
- Battery and thermals become part of the AI story. A device can have impressive peak TOPS and still disappoint if sustained workloads drain the battery or throttle performance.

The trade-off is that local models may be narrower: great for quick transforms, recognition tasks, and structured assistance; less great for expansive, open-ended reasoning. The smartest products will be honest about which mode is in use and why.

Editor’s Note

The most durable question to ask isn’t “Does it have AI?” but “When does it run locally, and when does it escalate to the cloud?”

Real-world examples: where on-device AI actually earns its keep

On-device AI can sound abstract until you place it inside specific features people use—or refuse to use—because of trust.

Google’s Android messaging around Gemini Nano highlights a class of applications where local inference matters: real-time protection and accessibility features that can work offline and locally. Those are not novelty features. Security tools benefit from immediacy, and accessibility tools benefit from reliability. If connectivity is required for every interaction, the tools fail precisely when they’re most needed.

Apple’s case study is the privacy architecture itself. Apple’s June 2024 statements about on-device processing and PCC are not just about performance; they’re about enabling personalization without normalizing data collection. The company is making a bet that users will increasingly ask: “Where did my data go?” before they ask: “How clever was the answer?”

On Windows, the case study is the hardware category. Copilot+ PCs exist to prove that local AI isn’t limited to phones. Microsoft’s 40+ TOPS NPU baseline is an attempt to guarantee that a Windows laptop can run modern AI experiences without leaning on the cloud for every action.

A grounded way to evaluate “on-device” claims

When a vendor says a feature is on-device, readers should ask:

- Does the feature work without an internet connection?
- Is the model running locally for the entire task, or only parts of it?
- If the cloud is used, is the system explicit about what is sent and what is retained?

On-device AI earns trust when companies answer those questions clearly, in plain language, and in settings users can verify.
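For readers who think in code, the three questions above can be encoded as a toy scorecard. The field names and the notes it emits are this article's illustration, not a standard audit procedure.

```python
# The verification checklist, encoded as a toy scorecard.
# Field names are hypothetical; the point is that each "on-device"
# claim decomposes into checkable properties.

from dataclasses import dataclass

@dataclass
class OnDeviceClaim:
    works_offline: bool     # does the feature run with no network?
    fully_local: bool       # is the whole task local, or only parts?
    cloud_disclosed: bool   # if cloud is used, is data flow documented?

def trust_notes(claim: OnDeviceClaim) -> list:
    """Return follow-up questions a skeptical reader should still ask."""
    notes = []
    if not claim.works_offline:
        notes.append("verify: does the feature fail or degrade offline?")
    if not claim.fully_local:
        notes.append("verify: which sub-tasks leave the device?")
    if not claim.cloud_disclosed:
        notes.append("verify: what is sent, and what is retained?")
    return notes

print(trust_notes(OnDeviceClaim(works_offline=True,
                                fully_local=False,
                                cloud_disclosed=True)))
```

An empty list is the goal; anything else is a question to put to the vendor before trusting the label.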


The competitive race: performance, privacy, and the next spec escalation

The NPU spec race is already accelerating. AMD touts 50+ NPU TOPS in Ryzen AI 300. Apple states 38 TOPS for the M4 Neural Engine. The Windows ecosystem repeatedly spotlights 40+ TOPS as the Copilot+ baseline, while press coverage pegs Qualcomm Snapdragon X systems at ~45 TOPS NPUs and anticipates even higher claims in next-gen parts.

The pressure is predictable. Once a market adopts a single metric, companies compete to exceed it. The risk is also predictable: performance marketing can outrun practical value. More TOPS won’t automatically deliver better user experiences if developers can’t easily deploy models across different NPUs, or if key features remain gated by cloud services.

A second risk is conceptual. Privacy can become another spec—something you “have” rather than something you continuously prove. Apple’s move to invite independent inspection of PCC code is significant precisely because it treats privacy as verifiable. Other vendors will need their own equivalent of that posture if they want privacy to be more than a slogan.

The optimistic view is that competition will improve both performance and privacy. The realistic view is that readers should expect a messy transition period: inconsistent feature availability, unclear “on-device” labeling, and devices that advertise AI readiness without delivering meaningful local workflows.

Key Insight

Once one metric becomes mainstream, marketing converges on it. The harder work is software: deployment, optimization, labeling, and clear local-vs-cloud disclosures.

Conclusion: The device is becoming the place where AI lives

The industry is slowly admitting a truth it avoided during the first wave of consumer AI: sending everything to the cloud is not a neutral design choice. It has consequences for privacy, latency, cost, and trust.

On-device AI narrows those trade-offs. Microsoft’s 40+ TOPS Copilot+ baseline, Apple’s on-device-first posture with Private Cloud Compute, and Google’s emphasis on local Gemini Nano features all point to the same future: more inference will happen where the user is, not where the server is.

The open question is not whether on-device AI will grow. The open question is whether companies will communicate its boundaries honestly—when data stays local, when it doesn’t, and why. Readers don’t need marketing promises. Readers need clarity, because the most personal computer has always been the one that knows the most about you.

“Convenience used to be the reason we accepted the cloud. Privacy is becoming the reason we demand the device.”

— TheMurrow Editorial
About the Author
TheMurrow Editorial is a writer for TheMurrow covering technology.

Frequently Asked Questions

What is on-device AI in plain terms?

On-device AI means the AI model runs on your phone, tablet, or PC, processing your input locally rather than sending it to a cloud server for inference. That can reduce latency and can improve privacy because text, audio, images, or screen content may not need to be uploaded for processing. Many products still use a hybrid approach for larger tasks.

What is an NPU, and why does it matter?

An NPU (neural processing unit) is a dedicated accelerator designed to run AI inference efficiently. Microsoft’s Copilot+ PC guidance treats NPU performance as central, with many Windows AI experiences described as requiring 40+ TOPS. The practical benefit is that more AI workloads can run locally without relying on the cloud for every interaction.

What does “40 TOPS” mean, and should I shop by it?

TOPS stands for trillions of operations per second, a measure often used to describe peak AI throughput. Microsoft uses 40+ TOPS as a baseline for Copilot+ class PCs. TOPS can indicate whether a device targets local AI seriously, but it’s not a perfect comparison tool across brands because vendors may measure differently and quote peak numbers.
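A back-of-envelope calculation shows why peak TOPS alone won't predict your experience. The model size, utilization, and memory bandwidth below are all assumptions chosen for round numbers, not measurements of any real device.

```python
# Back-of-envelope only: what 40 TOPS could mean for one token of a
# hypothetical small language model. Every figure here is an assumption.

params = 3e9                 # assumed 3B-parameter local model
ops_per_token = 2 * params   # ~2 ops (multiply + add) per parameter
peak_tops = 40e12            # the Copilot+ headline figure
utilization = 0.3            # assumed sustained fraction of peak

compute_ms = ops_per_token / (peak_tops * utilization) * 1e3
print(f"{compute_ms:.1f} ms of compute per token")  # 0.5 ms

# Memory often dominates: streaming 3 GB of INT8 weights per token
# over an assumed 100 GB/s memory bus takes far longer than the math.
bandwidth_ms = 3e9 / 100e9 * 1e3
print(f"{bandwidth_ms:.1f} ms of weight streaming per token")  # 30.0 ms
```

Under these assumptions the compute takes half a millisecond but moving the weights takes thirty—which is why memory bandwidth, not peak TOPS, frequently decides how fast local AI feels.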

Is Apple’s “Private Cloud Compute” still the cloud?

Yes—Private Cloud Compute (PCC) is cloud processing for tasks too large for on-device models. Apple says only relevant data is sent, the data is not stored, and it is not accessible to Apple. Apple also says independent experts can inspect and verify code running on PCC servers. It’s a hybrid approach meant to extend capability without adopting conventional cloud data practices.

Does on-device AI mean my data never leaves my device?

Not automatically. Some systems run locally by default but may use cloud processing for complex requests. Apple explicitly describes this split: on-device when possible, PCC when needed. The best way to evaluate a product is to look for clear disclosures about when cloud processing is used, what data is sent, and whether anything is stored.

Are Apple’s TOPS numbers comparable to Windows AI PC TOPS numbers?

Only loosely. Apple states the M4 Neural Engine is 38 TOPS, while Microsoft frames Copilot+ PCs around 40+ TOPS NPUs. Differences in measurement precision, what’s included in the performance figure, and software deployment strategies make direct comparisons unreliable. Treat TOPS as a directional indicator, then look for real feature support and practical reviews.
