The Quiet Revolution: On-Device AI Moves In
AI is shifting from cloud tabs to local inference—reshaping privacy, speed, and what “AI-ready” hardware really means across Windows, Apple, and Android.

Key Points
1. Track the shift from cloud-first AI to local inference as NPUs hit 40+ TOPS and make everyday features faster and more private.
2. Treat TOPS as directional, not definitive: measurement varies by precision, thermals, bandwidth, and software support across Apple, AMD, Intel, and Qualcomm.
3. Demand clarity about data flow: on-device by default is rising, but hybrid designs like Apple’s PCC still escalate complex tasks to the cloud.
A few years ago, “AI” on your laptop meant a cloud tab somewhere: type a prompt, wait for a server, accept the result. Now the industry is betting on something less visible and more consequential—AI that runs where you are, not where the data center is.
The shift is measurable. Microsoft’s definition of a modern “AI PC” has a line in the sand: an NPU capable of 40+ TOPS (trillions of operations per second) for many Copilot+ class experiences. Apple’s newest silicon markets a 38 TOPS Neural Engine on the M4. AMD talks about 50+ NPU TOPS in Ryzen AI 300, while press coverage around Qualcomm and Intel centers on ~45 TOPS NPUs as table stakes.
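For context on where those headline figures come from: a vendor's peak TOPS number is usually derived from multiply-accumulate (MAC) throughput at a specific numeric precision, which is one reason the figures resist direct comparison. Here is a minimal sketch of that arithmetic; the MAC counts and clock speed are invented for illustration and do not describe any real NPU:

```python
# Illustrative sketch: how a peak TOPS figure is typically derived.
# The MAC counts and clock below are made up for demonstration; they
# are not the specifications of any shipping NPU.

def peak_tops(mac_units: int, clock_ghz: float, ops_per_mac: int = 2) -> float:
    """Peak TOPS = MAC units x ops per MAC (multiply + add) x clock.

    Returns trillions of operations per second.
    """
    return mac_units * ops_per_mac * clock_ghz * 1e9 / 1e12

# A hypothetical NPU with 16,384 INT8 MAC units at 1.25 GHz:
int8_tops = peak_tops(mac_units=16_384, clock_ghz=1.25)

# Quoting the same silicon at FP16 often halves the usable MAC count,
# which halves the headline number -- same chip, different figure.
fp16_tops = peak_tops(mac_units=8_192, clock_ghz=1.25)

print(f"INT8 peak: {int8_tops:.1f} TOPS")   # 41.0 TOPS
print(f"FP16 peak: {fp16_tops:.1f} TOPS")   # 20.5 TOPS
```

The same hypothetical chip can legitimately be quoted at roughly 41 or 20.5 TOPS depending on precision alone, before thermals, memory bandwidth, or software support enter the picture. That is the core reason cross-vendor TOPS figures are directional at best.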
Behind those numbers sits a bet about privacy, latency, cost, and control. If a model can process your text, voice, or images locally, your device can feel faster—and in some cases, safer—because the raw material never needs to leave your hands.
“The next privacy debate won’t be about what an app collects. It’ll be about where your AI thinks.”
— TheMurrow Editorial
What “on-device AI” actually means (and why it’s suddenly everywhere)
The reason it’s having a moment is not purely philosophical. Hardware finally crossed a threshold that platform companies can market and developers can rely on. Microsoft’s documentation for Windows AI features describes many experiences as requiring an NPU capable of 40+ TOPS, a spec now treated as foundational for Copilot+ PCs. That’s not a vague aspiration; it’s a product category built around a measurable capability. (Microsoft’s Copilot+ messaging similarly foregrounds 40+ TOPS as a headline requirement.)
Apple has taken a parallel route, but with a different emphasis. In June 2024, Apple described on-device processing as the “cornerstone” of Apple Intelligence, framing local inference not as a performance trick but as a privacy posture. Google, for its part, has highlighted Gemini Nano powering Android features—especially security and accessibility functions—that can work locally and, in some cases, offline.
The consumer-facing benefits (when it works)
- Speed: Local processing can reduce round trips to a server, making features feel immediate.
- Privacy: Sensitive inputs—audio snippets, screenshots, personal documents—don’t need to leave the device for every task.
- Reliability: Some features can function with limited connectivity, which matters more than people admit until they’re traveling.
The caveat is embedded in the phrase “when it works.” Smaller on-device models can’t always match the breadth of the largest cloud models. That limitation is shaping the next phase of AI product design.
The 40 TOPS era: NPUs become first-class citizens
That single figure has done something rare in consumer computing: it’s created a shared spec target across an ecosystem that usually resists standardization. OEMs can build a sticker around it. Developers can build features with the assumption that a certain class of silicon exists. Buyers get an easy metric, even if it’s an imperfect one.
Chipmakers have responded accordingly. AMD has positioned its Ryzen AI 300 and PRO 300 series as exceeding Copilot+ requirements, citing 50+ NPU TOPS in its CES announcements and positioning that performance as a reason to treat AI as a native workload. Press coverage around Qualcomm Snapdragon X laptops often repeats the same anchor—~45 TOPS NPUs—because it maps neatly onto Microsoft’s certification narrative. Reporting on Intel’s Lunar Lake has similarly emphasized ~45 TOPS from the NPU as part of “next-gen AI PC” expectations, even as the details often arrive via tech press rather than a single definitive consumer datasheet.
“TOPS became a spec because Microsoft needed a number the market could rally around.”
— TheMurrow Editorial
What the “AI PC” label really signals
- The NPU isn’t a checkbox; it’s a scheduling target for AI workloads.
- The OS and apps are expected to run some inference locally as a normal behavior.
- The device is marketed as more personal—because it can do more without a server.
That vision still has friction: software maturity varies, and cross-vendor performance comparisons remain messy. But the direction is set.
TOPS is a headline spec—and an imperfect one
AMD’s own footnotes acknowledge that TOPS varies by model, software, and configuration. That’s a polite way of saying two chips with similar TOPS figures may behave differently once thermal limits, memory bandwidth, and software optimization show up to the party.
Apple’s marketing demonstrates the comparison problem in a different way. Apple states the M4 Neural Engine delivers 38 TOPS and frames it around efficient, private inference. Meanwhile, the Windows ecosystem often treats 40+ TOPS as a categorical requirement. The numbers sit near each other, but the platforms are not symmetrical: they differ in model deployment strategies, OS integration, and how much work is expected from the NPU versus CPU/GPU.
How readers should interpret TOPS
- TOPS tells you the device is aimed at local inference. It’s a signal of intent as much as capability.
- TOPS alone won’t predict your experience. Software support and OS-level integration can matter more than raw peak throughput.
- Cross-brand comparisons are hazardous. Treat them as directional, not definitive.
“TOPS can tell you a chip is trying. It can’t tell you whether the software will.”
— TheMurrow Editorial
The industry’s rush to advertise TOPS is understandable. The danger is that buyers treat it as the only number that matters, repeating a familiar mistake from the early days of megapixels and GHz.
Privacy becomes a product feature—by design, not accident
Apple has made privacy the central argument. In June 2024, Apple described on-device processing as the cornerstone of Apple Intelligence, claiming it can deliver personal intelligence without collecting users’ data (Apple’s phrasing). The company’s message is not subtle: the safest data is the data that never leaves your device.
But Apple also acknowledges the limits of local compute. Some requests are too large or complex for small models running on a phone or laptop. For those tasks, Apple routes processing to Private Cloud Compute (PCC)—a hybrid design meant to preserve privacy properties while expanding capability.
Apple’s Private Cloud Compute: a hybrid model with auditable claims
- Only data relevant to the request is sent.
- Data is not stored and is not accessible to Apple.
- Independent experts can inspect and verify the code running on PCC servers.
Those claims are notable because they treat privacy as something that can be evaluated, not merely promised. Verification language—inspection by independent experts—signals that Apple expects skepticism and is trying to meet it with process.
A fair reading includes competing perspectives. Skeptics will point out that “privacy” is not a binary. Even a well-designed system can be misunderstood by users, misconfigured by developers, or undermined by future policy choices. Still, PCC represents a concrete attempt to reconcile powerful models with privacy expectations—a design pattern others are likely to borrow.
The new software bargain: local by default, cloud when it counts
Microsoft’s approach is ecosystem-driven. By anchoring the Copilot+ category to 40+ TOPS NPUs, Microsoft is encouraging developers to assume a baseline of local capability in new PCs. That matters because it shifts cost and latency away from cloud infrastructure, while also reducing how often personal context needs to be transmitted.
Practical implications for readers
- Offline usefulness becomes real again. Features that work without a connection matter on planes, in rural areas, or during outages.
- Sensitive workflows feel less risky. Summarizing a document or processing a screenshot locally is a different privacy proposition than uploading it by default.
- Battery and thermals become part of the AI story. A device can have impressive peak TOPS and still disappoint if sustained workloads drain the battery or throttle performance.
The trade-off is that local models may be narrower: great for quick transforms, recognition tasks, and structured assistance; less great for expansive, open-ended reasoning. The smartest products will be honest about which mode is in use and why.
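The "local by default, cloud when it counts" bargain can be sketched as a simple request router. Everything below is hypothetical: the `route`, `run_local`, and `run_cloud` names, the token budget, and the size heuristic are illustrative, not any platform's actual escalation logic.

```python
# Hypothetical sketch of local-first routing with cloud escalation.
# Names, thresholds, and the size heuristic are illustrative only;
# no real platform exposes this exact API.

from dataclasses import dataclass

LOCAL_CONTEXT_LIMIT = 2_000   # made-up word budget for a small local model

@dataclass
class Result:
    text: str
    ran_locally: bool   # surfaced so a UI can be honest about which mode ran

def run_local(prompt: str) -> str:
    return f"[local model] {prompt[:40]}"

def run_cloud(prompt: str) -> str:
    return f"[cloud model] {prompt[:40]}"

def route(prompt: str, require_local: bool = False) -> Result:
    """Prefer the on-device model; escalate only when the task exceeds it."""
    fits_locally = len(prompt.split()) <= LOCAL_CONTEXT_LIMIT
    if fits_locally or require_local:
        return Result(run_local(prompt), ran_locally=True)
    return Result(run_cloud(prompt), ran_locally=False)

short = route("Summarize this paragraph.")
long_task = route("word " * 5_000)   # exceeds the local budget, escalates
print(short.ran_locally, long_task.ran_locally)  # True False
```

The design choice worth noticing is the `ran_locally` flag: a router that records which mode handled each request is what makes the honest disclosure described above possible in the first place.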
Real-world examples: where on-device AI actually earns its keep
Google’s Android messaging around Gemini Nano highlights a class of applications where local inference matters: real-time protection and accessibility features that can work offline and locally. Those are not novelty features. Security tools benefit from immediacy, and accessibility tools benefit from reliability. If connectivity is required for every interaction, the tools fail precisely when they’re most needed.
Apple’s case study is the privacy architecture itself. Apple’s June 2024 statements about on-device processing and PCC are not just about performance; they’re about enabling personalization without normalizing data collection. The company is making a bet that users will increasingly ask: “Where did my data go?” before they ask: “How clever was the answer?”
On Windows, the case study is the hardware category. Copilot+ PCs exist to prove that local AI isn’t limited to phones. Microsoft’s 40+ TOPS NPU baseline is an attempt to guarantee that a Windows laptop can run modern AI experiences without leaning on the cloud for every action.
A grounded way to evaluate “on-device” claims
- Does the feature work without an internet connection?
- Is the model running locally for the entire task, or only parts of it?
- If the cloud is used, is the system explicit about what is sent and what is retained?
On-device AI earns trust when companies answer those questions clearly, in plain language, and in settings users can verify.
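One way a reviewer might record those answers consistently is as structured data. The field names and the verdict rule below are a hypothetical sketch of such a rubric, not an established standard:

```python
# Hypothetical sketch: recording "on-device" claims as structured answers.
# Field names and the verdict rule are illustrative, not a standard rubric.

from dataclasses import dataclass

@dataclass
class OnDeviceClaim:
    works_offline: bool      # does the feature run with networking disabled?
    fully_local: bool        # is the whole task local, or only parts of it?
    cloud_disclosed: bool    # if cloud is used, is what's sent/retained stated?

    def verdict(self) -> str:
        if self.works_offline and self.fully_local:
            return "on-device"
        if self.cloud_disclosed:
            return "hybrid, disclosed"
        return "unverified"

print(OnDeviceClaim(True, True, True).verdict())    # on-device
print(OnDeviceClaim(False, False, True).verdict())  # hybrid, disclosed
print(OnDeviceClaim(False, False, False).verdict()) # unverified
```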
The competitive race: performance, privacy, and the next spec escalation
The pressure is predictable. Once a market adopts a single metric, companies compete to exceed it. The risk is also predictable: performance marketing can outrun practical value. More TOPS won’t automatically deliver better user experiences if developers can’t easily deploy models across different NPUs, or if key features remain gated by cloud services.
A second risk is conceptual. Privacy can become another spec—something you “have” rather than something you continuously prove. Apple’s move to invite independent inspection of PCC code is significant precisely because it treats privacy as verifiable. Other vendors will need their own equivalent of that posture if they want privacy to be more than a slogan.
The optimistic view is that competition will improve both performance and privacy. The realistic view is that readers should expect a messy transition period: inconsistent feature availability, unclear “on-device” labeling, and devices that advertise AI readiness without delivering meaningful local workflows.
Conclusion: The device is becoming the place where AI lives
On-device AI narrows the long-standing trade-off between capability and control. Microsoft’s 40+ TOPS Copilot+ baseline, Apple’s on-device-first posture with Private Cloud Compute, and Google’s emphasis on local Gemini Nano features all point to the same future: more inference will happen where the user is, not where the server is.
The open question is not whether on-device AI will grow. The open question is whether companies will communicate its boundaries honestly—when data stays local, when it doesn’t, and why. Readers don’t need marketing promises. Readers need clarity, because the most personal computer has always been the one that knows the most about you.
“Convenience used to be the reason we accepted the cloud. Privacy is becoming the reason we demand the device.”
— TheMurrow Editorial
Frequently Asked Questions
What is on-device AI in plain terms?
On-device AI means the AI model runs on your phone, tablet, or PC, processing your input locally rather than sending it to a cloud server for inference. That can reduce latency and can improve privacy because text, audio, images, or screen content may not need to be uploaded for processing. Many products still use a hybrid approach for larger tasks.
What is an NPU, and why does it matter?
An NPU (neural processing unit) is a dedicated accelerator designed to run AI inference efficiently. Microsoft’s Copilot+ PC guidance treats NPU performance as central, with many Windows AI experiences described as requiring 40+ TOPS. The practical benefit is that more AI workloads can run locally without relying on the cloud for every interaction.
What does “40 TOPS” mean, and should I shop by it?
TOPS stands for trillions of operations per second, a measure often used to describe peak AI throughput. Microsoft uses 40+ TOPS as a baseline for Copilot+ class PCs. TOPS can indicate whether a device targets local AI seriously, but it’s not a perfect comparison tool across brands because vendors may measure differently and quote peak numbers.
Is Apple’s “Private Cloud Compute” still the cloud?
Yes—Private Cloud Compute (PCC) is cloud processing for tasks too large for on-device models. Apple says only relevant data is sent, the data is not stored, and it is not accessible to Apple. Apple also says independent experts can inspect and verify code running on PCC servers. It’s a hybrid approach meant to extend capability without adopting conventional cloud data practices.
Does on-device AI mean my data never leaves my device?
Not automatically. Some systems run locally by default but may use cloud processing for complex requests. Apple explicitly describes this split: on-device when possible, PCC when needed. The best way to evaluate a product is to look for clear disclosures about when cloud processing is used, what data is sent, and whether anything is stored.
Are Apple’s TOPS numbers comparable to Windows AI PC TOPS numbers?
Only loosely. Apple states the M4 Neural Engine is 38 TOPS, while Microsoft frames Copilot+ PCs around 40+ TOPS NPUs. Differences in measurement precision, what’s included in the performance figure, and software deployment strategies make direct comparisons unreliable. Treat TOPS as a directional indicator, then look for real feature support and practical reviews.















