The Quiet Revolution: On-Device AI, Everywhere
AI is moving closer to your most sensitive data—messages, photos, calls—by running directly on your phone and laptop. When it can’t, “private AI clouds” promise proof they aren’t snooping.

Key Points
- Follow the shift: AI is moving on-device to access sensitive data with less exposure, while cloud escalation becomes a deliberate, auditable choice.
- Watch the hardware arms race: NPUs and TOPS (40+ on Copilot+ PCs) market "AI-ready" devices, but real performance depends on more than peak throughput.
- Demand verifiable privacy: hybrid systems hinge on transparency—when data leaves your device, what the cloud can retain, and whether claims can be independently checked.
Your next phone call might be judged by a machine—on your phone, in real time, without ever leaving your pocket.
That sounds like a privacy nightmare until you notice the twist: the industry’s newest “trust us” strategy isn’t a bigger data center. It’s smaller models running locally—and, when they can’t, a new kind of cloud designed to prove it isn’t snooping.
Over the last two years, the biggest names in consumer tech have converged on the same idea: the most valuable AI features require access to the most sensitive data—your messages, photos, calendars, and voice. If AI is going to be personal, it has to be close to your data. And the closest place to run it is the device itself.
A quiet arms race is now underway, measured in an awkward new unit: TOPS—trillions of operations per second. Microsoft says you need 40+ TOPS to qualify as a new class of “Copilot+ PC.” Qualcomm boasts 45 NPU TOPS in Snapdragon X-series systems. Apple is building a “private cloud” on custom silicon and promising outsiders can inspect what runs there. Google is pushing Gemini Nano onto Pixel phones and framing local inference as privacy by design.
The marketing is loud. The architectural shift is real. The question is whether it earns the trust it asks for.
“The most personal AI is the kind that can afford to stay on your device—and only asks the cloud for help when it must.”
— TheMurrow Editorial
What “on-device AI” actually means (and what it doesn’t)
In its purest form, on-device AI means the model runs entirely on your phone or laptop's own hardware, and your data never leaves it. That's the clean definition. Consumer reality is messier.
Most companies now sell a hybrid story. Simple or sensitive tasks run locally; harder ones “escalate” to some form of cloud. Routing depends on a few factors:
Why tasks run locally—or get escalated
- ✓Complexity: large models still do better at certain tasks.
- ✓Latency: local can be faster, especially offline.
- ✓Cost: cloud inference is expensive at scale; local inference shifts cost to the device.
- ✓Capability: not every device can run the same model well.
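The routing factors above can be sketched as a simple decision function. This is an illustrative model, not any vendor's actual logic; the `Task` and `Device` types and the thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    complexity: float    # 0.0 (trivial) .. 1.0 (hard reasoning)
    sensitive: bool      # touches messages, photos, calls, etc.
    needs_offline: bool  # must work without connectivity

@dataclass
class Device:
    max_local_complexity: float  # what the local model handles well

def route(task: Task, device: Device) -> str:
    """Decide where a task runs, mirroring the factors above:
    latency/offline needs, privacy, complexity, and device capability."""
    if task.needs_offline:
        return "local"   # latency/connectivity: the cloud isn't an option
    if task.sensitive and task.complexity <= device.max_local_complexity:
        return "local"   # privacy: keep it on-device when the model can cope
    if task.complexity > device.max_local_complexity:
        return "cloud"   # complexity: escalate what the small model can't do
    return "local"       # default: cheaper and more private

# Example: summarizing private notes is simple enough to stay local
print(route(Task(0.3, sensitive=True, needs_offline=False),
            Device(max_local_complexity=0.5)))
```

The interesting design question is the order of the checks: putting privacy ahead of capability is a product stance, not a technical inevitability.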
The reason “on-device” is suddenly everywhere has as much to do with product strategy as it does with technology. Apple, Google, and Microsoft are all trying to build AI that feels personal: summaries drawn from your notes, contextual help tied to your calendar, search across your photos, transcription of your meetings. Those features are only compelling if they can see sensitive data. Sensitive data invites scrutiny.
Privacy claims have moved from compliance boilerplate to a competitive feature. Regulatory pressure and reputational risk amplify the shift. When users worry they’re feeding private conversations into a distant black box, local inference becomes a selling point—and a design constraint.
NPUs, TOPS, and the new hardware layer you didn’t ask for
Microsoft's latest push makes the trend explicit. On May 20, 2024, Microsoft introduced Copilot+ PCs, a category of Windows machines defined in part by an NPU threshold: "40+ TOPS" of NPU compute. Microsoft also said the first wave would be available June 18, 2024, led by Surface and major OEM partners. The same announcement tied partner devices built on Snapdragon X-series chips to claims of 45 NPU TOPS.
Those numbers are now part of a consumer-facing narrative: a laptop is “AI-ready” if it clears a TOPS bar.
“TOPS is the new gigahertz—useful as a signal, dangerous as a shortcut.”
— TheMurrow Editorial
Why TOPS matters—and why it misleads
Real-world AI “feel” depends on more than raw throughput:
What TOPS doesn’t capture
- ✓Memory bandwidth: models are data-hungry; moving weights matters.
- ✓Optimization: quantization and compilation can change latency dramatically.
- ✓Software stack: drivers, runtimes, and model support define what runs well.
- ✓Thermals and power limits: peak TOPS may not be sustainable.
In other words, TOPS can hint at potential, but it doesn’t guarantee that a given model will run smoothly—or that the features you want will arrive on your specific device.
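The memory-bandwidth point can be made concrete with a standard back-of-envelope calculation: generating each new token streams roughly every model weight through the processor once, so bandwidth, not peak TOPS, often sets the pace. The figures below (a ~3B-parameter model, 4-bit quantization, 100 GB/s of bandwidth) are illustrative assumptions, not measurements of any specific device.

```python
def decode_speed_ceiling(params_billions: float,
                         bits_per_weight: int,
                         bandwidth_gbs: float) -> float:
    """Upper bound on tokens/sec set by memory bandwidth alone:
    each generated token reads (roughly) all weights once."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / weight_bytes

# A ~3B-parameter model quantized to 4 bits is ~1.5 GB of weights.
# At 100 GB/s of memory bandwidth, generation tops out near:
print(round(decode_speed_ceiling(3, 4, 100)))  # ≈ 67 tokens/sec
```

Note that no amount of extra TOPS raises this ceiling; only faster memory or a smaller (more aggressively quantized) model does—which is why quantization and memory bandwidth appear on the list above.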
On phones, “small models” are doing real work—quietly
In the June 2024 Pixel feature drop, Google expanded Gemini Nano availability to Pixel 8 and Pixel 8a, initially as a developer option. The same update highlighted that Pixel Recorder’s “Summarize in Recorder” can run on-device on Pixel 8/8a, and that Recorder summaries and transcripts gained export options.
That’s not a demo; it’s a daily utility. Meeting summaries and audio transcription are exactly the sort of features people hesitate to send to a server—because the content can be intimate, legally sensitive, or both.
Case study: scam detection that stays on your phone
Google has positioned Gemini Nano as the engine for scam detection that runs during phone calls: the model watches for conversational patterns typical of scams and warns the user in real time, with the audio processed locally. The key point isn't only that the phone can flag suspicious patterns. It's that the product argument hinges on locality: detection without shipping your private conversation to a remote model.
“The moment AI starts reading your messages, privacy stops being a policy document and becomes an architecture.”
— TheMurrow Editorial
The trade-offs users will actually feel
- More limited reasoning depth compared to large cloud models
- Better responsiveness for certain tasks (especially short text, classification, summarization)
- More predictable privacy behavior—if the system truly stays local
In practice, the best experiences often blend local and cloud. The crucial question is whether the system is honest about what runs where, and what data leaves your device when it doesn’t run locally.
Apple’s two-tier approach: on-device first, then Private Cloud Compute
Apple's public pitch is simple: run on-device when possible, escalate to Private Cloud Compute (PCC) when necessary. Its security materials go deeper. In its Private Cloud Compute technical write-up, Apple describes PCC as built on custom Apple-silicon server nodes and security mechanisms including Secure Enclave and Secure Boot, with a hardened operating system designed to reduce the attack surface. Apple also claims requests are end-to-end encrypted to PCC nodes after the device validates and attests the node's software. Supporting services such as load balancers sit outside the trust boundary and should not be able to decrypt requests.
“Verifiable transparency” as a trust strategy
Apple has pledged to publish the software images running on PCC nodes so independent researchers can inspect them—what it calls "verifiable transparency." That matters because "trust us" is no longer enough. Users and regulators increasingly want mechanisms to verify what's happening. A transparency log doesn't solve every risk, but it changes the terms of the debate: Apple is inviting outside scrutiny of the code running on the servers handling sensitive AI requests.
Independent reporting has emphasized the same design philosophy. WIRED described PCC servers as intentionally minimal—highlighting the lack of persistent storage and a “start fresh on reboot” posture via cryptographic erasure mechanisms. The specific security posture reinforces Apple’s thesis: when the cloud is necessary, it should resemble an extension of the device’s security model, not a typical multi-tenant AI service.
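The attest-then-encrypt flow Apple describes can be caricatured in a few lines. This is a schematic of the idea, not the real PCC protocol: the log entries, image names, and the hash standing in for real encryption are all invented for illustration.

```python
import hashlib

# Hypothetical transparency log: measurements (hashes) of the published
# server software images a device is willing to talk to.
TRANSPARENCY_LOG = {hashlib.sha256(b"pcc-release-1.0").hexdigest()}

def node_measurement(software_image: bytes) -> str:
    """Stand-in for hardware-backed attestation: the node proves which
    software it booted by presenting a measurement (hash) of it."""
    return hashlib.sha256(software_image).hexdigest()

def send_request(payload: bytes, software_image: bytes) -> str:
    """Device-side flow: check the node's attested software against the
    public log BEFORE releasing any end-to-end encrypted payload to it.
    (Real systems use signed attestations and real encryption; the hash
    here merely marks the payload as sealed to the verified node.)"""
    if node_measurement(software_image) not in TRANSPARENCY_LOG:
        raise PermissionError("node runs unpublished software; refusing")
    return "sealed-to-node:" + hashlib.sha256(payload).hexdigest()[:8]

print(send_request(b"summarize my notes", b"pcc-release-1.0"))
```

The structural point survives the simplification: a node running software that was never published to the log cannot even receive the request, which is what makes the transparency pledge enforceable rather than rhetorical.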
The emerging compromise: “private AI cloud” becomes the new normal
So the industry is building a compromise: a "private AI cloud." Apple's Private Cloud Compute is the most fully specified example on the public record, but the direction is broader. Reporting in late 2025 indicates Google is introducing "Private AI Compute," positioned as analogous to Apple's PCC—an implicit acknowledgement that the future is not purely local or purely cloud.
The hybrid model is now becoming the default story:
- Run what you can on-device
- Escalate what you must to a hardened, privacy-scoped cloud
- Promise (and ideally prove) the cloud behaves more like a secure enclave than a data-mining platform
What readers should watch for in “private cloud” claims
- Strong isolation and attestation (proving what code is running)
- Encryption where intermediary systems cannot read payloads
- Minimal retention and a defensible story on logs and debugging
- Narrowly scoped access to only relevant user data
Apple’s PCC claims cover several of these elements explicitly, including attestation and end-to-end encryption to the compute nodes. Those details create accountability pressure on competitors: once one company describes a specific mechanism, hand-wavy privacy slogans look thin.
What on-device AI changes for everyday users (and what it doesn’t)
Practical takeaways: where on-device AI helps most
On-device AI tends to shine in scenarios that are:
- Latency-sensitive: instant transcription or summarization
- Connectivity-poor: features that still work on airplanes or subways
- Privacy-sensitive: messages, calls, photos, personal notes
- Always-on: spam/scam detection, quick classifiers, accessibility tools
Google’s on-device scam detection is a clear example of privacy and immediacy aligning with usefulness. Apple’s on-device-first positioning targets a similar intuition: assistants become valuable when they can reference personal context, but personal context is exactly what users don’t want to upload casually.
The limits: small models, selective features, and uneven availability
On PCs, Microsoft’s 40+ TOPS Copilot+ requirement introduces a new fault line: devices are implicitly separated into “AI-native” and “legacy,” even when they’re perfectly capable computers. That may accelerate upgrades, but it also risks confusing consumers who assume AI features are software updates rather than hardware-gated experiences.
The trust problem: architecture is necessary, transparency is decisive
Apple’s PCC transparency log pledge is notable because it treats trust as something you can audit, not merely accept. Google’s framing of scam detection emphasizes that conversations remain private by staying on-device. Microsoft’s NPU baseline for Copilot+ suggests a future where local inference is expected, not exceptional.
These are distinct strategies, but they rhyme. All three point toward a world where consumer AI is judged on a new set of standards:
- How often can it run locally?
- What triggers escalation to the cloud?
- What is the cloud allowed to remember?
- Can outsiders verify the claims?
The next phase will likely be less about who has the flashiest chatbot and more about whose systems can credibly say: your life is not training data, and your private context is not a product.
The odd outcome is that privacy—long treated as a legal constraint—is becoming a design aesthetic. The best AI won’t only be smart. It will be discreet.
Frequently Asked Questions
What is on-device AI, in plain English?
On-device AI means your phone or computer runs the AI model directly on its own hardware, rather than sending your request to a remote server. Most consumer systems still use a hybrid approach: some tasks run locally, while harder ones are sent to a cloud service. The main benefit is reducing what leaves your device, which can improve privacy and responsiveness.
What is an NPU, and why does it matter?
An NPU (neural processing unit) is a chip component designed to run AI workloads efficiently. It can perform the math behind neural networks using less power than a CPU or GPU for many tasks. NPUs are a major reason always-on features—like real-time transcription or message classification—are becoming feasible on laptops and phones without destroying battery life.
What does “40+ TOPS” mean on Copilot+ PCs?
TOPS stands for trillions of operations per second, a measure of peak theoretical AI throughput. Microsoft introduced Copilot+ PCs on May 20, 2024, defining them in part by requiring an NPU capable of 40+ TOPS. Microsoft also said initial Copilot+ PC availability would begin June 18, 2024. TOPS is useful for rough comparison but doesn’t guarantee real-world performance.
Is TOPS a reliable way to compare AI devices?
Not by itself. TOPS is a peak number and doesn’t capture memory bandwidth, software optimization, power limits, or whether your preferred models are supported. Two devices with similar TOPS can feel very different in practice. Treat TOPS like a starting point—then look for evidence of actual features that run locally and how well they behave.
What is Apple’s Private Cloud Compute (PCC)?
Private Cloud Compute is Apple’s system for handling requests that exceed what can be done on-device. Apple says PCC runs on custom Apple-silicon server nodes and uses security features including Secure Enclave and Secure Boot. Apple also claims requests are end-to-end encrypted to the compute nodes after device attestation, and it has pledged “verifiable transparency” by publishing PCC software images and measurements for independent inspection.
Does on-device AI mean companies never see your data?
Not necessarily. Many products use hybrid routing, so some requests may still go to cloud systems depending on complexity and capability. The meaningful questions are when escalation happens, what data is transmitted, whether it’s encrypted end-to-end to the compute environment, and whether the provider can store or access it. Apple and Google have both framed local processing as a privacy advantage, but users should still look for clear disclosures and controls.