The 2026 On‑Device AI Reset
The biggest AI change isn’t a smarter chatbot—it’s where the intelligence lives. Here’s how to buy hardware, choose apps, and set privacy for the local era.

Key Points
- Track where AI runs: on-device inference cuts latency and cost, but “local” doesn’t guarantee privacy without clear data controls.
- Shop beyond TOPS: 40+ TOPS signals Copilot+ class, yet real performance depends on memory, software paths, and app compatibility.
- Expect tiered features: RAM and supported-device lists increasingly gate models and capabilities, especially on phones and “lite” variants.
The most consequential AI upgrade of 2026 isn’t a smarter chatbot. It’s where the intelligence lives.
For the last few years, “AI” mostly meant cloud-first systems: you asked, a server answered. The pitch was scale—bigger models, faster iteration, new features every week. The hidden costs were latency, dependency, and a growing bill for inference that someone had to pay.
Now device makers are trying a different bargain: move more of that work onto your laptop and phone. Not because of a sudden moral awakening about privacy, but because on-device AI is often faster, cheaper to run over time, and easier to sell as “yours.”
“The 2026 reset isn’t philosophical. It’s a cost-and-latency calculation that happens to look like a privacy story.”
— TheMurrow Editorial
If you’ve been watching the marketing language—“on-device,” “local,” “edge,” “private AI,” “personal intelligence,” “AI PC,” “Copilot+ PC”—you’ve already seen the scramble to name this shift. The labels are proliferating faster than the clarity. Consumers are being asked to buy hardware on promises they can’t easily verify, using metrics they don’t fully control.
The 2026 “reset”: from cloud-first AI to local intelligence
Three forces are driving the shift: latency, cost, and privacy. Latency is the obvious one. On-device inference can be near-instant and keeps working in unreliable connectivity or fully offline. That matters less for novelty chat and more for everyday tools—searching a document archive on a plane, live captions in a noisy room, or summarizing notes in a meeting where Wi‑Fi fails.
Cost is less visible but more decisive. Cloud inference is expensive; it requires ongoing server capacity and energy, and it scales with usage. Moving inference onto devices shifts cost to the hardware purchase and amortizes it over a laptop’s or phone’s lifetime. Vendors like the economics because they can sell silicon upgrades rather than subsidize every prompt.
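To see the amortization math concretely, here’s a back-of-the-envelope sketch in Python. Every figure in it (the per-token price, daily usage, and hardware premium) is a hypothetical assumption for illustration, not a quoted vendor number.

```python
# Back-of-the-envelope: cloud inference bills vs. a one-time NPU premium.
# All prices and usage figures are hypothetical assumptions.

CLOUD_COST_PER_1K_TOKENS = 0.002   # assumed $ per 1K tokens, hosted inference
TOKENS_PER_DAY = 50_000            # assumed heavy daily personal usage
NPU_HARDWARE_PREMIUM = 150.00      # assumed extra cost of an NPU-class laptop
DEVICE_LIFETIME_YEARS = 4

cloud_total = (CLOUD_COST_PER_1K_TOKENS * TOKENS_PER_DAY / 1_000) * 365 * DEVICE_LIFETIME_YEARS
print(f"Cloud inference over the device's lifetime: ${cloud_total:,.2f}")
print(f"One-time on-device hardware premium:        ${NPU_HARDWARE_PREMIUM:,.2f}")
```

Under these made-up numbers the two columns land in the same ballpark, which is exactly why vendors would rather sell the silicon once than subsidize every prompt.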
Privacy and compliance are the most complicated drivers. Fewer data transfers can reduce exposure, and regulators and enterprises are increasingly sensitive to data residency and retention. Still, readers should treat “local” as a routing choice, not a guarantee of privacy. A device can run inference locally while still logging prompts, syncing outputs, or uploading telemetry.
A vocabulary worth learning before you buy
- On-device inference: the model runs locally on your device.
- On-device preprocessing: data is processed locally but then sent to the cloud for the main result.
- Private cloud / confidential compute: data is sent to the cloud, but with hardware-backed protections designed to reduce exposure.
Those distinctions determine speed, offline capability, and risk. They also determine what you’re actually paying for when a device advertises “AI built in.”
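To make those routing differences concrete, here is a minimal sketch of the three patterns as code. The names and structure are invented for illustration; no vendor exposes an API like this.

```python
from enum import Enum

class AIRoute(Enum):
    # Hypothetical labels for the three patterns defined above.
    ON_DEVICE = "model runs locally; data stays on the device"
    PREPROCESS_THEN_CLOUD = "local preprocessing; main inference in the cloud"
    CONFIDENTIAL_CLOUD = "cloud inference inside hardware-backed protections"

def works_offline(route: AIRoute) -> bool:
    """Only fully local inference keeps working with no network at all."""
    return route is AIRoute.ON_DEVICE

for route in AIRoute:
    print(f"{route.name}: offline-capable = {works_offline(route)}")
```

The offline check is the practical takeaway: of the three, only true on-device inference survives airplane mode, which is why offline capability is the easiest real-world signal of local inference.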
The new baseline spec: NPUs and the seductive math of TOPS
Microsoft has pegged Copilot+ PCs to silicon capable of 40+ TOPS, and that threshold is already shaping the market. It tells OEMs what qualifies as an “AI PC” in Microsoft’s ecosystem, and it tells consumers what they’re expected to shop for. The trouble is that TOPS is not a universal yardstick, even when the number is accurate.
Vendors may quote peak TOPS, and some will quote combined “platform TOPS” that add CPU, GPU, and NPU together rather than the NPU alone. Real performance depends on factors that don’t fit on a spec sticker: model size, quantization, memory bandwidth, and software paths such as Windows/DirectML, Apple Core ML, or Android NNAPI.
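One reason the number misleads: for local language models, generating each token typically means streaming the full set of weights through memory, so decode speed is often bound by memory bandwidth rather than raw compute. A rough, roofline-style estimate (all figures are illustrative assumptions):

```python
# Rough ceiling on local LLM decode speed, assuming token generation is
# memory-bandwidth-bound. All figures are illustrative assumptions.

params_billions = 3.0    # assumed on-device model size
bits_per_weight = 4      # assumed 4-bit quantization
bandwidth_gb_s = 120.0   # assumed device memory bandwidth

weight_gb = params_billions * bits_per_weight / 8   # GB of weights
tokens_per_second = bandwidth_gb_s / weight_gb      # bandwidth-bound ceiling
print(f"Weights: {weight_gb:.1f} GB -> ~{tokens_per_second:.0f} tokens/s ceiling")
```

Notice that the NPU’s TOPS rating never appears in this estimate. That is precisely why two chips with identical TOPS can feel very different in practice.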
“TOPS is becoming the new megapixels: helpful in context, misleading in isolation.”
— TheMurrow Editorial
Practical buyer guidance: what TOPS can and can’t tell you
When comparing devices, readers should ask two grounded questions:
1. Is the AI feature you care about documented as NPU-dependent? Some experiences can fall back to CPU/GPU; others won’t run without an NPU.
2. Is the device validated or marketed for that ecosystem (e.g., Copilot+ PC)? Certification language often signals software support and drivers that raw specs can’t reveal.
Windows in 2026: Qualcomm, AMD, Intel—and the meaning of 40+ TOPS
Qualcomm has been particularly aggressive about NPU headroom. CES 2026 coverage reported Snapdragon X2 Plus chips integrating an 80 TOPS NPU. Another CES 2026 report described an HP OmniBook Ultra 14 variant with an exclusive Snapdragon X2 Elite option featuring an 85 TOPS NPU. Those numbers are not subtle; they’re intended to make 40 TOPS feel like yesterday’s baseline.
AMD’s messaging is more about meeting the bar broadly across product lines. AMD’s Ryzen AI PRO 300 materials state its XDNA 2 NPU delivers 50+ NPU TOPS (up to 55 peak TOPS). Tom’s Hardware also reported a budget-oriented Ryzen AI 300-series part, the Ryzen AI 5 330, with a 50 TOPS NPU—evidence that AI-class NPUs are filtering down beyond premium systems.
Intel has tied its story directly to Copilot+, stating that Lunar Lake will provide more than 40 NPU TOPS and describing that level as necessary for Copilot+ experiences.
The real question isn’t “who has more TOPS”—it’s “what breaks”
Qualcomm’s Windows-on-ARM push, for example, raises a legitimate question for buyers: compatibility. Many mainstream apps will run, but performance and reliability can vary between native ARM builds and emulation—especially for niche enterprise tools. A reader choosing a laptop for specialized workflows should weigh NPU headroom against software certainty.
Intel and AMD benefit from x86 familiarity, but their marketing can create a different kind of confusion. Intel in particular often highlights combined CPU+GPU+NPU “platform TOPS,” which can make comparisons feel slippery. The cleanest consumer heuristic remains: does the device clearly qualify as Copilot+ PC, and do the AI features you want actually run locally on that system?
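A toy illustration of how “platform TOPS” can inflate the picture (the spec-sheet numbers below are invented):

```python
# Invented spec-sheet numbers showing how a combined "platform TOPS" figure
# can mask an NPU that falls short of the Copilot+ bar.

cpu_tops, gpu_tops, npu_tops = 5, 67, 13   # hypothetical per-unit ratings
platform_tops = cpu_tops + gpu_tops + npu_tops

COPILOT_PLUS_NPU_MINIMUM = 40   # the NPU-level bar discussed in this article

print(f"Advertised platform TOPS: {platform_tops}")   # reads as 85
print(f"NPU TOPS alone:           {npu_tops}")        # the gating number
print(f"Meets the 40+ NPU bar?    {npu_tops >= COPILOT_PLUS_NPU_MINIMUM}")
```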
“In 2026, the smartest laptop isn’t the one with the biggest number. It’s the one that runs your tools without surprises.”
— TheMurrow Editorial
Phones and “tiered AI”: the Pixel lesson about RAM and feature gating
Google’s Pixel line offers a clear case study. Coverage of the Pixel 8 and Pixel 8 Pro highlighted that Google confirmed the Pixel 8 wouldn’t initially get Gemini Nano, citing “hardware limitations,” with reporting emphasizing the RAM difference—8GB vs 12GB. Later reports described Gemini Nano toggles appearing for the Pixel 8/8a via AICore developer options, reinforcing that on-device GenAI can be optional and resource-governed, not simply “included.”
More recently, The Verge reported the Pixel 9A runs a lighter Gemini Nano 1.0 XXS because of 8GB RAM, missing some features available on higher-RAM Pixels. That’s tiered AI in plain language: a family name on the box, and different realities under the hood.
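Why would 8GB versus 12GB of RAM gate a model? A crude footprint check shows how quickly weights plus runtime overhead eat into a phone’s memory budget. All of the sizes below are invented for illustration; they are not Google’s actual figures.

```python
# Crude check of whether a local model fits a phone's AI memory budget.
# Every size here is an invented, illustrative assumption.

def fits(params_billions: float, bits: int, ram_gb: float,
         reserved_gb: float = 6.0, overhead_gb: float = 0.5) -> bool:
    """reserved_gb approximates OS + apps; overhead_gb covers runtime state."""
    weights_gb = params_billions * bits / 8   # GB of weights at `bits` precision
    return weights_gb + overhead_gb <= ram_gb - reserved_gb

models = [("nano-class model", 3.25), ("XXS-class variant", 0.5)]  # invented sizes
for name, size_b in models:
    for ram in (8.0, 12.0):
        print(f"{name} ({size_b}B @ 4-bit) on {ram:.0f}GB RAM: fits={fits(size_b, 4, ram)}")
```

Under these assumptions, the larger model squeezes onto 12GB but not 8GB, while the XXS-class variant fits both. That is the shape of the tradeoff behind “hardware limitations.”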
What consumers should take from Pixel’s “hardware limitations”
This is why device makers increasingly use “supported device lists,” feature matrices, and fine print. The AI is not merely a software update; it’s a resource allocation decision. Consumers should expect more of the following patterns:
- “Same processor” devices receiving different on-device model sizes
- “Lite” modes that preserve battery and responsiveness by shrinking capabilities
- Feature rollouts that arrive late—or not at all—on lower-memory configurations
For buyers, the practical implication is to treat AI features as hardware-bound entitlements, not general promises.
Apple’s approach: compatibility lists as the new contract
Apple’s approach to Apple Intelligence, as reflected in its compatibility list, underscores a point many consumers miss: you don’t buy “Apple Intelligence” in the abstract. You buy a specific supported device class, and the list functions like a contract.
That list-based approach has two consumer-friendly advantages. First, it reduces ambiguity: either your device is supported or it isn’t. Second, it encourages developers and users to align around a known performance envelope.
The tradeoff is obvious. A strict compatibility line can strand recent devices that feel “new enough,” and it can accelerate upgrade pressure. Still, Apple’s clarity exposes a broader truth across the industry: in a local intelligence era, the most honest AI promise is often a supported devices page, not an ad.
Privacy: local processing is not a blanket guarantee
The responsible way to evaluate any “private AI” claim is to ask where the model runs, what data leaves the device, and under what controls.
“Private AI” and the marketing fog: how to ask better questions
A product can legitimately claim local inference and still expose sensitive material through logging, cloud backups, or third-party integrations. Another product might run in the cloud but use confidential compute designs to reduce operator visibility. Both can be reasonable. Neither should be accepted on vibes.
A checklist for cutting through the language
- Does the model run on-device, or is it cloud-backed? Look for explicit statements about “on-device inference.”
- If anything is sent to the cloud, what exactly is sent? Raw text, embeddings, metadata, or only the final output?
- Is the feature usable offline? Offline capability is often the easiest real-world indicator of local inference.
- Are there toggles and logs? Settings that show data controls and retention are more meaningful than branding.
- Is the hardware requirement explicit? “Supported devices” and RAM requirements often tell the truth marketing won’t.
These questions won’t turn consumers into auditors, but they do restore agency. In 2026, AI literacy includes knowing where computation happens.
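For readers who prefer their checklists executable, the same questions can be written as a tiny audit script. The field names and the sample product below are invented; they would map onto whatever a real product’s documentation actually discloses.

```python
# Hypothetical audit of an AI feature against the checklist above.
# Field names and the sample entry are invented for illustration.

CHECKLIST = {
    "on_device_inference": "Does the model run on-device, or is it cloud-backed?",
    "documents_cloud_payload": "If anything is sent to the cloud, is it spelled out?",
    "works_offline": "Is the feature usable offline?",
    "has_data_controls": "Are there toggles and logs for retention?",
    "explicit_hw_requirements": "Is the hardware requirement explicit?",
}

def audit(claims: dict) -> list[str]:
    """Return the questions a product's documentation leaves unanswered."""
    return [q for key, q in CHECKLIST.items() if not claims.get(key, False)]

sample = {"on_device_inference": True, "works_offline": True}  # invented product
for question in audit(sample):
    print("Unverified:", question)
```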
What this means for buyers in 2026: spending smarter, not louder
If you’re buying a Windows laptop
- Qualcomm Snapdragon X line may offer substantial NPU headroom (reported 80–85 TOPS in CES 2026 coverage) but deserves extra scrutiny for app compatibility in your workflow.
- AMD Ryzen AI options emphasize accessible NPU performance (AMD states 50+ TOPS, up to 55 peak TOPS) and are showing up even in more budget-oriented parts (reported 50 TOPS on Ryzen AI 5 330).
- Intel Lunar Lake positions itself at 40+ NPU TOPS, aligned to Copilot+ requirements, with messaging that can blur NPU vs platform totals—worth reading carefully.
If you’re buying a phone
If AI features matter to you, shop for them the way you’d shop for storage or camera capabilities: by confirmed support, not by brand family.
The larger implication: hardware is back in charge
That’s not a tragedy. It’s a return to transparency, provided companies are honest about what runs where, and provided consumers stop treating AI as a mystical subscription rather than a computing workload.
The most durable advantage in 2026 won’t be having the loudest AI branding. It will be owning devices that keep working quickly, predictably, and privately enough for your needs—without needing permission from a server.
Frequently Asked Questions
What does “on-device AI” actually mean?
On-device AI generally means inference runs locally on your phone or PC, rather than sending your request to a server for processing. Some products use the term loosely, so verify whether the core model runs on-device or whether the device only does preprocessing before sending data to the cloud.
Are Copilot+ PCs defined by a specific NPU requirement?
Microsoft has described Copilot+ PCs as a new Windows category designed for AI, enabled by silicon capable of 40+ TOPS to run new Windows AI experiences locally. Treat “40+ TOPS” as a baseline for that ecosystem, then evaluate real-world support and app needs beyond the number.
Is TOPS a reliable way to compare AI performance?
TOPS can be a useful indicator, but it’s not fully comparable across vendors. Companies may quote peak TOPS or combine CPU/GPU/NPU into “platform TOPS.” Real performance depends on model size, quantization, memory bandwidth, and the software stack used to run models.
Why do two phones with similar chips get different AI features?
On-device generative AI is often resource-bound, especially by RAM. Reporting around Pixel devices highlighted differences like 8GB vs 12GB RAM affecting Gemini Nano availability, and The Verge reported the Pixel 9A uses a lighter Gemini Nano variant due to 8GB RAM. Vendors may gate features to preserve speed and battery life.
Does local inference automatically make AI “private”?
No. Local inference can reduce data transfers, but privacy depends on what the feature logs, syncs, or uploads. Some experiences still send prompts or metadata to the cloud, or store results in cloud backups. Look for clear documentation about what leaves the device and what controls you have.
What’s the simplest way to tell if an AI feature is truly local?
Test for offline functionality and read the feature’s technical description. If a feature works reliably without a network connection, it’s more likely running on-device. If it fails or degrades sharply offline, it’s probably cloud-backed or hybrid. Supported-device lists and explicit “runs on-device” language are also strong signals.