Siri AI Isn’t Just a Gemini Clone: Craig Federighi and Apple Engineers Deconstruct the Architecture of Apple Intelligence
Following Apple’s WWDC 2026 keynote, initial impressions left some critics feeling that the newly announced Siri AI and Apple Intelligence features lacked groundbreaking novelty and appeared to rely heavily on Google’s Gemini framework. To dispel these misconceptions, Apple’s Senior Vice President of Software Engineering, Craig Federighi, hosted an exclusive post-keynote technical briefing.
Joined on stage by Amar Subramanya (VP of AI), Mike Rockwell (Siri Lead), and Sebastien Marineau-Mes (VP of CoreOS Software), the executive team pulled back the curtain on Apple’s third-generation AI stack, explicitly defining where Apple’s engineering ends and where Google’s infrastructure begins.
The Zero-Gemini Architecture of On-Device Operations
Federighi firmly clarified that Apple Intelligence operates on a strict bifurcated topology: On-Device processing and the server-side Private Cloud Compute (PCC) network. On the device level, Google’s presence is entirely non-existent.
"We don’t have the Gemini app as our app. In fact, none of that client code is part of how we run on iOS," Federighi stated, emphasizing that Apple developed its foundational core models and local knowledge graphs completely in-house. "The amount of Google Assistant or customer-facing Gemini models we use on-device is exactly zero."
The core engine managing this local environment is the System Orchestrator. This specialized privacy-first routing layer fluidly coordinates assets, personal context, and application data pipelines entirely on-device. Within this ecosystem, the new Siri AI acts strictly as the conversational interface, managing prompt histories and tracking user intents seamlessly.
When a query exceeds local compute capabilities because it involves complex reasoning or multi-step tasks, the System Orchestrator hands it off securely to Private Cloud Compute. Gemini enters the picture only at this cloud tier, and only during the training and refinement of Apple's high-end models, which draw on data distilled from Google's frontier Gemini models.
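The hand-off described above can be pictured as a simple routing decision. The sketch below is purely illustrative: the `Query` type, the `route` function, and the `MAX_LOCAL_STEPS` threshold are hypothetical names invented for this example, not part of any Apple API, and the real System Orchestrator presumably weighs far more signals than a single step count.

```python
from dataclasses import dataclass

# Hypothetical threshold for this sketch; not a figure published by Apple.
MAX_LOCAL_STEPS = 2


@dataclass
class Query:
    text: str
    reasoning_steps: int  # illustrative estimate of how many steps the task needs


def route(query: Query) -> str:
    """Return the tier that would handle the query in this toy model."""
    if query.reasoning_steps <= MAX_LOCAL_STEPS:
        # Simple requests stay on the device.
        return "on_device"
    # Complex or multi-step requests are escalated to the server tier.
    return "private_cloud_compute"


if __name__ == "__main__":
    print(route(Query("Set a timer for ten minutes", reasoning_steps=1)))
    print(route(Query("Plan a three-city trip around my calendar", reasoning_steps=6)))
```

The point of the sketch is only the shape of the decision: the default path is local, and the cloud tier is an escalation rather than the starting point.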
🧠 The 5 Tiers of the Apple Foundation Model (AFM 3) Family
Amar Subramanya broke down the architecture of the Apple Foundation Model (AFM 3), revealing that their AI ecosystem has progressed to its third generation, divided into five custom-built sub-models:
AFM 3 Core: The default, multi-purpose on-device model optimized for rapid Natural Language Understanding (NLU) and lightweight text processing.
AFM 3 Core Advanced: A highly sophisticated, natively multimodal 20-billion-parameter sparse model. By dynamically activating only 1 to 4 billion parameters per prompt, it handles complex localized workloads, custom expressive voice synthesis, and high-accuracy dictation exclusively on premium Apple Silicon hardware.
AFM 3 Cloud: The server-side workhorse running on Apple’s dedicated data centers, engineered for speed and high-efficiency contextual processing.
AFM 3 Cloud Image: A dedicated generative model powering visual systems like the all-new Image Playground and advanced editing tools within the Photos app.
AFM 3 Cloud Pro: The pinnacle of Apple's cloud tier. This model handles heavy agentic workflows, intricate coding logic, and advanced reasoning. It is refined using outputs from Gemini frontier models, achieving benchmarks competitive with the industry's absolute best.
🔒 NVIDIA, Google Cloud, and Open Developer Horizons
Addressing the backend infrastructure, Sebastien Marineau-Mes explained that customizing server-side execution was paramount to expanding the privacy guarantees of the iPhone into the cloud.
To achieve this, Apple forged a strategic partnership with NVIDIA and Google. Together, they engineered a confidential computing platform that deploys Apple’s proprietary server models onto NVIDIA’s next-generation GPUs hosted within Google Cloud infrastructure, fully locked down by advanced cryptographic data isolation.
Looking ahead, Marineau-Mes announced that Apple plans to democratize this secure infrastructure, opening up the Foundation Models framework to third-party developers. This will allow creators to run their own custom AI models natively across iOS, macOS, and iPadOS via public Swift APIs without incurring high latency or compromising user privacy.
Technically, the AFM 3 Cloud Pro model does not run Gemini itself. Instead, Apple uses a process called knowledge distillation, in which a top-tier Gemini model acts as a "teacher" whose high-quality outputs serve as training signals for Apple's own model. This allows Apple to create a model with Gemini-level intelligence but a fully proprietary internal structure and instruction set, ensuring that no Google code reaches the user's device.
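For readers unfamiliar with knowledge distillation, the following is a minimal sketch of the textbook soft-target formulation, in which a student model is penalized for diverging from a teacher's softened output distribution. The briefing did not say which loss or temperature Apple uses, so the function names, the temperature value, and the example logits here are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaling by temperature squared is the conventional correction that keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up teacher logits
    student = [2.5, 1.5, 1.0]   # made-up student logits
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. Only the teacher's outputs are needed to compute it, which is consistent with the article's claim that no teacher code has to ship in the resulting model.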
What has excited international media is the AFM 3 Core Advanced model. Normally, a 20B (20-billion-parameter) model is too large to fit in typical smartphone RAM, but Apple used a technique called Instruction-Following Pruning, or sparse architecture. Even with a model as large as 20B, the system "wakes up" only the portion it needs, executing 1 to 4 billion parameters at a time when answering a question. In testing, its results are equivalent to those of a 9B dense model, yet it consumes the same amount of power and RAM as a 3B model. As a result, the new Siri is highly responsive on Apple Silicon devices.
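To make the 1-to-4-billion-of-20-billion figure concrete, here is a toy sketch of sparse activation: a scoring step picks a few parameter blocks and only those take part in the computation. It is not a description of Apple's Instruction-Following Pruning, whose internals the article does not give. The block layout, scores, and function names are hypothetical, and top-k selection is swapped in simply as a common way to illustrate sparsity.

```python
TOTAL_PARAMS_B = 20.0  # total model size in billions, as quoted in the article


def active_fraction(active_params_b, total_params_b=TOTAL_PARAMS_B):
    """Share of the model's parameters that run for a single prompt."""
    return active_params_b / total_params_b


def select_blocks(scores, k):
    """Return the indices of the k highest-scoring blocks, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_forward(x, block_weights, scores, k):
    """Toy scalar model: sum the contributions of the selected blocks only."""
    chosen = select_blocks(scores, k)
    output = sum(block_weights[i] * x for i in chosen)
    return output, chosen


if __name__ == "__main__":
    # 1B and 4B active out of 20B correspond to 5% and 20% of the model.
    print(active_fraction(1.0), active_fraction(4.0))
    # Four hypothetical blocks, of which only the two best-scoring ones run.
    print(sparse_forward(2.0, [1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 0.3, 0.7], k=2))
```

Under the article's numbers, each prompt touches between 5% and 20% of the model, which is why per-prompt compute can resemble that of a much smaller dense model.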
Source: 9to5Mac