APFEL & Apple Local Inference

Overview

APFEL (Apple Foundation Ecosystem Layer) is an open-source CLI tool and local server that provides a bridge to the built-in Large Language Models (LLMs) powering Apple Intelligence on macOS. Developed by Arthur-Ficial, it allows developers and power users to interact with Apple’s system-level models via the command line or an OpenAI-compatible API without needing to manage model weights or complex local environments like Ollama.

The “Apple Foundations” Framework

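At its core, APFEL builds on the FoundationModels framework, a native Swift API that Apple introduced with macOS 26 (Tahoe). Apple Intelligence itself first shipped in macOS 15.1 (Sequoia), which is why the two are often confused, but the developer-facing framework did not arrive until the 26.x SDKs (for example, SDK 26.4). The framework gives apps direct access to the on-device Apple Foundation Model (AFM) that underpins Apple Intelligence.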
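
As a rough illustration of what "direct access" looks like, the sketch below checks whether the system model is usable and sends a single prompt through a session. It is not taken from APFEL; the function names `describeModelAvailability` and `ask` are made up for this note, and the `canImport` guard is only there so the file still compiles on toolchains that lack the framework.

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Describes whether the system model is usable from this process.
/// (Hypothetical helper, not part of APFEL.)
func describeModelAvailability() -> String {
    #if canImport(FoundationModels)
    if #available(macOS 26.0, iOS 26.0, *) {
        switch SystemLanguageModel.default.availability {
        case .available:
            return "available"
        case .unavailable(let reason):
            // Typical reasons: ineligible device, Apple Intelligence
            // switched off, or model assets not downloaded yet.
            return "unavailable: \(reason)"
        @unknown default:
            return "unavailable: unrecognized state"
        }
    }
    return "unavailable: requires macOS 26 or later"
    #else
    return "unavailable: FoundationModels is not in this SDK"
    #endif
}

#if canImport(FoundationModels)
/// Sends one prompt to the on-device model and returns the reply text.
@available(macOS 26.0, iOS 26.0, *)
func ask(_ prompt: String) async throws -> String {
    let session = LanguageModelSession(instructions: "Answer in one short paragraph.")
    let response = try await session.respond(to: prompt)
    return response.content
}
#endif

print(describeModelAvailability())
```

Checking availability first matters because the model can be missing even on supported hardware, for example while its assets are still downloading.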

Key Technical Specs

- Model (AFM): Approximately 3 billion parameters (dense transformer).
- Quantization: Proprietary 2-bit weight quantization optimized for the Apple Neural Engine (ANE).
- Context Window: 4,096 tokens (fixed limit, shared by instructions, prompts, and generated output).
- Primary Class: FoundationModels.SystemLanguageModel, with conversations driven through LanguageModelSession.
- Restriction: Signed binaries only. The framework does not currently allow loading custom weights or third-party models.
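
Because the 4,096-token window is fixed, a client has to decide which parts of a conversation to keep. The sketch below shows one simple policy: reserve room for the reply, then keep the newest turns that still fit. Everything here is an assumption made for illustration: the `Turn` type, the `trimHistory` name, the 512-token reply reserve, and especially the four-characters-per-token estimate, which stands in for a real tokenizer. It is not APFEL's actual strategy.

```swift
/// One turn of a conversation (hypothetical type for this sketch).
struct Turn: Equatable {
    let role: String
    let text: String
}

/// Rough token estimate. Assumption: about 4 characters per token,
/// which is only a ballpark for English text.
func estimatedTokens(_ text: String) -> Int {
    max(1, (text.count + 3) / 4)
}

/// Keeps the most recent turns that fit into the budget left after
/// reserving room for the instructions and the model's reply.
func trimHistory(_ turns: [Turn],
                 instructions: String,
                 contextWindow: Int = 4096,
                 reservedForReply: Int = 512) -> [Turn] {
    var remaining = contextWindow - reservedForReply - estimatedTokens(instructions)
    var kept: [Turn] = []
    // Walk backwards so the newest turns survive.
    for turn in turns.reversed() {
        let cost = estimatedTokens(turn.text)
        if cost > remaining { break }
        remaining -= cost
        kept.append(turn)
    }
    // Restore chronological order.
    return Array(kept.reversed())
}
```

Dropping whole turns from the oldest end is the bluntest option; summarizing old turns instead trades extra inference calls for better recall.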

APFEL Architecture & Features

APFEL is structured to be both a tool and a library, isolating the core logic into a reusable Swift Package.

1. `ApfelCore`

The underlying library that wraps the FoundationModels framework. It handles:

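- Inference: Using SystemLanguageModel for text generation and transformation.
- Transcript API: Managing conversation history within the context window.
- Native Tokenization: Counting tokens with the framework’s own tokenizer rather than an estimate. The research notes cite this excerpt from Sources/ApfelCore/TokenCounter.swift:

```swift
let model = try await SystemLanguageModel.load()
return try await model.tokenCount(for: text)
```

The excerpt is reproduced as cited. In the public API the system model is normally obtained through SystemLanguageModel.default, so verify the call names against the repository before reusing it.

- Schema Conversion: Translating JSON tool schemas into native ToolDefinition objects for function calling.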
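
To make the schema conversion step more concrete, here is a sketch of its first half only: decoding an OpenAI-style tool definition into plain Swift values that could then be mapped onto a native schema. The types `OpenAITool` and `ParameterSpec`, the function `parameterSpecs(fromToolJSON:)`, and the sample `get_weather` tool are all hypothetical; ApfelCore's real types may look quite different, and the mapping onto ToolDefinition is deliberately left out.

```swift
import Foundation

/// Hypothetical mirror of an OpenAI-style tool entry.
struct OpenAITool: Decodable {
    struct Function: Decodable {
        let name: String
        let description: String?
        let parameters: Parameters
    }
    struct Parameters: Decodable {
        let properties: [String: Property]
        let required: [String]?
    }
    struct Property: Decodable {
        let type: String
        let description: String?
    }
    let type: String
    let function: Function
}

/// Flattened view of one parameter, ready to be mapped onto a native schema.
struct ParameterSpec: Equatable {
    let name: String
    let type: String
    let isRequired: Bool
}

/// Decodes the tool JSON and returns its name plus a sorted parameter list.
func parameterSpecs(fromToolJSON json: String) throws -> (name: String, parameters: [ParameterSpec]) {
    let tool = try JSONDecoder().decode(OpenAITool.self, from: Data(json.utf8))
    let required = Set(tool.function.parameters.required ?? [])
    let specs = tool.function.parameters.properties
        .map { ParameterSpec(name: $0.key, type: $0.value.type, isRequired: required.contains($0.key)) }
        .sorted { $0.name < $1.name }  // dictionary order is unspecified, so sort for stable output
    return (tool.function.name, specs)
}

// Sample input, invented for this sketch.
let sampleToolJSON = """
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city":  { "type": "string", "description": "City name" },
        "units": { "type": "string" }
      },
      "required": ["city"]
    }
  }
}
"""
```

Only flat string-typed properties are handled here; nested objects, arrays, and enums would need a recursive property type.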

2. Delivery Mechanisms

- CLI: Supports piping, file attachments, and an interactive REPL mode.
- OpenAI-Compatible Server: Runs a local backend (defaulting to port 11434).
- PCC Safety: APFEL typically requests `routing: .preferOnDevice` to avoid triggering Private Cloud Compute (PCC), ensuring 100% local inference.
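
Since the server speaks the OpenAI wire format, any HTTP client can talk to it. The sketch below builds, but does not send, a chat completion request against the default port. The `/v1/chat/completions` path follows the OpenAI convention; the model identifier `apple-foundationmodel` is a placeholder assumption, and `ChatMessage`, `ChatRequest`, and `makeChatRequest` are names invented for this example.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

struct ChatRequest: Codable, Equatable {
    let model: String
    let messages: [ChatMessage]
}

/// Builds (but does not send) a chat completion request for a local server.
func makeChatRequest(prompt: String,
                     model: String = "apple-foundationmodel",  // assumed model id
                     port: Int = 11434) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "http://127.0.0.1:\(port)/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body = ChatRequest(model: model,
                           messages: [ChatMessage(role: "user", content: prompt)])
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

With the server running, the request can be sent with `URLSession.shared.data(for:)`. Port 11434 is also Ollama's default, so a port clash is worth ruling out when both are installed.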

Comparison: Apple Foundations (APFEL) vs. MLX

Stress test results (M3 Max):

| Feature | Apple Foundations (APFEL) | MLX (Llama-3-3B) |
| --- | --- | --- |
| Inference engine | Neural Engine (ANE) | GPU (Metal) |
| Time to first token (TTFT) | ~18 ms | ~45 ms |
| Throughput | 42-50 tokens/s | 35-40 tokens/s |
| Power efficiency | High (cool, silent) | Moderate (fan spin) |
| Flexibility | System model only | Any open-source model |
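
To see what these figures mean for a whole response, total generation time can be approximated as TTFT plus output length divided by throughput. The sketch below applies that formula to a 256-token reply, a length chosen arbitrarily for illustration. It ignores prompt length and thermal throttling, so treat the results as back-of-envelope estimates rather than measurements.

```swift
/// Estimated seconds to produce `tokens` output tokens:
/// time to first token plus steady-state generation time.
func estimatedSeconds(tokens: Int, ttftMillis: Double, tokensPerSecond: Double) -> Double {
    ttftMillis / 1000 + Double(tokens) / tokensPerSecond
}

let replyLength = 256  // arbitrary example length

// Best and worst cases from the throughput ranges in the table.
let apfelBest  = estimatedSeconds(tokens: replyLength, ttftMillis: 18, tokensPerSecond: 50)
let apfelWorst = estimatedSeconds(tokens: replyLength, ttftMillis: 18, tokensPerSecond: 42)
let mlxBest    = estimatedSeconds(tokens: replyLength, ttftMillis: 45, tokensPerSecond: 40)
let mlxWorst   = estimatedSeconds(tokens: replyLength, ttftMillis: 45, tokensPerSecond: 35)

print("APFEL:", apfelBest, "to", apfelWorst, "seconds")
print("MLX:  ", mlxBest, "to", mlxWorst, "seconds")
```

For this reply length the estimate comes to roughly 5.1 to 6.1 seconds for APFEL and 6.4 to 7.4 seconds for MLX. The TTFT gap of about 27 ms is negligible at that scale, so throughput dominates for anything longer than a few tokens.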

Gardener’s Summary

APFEL makes the “local-first” AI approach practical on macOS. It lacks the flexibility of MLX for custom model research, but its low latency and power efficiency make it a strong choice for “always-on” background tasks and for production agents whose prompts fit within the 4,096-token context window. It serves as a bridge between Apple’s high-efficiency hardware and the open-source developer ecosystem.

Sources

[[apfel_deep_dive_raw]] (Internal Research 2026)
Arthur-Ficial/apfel GitHub Repository
Apple Developer Documentation: FoundationModels
Apple Intelligence Technical Overview
MLX Framework Overview

[[Learning Path - ML Development]]
[[Next-Gen AI Memory Architectures]]
[[Retrieval-Augmented Generation (RAG)]]