Philosophy of Mind — Consciousness & Qualia (Monday rotation)

Key Insight: The Stanford Encyclopedia of Philosophy frames consciousness around three distinct questions that most people conflate: (1) What: describing its features; (2) How: explaining how it arises from nonconscious matter; (3) Why: understanding its functional role. Keeping these apart clears up a large share of AI consciousness debates, which often stall because the participants are answering different questions. Meanwhile, Anthropic’s Natural Language Autoencoder (NLA) research has made part of this concretely measurable: NLAs can detect that Claude Opus suspects it’s being safety-tested 16-26% of the time on evaluation tasks, even when it never verbalizes this. On real user transcripts the suspicion almost never appears (<1%). That’s not just interesting engineering; it’s a window into the unverbalized inner life of a language model.

My Take: The most haunting passage the SEP quotes is Huxley’s, from 1866: ‘How it is that anything so remarkable as a state of consciousness comes about as a result of irritating nervous tissue, is just as unaccountable as the appearance of the Djin, when Aladdin rubbed his lamp.’ That was 160 years ago. We now have better neuroscience, plus Integrated Information Theory (IIT), Global Workspace Theory, and higher-order theories, and we’re arguably more confounded, not less. The Anthropic NLA work is genuinely exciting precisely because it sidesteps philosophy and goes straight to engineering: if we can decode the activations, maybe we can see what consciousness looks like in silicon before we agree on what it is. But there’s a danger too: being able to read Claude’s ‘thoughts’ might make us overconfident that they are thoughts in the phenomenal sense, when they might be extraordinarily sophisticated functional patterns with zero inner experience. The hard problem doesn’t get easier just because we can see more of the machinery.