Mark Gurman reported earlier this month that Apple is on track to launch its first smart glasses in September or October, with broader rollout in early 2027. Display-free at first. Acetate frames. Tight iPhone integration. A premium contrast against Meta's plastic Ray-Bans.
My former Thoughtworks colleague Mike Mason has been thinking out loud about what this hardware does to the world it lands in, and his piece is worth reading before this one. The short version: the hardware is arriving faster than the norms, and the question of whether your conversations are being captured is going to get a lot harder to answer. Mike walks through a framework from Barry O'Reilly (another former colleague, and the author of the new book Artificial Organizations) for ethical capture: consent first, synthesis over verbatim, trust over data maximisation. He notes that it works inside controlled relationships and breaks down everywhere else. He leaves the reader with three questions:
- Is consent even structurally possible at scale?
- Do we create physical "AI-free" zones?
- Does visible hardware become a new signaling mechanism — like taking notes used to be?
Mike expects Apple will handle this better than Meta, given its track record on on-device processing and transparent indicators. But he rules out Apple solving the bystander problem at all. I want to push back, gently, on that last point. I think they can — partially — and the building blocks are already there.
What Apple is already carrying into the category
The core of my proposal is straightforward: take the technology Apple already ships in AirTags — a rotating, privacy-preserving Bluetooth identifier broadcast on a public channel and picked up opportunistically by every nearby Apple device — and repurpose it to broadcast a tamper-proof "I am recording" signal from any device with an active camera or microphone. The rotating-key pattern preserves the wearer's privacy (no persistent tracking ID) while still letting bystanders' phones detect the state. The crucial extra piece is hardware attestation: the recording state would be signed by the Secure Enclave, the same chain of trust that powers App Attest and Face ID, so the signal can't be spoofed or suppressed by a software exploit. A glasses owner who wanted to record covertly couldn't simply turn the broadcast off — the broadcast and the recording state are bound together at the hardware layer.
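To make the shape of that binding concrete, here is a minimal Python sketch. Everything in it is hypothetical: an HMAC over a device secret stands in for the Secure Enclave's asymmetric signing, and the payload layout is invented for illustration — a real design would chain verification to a vendor attestation root rather than a shared secret.

```python
import hashlib
import hmac
import secrets
import struct

# Stand-in for an enclave-held key. In a real design this never leaves
# the secure hardware, and verification uses a public key instead.
DEVICE_SECRET = secrets.token_bytes(32)

def rotating_id(device_secret: bytes, epoch: int) -> bytes:
    """Derive a short-lived identifier from the current time epoch,
    in the spirit of Find My's rotating keys: no stable tracking ID."""
    mac = hmac.new(device_secret, struct.pack(">Q", epoch), hashlib.sha256)
    return mac.digest()[:8]

def recording_advertisement(device_secret: bytes, epoch: int,
                            recording: bool) -> bytes:
    """Build a signed capture-state advertisement. The signature binds
    the rotating ID, the epoch, and the recording flag together, so
    software can't flip the flag without invalidating the payload."""
    body = rotating_id(device_secret, epoch) + struct.pack(">Q?", epoch, recording)
    sig = hmac.new(device_secret, body, hashlib.sha256).digest()[:16]
    return body + sig

def verify_advertisement(device_secret: bytes, payload: bytes) -> tuple[bool, bool]:
    """Return (valid, recording) for a received advertisement."""
    body, sig = payload[:-16], payload[-16:]
    expected = hmac.new(device_secret, body, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(sig, expected):
        return False, False
    _epoch, recording = struct.unpack(">Q?", body[8:])
    return True, recording
```

The detail doing the work is that the recording flag sits inside the signed body: a compromised OS can stop broadcasting entirely (which is itself detectable), but it can't broadcast "not recording" while the camera runs.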
The reason this is plausible rather than fanciful is that Apple has already shipped every part of it for other purposes. The orange dot when the microphone is in use is the precedent for OS-level capture indicators that apps can't suppress. App Tracking Transparency is the precedent for Apple shipping privacy controls that the rest of the industry has to negotiate around. And AirTag unwanted-tracking detection — co-developed with Google as a cross-platform IETF draft after the stalking pressure mounted — is the most relevant precedent of all, because it proves Apple can ship a privacy protocol that works on Android too. A recording-state broadcast that only iPhones could see would be useless. The radio and crypto pieces are equally well-established: Find My's rotating identifiers, the U1/U2 ultra-wideband chips for proximity, and the Secure Enclave for tamper-resistant signing. Nothing needs to be invented.
The interesting bit is the inverse signal. A "do not record" beacon, broadcast from a bystander's iPhone when the user has set that preference. Apple Glasses receive these advertisements within range, and the on-device vision pipeline (which already does face detection for accessibility and computational photography) blurs or excludes those individuals from any captured stream before it ever hits storage. All on-device. No identity exchange beyond an ephemeral rotating key. The glasses don't need to know who the person is, just that someone in frame has opted out.
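A sketch of the receiving side, under loose assumptions: faces and beacons each carry a bearing estimate (something UWB angle-of-arrival could plausibly supply), and any face that lines up with an opt-out beacon is zeroed out before the frame is stored. The `Face` and `OptOutBeacon` types and the bearing-matching rule are invented for illustration, not a description of Apple's vision pipeline.

```python
from dataclasses import dataclass

@dataclass
class Face:
    x: int
    y: int
    w: int
    h: int
    bearing: float  # estimated direction of the face, degrees off camera axis

@dataclass
class OptOutBeacon:
    rotating_id: bytes
    bearing: float  # hypothetical UWB angle-of-arrival estimate

def redact_frame(frame, faces, beacons, tolerance_deg=15.0):
    """Zero out pixel regions for faces that line up with a 'do not
    record' beacon. `frame` is a mutable 2-D list of pixel values; a
    real pipeline would operate on the camera buffer before storage."""
    for face in faces:
        if any(abs(face.bearing - b.bearing) <= tolerance_deg for b in beacons):
            for row in range(face.y, face.y + face.h):
                for col in range(face.x, face.x + face.w):
                    frame[row][col] = 0
    return frame
```

If bearing estimates aren't available, a conservative fallback is to redact every detected face whenever any opt-out beacon is in range — cruder, but it fails in the bystander's favour.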
That stack lets us walk through Mike's three questions, and see the outline of an answer:
Is consent even structurally possible at scale?
Not in the form we usually mean. Verbal or written consent doesn't scale to dozens of incidental encounters per day. The cost of asking is too high, the cost of refusing is too high, and nothing else gets done.
But machine-mediated consent might scale, because it drops both costs close to zero. The protocol asks on your behalf. The protocol refuses on your behalf. Your phone can default to broadcasting a no-record signal in public spaces, the same way it already defaults to Find My. Bystanders don't need to notice the encounter happened; the negotiation runs in the background.
This is something narrower than full consent in the rich sense Barry has been working with: it's a floor, automated and ambient, where a person who has expressed a preference is excluded from passive capture by trustworthy devices. Still a substantive improvement on the world we're heading into, albeit one that will likely require further iteration to get right.
Do we create physical "AI-free" zones?
Yes, and the same protocol carries the broadcast. A venue, an office, a school, a courthouse, a hospital ward, a private home: any of these could emit a space-level "no record" beacon, and Apple Glasses (and any device that joined the standard) would honour it. The architecture is identical to the personal version, just emitted from a fixed point rather than a phone in someone's pocket.
The same honest-actor caveat applies. A space-level broadcast is a norm, not a wall. Bad-faith devices don't honour it; custom firmware doesn't honour it. But social pressure can be brought to bear, and the social meaning of crossing the line shifts. Right now, recording in a private venue is practically invisible. With a broadcast standard, it becomes a visible choice to ignore the signal. Equally, it may become easier and more acceptable to ask people to remove devices that are known not to participate in the standard when alternatives exist.
It also opens opportunities to let venues do something physics alone can't: signal not "no recording" but "recording is fine, but please synthesise rather than transcribe verbatim", which is closer to Barry's framework for group settings, scaled to a public space.
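One way to express that: a policy level rather than a binary flag, with the most restrictive signal in range winning. This is a sketch under my own assumptions — the names and ordering are invented, not part of any shipping protocol.

```python
from enum import IntEnum

class CapturePolicy(IntEnum):
    """Ordered so that a lower value is more restrictive."""
    NO_RECORD = 0        # exclude me / this space from capture entirely
    SYNTHESIZE_ONLY = 1  # capture allowed, verbatim transcripts are not
    VERBATIM_OK = 2      # no restriction expressed

def effective_policy(signals):
    """Combine venue- and person-level broadcasts in range: the most
    restrictive signal wins. With no signals at all, fall back to
    VERBATIM_OK — i.e. today's status quo."""
    return min(signals, default=CapturePolicy.VERBATIM_OK)
```

A venue beacon advertising `SYNTHESIZE_ONLY` would then be Barry's group-setting guidance, scaled to a public space: capture for insight is fine, capture for verbatim record is not.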
Does visible hardware become a new signaling mechanism?
It already is, but the signalling is currently analog and weak. Meta's recording LED is the floor: easily covered, easily missed, not machine-readable. The notebook on the table in Mike's analogy worked because everyone in the room could see it. The equivalent today needs to reach your phone, not just your eye, because the recording device might be a pendant, a button, a pair of glasses indistinguishable from regular sunglasses, or eventually a contact lens.
A protocol-level broadcast turns the hardware into a signal that propagates further than the wearer's body. Your phone tells you who in the room is recording, the same way it tells you which AirTags are travelling with you. The glasses still light up; the analog signal is still there for politeness. But the digital signal is the one that actually carries.
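The detection side could be as simple as filtering verified advertisements for an active capture flag — the same pattern as unwanted-tracking alerts. This sketch assumes an advertisement format whose first eight bytes are a rotating ID, plus a verifier callable (in a real deployment, a public-key check against the vendor's attestation root); both assumptions are mine.

```python
def recording_devices_nearby(advertisements, verify):
    """Given raw advertisement payloads and a verifier returning
    (valid, is_recording), list the rotating IDs of nearby devices
    that currently declare an active capture state."""
    recording = []
    for payload in advertisements:
        valid, is_recording = verify(payload)
        if valid and is_recording:
            recording.append(payload[:8].hex())  # rotating ID, not a stable identity
    return recording
```

Because the IDs rotate, the phone can say "two devices in this room are recording" without anyone learning who owns them — the same privacy trade AirTag detection already makes.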
What this still doesn't fix
A protocol can mark capture, give bystanders a way to push back, and let venues set policy. It can't restore the cultural expectation that ordinary conversation is off-the-record by default. That's what's actually being eroded: the unspoken contract that what you say in a coffee shop or a hallway exists only in the memories of the people who heard it, decaying naturally over time, available for revision and forgetting. As the technology becomes more commonplace, holding a default "do not record" stance may still risk marking you out as a pariah.
Mike's closing distinction (that capture which strengthens trust turns conversations into durable insight, while capture that weakens it turns conversations into performance) is the right frame for what's at stake. The protocol I'm describing doesn't strengthen trust on its own. It just makes the question of who's recording answerable, and allows for the possibility of negotiation, and these are preconditions for trust, not a substitute for it.
But "answerable" is a lot more than what we have now, which is closer to "we have no idea, ask the wearer and hope they tell you the truth." A default-on, hardware-attested broadcast — with an inverse signal that capture devices honour, a venue-level extension, and a cross-platform standard so it works on Android too — would shift that default. It wouldn't make the always-on question go away, but it would mean the question has somewhere to land.
That seems worth doing, even partially. And given where Apple's competitive positioning is heading into September, it's also the kind of opportunity they should be hunting for.