Confidential Computing’s Inconvenient Truth

This is part of a series on confidential computing. See also: Confidential Computing: What It Is, What It Isn’t, and How to Think About It for practical deployment guidance, and Why Nobody Can Verify What Booted Your Server for the attestation infrastructure gap. Two companion reference documents provide the evidence base: the TEE Vulnerability Taxonomy and TPM Attestation and PCR Verification: The Infrastructure Gap.

Confidential computing has a vulnerability record that grows every year, an attestation infrastructure that does not work at scale, and a hardware root of trust with a demonstrated shelf life. This piece explains why.

I want to be clear about where I stand before cataloging problems. I believe in this technology. What Signal has done with Private Contact Discovery and Sealed Sender using SGX enclaves, building systems where even Signal’s own servers cannot see who is contacting whom, is exactly the kind of architecture that confidential computing makes possible. Apple’s Private Cloud Compute takes the model further. Every production build is published to a transparency log, user devices will only communicate with nodes whose attested measurements match the log, and Apple released a virtual research environment so anyone can verify the claims independently. Moxie Marlinspike’s Confer applies the same idea to AI inference, with all processing inside a TEE and remote attestation so the service provider never has access to your conversations. These are real systems delivering real privacy guarantees that would be hard to achieve any other way.

More broadly, TEEs make systems more verifiable. Instead of asking users to take on faith that a service handles their data correctly, the service can prove it through attestation. I wrote earlier about attestation as the MFA for machines and workloads, and I explored the same idea in 2022 in the context of certificate authorities. If the CA runs open-source software on attesting hardware with reproducible builds, you can verify its behavior rather than trusting an annual audit. That shift, from asserted trust to verifiable trust, is genuinely important, and confidential computing is what makes it possible.

But “the direction is right” is not the same as “the current state is adequate.” We should not let the perfect be the enemy of the good: this technology delivers real value today. But we also cannot afford to mistake the current state for the desired end state. Getting this technology to where it needs to be requires seeing clearly where it actually is. That is what this piece is about.

Why does the record look this way? The answer is not “the implementations are buggy.” The answer is structural. These technologies were designed for threat models that do not match how they are being deployed. Smart cards and HSMs were physically discrete devices with clear trust boundaries. TPMs were designed for boot integrity on enterprise desktops. Intel SGX was designed for desktop DRM. Each was repurposed for the cloud because the technology existed and the market needed something now. The repurposing created systematic security gaps that the research community has spent a decade documenting and the market has spent a decade deploying through.

In March 2025, I published a technical reference on security hardware and an in-depth companion document that categorized how these technologies fail. One of those failure categories was “Misuse Issues”: vulnerabilities that occur when security technology is adopted beyond its original design. A year later, with TDXRay reconstructing LLM prompts from inside encrypted VMs, TEE.Fail extracting attestation keys with a $1,000 device, and the SGX Global Wrapping Key extracted from hardware fuses, that observation warrants a much fuller treatment.

Timeline

| Year | Event | Category |
| --- | --- | --- |
| 1968 | Smart card patents (Dethloff, Moreno). Special-purpose computers in tamper-resistant packages. The original TEE. | Hardware TEE |
| 1980s | IBM secure coprocessors for banking. US government funds kernelized secure OS research. | Hardware TEE |
| 1996 | nCipher founded. nShield HSMs with CodeSafe: custom application code inside tamper-resistant hardware. | Hardware TEE |
| 1998 | IBM 4758 commercially available. Arbitrary code execution inside tamper-responding enclosure. FIPS 140-1 Level 4. | Hardware TEE |
| 2003 | TCG founded, TPM standardized. Designed for boot integrity from ring -x. Hardware root of trust, measurement chains, attestation concepts established. | Institutional |
| 2006 | AWS launches EC2. Public cloud computing begins. Workloads move to shared infrastructure owned by someone else. | Cloud |
| 2006 | BitLocker ships with TPM support. TPMs reach millions of enterprise devices. Reference value infrastructure never materializes. | Hardware TEE |
| 2008-2010 | Cloud goes mainstream. Azure (2010), GCP (2008), OpenStack (2010). Multi-tenant shared infrastructure becomes the default enterprise compute model. | Cloud |
| 2012 | AlexNet wins ImageNet. Deep learning proven at scale on GPUs. AI workloads begin moving to cloud GPU infrastructure. | AI |
| 2013 | Apple Secure Enclave Processor (iPhone 5s). Physically separate processor on SoC. First mass-market TEE. Invisible to users. | Hardware TEE |
| 2015 | Intel SGX (Skylake). Enclaves inside the CPU. Designed for desktop DRM: single-tenant threat model. Cloud providers begin evaluating for multi-tenant use. | CPU TEE |
| 2016 | AMD SEV. VM-level memory encryption. First CPU TEE designed with virtualization in mind. | CPU TEE |
| 2017 | Transformer architecture published (“Attention Is All You Need”). Foundation for the model scale that will drive confidential computing demand. | AI |
| 2017 | First SGX side-channel attacks. Cache-timing, Spectre adaptation. Desktop design meets multi-tenant reality. | Vulnerability |
| 2018 | Foreshadow (L1TF) reads arbitrary SGX memory. SEVered remaps SEV guest pages. Desktop-to-cloud threat model gap exploited. | Vulnerability |
| 2019 | Confidential Computing Consortium founded (Google, Microsoft, IBM, Intel, Linux Foundation). Repurposing becomes official strategy. | Institutional |
| 2019 | Plundervolt, ZombieLoad, RIDL. Three distinct attack classes against SGX in one year. | Vulnerability |
| 2020 | GPT-3 (175B parameters). Model weights become billion-dollar assets. Protecting weights on shared infrastructure becomes a business requirement. | AI |
| 2020 | AWS Nitro Enclaves. Purpose-built for cloud, not repurposed from desktop. The exception to the pattern. | Cloud |
| 2020 | AMD SEV-SNP, Intel TDX announced. VM-level TEEs designed for cloud but still sharing microarchitectural resources. Azure/GCP ship confidential VMs with vTPMs. | Cloud |
| 2021 | Intel deprecates SGX on consumer CPUs (11th/12th gen Core). Desktop DRM cannot sustain the technology alone. | CPU TEE |
| 2022 | ChatGPT launches (Nov). AI goes mainstream. Every enterprise begins evaluating LLM deployment on cloud infrastructure. | AI |
| 2022 | ÆPIC Leak, SGX.Fail. Vulnerable platforms remain in TRUSTED attestation state months after disclosure. | Vulnerability |
| 2023 | GPT-4, Llama 2, Claude 2. Foundation model race accelerates. EU AI Act passed. | AI |
| 2023 | Downfall (SGX), CacheWarp (SEV-SNP). CacheWarp is first software-based attack defeating SEV-SNP integrity. NVIDIA H100 confidential GPU ships. | Vulnerability |
| 2024 | Confidential AI goes mainstream. Azure, GCP, AWS all position confidential computing for AI. TDXdown and Heckler attacks hit TDX. HyperTheft extracts model weights via ciphertext side channels. | AI / Vulnerability |
| 2025 Feb | Google finds insecure hash in AMD microcode signature validation (CVE-2024-56161). Malicious microcode loadable under SEV-SNP. | Vulnerability |
| 2025 May | Google announces confidential GKE nodes with NVIDIA H100 GPUs. Confidential AI training and inference on GPU clusters. | AI |
| 2025 Oct | TEE.Fail. $1K DDR5 bus interposer extracts attestation keys from Intel TDX and AMD SEV-SNP. Attestation forgery demonstrated. | Vulnerability |
| 2025 Dec | IDC survey: 75% of organizations adopting confidential computing, 84% cite attestation validation as top challenge. Gartner predicts 75% of untrusted-infra processing uses CC by 2029. | Institutional |
| 2025 Dec | IETF RATS CoRIM reaches draft-09. Reference value format standards mature. Vendor adoption of publishing measurements remains minimal. | Institutional |
| 2026 Jan | StackWarp (CVE-2025-29943). Stack Engine synchronization bug enables deterministic stack pointer manipulation inside SEV-SNP guest via MSR toggling. Affects AMD Zen 1 through Zen 5. USENIX Security 2026. | Vulnerability |
| 2026 | TDXRay (IEEE S&P 2026). Reconstructs LLM user prompts word-for-word from encrypted TDX VMs by monitoring tokenizer cache access patterns. No crypto broken. UC San Diego, CISPA, Google. | AI / Vulnerability |
| 2026 Mar | NVIDIA publishes zero-trust AI factory reference architecture. CPU TEE + confidential GPU + CoCo + KBS. Model weights encrypted until attestation passes. | AI |
| 2026 Mar 31 | Ermolov extracts SGX Global Wrapping Key from Intel Gemini Lake. Root key extraction via arbitrary microcode. Unpatchable (hardware fuses). | Vulnerability |

Trusted Platform Modules: Boot Integrity and System State

The idea that hardware should measure and attest to software integrity goes back to the late 1990s. The Trusted Computing Group, formed in 2003, standardized the Trusted Platform Module, a discrete chip that stores cryptographic keys and maintains Platform Configuration Registers recording the boot chain as a sequence of hash measurements.

The TPM was designed to solve a specific problem: bootloader-level attacks. Rootkits and bootkits that compromised the system before the OS loaded were invisible to any software-based security tool. The TPM sat below the OS, measuring each boot stage before execution. It could answer a question that no operating system could answer about itself: did this machine boot the software it was supposed to boot?

Each boot stage measures the next before handing off execution. The measurements are extended into PCRs using a one-way hash chain: PCR_new = Hash(PCR_old || measurement). The TPM can produce a signed quote of its PCR values, and a remote verifier can check whether the system booted the expected software stack.
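The extend operation can be sketched in a few lines of Python. This is a minimal model using SHA-256, with illustrative stage names; a real verifier replays a TPM event log against the PCR values in a signed quote, which is exactly what this loop simulates.

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    # PCR extend: the new value is the hash of the old value
    # concatenated with the new measurement (SHA-256 bank).
    return hashlib.sha256(pcr + measurement).digest()

# Simulate a measured boot: each stage's digest is extended in order.
pcr = bytes(32)  # PCRs start at all zeros
boot_chain = [b"firmware", b"bootloader", b"kernel"]  # illustrative stages
for stage in boot_chain:
    pcr = extend(pcr, hashlib.sha256(stage).digest())

# A verifier replays the same event log and compares the result
# against the PCR value reported in the TPM's signed quote.
expected = bytes(32)
for stage in boot_chain:
    expected = extend(expected, hashlib.sha256(stage).digest())

assert pcr == expected  # quote matches the replayed event log
```

Because the chain is a one-way hash, the same stages in a different order produce a different PCR value, which is what makes the measurement tamper-evident.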

TPMs shipped in millions of enterprise laptops and servers. BitLocker used TPM-sealed keys for disk encryption. Linux distributions added measured boot support. But TPMs never achieved the broad security impact their designers envisioned. The problem was practical: to verify a TPM quote, you need to know what the correct PCR values should be, and nobody built the infrastructure to distribute and maintain those reference values at scale.

The TPM could tell you what booted. It could not tell you whether what booted was good.

What TPMs did accomplish was laying the conceptual groundwork for everything that followed. Hardware root of trust, measurement chains, remote attestation, platform state quotes. All of this vocabulary originated in the TPM ecosystem. Modern CPU TEEs inherited these concepts even as their architectures diverged significantly from the TPM model.

Hardware-Isolated Execution: Older Than You Think

Running code inside a tamper-resistant hardware boundary did not start with Intel or Apple. It started with smart cards.

Smart cards emerged in the late 1960s as special-purpose computers embedded in plastic cards. By the 1980s, they were executing cryptographic operations in banking, telecommunications, and government ID. A smart card is a tiny computer with its own processor, memory, and operating system, running inside a tamper-resistant package. That is a trusted execution environment by any reasonable definition, even if nobody called it that at the time.

HSMs extended the same concept to server-class computing. IBM’s 4758, commercially available in the late 1990s, provided a tamper-responding enclosure with its own processor, battery-backed memory, and secure boot chain. If someone tried to open the case, drill through it, or expose it to extreme temperatures, the device would zeroize its keys. The 4758 ran arbitrary code inside the boundary.

nCipher (founded 1996, later acquired by Thales) took this further with CodeSafe on the nShield HSM line, a development framework for deploying custom applications inside the HSM. This was general-purpose computation inside a hardware trust boundary, exactly the model that SGX would later attempt to replicate in silicon without a separate physical device. I spent years working with these HSMs. They ran custom signing logic, policy engines, tokenization routines, and key derivation functions, all inside the tamper-resistant module where the host OS could not observe or interfere.

The difference between these earlier systems and modern confidential computing is not the concept. It is the integration point. Smart cards and HSMs are discrete devices with well-defined physical boundaries. You can see the trust boundary. You can hold it in your hand. SGX, TDX, and SEV moved the trust boundary inside the CPU itself, eliminating the separate device but also eliminating the physical clarity. When the trust boundary is a set of microarchitectural state bits inside a processor with billions of transistors and a microcode layer updated quarterly, the attack surface becomes much larger.

Apple’s Secure Enclave Processor, introduced with the iPhone 5s in 2013, sat between these two models. It was a physically separate processor on the SoC with its own encrypted memory, dedicated to protecting biometric data and cryptographic keys. Even a fully compromised application processor with root privileges could not reach the Secure Enclave’s memory.

The SEP succeeded in the mass market, where HSMs had stayed confined to data centers, for two reasons. It was invisible to users: nobody configured it or provisioned it. And it protected something users cared about: their fingerprints and their money. The security was a means to a consumer feature, not a product in itself.

Intel SGX: Designed for the Desktop

Intel SGX, introduced with Skylake processors in 2015, brought the enclave concept to general-purpose computing. Instead of a separate processor, SGX created isolated memory regions within the main CPU. Code and data inside an enclave are encrypted in memory and protected from all other software on the system. The enclave’s measurement (MRENCLAVE) is a hash of exactly what was loaded, making attestation straightforward. One binary, one deterministic hash.
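A toy sketch shows why this measurement model is so clean. Real SGX hashes each loaded page together with its offset and permissions; this simplified version hashes contents only, but the property is the same: identical inputs yield an identical identity, and any change yields a new one.

```python
import hashlib

def measure_enclave(pages: list[bytes]) -> bytes:
    # MRENCLAVE-style measurement: a running hash over everything
    # loaded into the enclave, in order. (Real SGX also folds in
    # each page's offset and permission bits.)
    h = hashlib.sha256()
    for page in pages:
        h.update(page)
    return h.digest()

build_a = measure_enclave([b"code", b"data"])
build_b = measure_enclave([b"code", b"data"])
build_c = measure_enclave([b"code", b"DATA"])

assert build_a == build_b  # same input, same measurement
assert build_a != build_c  # any change yields a new identity
```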

SGX was designed for the desktop. Its primary use cases were single-tenant scenarios like content protection, DRM key management, and Ultra HD Blu-ray playback. The threat model is clear. One machine, one user, and the enclave protects the content owner’s code from that user.

This is a single-tenant threat model. The attacker is the machine owner. There is no hypervisor. There are no co-tenant workloads competing for shared microarchitectural resources. The side-channel attack surface exists, but the economic incentive is limited. The attacker gains access to one DRM key or one media stream.

Enterprise adoption beyond DRM was limited. SGX enclaves had severe memory constraints (initially 128MB). Programming for SGX required partitioning applications into trusted and untrusted components. Intel deprecated SGX from consumer processors in 2021. The desktop DRM use case was not enough to sustain the technology.

Cloud Adoption and the Threat Model Mismatch

The cloud introduced a fundamentally different threat model, and this is where the problems began.

In the desktop DRM model, you protect your code from one user on one machine. In the cloud, you protect your code and data from the infrastructure provider, co-tenant workloads, the hypervisor, firmware, and anyone with physical access to a shared data center. The provider controls the hardware, the hypervisor, the firmware, the physical facility, and the scheduling of workloads across shared CPU cores.

The industry took technologies designed for the desktop single-tenant model and applied them to this multi-tenant cloud model. The architectural mismatch opened attack surfaces that the original designs did not anticipate.

SGX on a desktop shares caches, branch predictors, execution ports, and power delivery with the enclave owner’s own code. On a cloud server, those same resources are shared with co-tenant workloads controlled by different parties, each potentially adversarial. Cache-timing attacks that were theoretical on a desktop became practical in the cloud because the attacker could run arbitrary code on the same physical core. The side-channel catalog that accumulated against SGX from 2017 onward was not a series of implementation bugs. It was a consequence of deploying a single-tenant design in a multi-tenant environment.

AMD SEV and Intel TDX were designed with the cloud threat model more explicitly in mind, protecting entire virtual machines rather than individual enclaves. But they still share fundamental hardware resources with the hypervisor and co-tenants. CPU caches, memory buses, power delivery, and microarchitectural scheduling state. CacheWarp, StackWarp, WeSee, and Heckler all exploit the interfaces between the confidential VM and the hypervisor that manages it.

Virtual TPMs are another instance of the same pattern. Physical TPMs provide hardware-rooted trust because they are discrete chips with their own silicon. A vTPM is software running inside the hypervisor or a confidential VM. Cloud providers adopted vTPMs because provisioning hardware TPMs per VM is impractical at scale. The vTPM’s trust root is the software stack that hosts it. If the hypervisor is compromised, the vTPM is compromised.

The Repurposing Pattern

This is a recurring pattern in security technology, and it is one I have watched play out multiple times in my career. Build X for threat model Y, then repurpose X for threat model Z because X already exists and deploying it is cheaper than building something new.

SMS was designed for person-to-person messaging. It was repurposed for two-factor authentication because every phone could receive an SMS. The threat model assumed the cellular network was trusted. SIM swapping, SS7 interception, and malware-based SMS capture exploited the gap between “messaging channel” and “authentication channel.” NIST deprecated SMS-based 2FA. SMS OTP is still everywhere because deployment inertia exceeds the security community’s ability to move the market.

SSL was designed for securing web browsing sessions. It was repurposed for API authentication, IoT device communication, email encryption, and VPN tunneling. Each repurposing exposed assumptions in the original design that did not hold in the new context. The ecosystem spent two decades fixing the gaps through Certificate Transparency, HSTS, and progressively stricter CA/Browser Forum requirements. I was part of that ecosystem. The fixes were not inevitable. They required sustained institutional effort.

TPMs were designed for boot integrity on enterprise desktops. They were repurposed as vTPMs for cloud VM attestation, trading hardware isolation for scalability. SGX was designed for desktop DRM. It was repurposed for cloud confidential computing, trading single-tenant simplicity for multi-tenant attack surface. Each repurposing followed the same logic. The technology existed, the market needed something, and “available now with known limitations” beat “purpose-built but years away.”

The repurposed technology works well enough to create adoption. The adoption creates dependency. The dependency makes it difficult to replace even after the threat model gap is well understood. And the security research community spends years documenting the consequences while the market continues deploying.

AWS took a different path with Nitro Enclaves. Rather than building on CPU instruction extensions designed for desktops, Nitro Enclaves are isolated virtual machines on a purpose-built hypervisor with no persistent storage, no network access, and no access from the host. The Nitro model sidestepped many of the shared-resource problems because the hypervisor is minimal and the enclave has dedicated resources. The measurement model is clean. One image, one deterministic measurement.

Azure and GCP followed with confidential VM offerings on AMD SEV-SNP and Intel TDX. Google has positioned confidential computing as foundational to AI, expanding support across Confidential VMs, Confidential GKE Nodes, and Confidential Space with Intel TDX and NVIDIA H100 GPUs.

NVIDIA entered with confidential GPU support on H100 and Blackwell architectures. Their reference architecture for “zero-trust AI factories” combines CPU TEEs with confidential GPUs, Confidential Containers via Kata, and a Key Broker Service that releases model decryption keys only after remote attestation succeeds. Model weights remain encrypted until the hardware proves the enclave is genuine. This positions confidential computing as IP protection for model owners deploying on infrastructure they do not control.
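The key-release flow can be sketched with a hypothetical broker. Real key broker services verify a signed attestation quote against the hardware vendor's certificate chain; this simplified model checks only whether the reported measurement is on an approved list, which is the gating decision at the heart of the pattern.

```python
import hashlib
import secrets

class KeyBroker:
    """Hypothetical broker: holds the model decryption key and releases
    it only when attestation evidence matches an approved measurement."""

    def __init__(self, approved_measurements: set[bytes]):
        self.approved = approved_measurements
        self.model_key = secrets.token_bytes(32)  # stands in for the real key

    def release_key(self, measurement: bytes) -> bytes:
        # Real brokers verify a signed quote and its certificate chain;
        # this sketch compares only the measurement itself.
        if measurement not in self.approved:
            raise PermissionError("attestation failed: unknown measurement")
        return self.model_key

good = hashlib.sha256(b"approved-inference-image").digest()
broker = KeyBroker({good})

# Genuine, approved enclave: the key is released.
assert broker.release_key(good) == broker.model_key

# Tampered image: the key is withheld.
try:
    broker.release_key(hashlib.sha256(b"tampered-image").digest())
except PermissionError:
    pass
```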

Intel launched Trust Authority as a SaaS attestation service independent of the cloud provider. If the cloud provider both runs your TEE and verifies its attestation, you are still trusting the provider. An independent verifier breaks that circularity.

By 2025, every major hardware vendor and every major cloud provider had a confidential computing offering. The question was no longer whether the technology existed. It was whether anyone could make it work at scale.

Why It Never Hit Mass Adoption

Despite the investment, confidential computing did not achieve mass adoption through the SGX era or the first wave of confidential VMs. Several problems compounded.

Attestation is hard to operationalize. The verification step requires infrastructure that most organizations do not have and that the ecosystem has not built. I wrote about this problem in detail in Why Nobody Can Verify What Booted Your Server. The short version: 84% of IT leaders cite attestation validation as their top adoption challenge.

The performance overhead was non-trivial in early implementations. SGX had significant costs from enclave transitions and limited memory. Confidential VMs with SEV-SNP and TDX reduced this to single-digit percentage overhead for most workloads, but the perception of “secure means slow” persisted.

The developer experience was poor. SGX required application partitioning and a specialized SDK. Confidential VMs improved this by running unmodified applications, but attestation integration, key management, and secret provisioning still required specialized knowledge. As of early 2026, deploying a confidential workload still requires expertise that most teams do not have.

The vulnerability narrative undermined confidence. The side-channel attacks against SGX were not random bugs. They were a predictable consequence of deploying a single-tenant design in a multi-tenant environment. Each new attack generated press coverage and reinforced the perception that the technology could not deliver. Security teams found a long list of CVEs, academic attacks, and “known limitations” that made the risk-benefit calculus uncertain.

And without AI, the use cases were niche. DRM, financial services MPC, healthcare analytics, sovereign cloud compliance. Real markets, but not mass markets. Not enough volume to drive the ecosystem maturity needed for broad adoption.

The Vulnerability Record

The side-channel attacks did not stop with SGX’s partial deprecation. They followed the technology into the cloud.

Intel TDX still shares microarchitectural resources with the hypervisor. TDXdown demonstrated single-stepping and instruction counting against TDX trust domains. PortPrint showed that CPU port contention reveals distinctive execution signatures across SGX, TDX, and SEV alike, and because it exploits instruction-level parallelism rather than thread-level parallelism, disabling SMT does not help.

The attack that most directly undermines the “Private AI” narrative is TDXRay (IEEE S&P 2026, UC San Diego, CISPA, Google). TDXRay produces cache-line-granular memory access traces of unmodified, encrypted TDX VMs. The researchers reconstructed user prompts word-for-word from a confidential LLM inference session. No cryptography was broken. The attack works because standard LLM tokenizers traverse a hash map to find token IDs, and that traversal creates a memory access pattern observable at 64-byte cache-line resolution. The host watches which hash map nodes the tokenizer visits and stitches the prompt back together. The encryption protects the data in memory. The computation pattern leaks it through the cache.
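A toy model makes the observation concrete. Here an illustrative bucket function stands in for cache-line addresses, and the "trace" is only which buckets the victim touches; no plaintext ever crosses the boundary, yet the observer recovers the words.

```python
# Toy model of the TDXRay observation: a tokenizer looks up each word
# in a hash map, and each lookup touches a bucket at a predictable
# location. An observer who sees only WHICH buckets are touched, never
# the data in them, can invert the mapping and recover the words.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def bucket(word: str, n_buckets: int = 64) -> int:
    # Deterministic bucket index; stands in for a cache-line address.
    return sum(word.encode()) % n_buckets

# Victim tokenizes a prompt; the trace is the bucket access pattern.
prompt = ["the", "cat", "sat"]
trace = [bucket(w) for w in prompt]

# Observer precomputes bucket -> candidate words, then replays the trace.
inverse: dict[int, list[str]] = {}
for w in VOCAB:
    inverse.setdefault(bucket(w), []).append(w)
recovered = [inverse[b][0] for b in trace]

assert recovered == ["the", "cat", "sat"]  # prompt reconstructed
```

The real attack contends with hash collisions and noise by combining the trace with language statistics, but the principle is the same: the memory stays encrypted while the access pattern gives the content away.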

TEE.Fail (ACM CCS 2025) is the most dramatic recent finding. Researchers built a $1,000 physical interposer that monitors the DDR5 memory bus and extracted ECDSA attestation keys from Intel’s Provisioning Certification Enclave, the keys that underpin the entire SGX and TDX attestation chain. Attestation can be forged. The attack requires physical access, which limits applicability. But cloud providers have physical access to every server they operate.

On March 31, 2026, Mark Ermolov announced the extraction of the SGX Global Wrapping Key from Intel Gemini Lake. This is not a side-channel leak. It is extraction of the root cryptographic key that protects SGX sealing operations. The key wraps Fuse Key 0, which means the entire key hierarchy rooted in hardware fuses is compromised for that platform generation. No microcode update can change fuses. Ermolov’s assessment: “its fundamental break means that the HW Root of Trust approach is not unshakable.”

Gemini Lake is a low-power consumer chip, not a Xeon server processor. The same attack has not been demonstrated on current server-class implementations. But the research trajectory is clear. Each generation of hardware trust primitives has been broken by the next generation of hardware security research.

Why the Pattern Persists: Five Broken Design Assumptions

The vulnerability record is not a collection of unrelated bugs. It is the predictable result of specific design assumptions that held in the original use cases but fail in the cloud and AI contexts where the technology is now deployed.

The attacker does not share physical hardware with the victim. SGX was designed for a desktop where one user runs one workload. In the cloud, co-tenants share CPU cores, caches, branch predictors, TLBs, execution ports, memory controllers, and power delivery. CacheWarp, StackWarp, and TDXRay all exploit resources that remain shared because complete resource partitioning would make the hardware unusable for general-purpose computing.

The platform owner is not the adversary. TPMs and early SGX assumed the platform owner was the user or a trusted IT department. In the cloud, the provider controls the hypervisor, firmware, BMC, physical facility, and scheduling. The interfaces between the TEE and the provider-controlled environment become the attack surface. WeSee, Heckler, and SEVered exploit these interfaces. TEE.Fail exploits the provider’s physical access to the memory bus.

The hardware root of trust is immutable. The attestation model depends on root keys being beyond the reach of software attacks. This assumption has been violated repeatedly. Ermolov reached fuse-based keys through microcode. Google’s CVE-2024-56161 found an insecure hash in AMD’s microcode signature validation. Sinkclose provided universal Ring-2 escalation on AMD CPUs back to 2006.

Attestation verification is someone else’s problem. The specifications define how to produce attestation evidence but not how to verify it at scale. In the desktop DRM case, one binary produced one hash. In the cloud, PCR values are combinatorial across firmware, bootloader, kernel, and boot configuration.
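The combinatorial problem can be sketched directly. The version counts below are illustrative, but the structure is real: every approved combination of components produces a distinct "golden" PCR value, and every component update multiplies the set again.

```python
import hashlib
from itertools import product

# Illustrative component inventories for one small fleet.
firmware   = [f"fw-{i}" for i in range(4)]
bootloader = [f"shim-{i}" for i in range(3)]
kernel     = [f"kernel-{i}" for i in range(10)]
config     = [f"cmdline-{i}" for i in range(5)]

def replay(chain: tuple[str, ...]) -> bytes:
    # Replay a boot chain into a simulated PCR (SHA-256 extend).
    pcr = bytes(32)
    for stage in chain:
        digest = hashlib.sha256(stage.encode()).digest()
        pcr = hashlib.sha256(pcr + digest).digest()
    return pcr

# The verifier must recognize every valid combination as "good".
golden = {replay(c) for c in product(firmware, bootloader, kernel, config)}

assert len(golden) == 4 * 3 * 10 * 5  # 600 values for one tiny inventory
```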

Performance and security tradeoffs are invisible. On a desktop running DRM playback, a 5% performance hit is imperceptible. On a cloud server running AI inference at scale, every percentage point is cost. Disabling SMT, applying Downfall mitigations, and enabling inline encryption all have measurable overhead. Organizations are pressured to disable countermeasures for performance, reopening the attack surface.

These assumptions compound. The attacker shares hardware with a platform owner who is the adversary, exploiting a hardware root of trust that has a shelf life, verified through attestation infrastructure that does not exist at scale, with mitigations that carry performance costs the deployment context cannot absorb. No single patch addresses the compound effect. The assumptions are architectural, not implementational, which is why the vulnerability catalog grows despite continuous investment in mitigations.

The full root cause analysis with specific attack mappings for each assumption is in the companion TEE Vulnerability Taxonomy.

AI Changes the Calculus

All of the problems described above are real and unresolved. None of them are stopping adoption, because AI changed the calculus.

Model weights represent billions of dollars in training investment. A leaked foundation model is a competitive catastrophe. Running inference on shared cloud infrastructure means trusting the cloud provider not to inspect memory, which is the exact problem TEEs solve.

Training data includes regulated information across healthcare, financial services, and government. The EU AI Act, DORA, CCPA, and evolving federal privacy frameworks create compliance pressure that confidential computing directly addresses.

Multi-party AI scenarios (federated learning, collaborative training, secure inference on third-party data) require environments where no single party sees the complete dataset. TEEs provide the isolation boundary. This is why every major hyperscaler is building on confidential computing despite its known limitations.

But AI workloads amplify every weakness. GPU TEEs are new and their attestation models are immature. The attestation chain now spans CPU TEE, GPU TEE, and potentially TPM, each with different measurement schemes. AI workloads run on heterogeneous infrastructure across multiple cloud providers. And AI workloads are the most valuable targets for the attacks TEEs are vulnerable to. An attacker who extracts model weights via a side channel gets a multi-billion-dollar asset.

The market treats the different TEE designs (SGX, SEV, TDX, Nitro, NVIDIA confidential GPU) as interchangeable. They are not. Each has different properties and different security guarantees. Pretending otherwise is how organizations end up deploying against a threat model their chosen TEE was not designed to address.

The Trust Model Gap

The deeper issue is the gap between what is marketed and what is engineered.

Confidential computing marketing says “even the infrastructure provider cannot access your data.”

The engineering reality is different. The infrastructure provider cannot access your data through the software stack, but the hardware has known side-channel leakages that a sufficiently motivated attacker with privileged access can exploit. The attestation infrastructure that proves the TEE is genuine has structural limitations that make verification at scale dependent on each organization building its own reference value databases. And the hardware root of trust that anchors the entire system has a demonstrated shelf life.

This is a reasonable tradeoff for many threat models. Most organizations are defending against curious administrators, software-level compromise, and regulatory compliance requirements. Side-channel attacks require significant expertise and often physical access. But the market does not present it as a tradeoff.

What Needs to Happen

Closing the gap between the market narrative and the engineering reality requires work that is less exciting than launching new AI services.

Firmware and OS vendors need to publish reference measurements. The standards exist. CoRIM provides the format. RFC 9683 provides the framework. What is missing is the operational commitment to publish signed measurement values for every release. I wrote about the infrastructure that would need to exist and why none of it does yet.

The industry needs honest threat modeling that acknowledges what TEEs protect against and what they do not. TEE.Fail requires physical access, but cloud providers have physical access to every server. TDXdown requires a malicious hypervisor, which is precisely the threat TDX is designed to defend against. These are not edge cases. They are the threat model.

Attestation verification needs to become a commodity. Organizations should not need to build their own reference value databases, write their own event log parsers, and maintain their own golden image registries. This infrastructure should be as standardized and available as Certificate Transparency logs are for the web PKI.

And the security research community’s findings need to be incorporated into the market narrative rather than treated as exceptions. The pattern of continuous vulnerability discovery and mitigation is the normal state of the technology, not an aberration.

Confidential computing is directionally correct. The ability to verify what code is running on hardware you do not control, rather than simply trusting the operator, is a fundamental improvement in how we build systems. Signal proved the model works. The challenge is closing the gap between that promise and the current engineering reality.

The organizations deploying confidential computing for AI workloads today should understand what they are buying. Against the threats they are most likely to face (curious administrators, software-level compromise, regulatory compliance gaps, and unauthorized data access by the infrastructure operator), confidential computing is a significant improvement. Against a well-resourced attacker with physical access to the hardware, side-channel expertise, or the ability to exploit a hardware root-of-trust vulnerability, it is a partial mitigation, not an absolute guarantee.

That is a defensible position. It is just not the one being marketed.


For practical guidance on deployment, see Confidential Computing: What It Is, What It Isn’t, and How to Think About It.

For the full vulnerability catalog and root cause framework, see the TEE Vulnerability Taxonomy and TPM Attestation and PCR Verification: The Infrastructure Gap.

Previously: TPMs, TEEs, and Everything In Between (March 2025). See also: Why Nobody Can Verify What Booted Your Server.

Confidential Computing: What It Is, What It Isn’t, and How to Think About It

Confidential computing is the most important security technology that most organizations deploying it do not fully understand.

Last March, I wrote about the terminology confusion in security hardware — how terms like TEE, TPM, secure enclave, and confidential computing get used interchangeably in ways that obscure what these technologies actually do. The accompanying technical reference laid out the foundational concepts and the ways these technologies fail.

A year later, confidential computing is no longer a niche technology. AI has made it urgent. When you run inference on a model worth hundreds of millions in training compute, on hardware you don’t own, in a data center you’ve never visited, the question of what the infrastructure operator can see becomes a business-critical concern. Confidential computing is the industry’s answer.

It is also a technology whose security properties are routinely overstated by the vendors selling it and the cloud providers deploying it. Marketing language like “even the infrastructure provider cannot access your data” appears in product pages from every major hyperscaler. The engineering reality is more constrained than that, and the gap between the marketing and the engineering is where organizations get hurt. None of that means you shouldn’t use it. I use it extensively. It means you need to understand what it actually gives you so you can build architectures that account for what it doesn’t.

What Confidential Computing Actually Does

Confidential computing protects data while it is being processed. Traditional encryption covers data at rest and data in transit. Confidential computing addresses the third state: data in use, the window when data must be decrypted for computation and is therefore exposed in memory.

The mechanism is hardware-based isolation. The CPU (or GPU, in newer implementations) creates an environment where code and data are encrypted in memory and protected from all other software on the system, including the operating system and hypervisor. The cloud provider’s administrators cannot read your data even though it is running on their hardware.

The technology comes in several forms. AMD SEV-SNP and Intel TDX protect entire virtual machines. AWS Nitro Enclaves provide isolated execution environments on Amazon’s custom hardware. NVIDIA’s H100 and Blackwell GPUs add hardware-encrypted GPU memory with GPU-specific attestation. Apple’s Secure Enclave protects biometric data and cryptographic keys on a physically separate processor. The implementations differ significantly, but they share a common principle: hardware-enforced boundaries that the software stack cannot cross.

What It Does Not Do

Confidential computing does not make your workload invulnerable. It changes the threat model. Understanding what it does not protect against matters as much as understanding what it does.

Side-channel attacks remain viable. The CPU still shares caches, branch predictors, execution ports, and power delivery with other workloads. Researchers have demonstrated attacks that extract data from inside TEEs without breaking any cryptography. The TDXRay attack (IEEE S&P 2026) reconstructed user prompts word-for-word from an encrypted Intel TDX VM by watching which cache lines the LLM tokenizer accessed. The data was encrypted in memory. The computation pattern leaked it through the cache.

Physical access defeats memory encryption. The TEE.Fail attack (ACM CCS 2025) used a $1,000 device soldered to the DDR5 memory bus to extract attestation keys from Intel TDX and AMD SEV-SNP. Cloud providers have physical access to every server they operate. That is the threat model confidential computing claims to address.

Attestation depends on hardware roots of trust that have a shelf life. Attestation is how a TEE proves to a remote party that it is running expected code on genuine hardware. The proof depends on cryptographic keys embedded in the processor. Those keys have been extracted. The March 2026 extraction of the SGX Global Wrapping Key from Intel Gemini Lake reached root keys burned into hardware fuses. Google’s discovery of an insecure hash in AMD’s microcode signature validation (CVE-2024-56161) allowed loading malicious microcode that could subvert SEV-SNP. When root keys are compromised, attestation can be forged.

Attestation verification infrastructure barely exists. Even if the TEE hardware is sound, verifying attestation at scale requires knowing what the correct measurements should be. For TPM-based attestation, this means maintaining reference values for every combination of firmware, bootloader, kernel, and boot configuration across a heterogeneous fleet. That infrastructure largely does not exist. An IDC survey found that 84% of IT leaders cite attestation validation as their top adoption challenge.

The Vulnerability Record in Context

The security research community has published over 50 distinct attacks against TEE platforms since 2017. The companion TEE Vulnerability Taxonomy catalogs these in detail.

The number is large. It does not mean confidential computing is broken. It means the technology has been subjected to intense scrutiny by some of the best hardware security researchers in the world, and they have found weaknesses. The question is whether the risk after deploying the technology is lower than the risk without it.

For most deployments, the answer is yes. Confidential computing raises the bar significantly. An attacker who could previously read VM memory through a compromised hypervisor now needs a side-channel attack, a physical interposition device, or a root-of-trust compromise. Each of these is substantially harder than the baseline attack.

The vulnerability record matters most when the attacker is well-resourced and has privileged access to the infrastructure — which is exactly the cloud provider threat model that confidential computing is designed to address. The threat model it targets is the one where its limitations are most relevant. That tension is real, and pretending it does not exist does not help anyone making deployment decisions.

The Gartner prediction that 75% of processing in untrusted infrastructure will use confidential computing by 2029 assumes a maturity the technology has not achieved. Treating a bounded isolation primitive as a general trust solution is how organizations end up surprised.

How to Think About Deployment

Confidential computing is one layer in a defense-in-depth architecture. It is not a substitute for the other layers. I have deployed confidential computing in production and these are the principles I have found matter most.

Use it, but don’t rely on it alone. Encrypt data at rest and in transit independently of the TEE. Use application-level encryption for the most sensitive data so that even a TEE compromise does not expose plaintext. The TEE is a defense-in-depth layer, not your sole protection.

Verify attestation, and understand what verification actually proves. A TPM quote or attestation report proves the state of the machine at the time the quote was generated. It does not prove the machine is still in that state five minutes later. It does not prove the machine’s physical location or who has physical access. Build your verification flow with these limitations in mind.

Know your TEE’s specific threat model. AMD SEV-SNP, Intel TDX, AWS Nitro Enclaves, and NVIDIA GPU CC have different architectures, different shared resource boundaries, and different attestation mechanisms. They are not interchangeable. A TDX trust domain sharing microarchitectural state with a hypervisor has a different side-channel surface than a Nitro Enclave running on a purpose-built hypervisor with dedicated resources.

Plan for the hardware root of trust to eventually fail. The research trajectory is clear: each generation of hardware trust primitives has been broken by the next generation of hardware security research. Build your key management and secret rotation so that a root-of-trust compromise on one platform generation does not expose secrets that have already been rotated.

Ask whether your workload actually needs multi-tenant cloud TEEs. For some use cases, a physically discrete device — an HSM, a USB Armory, a Nitro Enclave with dedicated resources — provides stronger isolation than a confidential VM sharing silicon with co-tenants. The multi-tenancy problem is where most of the vulnerability surface lives. If your workload does not require multi-tenant shared infrastructure, you can sidestep the largest attack class entirely.

What a Practical Architecture Looks Like

Consider a Certificate Authority that runs its signing operations inside AWS Nitro Enclaves. The enclave has no persistent storage, no network access, and no access from the host instance. The signing key never leaves the enclave. Attestation is verified through the Nitro Attestation PKI, which produces deterministic measurements of the enclave image.

Nitro Enclaves were chosen because their architecture sidesteps the shared-resource side-channel problems that affect SGX and TDX. The Nitro hypervisor is purpose-built and minimal. The enclave gets dedicated resources. The measurement model is clean: one image, one deterministic measurement, no combinatorial PCR explosion.

But the enclave is not the only security layer. The signing keys are backed by a hardware root of trust. The enclave image is built from reproducible builds so the expected measurements are verifiable from source. Access to the host instance is controlled through IAM policies that are themselves audited. The architecture is designed so that compromising any single layer does not compromise the signing keys.

Use confidential computing as a meaningful security improvement, understand its specific limitations, and build the rest of your architecture so that the limitations do not become single points of failure.

Where This Is Heading

Confidential computing is not going away. The economic pressure to deploy AI workloads on shared infrastructure guarantees continued investment. NVIDIA’s Blackwell architecture extends confidential GPU support. ARM CCA adds Realm World isolation. The Confidential Computing Consortium continues to drive standardization.

The technology will improve. Side-channel mitigations will get better. Attestation infrastructure will mature — the IETF RATS standards are ready, and what is missing is vendor adoption of publishing reference values. Performance overhead will continue to decrease.

But the fundamental constraints — shared microarchitectural resources, physically accessible memory buses, the shelf life of hardware roots of trust — are properties of how CPUs and memory work. They will not be eliminated by the next generation of silicon. They will be reduced, mitigated, and worked around.

Confidential computing is a significant improvement in security posture. It is not an absolute guarantee. That is a defensible position to sell. It is just not the one being sold.


For the deeper analysis of why the vulnerability record looks the way it does, see Confidential Computing’s Inconvenient Truth.

For the full vulnerability catalog, attestation gap analysis, and root cause framework, see the TEE Vulnerability Taxonomy and TPM Attestation and PCR Verification: The Infrastructure Gap.

Previously: TPMs, TEEs, and Everything In Between: What You Actually Need to Know (March 2025). See also: Why Nobody Can Verify What Booted Your Server.

Why Nobody Can Verify What Booted Your Server

There is no public database of known-good TPM measurements. There never has been.

The Trusted Platform Module, a security chip that measures and attests to system integrity, has been a standard for twenty years. TPMs ship in virtually every enterprise laptop and server. Software-emulated versions are provisioned for every cloud VM on Azure, GCP, and AWS. Measured boot is a checkbox in every compliance framework that touches system integrity. The hardware that produces platform measurements is everywhere. The infrastructure to verify those measurements is not.

If you have deployed measured boot at scale, you have hit this wall. I have, more than once. If you haven’t yet, you will.

I wrote about the foundational concepts behind these technologies last year, covering how TPMs, TEEs, HSMs, and secure enclaves differ and where they fail. This post goes deeper on one specific problem that anyone deploying measured boot or confidential VMs hits immediately: the verification gap for PCR values.

What PCRs Are and Why They Exist

A TPM contains a set of Platform Configuration Registers, special-purpose storage locations that record the boot chain as a sequence of cryptographic measurements. Each boot stage measures the next before handing off execution. The measurements are extended into PCRs using a one-way hash chain: the old value is concatenated with the new measurement and hashed to produce the new value. This is irreversible. Given a final PCR value, you cannot determine the individual measurements without replaying the full sequence.
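The extend operation described above can be sketched in a few lines of Python. This is a simplified model assuming a SHA-256 PCR bank; real extends happen inside the TPM, which typically maintains multiple banks.

```python
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM 2.0 extend for a SHA-256 bank: new = H(old || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()

# A SHA-256 PCR starts as 32 zero bytes; each boot stage extends
# the digest of what it measured before handing off execution.
pcr = bytes(32)
for stage in (b"firmware", b"bootloader", b"kernel"):
    pcr = pcr_extend(pcr, hashlib.sha256(stage).digest())

# The chain is one-way and order-dependent: the final value alone
# cannot be inverted to recover the individual measurements.
```

Extending the same measurements in a different order produces a different final value, which is why the event log (covered below) matters so much in practice.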

A TPM quote is a signed snapshot of these PCR values, which lets a remote verifier assess what software actually booted on the machine. This is remote attestation, and it answers a question no operating system can answer about itself: did this machine boot what it was supposed to boot?

This works fine for a single machine. The problem is fleets.

Why There Is No PCR Registry

You would think someone would have built a public database of known-good PCR values by now, something like CCADB for certificate trust or VirusTotal for malware hashes. Nobody has, and it is not because nobody thought of it. The reasons are structural.

PCR values are combinatorial. A single PCR accumulates measurements from multiple software components. PCR 0 reflects the firmware version, CPU microcode patches, and the UEFI configuration that controls early boot behavior. PCR 4 reflects the bootloader and the shim that validates Secure Boot signatures. On modern Linux distributions using Unified Kernel Images, which bundle the kernel and initial RAM disk into a single signed binary, measurements fragment across PCRs 8, 9, 11, and 12 depending on the distribution and boot configuration. This is messier than the traditional GRUB boot path, and it was already messy.

Any component update produces a completely different PCR value for the affected register. A fleet with 3 firmware versions, 2 bootloaders, 4 kernels, and 3 initrd configurations has 72 valid PCR value combinations for a single hardware model. Five hardware models is 360. Add boot parameters and the number becomes effectively unbounded.
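The arithmetic above can be made concrete with hypothetical fleet inventories (the component names here are made up for illustration):

```python
from itertools import product

# Hypothetical inventory for one hardware model.
firmware    = [f"fw-{i}" for i in range(3)]
bootloaders = [f"boot-{i}" for i in range(2)]
kernels     = [f"kernel-{i}" for i in range(4)]
initrds     = [f"initrd-{i}" for i in range(3)]

# Every combination produces a distinct set of final PCR values,
# so each one needs its own reference entry.
combos = list(product(firmware, bootloaders, kernels, initrds))
print(len(combos))      # 72 combinations for one hardware model
print(len(combos) * 5)  # 360 across five hardware models
```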

Measurement ordering matters. The hash chain is order-dependent. Extending measurement A then B produces a different result than B then A. Boot is not fully deterministic. Driver initialization order, ACPI table enumeration, and peripheral probe sequences can vary between boots of identical software on identical hardware. The TCG’s own specification acknowledges this directly: operating system boot code is “usually non-deterministic, meaning that there may never be a single ‘known good’ PCR value.”

Firmware measurements are opaque. The UEFI event log is the detailed record behind those PCR values, and in practice it is often more useful than the final values themselves. But the event data for firmware blobs is often just a physical memory address and size. No indication of format or purpose. Intel Boot Guard measurements use methods that are under NDA. Dell extends proprietary configuration data into PCR 6 in undocumented formats. A verifier cannot independently reconstruct many of these measurements without vendor-specific knowledge that is not publicly available.

Nobody is obligated to publish reference values. The standards for publishing expected measurements exist. The TCG Reference Integrity Manifest specification defines the formats. The IETF RATS working group developed CoRIM, a compact machine-readable format for publishing reference measurements. RFC 9683, which covers remote integrity verification of network devices containing TPMs, specifies that software suppliers MUST make reference values available as signed tags. The standards are there. Manufacturers are not obligated to follow through, and most do not.

What Everyone Actually Does Instead

PCR value matching fails at scale, so the industry has quietly converged on something else: event log verification.

The TPM does not just produce final PCR values. It also maintains an event log, a sequential record of every individual measurement extended into each PCR during boot. Each entry contains the PCR index, the hash of what was measured, and a description of the event — “loaded bootloader from partition 1” or “Secure Boot certificate db contained these entries.”

The event log is what makes attestation workable in practice. The verifier replays the log by re-computing the hash chain from the individual entries. If the replayed chain produces the same PCR values that the TPM signed in its quote, the log has not been tampered with. The events it describes are the actual events that produced those values. The verifier then evaluates individual events against a policy: is this firmware version on the approved list? Is Secure Boot enabled? Is the kernel signed by a trusted key? Was anything unexpected loaded?
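The replay-then-evaluate flow can be sketched as follows. This is a deliberately simplified model with SHA-256 banks and a made-up event structure; a real verifier parses the binary TCG TPM2 event log format.

```python
import hashlib

def replay_event_log(events):
    """Recompute per-PCR values by replaying the hash chain.

    Each event is (pcr_index, measurement_digest, description),
    a simplified stand-in for a TCG event log entry.
    """
    pcrs = {}
    for index, digest, _desc in events:
        old = pcrs.get(index, bytes(32))  # SHA-256 PCRs start at 32 zero bytes
        pcrs[index] = hashlib.sha256(old + digest).digest()
    return pcrs

def verify(events, quoted_pcrs, event_policy):
    # Step 1: the replayed chain must reproduce the PCR values the TPM
    # signed; if it does, the log describes the events that produced them.
    if replay_event_log(events) != quoted_pcrs:
        return False
    # Step 2: evaluate each described event against policy
    # (approved firmware? Secure Boot on? kernel signed by a trusted key?).
    return all(event_policy(desc) for _i, _d, desc in events)

log = [
    (4, hashlib.sha256(b"shim").digest(), "loaded shim"),
    (4, hashlib.sha256(b"grub").digest(), "loaded bootloader"),
]
quoted = replay_event_log(log)  # stand-in for PCR values from a signed quote
assert verify(log, quoted, lambda desc: desc.startswith("loaded"))
```

A tampered log fails step 1 because its replayed chain no longer matches the signed values; a log describing a disallowed event fails step 2.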

This is more flexible than PCR matching. A firmware update changes one event in the log, not the entire composite hash, so the policy absorbs the change without requiring new reference values.

But event log verification has its own problems. Event data is often insufficient for independent verification. Vendor-specific formats are undocumented. Event types and descriptions are not part of the hash, so they can be manipulated without affecting the signed PCR value. Intel’s CSME subsystem extends measurements that verifiers cannot evaluate without access to Intel’s proprietary documentation.

Keylime, the most mature open-source attestation framework, says it plainly: direct PCR value matching is “only useful when the boot chain does not change often.” Intel Trust Authority, Google Cloud Attestation, and Azure Attestation all verify event log properties rather than matching literal PCR values.

So every organization deploying TPM attestation at scale ends up building their own reference values by capturing measurements from known-good environments. The “registry” is whatever you build from your own golden images. This is not a sustainable state of affairs, but it is the state of affairs.

vTPMs Add Another Layer

Virtual TPMs make the verification problem worse. A physical TPM’s trust comes from being a discrete chip with its own silicon. A vTPM is software running inside the hypervisor or a confidential VM. Cloud providers adopted vTPMs because provisioning physical TPMs per VM is impractical at cloud scale.

The vTPM’s trust root is the software and hardware stack that hosts it. If the hypervisor is compromised, the vTPM is compromised. If the CPU’s hardware isolation (the TEE that protects the confidential VM) has a side-channel vulnerability, the vTPM’s keys are exposed through that side channel. Verifying vTPM evidence requires also verifying the TEE evidence, because the trust chains through.

Each layer’s trust depends on the layer below, and the bottom layer has a demonstrated shelf life. The March 2026 extraction of the SGX Global Wrapping Key from Intel Gemini Lake and Google’s discovery of an insecure hash in AMD’s microcode signature validation (CVE-2024-56161) are the latest demonstrations that hardware roots of trust are not permanent.

A Practical Approach

The reference value infrastructure does not exist. So what do you actually do?

Pick the verification approach that matches what your deployment can support, and accept the tradeoff. I have listed these from strongest assurance to weakest, which is also from highest operational cost to lowest.

Exact PCR match compares values against a fixed allowlist. Strongest when reference values are correct. Breaks on any component update. Only practical for enclave-style deployments like AWS Nitro Enclaves or Intel SGX, where one image produces one deterministic measurement. If you control the entire image and the measurement is deterministic, this is the easy case.

Event log policy replays the event log and evaluates individual events against policy. Flexible to component updates. Requires an event log parser and per-vendor knowledge of event formats.

Signed baseline accepts any PCR values covered by a signature from a trusted key. The signing key becomes the trust anchor rather than a registry of literal values. When software updates change PCR values, the security team signs a new baseline. This is the PolicyAuthorize pattern that System Transparency documents and pcr-oracle supports: seal secrets to a signing key rather than to specific PCR values, so that software updates do not lock you out of your own data.

Node identity only verifies the TPM’s Endorsement Key identity without PCR verification. Proves hardware identity, not software state. Weakest assurance, lowest operational cost.
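The signed-baseline approach reduces to an ordinary signature check: the verifier trusts a key, not a registry of literal values. Here is a minimal sketch using HMAC as a stand-in for real asymmetric signing; a production flow would use an RSA or ECC key with the TPM's PolicyAuthorize mechanism.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"security-team-key"  # stand-in for an asymmetric key pair

def sign_baseline(pcr_values: dict) -> bytes:
    """Security team signs a set of expected PCR values after each update."""
    payload = json.dumps(pcr_values, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()

def verify_baseline(pcr_values: dict, signature: bytes) -> bool:
    """Verifier accepts any PCR values covered by a valid signature."""
    return hmac.compare_digest(sign_baseline(pcr_values), signature)

# A software update changes the PCR values; the team signs a new
# baseline instead of pushing a new allowlist to every verifier.
baseline = {"0": "aa11", "4": "bb22"}
sig = sign_baseline(baseline)
assert verify_baseline(baseline, sig)
assert not verify_baseline({"0": "aa11", "4": "cc33"}, sig)
```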

Most real-world deployments will use different approaches for different parts of their architecture. Exact match for the most sensitive operations. Event log policy for managed servers. Signed baselines for fleet environments where the security team controls the update cycle. The right answer is almost never one approach for everything.

What Would Need to Exist, and Why It Matters

The gap between what TPM attestation promises and what it delivers at scale comes down to five missing pieces of infrastructure. None of them are technically novel. All of them require cross-vendor coordination, which is the hard part.

Firmware vendors publishing signed reference measurements for every release. If Dell, HP, Lenovo, Supermicro, and Intel published signed CoRIM measurement bundles alongside firmware updates, verifiers could check boot measurements against vendor-provided values instead of building golden image databases. The thousands of organizations currently maintaining their own reference values stop doing that redundant, error-prone work. A firmware update becomes verifiable by any attestation service, not just by organizations that happened to capture the right measurements before deploying. This is the single highest-impact change.

OS vendors publishing signed reference measurements for kernels, bootloaders, and initrd images. Red Hat, Canonical, and SUSE would publish expected measurement values for each package version. The cost of operating measured boot drops from “dedicated team” to “configuration.”

A transparency log for reference measurements. Analogous to Certificate Transparency for the web PKI. Reference value providers submit signed measurements to a log. Verifiers check the log. Monitors detect inconsistencies. The incentive structure shifts from “trust the vendor” to “verify the vendor,” which is the entire point of attestation in the first place.

This is not hypothetical. I worked on firmware transparency at Google, including work with Andrea Barisani to integrate it into the Armored Witness, a tamper-evident signing device built on TamaGo and the USB Armory platform. Google publishes a transparency log for Pixel factory images. The broader Binary Transparency framework has production deployments across Go modules, sigstore, and firmware update pipelines. Researchers are extending the approach to server firmware signing. The pattern works. What is missing is adoption by the server firmware vendors whose measurements actually need verifying.

Cross-vendor event log normalization. A library that translates vendor-specific event log formats into a common representation, abstracting away the differences between Dell, HP, Lenovo, and Intel firmware event structures.

Attestation verification as a commodity service. Not vendor-specific, not requiring deep expertise, but as simple as an OCSP responder for certificate revocation: send a TPM quote and event log, get back a signed attestation result.
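The interface for such a service could be as small as the sketch below. These type names and fields are hypothetical, intended only to show how little a caller should need to know.

```python
from dataclasses import dataclass

@dataclass
class AttestationRequest:
    quote: bytes      # signed PCR snapshot from the TPM
    signature: bytes  # signature over the quote
    event_log: bytes  # TCG TPM2 event log, binary
    nonce: bytes      # freshness challenge chosen by the relying party

@dataclass
class AttestationResult:
    verdict: str    # "pass" or "fail"
    claims: dict    # evaluated properties: secure_boot, kernel_signer, ...
    signed_by: str  # identifier of the service's result-signing key
```

The caller submits a request and checks one signature on the result, the same interaction pattern as an OCSP responder.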

None of these exist at scale as of April 2026. The standards are ready. The hardware is deployed. The market is adopting confidential computing at a pace that assumes this infrastructure is coming. It is not here yet.

None of this fixes the side-channel vulnerabilities in the TEE hardware itself. None of it extends the shelf life of hardware roots of trust. Those are silicon problems that require silicon solutions. But the attestation infrastructure gap is not a silicon problem. It is a coordination and incentive problem, and those are solvable.

The web PKI went through a similar transition, and I watched it happen from the inside. Certificate mis-issuance was undetectable until Certificate Transparency made it visible. Certificate authorities operated without enforceable standards until the CA/Browser Forum Baseline Requirements created them. There was no shared database of trusted roots until CCADB built one. Each of those required cross-vendor coordination that looked unlikely right up until it shipped. The result is an ecosystem that is not perfect but is dramatically more trustworthy than it was fifteen years ago.

The attestation infrastructure could follow the same path. The standards work is done. What remains is the operational commitment from the vendors who manufacture the hardware and the organizations that rely on it.

Every organization deploying measured boot today is independently solving the same problem with their own golden images, their own event log parsers, and their own reference value databases. I have built some of these myself. The standards are ready, the hardware is deployed, and the economic incentive is growing. What is missing is the willingness to coordinate. That is a solvable problem.


This post is the first in a series on confidential computing. The next two posts are Confidential Computing: What It Is, What It Isn’t, and How to Think About It and Confidential Computing’s Inconvenient Truth. Two companion reference documents provide the full evidence base: the TEE Vulnerability Taxonomy and TPM Attestation and PCR Verification: The Infrastructure Gap.

Previously: TPMs, TEEs, and Everything In Between: What You Actually Need to Know (March 2025)

We Built It With Slide Rules. Then We Forgot How.

My father grew up on a subsistence farm, the kind that raised chickens and grew just enough to get by. Farmers were the original hackers. You couldn’t wait for the right tool or the right expert. You fixed what was broken with what you had, because the alternative was worse.

As a kid he taught himself rocket chemistry. Not from a kit. From whatever he could source locally. He was trying to make things burn hotter and fly farther, adjusting mixtures through trial and error long before he had words like specific impulse or oxidizer ratio for what he was doing.

The materials weren’t exotic. Potassium nitrate sold as stump remover. Sulfur and charcoal. Mix them correctly and you have black powder, the same oxidizer-fuel logic underlying every solid rocket motor ever built. More ambitious builders used potassium perchlorate from chemical suppliers, mixed with aluminum powder or sugar to control burn rate and energy density. All of it over the counter. All of it accessible to someone willing to read carefully and try things until they worked.

He wasn’t following a plan. He was just that kind of person.

Most people have forgotten that the Air Force had its own space program before NASA existed. NASA was carved out of NACA in 1958, but the Air Force had been running parallel efforts since the mid-1950s. That generation had grown up on science fiction and wanted to see it happen. When Sputnik launched in October 1957 the country went into a low-grade panic about whether it understood physics well enough to survive, and suddenly the kids who had been dreaming about space since they could read had somewhere to go with it. What followed was one of the rare moments in American history when technical aptitude was a genuine class elevator. The government needed people who understood this stuff badly enough to find them wherever they were.

He enlisted in his early twenties, aerospace degree in hand. The Air Force space program was what he was aiming at. He ended up working on attitude control thrusters for reconnaissance satellites, the kind that could resolve fine surface detail on Earth from hundreds of miles up. For that mission attitude control wasn’t a secondary problem. It was the central one. A camera that can’t hold still is useless. The thrusters are what made the intelligence possible. The underlying engineering was the same problem he had been teaching himself: oxidizer, fuel, combustion geometry, now controlled to tolerances that left no margin.

I remember him watching a satellite reenter on the cable news when I was young. I don’t know which one or exactly what year. What I remember is that he cried. He told me later there was a plate on that satellite with his name engraved on it. Work he had done, hardware he had touched, in orbit for years and now gone. Grief with no adequate audience, because the context was secret and the people who would have understood were scattered across programs that didn’t officially exist.

Years later my father was excited watching Iridium, Motorola’s commercial satellite constellation, begin launching in 1997. The same fundamental technology, now accessible to anyone with a phone. His generation had figured out how to do this, quietly, under classification, and here it finally was in the open. The knowledge had propagated. Just not through the channels that were supposed to carry it.

He kept a green chalkboard in the garage. He would pull out his slide rule and work through things with me. Orbital decay, thrust, specific impulse, delta-v, the rocket equation and why it makes everything harder than it looks. He had a worry he came back to often – society had forgotten how to go to the moon. The knowledge existed in aging engineers and partially classified documents and it was not being transmitted. The chalkboard was what he could do about that.

Last year Destin Sandlin, an aerospace engineer who describes himself as a redneck from Alabama, walked into a room full of the most senior people in American space policy and did something worth an hour of your time to watch. He asked questions that people inside the institutional food chain had stopped asking. Starting with the most basic one: how many rockets does it take to fuel the Artemis lunar lander?

The room went quiet. Nervous laughter. Public estimates have varied depending on assumptions about boil-off and reuse, but all point to a strikingly high number of launches and on-orbit refueling operations before a landing attempt, and nobody in the room had a confident answer.

These are not uninformed people. A core operational parameter of their own mission architecture was not common knowledge among the people running it.

Then Destin asked the room a simpler question.

“Is this the simplest solution?”

Silence.

Destin pointed them at NASA SP-287, a document the Apollo engineers wrote and left behind specifically so the next generation wouldn’t have to rediscover everything from scratch. The title is “What Made Apollo a Success.” It has been sitting there, public, for decades. Most of the people in that room had not read it.

The principle at the center of that document is blunt:

“Build it simple and then double up on as many components or systems so that if one fails, the other will take over.”

Simple first. Then redundant. Not complex and hoping.

Simple isn’t just aesthetic preference. Simple is how you keep the system inside your head. Simple is how you build procedures all the way down to bolt cutters and still know what comes next. When a system gets complex enough that a room full of its leaders can’t answer a basic operational question about it, it has exceeded the boundary of what they actually understand. They are renting the complexity along with the capability.

The Apollo engineers meant it literally. When designing the ascent stage separation, the mechanism that gets astronauts off the lunar surface, they didn’t stop at one solution or two. They built redundancy on top of redundancy. Flip the switch. If that fails, go outside and trip the manual release. If that fails, depressurize, suit up, go to the bottom of the spacecraft with bolt cutters, and cut the straps holding the stages together. Harrison Schmitt said there was one more procedure after the bolt cutters. Nobody would say what it was.

That’s not genius. That’s a chicken farmer’s epistemology applied to the hardest engineering problem humans had ever attempted. You don’t wait for perfect conditions or perfect knowledge. You start simple, you build every fallback you can think of, and then you think of one more.

Destin argues that Artemis didn’t follow that logic. The NRHO/Gateway architecture was publicly justified in part on communications, surface access, stability, and operational grounds. Destin’s read, and he makes a detailed case for it, is that the architecture also reflects deeper constraints that accumulated into a more complex solution: an architectural constraint dressed up as a design choice, complexity that grew because the real constraints couldn’t be named publicly. The result was a room full of program leaders who couldn’t tell you the basic parameters of the system they were running.

That’s what happens when you lose the thread.

Destin also interviewed an engineer who had worked on the lunar landing training vehicle, the machine that taught Apollo astronauts to land in one-sixth gravity by actually putting them in a vehicle where their life depended on getting it right. Destin asked whether the Apollo engineers were smarter than engineers today. The answer was no. What they had wasn’t superior intelligence. It was a bias toward doing, toward simplicity, toward keeping the system inside human heads rather than delegating it to complexity they couldn’t fully reason about.

NASA SP-287 exists because those engineers understood something important. Capability doesn’t survive on its own. Knowledge doesn’t transmit automatically. You have to codify it deliberately or it dies with the people who held it. It is ownership made explicit. Here is what we understood. Here is why it worked. Here is the playbook so the next generation doesn’t have to rediscover it at the cost of lives.

The space race created a machine for turning hands-on knowledge into national capability. It found people like my father wherever they were because it needed what they had already taught themselves. It was the on-ramp, the forcing function that pulled curiosity into programs that mattered and gave it somewhere to go. That same forcing function generated SP-287, the discipline to write it down, the institutional pressure to transmit it. When the race ended the machine stopped. The on-ramp closed. The knowledge didn’t vanish immediately. It aged out, program by program, engineer by engineer, panel by panel. What remained was credentials and institutional memory of having once known how, which is a different thing entirely from knowing how.

We took that gift and built a lunar return architecture that, at least in its public form, looks far more operationally intricate than the Apollo playbook would have preferred. Estimates range from eight to fifteen or more rockets just to fuel the lander. A room full of its leaders who hadn’t read the playbook.

“Is this the simplest solution?”

Silence.

That’s not an aerospace problem. That’s the pattern. The knowledge transmission problem is older than aerospace. I’ve been writing about it in other contexts for a while, starting here.

My father spent my childhood pointing at this from a chalkboard in a garage. I didn’t become an astronaut. That was his hope, not my path. The chalkboard worked anyway. The knowledge moved. The Iridium launches proved it. The knowledge his generation developed under classification eventually became infrastructure anyone could hold in their pocket. You can’t fully control where it lands. You can only decide whether to try.

Now AI is doing to software what the end of the space race did to aerospace. It is consuming the early career tasks that used to serve as scaffolding for building judgment. The debugging, the boilerplate, the routine iteration that taught tradeoffs and edge cases before anyone trusted you with the hard problems. The visible work disappears first. The tacit knowledge becomes unreachable just as it becomes most important. The on-ramp closes. And at some point a room full of senior people goes quiet when someone asks a basic operational question, not because they’re uninformed, but because the complexity was delegated before the understanding had time to form.

That is the cautionary tale. Not that AI is bad. That capability outsourced before it is understood leaves you renting decisions you don’t control while keeping consequences you can’t transfer. The room goes quiet. And eventually nobody even thinks to ask whether this is the simplest solution.

My father saw it coming. That’s what the chalkboard was for.

The question isn’t whether you work in aerospace or software. It’s whether you’ve stopped asking basic questions about the system you’re running. Whether it has exceeded the boundary of what you actually understand. Whether you’re renting complexity along with capability and calling it progress.

You don’t wait for perfect knowledge. You read every playbook you can find. You build redundancy all the way down to bolt cutters. And then you think of one more thing.

The chemicals are still on the shelves. SP-287 is still public. The Destin talk is an hour of your time and worth every minute.

Read the playbook.

The WebPKI and Client Authentication Are at a Crossroads

The CA/Browser Forum is having its first serious conversation about whether publicly trusted client authentication certificates deserve their own Baseline Requirements. Nick France kicked off the discussion on the public list last week, asking for concrete use cases, and the responses so far have been a useful window into how the industry thinks about this problem. Or rather, how it doesn’t.

The timing isn’t accidental. Chrome Root Program Policy v1.6 is forcing a structural realignment of the WebPKI, and client authentication is caught in the middle. All PKI hierarchies in the Chrome Root Store must now be dedicated solely to TLS server authentication. Chrome stopped accepting new intermediate CA applications with mixed EKUs in June 2025, and by June 15, 2026, Chrome will distrust any newly issued leaf certificate containing clientAuth EKU from a Chrome Root Store hierarchy. Multi-purpose roots get phased out entirely. Mozilla, Apple, and Microsoft are all aligning with this direction. Every major public CA has published a sunset schedule. Sectigo stopped including clientAuth by default in September 2025, DigiCert followed in October, and Let’s Encrypt is phasing it out through ACME profiles. By mid-2026, you will not be able to get a publicly trusted TLS certificate that also works for client authentication.

This is the right call. The historical practice of stuffing both serverAuth and clientAuth into the same certificate, from the same hierarchy, created exactly the kind of entanglement that makes the WebPKI brittle. The SHA-1 migration is the canonical example. Payment terminals that relied on client auth from the same roots as server certs couldn’t upgrade, holding back the entire transition for years. Today, Cisco Expressway is the poster child for the same problem, using a single certificate for both server and client auth in SIP mTLS connections and scrambling to decouple them before the deadline. Dedicated hierarchies for dedicated purposes. It’s a principle the WebPKI should have enforced from the start.

What to do about it

What’s emerging is a clearer, more honest WebPKI, but one with a gap that nobody is cleanly addressing. If you’re currently relying on publicly trusted certificates for client authentication, the path forward depends on your use case.

If the client auth is internal to your organization (VPN access, Wi-Fi onboarding, device authentication, mTLS between your own services), you should be moving to private PKI. This was always the right answer for internal use cases, and modern private CA solutions have made it far more practical than it used to be. You get full control over certificate profiles, lifetimes, and revocation without being subject to external root program policy changes. The blast radius of a private CA is contained to your organization, which is exactly what you want for internal trust.

If the client auth is between your organization and a small number of known partners, like B2B API integrations or supply chain connections, private PKI still works well. You exchange trust anchors with your partners and configure your systems to trust their specific CA. This is how most of these integrations should have been built in the first place. The “convenience” of using publicly trusted certs for this was always a false economy, because you were accidentally opening your trust boundary to every entity that could buy a cert from the same CA.

But if the client auth needs to work across organizational boundaries at scale, meaning you can’t reasonably pre-configure trust anchors for every potential counterparty, this is where it gets interesting and where the current alternatives fall short. Private PKI doesn’t solve this. You need some form of shared trust anchor, which is what public PKI provides for server authentication today. The question is whether a similar model can work for client authentication with properly scoped identifiers and validation methods.

The human identity case is the relatively easy part

On the CA/B Forum list, Sebastian Nielsen argued that public CAs shouldn’t issue client auth certificates at all, pointing to the name collision problem. He makes a fair point, but the conclusion is too broad. I’m Ryan Hurst the security practitioner, and there’s also Ryan Hurst the actor (Remember the Titans, Sons of Anarchy). A public CA asserting “Ryan Hurst” in a DN doesn’t help a relying party figure out which one of us is authenticating. The DN is a vestige of the X.500 global directory that never materialized. There is no global directory. Even local directories that correspond to DN structures don’t exist in any meaningful density. Identity in the WebPKI belongs in the SAN, where we have identifiers that are both globally unique and reachable.

S/MIME already handles the human case correctly. The rfc822Name in the SAN is at least unique at the time of issuance. More importantly, it’s reachable. You can send a challenge to an email address and get a response. You can’t send a challenge to a social security number. You can’t send a challenge to “Ryan Hurst, US.” The broad intent of the WebPKI is to make things reachable in an authenticated way. DNS names and email addresses fit that model. DNs do not.

Even with email, there’s a temporal problem. Addresses get reassigned, domains lapse, providers recycle accounts, and throwaway addresses exist by design. CAs can’t monitor for reassignment, so these are inherently short-lived assertions. The certificate lifetime is the outer bound of your trust in that binding. Broader questions around PII and auditability are really about how Key Transparency can be bolted into the ecosystem. I wrote about that previously.

There is valuable work happening in this space. Ballot SMC015v2 enabling mDLs and EU digital identity wallets for S/MIME identity proofing shows this evolving in a meaningful direction. Client authentication and signed email under S/MIME belong together. Apple has argued that emailProtection EKU should mean mandatory S/MIME BR compliance, closing the loophole where CAs omit email addresses from emailProtection certificates to avoid the BRs. I think that’s the right direction. One nuance worth calling out, though: S/MIME bundles signing, authentication, and encryption, and I think that’s right for the first two but not the third. Signing and authentication are real-time assertions that work well as short-lived credentials. Encryption is different. The key is bound to an identifier that may not be durable, and without frequent rotation you risk BygoneSSL-style attacks where a new holder of an email address could access messages intended for the previous one. The encryption case deserves its own careful treatment around key lifecycle and rotation.

Browsers are actively looking to remove client auth from TLS certificates, and I don’t disagree given how poorly specified and unconstrained it has been. That signals whatever comes next needs to be much more tightly defined. The human client auth case is covered by S/MIME, browser-based client auth is on its way out for good reason, and a new working group doesn’t need to revisit the human case.

The machine identity gap

Where it gets interesting is cross-organizational service-to-service authentication on the public internet. Today this is mostly handled with API keys, OAuth client credentials, or IP allowlisting, all with well-known limitations. mTLS with publicly trusted client certs could fill a real gap, but only if the identity model is built correctly.

Many current uses of mTLS with publicly trusted client certs are misplaced. Organizations are often assuming a level of assurance they don’t actually get when they accidentally cross security domains by relying on the public WebPKI for what is fundamentally a private trust relationship. A publicly trusted cert for payments.example.com tells you that the entity controlling that domain authenticated, nothing more. It does not mean they are your trusted partner, your approved vendor, or anyone you intended to grant access to. Public trust gives you authenticated identity, not authorization. Organizations that conflate the two will accidentally open up access based solely on someone having obtained a client cert. The examples collected on the list so far, Cisco Expressway and EPP, are mostly legacy compatibility problems being fixed. A working group built on those foundations would produce weak Baseline Requirements.

The better foundation is the emerging need for authenticated service-to-service communication across organizational boundaries. Consider SMTP. Mail servers already authenticate to each other over the public internet using TLS, and MTA-STS is pushing that toward authenticated connections. The logical next step is mutual authentication, where the receiving mail server can cryptographically verify the sending server’s identity, not just the other direction. SMTP and mTLS go together like peanut butter and jelly, but there’s no clean way to do it with publicly trusted client certs today. Or consider vendor supply chains. If a manufacturer’s procurement system needs to query a supplier’s inventory API, or a logistics provider needs to authenticate to a retailer’s fulfillment service, the options today are API keys, OAuth flows, or standing up an industry-specific trust framework just so machines can talk to each other. mTLS with publicly trusted client certs would let these systems authenticate directly, without building bespoke trust infrastructure for every partnership.

And this need is accelerating beyond any single industry. As AI agents increasingly act as user agents on the open internet, calling APIs, negotiating with services, and transacting across organizational boundaries on behalf of users, mutual authentication between machines that have no pre-established trust relationship is becoming a practical necessity, not a theoretical concern. You can’t pre-configure trust anchors for every service an agent might need to interact with any more than you can pre-configure them for every website a browser might visit. I wrote about this dynamic previously, and the trajectory is clear. The machine-to-machine authentication problem on the open internet is starting to look a lot like the server authentication problem that the WebPKI was built to solve, just in both directions.

For machines, the name collision problem largely disappears. DNS names are globally unique by design. A client cert with a dNSName SAN of payments-api.example.com or registry-client.registrar.example.net doesn’t have an ambiguity problem. The relying party knows exactly what organization controls that name. Nick’s original question on the list asked about what parts of the DN the relying party verifies. I’d argue that’s almost the wrong framing. There is no global X.500 directory. The question should be, what SAN types are needed, and what validation methods can we define for them?

For straightforward service identification, dNSName works today with no new validation methods needed.

  • payments-api.example.com
  • erp-connector.supplier.example.net
  • registry-client.registrar.example.com

For more expressive service identification, uniformResourceIdentifier SANs encode not just the organization but the specific service.

  • https://example.com/services/payments
  • urn:example:service:billing:v2

This URI-based approach isn’t speculative. SPIFFE already uses URI SANs (spiffe://cluster.local/ns/production/sa/checkout) to represent service identities in Kubernetes mTLS contexts. The pattern is proven and widely deployed within private PKI. Extending it to public trust for cross-organizational federation is a natural evolution of an approach the industry has already validated. URI SANs can be validated through .well-known challenge methods (like ACME HTTP-01 scoped to a URI path) and ALPN-based methods, extending battle-tested ACME-era infrastructure rather than building from X.500-era assumptions.

What the industry is doing instead

Almost all the CA and vendor messaging right now says “move to private PKI.” That’s the right answer for internal use cases, but it doesn’t address cross-organizational trust. The most interesting alternative emerging is the DigiCert X9 PKI, launched in partnership with ASC X9, the financial standards body. X9 PKI is a completely independent trust framework, governed by X9’s policy committee rather than the CA/Browser Forum or browser root programs. It supports both clientAuth and serverAuth EKUs, uses a common root of trust for cross-organizational interoperability, and is WebTrust audited. It’s specifically designed for the financial sector’s mTLS needs, though they’re expanding to other sectors.

X9 PKI is essentially a “public PKI that isn’t the WebPKI” for service-to-service auth. It validates the premise that there’s a real need for cross-organizational client authentication with a shared trust anchor. But it’s sector-specific and governed outside the CA/Browser Forum, which means it doesn’t solve the general case. The EU’s eIDAS QWAC framework is another sector-specific approach. These are workarounds for the absence of a general-purpose, properly scoped public client auth certificate type.

If this moves forward

I’m not advocating for or against a working group at the CA/Browser Forum. But if the Forum does decide to take this on, the scope, in my view, needs to be narrow. Machine and service client auth only, with identity in the SAN using dNSName and uniformResourceIdentifier. DN fields should not be relied upon for authentication decisions. Validation methods should build on existing domain control mechanisms. Human client auth stays in S/MIME where it belongs. The BRs should address the authentication versus authorization distinction explicitly, so relying parties understand that a publicly trusted client cert tells them who is connecting, not whether that entity should be granted access. This is already how server certificates work, and client auth should follow the same model. And the issuing CAs need to be dedicated, separate from server auth hierarchies. The SHA-1 payment terminal debacle and the Cisco Expressway mess both show what happens when client and server auth are entangled in the same hierarchy: one use case holds back progress on the other. Don’t repeat that.

The bigger picture

What we’re watching is a structural realignment of the WebPKI’s purpose. The WebPKI is being narrowed to mean “TLS server authentication for web browsers,” full stop. Everything else, client auth, S/MIME, code signing, is being pushed to dedicated hierarchies, private PKI, or alternative trust frameworks. That’s mostly the right direction. But the service-to-service authentication gap is real, growing, and not well served by any of the current alternatives. Private PKI doesn’t solve cross-organizational trust. X9 PKI is sector-specific. The CA/Browser Forum has the institutional knowledge, the validation infrastructure, and the trust framework to define something that works here. Whether they choose to is another question.

The conversation is happening now on the public list. If you have concrete use cases for cross-organizational service authentication with publicly trusted client certificates, this is the time to share them. The shape of what comes next depends on whether the use cases justify the effort, and right now the list is thin.

Introducing the WebPKI Observatory

For as long as I have been in this industry, the WebPKI compliance conversation has run on impressions. People with long memories and regular conference attendance have built up a picture of which CAs are well-run, which are struggling, and where the oversight gaps are. That picture has generally been accurate. It has also been almost entirely unmeasured.

The WebPKI Observatory at webpki.systematicreasoning.com, a project from Systematic Reasoning, is an attempt to change that. It’s a public dashboard covering 1,690 compliance incidents drawn from Mozilla Bugzilla between 2014 and 2025, cross-referenced with CCADB membership data, certificate issuance volumes from CT logs, root program trust store compositions, and the complete history of CA distrust events. The goal was simple: replace the shared intuition with actual data, and see what the data shows that intuition missed.

Some of it confirmed what most people in this space already suspected. Some of it was genuinely surprising.

The finding that reframes everything else is detection. When a compliance incident occurs, who finds it? Root programs find 52% of incidents. Automated external tools — CT log monitors, certificate linters, community scanning infrastructure — find 14%. CAs find their own problems in 9% of cases.

That number deserves more attention than it typically gets. One in eleven. CAs have full access to their own issuance systems, their own audits, their own CPSs, their own disclosure obligations, and they are the least effective detection mechanism in the ecosystem. External parties without any privileged access outperform internal CA monitoring by a factor of six or more. The compliance monitoring function has been effectively outsourced to external parties by default, and mostly without anyone deciding that was the right architecture.

Everything else in the data follows from that.

The failure classes that have grown are instructive. Technical misissuance has declined as a share of incidents over the past decade. What has grown is the process layer. In 2019, governance failures represented 21% of all incidents. By 2025 that figure was 60%. Policy violations, CPS failures, disclosure deadline misses. These are by definition things internal compliance programs should be catching. The 260 incidents tagged policy-failure or disclosure-failure in the dataset are a direct indictment of internal compliance operations. A CA that violates its own documented policy is not being surprised by an external attacker.

The oversight picture is also worth examining. In 2017, Mozilla engaged with 79% of Bugzilla compliance bugs. Chrome had no formal root program yet and was near zero. By 2025 the picture had reversed and degraded simultaneously. Chrome now contributes the dominant share of oversight engagement but covers only 18% of incidents. Mozilla covers 8%. The total corpus has roughly doubled since 2017 while combined meaningful oversight coverage has fallen by two-thirds. The Chrome Root Program launched in 2021, and its effect on the governance landscape is visible in the data — Chrome has made 239 substantive oversight comments in recent years versus Mozilla’s 158 over the same period. The center of gravity in CA compliance governance has shifted to the browser with 78% market share. That is structurally significant. Microsoft, which operates the largest trust store by root count at 346 trusted roots, has made zero recorded governance comments across all 1,690 incidents spanning 11 years.

The distrust history is also clarifying. The common mental model is that CAs get removed for catastrophic technical failures. The data does not support that model. 14 of 16 distrust events involve compliance operations failures. The behavioral taxonomy matters: negligent noncompliance, willful circumvention, demonstrated incompetence, and argumentative noncompliance. In 10 of the 16 cases, the distrust event was preceded by a documented pattern of prior incidents. The median runway from the first incident to distrust is 3.2 years. The failures were not hidden. They were in Bugzilla the whole time. The CA just was not resolving them systematically.

That means distrust is largely predictable given sufficient data. The indicators show up well before the outcome. That is a sobering observation about past oversight and a useful one for anyone thinking about what the compliance monitoring function should actually do.

The Observatory is a measurement tool, not a verdict. The dataset has limits — Bugzilla under-represents incidents that never reach public disclosure, CT-derived issuance volumes reflect only unexpired certificates at the time of measurement, and the behavioral taxonomy applied to distrust events involves judgment calls. But the patterns are robust enough to be useful.

For CA operators, the detection data alone should prompt hard questions about internal monitoring coverage. For root programs, the oversight gap data quantifies a scaling problem that is currently being absorbed by Chrome without anyone having explicitly decided that is the right architecture. For the policy community, the shift from technical to governance failures as the dominant incident class has direct implications for what audit frameworks should actually measure.

The dashboard is live at webpki.systematicreasoning.com, updated daily. The methodology is documented. Pull requests are welcome.

Signed, Auditable, Offline-Tolerant, PQ Secure QR Codes

A few months ago I wrote about what it would take to make a QR code verifiable in a post-quantum world. This post explores what it would look like to build one that is genuinely verifiable, not just signed, but auditable, offline-tolerant, and ready for a post-quantum world. That earlier post was mostly conceptual. A conversation with Bruno Couillard last week nudged me to put down the thoughts I had been carrying about exactly that.

The design draws heavily on the draft for Merkle Tree Certificates, which is working through the IETF right now. MTC is aimed at TLS, but the core insight is that you can replace per-certificate signatures with compact Merkle inclusion proofs against a periodically updated signed root, and that insight translates directly to QR codes once you think carefully about the offline constraint. If you haven’t read it, the draft is at datatracker.ietf.org/doc/draft-davidben-tls-merkle-tree-certs.

The result of applying that idea to the QR problem is MTA-QR, a working implementation of what I’ve been calling Merkle Tree Assertions for QR codes. The demo is live at mta-qr.peculiarventures.com, and the full source is at github.com/PeculiarVentures/mta-qr-demo. There are Go and TypeScript implementations, a browser-only demo that generates and verifies without any backend, and an interoperability test matrix that exercises all three signing algorithms against both runtimes in every combination.

To be clear, this isn’t a production-ready library, but building it helped me identify things I had missed while whiteboarding it in my head.

The size problem is real but solvable

The original post flagged signature size as the central constraint. An ML-DSA-44 signature is 2,420 bytes. A Version 40 QR code at medium ECC holds about 1,273 usable bytes. Those two numbers don’t fit in the same sentence without a solution.

The solution is separating what goes in the QR from what you need to verify it. The QR carries the assertion content, a Merkle inclusion proof, and coordinates pointing to a signed checkpoint. The checkpoint itself contains the issuer signature, lives outside the QR, and gets cached on the verifier’s device, typically during a charge cycle before the device ever sees a QR code. Once cached, verification is fully offline.

The proof is the interesting part. A two-level tiled Merkle tree, with an inner batch tree and an outer parent tree, caps the total proof at eight hashes regardless of how large the log grows. Eight hashes is 256 bytes. That’s the ceiling, forever. The QR version stays fixed. The code never gets denser as the issuer accumulates millions of entries.

In practice, a Mode 1 QR carrying bearer claims and a Merkle inclusion proof fits comfortably within a Version 10 to 15 code at medium ECC, well under 500 bytes total. ML-DSA-44 doesn’t appear in the QR at all. The issuer signature lives in the checkpoint that the verifier fetched during its last charge cycle.

ML-DSA-44 won’t fit in a single QR in Mode 0, the fully embedded mode where the signature is in the QR itself. Mode 0 is the bootstrap mode: it works on air-gapped verifiers, on paper QR codes printed before any checkpoint infrastructure exists, and for scenarios where prefetch is operationally impractical. It’s not a niche failure case; it’s the starting condition for any new deployment. Mode 0 with PQC will require waiting for NIST to finalize smaller-signature algorithms, or accepting larger QR codes. Mode 1 is the practical path to PQC today.

Offline tolerance is mostly a framing problem

There’s a habit of treating offline verification as binary: either the device has connectivity at scan time or it doesn’t. That framing creates a false constraint.

Every verifier with a battery has a window where it is stationary, connected, and idle. That’s when it charges. Fetching a checkpoint during a charge cycle is trivially cheap compared to everything else happening during that window. The relevant question isn’t whether the device has connectivity at scan time. It’s whether the assertion being scanned was issued before the verifier’s last checkpoint fetch.

For the common case, the answer is yes. A concert ticket issued last week, a prescription filled this morning, a badge issued at enrollment, all of these predate the verifier’s cached checkpoint by hours or days. Verification is fully offline because the relevant checkpoint was already there.

The narrow failure case is an assertion issued and scanned within the same charge cycle, before any checkpoint fetch. That falls back to a single cache-miss network call, which then covers every subsequent scan of the same batch. One round trip, then fully offline for the rest of the operational period.

Witnessing is where the transparency guarantee actually lives

The issuer’s signature proves the assertion came from a specific key. That’s useful, but it doesn’t prevent a compromised issuer from presenting different views of the log to different verifiers. Split-view attacks are subtle and hard to detect after the fact.

Witnesses solve this. A witness cosigns a checkpoint only after verifying it extends the previous one they saw, establishing a consistency guarantee across the full history of the log. Once multiple independent witnesses have cosigned a checkpoint, the issuer cannot retroactively rewrite or fork the log without those witnesses catching it.

The witness protocol comes from c2sp.org/tlog-cosignature, the same infrastructure underpinning the transparency.dev witness network. I worked on that witness network during my time at Google, so it was never far from my mind when designing this. Connecting MTA-QR to it means the issuance of every assertion can be monitored by parties with no relationship to the issuer. That’s the difference between a signed QR and an auditable one.

The implementation uses Ed25519 for witness cosignatures regardless of what algorithm the issuer uses for checkpoints. That’s not a design choice I made; it’s what the spec requires. It means an issuer can use ML-DSA-44 for the checkpoint signature while the witness infrastructure stays on stable, widely deployed Ed25519 keys. The two concerns are separated cleanly, and that separation matters. The quantum threat to the issuer signature and the operational threat to the witness network are different problems on different timelines.

What I had wrong in the original post

The earlier post mentioned UOV and SQISign as especially promising for QR codes because of their smaller signature sizes. That framing isn’t wrong exactly; smaller signatures do help with the size constraint, and both algorithms are genuinely interesting work. But the NIST competition covering them isn’t finished, which means neither is practical for anything you’d want to deploy or standardize against today. More importantly, once you separate the checkpoint from the payload, signature size matters only for the checkpoint, which isn’t size-constrained anyway. The Merkle structure removes the problem that UOV and SQISign were addressing. They may still have a role in Mode 0 once the standards are settled, but they’re not the lever that makes the design work.

What’s still missing

The spec has a revocation mechanism based on index ranges that a verifier checks at scan time, but the format for distributing and authenticating those revocation lists isn’t fully defined yet. This is the most operationally significant open item. An unsigned revocation list is vulnerable to a stale-list attack at the network layer. An adversary who can delay or suppress list delivery can extend the validity of a revoked assertion. The natural fix is issuer-signed lists using the same key that signs checkpoints, but that format isn’t written yet. Until it is, revocation remains the weak link in any deployment that depends on it.

Type 0x02 key assertions, where the QR proves possession of a private key rather than just embedding bearer claims, are defined in the log entry format but the challenge-response protocol isn’t specified. Two implementations can’t interoperate on key assertions without it.

The C2SP tlog-checkpoint format needs registrations for ECDSA and ML-DSA before those algorithms can interoperate with standard tlog-checkpoint parsers. Ed25519 is fully specified today. ECDSA and ML-DSA work in the reference implementation but aren’t interoperable with external tooling yet. This is a practical blocker for adoption by anyone not using the reference implementation, and it’s the right next conversation to have with the C2SP and MTC communities.

Try it

The browser demo runs entirely in-page with no backend. It generates Ed25519 or ML-DSA-44 keys in your browser, issues assertions, builds the Merkle tree, produces QR codes, and runs the full 15-step verification trace. The tamper panel lets you flip proof bytes, corrupt the TBS, zero the proof, or truncate the payload, and watch exactly which verification step catches each failure. It’s a useful way to build intuition for what the protocol is actually checking and why each step is there.

The repo is at github.com/PeculiarVentures/mta-qr-demo. Pull requests welcome, especially on the open items.

When Compliance Records Become the Only Honest Signal

I’ve been spending a lot of time lately building Systematic Reasoning with my long-time friend Vishal. The core premise is straightforward. Organizations reveal their true operational character through how they design to prevent failure, how they plan to handle it when it happens, and how they actually do. That signal deserves to be tracked, structured, and acted on. We’re building an agentic compliance platform to do exactly that.

Systematic Reasoning won’t be limited to any single domain, but we decided to start with the Web PKI. The reasoning was simple. It’s high impact in a way that’s hard to overstate. Every internet user depends, whether they know it or not, on a relatively small number of Certificate Authorities getting things right. The margin for error is zero. If that trust layer breaks, it breaks for everyone.

DigiNotar is the canonical example. A small Dutch CA, compromised so thoroughly that attackers could impersonate any website on the web, and did. That capability was used to spy on Iranian dissidents, intercepting communications that people believed were private and secure. The trust infrastructure that was supposed to protect them was turned into a weapon against them. DigiNotar isn’t an edge case or a cautionary tale from a more naive era; it’s a demonstration of the actual ceiling of what can go wrong. And it isn’t the only one. State-affiliated certificate authorities have been caught performing man-in-the-middle attacks on their own citizens’ traffic, something the Baseline Requirements explicitly prohibit, but prohibition only matters if it’s enforced. The web’s trust model works right up until the moment someone decides it’s more useful as surveillance infrastructure.

At the core of Systematic Reasoning is a belief I’ve held for a while. Compliance can be a vital sign of organizational security, but only if it’s continuous. The reality today is that it isn’t. Code ships daily. Audits happen annually. The gap between those two rhythms is where things go quietly wrong.

I’ve written before about why I have limited faith in the current audit regime. Auditors are engaged by the organizations they assess. Their product is a clean seal; their incentive is to keep the client. They operate on point-in-time sampling with auditee-selected scope, and they’re often compliance professionals rather than engineers, which means they’re checking whether a policy exists more than whether the system actually behaves correctly. That’s if you’re lucky. Sometimes the audit is scoped against a version of the Baseline Requirements that was superseded over a year ago.

The same incentive shapes how certificate authorities write their governance documents. A CP/CPS that relies heavily on incorporation by reference, that omits specifics about what the organization actually does and what constraints it operates under, is easier to audit against than one that makes precise, testable commitments. Vagueness isn’t always carelessness. Sometimes it’s a design choice. The same thing happens in incident reports. A report that attributes a failure to “organic process evolution” or “human error” without describing the actual control gap is easier to close than one that names the broken system and commits to a specific fix. In both cases the document gets the box checked without creating accountability. References establish authority. Commitments establish accountability.

The audit gap isn’t compensated for by strong internal monitoring either. The majority of significant compliance failures are not caught internally. They are caught by external researchers, root program staff, or community tooling. A broken validation endpoint runs for five years and the organization finds out because someone posted a 404 error in a public issue tracker. A validation race condition exists undetected for seven and a half years not because it was well hidden but because nobody was looking. The absence of an internal alarm is not evidence that the system is healthy. It is often evidence that the monitoring itself is missing.

So public incident reports and governance documents become some of the most signal-rich material available. Policy documents tell you what an organization claims it will do. Incident reports tell you what happened when reality diverged from that claim. Together they create a longitudinal picture that neither document produces alone.

Building a system to reason over that data surfaced a problem I didn’t fully anticipate. When you’re working from the outside, with no access to internal systems and no way to verify what actually changed, the public record is almost all you have. The question isn’t whether to treat it with skepticism. It’s how much skepticism to build in by default.

The temptation is to give the benefit of the doubt. Organizations are required to describe the blast radius of an incident. Not every localized bug is a symptom of something systemic. But accepting minimizing language at face value is its own failure.

“Only” is doing a lot of work when the bug it’s describing went undetected for seven and a half years. “No compromise of end-entities” is doing a lot of work when what it really means is that nobody found the gap before you did. Framing survival as security isn’t reporting, it’s PR. And if an organization believes an incident is no big deal, you can predict with reasonable confidence that the root cause analysis will be shallow and the remediation will be a band-aid.

ForgeIQX, our first offering, tracks those signals longitudinally across both policy documents and incident reports. Not to prosecute organizations for their language choices, but to notice when a commitment made in a CP/CPS quietly disappears in the next version, or when a promised fix is nowhere to be found when the same failure mode surfaces years later. That’s commitment decay, the slow evaporation of a promise made under pressure, and it’s only visible if you’re tracking across multiple documents and incidents over time rather than treating each one in isolation.

The calibration problem is real and doesn’t have a clean answer. Get it wrong in one direction and you build a system that cries wolf. Get it wrong in the other and you build a system that launders PR-speak into clean signals, which is just automating the thing we already do too much of.

There’s a third failure mode that took me longer to see. A system like this can be gamed. Swap “we got lucky” for “our monitoring detected no active exploitation.” Replace “only thirty certificates” with a more clinical impact scoping statement that says the same thing in language that sounds like engineering rigor. The words change; the institutional posture doesn’t. A system that can be satisfied by better prose isn’t measuring operational maturity, it’s measuring communications sophistication.

That means the system has to be built with structural pessimism. Not cynicism for its own sake, but a deliberate prior that clean language is not the same as clean operations, and that the absence of red flags is not the same as the presence of green ones. We can’t verify that an organization fixed what it said it would fix. What we can do is watch whether the same failure mode surfaces again and whether the pattern of shallow root cause analyses continues or breaks. The historical record doesn’t tell us what’s true inside these organizations. It tells us what they were willing to say in public, under pressure, over time. Given the alternatives, that may be the most honest signal available.

A certificate authority with genuine operational maturity should want this kind of scrutiny applied to itself. Not because it will always produce a clean result, but because it surfaces the gaps before an external party does. ForgeIQX gives organizations a way to continuously monitor their own compliance posture, so their practices and code keep pace with their commitments. The same is true for auditors who want their findings to mean something beyond a checkbox. The problem with the current regime isn’t that the people in it are careless. It’s that the incentive structures don’t reward rigor, and the tooling to demonstrate it continuously doesn’t exist. That’s what we’re building.

The Web PKI is where we started because the stakes are concrete and the public record is unusually rich. But any regulated industry where compliance is measured annually, where governance documents are written to satisfy auditors rather than inform relying parties, and where incident reports are drafted with one eye on legal exposure, has the same gap between what the paper says and what the organization actually does. We started here. We don’t intend to stop here.

The Signal They Chose to Ignore

Two prior posts worked through the statistics of the SB 6346 sign-in data. In the first I established the methodology and the finding. After applying a birthday-corrected collision test to separate organic participation from anomalous windows, roughly 90,000 legitimate CON participants remain against roughly 9,100 legitimate PRO participants. In the second I addressed the legislature’s claim that duplicate names make the dataset unreliable. The finding runs the other way. A genuine sample drawn from a real community produces name collisions at a predictable rate. People share surnames, people hit submit twice, households have two people with the same name. The PRO overnight batch produced zero collisions across 934 draws, where roughly 30 would be expected at a minimum. The anomaly is suspicious precisely because it has too few duplicates, not too many. Real participation is messy. This was not.

This post is not about those results. It is about what legislators said about them at a February 24 media availability, and whether their positions are statistically defensible.

They are not.

The Math Problem With “Not Helping Us Make Decisions”

“It’s not like we are making decisions not to pass a bill because of a sign in… they’re not really helping us make decisions in terms of amendments to bills or whether to pass it out of committee or not. We rely on people who actually come and testify in person.”

— Senator Manka Dhingra, February 24 media availability

That is a statistical claim. It asserts that the sign-in data has no decision-relevant information. For that to be true, one of two things must hold. Either the signal is too noisy to be meaningful, or legislators have better information that makes it redundant.

Neither holds.

The 10:1 ratio across 90,000 legitimate responses is not ambiguous. The margin of error at that sample size is roughly a third of a percentage point. The ratio does not wobble under any standard statistical treatment. Even applying the most aggressive self-selection correction anyone has proposed, assuming CON participants are twice as motivated to engage as PRO participants, the adjusted ratio is still 5:1. The signal does not disappear. Calling it noise is not a statistical judgment. It is a refusal to do the math.

As for better information, what would that be? Testimony at a two-hour hearing. Phone calls. Letters. The intuitions of members who have held their seats for multiple cycles. None of those are more statistically rigorous than 90,000 data points. Most are orders of magnitude less rigorous. If a senator’s read of the room outweighs a dataset this large at a ratio this clear, that is not superior methodology. That is substituting anecdote for evidence.

Dhingra’s preferred alternative, people who show up in person, has its own problem. The photo below is from the February 6 Senate hearing. The room is full of people in matching purple shirts and teal sashes. That is coordinated turnout, organized in advance, by people with the resources and flexibility to get to Olympia on a weekday. It is the physical equivalent of a sign-in campaign, except it requires taking a day off work and driving to the state capitol.

That standard also systematically excludes the people most affected by legislation. A small business owner in Spokane worried about a new tax on their income cannot easily testify on a Wednesday. A nurse working a shift cannot. A retired teacher in Yakima cannot. The sign-in system exists precisely because geographic and economic barriers make in-person participation inaccessible to most Washingtonians. Dismissing sign-ins in favor of in-person testimony is not a quality upgrade. It is a substitution of one self-selected sample for a smaller, more organizationally filtered one.

What Statistically Relevant Engagement Actually Looks Like

A standard poll commissioned to gauge public opinion on a major policy question uses around 1,000 respondents. That produces a margin of error of roughly 3.1% at 95% confidence. Those numbers drive legislation, inform campaign strategy, and get cited on the floor. Nobody demands methodology disclosure before a senator cites a Crosscut poll. That is simply the accepted evidentiary standard for constituent sentiment.

The sign-in dataset, after deduplication, contains roughly 90,000 legitimate CON responses. While strict margin of error calculations require randomized polling rather than opt-in data, the mathematical gravity at this scale is inescapable: a random sample of this size would carry a margin of error of approximately 0.33%. This dataset is ninety times larger than what legislators already treat as a reliable signal, with precision ten times tighter.

Washington has approximately 5.5 million registered voters. Ninety thousand responses represents roughly 1.6% of that population engaging with a single bill in committee. In political science research on constituent contact, engagement rates on individual pieces of legislation are typically measured in fractions of a percent. At 1.6%, this dataset is not a rounding error above that baseline. The prior record for sign-ins on any Washington bill was reportedly around 45,000, itself considered extraordinary. This dataset doubled it, and the legislative website crashed under the volume because nothing in the system’s design anticipated engagement at this scale.

The infrastructure of participation failed because the signal exceeded its design limits. That is not a data quality problem. That is evidence of something real happening in the electorate.

To put 90,000 in electoral terms: Washington has 49 legislative districts. Distributed statewide, that averages roughly 1,800 CON sign-ins per district. The 2024 state Senate race in the 10th district was decided by 153 votes. The House race in the 17th district was decided by fewer than 200. Several competitive seats turned on margins smaller than the number of people in those districts who showed up to oppose this bill. Legislators are not dismissing a fringe signal. They are dismissing a constituency that is, in several of their districts, larger than their margin of victory.

Consider how the same legislators would respond to a poll of 1,000 Washingtonians showing 10:1 opposition to a bill. That finding would be treated as dispositive. It would be cited in floor speeches, appear in press releases, and be described as a clear signal of constituent sentiment. This dataset shows the same ratio at ninety times the sample size, with a margin of error ten times tighter, with an audit trail, with a reproducible methodology, and after removing anomalous windows on both sides.

The legislators who called it noise do not apply that standard to anything else they use.

You Don’t Need to Read the Bill

“I don’t think everyone who’s signing in in support or opposition is actually reading the bill. So I think you got to take it for what it’s worth.”

— Senator Yasmin Trudeau, February 24 media availability

For a technical bill where the title might mislead, that would be a legitimate point. SB 6346 is not that kind of bill. Washington has not had an income tax in nearly a century. Voters have rejected it ten times. For most constituents signing in CON, reading the bill is beside the point. They already know where they stand. The question SB 6346 raises for them is not what the rate structure looks like. It is whether Washington should have an income tax at all, and on that question they have a consistent ninety-year answer. Beyond that settled position, the architects of this legislation documented their strategy in writing years before the bill was introduced.

In April 2018, Senator Jamie Pedersen sent an email to a former Democratic legislator explaining the real value of passing a capital gains tax. The major use of revenue, he wrote, was secondary. The more important benefit was on the legal side. Passing a capital gains tax would give the Supreme Court the opportunity to revisit its decisions that income is property, and would “make it possible to enact a progressive income tax with a simple majority vote.” Those emails were obtained through public records and published by the Washington Policy Center, which also documented the three-step sequence Pedersen described. Pass the capital gains tax to break the legal seal. Pass a millionaires tax to build the administrative infrastructure. Then lower the threshold to capture the middle class.

The capital gains excise passed in 2021. Pedersen also promised the revenue would reduce property and sales taxes. The state collected $1.8 billion in capital gains revenue from 2022 to 2024. Not a dollar went to reducing property or sales taxes. New spending absorbed everything. The Supreme Court upheld the excise in Quinn v. State in 2023, doing precisely what Pedersen predicted. A surcharge was added in 2025. SB 6346 arrived in 2026 as the simple majority vote Pedersen described eight years earlier.

A constituent who signs in CON without reading SB 6346 but who knows this history is not pattern-matching by instinct. They are responding accurately to a documented legislative strategy, now in its final stage, by an architect who wrote down the plan. Trudeau’s concern assumes the sign-in reflects ignorance. The record complicates that assumption.

The federal income tax was introduced in 1913 as a temporary measure with a top rate of 7% on incomes above $500,000. It has been neither temporary nor limited since. A constituent who has watched Washington’s capital gains excise follow the same arc, introduced with tax relief promises that were never kept and expanded within four years, is not being paranoid. They are reading the pattern correctly.

A constituent signing in CON on this bill is not evaluating the mechanics of a 9.9% rate on income above one million dollars. They are evaluating a mechanism with a documented history and a stated long-term purpose. That is not noise. That is the signal working as designed.

The Participation Double Standard

“As a general rule, I always warn my members, you shouldn’t really pay attention to that kind of dialogue… maybe focus less on numbers and more on quality of engagement.”

— Speaker Laurie Jinkins, February 24 media availability

“Quality of engagement” implies that organized participation is lower quality than spontaneous participation. Applied consistently, that standard would disqualify most of what the same legislators celebrate as democratic infrastructure.

Get out the vote campaigns are organized, at scale, through forwarded links, text banking, social media mobilization, and door knocking. They systematically encourage people to act on issues they may not have independently researched. Nobody argues that a voter who was reminded to register by a campaign text is less legitimate than one who showed up spontaneously. Nobody demands that turnout in heavily canvassed precincts be discounted because the participation was encouraged rather than organic.

The asymmetry is hard to explain on principled grounds. Get out the vote efforts are explicitly designed to shape electoral outcomes, which directly determines who holds legislative power. Organized sign-in campaigns are designed to inform legislators of constituent sentiment on a specific bill, which Jinkins then warns her members not to pay attention to anyway. If one is legitimate democratic infrastructure and the other warrants skepticism, that distinction requires an explanation nobody has offered.

The Self-Selection Argument Does Not Save Them

The legitimate version of the dismissal is astroturfing risk. Organized campaigns can mobilize sign-ins that do not reflect organic sentiment. Two problems follow.

First, the statistical work already addresses it. The anomalies I flagged in those prior posts run against the PRO side, not CON. The CON signal carries the messy collision fingerprint consistent with real people. The organized manipulation concern, applied rigorously and symmetrically, strengthens the CON signal rather than undermining it.

Second, self-selection disqualifies nothing legislators already use. Every constituent signal they rely on is self-selected. Calls. Letters. Town hall attendance. Donations. None represent a random sample of the electorate. The sign-in system is being held to an evidentiary standard that almost no feedback mechanism in democratic practice has met, and that standard is applied to nothing else.

What makes the sign-in data different from those signals is not that it is less reliable. It is that it is more systematic. It produces a record. It is auditable. It generated enough volume to run statistical tests on. The methodology applied here would hold up in a peer-reviewed context. The “I talked to my constituents” alternative would not.

For the underlying sentiment to be actually close to even, CON participants would need to be systematically ten times more motivated to engage through this specific channel than PRO participants. That is not a bias correction. That is a complete reversal of the observed signal. No one has offered a mechanism that produces that result.

The legislators dismissing this data are not applying a rigorous evidentiary standard. They are applying a selective one.

The Broader Pattern

In Disdain or Design? I wrote about what happens to constituent input in Washington when institutional actors have decided on an outcome. The user interface of democracy still renders. The buttons are there. What that piece examined is whether the backend those buttons connect to has been rewired.

The sign-in dismissal is that pattern made unusually explicit. When lawmakers assert that sign-in anomalies damage the “democratic process,” the irony is staggering. The legislature already removed the actual democratic process from this bill by attaching an emergency clause, deliberately blocking the public’s ability to challenge it via referendum. They pre-emptively silenced the electoral signal; now legislative leaders are simply stating on camera that the only constituent participation left is not helping them make decisions.

Washington voters have rejected income taxation ten times through the constitutional amendment process. The legislature is effectively circumventing the initiative process that most recently codified that preference. Dismissing the largest constituent response in state legislative history as something members should not pay attention to is not a data science position.

It is a tell about whose input actually shapes the outcome.

The Question That Deserves an Answer

Every signal legislators use to read constituent sentiment is self-selected. Calls. Letters. Town halls. Protests. Donations. Sign-ins are just self-selection at scale, with a paper trail rigorous enough to audit.

It is a perfectly reasonable position to argue that 90,000 highly motivated people clicking a web form do not flawlessly represent the entire state of Washington. But if the legislature genuinely wanted a higher-fidelity democratic signal, they would not have attached an emergency clause to explicitly bypass the voters. And they would not be ignoring a century of bipartisan ballot results where Washingtonians have rejected this exact policy ten separate times.

Legislators are free to make that choice, but voters deserve transparency about it, not a smokescreen of statistical skepticism that the data itself dismantles. When the numbers speak this clearly, ignoring them isn’t methodology; it’s a deliberate unplugging of democracy’s earpiece.

Duplicates Are Not the Problem

The Washington House is now arguing that the sign-in dataset for SB 6346 is unreliable because it contains duplicate names. The claim is simple. If the same name appears more than once, you cannot trust the totals.

They are not wrong that duplicates exist. They are wrong about what duplicates mean and what to do about them.

Every real-world dataset contains noise. Names entered twice, typos, outliers, junk. This is not a scandal. It is a property of data collected from human beings at scale. The standard response is not to discard the dataset. It is to trim it. A trimmed mean, cutting the head or tail or both, is one of the oldest tools in data science. The presence of junk data is not a reason to abandon analysis. It is the reason analysis exists.

The birthday-corrected collision test applied in the previous post is a more principled version of exactly that. Rather than arbitrarily cutting a fixed percentage off the tail, it uses the population model to identify which specific windows are statistically anomalous and removes only those. The legislature is being offered a choice between principled trimming and throwing the whole dataset away. One of those is data science. The other is a talking point.

Why Duplicates Happen

Before getting to the test, it is worth being precise about why duplicates appear in the first place, because the innocent explanations are more common than the fraudulent ones.

Approximately 800 people named John Smith live in Washington state. These are real, distinct individuals.

The first is demographics. According to the U.S. Census Bureau, Smith is the most common surname in America, occurring roughly 828 times per 100,000 people. There are an estimated 32,000 people named John Smith in the United States, approximately 800 in Washington state alone. But national averages miss how name frequency actually works in practice. It clusters by community. Redmond and Bellevue have dense South Asian tech worker populations where Patel and Singh recur at rates far above the state average. Tukwila and south King County have large East African and Somali communities where Mohammed appears with predictable frequency. South Seattle and the Puget Sound corridor have substantial Vietnamese communities where Nguyen, already the most common surname in Vietnam, concentrates heavily. Name frequency is never random. It reflects religion, culture, and family tradition. Mohammed is among the most common names in the world because naming a son after the prophet is an act of Islamic devotion practiced across generations. That is not a data quality problem. The same full name appearing two or three times in 80,000 records is not evidence of anything. It is census math applied to a state that looks nothing like the national average.
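The "approximately 800" figure is just proportional census math. A back-of-envelope sketch, where the U.S. population figure is my own round number and the other inputs are the estimates cited above:

```python
us_john_smiths = 32_000        # national estimate cited in the post
us_population = 334_000_000    # assumption: recent U.S. population, rounded
wa_population = 7_800_000      # Washington population cited in the post

# Scale the national count by Washington's share of the population.
wa_john_smiths = us_john_smiths * wa_population / us_population
print(round(wa_john_smiths))   # roughly 750, i.e. "approximately 800"
```

Community clustering, as described above, only pushes local frequencies higher than this statewide average.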

The second reason duplicates appear is the sign-in form itself. It does not confirm that your submission was received. Anyone who has filled out a web form and stared at the screen knows what comes next. You submit again. Someone might also change their mind and resubmit to correct their position. A household with two people named Michael Johnson might both sign in independently. None of that is fraud. Both causes are real, and a serious analysis accounts for both.

Beyond that, even if we removed every duplicate, it would not move the needle on the overall message. It is worth noting that CON has more removals in absolute terms simply because it has ten times as many submissions, which is exactly what the collision test predicts.

On Rapid Submissions

A related claim is that submissions arriving within seconds of each other indicate bot activity. The timing observation is real. The interpretation is not supported by the data available.

Rapid same-name pairs are primarily a function of submission volume. When hundreds of people are submitting per hour, two people who share a name will statistically land within seconds of each other by chance alone. The chart below plots same-name rapid pairs against hourly submission rate for both sides. Both follow the same curve. The PRO overnight Feb 20 hours, at roughly 190 submissions per hour, fall below where the trend predicts they should be, which is consistent with what the collision test found. The timing argument does not add new evidence against PRO. It describes a mathematical property of any high-volume submission window.
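The volume effect is easy to demonstrate. The sketch below is a toy Monte Carlo, not the analysis from the previous post: it uses an invented name pool and simply shows that same-name pairs landing within seconds of each other grow rapidly with hourly volume by chance alone.

```python
import random

def rapid_same_name_pairs(rate_per_hour, pool, window_s=10, seed=0):
    """Count same-name pairs landing within `window_s` seconds when
    `rate_per_hour` submissions arrive uniformly over one hour,
    with names drawn at random from `pool`."""
    rng = random.Random(seed)
    subs = sorted((rng.uniform(0, 3600), rng.choice(pool)) for _ in range(rate_per_hour))
    pairs = 0
    for i in range(len(subs)):
        t1, n1 = subs[i]
        for j in range(i + 1, len(subs)):
            t2, n2 = subs[j]
            if t2 - t1 > window_s:
                break          # submissions are time-sorted; no later pair qualifies
            if n1 == n2:
                pairs += 1
    return pairs

# Hypothetical pool: 2,000 entries over 1,500 distinct names, so some names repeat.
pool = [f"name{i % 1500}" for i in range(2000)]
for rate in (50, 200, 800):
    avg = sum(rapid_same_name_pairs(rate, pool, seed=s) for s in range(100)) / 100
    print(rate, round(avg, 3))  # rapid pairs climb steeply with volume
```

The count grows roughly with the square of the hourly rate, which is why busy windows produce rapid same-name pairs with no fraud involved.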

The public export contains no IP addresses. Without them, a rapid sequential submission cannot be assigned to any of three completely different explanations. The first is a single person double-submitting because the form gave no confirmation. The second is two people in the same household on the same connection. The third is two distinct people with different IPs whose submissions happened to land close together during a busy window.

The tool that would actually resolve this is IP address logs from the server. A same-IP rapid duplicate is strong resubmission evidence. A different-IP rapid duplicate from a residential ISP is two real people. A cluster of submissions from a datacenter or known VPN range is a different finding entirely. None of that analysis is possible from the public CSV, which is the only data anyone outside the AG’s office has seen.

This matters because the “within seconds” framing is being used to support a conclusion the available data cannot reach. The previous post noted that IP logs should be preserved before they age out. That recommendation stands. Until that analysis is done, timing alone is not evidence of anything specific.

It is also worth noting what the pattern does not look like. The anomalous window shows zero name collisions and below-trend rapid pairs, the opposite of what cheap automation produces. What that pattern is consistent with is a large list of pre-generated unique names submitted at a controlled rate. CAPTCHA does not stop that. Each submission looks like a distinct human from the name and timing perspective. The fix legislators might reach for does not address the threat model the data actually points to.

What the Test Is Measuring

The birthday problem tells you that a room of 23 people has a 50% chance of containing a shared birthday. The same math gives you the expected number of name collisions in any random sample drawn from a community of known size. If you have 9,000 PRO supporters and draw 934 names from that pool, some names will repeat by chance. Not because anyone cheated. Because Jennifer Lee exists in multiples, and because some of them hit submit twice when the page did not respond.
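The birthday calculation itself is a few lines. It computes the probability that no two people share a birthday and takes the complement:

```python
def birthday_collision_prob(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_unique = 1.0
    for k in range(n):
        p_unique *= (days - k) / days  # each new person must avoid all prior birthdays
    return 1 - p_unique

print(round(birthday_collision_prob(23), 3))  # 0.507
```

Swap "365 birthdays" for "distinct names weighted by frequency" and the same logic yields expected name collisions in a sample.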

The expected number of collisions for that sample is approximately 60. Not zero. Sixty. The test does not flag duplicates. It asks whether the duplication rate is consistent with what a genuine community would produce.

For the Senate PRO February 20th overnight window, the observed collisions were zero. Not fewer than expected. Zero. Across 10,000 simulations drawing from the actual PRO participant pool, the minimum produced was around 30. The overnight batch produced none.
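The shape of that simulation can be sketched in a few lines. The pool below is hypothetical: the real analysis draws from the actual PRO participant pool and a population name model, while here the shared-name structure is invented purely to show that a community with any repeated names essentially never produces zero collisions in 934 draws.

```python
import random
from collections import Counter

def collisions(sample):
    """Extra occurrences beyond the first for each repeated name."""
    return sum(c - 1 for c in Counter(sample).values() if c > 1)

def simulate(pool, draw, sims=1000, seed=0):
    """Monte Carlo: min and mean collision counts over `sims` random draws."""
    rng = random.Random(seed)
    counts = [collisions(rng.sample(pool, draw)) for _ in range(sims)]
    return min(counts), sum(counts) / sims

# Hypothetical pool of 9,000 supporters: most names unique, some shared by 2-3 people.
rng = random.Random(1)
pool, name_id = [], 0
while len(pool) < 9000:
    copies = rng.choice([1, 1, 1, 1, 2, 2, 3])
    pool.extend([f"name{name_id}"] * copies)
    name_id += 1
pool = pool[:9000]

lo, mean = simulate(pool, 934, sims=1000)
print(lo, round(mean, 1))  # the minimum stays well above zero across all simulations
```

Even with this crude pool, no simulation comes anywhere near zero, which is what makes an observed count of zero so striking.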

The CON overnight windows tell the opposite story. More collisions than expected across several nights, consistent with resubmission, common names appearing organically, households submitting together. The kind of messy that real participation produces.

What This Means for the Dataset

The argument that duplicates make the dataset unreliable cuts in exactly the wrong direction. The PRO overnight batch from February 20th is anomalous precisely because it has too few duplicates, not too many. A genuine sample from a real community, one that includes people named John Smith and people who hit submit twice, does not produce zero collisions in 934 draws. For all practical purposes, that outcome is statistically impossible.

Raw duplicate counts, without correcting for population name frequency and sample size, are not a meaningful metric. The legislature is being asked whether these sign-in totals reflect genuine public sentiment, and that is a statistical question with a statistical answer. The answer is not “the dataset has duplicates, therefore we cannot know.” The methodology was built specifically to separate expected duplication from anomalous duplication, and the findings hold.

Discarding the dataset because it contains duplicates is not data analysis. It is avoiding data analysis.

None of this is perfect. IP address analysis would not be definitive because VPNs, shared connections, and mobile carriers complicate attribution. The collision test rests on a population model that is an estimate, not a census. The rapid pairs chart fits a trend to noisy data. Statistical inference is always probabilistic, and anyone who tells you otherwise is selling something.

But the question legislators are actually asking is not whether this dataset is perfect. It is whether the sign-in totals are a reasonable signal of public sentiment, and whether the anomalies identified are significant enough to warrant skepticism about specific windows. For that question, the methodology does not need to be perfect. It needs to be fit for purpose.

A 10:1 ratio that survives deduplication, symmetrical trimming, and a collision test that was explicitly designed to tolerate legitimate duplication is a robust signal. The PRO overnight Feb 20 anomaly does not need to be proven beyond a reasonable doubt to be disqualifying for that window. The standard here is not a criminal conviction. It is whether legislators can treat the aggregate numbers as a directional guide to constituent sentiment. On that standard, the analysis is more than sufficient.

On Impersonation

Named officials discovering their identities appeared in the dataset without their consent is a real incident worth investigating. But the sign-in system was never designed to verify identity or attribute positions to specific individuals. Names are collected not to create a record of who voted, but because a completely anonymous system would be trivially manipulable. A name field is the minimal friction that makes aggregate analysis possible at all.

The relevant question for legislators is not “did John Smith actually sign this?” but “does the distribution of sign-ins reflect genuine public sentiment.” This is a survey mechanism, not a ballot. Washington has 7.8 million residents. Even a perfectly clean dataset with 100,000 CON sign-ins represents a small fraction of the population. Legislators have always understood these numbers as a directional signal, not a binding count. Treating impersonation as the central finding, rather than asking whether the aggregate signal survived manipulation, mistakes the instrument for the measurement.

The numbers behind the impersonation claim deserve scrutiny. Invest in Washington Now reported roughly 100-200 confirmed cases across 123,289 records, less than 0.2% of the dataset. Even tripling that estimate to account for unreported cases does not move a 10:1 ratio in any meaningful direction. And if you apply their own deduplication logic symmetrically, removing every name that appears more than once from both sides, CON drops from roughly 110,000 to 91,000 and PRO drops from roughly 10,000 to 9,000. The ratio is still 10:1. Their argument, applied consistently to both sides, does not change the conclusion.
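The arithmetic is worth writing out, using the rounded figures from the text:

```python
# Rounded figures from the post, before and after symmetric deduplication.
con_raw, pro_raw = 110_000, 10_000
con_dedup, pro_dedup = 91_000, 9_000

print(round(con_raw / pro_raw, 1))      # 11.0
print(round(con_dedup / pro_dedup, 1))  # 10.1 — still roughly 10:1
```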

Those confirmed cases were identified because victims self-reported. Public officials monitor mentions of their names, noticed the discrepancy, and came forward. That is the easiest fraud to find. It tells you nothing about what the rest of the dataset contains. Self-reported impersonation is the floor of what happened, not the ceiling, which is precisely why aggregate statistical analysis exists.

It is also worth considering what those confirmed cases likely represent. Some are probably legitimate resubmissions. Someone signed in, was not sure it worked, signed in again, and now appears twice. Some are probably trolling. Actual coordinated impersonation may be in there too, but the self-report mechanism cannot distinguish among the three. Treating 200 high-visibility cases driven by public figures monitoring their own names as representative of the full 123,000-record dataset is not a statistical argument. It is a press conference.

So What Does All of This Mean?

The answer to that is simple. The dataset has duplicates. The timing raised questions. Some names were submitted without consent. None of those observations, examined carefully, change what the data shows: roughly ten Washington residents opposed this bill for every one who supported it in committee. That signal has survived every test applied to it.