Why FIPS 140 Means Running Old Code

Leave a reply

You need to use FIPS 140 because of compliance, but have you ever asked what that requirement is actually for? What security properties are the authors of these policies trying to achieve?

In high-assurance deployments, the practical goal is usually to establish a meaningful security boundary around cryptographic keys. Organizations want explicit controls over who and what can use a key, and they do not want the answer to be every application or administrator with access to the host. They are also worried about key theft and abuse. For important signing and decryption keys, keeping the key out of the hands of the application and host OS is often the simplest way to force reasonable key-protection practices.

These are real problems. When the threat group Storm-0558 acquired a highly sensitive Microsoft MSA signing key, operational failures allowed key material to escape the isolated signing environment and become accessible from a compromised engineering environment. That single extraction let the attackers forge tokens and compromise customer email accounts at scale. It is much harder to see that happening when a key is non-exportable and managed inside a hardware boundary. HSMs are not the only way to get these properties, but they are the one tool that forces you to think hard about how you operationalize a key, and that discipline has value.

The trade-off is that you end up running old software you cannot patch for upstream security vulnerabilities. In the best case, you are years behind.

Worse, this is usually non-memory-safe code that is entirely opaque to you. The firmware, the middleware, and the technical documentation are kept strictly behind lock and key by the vendor. You cannot inspect the code to see if it is vulnerable, and independent review is virtually impossible. The HSM vendor may have proactively patched those vulnerabilities, but more likely they have not.

The cryptography in these modules almost never breaks. What breaks is the plumbing. For example, when an HSM takes in a payload like an administrative command or an authentication token of some sort, it has to parse it. This may mean relying on complex parsers or business logic, often written in aging C code. Because that plumbing is not memory safe, a single malformed input can lead to a buffer overflow or remote code execution right past the validated boundary, or even worse, a long-forgotten feature combined with new code could bypass policy controls around accessing the module altogether.

Take the U-Boot forks as another example. When you look at how these embedded systems are actually built, they lean heavily on bootloaders, embedded operating systems, and vendor firmware that sit outside the cryptographic boundary but inside your trust story. A vendor might fork U-Boot, validate their module, and then that code is essentially frozen.

Think about what boot verification actually involves. Something has to parse the firmware image, figure out which bytes are covered by the signature, compute the digest, and check the signature. The math for the signature might be FIPS validated, but the parsing, the offset arithmetic, and the handling of attacker-controlled structures is all just non-memory-safe C code.

When researchers find vulnerabilities in X.509 parsing or U-Boot image processing, the flaws are not in the cryptography. They are in the plumbing. And because vendor version strings in this opaque, locked-down firmware often have no meaningful relationship to upstream release numbers, that fork may be a decade old.

The real danger is that two curves are moving in opposite directions. The attack surface is being examined continuously. AI and automated variant analysis are getting exponentially better at finding previously unknown exploitable memory-safety bugs in old C code. Meanwhile, the validated base firmware remains essentially fixed.

We do not have to guess about this. I recently published a data project analyzing recent FIPS 140-3 validations. The data shows how frozen these systems actually are. Out of 415 validated modules we looked at, 324 of them, 78 percent, showed no recorded public update after their initial validation. The knowledge of security issues compounding every week is colliding with code that barely moves.

FIPS 140 compliance buys you necessary domain separation and forces good operational habits. But the way the market achieves that separation often leaves you depending on a static, aging, and opaque codebase. The certificate tells you that a particular version met the requirements when it was evaluated. It does not tell you that the same code remains secure years later. At some point, the certificate becomes evidence not only of what was validated, but of how long the underlying system has stood still.

The Certification Ends Where the Code Begins

Leave a reply

Disclosure: I am an advisor to Binarly.

I recently built the FIPS 140-3 Corpus, a dataset that pulls together the public record of FIPS validations. It combines CMVP certificate records, Security Policies, implementation details, operational environments, firmware versions, algorithm claims, and lifecycle data into something you can actually query and analyze rather than read one certificate at a time.

I built it because I have spent enough years around certification programs to know that the interesting information is rarely in any single document. It emerges when you look at the record as a system. Once you do, a pattern shows up that I think deserves more attention than it gets. The public evidence tells you a great deal about what was evaluated and almost nothing about whether the code that shipped actually behaves the way the evaluation assumed.

What the paper trail shows

When you read Security Policies in bulk, you start seeing the same dependencies over and over. Validated modules lean on bootloaders, embedded operating systems, update agents, and vendor firmware that sit outside the cryptographic boundary but inside the trust story. The certificate covers the module. The security property depends on everything around it.

U-Boot is a good example. Several modules in the corpus disclose it as part of their firmware or boot environment, and in some of those cases it participates directly in verifying firmware integrity before execution. Think about what that verification actually involves. Something has to parse the firmware image, figure out which bytes are covered by the signature, compute the digest, check the signature, and then decide what to run. RSA and SHA-256 handle two steps in that sequence. The parsing, the offset arithmetic, the decision about which fields are authenticated and which are attacker controlled, all of that is ordinary C code, and it is exactly where things tend to go wrong.

Binarly’s researchers recently published Unfit to Boot, which found previously unknown vulnerabilities in U-Boot’s FIT image processing and signature verification path. The flaws were not in the cryptography. They were in the handling of attacker controlled structures before and around the verification operation. The math was fine. The plumbing was not.

This is the oldest lesson in applied cryptography and we keep relearning it. The primitive is almost never the weakest link. The code that feeds the primitive is.

Where the evidence runs out

Here is where it gets uncomfortable. A Security Policy might identify its bootloader with a string like CNN35XX-UBOOT-4.03-03. That tells you a vendor U-Boot derivative is present. It tells you almost nothing else. Which upstream revision was it forked from? What did the vendor change? Which FIT features were compiled in? Were the fixes for known parsing flaws ever backported? Can externally supplied firmware even reach those code paths in this product?

None of that is answerable from the certification record. Vendor version strings in embedded firmware often have no meaningful relationship to upstream release numbers. The fork may be a decade old. The fixes may have been applied selectively, or renamed, or lost in a rebase nobody documented.

Conventional software composition tools do not close this gap either. They work by matching. Filenames, manifests, version strings, hashes, YARA rules, CVE mappings. That approach answers a useful question, namely whether a binary appears to contain a component already known to be vulnerable. Firmware defeats it routinely. Dependencies get statically linked into larger executables, symbols get stripped, and vendor forks drift far enough from upstream that the signatures stop matching anything.

And matching cannot help with flaws nobody has found yet. Before the Unfit to Boot research existed, there was no CVE to map, no affected version range, no signature to match. Someone had to go look at the implementation first. Credit to the Binarly team for doing that work, and disclosure noted, but the point stands independent of any vendor. Until somebody examines what actually shipped, every downstream tool, database, and compliance process is working from an empty record.

Why I care about this for HSMs and BMCs

The corpus is full of devices that sit in unusually trusted positions. HSMs hold the keys for certificate authorities, payment systems, and governments. BMCs sit beneath the host operating system with control over firmware updates, recovery, and remote administration. I have spent much of my career depending on the first category and being quietly worried about both.

These devices are exactly where the paper trail is weakest. They accumulate long lived vendor forks, inherited open source components, proprietary parsers, and hardware specific code written over many years, most of it statically linked and distributed only as compiled firmware. HSM firmware makes the visibility problem even worse. It is almost never publicly accessible, and on the rare occasion you do get an image, it is often encrypted or obfuscated, frequently with a key shared across the product line. Whatever that design accomplishes, it means customers and independent researchers see less of the code than a motivated attacker willing to recover the key. So we end up trusting these devices on the strength of certifications that, as the U-Boot example shows, stop well short of the code paths where real failures happen.

That does not make the certifications worthless. It makes them a starting point. A validation record that discloses a U-Boot derivative in the boot chain has handed you a concrete question to ask your vendor. What evidence supports the claim that your product is unaffected by this class of flaw? Has anyone analyzed the released binary, or is the answer derived from a spreadsheet of version strings? Which fields of an incoming update are actually authenticated before any code touches them? Vendors who can answer those questions with evidence are telling you something important. So are vendors who cannot.

Documentation, inference, evidence

The way I think about it, assurance comes in layers and each layer answers a different question. The certificate tells you what was evaluated and under what assumptions. The corpus connects those artifacts across the whole ecosystem and exposes the shared dependencies and recurring architectures the individual documents obscure. The final layer is evidence about what actually shipped, and it can come from several places. Vendors tracking their forks against upstream and documenting backports. Independent analysis of released binaries. Researchers publishing the failure modes of mechanisms everyone assumed were sound.

Most of the industry stops at the first layer. Procurement checks for the certificate, the checkbox gets ticked, and the boot chain full of forked bootloader code goes unexamined until someone publishes research like Unfit to Boot and everyone scrambles to figure out whether they are affected.

I built the corpus to make the second layer easier, so that the public record can generate the right questions instead of just decorating RFP responses. The questions are the point. A validation record that names a bootloader fork should end in a conversation with the vendor about what is in it, not in a filed PDF. Trust in labels got us the last twenty years of firmware security. Trust supported by evidence is going to have to get us the next twenty.

Steve Jobs, AI, and the Problem of Analysis Without Ownership

Leave a reply

There is an old Steve Jobs clip from a 1992 MIT Sloan talk that feels newly relevant in the age of AI. In the talk, available here as Steve Jobs MIT 1992 Lecture, Jobs is asked about consultants. His answer is not that consultants are unintelligent or useless. His criticism is more subtle. He says consultants often get to see a lot, analyze a lot, and recommend a lot, but they do not stay with the work long enough to own the consequences.

They do not spend years living with the product, the team, the tradeoffs, the mistakes, the customers, the budgets, the bugs, or the recovery. They may see the fruit, as Jobs put it, but they “never really taste it.”

That distinction matters.

There is a kind of knowledge that comes from observation, and there is a different kind of knowledge that comes from ownership. Observation can make you articulate. Ownership makes you careful. Observation helps you describe what should happen. Ownership teaches you what actually happens when a recommendation meets constraints, incentives, politics, timelines, systems, and human behavior.

There is also a kind of knowledge that only comes from time.

Some problems cannot be understood in a single sitting. You need to carry them around for a while. You read, step away, come back, notice what still bothers you, test a different framing, sleep on it, and then see the thing that was hiding in plain sight. That kind of soaking is not inefficiency. It is often how judgment forms.

That is the parallel to AI.

AI is making analysis abundant. It can read more than we can read, summarize faster than we can summarize, find patterns across larger datasets, generate plausible options, and produce recommendations that sound polished and confident. That is useful. But it is not the same as judgment.

Used poorly, AI becomes consulting at machine scale. It is fast, articulate, and superficially impressive, but disconnected from whether its recommendations actually survive contact with reality.

It can say what should be done without knowing what happened after someone tried to do it. It can identify risks without understanding which ones mattered. It can produce a roadmap without living through the missed dependency, the customer objection, the policy constraint, the budget cut, the migration failure, or the second-order effect six months later.

It can also make us confuse speed of response with depth of understanding. That may be the deeper risk. AI can collapse the slow work of thinking into the first plausible answer. It can make a problem feel resolved before we have really spent time with it. It can produce fluency before we have earned conviction.

That does not make AI useless. It makes the design problem clearer.

The lazy version of the AI story is the self-driving car analogy, namely once the machine becomes safer, faster, or more consistent, the human gets pushed out of the loop. There will be domains where that is true. But much of knowledge work is different. The goal is not only to execute a task correctly. The goal is to understand the problem well enough to make better decisions the next time.

Execution tools can displace. Reasoning tools should compound.

That is why the most interesting promise of AI is not simply that it becomes a better consultant or even a better operator. It is that AI can help humans become better operators.

Used well, AI becomes a way to think with the material. It helps us understand datasets that are too large to hold in our heads. It lets us explore problem spaces from more angles. It helps test assumptions, compare interpretations, surface edge cases, and ask better questions. It can show us patterns we would have missed, but the value is not just the pattern. The value is that, through the process of interrogation, we understand the problem more deeply ourselves.

AI should not shorten our attention so much as deepen what our attention can hold.

A good AI system should help us return to a problem with more context than we had the last time. It should preserve the questions we asked, the assumptions we tested, the contradictions we found, the evidence that mattered, and the places where our understanding changed. It should make it easier to spend real time with the problem, not merely produce an answer faster.

In that sense, the best use of AI is not instant certainty. It is structured patience.

It lets us soak in a problem more effectively. By that I mean it enables us to hold more evidence in view, revisiting prior interpretations, comparing today’s answer to yesterday’s uncertainty, and gradually turning analysis into understanding.

The goal should not be to outsource judgment to AI. The goal should be to use AI to improve the conditions under which judgment is formed.

A good AI system should not merely say, “Here is the answer.” It should help us see why the answer might be true, where it might be fragile, what evidence supports it, what alternatives exist, and what would change our mind. It should help us move from diagnosis to action, from action to feedback, and from feedback to learning.

That is the line between AI as consultant and AI as learning partner.

This is also where many AI products will disappoint. Dashboards full of findings, risks, summaries, and recommendations can look impressive while still leaving the actual burden on the human team. They create the appearance of progress without necessarily improving understanding. The human still has to decide what matters, translate the finding into action, make the change, verify the outcome, and remember the lesson later.

The point is not that findings and recommendations are useless; they are necessary. Findings are the beginning of the loop, not the end of it. Systems that stop there are not doing judgment automation. They are doing analysis transfer.

The more interesting systems will close the loop. They will connect analysis to execution, execution to verification, and verification to institutional memory. Not because humans should be removed from the process, but because humans should be able to reason from a better substrate.

This is where Jobs’ point lands today. The scarce thing is not access to analysis. AI will make analysis abundant. The scarce thing is accumulated judgment, something that only comes from acting, observing, correcting, and learning over time.

Observation gives you language. Ownership gives you consequence. Time gives you depth. Feedback gives you judgment.

Jobs’ critique of consulting was not just a warning about consultants. It was a warning about any tool, process, or person that gets rewarded for sounding right without having to live with whether they were right.

AI will be most valuable not when it becomes the smartest consultant in the room, but when it helps teams build judgment faster, seeing more, acting sooner, sitting with the problem longer, verifying outcomes, and remembering what reality taught them.

The future of AI in knowledge work should not be analysis without ownership. It should be ownership made smarter.

The Breaker, the Priest, and the Philosopher

Leave a reply

Spend enough years in security and you notice that the people whose judgment you actually trust are rarely the ones with the cleanest credentials.

They are the ones who have been wrong in public often enough to develop taste. Their authority is earned backward, from scars rather than definitions. When they look at a scheme and say, no, that is wrong, and here is the deeper reason, they are not deriving it from first principles. They are recognizing a shape they have been cut by before.

That is worth taking seriously. In security you get little standing to philosophize until you have shipped something, broken something, defended something, or watched something fail. The person who starts from what is identity, or what is trust, but has never lived with the consequences of an answer, barely exists as a respected type. The people who carry real philosophical weight almost always came up through contact with failure first.

So the security philosopher is not the pure theorist. The philosopher is what a survivor of failure becomes once the failures start to rhyme and a reflective habit sets in. Most often that survivor is a breaker, because breaking is the most direct contact you can have with the gap between how a system should work and how it does. But breaking is not the only contact that leaves marks, and that turns out to matter later.

Everyone who matures this way is answering one question, whether or not they say it out loud.

Why does this keep happening?

You earn the right to answer by watching it happen enough times that fixing the bug stops feeling like an answer. Whatever conclusion you settle on is what you turn into. Some decide people do not understand the systems deeply enough. Some, that bad claims go unchallenged too long. Some, that security is downstream of engineering. Some, that the bytes are downstream of institutions and power. Some, that the abstractions themselves are broken. Some, that the failure is intrinsic and the only honest response is to keep hunting.

Those answers create the taxonomy. The genus is philosopher. The species are sorted by the answer each one gives, and by the way each one tries to make that answer true for other people.

The Sage

The Sage believes it keeps happening because nobody understands the systems deeply enough.

The Sage transmits by instantiation rather than argument. The worldview gets built into a tool, written into a book, embedded in a way of working, and you absorb it by use. A fuzzer can teach an entire philosophy of bug finding. Coverage becomes the thing worth chasing. The tool makes the argument every time it runs, so nobody has to persuade you in a thread. The position is already standing in the room, made of working code.

The Sage is contemplative, the monk of the genus. The work is not quiet because it is timid. It is quiet because it expects reality to do the teaching.

The Gadfly

The Gadfly believes it keeps happening because people are wrong in public and nobody corrects them.

The Gadfly’s philosophy exists only in motion. It lives in the thread, the argument, the review comment, the refusal to let an incorrect claim stand. This is Socratic in the original and irritating sense. You learn what is true by watching the argument refuse to die.

The Gadfly may share the Sage’s diagnosis exactly. People do not understand the system. But the method is the opposite. The Sage builds, the Gadfly fights. Both believe misunderstanding is the enemy. They differ on whether the cure is a tool or a wound.

The Builder Evangelist

The Builder Evangelist believes it keeps happening because security is downstream of bad engineering.

This is the breaker who concludes the dramatic failure is only the visible symptom. The real incident happened earlier, in how software was designed, reviewed, deployed, owned, or forgotten. The answer is not more heroics. It is to change how teams build, with security folded into normal engineering life rather than bolted on as a gate at the end or a priesthood that arrives with findings after everyone has moved on. The insight only counts if it propagates into how people actually work.

The missionary energy is the tell. Converts always have it, and this is the convert’s slot. Philosophy that has to spread to count as true.

The Statesman

The Statesman believes it keeps happening because the bytes are downstream of institutions, incentives, and power.

This one did not go deeper into the stack. They went up, out of the assessment shop and into platforms, governments, trust programs, procurement, incident governance, the places where the adversary is sometimes the org chart and sometimes a nation. The wisdom gets cashed out in policy and in who sits at which table.

The risk of the type is floating clear of the actual bytes. The best Statesmen never do. They remember the policy is only real if it changes what happens at the machine, the credential, the incident bridge. The worst become fluent in altitude and lose contact with the ground.

The Theorist

The Theorist believes it keeps happening because the abstractions themselves are unsound.

This is the opposite vector from the Statesman. Where the Statesman goes up toward power, the Theorist goes down toward formalism. Weird machines. Exploitation as programming a machine nobody meant to build. Security as a subset of reliability. Trust as an operational claim, not a noun.

This is the species that comes closest to the academic register, but the ticket was still bought through breaking first. The formalism is trusted because the person doing it has felt the abstraction fail in their hands. That is the difference between earned theory and decorative theory.

The Refusenik

The Refusenik is the apex breaker who declines to metamorphose at all. Not from inability. From principle.

The refusal is itself an answer. It keeps happening because failure is intrinsic to systems of any real complexity, and pretending a framework or a doctrine can end it is the deeper error. There is no theory waiting at the top of the climb, only the next finding. So the Refusenik stays the predator and calls the philosophizing a comfortable retreat from the only thing that is real, the work in front of them.

This is the lower bound of the taxonomy. It proves that becoming a philosopher of the discursive kind is a choice, not an inevitability, and it does so by holding a real position rather than an empty one. Some of the best who ever lived remain here permanently, and they are right to.

The Priest

The Priest is the auditor, the compliance keeper, the custodian of the framework. The Priest is the scandal of the taxonomy, but not for the reason the field assumes.

The reflexive complaint is that the Priest arrived without breaking. No public exploit, no system torn open, no credential earned the old way. By the breaker’s accounting, no scars at all. That accounting is the actual error.

Much compliance really is theater. Much audit work mistakes evidence for reality. Much framework worship trains people to pass inspections while risk keeps moving underneath them. None of that is in dispute. But at scale the Priest is often the only force keeping an organization’s vital signs visible. Compliance read generously is not theater. It is a pulse, one of the few observable ways to ask whether the organization is doing the things it claims to.

And the Priest does have scars, just not the kind the breaker recognizes. The Priest learns by watching organizations lie to themselves in patterns, watching the same control fail the same way across a dozen audits, watching risk reappear in process and ownership long after the technical finding was closed. That is sustained contact with failure. It leaves marks. The breaker simply does not read them as marks, because they did not draw blood the familiar way.

So the scandal is not that the Priest skipped the initiation. It is that the breaker cannot see the Priest’s scars as scars. Seeing them requires fusing two diagnoses that rarely live in the same person. The Builder Evangelist says automate or drown. The Statesman says the real system is organizational health. The generous read of the Priest requires both at once, and most people who came up breaking cannot hold both, so they read the Priest as the enemy.

Sometimes they are right. Sometimes they are only defending their own credentialing system.

The seam in the taxonomy

Two forces are doing the work here. One is diagnosis, the answer to why this keeps happening. The other is temperament, how you try to transmit that answer. The clean version of the theory says these collapse into one, that the way you transmit is downstream of what you concluded and who you are. The Sage builds because he decided depth is the problem and because he is contemplative. The Gadfly fights because he decided public error is the problem and because he cannot leave it alone.

But the Sage and the Gadfly may share the same diagnosis. If they do, they are one species wearing two faces, and the real joint is diagnosis, with transmission as a surface effect. The alternative is that transmission is itself fundamental, that how you choose to make a thing true for other people is a deeper fact about you than the proposition you are trying to make true.

I do not think this is settled, and I am not sure it should be. A taxonomy that resolved it cleanly would claim to know which of two people who believe the same thing is the more serious, on the sole basis of whether they build or fight. That is a claim worth resisting.

The point

Security does not really trust credentials. It trusts scars. That instinct is mostly healthy. It keeps empty abstraction out of the room and gives weight to people who can smell a failure coming rather than just describe one.

But every credentialing system has a blind spot, and the breaker’s is believing that only breaking confers standing. The whole taxonomy is one argument against that belief, because every species on it earned its authority through a different kind of contact with failure.

The breaker learns by cutting into systems.

The builder learns by watching teams repeat the same mistakes.

The statesman learns by watching incentives defeat correctness.

The theorist learns by watching abstractions collapse.

The priest learns by watching organizations lie to themselves in patterns.

The refusenik learns by never looking away from the hunt long enough to be comforted by a story about it.

Five are reflective and one refuses reflection, but all six are forms of earned contact, and none is the only one that counts. What you become is the answer you settle on for why this keeps happening, and the way of making it true for others that you cannot help but reach for.

The breakers who refuse the question stay hunters. The keepers who never broke anything are mistrusted for it, sometimes fairly and sometimes not. Everyone else becomes a philosopher of one species or another, whether or not they would accept the word.

The only real question is which one you became.

The Prompt Is an Argument

Leave a reply

The prior piece made a narrow claim. The prompt is the record, because a system can only act on what reaches it. Intent that stays in your head does not govern anything. Context that never reaches the model does not constrain anything. Purpose that is not in the prompt, the retrieved material, the tools, the policies, or the examples is not part of the decision environment.

If you accept that, the next question is unavoidable. What kind of record should a good prompt be?

For goal-oriented work, it should be an argument. More precisely, it should be enthymematic, an argument that leaves its obvious premise unstated for the audience to supply.

The form is Aristotle’s. He called the enthymeme the body of proof, the strongest of rhetorical proofs, a kind of syllogism with a premise left out. Its strength comes from the omission. Stating what the audience already accepts is tedious, while assuming the right premise makes the conclusion feel almost self-evident. That power holds on one condition. The missing premise has to be one the audience can already supply.

A capable model supplies a premise readily. It does not reliably supply yours, and from the output alone you cannot tell which one it used. Where your premise and its default mostly agree, the gap is cheap. Where the wrong one is costly, or where you have to reconstruct the reasoning later, it is not.

Look at a bad prompt with that in mind.

“Write about DevOps automation.”

That is a topic, not a goal. The model can go anywhere. CI/CD, infrastructure as code, job displacement, Kubernetes, a bland survey that explains nothing. It did not disobey. The prompt simply did not contain an argument for it to advance, so it advanced none.

Now the same subject with the premise restored.

“Explain why DevOps automation is becoming more viable now that production systems can be modeled, tested, and validated in synthetic environments.”

This carries a claim the first version left in your head. Automation gets safe where results can be checked. Coding agents became useful because code has feedback loops. It compiles or it does not, tests pass or fail, the type system objects. As production systems become modelable and testable, operations starts to acquire the same machinery, so it starts to look more automatable for the same reason coding did. You could not spell all of that out if you tried. The context behind any goal is unbounded, so every prompt is already a compression, and the only real choice is what to keep. Keep the load-bearing premise and drop what the model already handles. Here that premise is the causal claim, present enough that the model knows which answers would count as success, the one it would otherwise have to invent.

That is the difference between a task prompt and a goal prompt. A task prompt says do X. A goal prompt says do X in service of Y, because Z is the relationship that matters. The “because Z” is where most prompt failures live.

People leave it out because they assume the model already holds the frame. They write as if it knows the business context, the adversary, the institutional scar tissue, the product strategy, the audience, the risk appetite, the standard of correctness. Then a plausible, fluent, useless answer comes back and surprises them. The model was not confused. It was unconstrained. It got a task without the premise that made the task mean anything.

Humans skip premises constantly and get away with it. A lawyer says “that creates a reliance problem” and the room knows the shape of the issue. An engineer says “that breaks rollback” and the team feels the operational cost. A security person says “that moves the trust boundary” and the people who have lived with the system know why it matters. The shorthand works because the history is shared. A model does not inherit that history unless you hand it over. It does not know which premise is obvious inside your company, which analogy is carrying the real weight, which part of the request is decorative and which part is load-bearing. It does not know that “make this clearer” meant keep the technical claim but cut the wording that lets a reader hear monocausality.

This is why writing a good prompt feels more like drafting than asking. A good lawyer does not write words that merely sound like the client’s intent. A good lawyer writes words that survive interpretation by someone else, later, under pressure, with incentives to read them the wrong way. The document has to carry its operative meaning forward once the author has left the room. A prompt carries the same burden into a system that completes patterns from whatever materials it has. Leave the wrong premise implicit and the model may complete the argument in the wrong direction. Omit it and the model substitutes a generic one. Supply several competing premises and the model optimizes for the one easiest to write about rather than the one that matters. That is how a prompt produces fluent nonsense. Not that the model cannot write, but that the prompt did not preserve the reasoning.

So before writing a goal-oriented prompt, ask one question. What has to be true for this output to be useful? Not what topic it should cover, what format it should take, how long it should run. What has to be true. The answer is usually the missing premise. For a product strategy it is often the market wedge. For a security review it is the attacker’s actual path. For an executive summary it is the decision the executive has to make. For a critique it is the standard the work should be judged against. For a rewrite it is the misunderstanding you are trying to prevent.

That premise does not have to appear as a formal sentence. It can live in the framing, in an example, in the acceptance criteria, in the source material, or in an instruction about what not to optimize for. It only has to be somewhere in the record. Otherwise the system is not helping you pursue a goal. It is producing text near a topic.

In a single prompt you can hold the whole record in view. A production system spreads it out. The effective prompt is no longer the user’s sentence. It is that sentence plus the system instruction, the developer instruction, the retrieved documents, the available tools, the policies, the examples, the memory, the model version, and the configuration around all of it. The argument is distributed across those layers, which makes enthymematic design more important rather than less. The question stops being what the user asked and becomes what argument the system received. Did retrieval supply the missing premise or omit it? Did the tool definitions encode the right standard for action? Did the policy layer quietly override the user’s goal? Did the examples teach the wrong pattern?

Those are governance questions. A system that cannot reconstruct its effective prompt cannot reconstruct the argument it acted on. And a system that cannot reconstruct that argument cannot explain why its output was reasonable or unreasonable, compliant or not, safe or merely plausible.

The enthymeme works when the missing premise is shared enough to be supplied safely. That is the one condition these systems do not meet on their own.

So the work is not to make prompts longer. It is to make them carry the right inference.

The missing premise will be supplied either way. The only question is by whom. You provide it, or you let the system invent it.

The Prompt Is the Meaning

Leave a reply

Why textualism, original public meaning, and AI governance all turn on the same uncomfortable fact: intent does not travel unless it becomes part of the record.

There is an old fight in legal interpretation about where meaning lives.

Intentionalists look for purpose. They ask what Congress meant to do, what the drafters were trying to accomplish, and what problem the law was meant to solve. Legislative history matters in this view because floor speeches, committee reports, drafter notes, and surrounding debate can reveal the intent behind the enacted words.

Textualists are skeptical of that move. They argue that the law is the text that was enacted, not the private intentions of the people who helped write it. The words are the law. The meaning is what the text would have conveyed to a reasonable reader, not what a motivated advocate can later reconstruct from a convenient committee report.

Originalists make a related move in constitutional interpretation. Original public meaning says the Constitution means what its words would have been understood to mean by the public at the time of ratification. Not what a drafter secretly intended. Not what a later judge wishes it said. The meaning is anchored in the text, the historical context, and the interpretive record available at the relevant time.

You do not have to be a textualist or an originalist to see the infrastructure point.

Once a system has to act on language, intent is not enough. Intent has to travel through something. It has to travel through text, context, rules of interpretation, and a record of what was available to the interpreter when the decision was made. Otherwise, “what I meant” becomes an after-the-fact story.

Anyone who has spent time debugging prompts has run into the same problem.

You thought you were telling the model to be careful. You were actually telling it to hedge every answer into uselessness. You thought you were asking it to be concise. You were actually removing the context it needed to be right. You thought you were giving it freedom to reason. You were actually giving it permission to invent.

The words on the page were doing work you did not realize they were doing.

That is the uncomfortable part of working with language models. A prompt is not what you meant. It is not the conversation you wish you had with the model. It is not the background assumptions in your head. It is not the thing you would have clarified if another person had looked confused.

A prompt is the artifact the system received, interpreted, and acted on.

That is why the analogy to textualism matters. In human organizations, we constantly rely on unwritten context. We rely on shared history, institutional memory, tone, relationships, and the ability to stop and ask, “wait, what did you mean by that?” Human communication survives ambiguity because humans have recovery mechanisms.

AI systems do not get those recovery mechanisms for free.

There is no hallway conversation where you explain that when you said X you obviously meant Y. No shared institutional memory unless it is supplied. No unstated assumptions unless they are embedded somewhere in the context. No ability to rely on “what everyone knew” unless what everyone knew made it into the materials the system was given.

The system only has the record.

This does not mean the model is a textualist judge. It is not. The originalist’s reasonable reader is a legal construct. A model is a versioned, probabilistic system operating inside a specific runtime. The same words can produce different behavior under a different model, a different instruction hierarchy, a different retrieval result, a different tool definition, a different policy layer, or a different configuration.

So the lesson is not that the model found the one correct meaning of your prompt. The lesson is that your intent did not travel.

Intent does not become operational just because it existed in your head. Context does not exist just because your team would have understood it. Purpose does not govern the system unless it is encoded in the materials the system actually sees.

This is the prompt engineering lesson people often miss. Prompt engineering is not merely a collection of magic phrases, though some prompt patterns work for model-specific reasons that are not obvious from the surface text. At its core, prompt engineering is drafting. It is the work of turning intention into operative language.

That is why it feels so much like legal drafting. You are not writing what you wish the system understood. You are writing the thing the system will act on.

In production AI systems, that “thing” is larger than the sentence the user typed.

The operative prompt includes the system instruction, the developer instruction, the retrieved documents, the tool definitions, the prior turns, the examples, the policies, the memory, the model version, the ranking logic that decided which facts were included and which were left out, and the configuration that shaped how deterministic or creative the output could be.

That is the effective prompt.

This is where context engineering starts to absorb prompt engineering. The hard problem is no longer merely finding better words to ask the model. The hard problem is constructing the interpretive environment in which the model can behave reliably enough, predictably enough, and accountably enough for the job it is being asked to do.

Legal interpretation has always depended on more than raw text. Textualists still need grammar, usage, canons of construction, dictionaries, historical context, and an account of the reader. Original public meaning still needs evidence of how words were used at the relevant time. Even the most text-centered theories need a record.

AI systems make that dependency operational.

What was the user prompt? What system instruction controlled it? What policy applied? What documents were retrieved? What facts were omitted? Which tool schemas were available? Which model ran? Which version of the surrounding system produced the answer?

These are not merely implementation details. They are the interpretive record of the system.

That is why the prompt is the meaning. Not because the model is a perfect reader. Not because the words have one stable meaning across all systems. Not because intent is irrelevant. But because the system can only act on what made it into the operative prompt. If something was not in that record, it was not available to govern the system’s behavior.

At that point the issue changes. It is no longer just an interpretation problem. It becomes an evidence problem.

For a toy prompt, failure looks like annoyance. The answer was too verbose. The model misunderstood. The prompt needed tuning.

For a production system, failure looks like a governance gap. The system made a decision, but nobody can reconstruct what it was asked, what it knew, what it was allowed to do, what policy constrained it, what model produced it, or which version of the surrounding context shaped the result.

If you cannot reconstruct the effective prompt, you cannot explain the output. If you cannot explain the output, you cannot evaluate whether the system behaved correctly. If you cannot evaluate whether it behaved correctly, you cannot govern it.

This is why prompts need version control. Retrieved documents need provenance. Tool definitions need change history. Policy layers need to be preserved. Model versions need to be tied to outputs. Evaluations need to capture not just the answer, but the context that made the answer plausible.

But preservation is only the first step. A record without evaluation is archaeology. A record without monitoring is trivia. A record without enforcement is a diary. Governance requires the record, but it also requires machinery that acts on it: tests, policy gates, drift detection, human review, incident analysis, and change control.

Otherwise, the organization has outputs but not governance.

A log tells you what happened. A record tells you what the system was asked to do, what it was allowed to know, what constraints it was operating under, and why the resulting behavior was plausible from the materials available to it.

Without that, accountability collapses into storytelling.

The team says the model was supposed to be careful. The prompt says “avoid unsupported claims,” but the retrieved material was stale. The policy says “follow the customer’s procedure,” but the procedure was missing from context. The audit says the system made a decision, but nobody can reconstruct the versioned bundle of instructions, documents, tools, model, policies, and configuration that shaped it.

At that point, you are not governing the system. You are narrating around it.

That is the deeper lesson from textualism, originalism, and prompt debugging.

Text is never just text. It is text plus an interpretive frame, text plus context, text plus a record of what counted as meaning at the time. In law, we fight about that because rights, obligations, and institutional power depend on it. In AI systems, we are turning that fight into software.

The prompt is not your intent.

The output is not self-explaining.

The effective prompt is the record.

You may not fully own the model. You may not own the provider’s policy stack. You may not own the training process, the safety layers, or the hidden machinery that shapes the output.

But you can own your side of the interpretive record. You can preserve what you supplied, what the system saw, what it was allowed to do, what it returned, how it was evaluated, and what changed afterward.

That is the infrastructure of accountable AI.

The prompt is the meaning because the prompt, properly understood, is the record the system acted on.

You either govern that record, or you inherit someone else’s explanation.

A CA That Produces Evidence, Not Promises

Leave a reply

In my last post I argued that high-assurance systems should stop asking to be trusted on the basis of institutional promises and start producing verifiable runtime evidence about what actually happened. This post is the worked example. A certificate authority built that way, what choices it forced, and what is and is not done yet.

When I was at Google I got to work a bit with the BeyondCorp folks. What most didn’t understand is that the BeyondCorp Google used internally was substantially different from the BeyondCorp they launched to customers. On Windows and Linux, the path was TPM-backed device credentials associated with each machine, turning possession of the laptop and an authenticated credential into another factor. The deployment timelines varied by platform, but the important point was the architectural shape hardware-bound device identity became part of the access decision.

You will notice I didn’t mention Macs. That’s because Apple, although it had a similar secure processor on its devices, did not give customers attestations over keys stored inside it. We could put keys in the Secure Enclave. We could not prove to a third party that we had. For years we tried to get Apple to change that. Eventually they did, and what they shipped was not what we asked for.

We didn’t get the ability to put arbitrary keys in the enclave under our control with attestation. We got a device-bound credential signed by Apple that let us verify, at enrollment time, that a request was coming from one of our devices. We used that as a bootstrap to enroll a short-lived credential that the OS stored and used for day-to-day authentication. Apple’s attestation answered the which device question. Our short-lived credential answered everything that came after.

That worked. The standardization piece is draft-ietf-acme-device-attest, an IETF working group document I co-author, which lets the same ACME flow carry an Apple Managed Device Attestation statement, a TPM key attestation, or a YubiKey assertion without the CA needing to special-case each platform. Apple’s adoption was what unlocked normalizing on a single way across the fleet to authenticate these devices.

That normalization is the relying-party side of the credential story. The hard-edge property, this key lives on this specific device, signed by a chain you can verify back to the manufacturer, is now what we expect from a workload, a laptop, a phone, a passkey, a SPIFFE workload identity. MFA was the first compensating control for the weaknesses in passwords and API keys and bearer tokens. Passkeys, SPIFFE, and certificate-based zero-trust segmentation are the structural answer. We replaced the secret-you-know with a key-you-hold and bound it to hardware where we could.

Google’s internal production systems run the same shape, with Titan chips as the foundational substrate for the devices that get on the network for internal and cloud workloads.

That shift is well underway on the side of the wire that uses credentials.

The side that issues them is still mostly software on a server with an API key and a SOC 2 report.

If issuance hasn’t kept up with what we now expect from credential holders, what would catching up actually look like? The previous post made the general argument. The CA should be asked to produce evidence, not to be trusted. This post is what that actually looks like when you build it. A certificate authority where the issuance path itself is the evidence, where the policy that fired is part of what was measured, where the key release is gated on the measurement of the binary asking for it, and where every issued certificate is accompanied by a portable bundle a relying party can verify against trust anchors they already hold.

The architecture is designed to run on both AWS and Google Cloud. On AWS, enclave-based deployments use Nitro Enclave attestations, while VM-level deployments can use NitroTPM-backed evidence for measured boot, instance identity, and workload state. On Google Cloud, Confidential VM deployments use AMD SEV-SNP or Intel TDX attestation for the protected execution environment, with Cloud HSM as the key custodian. Where the system needs VM identity, boot-state evidence, or platform posture outside the confidential-computing attestation itself, it can also use vTPM-based evidence.

That breadth is intentional, and I’ll come back to why. But the real load-bearing architectural choice is not which cloud, HSM, enclave, VM, or attestation primitive we use. It is the shape of the issuance system inside each trust boundary.

Two components, one evidence chain

The split is structural, not procedural.

In a single-binary CA, policy enforcement and signing authority are separated by code paths inside one process. A vulnerability in the policy evaluator is a vulnerability in the signing authority, because they share an address space. The compartments exist in the source code. They do not exist at runtime.

Here, the compartments are real. The architecture splits issuance into two attested components.

The registration authority receives the certificate request, resolves identity from authoritative sources, evaluates the issuance policy against the requester’s posture and attestation, and produces a signed authorization context. The signing oracle holds the path to the CA’s signing key and produces the signature. Each runs in a separate attested environment. Each is measured separately. Each has its own keys. Both are written in Go. Memory-safe in normal code, reproducible without ceremony, and a standard library that already covers most of what a CA needs.

The RA and the oracle are different binaries, with different measurements, with different keys, on different network endpoints, joined by mutually-attested TLS where each side has independently verified the other’s measurement before either will talk. A compromise of the RA does not give the attacker the signing key, because the signing key is not in the RA’s address space and is not reachable from the RA’s network position. A compromise of the oracle does not give the attacker the policy evaluator’s identity providers, because the oracle has no identity providers wired into it. The interface between them is narrow, signed, and replay-protected.

This is the cross-machine version of the kernel-userland boundary. The kernel does not trust userland’s claim that a syscall is authorized. It does the check itself, every time. The oracle does the same thing for the parts of issuance it can verify independently.

What this buys is a bounded blast radius on a compromised RA.

If an attacker takes over the RA, including the RA’s signing key, the damage is bounded only to the extent that the oracle requires independently verifiable evidence for the facts that matter. For profile authorization, key binding, replay protection, RA authorization, and certificate structure, the oracle can check those facts locally.

For domain control validation, the same is true only when the evidence is cryptographic or independently corroborated, such as DNSSEC-validatable DNS evidence or signed observations from independent multi-perspective validators. The current implementation does not yet do either of these things, but adding support for them is a straightforward extension of the model. Without that, however, an RA compromise can still become a validation compromise.

That distinction matters. The oracle does not make an RA trustworthy. It makes the RA’s assertions conditional. Where the RA presents verifiable evidence, the oracle can check it. Where the RA presents only its own statement about what it observed, the oracle can enforce structure, freshness, profile policy, replay protection, and RA authorization, but it cannot turn that statement into ground truth.

The architectural property is not that the oracle magically knows every fact the RA observed. It is structural separation plus independent verification wherever the fact is independently verifiable. Every other property the rest of this post will describe, measured policy, gated key release, per-operation attestation, portable evidence, depends on that boundary.

What I mean by evidence

Before walking through the attestations, it is worth being precise about the word evidence.

A raw measurement is a narrow statement, this binary had this digest, this key was generated by this HSM, this quote was signed by this platform, this public key matches the attested key. Those statements matter because they are the hard edge of the system.

But they are not enough to explain issuance. A certificate is issued because policy evaluated a set of facts and authorized a specific profile for a specific requester at a specific time.

So evidence here means the decision record, the raw attestations where they can be disclosed, the verifier results, the policy digest, the profile binding, the facts the policy evaluated, the signed authorization context, the oracle’s per-operation attestation, the custodian’s key-release evidence, and the transparency proof.

draft-ietf-acme-device-attest helps with the requester-side device and key attestation. The CA-side runtime evidence chain is still built from platform-specific TEE, TPM, HSM, KMS, and transparency-log evidence. The point is not that one standard covers all of it. The point is that the CA should preserve the evidence chain instead of collapsing it into a one-bit pass/fail result or a promise.

What each attestation lets you verify

Cross-checking only matters if the things being checked are concrete. Four attestations cross the issuance flow, each one produced by a different party, each one letting the next party check something specific. It is worth walking through what each one actually lets you verify before going further.

The client’s attestation

What the RA verifies: that the requestor controls the private key whose public half is in the certificate request, that the key lives on a specific piece of hardware, that the hardware has a manufacturer-vouched identity, and that the key was generated under conditions the policy can reason about.

The shape of the attestation changes between platforms. A TPM-bound key on a Windows or Linux machine, a Secure Enclave key on an Apple device, and a key in a YubiKey’s PIV slot each arrive in a different format with a different signing chain and different platform-specific fields. The function is the same. Each one carries a binding between the key in the CSR and a piece of hardware, an identifier for that hardware, the conditions under which the key can be used, and a certificate chain back to a manufacturer the policy can decide whether to trust.

draft-ietf-acme-device-attest is the wire format that carries any of these in the same ACME flow, so the CA doesn’t need to special-case each platform. What the policy sees in the end is the same five things regardless of which platform produced them. Does the requestor control the private key. Is the key on a piece of hardware. Whose hardware. Under what conditions can it be used. Does the manufacturer chain trace back to a root the policy will accept.

The RA’s environment attestation

What anyone holding the document can verify: that the RA ran a specific measured image inside an attested execution environment, that the cloud provider signed off on the measurement, that the workload identity or role is the one the deployment expects, and that the public key bound to that measured environment for the rest of its lifetime is the one in the document.

AWS Nitro, AMD SEV-SNP, and Intel TDX each produce a different document, but the load-bearing contents are the same: a measurement of what was loaded, an identifier for the platform that produced it, the operating context, and a vendor chain back to a root the relying party can verify.

Worth being explicit about what the document does not let you verify. It does not tell you who deployed the measured environment, who runs the cloud account, or what the operator intended to run. It tells you what was measured. The operator’s intent is a separate question, answered by the published image list and the policy that names which measurements are acceptable. A relying party who trusts the hardware vendor’s chain can verify the measurement. They still have to verify, separately, that the measurement is one they should accept.

The oracle’s environment attestation

What it lets the RA verify, during handshake: that the oracle the RA is about to send an authorization context to is running the published image, and that the signing key the oracle will use for its half of the mutual TLS is the one bound to that image at boot.

What it lets a relying party verify, after issuance: the same thing, plus one additional binding. The oracle produces a fresh attestation for every signing operation, and the attestation is bound to the certificate that operation just produced and to the RA on the other end of the conversation that authorized it. Where the RA’s boot-time attestation carries the long-lived public key the measured environment will sign with, the oracle’s per-operation attestation pins this specific certificate to this specific oracle measurement with this specific RA.

The difference between boot-time and per-operation attestation is the difference between “the box looked right when it started” and “the box looked right when it did the thing you actually care about.” Boot-time attestation is what most confidential-computing deployments do today. It tells a relying party the deployment was valid at startup. It tells them nothing about whether the deployment was still valid five hours later when an actual issuance happened. Per-operation attestation closes that gap.

The custodian’s attestation

The CA private key is not loaded by the operator. It is held by a custodian, an HSM with discrete-silicon attestation, a cloud KMS that gates use on attestation, or another measured execution environment. The custody model matters. In the HSM case, the CA key is non-exportable. The signing oracle never receives the private key. It sends a signing request to the HSM, and the HSM signs only when the relevant policy, authorization, and attestation conditions are satisfied. In the software-protected or KMS-protected case, the key, or the key-encryption material needed to use it, may be wrapped so it is usable only inside a signing oracle whose attestation matches a published image.

What the custodian’s evidence lets a relying party verify depends on that model. For an HSM-held key, the evidence is not that the key was released. It is that the non-exportable key was used by a particular custodian, under particular firmware, configuration, policy, and authorization conditions. For a wrapped software or KMS-backed key, the evidence can show that key material was made usable only because the oracle’s measurement matched a measurement on the published list. Either way, if an operator runs a different binary, however benign the reason, the signing path fails. The dragon’s teeth around the data center are still there. They no longer have to do the whole job.

For a concrete look at what one of these documents actually contains, rather than just what it proves, the Peculiar Ventures attestation library parses examples from each of these platforms, including the Marvell HSM attestation produced by Google Cloud HSM. An attestation without a verifier is a claim. With a verifier, it is something a relying party can act on.

Policy as mechanism, not promise

Reading all of those attestations means the policy that evaluated them has to be something concrete enough to read.

Today’s CAs publish a CP/CPS. The Certificate Policy and Certification Practice Statement is a document describing what the CA will and will not do. An auditor samples evidence once a year against the document. The document and the system that produces certificates are not cryptographically linked. The relying party trusts that the document describes the system. The auditor’s annual report is the closing of the loop.

Cedar policies are different. They are a domain-specific language with declarative semantics, written under version control, statically analyzable, and small enough to read in a sitting. The policy that fires inside the signing oracle is compiled into the measured binary. The digest of the policy travels in the evidence bundle that accompanies the certificate. A relying party can re-fetch the source at the named digest, read the rules, and decide for themselves whether the policy that authorized their certificate is a policy they accept.

The contrast that matters operationally is the one between policy-as-promise and policy-as-mechanism. A CP/CPS is a promise. The auditor verifies, by sample, that the practice resembles the promise. A Cedar policy compiled into a measured binary, with its digest in the bundle, is a mechanism. The relying party verifies, per certificate, that the rules that fired are the rules the CA published.

There is a sharp footgun specific to Cedar that is worth naming because the answer to it is part of the architecture. Cedar’s evaluator skips a policy that throws while accessing an attribute that was not present. The convenient result is that policies stay readable in the absence of optional context. The inconvenient result is that a forbid policy with an unguarded attribute access can silently drop, which is a fail-open. The lint at build time requires a has guard before any optional attribute access. A policy that would have failed open is now a build error. The defense-in-depth move is structural. The policy author cannot ship a fail-open by accident.

What this looks like in practice

An employee gets a new smart card from corporate IT in a sealed blister pack. They plug it in.

The enrollment client on the laptop sees a card it doesn’t know. It reads the card’s GlobalPlatform Card Production Lifecycle data and the card recognition data, learns this is a factory-fresh retail token from a manufacturer it has a trust root for, and verifies the card identity attestation back to that root. The card is genuine. It has never been provisioned. The enrollment client knows what kind of token it is looking at and what the policy says to do with one.

The client provisions the token. It generates a new keypair on the card under a policy that requires user PIN for use and marks the key non-exportable. The card produces a key attestation: a statement signed by the card’s manufacturer-installed attestation key asserting that this specific public key was generated on this specific token, has these specific usage constraints, and will never leave the hardware. The enrollment client builds a CSR for the new key and has the card sign it, which is the standard proof that the requestor controls the corresponding private key. It then packages the CSR, the user’s identity claim, and the card’s attestation into an ACME request and sends it to the RA.

That is the first link. The chain is going to have several.

The RA receives the request inside its measured execution environment. It verifies the CSR signature against the public key in the CSR, which proves the requestor controls the corresponding private key. It verifies the card’s attestation against the manufacturer chain. Yes, this is a genuine retail token from a manufacturer we trust. Yes, the key in the CSR is on the card. Yes, the usage constraints match policy. It resolves the identity claim against the corporate identity provider. Yes, this user exists. Yes, they are entitled to this credential type. Yes, their device posture matches. It evaluates the Cedar policy against the cross-product of the device, the identity, and the requested profile. Yes, all of it permits issuance. It builds an authorization context, signs it with the key bound to its measured environment, and sends it to the signing oracle.

The signing oracle receives the context inside its own measured execution environment, over mutually-attested TLS where both sides have verified the other’s measurement before they would talk. It does not simply believe what the RA told it. For facts backed by independently verifiable evidence, the oracle re-verifies that evidence against its own configured verifier. In this example, it re-verifies the card’s attestation, cross-checks the claims the RA made about the card, the requested profile, and the requester, validates the profile binding, the certificate type, the validity window, the structural invariants the profile requires, and confirms that this RA is authorized to ask for this kind of issuance. It rejects replays. Only then does it ask the custodian for the signing key. The custodian releases it because the oracle’s attestation matches the published image. The oracle signs. It produces a fresh per-operation attestation binding this certificate to this measured execution environment and to the RA that authorized it. The issuance is written to a transparency log that independent witnesses cosign.

The bundle that comes back to the enrollment client contains the card’s attestation that the key is on the token, the RA’s signed authorization context naming the identity, the profile, the policy digest, and the verifiers it ran, the oracle’s per-operation attestation binding the signature to a measured binary on a measured platform, the custodian’s evidence that the key was released only because the oracle’s measurement matched, the transparency log inclusion proof and witness cosignatures showing the issuance was published before the certificate was returned, and the certificate itself.

Every party in the flow did the logical equivalent of what every other party did. It verified upstream evidence against a manufacturer, platform, custodian, or witness root it already trusted, did its work, and produced its own evidence for the downstream party to verify. The bundle packages those attestations and proofs so the subscriber, or anyone the subscriber shares it with, can re-walk the chain end to end against the same trust roots, without having to take any single party’s word for it.

The card manufacturer’s root says the key is on the hardware. The chip vendor’s TEE root says the RA ran the measured image. The chip vendor’s TEE root says the oracle ran the measured image. The HSM vendor’s root says the key was released to a measurement that matched. The witness network’s cosignatures say the log is what the operator published, not a fork served to one relying party. A relying party who trusts each of those roots can verify the certificate’s basis of issuance from the evidence bundle, instead of relying only on the CA’s institutional promise.

The CA is not asked only to be trusted. The CA produced evidence.

What is built

The architecture above runs in preproduction today on AWS and Google Cloud.

On AWS, that means Nitro Enclaves for enclave-based issuance components and NitroTPM-backed evidence for VM-level identity, measured boot, and workload posture. On Google Cloud, it means AMD SEV-SNP and Intel TDX Confidential VMs for protected execution, Cloud HSM as custodian, and vTPM-based evidence where VM boot state or workload identity needs to be represented.

Classical ECDSA and post-quantum ML-DSA-65 (FIPS 204) hierarchies operate in parallel. ML-KEM-768 (FIPS 203) is the subject key for TLS key-exchange certificates. Cedar policy with the fail-open lint is enforced in the oracle. Per-operation attestation, evidence bundles, the custodian gating key release on measurement, mutually attested TLS between RA and oracle, end-to-end on both clouds.

Each trust domain signs from its own sub-CA, and classical and post-quantum issuance never share a key. Machine, machine-with-EAP, user, group, workload, smart card, TPM AK, and SSH all sit under separate sub-CAs; classical and PQ are separated within each family. A compromise of any single signing key bounds the damage to one family-and-algorithm slice. The architecture treats hierarchy multiplicity as security-domain separation, not as an algorithm-bridging side effect.

Profiles wired up today cover machine authentication including EAP-TLS, DNS-validated TLS server certificates, workload identity including SPIFFE-style URI identifiers, user and group signing and encryption, smart card and PIV logon, TPM AK bootstrap, and SSH user, workload, and host certificates. Each family that supports both has a classical and a post-quantum variant.

The platform breadth is there because no single TEE family fits every customer environment, and the architecture should not be hostage to one chip vendor or one cloud.

The 2029 problem

None of what I’ve described above is algorithm-bound. That matters because the algorithms are about to change.

In March 2026, Google’s Heather Adkins and Sophie Schmieg set 2029 as the target for completing Google’s migration to post-quantum cryptography. Google’s timeline matters beyond Google. They run Chrome and Android, and when they move, the WebPKI moves with them. CNSA 2.0 puts 2027 on software and firmware signing in National Security Systems and 2030 on general use. The CABF is working through its own timeline. Federal procurement requirements are already moving.

The CA infrastructure that exists today was designed for the snapshot of math problems that the 2029 transition invalidates. Every CA in production is going to be re-architected before it lands. The algorithms expire. The migration is not optional.

The transition itself is going to be heterogeneous. The classical-PQC X.509 path is going to run for a long time alongside what eventually replaces it. Merkle Tree Certificates — batched, transparency-native issuance with much smaller per-certificate overhead — are a likely part of the answer to ML-DSA’s signature size on the wire. The architecture above does not care which container format the certificate is in. The attested issuance pipeline, the custodian-gated key release, the evidence bundle, the transparency log — all of it operates on the issuance side. MTC issuance benefits from runtime evidence the same way X.509 issuance does, and the patterns in this post carry over.

A CA built on the runtime-evidence pattern does not cost more to deploy at the moment you are already rebuilding. It costs more only if you skip the rebuild, and skipping is not on the table. The hardware-anchored credential side of the wire has been arriving in production for a decade. The issuance side is the part still running on the old shape. The PQ deadline is the forcing function that makes the issuance side move. The choice is between rebuilding the old shape with new algorithms, and rebuilding it with the same discipline that the relying-party side has spent the last decade adopting.

Short lifetimes with ARI make the operational side tractable. Seven-day certificates with ARI-driven renewal turn the PQ migration from a flag day into a moving window. The fleet rotates without an emergency, without anyone touching a machine, because the CA can shorten the renewal window for specific machines or profiles whenever it wants to.

That is what the next CA looks like. It is not a different CA than the one I have been describing. It is the same one.

A CA Built for the Threat Model We Actually Have

1 Reply

This builds on earlier posts on what attestation actually proves, what confidential computing is and isn’t, and an honest accounting of the problems with the current generation of TEEs. None of those problems go away here. The argument is that despite those limitations, attestation is an important tool. Certificate issuance is overdue to use it.

Back in the 1990s, I was doing some consulting for DigiNotar, yes, that DigiNotar. They had CA facilities in a data center whose perimeter still had WWII-era anti-tank obstacles, large concrete barriers sometimes called “dragon’s teeth.” Of course, this was an artifact of the facility’s history, but data centers are designed from a security perspective with layers of physical protection, including barriers, mantraps, biometrics, individual vaults with cages, individual racks with their own locks and biometrics, cameras, and more. The threat of physical theft, destruction, or manipulation is exactly what these facilities are designed to mitigate.

When building a CA inside one of these facilities, we design yet another layer of protection. Administration networks are segmented from transaction networks, interconnects from supporting infrastructure, the issuance environment from the systems holding root keys. We add our own physical segmentation on top of that so we can build controls around multiple parties being necessary for the more sensitive operations, while still letting routine hardware maintenance happen on the schedule the SLA needs.

These are all useful and important things, but the reality is that CA key material is not likely to be physically stolen. It is more likely to be compromised from the outside. We solve these problems through design, not by writing more code.

Meaningfully measured code forces design upstream of the code itself.

A measurement of a monolithic blob proves almost nothing useful. A measurement that names a specific role, a specific security domain, and a specific assertion the verifier is supposed to act on, proves something. The roles, the domain boundaries, and the questions the verification has to answer have to exist before any code is written, or the attestation is a signature on nothing in particular.

In operating system design, we have similar problems. In naive systems, we load cryptographic keys into memory on a running, network-connected system, accidentally exposing ourselves to memory-disclosure bugs where a network attacker can steal keys. Heartbleed is the canonical example, but the class is what matters. We do this because it is simpler and faster, but it is also less secure. As systems designers, we address this by moving those keys out of the process of the network-connected application and into a different user context. That way, an attacker cannot simply get the network-connected service to dump memory. They have to get persistence and cross a kernel-enforced user boundary.

This is old wisdom. Least privilege and privilege separation exist because network-facing code should not also be the thing that controls the keys.

A parallel showed up in early cryptocurrency exchanges. Hot wallets were used as signing oracles because the design and deployment work needed to prevent that had not been done, and many of the high-profile compromises of that era trace back to that gap. The exchanges that survived learned to put boundaries between the wallet and the network. That boundary did most of the work. The dragon’s teeth around the building did the rest.

When third parties need to rely on external services operating in these environments, they often rely on auditors to attest that management assertions about operational practices are actually being followed. These assessments are usually performed by CPAs, not security specialists, which can limit their value. They also often rely on sampling a small portion of transactions to confirm that the controls being evaluated are being followed. That sample is drawn from evidence provided by the entity being audited, which is also the party paying for the audit.

All of the things discussed above help bring some minimal level of transparency and verifiability, but it is turtles all the way down, layer on layer, none of them reaching the runtime where the actual compromise happens. This is where confidential computing, and solutions like Private Cloud Compute, start to matter.

Policy changes meaning in this model. In the traditional assurance world, policy is a written promise. The CA publishes a CP or CPS, the operator commits to following it, and the auditor samples evidence to decide whether that promise was kept. In a runtime-evidence model, policy becomes part of the mechanism. A measured binary evaluates a specific policy, produces a decision, and the digest of that policy travels with the evidence. The shift is from policy as promise to policy as enforcement, from “trust us, this is what we do” to “this is the policy the measured system actually applied.”

Apple’s Private Cloud Compute is the worked example. PCC nodes attest to the binary they are running, refuse to do work for clients that cannot verify that attestation, and publish every production build for public inspection. The user’s device, not Apple, decides whether a given node is acceptable. That inversion, the relying party verifying the service rather than the service asserting to the relying party, is the part of the pattern that matters. The pieces are not new individually. The combination, at the scale Apple shipped it, proves the pattern is real. The third-party security reviews prove the architecture is serious. Attacks on confidential computing do not refute that point. They prove there is now a boundary worth attacking, measuring, and improving.

Apple is not the only proof point. Signal used SGX remote attestation for private contact discovery in 2017, with clients verifying that the enclave was running the expected open-source code. WhatsApp’s end-to-end encrypted backups use an HSM-based Backup Key Vault to keep recovery keys out of the ordinary service path, and that design was publicly reviewed by NCC Group. Microsoft’s Confidential Consortium Framework powers Azure Confidential Ledger. Different systems, different threat models, same direction of travel. High-assurance services are moving from institutional assurances toward runtime evidence.

What it looks like

Concretely, a CA built on the Private Cloud Compute pattern looks like this.

Issuance is split into two attested components. The first, the registration authority, takes the certificate request, resolves identity from authoritative sources, evaluates the issuance policy, and produces a signed authorization context. The second, the signing oracle, holds the CA private key and produces the signature. Each runs in a separate attested enclave, and each is measured separately. This means policy can evolve without re-measuring the key-custody component, and keys can rotate without re-measuring the policy component.

The policy layer matters here too. Each component is not just running code, it is making a verifiable policy decision before it acts. The RA decides whether the request is authorized and the identity evidence is sufficient. The signing oracle decides whether the RA, the request, and the authorization context are acceptable before it signs. The evidence does not just say which binary ran. It also says which policy that binary evaluated.

The two components do not trust each other because they are on the same network. They trust each other through attestation, mutually verified at every connection, and the signing oracle does not merely accept the RA’s conclusion. Before it signs, it independently verifies the RA’s attestation, checks that the authorization context is fresh, confirms that the request is bound to an allowed profile, and verifies that the policy facts asserted by the RA match the evidence presented to the oracle. A compromised RA, even one with its own signing key, does not get to mint an out-of-profile certificate, bypass attestation, or turn the CA key into a general-purpose signing oracle.

The CA private key is not loaded by the operator. It is held by a custodian, a hardware security module, a cloud KMS, or another enclave, and it is wrapped so that the custodian will release it only to a signing oracle whose attestation matches a published image. The list of acceptable images is small, public, and updated through a documented process. An operator who runs a different binary, however benign the reason, does not get the key. The dragon’s teeth around the data center are still there. They no longer have to do the whole job.

Both the RA binary and the oracle binary are built from public source and are reproducibly buildable. Anyone can rebuild from the published sources, compare their measurement to the one in the attestation, and confirm that the two match. This is the part of the model that makes trust mean something specific. Not the operator’s word, not the auditor’s snapshot, not the CA’s policy statement, but the build process and the published source. To verify what a particular issuance was actually done by, you would not need to be admitted to the data center. You would need a compiler.

Each issued certificate is accompanied by a portable evidence bundle, signed by the attested issuance system. The bundle names the binary that produced the signature, the attestation root that vouched for the binary, the RA policy decision, the oracle policy decision, the identity assertion the RA accepted, and the inputs the oracle independently verified before signing. A relying party who trusts the chip vendor’s attestation root can determine for themselves whether the issuance was performed by code on the published list, against the policy on the published list, by an RA that accepted the identity claim it claimed to accept. The CA is not asked to be trusted. The CA is asked to produce evidence.

None of this removes the HSM, the auditor, or the operator. The HSM is still excellent at the threat it was built for, and a custodian holding a key wrapped to an attestation policy is still doing HSM work under the hood. The auditor is still needed to attest that the published policy is sensible, that the source matches the binary, that the threat model is honest, and that the runbook is followed in the moments where attestation cannot help. The operator is still needed to run the infrastructure and respond when things break.

What changes is what they are asked to prove.

Today, a relying party mostly gets institutional assurances. The CA says it followed its policy. The auditor samples evidence and says the controls were operating. The operator says the production system was the one described. Those are useful assurances, but they are indirect. They do not let the relying party inspect the actual path between a request, a policy decision, and a signature.

A Private Cloud Compute style CA changes that. It turns the issuance path itself into evidence. The question is no longer only whether the CA says it followed the rules. The question becomes which measured binary evaluated this request, which measured binary signed it, which policy digest was used, which identity evidence was accepted, what validation methods were used during issuance, and whether all of that matches the public commitment the CA made.

When the source is open and reproducibly buildable, that evidence includes a hash of the code that made the decision and signed attestations about the runtime elements that went into that decision. When the code is not open source, third parties can come in and validate the source, the build process, and the correctness of the claims, as Apple did with Private Cloud Compute. The public hashes then let others verify that the code claiming to provide these guarantees is, in fact, the code that ran.

Open source is not magic, and the point is not faith in “many eyes.” The point is that this shifts the emphasis from betting on physical security and operational practice audits to secure system design and cryptographic evidence about what code actually ran and what it actually did.

That is the threat model mismatch, and it is not only a CA problem. We built the WebPKI around buildings, cages, ceremonies, HSMs, and audits because those were the tools we had. We did the same thing cryptographically. We built systems around the assumption that factoring large composites and solving discrete logs on elliptic curves were out of reach. Q-day changes that assumption. Runtime compromise changes the operational assumption just as fundamentally.

We apply the same instincts in any environment we want to call high-assurance. They still matter, but most of the failures we care about are not physical failures. They are logical, remote, operational failures in the runtime path. The rate of change makes that gap wider every year. Annual audits are retrospective, and between them systems change thousands of times, so what the auditor described is rarely what is actually running when a relying party sees a certificate.

Cryptography turns security problems into key-management problems. AI turns assurance problems into runtime-evidence problems. Once agents are making decisions, calling tools, and changing state, the question is no longer what policy you wrote or what control an auditor sampled. The question is what actually ran, what it saw, what boundary contained it, what policy constrained it, and what evidence survived execution.

A Private Cloud Compute style CA gives us a way to make that path visible, attestable, and independently verifiable. The same pattern applies wherever the gap between what we say a system does and what it actually does at runtime matters.

The First AI-Built Zero-Day Is Not the Interesting Part

Leave a reply

In the mid 90s I worked at a company called Cybersafe. Today it would get labeled an IAM/SSO vendor. What we actually built was a first-generation security platform: Kerberos, password management, PKI-based MFA, key management, host intrusion detection, and what would now be called zero trust access. The company failed for the usual startup reasons. People. Corporate Politics. Timing. The technology was a decade ahead of its market.

One debate from that period has stayed with me. As we expanded into host intrusion detection, the question of automated response kept surfacing. Could a system safely act on its own to contain an intrusion in progress? Drop a connection. Kill a process. Isolate a host. Nobody on the team could imagine a credible answer. The false positive risk was unbounded. The response itself could be weaponized. The rule sets were not trustworthy enough to delegate authority. We shipped detection and let humans make the call.

That debate has an answer now, and it is not the one we expected. Automation on the offensive side is not new. Worms, exploit kits, credential stuffing, and phishing infrastructure have been automated for decades. What is new is broad delegated judgment at machine speed, in the hands of people who do not have to worry about false positives because the blast radius is somebody else’s network.

What the report actually shows

The interesting question is not whether AI helped produce a zero-day. That was inevitable. The interesting questions are operational. What kinds of systems make bad machine judgment cheap enough to deploy at scale. What kinds of defensive systems are still pretending human review is the control boundary.

Google Threat Intelligence Group’s latest AI Threat Tracker report documents the first zero-day exploit that GTIG says it has high confidence was developed with AI assistance. The headline framing is technically correct. The specifics tell a more interesting story.

The exploit was a Python script that bypassed 2FA on an open-source web-based system administration tool. It required valid user credentials in the first place. The criminal group planned a mass exploitation campaign, and Google disrupted it through responsible disclosure to the vendor. GTIG identified the artifact as AI-developed because the code carried obvious tells. A hallucinated CVSS score. Textbook Python formatting. Detailed help menus. Educational docstrings characteristic of training data. The artifact still carried the seams of its production.

This is not the LLM failing at the hard part. The vulnerability itself is a real find. GTIG specifically notes that the 2FA flaw stems from a hardcoded trust assumption, a high-level semantic logic flaw of the kind that fuzzers and static analyzers tend to miss but that frontier LLMs can reason about by reading developer intent. The model did discovery work that previously required a competent human auditor. Where the operation broke down was in weaponization. The attacker shipped an artifact that still looked like a tutorial.

This is a familiar failure pattern showing up on the offensive side for the first time. Fluency reads as competence. The attacker trusted an artifact with hallucinated metadata and educational comments still attached because it looked like a real exploit, in the same way over-eager engineering teams hand agents production credentials because the agent sounded like it knew what it was doing. The criminals here got bitten by the same dynamic that has been producing outages and data loss in vibe-coded production systems for the last eighteen months. The substrate is doing some of the work of inviting the misconfiguration.

Hultquist’s thread on the report is hedged correctly. The importance is the trajectory, not this specific specimen. Pull the camera back and the rest of the report is more interesting than the lede.

Three things worth surfacing

APT45 sending thousands of repetitive prompts. The North Korean group has been observed using recursive prompting to analyze CVEs and validate proof-of-concept exploits at scale. That is the industrial-scale answer to LLM variance. Solve the quality problem by amortizing across volume, then have humans cherry-pick the outputs that survived validation. The same statistical strategy that makes modern fuzzing work, applied one layer up the stack. The model does not have to be reliable. The pipeline has to be cheap enough that unreliability does not matter.

CANFAIL and LONGSTREAM using LLM-generated decoy code. A Russia-nexus intrusion cluster has been deploying malware that uses LLM-generated code to conceal malicious functionality. GTIG documented LONGSTREAM containing 32 instances of code querying the system’s daylight saving status, repetitive benign-looking activity used to camouflage the malicious core. CANFAIL carries similar filler logic with LLM-generated comments self-describing the decoy blocks. The stylistic noise of LLM output is becoming the obfuscation layer. The verbose docstrings. The textbook structure. The over-explained variable names. These used to be tells. They are now camouflage. Any heuristic built on the AI-tell will start producing false negatives.

The wooyun-legacy skill plugin. A specialized GitHub repository is being distributed as a Claude code skill plugin that integrates a distilled knowledge base of over 85,000 real-world vulnerability cases from the Chinese bug bounty platform WooYun (2010 to 2016). This is the supply side of the same market. Skill packs are tooling. Tooling gets distributed. The economic logic for adversarial skill packs is identical to the economic logic for legitimate ones. Any platform hosting them inherits a familiar problem. App stores and package registries have been working through it for two decades. Making trust decisions at distribution scale about code from parties you cannot directly inspect.

Both sides are running on the same substrate

On the defensive side, Google is using Big Sleep to find vulnerabilities and CodeMender (Gemini-driven) to fix them automatically. The criminals are pulling from a model class indistinguishable from the one Google is running its defensive tooling on. Both sides have access to the same substrate. The differential collapses to data quality, harness sophistication, and discipline around permissions.

That last one is the part the 90s HIDS conversation did not anticipate. It is also the part that should be the least surprising. The controls discipline did not get easier because the platform got more capable. If anything the gradient got worse. A confused regex IDS in 1999 had a bounded action space. The rule set was enumerable. You could write down what it would do wrong. A confused agent in 2026 has whatever action space its credentials grant it, which in most deployments is more than it should. The fluency that made it easy to give the agent broad permissions in the first place is exactly the property that makes its failures look reasonable in the moment.

The race Hultquist refers to is real, and it has started. The race is not about model capability. Both sides are running models from the same vendors, often the same model. The race is about who has better-curated data feeding their harnesses. Who has stricter discipline around what their automation can touch. Who has the institutional memory of what happens when you delegate authority to a system whose judgment you cannot audit in advance.

The HIDS debate from the mid-90s got an answer. It came from the other side of the wire. Not because defenders learned how to trust autonomous judgment, but because attackers learned they did not need to. They could delegate broadly, externalize the blast radius, and let volume compensate for judgment. The defensive answer cannot be more vibes, broader credentials, and better prompts. It has to be the inverse. Narrower authority. Better harnesses. Replayable decisions. And institutional memory about what happens when fluent systems get mistaken for trustworthy ones.

AI Is Not Why They Are Cutting (Yet)

Leave a reply

Back in 2000, the rule of thumb at Microsoft was that each employee needed to average roughly $600K in top-line revenue. Inflation adjusted, that is about $1.1M to $1.2M today. Microsoft was a high-margin software monopoly at peak, so it is not a universal benchmark, but it gives a sense of what disciplined operating leverage looked like even at a company printing money.

Over the last decade, and especially during the COVID-era zero-rate and QE environment, many companies responded to dysfunction by hiring around it instead of fixing it. Cheap capital reduced the pressure to make hard operating decisions. Necessity is the mother of invention, but cheap money suppressed that necessity for a long time.

Then two things changed at roughly the same time. Rates went from zero to five, and Section 174 of the tax code stopped letting companies expense software developer salaries in the year incurred. The R&D amortization rule from TCJA kicked in for the 2022 tax year, forcing five-year amortization domestically and fifteen years for work done offshore. At the exact moment capital got expensive, a major software-company cost center became less friendly from a cash-tax and after-tax economics perspective.

Now AI has added a new pressure. Companies are adopting AI quickly, but we are still early. Much of what is happening inside enterprises is still R&D, experimentation, platform buildout, workflow redesign, and internal tooling. That work is not free. It comes with token costs, infrastructure commitments, GPU capacity, vendor contracts, and a lot of expensive trial and error.

Jensen Huang has made the point, in characteristically aggressive form, that if he pays someone $500K, he expects them to use a meaningful amount of compute to become more productive. Whether or not you take the specific numbers literally, and you probably should not since Nvidia sells the machines that consume those tokens, the economic point matters. AI spend has to come from somewhere.

That is the part many layoff narratives miss. Companies are not simply replacing workers with AI. They are also reallocating budget toward AI. Token budgets, model access, inference costs, internal AI platforms, data infrastructure, and R&D commitments are becoming real line items. To fund them, companies are looking at the headcount they accumulated under different interest-rate assumptions, different tax assumptions, and a different view of software demand.

There is also a demand-side story. COVID pulled years of enterprise software adoption into eighteen months, and a lot of what gets reported as growth now is ARR rotating through M&A rather than new logos landing. In parts of the market, revenue is moving around as much as it is expanding.

That is the real backdrop for the wave of layoffs. AI is the story being told on earnings calls. The reality is accumulated management debt finally meeting a cost of capital that punishes it. Layers of process. Unclear ownership. Duplicated work. Headcount that grew faster than execution improved. And now, on top of that, companies need to make room for a new class of AI-related spend.

The pressure also lands hard on old farts like me. We are expensive. And to be honest, some of us (not all) do not want to change how we work or keep up with how the technology is evolving. That makes us easy targets when finance needs to hit a cost number. AI gives the story a forward-looking sheen, but the underlying move is simpler: reduce expensive headcount, flatten layers, correct years of operational laziness, and redirect budget toward the new thing everyone believes they must fund.

AI is real. The layoff narrative around it usually is not. When you read a layoff announcement blaming AI, you are mostly reading a press release about cost of capital, tax policy, demand pull-forward, AI infrastructure spend, and an org chart that finally got too expensive to defend.

Read the 10-Qs, not the blog posts.

UNMITIGATED RISK

un.mit.i.gat.ed: Adj. Not diminished or moderated in intensity or severity; unrelieved. risk: N. The possibiity of suffering harm or loss; danger.

Why FIPS 140 Means Running Old Code

The Certification Ends Where the Code Begins

What the paper trail shows

Where the evidence runs out

Why I care about this for HSMs and BMCs

Documentation, inference, evidence

Steve Jobs, AI, and the Problem of Analysis Without Ownership

The Breaker, the Priest, and the Philosopher

The Sage

The Gadfly

The Builder Evangelist

The Statesman

The Theorist

The Refusenik

The Priest

The seam in the taxonomy

The point

The Prompt Is an Argument

The Prompt Is the Meaning

A CA That Produces Evidence, Not Promises

Two components, one evidence chain

What I mean by evidence

What each attestation lets you verify

The client’s attestation

The RA’s environment attestation

The oracle’s environment attestation

The custodian’s attestation

Policy as mechanism, not promise

What this looks like in practice

What is built

The 2029 problem

A CA Built for the Threat Model We Actually Have

What it looks like

The First AI-Built Zero-Day Is Not the Interesting Part

What the report actually shows

Three things worth surfacing

Both sides are running on the same substrate

AI Is Not Why They Are Cutting (Yet)