How TEE attestation actually works

2026-06-17 · ~1500 words

"TEE-attested inference" is a useful phrase. It is also a phrase people repeat without understanding the underlying mechanism. This post walks through Intel TDX quote structure, NVIDIA Confidential Computing GPU attestation, what the measurement values actually mean, and how to verify a quote yourself.

▸ The problem attestation solves

You send a request to a remote server. The server claims to run your computation inside a secure enclave. How do you know? The operator says so. The operator could be lying. The operator could be compromised. The operator could be served a National Security Letter and forced to lie. You cannot verify any of this by reading marketing pages.

Attestation closes that gap with hardware. The CPU itself signs a message saying "I, this specific physical CPU, launched exactly this code in a confidential mode, here is the cryptographic proof". The signing keys derive from secrets fused into the silicon, and the hardware vendor (Intel, NVIDIA) certifies them under a root certificate it publishes. You verify against the vendor's root cert. The operator is not in the trust chain.

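This is the only mechanism in practical use today for trusting a remote computation at inference scale without trusting the remote operator. Purely cryptographic alternatives such as fully homomorphic encryption exist, but they are far too slow for large models.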

▸ Intel TDX: the CPU side

Intel TDX (Trust Domain Extensions) is Intel's follow-on to SGX, moving confidential computing from process-level enclaves to VM granularity. A TDX-enabled CPU can create Trust Domains (TDs): VMs whose memory is encrypted with a key the host OS, hypervisor, and bare-metal operator cannot read. The encryption key is generated inside the CPU and never leaves silicon.

A TD is measured as it is built and as it boots. Measurement means a SHA-384 digest over the loaded code, the initial register state, and the configuration. The measurements are stored in TD-specific hardware registers (MRTD, RTMR0..RTMR3, mr_config_id, mr_owner, mr_owner_config).
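The runtime registers (RTMRs) cannot be overwritten, only extended: the new value is SHA-384 over the old value concatenated with a 48-byte digest of the new event. A minimal Python sketch of that hash chain follows. The function names are hypothetical, and hashing each raw event down to 48 bytes first is an assumption about how the event log is built, not something the hardware mandates.

```python
import hashlib
from typing import Iterable

RTMR_SIZE = 48  # SHA-384 digest length in bytes


def extend_rtmr(current: bytes, event_data: bytes) -> bytes:
    """Return the register value after extending it with one event."""
    if len(current) != RTMR_SIZE:
        raise ValueError("RTMR value must be 48 bytes")
    # Assumption: each event is first reduced to a 48-byte SHA-384 digest.
    event_digest = hashlib.sha384(event_data).digest()
    # Extend: new = SHA-384(old || digest). The old value cannot be undone.
    return hashlib.sha384(current + event_digest).digest()


def replay_event_log(events: Iterable[bytes]) -> bytes:
    """Recompute a register's final value from an ordered event log."""
    value = bytes(RTMR_SIZE)  # the register starts as all zeros
    for event in events:
        value = extend_rtmr(value, event)
    return value
```

Because every extend folds in the previous value, the final register depends on both the content and the order of the events. A verifier replays the event log it was given and compares the result with the RTMR value in the quote.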

When the TD wants to prove what it's running, it requests a report from the CPU, and the platform's Quoting Enclave turns that report into a signed quote. The quote contains:

mr_td: SHA-384 of the initial TD image. Identifies the exact contents loaded at launch (typically the virtual firmware).
mr_config_id, mr_owner, mr_owner_config: additional measurement slots for runtime configuration and ownership.
RTMR0..RTMR3: runtime measurement registers, extended by software inside the TD as later boot stages (kernel, initrd, application) are loaded.
td_attributes, xfam: flags indicating debug mode, certain ISA features.
report_data: 64 bytes of arbitrary data the TD can pin into the quote (typically a nonce + transcript hash).
signature: ECDSA-P256 over the quote header and body, signed by an attestation key that the Quoting Enclave generates and that chains through the platform's PCK certificate to Intel.
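To make the field list concrete, here is a minimal Python sketch that slices the measurement fields out of raw quote bytes. The offsets assume the version 4 DCAP quote layout (a 48-byte header followed by a 584-byte TD report body) as best understood from Intel's documentation, so check them against the current spec before relying on them. The function and field names are hypothetical. It does no signature checking.

```python
import struct

HEADER_SIZE = 48     # quote header
BODY_SIZE = 584      # TD report body
TEE_TYPE_TDX = 0x81  # tee_type value that marks a TDX quote

# (field name, size in bytes), in the order assumed for the TD report body
BODY_FIELDS = [
    ("tee_tcb_svn", 16), ("mr_seam", 48), ("mr_signer_seam", 48),
    ("seam_attributes", 8), ("td_attributes", 8), ("xfam", 8),
    ("mr_td", 48), ("mr_config_id", 48), ("mr_owner", 48),
    ("mr_owner_config", 48), ("rtmr0", 48), ("rtmr1", 48),
    ("rtmr2", 48), ("rtmr3", 48), ("report_data", 64),
]


def parse_tdx_quote(quote: bytes) -> dict:
    """Split a raw quote into named fields. Does NOT verify the signature."""
    if len(quote) < HEADER_SIZE + BODY_SIZE:
        raise ValueError("quote too short")
    # Header starts with version (u16), key type (u16), TEE type (u32), little-endian.
    version, att_key_type, tee_type = struct.unpack_from("<HHI", quote, 0)
    if version != 4 or tee_type != TEE_TYPE_TDX:
        raise ValueError("not a version 4 TDX quote")
    fields = {"version": version, "att_key_type": att_key_type}
    offset = HEADER_SIZE
    for name, size in BODY_FIELDS:
        fields[name] = quote[offset:offset + size]
        offset += size
    return fields
```

Parsing is the easy part and proves nothing on its own. The values mean something only after the signature and certificate chain over these same bytes have been verified.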

The signature chain roots at Intel's SGX Root CA, which also anchors TDX attestation. Verifying the quote means: parse the report, check the signature against the attestation key, walk the cert chain back to Intel's root, check revocation and TCB status, and verify each measurement against the value you expect for the enclave image you trust.
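The last step, comparing measurements against expected values, is plain byte comparison. A minimal Python sketch, assuming the quote has already been signature-verified and parsed into a dict of field name to bytes. The function name and dict shape are hypothetical. The debug flag is taken to be bit 0 of td_attributes.

```python
import hmac

TD_ATTR_DEBUG = 0x01  # bit 0 of td_attributes: the TD is debuggable


def check_measurements(fields: dict, expected: dict) -> list:
    """Return a list of policy violations; an empty list means pass."""
    problems = []
    attrs = int.from_bytes(fields["td_attributes"], "little")
    if attrs & TD_ATTR_DEBUG:
        # A debug TD can be inspected by the host, so reject it outright.
        problems.append("td_attributes: debug bit is set")
    for name, want in expected.items():
        got = fields.get(name, b"")
        if not hmac.compare_digest(got, want):
            problems.append(f"{name}: measurement mismatch")
    return problems
```

The expected values have to come from somewhere you trust, ideally a reproducible build of the enclave image, not from the same party that handed you the quote.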

▸ NVIDIA Confidential Computing: the GPU side

Modern AI workloads need GPUs. A TDX-protected CPU running an inference model with the model itself on an unprotected GPU is mostly theater: the model weights and intermediate tensors are visible to the host. NVIDIA Confidential Computing closes this gap for H100, H200, B100, and other CC-capable parts.

The GPU operates in confidential-compute mode. Traffic over PCIe between the CPU and GPU is encrypted, and GPU memory is isolated from the host. The GPU's vBIOS, driver state, and firmware are measured and reported in an attestation report.

The attestation report contains:

GPU UUID and serial: unique hardware identifier.
vBIOS hash: the firmware version actually running.
Driver state hash: what driver is loaded.
CC mode flags: confidential mode enabled.
Nonce: challenge from the verifier to prevent replay.
Signature: chain rooted at NVIDIA's attestation root cert.

NVIDIA supports two verification paths: the hosted NVIDIA Remote Attestation Service (NRAS), or the local verifier in the open-source nvTrust SDK. The chain of trust is independent of Intel's chain. Both must verify for the full TEE-attested guarantee to hold.
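The nonce in the GPU report is what stops an operator from replaying an old, valid report. A minimal Python sketch of the verifier's side of that challenge. The class name, the 32-byte nonce size, and the 60-second window are assumptions for illustration, not values taken from NVIDIA's SDK.

```python
import hmac
import secrets
import time

NONCE_BYTES = 32      # assumed nonce length
MAX_AGE_SECONDS = 60  # hypothetical freshness window


class NonceChallenge:
    """One-shot challenge: issue a random nonce, then check the echo."""

    def __init__(self) -> None:
        self.nonce = secrets.token_bytes(NONCE_BYTES)
        self.issued_at = time.monotonic()

    def accepts(self, echoed_nonce: bytes) -> bool:
        # Reject reports that arrive too late or echo a different nonce.
        fresh = time.monotonic() - self.issued_at <= MAX_AGE_SECONDS
        return fresh and hmac.compare_digest(self.nonce, echoed_nonce)
```

The nonce check only counts if the echoed nonce sits inside the signed portion of the report. A nonce returned alongside the report in an unsigned wrapper proves nothing.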

▸ Putting them together for inference

For Phantom's TEE tier, the request flow is:

Client sends a chat completion request over TLS to Phantom.
Phantom forwards the request to the upstream confidential-computing host.
The host's TDX-protected VM (the TD) receives the request. The TD also has CC-mode GPU(s) attached.
The TD loads the model into protected GPU memory if not already resident.
The TD runs inference. Tensors stay inside the CPU's encrypted RAM and the GPU's CC-mode memory.
The TD writes the response. Phantom forwards back over TLS.
Phantom also records an inference ID. The TD can later produce, on demand, a fresh TDX quote covering its current measurement plus an NVIDIA CC attestation covering the GPU(s) used. The quote's report_data binds to the inference ID.

Verifying client-side: send a POST to /v1/inference-attest with the inference ID. Phantom returns the TDX quote bytes plus the NVIDIA CC attestation bytes plus the endorsement chain. Run them through your own verifier. Phantom also offers POST /v1/verify/tdx and POST /v1/verify/gpu, but using those puts Phantom back in the trust path.
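The binding between a quote and an inference can be checked by recomputing report_data locally. This post does not specify the exact encoding Phantom uses, so the scheme below, SHA-512 over a length-prefixed nonce followed by the inference ID, is purely a hypothetical illustration of the idea. The function names are hypothetical too.

```python
import hashlib
import hmac


def expected_report_data(nonce: bytes, inference_id: str) -> bytes:
    """Hypothetical binding: SHA-512(len(nonce) || nonce || inference_id)."""
    h = hashlib.sha512()
    h.update(len(nonce).to_bytes(2, "big"))  # length prefix avoids ambiguity
    h.update(nonce)
    h.update(inference_id.encode("utf-8"))
    return h.digest()  # 64 bytes, the same size as report_data


def quote_is_bound(report_data: bytes, nonce: bytes, inference_id: str) -> bool:
    """True if the quote's report_data matches what we expect for this call."""
    want = expected_report_data(nonce, inference_id)
    return hmac.compare_digest(report_data, want)
```

Whatever the real encoding is, the check has the same shape: the client derives the 64 bytes from values it already knows and compares them to the report_data inside the verified quote.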

▸ What a successful verification proves

If both quotes verify against their respective vendor roots and the measurements match the expected values for the model image you wanted, you know:

The CPU was a real Intel TDX-capable part, in a non-debug confidential mode, with a measured launch.
The TD launched from exactly the image identified by mr_td, and everything loaded afterwards is reflected in the RTMRs (no swap, no MITM, no instrumented build).
The GPU was a real NVIDIA CC-capable part with the expected vBIOS, in confidential mode.
The CPU-GPU bus was encrypted.
The enclave decrypted your request inside the protected memory boundary, ran the model, and emitted the response. The host operator could not read RAM during execution.

The guarantee is hardware-rooted. You are not trusting Phantom for any of it. You are trusting Intel's silicon design and NVIDIA's silicon design (a different and much smaller trust assumption).

▸ What attestation does NOT prove

Attestation is precise. It proves narrow things. It does not prove:

That the model weights are what they claim. The TD measurement covers the loader binary and configuration. It does not by itself cover gigabytes of model weights. Many deployments include a weight-hash check inside the measured boot chain, but you have to verify that chain too.
That the model output is "correct". Attestation says "this code ran in confidential mode". It does not say the code is bug-free.
That nothing leaks through side channels. Timing, cache, and power-draw side channels are well documented in the confidential-computing literature. Vendors mitigate; verifiers should still treat side-channel resistance as an open research problem.
That the prompt was not logged outside the enclave. If the TD does I/O outside the encrypted boundary (e.g., writes prompts to a log file), the measurement may still verify but privacy is gone. The measured boot has to cover the data-path policy too.
That the operator cannot be served legal process. Hardware attestation does not protect against legal compulsion of the operator. It protects against the operator's CURIOSITY or COMPROMISE. Two different threat models.
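On the first point, the weight-hash check that a measured loader might run is simple to picture. A minimal Python sketch, assuming the weights live in a single file and the expected SHA-384 digest is baked into the measured image. Both are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha384_file(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Hash a large file in 1 MiB chunks so it never sits fully in RAM."""
    digest = hashlib.sha384()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.digest()


def weights_match(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-384 digest equals the pinned value."""
    return hmac.compare_digest(sha384_file(path), bytes.fromhex(expected_hex))
```

The check only helps if the code performing it, and the pinned digest, are themselves covered by mr_td or an RTMR. A hash check that runs outside the measured chain is just another operator promise.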

▸ Verifying without trusting Phantom

The honest version of "do not trust us" is: ask for the quote and verify it against vendor roots you fetched independently.

curl -X POST https://phantom.codes/v1/inference-attest \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inference_id":"chatcmpl-..."}' \
  | jq '.tdx_quote, .gpu_attestation'

The TDX quote is binary. Decode it with Intel's DCAP libraries or any open-source TDX verifier. Match mr_td against the expected enclave image hash for the model you called. Walk the cert chain to Intel's root. Verify the ECDSA signature.

For the GPU side, use the local verifier from NVIDIA's open-source nvTrust repository, or submit the evidence to NRAS (NVIDIA Remote Attestation Service). Same idea: decode, check signature, walk chain to NVIDIA's root, compare measurements against expected.

This is not casual work. Most users will not do it. The point is that the data is available, the verification is mathematical, and no trust in Phantom is required at any step.

▸ Why Phantom can't lie about this

If Phantom claimed TEE-attested inference but actually ran inference on a normal cloud GPU, the TDX quote would not verify. The signature would not chain to Intel's root cert. There is no way to forge that without extracting an attestation key from genuine Intel hardware or compromising Intel's certificate infrastructure, either of which would be an extraordinarily expensive compromise.

Phantom could instead relay a quote from a legitimate TD that ran a totally different model. The report_data binding to the inference ID, plus the model-image hash in mr_td, makes that detectable. The user has to actually check those values.

The honest disclaimer: Phantom's claim about the proxy itself ("we hold no logs", "no IP retention") is policy, not hardware. The claim about upstream inference ("ran inside a TEE on this hardware with this code") is hardware. Different trust levels. We do not conflate them.