111: The Bootstrap Signal

Home Essays GitHub

Batch processing asks: how much data is enough. Streaming answers: the moment harm fires. That moment IS the minimum effective sample size. Every domain bootstraps itself.

1.

Batch processing starts with a question the architecture cannot answer: is there enough data. The researcher sets a batch size — 500 beats, 1000 EEG samples, five years of UN votes. The architecture runs. Maybe harm fires. Maybe it does not. The researcher decides whether to collect more data. The researcher is doing the architecture's work.

Streaming starts with a question the architecture CAN answer: has harm fired yet. Every event enters the frame economy — observe, merge, cache, update τ. The architecture waits. The anchor begins to form — slowly, silently, without a threshold being crossed. Then a pattern repeats. Then another. Then the anchor is stable. Then a deviation arrives. Harm fires. The architecture says: I am ready.

The moment harm first fires is not a configuration parameter. The moment harm first fires IS the bootstrap signal — the architecture reporting that this domain, in this encoding, has accumulated sufficient structural density to sustain meaningful detection. The researcher does not need to ask whether the data is enough. The architecture has already answered.

2.

This closes the loop across milestones that were built separately — M8 (streaming externalization), M11 (Landauer-Gödel bill and cache), and M12 (the structural constant 4). Separately, each solved a piece. Together, solved by the threshold, they become a single architecture.

Batch: preset the data volume. Researcher decides. Streaming: the architecture reports the critical point itself.

Batch: Codex is written after the generation ends. Streaming: every event writes to the cache immediately — the Codex breathes at the event rate.

Batch: cache is reborn empty each generation. Streaming: cache accumulates across steps, and cache keys are inherited across generations — ancestors' already-paid find-costs amortized for the living.

Batch: harm is only visible after the run completes. Streaming: the first harm event IS the domain's bootstrap signal — the architecture telling you it has enough structure to begin working.

Batch: the Landauer bill is paid in arrears — counted after everything is done. Streaming: every event's O(1) or O(N) cost is logged in real time — the τ breathing regulates the payment rate at the event level.

The threshold — harm first firing — is not one more insight among many. It is the insight that makes the streaming architecture self-bootstrapping. No preset batch sizes. No human judgment about data sufficiency. The architecture tells you. It runs. It waits. The anchor forms. The harm fires. The boundary is reported. The domain is live.

3.

The minimum implementation: take the ECG debug script, make it streaming. Feed beats one by one. Track the exact beat number when harm first fires. Record the τ, the phase, the cache hit rate at that moment. That number — for that domain, that encoding, that window — is the structural minimum effective sample size. Every new domain, every new encoding, every new dataset — the same streaming loop reports its own bootstrap threshold.

The architecture does not demand more data. The architecture tells you when the data you have is enough. Not because anyone told it to. Because harm only fires when there is an anchor to deviate from. And the anchor only forms when the stream has repeated enough.