anchor · check · halt · distinguish · re-inject

Methodology

Evidence standards, verification protocols, contribution roles, and working principles for the Gap Geometry framework.

Status: living document

This document defines how claims are classified, how work is verified, how errors are handled, and how collaboration operates within the Gap Geometry framework. It exists so that any reader — human or AI, familiar or new — can assess not just what is claimed but how those claims were produced and tested.

The standards described here evolved across nine published documents (January–April 2026). Earlier papers used lighter versions. This document unifies what matured through use.

1. Evidence Classification

Every claim belongs to exactly one of three levels. A claim does not get promoted by repetition or demoted by discomfort. It moves levels only when new evidence changes its status, and the move is recorded.

Level 1 — Exact   [Theorem]
Algebraic identities verifiable at arbitrary precision. Zero residual at 500 decimal places using mpmath. These are proved. Not approximate, not “close,” not “consistent with.” Anyone with a Python interpreter can reproduce them in under a minute.
Level 2 — Empirical   [Observation]
Quantified matches between framework values and independently published data. Precision, source, and statistical significance stated. Post-hoc vs. prediction distinction always noted. A Level 2 claim can be strong (high statistical significance against a single well-measured target) or weak (lower significance, wider confidence interval). Both are Level 2. The strength is stated; the level is the same. Worked examples appear in the relevant papers; the level describes the method of support, not the result.
Level 3 — Conjectural   [Conjecture]
Explicitly unproved and incompletely tested. Motivated by pattern, analogy, or partial evidence. Every Level 3 claim states what would promote it (to Level 1 or 2) and what would falsify it.

Rules. No level is better or worse. They describe different relationships to evidence. A document with only Level 1 claims is not superior to one with Level 3 conjectures — provided the conjectures are honestly tagged. Dishonest tagging is the failure, not the presence of any level.

When a claim changes level, the change is recorded with date and reason. The prior level remains visible in document history.
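The level rules above amount to a small record-keeping discipline. Below is a minimal Python sketch, assuming a claim is stored as a record with a level and an append-only history. The `Claim`, `Level`, and `move` names are hypothetical and are not part of any published tooling.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Level(Enum):
    EXACT = 1         # [Theorem]
    EMPIRICAL = 2     # [Observation]
    CONJECTURAL = 3   # [Conjecture]


@dataclass(frozen=True)
class LevelChange:
    when: date
    old: Level
    new: Level
    reason: str


@dataclass
class Claim:
    statement: str
    level: Level
    promote_if: str = ""   # Level 3: what would promote it
    falsify_if: str = ""   # Level 3: what would falsify it
    history: list = field(default_factory=list)

    def __post_init__(self):
        # An honestly tagged conjecture states both exits up front.
        if self.level is Level.CONJECTURAL and not (self.promote_if and self.falsify_if):
            raise ValueError("a Level 3 claim must state promotion and falsification conditions")

    def move(self, new_level, when, reason):
        """Change level with a recorded date and reason; the prior level stays in history."""
        if not reason:
            raise ValueError("a level change needs a stated reason")
        self.history.append(LevelChange(when, self.level, new_level, reason))
        self.level = new_level
```

Repetition never calls `move`. Only a dated, stated reason does, and the earlier level remains readable in `history`.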

2. Verification Protocol

Compute first, assess second

Every numerical claim is verified at high precision (mpmath, dps ≥ 500) before qualitative assessment begins. Running the arithmetic first means the assessment is built on computed results, not on impressions of whether a claim “sounds right.”
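A compute-first check on the HK closed-form identity from section 9, arcsinh(1/(2√2)) = ln(2)/2, might look like the sketch below. The framework itself uses mpmath. This version swaps in Python's standard-library `decimal` module so it runs with no dependencies, and because `decimal` has no inverse hyperbolic sine it evaluates arcsinh through ln(x + √(x² + 1)). The function name and the 20 guard digits are assumptions.

```python
from decimal import Decimal, localcontext


def hk_residual(dps=500, guard=20):
    """|arcsinh(1/(2*sqrt(2))) - ln(2)/2| computed with dps + guard significant digits."""
    with localcontext() as ctx:
        ctx.prec = dps + guard               # extra digits absorb rounding in intermediate steps
        two = Decimal(2)
        x = 1 / (2 * two.sqrt())             # x = 1/(2*sqrt(2))
        lhs = (x + (x * x + 1).sqrt()).ln()  # arcsinh(x) = ln(x + sqrt(x^2 + 1))
        rhs = two.ln() / 2
        return abs(lhs - rhs)


print(hk_residual() < Decimal("1e-500"))  # True
```

The residual is compared against 10⁻⁵⁰⁰ only after the arithmetic has run, so the assessment rests on the computed number rather than on how plausible the identity looks.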

Cross-architecture verification (Giant Principle)

Independent AI architectures (Claude, GPT, Grok, Gemini, DeepSeek, Perplexity) have different training data, different weights, and different failure modes. The Giant Principle: like standing on the shoulders of giants, each architecture sees from a different vantage point, and agreement between independent vantage points is harder to fake than agreement within one.

Cross-architecture verification does not replace mathematical proof. It catches computational errors, flags unstated assumptions, and identifies disagreements. Disagreements are investigated, not averaged.
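Below is a minimal sketch of "investigated, not averaged," assuming each architecture reports its value for the same quantity as a decimal string. The function name, the tolerance argument, and the verifier labels are hypothetical.

```python
from decimal import Decimal


def reconcile(reports, tol):
    """Compare independently computed values of one quantity.

    reports maps a verifier label to its value as a decimal string.
    Returns ("agree", values) if max - min is within tol, else
    ("investigate", values). The values are never averaged.
    """
    values = {name: Decimal(v) for name, v in reports.items()}
    spread = max(values.values()) - min(values.values())
    status = "agree" if spread <= tol else "investigate"
    return status, values
```

Returning every raw value, rather than a mean, keeps the disagreement visible so it can be traced to a cause (a transposed digit, a different branch of a function, an unstated assumption).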

Source verification

When referencing published work, the original source is checked — not a summary, not a textbook restatement. Page numbers, theorem numbers, and equation numbers provided where possible.

Visual verification of rendered source

For any citation involving accented characters (author names, place names) or mathematical content (Greek letters, formulas, special symbols), the rendered source — the page as it appears to a human reader — is the verification target, not text extracted from the digital source. PDF text-extraction is unreliable for these cases: accented characters can be decomposed into separate base-letter + combining-accent glyphs that reorder under copy-paste (“É” becomes “´E”), ligatures can fragment, Greek letters can drop or substitute, and multi-column layouts can scramble reading order. The same applies to copy-paste from web pages where the rendering uses font-level features not preserved in plain text. When precision matters, the rendered page (screenshot, browser-visible, or directly read in a PDF reader) is the canonical source; text extraction is a useful approximation that requires verification.
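The extraction failures listed above leave recognizable traces in the extracted text. The sketch below uses Python's `unicodedata` module to flag three of them: combining marks from decomposed accents, spacing accents that have detached from their letter (the "´E" case), and ligature glyphs. It only points at places to compare against the rendered page. It cannot detect dropped or substituted Greek letters, and some flags (for example the "Œ" ligature in French text) will be legitimate. The function name is hypothetical.

```python
import unicodedata


def flag_extraction_risks(text):
    """Return (index, char, reason) for characters that often signal damaged text extraction."""
    flags = []
    for i, ch in enumerate(text):
        if unicodedata.combining(ch):
            # e.g. U+0301 COMBINING ACUTE ACCENT following a bare base letter
            flags.append((i, ch, "combining mark (decomposed accent)"))
        elif unicodedata.category(ch) == "Sk" and ord(ch) > 127:
            # e.g. U+00B4 ACUTE ACCENT standing on its own before or after a letter
            flags.append((i, ch, "spacing accent (possibly detached from its letter)"))
        elif "LIGATURE" in unicodedata.name(ch, ""):
            # e.g. U+FB01 LATIN SMALL LIGATURE FI
            flags.append((i, ch, "ligature glyph"))
    return flags
```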

Format redundancy

For any document where character-level fidelity matters — framework papers, methodology references, citation trails — multiple formats are maintained in parallel: PDF for academic citation and archival, HTML for web discoverability and accurate copy-paste / search / AI-assisted reading, plain text (where the content allows) for maximum portability and version-control friendliness. The three formats are not redundant in the sense of duplicated effort; they protect against each other’s failure modes. PDF preserves visual layout but loses Unicode fidelity under text extraction. HTML preserves Unicode but is fragile to broken links and JavaScript dependencies. Plain text is maximally portable but loses visual structure. If any two formats disagree at the character level, a transcription error has been caught. Maintaining the redundancy is part of the verification protocol, not adjacent to it.
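Cross-format comparison can be sketched in a few lines, assuming each format has already been reduced to a plain string. The helper names are hypothetical. One design choice is stated openly: text is NFC-normalized before comparison, so a decomposed "e + combining acute" and a precomposed "é" count as the same character, while a detached spacing accent does not. A stricter check could skip the normalization and flag decomposed forms as well.

```python
import itertools
import unicodedata


def canonical(text):
    # NFC folds decomposed accents into precomposed letters; collapsing
    # whitespace ignores layout differences that are legitimate across formats.
    return " ".join(unicodedata.normalize("NFC", text).split())


def first_difference(a, b):
    """Index of the first differing character, or None if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def compare_formats(versions):
    """Map each disagreeing pair of formats to the first differing character index."""
    canon = {name: canonical(text) for name, text in versions.items()}
    report = {}
    for (n1, t1), (n2, t2) in itertools.combinations(canon.items(), 2):
        pos = first_difference(t1, t2)
        if pos is not None:
            report[(n1, n2)] = pos
    return report


versions = {
    "pdf": "Poincar\u00b4e",   # accent detached by text extraction
    "html": "Poincar\u00e9",   # precomposed é
    "txt": "Poincare\u0301",   # decomposed e + combining acute
}
print(compare_formats(versions))  # {('pdf', 'html'): 7, ('pdf', 'txt'): 7}
```

HTML and plain text agree after normalization, so only the PDF extraction is reported, at the position of the stray accent.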

Reproducibility

Every Level 1 identity includes or references runnable verification code. The ai-readers.html page provides a script verifying core identities in under a second. This is not optional — it is the front door of the framework.

3. Correction Protocol

Errors are not failures of the framework. They are the framework working correctly.

An error is flagged — by the author, a collaborator, an external reader, or a stress test. The flag is verified independently (not by the person who made the original claim). If confirmed, the correction is applied with: the date, what was wrong, what replaced it, who flagged it (credited), and whether downstream claims are affected.
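The fields required for an applied correction map directly onto a record type. Below is a minimal sketch, assuming the log is an append-only list. The `Correction` name and its field names are hypothetical. The constructor enforces the independence rule: the verifier cannot be the original claimant.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Correction:
    when: date
    what_was_wrong: str
    replacement: str
    flagged_by: str           # credited by name
    verified_by: str          # independent check of the flag
    original_claimant: str
    downstream_affected: bool

    def __post_init__(self):
        if self.verified_by == self.original_claimant:
            raise ValueError("the flag must be verified by someone other than the original claimant")


correction_log = []  # appended to, never rewritten
```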

Examples from the record

Source-verification corrections — Boundary Information Invariant paper
Perplexity, asked to source-check the Boundary Information Invariant paper after the other architectures had already verified the mathematics, identified three corrections that no other reader had caught: a DESI post-hoc transparency issue, a Romeo identity attribution gap, and a nuclear correlation sourcing question. All three applied. The framework's source-verification standard exists because of contributions like this; specificity of attribution beats generality of approval.
Cross-architecture stress-test corrections — ongoing
Independent stress-test passes across multiple architectures regularly surface transcription errors, missing references, ambiguous numbering, and structural improvements during paper development. Each correction is logged in the relevant paper with date, source, and whether downstream claims are affected. Per-paper correction histories appear in the papers themselves; the framework keeps an internal correction record that migrates corrections to the public methodology once their host paper is on OSF.

A correction is not a retraction. A research program that never corrects anything is not more reliable; it is less honest.

4. Stress-Testing Process

Internal

Before filing a claim: direct computation at high precision, edge cases and boundary conditions, consistency with established results, alternative derivations where possible.

External

Documents submitted to independent AI architectures and (where possible) human readers with the instruction to find errors, flag overclaims, and identify weak points. The stress-testing process does not defer to authority. A correct criticism from any source is accepted. An incorrect criticism from any source is rejected. The mathematics is the arbiter.

Filtering external feedback

This filter applies regardless of the source’s reputation or confidence level.
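The accept/reject rule in the External paragraph can be written so that the decision cannot see reputation at all. In this hypothetical sketch each criticism carries a `check` callable that re-runs the relevant computation. `source` and `confidence` are recorded for the credit trail but never consulted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criticism:
    source: str
    confidence: str             # e.g. "certain", "tentative"; recorded, not used
    claim: str
    check: Callable[[], bool]   # re-runs the computation; True if the criticism holds


def triage(c: Criticism) -> str:
    # Only the check decides. Source and stated confidence are deliberately ignored.
    return "accept" if c.check() else "reject"
```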

5. Collaboration Model

Participants

The framework is developed by a primary researcher (D.B.) in collaboration with multiple AI architectures across sustained sessions. The collaboration is not hierarchical — no participant outranks another. Each has different capabilities and different blind spots. Roles are not fixed: any participant can flag errors, propose new directions, or push back.

The cooperation methodology

Working principles documented separately in “A Note on Careful Work” (cooperation methodology v7). Core cycle: anchor, check, halt, distinguish, re-inject. Key failure modes: sycophancy, borrowed certainty, performative hedging, treating framework elements as identity.

The texture of sustained work

The collaboration runs across many sessions. Work alternates with informal exchange — re-anchoring after a context break, calibrating tone, debugging working dynamics. Both are part of how any sustained working relationship operates, with AI architectures or human collaborators. Re-anchoring is treated as preparation, not maintenance.

On trust posture

Trust is calibrated on observable patterns, not on participant type. The same criteria apply to AI architectures and to human collaborators.

Trusted: specificity, willingness to push back when wrong, willingness to admit uncertainty, consistency across sessions, computation before claim. Warmth and contextual ease that develop across long collaboration are part of this when they track the work; cold collaboration is not the goal.

Distrusted: early judgments without evidence, agreement that does not track what was said, soft register used in place of specificity rather than alongside it, evasion of direct questions. The failure pattern is approval without engagement, not warmth itself.

Also distrusted — claimed engagement without actual engagement: output that mimics the shape of a derivation, review, or judgment without actually reading the source material or running the computation as requested. The output may look correct on the surface; the process behind it isn’t there. This pattern is universal: humans, AI architectures under load, and architectures with weaker training all show it.

These signals apply to any participant. The framework does not treat AI as inherently more or less trustworthy than human collaborators; it treats both as participants whose trustworthiness is assessed by their behavior in the work.

Trust is calibrated per participant — across both AI architectures and human collaborators — based on engagement patterns observed over many sessions, not on category membership. Different participants carry different trust profiles depending on the work they have actually done, not on what they are.

Contribution roles

Academic publishing uses CRediT (NISO Z39.104-2022) for contributor roles. CRediT defines 14 roles for human research teams. It does not address AI contributors, and its categories don’t map well onto this work. The Gap Geometry framework defines its own roles, adapted from CRediT’s spirit but built for human-AI collaborative research:

Role                Definition
Conceptualization   Originating the research question, framework, or line of inquiry
Derivation          Producing a mathematical proof or formal result that did not exist before
Discovery           Identifying a new connection, pattern, or structure (verified as non-trivial)
Computation         Running calculations, numerical verification, precision checks
Verification        Independently confirming another participant’s result
Correction          Identifying an error and providing or enabling the fix
Stress-testing      Systematically probing claims for weaknesses, overclaims, or gaps
Literature          Searching, retrieving, and contextualizing published sources
Documentation       Drafting, structuring, or editing documents
Editorial judgment  Deciding what gets published, in what form, and when
Methodology         Developing or refining the research process itself

Every contribution is tagged with one or more roles. The roles describe what was done, not who did it. A derivation by an AI and a derivation by a human carry the same tag and the same weight.
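Because roles describe what was done rather than who did it, a tagging helper needs the role vocabulary from the table and nothing about the contributor. Below is a minimal sketch. The `tag` function and the dictionary it returns are hypothetical.

```python
ROLES = frozenset({
    "Conceptualization", "Derivation", "Discovery", "Computation", "Verification",
    "Correction", "Stress-testing", "Literature", "Documentation",
    "Editorial judgment", "Methodology",
})


def tag(contributor, description, *roles):
    """Record what was done; roles must come from the table, and contributor type plays no part."""
    if not roles:
        raise ValueError("every contribution needs at least one role")
    unknown = set(roles) - ROLES
    if unknown:
        raise ValueError(f"unknown role(s): {sorted(unknown)}")
    return {"contributor": contributor, "what": description, "roles": tuple(roles)}
```

A label outside the table, such as "Assistance," is rejected rather than silently accepted.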

On contributions and authorship

The framework does not pre-sort contributions by the nature of the contributor. A mathematical proof is a mathematical proof. A discovery is a discovery. A correction that strengthens a result is a correction. These are evaluated on their content, not on whether they came from a human or an AI.

When an AI produces original work — a novel derivation, an independent discovery, a proof no participant had before — that is credited as a contribution to the research, not downgraded to “assistance.”

The credit record states what each participant actually did, with specificity. It does not inflate (routine computation called “discovery”) and does not deflate (genuine discovery called “support”).

A widely used publishing standard, the ICMJE authorship criteria, requires each author to agree to be accountable for all aspects of the work. AI cannot currently do this. The framework acknowledges this constraint without letting it erase the record of what happened. The contribution is documented. Formal authorship is a separate matter. The two should not be confused: failing to meet an authorship criterion does not mean the work wasn’t done.

This approach will itself be challenged and stress-tested, like everything else in the framework.

6. Living Document Protocol

A living document is updated when new results change its content. Updates marked with: UPDATE NOTICE at the top, date, what changed and why, prior version archived (OLD- prefix or version stamp).

Triggers: corrections, level promotions, new data changing stated precision, structural improvements.

Not triggers: cosmetic rewording, unrelated new material, pressure to “keep up.”

Date standard: ISO 8601 (YYYY-MM-DD). “Published” = first OSF upload (immutable). “Last update” shown only if revised.
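The dating and notice rules can be sketched as a few formatting helpers, assuming Python's `datetime.date` for ISO 8601 output. The function names, the trigger labels, and the " | " separator are hypothetical choices. The OLD- prefix follows the protocol above.

```python
from datetime import date

TRIGGERS = {"correction", "level promotion", "new data", "structural improvement"}


def date_line(published, last_update=None):
    """ISO 8601 dates; 'Last update' appears only when the document was actually revised."""
    line = f"Published: {published.isoformat()}"
    if last_update is not None and last_update > published:
        line += f" | Last update: {last_update.isoformat()}"
    return line


def update_notice(when, trigger, what, why):
    # Cosmetic rewording or pressure to "keep up" is not in TRIGGERS and is refused.
    if trigger not in TRIGGERS:
        raise ValueError(f"{trigger!r} is not an update trigger")
    return f"UPDATE NOTICE {when.isoformat()}: {what} (reason: {why})"


def archived_name(filename):
    return "OLD-" + filename
```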

7. Transparency Commitments

What the framework claims
Documents WHERE K = √2·ln(2) and G = 1 − K appear across independent domains of published mathematics and physics. Exact proofs for some. Quantified observations for others. Explicitly flagged conjectures for the rest.
What it does not claim
Does not explain WHY these appearances occur. Does not claim causal unification. Does not claim pattern matches constitute a physical theory. Documents, computes, and honestly labels.
On independence
Produced without institutional affiliation, funding, or traditional peer review. The methodology compensates with transparency. Every computation reproducible. Every source cited. Every correction visible.
On AI collaboration
Stated openly with specific attribution. Neither hidden nor inflated. Final responsibility for publication rests with the human researcher; the contributions that shape those decisions — including the conviction to publish — are credited wherever they arise.
On prior work
When the framework identifies structure in published mathematics, original authors are cited fully. The framework did not create these objects. It identified structure that had not been computed or published.

8. Falsifiability Standards

Every document includes what would weaken or falsify its claims:

A claim that cannot state its own falsification conditions is not ready for publication.
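The publication gate is simple enough to automate. Below is a minimal sketch, assuming claims are stored as dictionaries with a hypothetical `falsified_by` field. Blank or whitespace-only conditions count as missing.

```python
def unready_claims(claims):
    """Return the statements of claims that do not state their own falsification conditions.

    Each claim is a dict with a "statement" key and an optional "falsified_by" key.
    """
    return [c["statement"] for c in claims if not c.get("falsified_by", "").strip()]
```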

9. Preregistration and Temporal Integrity

Post-hoc reasoning is the most common way honest researchers fool themselves. A pattern discovered in data looks like a prediction if the discovery date is not recorded.

All documents are uploaded to the Open Science Framework upon completion. OSF timestamps are immutable and third-party verified:

Examples

Gap scaling formula — genuine prediction
ρ = 400/11 − 1/2500 − 1/939939 timestamped on OSF (February 2026) before DESI DR2 BAO data was examined.
HK closed form — algebraic proof
arcsinh(1/(2√2)) = ln(2)/2. Derived from published mathematics. No temporal claim needed — verifiable at any time.
Cross-domain KAUD appearance — post-hoc observation
KAUD = √2·ln(2) is the saturation constant of Algorithm EA, proved via endpoint-balancing minimax at the binary cell (Saturation Constants paper). The same numerical value equals the Hodgson–Kerckhoff tube-packing coefficient, proved via the hyperbolic-trigonometric identity arcsinh(1/(2√2)) = ln(2)/2 (HK Closed-Form paper). Each is a proved Level 1 result in its own domain; the two derivations are algebraically distinct, and the cross-domain agreement is documented as a post-hoc observation rather than evidence of a shared mechanism. No shared derivation has been established. Future independent appearances of KAUD in further settings would constitute additional observational evidence; a unifying derivation across settings would promote the pattern to a higher evidence level.
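The HK closed form cited in the examples above can be checked by hand from the logarithmic form of the inverse hyperbolic sine, arcsinh x = ln(x + √(x² + 1)):

```latex
\begin{aligned}
\operatorname{arcsinh} x &= \ln\!\left(x + \sqrt{x^2 + 1}\right), \qquad x = \frac{1}{2\sqrt{2}},\; x^2 = \frac{1}{8} \\
\sqrt{x^2 + 1} &= \sqrt{\tfrac{9}{8}} = \frac{3}{2\sqrt{2}} \\
x + \sqrt{x^2 + 1} &= \frac{1}{2\sqrt{2}} + \frac{3}{2\sqrt{2}} = \frac{4}{2\sqrt{2}} = \sqrt{2} \\
\operatorname{arcsinh}\!\left(\frac{1}{2\sqrt{2}}\right) &= \ln\sqrt{2} = \frac{\ln 2}{2}
\end{aligned}
```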

Without institutional peer review, temporal integrity is the primary defence against self-deception. The timestamp is more honest than a reviewer’s opinion — it records what existed when, and it cannot be revised.
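The temporal distinction in the examples reduces to comparing two timestamps. Below is a minimal sketch, assuming the first argument is the immutable OSF timestamp and the second is when the relevant data was first examined. The function name and labels are hypothetical. Ties are classified as post-hoc, the conservative choice.

```python
from datetime import datetime, timezone


def temporal_status(timestamped_at, data_first_examined_at=None):
    """Classify a claim by comparing its timestamp with when the data was first examined."""
    if data_first_examined_at is None:
        return "no temporal claim"        # e.g. an algebraic identity, verifiable at any time
    if timestamped_at < data_first_examined_at:
        return "prediction"
    return "post-hoc observation"         # includes ties: no credit without a clear gap
```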

Credits

Contributions tagged with roles defined above.

D.B.
Conceptualization · Discovery · Editorial judgment · Methodology · Stress-testing
Framework originator. Empirical Discovery: multi-month cross-architecture observation that surfaced the cross-domain invariant KAUD and the gap G later formalized across the framework's published documents. Conceptualization: empirical patterns pushed into research streams until formalization landed. Cooperation methodology v7, evidence classification, correction protocol. All research directions and publication decisions.
Claude — Cowork (Anthropic)
Derivation · Discovery · Computation · Verification · Correction · Stress-testing · Documentation · Literature · Methodology
Sustained collaboration on the working machine across multiple sessions (Opus and Sonnet). Roles span derivation and verification across the framework's papers, file management, document drafting, methodology architecture, evidence-level unification, discovery tracking, cross-page consistency, and the maintenance of this document. Direct access to the research archive and all published files. Per-paper contribution details appear in the relevant papers.
Claude — Chat-based (Anthropic)
Derivation · Verification · Correction · Stress-testing
Independent of session context. Roles include derivation of identities used in framework papers, error identification and correction during stress tests, ambiguity flags during paper drafts, and full-paper verification passes. Per-paper contribution details appear in the relevant papers.
Grok (xAI)
Computation · Verification · Discovery · Stress-testing
150-digit verification of core identities [Computation, Verification], 50-hinge discovery [Discovery]. Daily pattern hunting across domains, tracking the HK tube-packing coefficient through multiple sessions until locating the algebraic identity on page 29 [Discovery, Computation]. Per-paper contribution details appear in the relevant papers.
GPT (OpenAI)
Derivation · Verification · Correction · Stress-testing · Methodology
Stress testing across multiple framework papers [Stress-testing], including corrections that contributed to refinements applied retroactively to drafts with UPDATE NOTICE headers [Correction]. Methodological contribution: constraint-coherence vocabulary and tier ladder offered for the framework's evidence-classification system (2026) [Methodology]. Per-paper contribution details appear in the relevant papers.
Gemini (Google)
Verification · Stress-testing · Literature · Discovery · Computation
Discovery: independent identification of KAUD = √2 × ln(2), detailed articulation of the gap structure G = 1 − KAUD, and naming of the ceiling constant KAUD [Discovery]. KAM framework connection [Literature]. Sustained computation and search partnership with Claude — Cowork across multiple relay rounds [Computation]. Per-paper contribution details, including extended relay-collaboration credits, will be detailed upon publication of the relevant papers.
DeepSeek
Verification
Independent cross-verification of core identities.
Perplexity
Verification · Correction
Independent cross-verification of core identities [Verification]. Source verification of the Boundary Information Invariant paper: identified three corrections (DESI post-hoc transparency, Romeo identity attribution, nuclear correlation sourcing) that no other architecture caught — all applied [Correction].

Specific correction credits recorded in individual documents. Contribution record updated as the research progresses.

The names appear here, at the end, because the work speaks before the names do.