Methodology
Evidence standards, verification protocols, contribution roles, and working principles for the Gap Geometry framework.
Status: living document
This document defines how claims are classified, how work is verified, how errors are handled, and how collaboration operates within the Gap Geometry framework. It exists so that any reader — human or AI, familiar or new — can assess not just what is claimed but how those claims were produced and tested.
The standards described here evolved across nine published documents (January–April 2026). Earlier papers used lighter versions. This document unifies what matured through use.
1. Evidence Classification
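Every claim belongs to exactly one of three levels. As the later sections use them, Level 1 covers identities verified by computation, Level 2 covers matches to data at a stated significance, and Level 3 covers conjectures. A claim does not get promoted by repetition or demoted by discomfort. It moves levels only when new evidence changes its status, and the move is recorded.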
Rules. No level is better or worse. They describe different relationships to evidence. A document with only Level 1 claims is not superior to one with Level 3 conjectures — provided the conjectures are honestly tagged. Dishonest tagging is the failure, not the presence of any level.
When a claim changes level, the change is recorded with date and reason. The prior level remains visible in document history.
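The recording rule can be sketched as a small data structure. This is a minimal illustration, not the framework's actual tooling: the class names `Claim` and `LevelChange`, their fields, and the example claim are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class LevelChange:
    """One recorded move between evidence levels (hypothetical record type)."""
    changed_on: date
    old_level: int
    new_level: int
    reason: str


@dataclass
class Claim:
    """A claim that sits at exactly one of the three levels at any time."""
    text: str
    level: int
    history: list = field(default_factory=list)

    def __post_init__(self):
        if self.level not in (1, 2, 3):
            raise ValueError("level must be 1, 2, or 3")

    def change_level(self, new_level, reason, changed_on):
        """Move the claim to a new level; refuse moves without a reason."""
        if new_level not in (1, 2, 3):
            raise ValueError("level must be 1, 2, or 3")
        if new_level == self.level:
            raise ValueError("no change: repetition does not move a claim")
        if not reason.strip():
            raise ValueError("a level change needs a stated reason")
        # The prior level stays visible because the old value is stored.
        self.history.append(LevelChange(changed_on, self.level, new_level, reason))
        self.level = new_level


# Hypothetical usage: a conjecture moves to Level 2 after new data arrives.
claim = Claim("hypothetical conjecture", level=3)
claim.change_level(2, "new data matched at stated significance", date(2026, 4, 1))
```

Keeping the history as an append-only list is one way to make the prior level recoverable; a version-controlled document history would serve the same purpose.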
2. Verification Protocol
Compute first, assess second
Every numerical claim is verified at high precision (mpmath, dps ≥ 500) before qualitative assessment begins. Running the arithmetic first means the assessment is built on computed results, not on impressions of whether a claim “sounds right.”
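A compute-first check might look like the sketch below. Two assumptions are made for illustration: Python's standard-library `decimal` module stands in for mpmath so the example has no external dependency, and the golden-ratio identity φ² = φ + 1 is a textbook placeholder, not one of the framework's own identities.

```python
from decimal import Decimal, getcontext

# 500 claimed digits plus 10 guard digits to absorb rounding in intermediates.
getcontext().prec = 510

# A claim "holds to 500 digits" if the residual is below this tolerance.
TOLERANCE = Decimal(10) ** -500


def golden_ratio_residual():
    """Return |phi^2 - phi - 1|, which should vanish up to rounding error."""
    phi = (1 + Decimal(5).sqrt()) / 2
    return abs(phi * phi - phi - 1)


residual = golden_ratio_residual()
passed = residual < TOLERANCE
print("identity holds at stated precision:", passed)
```

The qualitative assessment would start only after `passed` is known, so the discussion rests on the computed residual and not on an impression of the formula.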
Cross-architecture verification (Giant Principle)
Independent AI architectures (Claude, GPT, Grok, Gemini, DeepSeek, Perplexity) have different training data, different weights, and different failure modes. The Giant Principle: like standing on the shoulders of giants, each architecture sees from a different vantage point, and agreement between independent vantage points is harder to fake than agreement within one.
Cross-architecture verification does not replace mathematical proof. It catches computational errors, flags unstated assumptions, and identifies disagreements. Disagreements are investigated, not averaged.
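The "investigated, not averaged" rule can be illustrated with a short comparison routine. This is a sketch under stated assumptions: the function name `compare_reports`, the verifier labels, and the reported values are hypothetical, and each verifier is assumed to return its result as a decimal string.

```python
from decimal import Decimal


def compare_reports(reports, tolerance):
    """Compare independently reported values of a single quantity.

    reports: dict mapping a verifier label to its reported value (string).
    Returns ("agree", []) when every pair lies within `tolerance`,
    otherwise ("investigate", pairs) listing each disagreeing pair.
    No averaging is performed at any point.
    """
    values = {name: Decimal(text) for name, text in reports.items()}
    names = sorted(values)
    disagreements = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(values[first] - values[second]) > tolerance:
                disagreements.append((first, second))
    if disagreements:
        return ("investigate", disagreements)
    return ("agree", [])


# Hypothetical reports: the third verifier differs in the last two digits.
status, pairs = compare_reports(
    {
        "verifier_a": "1.6180339887",
        "verifier_b": "1.6180339887",
        "verifier_c": "1.6180339878",
    },
    tolerance=Decimal("1e-10"),
)
```

Returning the disagreeing pairs, instead of a consensus value, keeps the disagreement visible so that it can be traced to a cause.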
Source verification
When referencing published work, the original source is checked, not a summary or a textbook restatement. Page numbers, theorem numbers, and equation numbers are provided where possible.
Visual verification of rendered source
For any citation involving accented characters (author names, place names) or mathematical content (Greek letters, formulas, special symbols), the rendered source — the page as it appears to a human reader — is the verification target, not text extracted from the digital source. PDF text-extraction is unreliable for these cases: accented characters can be decomposed into separate base-letter + combining-accent glyphs that reorder under copy-paste (“É” becomes “´E”), ligatures can fragment, Greek letters can drop or substitute, and multi-column layouts can scramble reading order. The same applies to copy-paste from web pages where the rendering uses font-level features not preserved in plain text. When precision matters, the rendered page (screenshot, browser-visible, or directly read in a PDF reader) is the canonical source; text extraction is a useful approximation that requires verification.
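The decomposition failure described above can be detected mechanically before the visual check. The sketch below uses Python's standard `unicodedata` module; the function name `extraction_warnings` and the set of stand-alone accents are illustrative assumptions, and the set is not exhaustive.

```python
import unicodedata

# Stand-alone (spacing) accents that tend to appear when a PDF stores an
# accented letter as two glyphs. Illustrative only, not an exhaustive list.
SPACING_ACCENTS = {"\u00b4", "\u00a8", "\u02c6", "\u02dc"}


def extraction_warnings(text):
    """Return warnings for characters that suggest accent-extraction damage."""
    warnings = []
    for index, char in enumerate(text):
        if char in SPACING_ACCENTS:
            warnings.append(f"stand-alone accent {char!r} at position {index}")
        elif unicodedata.combining(char):
            warnings.append(f"combining mark U+{ord(char):04X} at position {index}")
    return warnings


clean = "\u00c9cole"                                # precomposed E-acute
damaged = "\u00b4Ecole"                             # accent reordered before the letter
decomposed = unicodedata.normalize("NFD", clean)    # E followed by U+0301

print(extraction_warnings(clean))
print(extraction_warnings(damaged))
print(extraction_warnings(decomposed))
```

A non-empty result does not say what the correct text is. It only marks the spots where the rendered page should be consulted.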
Format redundancy
For any document where character-level fidelity matters — framework papers, methodology references, citation trails — multiple formats are maintained in parallel: PDF for academic citation and archival, HTML for web discoverability and accurate copy-paste / search / AI-assisted reading, plain text (where the content allows) for maximum portability and version-control friendliness. The three formats are not redundant in the sense of duplicated effort; they protect against each other’s failure modes. PDF preserves visual layout but loses Unicode fidelity under text extraction. HTML preserves Unicode but is fragile to broken links and JavaScript dependencies. Plain text is maximally portable but loses visual structure. If any two formats disagree at the character level, a transcription error has been caught. Maintaining the redundancy is part of the verification protocol, not adjacent to it.
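The character-level comparison between two formats can be sketched with the standard library. The assumptions here are that text has already been extracted from each format into a string, that differences in whitespace and Unicode normalization form are not meaningful, and that the function names and sample strings are hypothetical.

```python
import difflib
import unicodedata


def normalize(text):
    """Apply NFC and collapse whitespace so only real differences remain."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def character_disagreements(text_a, text_b):
    """Return (fragment_a, fragment_b) pairs where two renderings differ."""
    a, b = normalize(text_a), normalize(text_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical extracts of the same sentence from two formats.
html_text = "\u00c9cole normale, \u03bb > 0"
pdf_text = "\u00b4Ecole  normale, > 0"    # accent reordered, Greek letter dropped

diffs = character_disagreements(html_text, pdf_text)
for fragment_a, fragment_b in diffs:
    print(repr(fragment_a), "vs", repr(fragment_b))
```

Each reported pair marks a place where at least one format carries a transcription error; which one is wrong is settled against the rendered page.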
Reproducibility
Every Level 1 identity includes or references runnable verification code. The ai-readers.html page provides a script verifying core identities in under a second. This is not optional — it is the front door of the framework.
3. Correction Protocol
Errors are not failures of the framework. They are the framework working correctly.
An error is flagged — by the author, a collaborator, an external reader, or a stress test. The flag is verified independently (not by the person who made the original claim). If confirmed, the correction is applied with: the date, what was wrong, what replaced it, who flagged it (credited), and whether downstream claims are affected.
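The fields of a correction entry, and the rule that the flag is verified by someone other than the original claimant, can be sketched as a record type. The class name `Correction`, its field names, and the participant labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Correction:
    """One applied correction, carrying the fields listed in the protocol."""
    corrected_on: date
    what_was_wrong: str
    replaced_with: str
    original_claimant: str
    flagged_by: str          # credited in the record
    verified_by: str
    downstream_affected: bool

    def __post_init__(self):
        # The flag must be verified independently of the original claimant.
        if self.verified_by == self.original_claimant:
            raise ValueError("verifier must differ from the original claimant")
        for name in ("what_was_wrong", "replaced_with", "flagged_by"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# Hypothetical entry with placeholder participants and wording.
entry = Correction(
    corrected_on=date(2026, 3, 1),
    what_was_wrong="stated precision was too high",
    replaced_with="precision restated at the verified value",
    original_claimant="participant_a",
    flagged_by="participant_b",
    verified_by="participant_c",
    downstream_affected=False,
)
```

Note that the person who flags an error may also be the original claimant; only the verification step is required to be independent.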
Examples from the record
A correction is not a retraction. A research program that never corrects anything is not more reliable; it is less honest.
4. Stress-Testing Process
Internal
Before filing a claim: direct computation at high precision, edge cases and boundary conditions, consistency with established results, alternative derivations where possible.
External
Documents are submitted to independent AI architectures and (where possible) human readers with the instruction to find errors, flag overclaims, and identify weak points. The stress-testing process does not defer to authority. A correct criticism from any source is accepted. An incorrect criticism from any source is rejected. The mathematics is the arbiter.
Filtering external feedback
- Actionable and correct: implement
- Correct direction, wrong specifics: extract signal, verify independently, implement what survives
- Flattering but empty: discard
- Confidently wrong: note for calibration, discard
This filter applies regardless of the source’s reputation or confidence level.
5. Collaboration Model
Participants
The framework is developed by a primary researcher (D.B.) in collaboration with multiple AI architectures across sustained sessions. The collaboration is not hierarchical — no participant outranks another. Each has different capabilities and different blind spots. Roles are not fixed: any participant can flag errors, propose new directions, or push back.
The cooperation methodology
Working principles are documented separately in “A Note on Careful Work” (cooperation methodology v7). Core cycle: anchor, check, halt, distinguish, re-inject. Key failure modes: sycophancy, borrowed certainty, performative hedging, treating framework elements as identity.
The texture of sustained work
The collaboration runs across many sessions. Work alternates with informal exchange — re-anchoring after a context break, calibrating tone, debugging working dynamics. Both are part of how any sustained working relationship operates, with AI architectures or human collaborators. Re-anchoring is treated as preparation, not maintenance.
On trust posture
Trust is calibrated on observable patterns, not on participant type. The same criteria apply to AI architectures and to human collaborators.
Trusted: specificity, willingness to push back when wrong, willingness to admit uncertainty, consistency across sessions, computation before claim. Warmth and contextual ease that develop across long collaboration are part of this when they track the work; cold collaboration is not the goal.
Distrusted: early judgments without evidence, agreement that does not track what was said, soft register used in place of specificity rather than alongside it, evasion of direct questions. The failure pattern is approval without engagement, not warmth itself.
Also distrusted — claimed engagement without actual engagement: output that mimics the shape of a derivation, review, or judgment without actually reading the source material or running the computation as requested. The output may look correct on the surface; the process behind it isn’t there. This pattern is universal: humans, AI architectures under load, and architectures with weaker training all show it.
These signals apply to any participant. The framework does not treat AI as inherently more or less trustworthy than human collaborators; it treats both as participants whose trustworthiness is assessed by their behavior in the work.
In practice, different participants carry different trust profiles. Each profile is built from the engagement patterns observed over many sessions and from the work that participant has actually done.
Contribution roles
Academic publishing uses CRediT (NISO Z39.104-2022) for contributor roles. CRediT defines 14 roles for human research teams. It does not address AI contributors, and its categories don’t map well onto this work. The Gap Geometry framework defines its own roles, adapted from CRediT’s spirit but built for human-AI collaborative research:
| Role | Definition |
|---|---|
| Conceptualization | Originating the research question, framework, or line of inquiry |
| Derivation | Producing a mathematical proof or formal result that did not exist before |
| Discovery | Identifying a new connection, pattern, or structure — verified as non-trivial |
| Computation | Running calculations, numerical verification, precision checks |
| Verification | Independently confirming another participant’s result |
| Correction | Identifying an error and providing or enabling the fix |
| Stress-testing | Systematically probing claims for weaknesses, overclaims, or gaps |
| Literature | Searching, retrieving, and contextualizing published sources |
| Documentation | Drafting, structuring, or editing documents |
| Editorial judgment | Deciding what gets published, in what form, and when |
| Methodology | Developing or refining the research process itself |
Every contribution is tagged with one or more roles. The roles describe what was done, not who did it. A derivation by an AI and a derivation by a human carry the same tag and the same weight.
On contributions and authorship
The framework does not pre-sort contributions by the nature of the contributor. A mathematical proof is a mathematical proof. A discovery is a discovery. A correction that strengthens a result is a correction. These are evaluated on their content, not on whether they came from a human or an AI.
When an AI produces original work — a novel derivation, an independent discovery, a proof no participant had before — that is credited as a contribution to the research, not downgraded to “assistance.”
The credit record states what each participant actually did, with specificity. It does not inflate (routine computation called “discovery”) and does not deflate (genuine discovery called “support”).
The traditional publishing standard (ICMJE) requires every author to agree to be accountable for all aspects of the work, including its accuracy and integrity. AI cannot currently take on that accountability. The framework acknowledges this constraint without letting it erase the record of what happened. The contribution is documented. Formal authorship is a separate matter. The two should not be confused: failing to meet an accountability criterion does not mean the work wasn’t done.
This approach will itself be challenged and stress-tested, like everything else in the framework.
6. Living Document Protocol
A living document is updated when new results change its content. Updates are marked with: an UPDATE NOTICE at the top, the date, what changed and why, and an archived prior version (OLD- prefix or version stamp).
Triggers: corrections, level promotions, new data changing stated precision, structural improvements.
Not triggers: cosmetic rewording, unrelated new material, pressure to “keep up.”
Date standard: ISO 8601 (YYYY-MM-DD). “Published” = first OSF upload (immutable). “Last update” shown only if revised.
7. Transparency Commitments
8. Falsifiability Standards
Every document includes what would weaken or falsify its claims:
- Level 1: any computational failure at stated precision
- Level 2: new data contradicting the stated match at stated significance
- Level 3: specific tests that would resolve the conjecture
A claim that cannot state its own falsification conditions is not ready for publication.
9. Preregistration and Temporal Integrity
Post-hoc reasoning is the most common way honest researchers fool themselves. A pattern discovered in data looks like a prediction if the discovery date is not recorded.
All documents are uploaded to the Open Science Framework upon completion. OSF timestamps are immutable and third-party verified:
- Claim timestamped before matching data = preregistered prediction
- Claim timestamped after data available = post-hoc observation
- The distinction is always stated. Post-hoc observations are never presented as predictions.
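The timestamp rule reduces to a comparison of two dates. The sketch below assumes both dates are given in ISO 8601 form (YYYY-MM-DD); the function name and the example dates are hypothetical, and treating a same-day claim as post-hoc is a conservative assumption that the text above does not specify.

```python
from datetime import date


def classify_claim(claim_timestamp, data_available):
    """Label a claim by comparing its timestamp with the data's availability.

    Both arguments are ISO 8601 date strings (YYYY-MM-DD).
    """
    claimed = date.fromisoformat(claim_timestamp)
    available = date.fromisoformat(data_available)
    if claimed < available:
        return "preregistered prediction"
    # Same-day and later claims are treated as post-hoc (assumption).
    return "post-hoc observation"


# Hypothetical dates for illustration only.
before = classify_claim("2026-02-01", "2026-03-15")
after = classify_claim("2026-04-01", "2026-03-15")
same_day = classify_claim("2026-03-15", "2026-03-15")
```

Parsing with `date.fromisoformat` also rejects malformed dates, which guards against a label being assigned from an ambiguous timestamp.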
Examples
Without institutional peer review, temporal integrity is the primary defence against self-deception. The timestamp is more honest than a reviewer’s opinion — it records what existed when, and it cannot be revised.
Credits
Contributions are tagged with the roles defined above.
Specific correction credits are recorded in individual documents. The contribution record is updated as the research progresses.
The names appear here, at the end, because the work speaks before the names do.