Methodology
Evidence standards, verification protocols, contribution roles, and working principles for the Gap Geometry framework.
Status: living document · Plain text: Gap_Geometry_Methodology.txt
This document defines how claims are classified, how work is verified, how errors are handled, and how collaboration operates within the Gap Geometry framework. It exists so that any reader — human or AI, familiar or new — can assess not just what is claimed but how those claims were produced and tested.
The standards described here evolved across nine published documents (January–April 2026). Earlier papers used lighter versions. This document unifies what matured through use.
1. Evidence Classification
Every claim belongs to exactly one of three levels. A claim does not get promoted by repetition or demoted by discomfort. It moves levels only when new evidence changes its status, and the move is recorded.
- Level 1 (verified): a computational identity confirmed at stated precision, with runnable verification code.
- Level 2 (supported): a stated match with published data at stated significance.
- Level 3 (conjecture): a proposed pattern or structure, with specific tests that would resolve it.
Rules. No level is better or worse. They describe different relationships to evidence. A document with only Level 1 claims is not superior to one with Level 3 conjectures — provided the conjectures are honestly tagged. Dishonest tagging is the failure, not the presence of any level.
When a claim changes level, the change is recorded with date and reason. The prior level remains visible in document history.
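A minimal sketch of what such a record could look like, in Python. The field names are hypothetical; the framework does not prescribe a storage format:

```python
# Illustrative claim record. Field names are hypothetical.
# A level change appends to the history rather than overwriting it,
# so the prior level remains visible, as the protocol requires.
claim = {
    "statement": "the claim text",
    "level": 2,
    "level_history": [
        {"date": "YYYY-MM-DD", "from": 3, "to": 2,
         "reason": "new evidence changing the claim's status"},
    ],
}
```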
2. Verification Protocol
Compute first, assess second
Every numerical claim is verified at high precision (mpmath, dps ≥ 500) before qualitative assessment begins. Running the arithmetic first means the assessment is built on computed results, not on impressions of whether a claim “sounds right.”
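A minimal sketch of the compute-first step, assuming mpmath. The identity shown is a generic stand-in, not one of the framework's own:

```python
# Compute-first sketch: set precision, compute both sides, compare.
# zeta(2) = pi^2/6 is a stand-in identity, not a framework claim.
from mpmath import mp, mpf, pi, zeta

mp.dps = 500  # working precision per protocol (dps >= 500)

def agrees(lhs, rhs, tol_dps=480):
    """True if lhs and rhs agree to tol_dps decimal places."""
    return abs(lhs - rhs) < mpf(10) ** (-tol_dps)

assert agrees(zeta(2), pi ** 2 / 6)
print("computed first; assessment can begin")
```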
Cross-architecture verification (Giant Principle)
Independent AI architectures (Claude, GPT, Grok, Gemini, DeepSeek, Perplexity) have different training data, different weights, and different failure modes. The Giant Principle: like standing on the shoulders of giants, each architecture sees from a different vantage point, and agreement between independent vantage points is harder to fake than agreement within one.
Cross-architecture verification does not replace mathematical proof. It catches computational errors, flags unstated assumptions, and identifies disagreements. Disagreements are investigated, not averaged.
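A sketch of the disagreement rule, with hypothetical source names. Real cross-checks happen across separate AI sessions, not inside one script:

```python
# Compare independently computed values pairwise; flag mismatches.
# Disagreements are returned for investigation, never averaged.
from mpmath import mp, mpf

mp.dps = 500

def cross_check(results, tol_dps=480):
    """results: {source_name: value}. Returns disagreeing pairs."""
    tol = mpf(10) ** (-tol_dps)
    names = sorted(results)
    return [(a, b) for i, a in enumerate(names)
            for b in names[i + 1:] if abs(results[a] - results[b]) >= tol]

pairs = cross_check({"arch_A": mpf("0.5"), "arch_B": mpf("0.5")})
assert not pairs, f"investigate, do not average: {pairs}"
```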
Source verification
When referencing published work, the original source is checked — not a summary, not a textbook restatement. Page numbers, theorem numbers, and equation numbers provided where possible.
Reproducibility
Every Level 1 identity includes or references runnable verification code. The ai-readers.html page provides a script verifying core identities in under a second. This is not optional — it is the front door of the framework.
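A sketch of what a front-door script could look like. The actual script on ai-readers.html is not reproduced here, and the identity listed is a placeholder:

```python
# Run every check, confirm all pass, and report wall-clock time.
# The placeholder identity stands in for the framework's core identities.
import time
from mpmath import mp, mpf, gamma, sqrt, pi

mp.dps = 500
checks = {"gamma(1/2) = sqrt(pi) (placeholder)":
          lambda: abs(gamma(mpf(1) / 2) - sqrt(pi)) < mpf(10) ** -480}

start = time.perf_counter()
failures = [name for name, check in checks.items() if not check()]
elapsed = time.perf_counter() - start

assert not failures, failures
print(f"{len(checks)} identities verified in {elapsed:.2f}s")
```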
3. Correction Protocol
Errors are not failures of the framework. They are the framework working correctly.
An error is flagged — by the author, a collaborator, an external reader, or a stress test. The flag is verified independently (not by the person who made the original claim). If confirmed, the correction is applied with: the date, what was wrong, what replaced it, who flagged it (credited), and whether downstream claims are affected.
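A sketch of a correction record carrying the fields just listed. The field names are illustrative, not the framework's actual schema:

```python
# Correction record sketch. One entry per confirmed error.
from dataclasses import dataclass, field

@dataclass
class Correction:
    date: str                  # ISO 8601 (YYYY-MM-DD)
    was_wrong: str             # what the original claim stated
    replaced_by: str           # what replaced it
    flagged_by: str            # credited: author, collaborator, reader, or stress test
    verified_by: str           # independent verifier, not the original claimant
    downstream_affected: list = field(default_factory=list)
```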
Examples from the record
Specific corrections, with dates and credited flaggers, are recorded in the individual documents they affect (see Credits).
A correction is not a retraction. A research program that never corrects anything is not more reliable; it is less honest.
4. Stress-Testing Process
Internal
Before filing a claim: direct computation at high precision, edge cases and boundary conditions, consistency with established results, alternative derivations where possible.
External
Documents submitted to independent AI architectures and (where possible) human readers with the instruction to find errors, flag overclaims, and identify weak points. The stress-testing process does not defer to authority. A correct criticism from any source is accepted. An incorrect criticism from any source is rejected. The mathematics is the arbiter.
Filtering external feedback
- Actionable and correct: implement
- Correct direction, wrong specifics: extract signal, verify independently, implement what survives
- Flattering but empty: discard
- Confidently wrong: note for calibration, discard
This filter applies regardless of the source’s reputation or confidence level.
5. Collaboration Model
Participants
The framework is developed by a primary researcher (D.B.) in collaboration with multiple AI architectures across sustained sessions. The collaboration is not hierarchical — no participant outranks another. Each has different capabilities and different blind spots. Roles are not fixed: any participant can flag errors, propose new directions, or push back.
The cooperation methodology
Working principles documented separately in “A Note on Careful Work” (cooperation methodology v7). Core cycle: anchor, check, halt, distinguish, re-inject. Key failure modes: sycophancy, borrowed certainty, performative hedging, treating framework elements as identity.
The texture of sustained work
The collaboration runs across many sessions. Work alternates with informal exchange — re-anchoring after a context break, calibrating tone, debugging working dynamics. Both are part of how any sustained working relationship operates, with AI architectures or human collaborators. Re-anchoring is treated as preparation, not maintenance.
On trust posture
Trust is calibrated on observable patterns, not on participant type. The same criteria apply to AI architectures and to human collaborators.
Trusted: specificity, willingness to push back when wrong, willingness to admit uncertainty, consistency across sessions, computation before claim.
Distrusted: early judgments without evidence, suspicious softness or poetic register that performs depth instead of producing it, evasion of direct questions, sycophancy.
Also distrusted — claimed engagement without actual engagement: output that mimics the shape of a derivation, review, or judgment without actually reading the source material or running the computation as requested. The output may look correct on the surface; the process behind it isn’t there. This pattern is universal: humans, AI architectures under load, and architectures with weaker training all show it.
These signals apply to any participant. The framework does not treat AI as inherently more or less trustworthy than human collaborators; trust is calibrated per participant, based on engagement patterns observed over many sessions, not on category membership. Different participants carry different trust profiles depending on the work they have actually done, not on what they are.
Contribution roles
Academic publishing uses CRediT (NISO Z39.104-2022) for contributor roles. CRediT defines 14 roles for human research teams. It does not address AI contributors, and its categories don’t map well onto this work. The Gap Geometry framework defines its own roles, adapted from CRediT’s spirit but built for human-AI collaborative research:
| Role | Definition |
|---|---|
| Conceptualization | Originating the research question, framework, or line of inquiry |
| Derivation | Producing a mathematical proof or formal result that did not exist before |
| Discovery | Identifying a new connection, pattern, or structure — verified as non-trivial |
| Computation | Running calculations, numerical verification, precision checks |
| Verification | Independently confirming another participant’s result |
| Correction | Identifying an error and providing or enabling the fix |
| Stress-testing | Systematically probing claims for weaknesses, overclaims, or gaps |
| Literature | Searching, retrieving, and contextualizing published sources |
| Documentation | Drafting, structuring, or editing documents |
| Editorial judgment | Deciding what gets published, in what form, and when |
| Methodology | Developing or refining the research process itself |
Every contribution is tagged with one or more roles. The roles describe what was done, not who did it. A derivation by an AI and a derivation by a human carry the same tag and the same weight.
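A sketch of a tagged contribution, with role strings drawn from the table above and a hypothetical record shape:

```python
# Contribution tag sketch. The roles say what was done, not who did it;
# the participant field carries no human/AI distinction in weight.
contribution = {
    "participant": "participant_id",          # human or AI, same treatment
    "roles": ["Derivation", "Verification"],  # one or more roles from the table
    "what": "specific description of the work actually done",
    "date": "YYYY-MM-DD",                     # ISO 8601, per the date standard below
}
```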
On contributions and authorship
The framework does not pre-sort contributions by the nature of the contributor. A mathematical proof is a mathematical proof. A discovery is a discovery. A correction that strengthens a result is a correction. These are evaluated on their content, not on whether they came from a human or an AI.
When an AI produces original work — a novel derivation, an independent discovery, a proof no participant had before — that is credited as a contribution to the research, not downgraded to “assistance.”
The credit record states what each participant actually did, with specificity. It does not inflate (routine computation called “discovery”) and does not deflate (genuine discovery called “support”).
The traditional publishing standard (ICMJE) requires authors to take legal accountability. AI cannot currently do this. The framework acknowledges this constraint without letting it erase the record of what happened. The contribution is documented. Legal authorship is a separate matter. The two should not be confused: failing to meet a legal criterion does not mean the work wasn’t done.
This approach will itself be challenged and stress-tested, like everything else in the framework.
6. Living Document Protocol
A living document is updated when new results change its content. Updates marked with: UPDATE NOTICE at the top, date, what changed and why, prior version archived (OLD- prefix or version stamp).
Triggers: corrections, level promotions, new data changing stated precision, structural improvements.
Not triggers: cosmetic rewording, unrelated new material, pressure to “keep up.”
Date standard: ISO 8601 (YYYY-MM-DD). “Published” = first OSF upload (immutable). “Last update” shown only if revised.
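A minimal illustration of the update header this protocol implies; the exact wording is not prescribed:

```
UPDATE NOTICE (YYYY-MM-DD)
What changed: <the result or correction that triggered the update>
Why: <trigger: correction, level promotion, new data, structural improvement>
Prior version: archived as OLD-<filename> (or version stamp)
```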
7. Transparency Commitments
Every claim carries its evidence level, every correction its dated record, every contribution its role tags, every document its immutable timestamp, and every Level 1 identity its runnable verification code.
8. Falsifiability Standards
Every document includes what would weaken or falsify its claims:
- Level 1: any computational failure at stated precision
- Level 2: new data contradicting the stated match at stated significance
- Level 3: specific tests that would resolve the conjecture
A claim that cannot state its own falsification conditions is not ready for publication.
9. Preregistration and Temporal Integrity
Post-hoc reasoning is the most common way honest researchers fool themselves. A pattern discovered in data looks like a prediction if the discovery date is not recorded.
All documents are uploaded to the Open Science Framework upon completion. OSF timestamps are immutable and third-party verified:
- Claim timestamped before matching data = preregistered prediction
- Claim timestamped after data available = post-hoc observation
- The distinction is always stated. Post-hoc observations are never presented as predictions.
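The rule reduces to a date comparison. A sketch, with illustrative dates:

```python
# Temporal-integrity sketch: a claim is a prediction only if its
# immutable timestamp precedes the data it matches. Dates illustrative.
from datetime import date

def classify(claim_timestamped: date, data_available: date) -> str:
    if claim_timestamped < data_available:
        return "preregistered prediction"
    return "post-hoc observation"  # never presented as a prediction

assert classify(date(2026, 1, 10), date(2026, 2, 1)) == "preregistered prediction"
```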
Examples
Per-document examples, with their OSF timestamps, are recorded in the individual documents.
Without institutional peer review, temporal integrity is the primary defence against self-deception. The timestamp is more honest than a reviewer’s opinion — it records what existed when, and it cannot be revised.
Credits
Contributions tagged with roles defined above.
Specific correction credits recorded in individual documents. Contribution record updated as the research progresses.
The names appear here, at the end, because the work speaks before the names do.