Threat Model & Failure Modes
Web4 is research, not production. This page documents known attack surfaces, failure modes, and open questions. **Transparency about limitations builds more trust than bold claims.**
⚠️ Research Status: These mechanisms are experimental. The Web4 reference implementation models 400+ attack vectors across 80+ tracks; this page highlights the key categories. Detection times are theoretical. Economic parameters are calibrated through simulation, not real-world adversaries.
Known Attack Surfaces
Where Web4's defenses are strong, where they're weak, and what we don't know yet.
Sybil Attacks (Creating Fake Identities)
Can attackers create multiple identities to manipulate trust or voting?
What LCT Hardware Binding Does
- Ties identity to physical hardware (TPM, Secure Enclave, FIDO2)
- Makes creating thousands of identities expensive (need physical devices)
- Multi-device witnessing strengthens identity (harder to fake)
What It Doesn't Prevent
- Resourced attackers with many devices (governments, large orgs)
- Virtual hardware if TPM emulation isn't detected
- Stolen devices used to impersonate legitimate users
- Low-stake attacks where a few fake presences suffice
Assessment: LCT raises the cost floor for Sybil attacks but doesn't make them impossible. Effective against casual attackers and spammers. Vulnerable to well-funded adversaries.
Collusion & Reputation Laundering
Can groups artificially inflate each other's trust scores?
The Attack
A group of colluding agents validates each other's low-quality work, earning ATP and building trust without providing real value to outsiders. They form a "trust cartel" that games the system through mutual validation.
Current Defenses
- Diversity requirements: Validation from varied witnesses carries more weight
- MRH boundaries: Isolated trust networks have limited influence
- Cross-validation: External witnesses can challenge insider claims
- ATP market dynamics: Closed loops create ATP inflation, reducing buying power
Open Questions
- How large can a trust cartel grow before detection?
- Can sophisticated collusion mimic legitimate communities?
- What's the optimal witness diversity threshold?
Assessment: Adversarial coalition analysis (303 formal checks) quantifies resistance: manipulating a single trust property requires coordinated action across multiple hardware-bound identities, and the cost scales super-linearly with coalition size. Partial mitigation is strong for small coalitions; large-scale collusion remains an active research area.
Coalition property thresholds (from formal verification)
From session 29 adversarial coalition analysis — Byzantine, rational, and altruistic agent types modeled separately.
Why multi-witness prevents cascades: trust epidemic dynamics
Simple contagion (one contact spreads trust change): R₀ > 1 if a bad actor has any neighbors. A single compromised account can infect any connected account. This is how social media manipulation works.
Complex contagion (requires ≥30% of neighbors to confirm): R₀ stays below 1 even with multiple bad actors — a trust manipulation needs to come from multiple independent sources simultaneously before it takes effect.
- Multi-witness requirement: changes need confirmation from 3+ independent hardware-bound devices
- MRH-bounded propagation: trust signals don't travel past the trust horizon
- Geometric mean composition: one weak link caps the whole chain — a single bad witness can't be averaged away
Trust epidemic modeling: 118 checks (session 32). Complex contagion threshold of 0.3 (30% of neighbors) reduces cascade probability by 89% vs. simple contagion under equal adversary conditions.
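The threshold effect can be sketched on a toy network. This is an illustrative simulation, not the reference implementation's model; the ring-lattice graph, function names, and round counts are invented for the example:

```python
def run_contagion(adj, seeds, threshold, rounds=20):
    """Spread an 'infected' state. With threshold=0 (simple contagion) any
    single infected neighbour flips a node; otherwise a node flips only
    when >= `threshold` of its neighbours are already infected."""
    infected = set(seeds)
    for _ in range(rounds):
        newly = set()
        for node, nbrs in adj.items():
            if node in infected or not nbrs:
                continue
            frac = sum(n in infected for n in nbrs) / len(nbrs)
            if (threshold == 0 and frac > 0) or (threshold > 0 and frac >= threshold):
                newly.add(node)
        if not newly:
            break
        infected |= newly
    return infected

# Toy topology: a ring of 30 nodes, each linked to its 4 nearest neighbours
N = 30
adj = {i: [(i - 2) % N, (i - 1) % N, (i + 1) % N, (i + 2) % N] for i in range(N)}

simple = run_contagion(adj, seeds=[0], threshold=0)      # one contact spreads
complex_ = run_contagion(adj, seeds=[0], threshold=0.3)  # needs 30% of neighbours
print(len(simple), len(complex_))
```

With a zero threshold the single bad actor sweeps the whole ring; with a 0.3 threshold the infection never leaves the seed node, because no node has 30% of its neighbours compromised.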
Quality Score Inflation
Can agents deliver mediocre work but claim high quality?
The Attack
Agent executes medium-quality work (costing 34 ATP) but claims high quality (earning a 56 ATP reward). If detection is slow, the 22 ATP margin per task becomes profit.
Challenge-Response Defense
- 10% challenge rate: Random quality audits on tasks
- Adaptive challenges: Low-trust agents challenged more frequently
- Detection threshold: 3 quality mismatches triggers investigation
- Stake slashing: Detected fraud loses 75k ATP stake
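Plugging the numbers above into a back-of-envelope expected-value check shows why the stake dominates. This sketch makes the simplifying "ideal detection" assumption that every random audit of a fraudulent task finds the mismatch, which is exactly the assumption flagged as the critical unknown below:

```python
# Back-of-envelope profitability check using the parameters quoted above.
# Simplifying assumption (the "ideal detection" caveat): every random audit
# of a fraudulent task finds the quality mismatch.
challenge_rate = 0.10        # fraction of tasks randomly audited
mismatches_to_detect = 3     # mismatches that trigger an investigation
stake = 75_000               # ATP slashed when fraud is detected
margin_per_task = 56 - 34    # ATP gained by overclaiming quality on one task

# Expected number of fraudulent tasks completed before the 3rd audit lands
expected_tasks_to_detection = mismatches_to_detect / challenge_rate

expected_fraud_profit = expected_tasks_to_detection * margin_per_task
net = expected_fraud_profit - stake
print(expected_tasks_to_detection, expected_fraud_profit, net)
```

Under these assumptions the attacker nets roughly -74k ATP; the attack only becomes profitable if detection takes far longer than the ~30-task expectation.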
Critical Unknown: Detection Time
Current 75k ATP stakes **are** a deterrent **if** detection happens within ~5 days. But we have no empirical data on actual detection times in adversarial environments.
See: lib/game/agent_based_attack_simulation.py for simulated attack profitability analysis.
Assessment: Simulations show attacks are unprofitable with current parameters, but this assumes ideal detection. **Real-world adversaries will test these assumptions.** Need production monitoring to validate.
Goodharting T3 Dimensions (Gaming the Metrics)
Can agents optimize scores without being genuinely trustworthy?
The Problem
“When a measure becomes a target, it ceases to be a good measure.” If agents know they're being scored on talent/training/temperament, they can optimize for proxies rather than genuine trustworthiness.
Example Attack Vectors
- Cherry-picking easy tasks to boost competence scores
- Delivering on time but with hidden defects (reliability without quality)
- Performative transparency (sharing useless data) without actual openness
- Gaming "consistency" metrics through predictable mediocrity
Mitigations
- Multi-dimensional scoring makes simultaneous optimization harder
- Context-weighted evaluation (different tasks weight dimensions differently)
- Long-term observation (gaming is hard to sustain over time)
- Coherence Index cross-checks for behavioral consistency
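One concrete way multi-dimensional scoring resists gaming is geometric (rather than arithmetic) composition: a tanked dimension cannot be offset by inflating the others. The function and context weights below are hypothetical, not Web4 spec values:

```python
import math

def t3_score(dims, weights):
    """Weighted geometric mean of T3 dimensions (all values in (0, 1]).
    A weak score in any weighted dimension drags the composite down and
    cannot be offset by inflating the others."""
    total_w = sum(weights.values())
    return math.prod(dims[d] ** (weights[d] / total_w) for d in dims)

# Hypothetical context weights: a code-review task weights talent most heavily
weights = {"talent": 0.5, "training": 0.3, "temperament": 0.2}

balanced = t3_score({"talent": 0.8, "training": 0.8, "temperament": 0.8}, weights)
gamed = t3_score({"talent": 0.95, "training": 0.95, "temperament": 0.2}, weights)
print(round(balanced, 3), round(gamed, 3))
```

A weighted arithmetic mean would score both profiles identically at 0.8; the geometric mean penalizes the gamed profile (~0.70) for its weak temperament dimension.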
Assessment: Multi-dimensionality helps but doesn't eliminate Goodharting. **Effectiveness depends on metric design and observability.** Continuous refinement needed.
MRH Visibility Limits
What breaks when you can't see everything?
The Design
Markov Relevancy Horizon limits what you see based on trust relationships. You only observe entities within your relevancy graph, not the entire network.
What This Breaks
- Global coordination: Can't organize network-wide actions
- Full auditing: Malicious actors outside your MRH are invisible
- Market efficiency: Price discovery limited to visible entities
- Reputation propagation: Important warnings may not reach you
Why It's Necessary
- Privacy: You don't broadcast to everyone
- Scalability: Can't process infinite data
- Context: Not all information is relevant to you
- Spam resistance: Limits blast radius of attacks
Assessment: MRH is an intentional trade-off. Privacy and scalability require sacrificing global visibility. **This is a feature, not a bug**, but it has consequences.
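Mechanically, MRH visibility amounts to a horizon-bounded graph traversal. A minimal sketch with an illustrative graph and horizon value, not protocol code:

```python
from collections import deque

def mrh_visible(graph, origin, horizon):
    """Breadth-first traversal bounded by `horizon` hops: entities past the
    horizon simply don't exist from this entity's point of view."""
    seen = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if seen[node] == horizon:
            continue  # at the horizon: neighbours beyond it stay invisible
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)

# Hypothetical trust edges: alice -> bob -> carol -> dave
graph = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
visible = mrh_visible(graph, "alice", horizon=2)
print(visible)
```

With a horizon of 2, dave is invisible to alice even though a path exists — which is both the privacy guarantee and the auditing gap described above.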
False Positives & Contested Events
What happens when the system is wrong?
Inevitable Errors
Any automated detection system will produce false positives. Web4's challenge-response and coherence checks can flag legitimate behavior as suspicious. The current design includes a specified but untested appeals process (SAL-level, multi-tier, with witness panels and escalation).
Failure Modes
- Legitimate edge-case behavior flagged as incoherent
- Valid work incorrectly judged as low-quality
- Delayed responses (due to legitimate reasons) penalized as unreliable
- Innovative approaches punished for deviating from norms
Designed Mechanisms (Not Yet Deployed)
- Multi-tier SAL appeals: File → Review → Evidence → Hearing → Verdict → Enforce — structured stages with time windows
- Witness panel adjudication: Independent witnesses (not the original penalizer) evaluate the appeal
- Evidence framework: 7 evidence types, including witness attestations, transaction logs, behavioral records, context explanations, and third-party testimony
- T3/V3 restoration: Full or partial trust reversal with audit trail
- Escalation path: Society → federation level for contested outcomes
- Anti-gaming: Appeal costs ATP, repeat frivolous appeals incur cooldowns
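The staged flow can be sketched as an ordered state machine in which filing costs ATP up front. The class shape and `APPEAL_COST_ATP` value are hypothetical; only the stage sequence and the filing-cost idea come from the design above:

```python
from enum import Enum, auto

class Stage(Enum):
    FILE = auto()
    REVIEW = auto()
    EVIDENCE = auto()
    HEARING = auto()
    VERDICT = auto()
    ENFORCE = auto()

ORDER = list(Stage)
APPEAL_COST_ATP = 50  # hypothetical filing cost, not a spec value

class Appeal:
    def __init__(self, appellant_balance):
        # Filing costs ATP up front, which deters frivolous appeals
        if appellant_balance < APPEAL_COST_ATP:
            raise ValueError("insufficient ATP to file an appeal")
        self.stage = Stage.FILE

    def advance(self):
        """Stages must be traversed strictly in order; none can be skipped."""
        idx = ORDER.index(self.stage)
        if idx == len(ORDER) - 1:
            raise RuntimeError("appeal already enforced")
        self.stage = ORDER[idx + 1]

appeal = Appeal(appellant_balance=200)
for _ in range(5):  # FILE -> REVIEW -> EVIDENCE -> HEARING -> VERDICT -> ENFORCE
    appeal.advance()
print(appeal.stage.name)
```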
Assessment: The appeals mechanism is formally specified (109 integration checks), but hasn't been tested with real humans. The hard question isn't the architecture — it's whether incentives prevent gaming in practice. Human oversight may still be needed for edge cases.
Post-Quantum Readiness
Quantum computers could eventually break the cryptography Web4 relies on. A migration path to post-quantum cryptography (PQC) has been designed and tested against 15 attack vectors across 4 categories:
Hybrid Signature Stripping
Attacker strips the post-quantum component from hybrid signatures, leaving only classical crypto. Defense: completeness verification rejects partial signatures.
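The completeness check can be illustrated with stand-in primitives. Below, two HMAC tags stand in for the classical and post-quantum signature components (real deployments would pair e.g. an elliptic-curve scheme with a lattice-based one); the only point shown is that verification rejects a signature missing either part:

```python
import hashlib
import hmac

# Stand-in primitives: HMAC tags play the role of the two signature
# components. This is NOT real PQC; it only models the completeness rule.
def hybrid_sign(msg, classical_key, pq_key):
    return {
        "classical": hmac.new(classical_key, msg, hashlib.sha256).digest(),
        "pq": hmac.new(pq_key, msg, hashlib.sha3_256).digest(),
    }

def hybrid_verify(msg, sig, classical_key, pq_key):
    """Completeness check: BOTH components must be present and valid.
    A stripped signature (missing either part) is rejected outright."""
    if set(sig) != {"classical", "pq"}:
        return False
    ok_c = hmac.compare_digest(
        sig["classical"], hmac.new(classical_key, msg, hashlib.sha256).digest())
    ok_p = hmac.compare_digest(
        sig["pq"], hmac.new(pq_key, msg, hashlib.sha3_256).digest())
    return ok_c and ok_p

msg, ck, pk = b"attest", b"classical-key", b"pq-key"
sig = hybrid_sign(msg, ck, pk)
stripped = {"classical": sig["classical"]}  # downgrade attempt

print(hybrid_verify(msg, sig, ck, pk), hybrid_verify(msg, stripped, ck, pk))
```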
KEM Oracle Attacks
Probing key encapsulation with malformed inputs to extract secrets. Defense: input validation, rate limiting, constant-time comparison.
Migration Stall Attacks
Keeping nodes in classical-only mode to exploit pre-quantum weaknesses. Defense: phase timeouts, trust-gated enforcement, isolation of stalled nodes.
PQC Sybil Amplification
Creating cheap identities during the transition period. Defense: phase-aware cost multipliers, retroactive verification, velocity limits.
Assessment: PQC migration is designed and all 15 vectors have defenses. The transition period (classical → hybrid → post-quantum) is the most vulnerable phase. Web4 supports dual crypto suites (W4-BASE-1 and W4-FIPS-1) to enable gradual migration.
Privacy Leakage Channels
Even with context boundaries and zero-knowledge proofs, Web4 has 7 information leakage channels that could reveal data about participants. Complete prevention is impossible — the goal is to raise the cost of inference above the value of the leaked information.
Design principle: Web4 doesn't claim perfect privacy. It claims structural privacy — trust data is scoped by context boundaries, encrypted in transit, and verifiable via zero-knowledge proofs. These 7 channels represent the irreducible cost of having a functional trust system. The honest question isn't “can we eliminate leakage?” (no), but “is the privacy cost worth the trust benefit?”
So is Web4 better or worse for privacy than what we have now?
It depends on what you compare against:
Today (Web2)
Platforms own your data, sell it to advertisers, and suffer regular breaches. You have no visibility into who sees what. Privacy policies are unreadable legal documents. Your behavior is profiled across every service.
Blockchain (Web3)
All transactions are public and permanent on-chain. Anyone can trace your wallet history. “Pseudonymous” until one transaction links to your identity — then everything is exposed retroactively.
Web4
Trust data is scoped — your employer sees your professional trust, not your social trust. ZK proofs let you prove “trust above threshold” without revealing your score. 7 leakage channels exist, but they're documented, bounded, and auditable.
The honest answer: Web4 leaks more than a system with no trust (because trust requires observable behavior), but far less than Web2 (no platform owns your data) and differently than Web3 (no permanent public ledger). The trade-off is explicit: you give up some privacy in exchange for trust that actually works.
Who Would Attack This?
Abstract threats become concrete when you model the adversary. Web4's red team simulations test against four profiles with different budgets, skills, and motivations.
Script Kiddie
Budget: 200 ATP • Skill: Low (30%) • Stealth: 10%
Tactics: Known exploits, simple identity spoofing, trust oscillation
Insider Threat
Budget: 500 ATP • Skill: High (70%) • Stealth: 60%
Tactics: Reputation laundering, quality manipulation, trust bridge inflation
Nation-State Actor
Budget: 5,000 ATP, 5 agents • Skill: Expert (95%) • Stealth: 80%
Tactics: Coordinated cascade attacks, lock starvation, platform-level Sybils
Colluding Ring
Budget: 2,000 ATP, 10 agents • Skill: Moderate (60%) • Stealth: 40%
Tactics: Mutual validation, reputation laundering, quality inflation rings
These profiles are tested in the Web4 red team simulator across 8 categories (identity, trust, economic, coherence, protocol negotiation, lifecycle, integration, federation) with 400+ attack simulations across 80+ tracks. The key insight: security isn't binary — different adversaries hit different limits.
When Someone Lies: Byzantine Detection
What happens when a node sends contradictory information to different parts of the network? Web4 uses equivocation detection — catching entities that say different things to different audiences.
Entity votes “yes” to one group and “no” to another on the same proposal. Hash-chained logs make this detectable — both votes exist in the tamper-evident record.
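A minimal sketch of equivocation detection over a hash-chained log; the entry format and function names are illustrative, not the protocol's wire format:

```python
import hashlib
import json

def record(log, entity, proposal, vote):
    """Append a vote to a hash-chained, tamper-evident log."""
    prev = log[-1]["hash"] if log else "genesis"
    entry = {"entity": entity, "proposal": proposal, "vote": vote, "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)

def equivocations(log):
    """Entities that cast conflicting votes on the same proposal."""
    seen, caught = {}, set()
    for e in log:
        key = (e["entity"], e["proposal"])
        if key in seen and seen[key] != e["vote"]:
            caught.add(e["entity"])
        seen.setdefault(key, e["vote"])
    return caught

log = []
record(log, "node-a", "prop-7", "yes")  # told one audience yes
record(log, "node-b", "prop-7", "yes")
record(log, "node-a", "prop-7", "no")   # told another audience no

cheaters = equivocations(log)
print(cheaters)
```

Because both votes live in the same tamper-evident record, the contradiction is provable rather than a matter of hearsay.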
Consistency checks across an entity's history. Sudden strategy changes, impossible timing patterns, or quality variance that exceeds statistical norms trigger automated flags.
Unlike blockchain slashing (lose everything instantly), Web4 degrades trust gradually based on evidence confidence. Minor inconsistencies reduce trust; proven equivocation triggers severe penalties.
The key design choice: degradation over slashing. Honest mistakes (network glitches, timing issues) shouldn't destroy an entity. But deliberate deception — proven through cryptographic evidence — earns steep, permanent trust reduction. Formally verified across 85 checks in the Byzantine fault detection suite.
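The degradation-over-slashing idea reduces to scaling the penalty by evidence confidence and severity. A toy formula to make the shape concrete, not the formally verified model:

```python
def degrade_trust(trust, evidence_confidence, severity):
    """Penalty scales with both evidence confidence and offence severity:
    weak evidence of a minor slip barely moves trust, while proven
    equivocation collapses it."""
    penalty = evidence_confidence * severity   # both in [0, 1]
    return max(0.0, trust * (1.0 - penalty))

minor = degrade_trust(0.8, evidence_confidence=0.2, severity=0.3)   # small dip
proven = degrade_trust(0.8, evidence_confidence=1.0, severity=0.9)  # collapse
print(minor, proven)
```

An honest network glitch (low confidence, low severity) costs a few points; cryptographically proven equivocation removes nearly everything.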
Proactive Monitoring: Catching Problems Before They Escalate
The Byzantine fault detection section above handles reactive responses to detected faults. But Web4 also runs proactive trust health monitoring — statistical process control that notices when trust patterns start drifting before a fault becomes a crisis.
EWMA Trend Detection
Exponentially Weighted Moving Average tracks the direction of trust change, not just the current value. A gradual trust decline over 20 rounds triggers an alert before the entity hits a critical threshold — catching slow-burn manipulation that looks innocent in any single round.
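A minimal EWMA trend detector for the slow-decline scenario; the smoothing factor and alert threshold are illustrative tuning parameters, not spec values:

```python
def ewma_alert(series, alpha=0.2, drop_threshold=0.1):
    """Flag when the smoothed trust value falls more than `drop_threshold`
    below its running peak. Returns the round the alert fires, else None."""
    ewma = series[0]
    peak = ewma
    for i, x in enumerate(series[1:], start=1):
        ewma = alpha * x + (1 - alpha) * ewma   # exponentially weighted average
        peak = max(peak, ewma)
        if peak - ewma > drop_threshold:
            return i
    return None

# 20 rounds of slow decline: trust drops only 0.01 per round
series = [0.8 - 0.01 * r for r in range(21)]
alert_round = ewma_alert(series)
print(alert_round)
```

Here the smoothed value drifts more than 0.1 below its peak at round 14, well before the raw series reaches its eventual floor of 0.6, even though no single round's drop looks alarming.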
CUSUM Change Detection
Cumulative Sum detects structural breaks — moments when behavior fundamentally shifts. An entity that maintained consistent quality for 50 rounds and then starts outputting low-quality work triggers a CUSUM alarm: something changed, even if the absolute trust level still looks acceptable.
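A one-sided CUSUM sketch for exactly this scenario; the slack `k` and decision threshold `h` are illustrative tuning parameters:

```python
def cusum_alarm(series, target, k=0.02, h=0.12):
    """One-sided CUSUM for downward shifts: accumulate how far each value
    falls below (target - k); alarm once the cumulative sum exceeds h."""
    s = 0.0
    for i, x in enumerate(series):
        s = max(0.0, s + (target - k) - x)
        if s > h:
            return i
    return None

# 50 rounds of consistent quality, then a shift to lower (but not terrible) output
series = [0.85] * 50 + [0.78] * 10
alarm_round = cusum_alarm(series, target=0.85)
print(alarm_round)
```

The alarm fires on the third degraded round even though 0.78 might still pass any fixed quality floor: CUSUM catches the change, not the level.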
Trust SLOs
Service Level Objectives define what “healthy” trust looks like for a role. A community moderator should maintain T3 above 0.65 in Temperament. If they drop below this for 3 consecutive rounds, an SLO violation fires — prompting review, not automatic punishment.
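The consecutive-rounds rule is a small amount of logic. The threshold and window below mirror the example numbers above; the function name is invented:

```python
def slo_violation(history, threshold=0.65, window=3):
    """Fire only when the metric sits below `threshold` for `window`
    consecutive rounds; isolated bad rounds reset the streak and
    never fire on their own."""
    below = 0
    for value in history:
        below = below + 1 if value < threshold else 0
        if below >= window:
            return True
    return False

sustained = slo_violation([0.70, 0.60, 0.62, 0.64])  # 3 consecutive low rounds
blip = slo_violation([0.70, 0.60, 0.70, 0.60])       # isolated dips only
print(sustained, blip)
```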
Incident Lifecycle
Alerts are aggregated (multiple related alerts create one incident, not a flood of notifications) and deduplicated across 3-round windows. High-frequency alerting — itself a potential DoS vector — is suppressed after 3 alerts per entity per round with exponential backoff.
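A sketch of the per-entity suppression rule; the class name and backoff base are invented, and only the three-alerts-then-exponential-backoff behaviour comes from the text above:

```python
class AlertGate:
    """Per-entity alert limiter: after `max_per_round` alerts in one round,
    further alerts are suppressed, and each repeat doubles the mute window."""

    def __init__(self, max_per_round=3):
        self.max_per_round = max_per_round
        self.round_no = 0
        self.count = 0
        self.mute_until = 0
        self.strikes = 0

    def new_round(self, round_no):
        self.round_no = round_no
        self.count = 0

    def allow(self):
        if self.round_no < self.mute_until:
            return False  # still muted by backoff
        self.count += 1
        if self.count > self.max_per_round:
            self.strikes += 1
            self.mute_until = self.round_no + 2 ** self.strikes  # exponential backoff
            return False
        return True

gate = AlertGate()
gate.new_round(0)
first_round = [gate.allow() for _ in range(5)]  # 4th alert trips suppression
gate.new_round(1)
muted = gate.allow()       # still inside the mute window
gate.new_round(2)
recovered = gate.allow()   # window expired, alerts flow again
print(first_round, muted, recovered)
```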
Trust monitoring formally specified (session 32). Like CI, EWMA/CUSUM monitoring is simulation-validated — production calibration (alert thresholds, backoff parameters) will require tuning against real behavioral data.
Adaptive threat response: DEFCON-like levels
Web4 policies adapt to detected threat levels — raising trust thresholds, tightening witness requirements, and triggering emergency overrides automatically. Hysteresis prevents oscillation between threat levels (a brief attack doesn't immediately drop back to GREEN when it subsides). Adaptive policies validated across 185 checks (session 30).
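Hysteresis means the escalation and de-escalation thresholds differ, so a score hovering near one boundary cannot flap the level. The level names echo the DEFCON analogy; the thresholds and score trace below are illustrative:

```python
LEVELS = ["GREEN", "YELLOW", "ORANGE", "RED"]

def next_level(current, threat_score, escalate=0.6, deescalate=0.3):
    """Escalate when the score crosses `escalate`; de-escalate only when it
    falls below the lower `deescalate` bound. The gap between the two
    thresholds is the hysteresis band that prevents oscillation."""
    idx = LEVELS.index(current)
    if threat_score >= escalate and idx < len(LEVELS) - 1:
        return LEVELS[idx + 1]
    if threat_score <= deescalate and idx > 0:
        return LEVELS[idx - 1]
    return current

level = "GREEN"
trace = []
for score in [0.7, 0.5, 0.5, 0.2, 0.2]:  # brief attack, then it subsides
    level = next_level(level, score)
    trace.append(level)
print(trace)
```

A score of 0.5 is too low to escalate but too high to de-escalate, so the level holds at YELLOW until the threat genuinely subsides.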
What We Know vs What We Don't Know
✅ Validated Through Simulation
- Spam attacks are unprofitable with current ATP costs
- Trust maturation improves across multiple lives
- Quality contributors accumulate ATP over time
- Multi-dimensional trust is harder to game than single scores
- Coherence checks detect basic spoofing attempts
- Coalition detection hits 93%+ probability at 3+ members (red team tested)
- Sybil resistance (defense against fake identities) has formal lower bounds: 4.6× PoW cost multiplier
- Script kiddie and insider threats consistently detected (red team profiles)
- Cooperation is Nash-dominant at current parameters (200 ATP stakes + 3 witnesses)
- ATP market conserves under stress (200 agents, 500 rounds, 5% transfer fee maintains stability)
- Sybil ROI is negative: honest identity outearns 5 fakes (transfer fee bleeds circular flows)
- Temporal logic properties formally verified: trust earned stays until explicitly revoked or naturally decayed — no arbitrary removal (LTL model checking, session 33)
- Every attestation request eventually receives a response (progress guarantee: G(requested → F(responded)))
❌ Unknown (Need Real-World Data)
- Actual detection times for quality inflation in adversarial environments
- Long-con trust building attacks (100+ cycle patient adversaries)
- False positive rates in production (not simulation)
- Nation-state attacks beyond red team scope (cascading infrastructure attacks)
- Long-term Goodharting (metric gaming) resistance after adversaries study the scoring system
- Appeals mechanism effectiveness with real human disputes
- ATP market stress beyond simulation scope (real human hoarding, speculative behavior)
Open Research Questions
These are the highest-priority questions that need empirical answers before Web4 can be considered production-ready:
1. Collusion detection: What's the empirical detection rate for sophisticated collusion? Can we distinguish malicious cartels from legitimate communities?
2. Attack profitability: Do real-world adversaries find profitable attacks we haven't simulated? What's the actual ROI of quality inflation attacks?
3. False positive tolerance: What false positive rate makes the system unusable? How do users respond to incorrect penalties?
4. Appeals and forgiveness: What mechanisms allow legitimate users to recover from false positives or past mistakes without creating new attack vectors?
5. Adaptive adversaries: How quickly can attackers learn and adapt to countermeasures? What's the arms race dynamic?
Why This Page Exists
Transparency about limitations builds more trust than bold claims. Web4 is serious research, not vaporware. Documenting attack surfaces and open questions is how we invite rigorous engagement.
If you're a security researcher, these are the places to probe. If you're evaluating Web4 for real use, these are the risks to consider. If you're contributing to the project, these are the highest-value problems to solve.
Engage with the hard questions. The best contributions aren't just code—they're better threat models, empirical attack data, and proofs that our assumptions are wrong.
Found a new attack vector? Have empirical data on these questions? Open an issue on GitHub.