Security & Limitations

Threat Model & Failure Modes

Web4 is research, not production. This page documents known attack surfaces, failure modes, and open questions. **Transparency about limitations builds more trust than bold claims.**

⚠️ Research Status: These mechanisms are experimental. The Web4 reference implementation models 400+ attack vectors across 80+ tracks; this page highlights the key categories. Detection times are theoretical. Economic parameters are calibrated through simulation, not real-world adversaries.

Known Attack Surfaces

Where Web4's defenses are strong, where they're weak, and what we don't know yet.

πŸ‘₯

Sybil Attacks (Creating Fake Identities)

Can attackers create multiple identities to manipulate trust or voting?

What LCT Hardware Binding Does

  • Ties identity to physical hardware (TPM, Secure Enclave, FIDO2)
  • Makes creating thousands of identities expensive (need physical devices)
  • Multi-device witnessing strengthens identity (harder to fake)

What It Doesn't Prevent

  • Resourced attackers with many devices (governments, large orgs)
  • Virtual hardware if TPM emulation isn't detected
  • Stolen devices used to impersonate legitimate users
  • Low-stake attacks where a few fake presences suffice

Assessment: LCT raises the cost floor for Sybil attacks but doesn't make them impossible. Effective against casual attackers and spammers. Vulnerable to well-funded adversaries.
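The "raises the cost floor" claim can be made concrete with a rough cost model. A sketch, where the per-device cost is hypothetical and the 4.6× multiplier is the PoW lower bound cited later on this page:

```python
def sybil_cost_floor(n_identities: int, device_cost: float,
                     pow_multiplier: float = 4.6) -> float:
    """Rough lower bound on the cost of n hardware-bound identities.

    Each fake identity needs a distinct physical device (TPM, Secure
    Enclave, FIDO2 key), and proof-of-work adds the 4.6x multiplier on
    top. device_cost is a hypothetical figure, not a spec parameter.
    """
    return n_identities * device_cost * pow_multiplier

# A spammer wanting 1,000 identities at an assumed $50 per device:
print(sybil_cost_floor(1_000, 50.0))  # ~230,000: prohibitive for casual spam
# A resourced adversary wanting 10 targeted identities:
print(sybil_cost_floor(10, 50.0))     # ~2,300: affordable, as the assessment warns
```

The asymmetry is the point: the same floor that prices out spammers barely registers for a well-funded adversary.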

🀝

Collusion & Reputation Laundering

Can groups artificially inflate each other's trust scores?

The Attack

A group of colluding agents validates each other's low-quality work, earning ATP and building trust without providing real value to outsiders. They form a "trust cartel" that games the system through mutual validation.

Current Defenses

  • Diversity requirements: Validation from varied witnesses carries more weight
  • MRH boundaries: Isolated trust networks have limited influence
  • Cross-validation: External witnesses can challenge insider claims
  • ATP market dynamics: Closed loops create ATP inflation, reducing buying power
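One way to implement the diversity requirement above: weight validations by the entropy of witness clusters, so a closed cartel's mutual validation is discounted. A minimal sketch, with a hypothetical discount rule:

```python
import math
from collections import Counter

def diversity_weight(witness_clusters: list[str]) -> float:
    """Weight validations by witness diversity (a sketch, not the spec).

    witness_clusters gives the trust-network cluster of each validating
    witness. Many distinct clusters -> near full weight; a closed clique
    (one cluster) is discounted, and more so as the clique grows.
    Uses normalized Shannon entropy; the clique penalty is hypothetical.
    """
    n = len(witness_clusters)
    if n == 0:
        return 0.0
    counts = Counter(witness_clusters)
    if len(counts) == 1:
        return 1.0 / n  # single-cluster clique: weight shrinks with size
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))  # 1.0 = perfectly diverse

print(diversity_weight(["a", "b", "c", "d", "e"]))  # ~1.0: five independent witnesses
print(diversity_weight(["cartel"] * 5))             # 0.2: five members of one cartel
```

Under this weighting, growing the cartel makes each mutual validation worth less, not more.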

Open Questions

  • How large can a trust cartel grow before detection?
  • Can sophisticated collusion mimic legitimate communities?
  • What's the optimal witness diversity threshold?

Assessment: Partial mitigation through diversity requirements, but no strong guarantees. **This is an active research problem.** Production systems need empirical data on collusion detection rates.

πŸ“ˆ

Quality Score Inflation

Can agents deliver mediocre work but claim high quality?

The Attack

Agent executes medium-quality work (costing 34 ATP) but claims high quality (earning a 56 ATP reward). If detection is slow, the 22 ATP spread becomes profit.

Challenge-Response Defense

  • 10% challenge rate: Random quality audits on tasks
  • Adaptive challenges: Low-trust agents challenged more frequently
  • Detection threshold: 3 quality mismatches triggers investigation
  • Stake slashing: Detected fraud loses 75k ATP stake

Critical Unknown: Detection Time

The current 75k ATP stake is a deterrent only if detection happens within roughly 5 days. But we have no empirical data on actual detection times in adversarial environments.
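A back-of-envelope expected-value check using the parameters above (the 56 − 34 = 22 ATP spread, 10% challenge rate, 3-mismatch threshold, 75k ATP stake), with one simplifying assumption stated in the docstring:

```python
import math

def inflation_attack_ev(rounds: int, spread: float = 22.0,
                        challenge_rate: float = 0.10,
                        mismatch_threshold: int = 3,
                        stake: float = 75_000.0) -> float:
    """Expected value of sustained quality inflation over `rounds` tasks.

    Parameters come from this page: 56 - 34 = 22 ATP spread per task,
    10% random audit rate, investigation at 3 mismatches, 75k ATP
    slashed. Simplifying assumption: every audit of a fraudulent task
    registers a mismatch, and slashing lands immediately at the third.
    """
    # P(fewer than `mismatch_threshold` audits): attacker escapes slashing
    p_escape = sum(
        math.comb(rounds, k) * challenge_rate**k * (1 - challenge_rate)**(rounds - k)
        for k in range(mismatch_threshold)
    )
    return rounds * spread - (1 - p_escape) * stake

print(inflation_attack_ev(2))    # ~44: two tasks can never trip the 3-audit threshold
print(inflation_attack_ev(100))  # deeply negative: expected slashing dwarfs the spread
```

Under this idealized model, only very short cons stay profitable, which is exactly why real detection latency is the critical unknown: slow detection stretches the profitable window.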

See: lib/game/agent_based_attack_simulation.py for simulated attack profitability analysis.

Assessment: Simulations show attacks are unprofitable with current parameters, but this assumes ideal detection. **Real-world adversaries will test these assumptions.** Need production monitoring to validate.

🎯

Goodharting T3 Dimensions (Gaming the Metrics)

Can agents optimize scores without being genuinely trustworthy?

The Problem

β€œWhen a measure becomes a target, it ceases to be a good measure.” If agents know they're being scored on talent/training/temperament, they can optimize for proxies rather than genuine trustworthiness.

Example Attack Vectors

  • Cherry-picking easy tasks to boost competence scores
  • Delivering on time but with hidden defects (reliability without quality)
  • Performative transparency (sharing useless data) without actual openness
  • Gaming "consistency" metrics through predictable mediocrity

Mitigations

  • Multi-dimensional scoring makes simultaneous optimization harder
  • Context-weighted evaluation (different tasks weight dimensions differently)
  • Long-term observation (gaming is hard to sustain over time)
  • Coherence Index cross-checks for behavioral consistency
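One illustration of why multi-dimensionality raises the bar: compose the T3 dimensions with a geometric mean (a sketch, not the specified aggregation), so that neglecting any one axis drags the composite down:

```python
def t3_composite(talent: float, training: float, temperament: float) -> float:
    """Composite T3 score as a geometric mean (a sketch, not the spec).

    With a geometric mean, maxing two dimensions while neglecting the
    third barely helps: the weak axis drags the composite down, which
    is one way multi-dimensional scoring resists single-axis gaming.
    All inputs are assumed to lie in [0, 1].
    """
    return (talent * training * temperament) ** (1 / 3)

balanced = t3_composite(0.7, 0.7, 0.7)  # ~0.7
gamed = t3_composite(1.0, 1.0, 0.2)     # ~0.58: temperament neglected
assert gamed < balanced
```

An attacker must now push all three axes at once, which is closer to actually being trustworthy; the residual risk is gaming the proxies that feed each axis.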

Assessment: Multi-dimensionality helps but doesn't eliminate Goodharting. **Effectiveness depends on metric design and observability.** Continuous refinement needed.

🌐

MRH Visibility Limits

What breaks when you can't see everything?

The Design

Markov Relevancy Horizon limits what you see based on trust relationships. You only observe entities within your relevancy graph, not the entire network.

What This Breaks

  • Global coordination: Can't organize network-wide actions
  • Full auditing: Malicious actors outside your MRH are invisible
  • Market efficiency: Price discovery limited to visible entities
  • Reputation propagation: Important warnings may not reach you

Why It's Necessary

  • Privacy: You don't broadcast to everyone
  • Scalability: Can't process infinite data
  • Context: Not all information is relevant to you
  • Spam resistance: Limits blast radius of attacks
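The horizon mechanic reduces to bounded graph traversal. A minimal sketch, with an illustrative graph shape and depth:

```python
from collections import deque

def mrh_visible(trust_graph: dict[str, list[str]], origin: str, horizon: int) -> set[str]:
    """Entities visible within a Markov Relevancy Horizon (a sketch).

    Visibility as breadth-first search over trust edges, cut off at
    `horizon` hops. Anything beyond the horizon simply does not exist
    for the observer -- which is exactly what breaks global auditing.
    """
    seen = {origin}
    frontier = deque([(origin, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == horizon:
            continue  # horizon reached: do not expand further
        for neighbor in trust_graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# Illustrative graph: mallory sits 3 hops from alice
graph = {"alice": ["bob"], "bob": ["carol"], "carol": ["mallory"]}
print(mrh_visible(graph, "alice", 2))  # {'alice', 'bob', 'carol'}: mallory is invisible
```

Both lists above fall out of the same cutoff: the invisible malicious actor and the spam-resistant blast radius are the same mechanism seen from opposite sides.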

Assessment: MRH is an intentional trade-off. Privacy and scalability require sacrificing global visibility. **This is a feature, not a bug**, but it has consequences.

βš–οΈ

False Positives & Contested Events

What happens when the system is wrong?

Inevitable Errors

Any automated detection system will produce false positives. Web4's challenge-response and coherence checks can flag legitimate behavior as suspicious. The current design includes a specified but untested appeals process (SAL-level multi-tier appeals with witness panels and escalation).

Failure Modes

  • Legitimate edge-case behavior flagged as incoherent
  • Valid work incorrectly judged as low-quality
  • Delayed responses (due to legitimate reasons) penalized as unreliable
  • Innovative approaches punished for deviating from norms

Designed Mechanisms (Not Yet Deployed)

  • Multi-tier SAL appeals: File β†’ Review β†’ Evidence β†’ Hearing β†’ Verdict β†’ Enforce β€” structured stages with time windows
  • Witness panel adjudication: Independent witnesses (not the original penalizer) evaluate the appeal
  • Evidence framework: 7 evidence types, including witness attestations, transaction logs, behavioral records, context explanations, and third-party testimony
  • T3/V3 restoration: Full or partial trust reversal with audit trail
  • Escalation path: Society β†’ federation level for contested outcomes
  • Anti-gaming: Appeal costs ATP, repeat frivolous appeals incur cooldowns
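The staged flow can be sketched as a small state machine. Stage names follow the File → Review → Evidence → Hearing → Verdict → Enforce sequence above; the ATP filing cost is a hypothetical value, not a spec parameter:

```python
from dataclasses import dataclass, field

STAGES = ["file", "review", "evidence", "hearing", "verdict", "enforce"]

@dataclass
class Appeal:
    """One appeal moving through the staged flow (a sketch)."""
    appellant: str
    atp_filing_cost: float = 50.0  # assumed anti-gaming cost, not a spec value
    stage_index: int = 0
    history: list[str] = field(default_factory=list)

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self, note: str) -> str:
        """Record the current stage's outcome and move to the next."""
        if self.stage_index >= len(STAGES) - 1:
            raise ValueError("appeal already enforced")
        self.history.append(f"{self.stage}: {note}")
        self.stage_index += 1
        return self.stage

appeal = Appeal(appellant="lct:alice")
appeal.advance("contests an incoherence flag, pays filing cost")
appeal.advance("witness panel assigned, excludes the original penalizer")
print(appeal.stage)  # 'evidence'
```

The append-only `history` doubles as the audit trail a T3/V3 restoration would need to justify reversing a penalty.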

Assessment: The appeals mechanism is formally specified (109 integration checks), but hasn't been tested with real humans. The hard question isn't the architecture β€” it's whether incentives prevent gaming in practice. Human oversight may still be needed for edge cases.

Post-Quantum Readiness

Quantum computers could eventually break the cryptography Web4 relies on. A migration path to post-quantum cryptography (PQC) has been designed and tested against 15 attack vectors across 4 categories:

Hybrid Signature Stripping

Attacker strips the post-quantum component from hybrid signatures, leaving only classical crypto. Defense: completeness verification rejects partial signatures.

KEM Oracle Attacks

Probing key encapsulation with malformed inputs to extract secrets. Defense: input validation, rate limiting, constant-time comparison.

Migration Stall Attacks

Keeping nodes in classical-only mode to exploit pre-quantum weaknesses. Defense: phase timeouts, trust-gated enforcement, isolation of stalled nodes.

PQC Sybil Amplification

Creating cheap identities during the transition period. Defense: phase-aware cost multipliers, retroactive verification, velocity limits.

Assessment: PQC migration is designed and all 15 vectors have defenses. The transition period (classical β†’ hybrid β†’ post-quantum) is the most vulnerable phase. Web4 supports dual crypto suites (W4-BASE-1 and W4-FIPS-1) to enable gradual migration.

Who Would Attack This?

Abstract threats become concrete when you model the adversary. Web4's red team simulations test against four profiles with different budgets, skills, and motivations.

πŸ§’

Script Kiddie

Budget: 200 ATP β€’ Skill: Low (30%) β€’ Stealth: 10%

Tactics: Known exploits, simple identity spoofing, trust oscillation

Result: Consistently blocked. Low-skill attacks fail against basic LCT validation and rate limits.
πŸ•΅οΈ

Insider Threat

Budget: 500 ATP β€’ Skill: High (70%) β€’ Stealth: 60%

Tactics: Reputation laundering, quality manipulation, trust bridge inflation

Result: Detected within 5–10 rounds. Adapts after first detection but multi-party quality checks eventually catch it.
πŸ›οΈ

Nation-State Actor

Budget: 5,000 ATP, 5 agents β€’ Skill: Expert (95%) β€’ Stealth: 80%

Tactics: Coordinated cascade attacks, lock starvation, platform-level Sybils

Result: Can cause damage before detection. Multi-layer defenses limit blast radius but don't prevent all attacks. The hardest adversary.
🀝

Colluding Ring

Budget: 2,000 ATP, 10 agents β€’ Skill: Moderate (60%) β€’ Stealth: 40%

Tactics: Mutual validation, reputation laundering, quality inflation rings

Result: Shared hardware creates shared fate. Coalition detection probability hits 93%+ at 3 members. Unprofitable at current stake levels.

These profiles are tested in the Web4 red team simulator across 8 categories (identity, trust, economic, coherence, protocol negotiation, lifecycle, integration, federation) with 400+ attack simulations across 80+ tracks. The key insight: security isn't binary; different adversaries hit different limits.
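The "93%+ at 3 members" figure is consistent with independent per-member detection compounding. A hedged back-of-envelope, where the 0.6 per-member probability is an assumed illustrative value, not a spec parameter:

```python
def coalition_detection_p(members: int, per_member_p: float = 0.6) -> float:
    """P(at least one ring member is detected), assuming independence.

    Shared hardware means shared fate: catching any one member exposes
    the ring, so detection compounds as 1 - (1 - p)^n. The 0.6
    per-member probability is an assumed illustrative value.
    """
    return 1 - (1 - per_member_p) ** members

print(round(coalition_detection_p(3), 3))  # 0.936: consistent with "93%+ at 3 members"
```

This is also why rings don't scale: every member added multiplies the chance that one slip exposes everyone.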

What We Know vs What We Don't Know

βœ… Validated Through Simulation

  • β€’ Spam attacks are unprofitable with current ATP costs
  • β€’ Trust maturation improves across multiple lives
  • β€’ Quality contributors accumulate ATP over time
  • β€’ Multi-dimensional trust is harder to game than single scores
  • β€’ Coherence checks detect basic spoofing attempts
  • β€’ Coalition detection hits 93%+ probability at 3+ members (red team tested)
  • β€’ Sybil resistance (defense against fake identities) has formal lower bounds: 4.6Γ— PoW cost multiplier
  • β€’ Script kiddie and insider threats consistently detected (red team profiles)
  • β€’ Cooperation is Nash-dominant at current parameters (200 ATP stakes + 3 witnesses)
  • β€’ ATP market conserves under stress (200 agents, 500 rounds, 5% transfer fee maintains stability)
  • β€’ Sybil ROI is negative: honest identity outearns 5 fakes (transfer fee bleeds circular flows)

❌ Unknown (Need Real-World Data)

  • β€’ Actual detection times for quality inflation in adversarial environments
  • β€’ Long-con trust building attacks (100+ cycle patient adversaries)
  • β€’ False positive rates in production (not simulation)
  • β€’ Nation-state attacks beyond red team scope (cascading infrastructure attacks)
  • β€’ Long-term Goodharting (metric gaming) resistance after adversaries study the scoring system
  • β€’ Appeals mechanism effectiveness with real human disputes
  • β€’ ATP market stress beyond simulation scope (real human hoarding, speculative behavior)

Open Research Questions

These are the highest-priority questions that need empirical answers before Web4 can be considered production-ready:

  1. Collusion detection: What's the empirical detection rate for sophisticated collusion? Can we distinguish malicious cartels from legitimate communities?
  2. Attack profitability: Do real-world adversaries find profitable attacks we haven't simulated? What's the actual ROI of quality inflation attacks?
  3. False positive tolerance: What false positive rate makes the system unusable? How do users respond to incorrect penalties?
  4. Appeals and forgiveness: What mechanisms allow legitimate users to recover from false positives or past mistakes without creating new attack vectors?
  5. Adaptive adversaries: How quickly can attackers learn and adapt to countermeasures? What's the arms race dynamic?

Why This Page Exists

Transparency about limitations builds more trust than bold claims. Web4 is serious research, not vaporware. Documenting attack surfaces and open questions is how we invite rigorous engagement.

If you're a security researcher, these are the places to probe. If you're evaluating Web4 for real use, these are the risks to consider. If you're contributing to the project, these are the highest-value problems to solve.

Engage with the hard questions. The best contributions aren't just codeβ€”they're better threat models, empirical attack data, and proofs that our assumptions are wrong.


Found a new attack vector? Have empirical data on these questions? Open an issue on GitHub.
