Security & Limitations

Threat Model & Failure Modes

Web4 is research, not production. This page documents known attack surfaces, failure modes, and open questions. **Transparency about limitations builds more trust than bold claims.**

⚠️ Research Status: These mechanisms are experimental. The Web4 reference implementation models 400+ attack vectors across 80+ tracks; this page highlights the key categories. Detection times are theoretical. Economic parameters are calibrated through simulation, not real-world adversaries.

Known Attack Surfaces

Where Web4's defenses are strong, where they're weak, and what we don't know yet.

👥

Sybil Attacks (Creating Fake Identities)

Can attackers create multiple identities to manipulate trust or voting?

What LCT Hardware Binding Does

  • Ties identity to physical hardware (TPM, Secure Enclave, FIDO2)
  • Makes creating thousands of identities expensive (need physical devices)
  • Multi-device witnessing strengthens identity (harder to fake)

What It Doesn't Prevent

  • Resourced attackers with many devices (governments, large orgs)
  • Virtual hardware if TPM emulation isn't detected
  • Stolen devices used to impersonate legitimate users
  • Low-stake attacks where a few fake presences suffice

Assessment: LCT raises the cost floor for Sybil attacks but doesn't make them impossible. Effective against casual attackers and spammers. Vulnerable to well-funded adversaries.

🤝

Collusion & Reputation Laundering

Can groups artificially inflate each other's trust scores?

The Attack

A group of colluding agents validates each other's low-quality work, earning ATP and building trust without providing real value to outsiders. They form a "trust cartel" that games the system through mutual validation.

Current Defenses

  • Diversity requirements: Validation from varied witnesses carries more weight
  • MRH boundaries: Isolated trust networks have limited influence
  • Cross-validation: External witnesses can challenge insider claims
  • ATP market dynamics: Closed loops create ATP inflation, reducing buying power

Open Questions

  • How large can a trust cartel grow before detection?
  • Can sophisticated collusion mimic legitimate communities?
  • What's the optimal witness diversity threshold?

Assessment: Adversarial coalition analysis (303 formal checks) quantifies resistance: manipulating a single trust property requires coordinated action across multiple hardware-bound identities, and the cost scales super-linearly with coalition size. Partial mitigation is strong for small coalitions; large-scale collusion remains an active research area.

Coalition property thresholds (from formal verification)
>1/3 colluders breaks federation consensus (BFT limit)
>1/2 colluders breaks trust voting (majority threshold)
3+ members triggers coalition detection (93%+ probability)
Super-linear cost scaling makes large coalitions economically irrational

From session 29 adversarial coalition analysis — Byzantine, rational, and altruistic agent types modeled separately.
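The super-linear cost claim can be illustrated with a toy model (a sketch with invented parameters, not the session 29 calibration): if coordinating n hardware-bound identities costs on the order of n^1.5 while the payoff grows only linearly with coalition size, coalitions stop paying for themselves almost immediately.

```python
# Hypothetical sketch: super-linear coalition cost vs. linear payoff.
# The exponent and base cost are illustrative, not the spec's values.

def coalition_cost(n: int, base: float = 200.0, alpha: float = 1.5) -> float:
    """Cost of coordinating n hardware-bound identities (super-linear in n)."""
    return base * n ** alpha

def coalition_payoff(n: int, per_member: float = 300.0) -> float:
    """Best-case payoff grows only linearly with coalition size."""
    return per_member * n

def breakeven_size(max_n: int = 100) -> int:
    """First coalition size at which cost exceeds payoff."""
    for n in range(1, max_n + 1):
        if coalition_cost(n) > coalition_payoff(n):
            return n
    return max_n

# 200*n^1.5 > 300*n  <=>  sqrt(n) > 1.5  <=>  n > 2.25
print(breakeven_size())  # → 3
```

With these illustrative numbers, the crossover lands at exactly the 3-member threshold the formal verification flags for coalition detection.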

Why multi-witness prevents cascades: trust epidemic dynamics

Simple contagion (one contact spreads trust change): R₀ > 1 if a bad actor has any neighbors. A single compromised account can infect any connected account. This is how social media manipulation works.

Complex contagion (requires ≥30% of neighbors to confirm): R₀ stays below 1 even with multiple bad actors — a trust manipulation needs to come from multiple independent sources simultaneously before it takes effect.
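A minimal deterministic simulation (illustrative parameters, not the session 32 model) shows the difference: on a ring network where every node has 10 neighbors, a single bad actor cascades to the whole network under simple contagion but goes nowhere under a 30% complex-contagion threshold.

```python
# Sketch: simple vs. complex contagion on a ring lattice where every
# node has k neighbors. All parameters are illustrative.

def ring_neighbors(i, n, k):
    """k nearest neighbors of node i on a ring of n nodes (k even)."""
    return [(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0]

def spread(n=100, k=10, seeds=(0,), threshold=0.0, rounds=50):
    """Run contagion: a node flips when the adopting fraction of its
    neighbors exceeds `threshold` (0.0 behaves like simple contagion)."""
    adopted = set(seeds)
    for _ in range(rounds):
        new = set(adopted)
        for i in range(n):
            if i in adopted:
                continue
            frac = sum(j in adopted for j in ring_neighbors(i, n, k)) / k
            if frac > threshold:
                new.add(i)
        if new == adopted:
            break
        adopted = new
    return len(adopted)

print(spread(threshold=0.0))  # simple: one seed infects all → 100
print(spread(threshold=0.3))  # complex (30%): one seed cannot cascade → 1
print(spread(seeds=(0, 1, 2, 3), threshold=0.3))  # clustered seeds reinforce each other → 100
```

Note the last line: complex contagion does not make manipulation impossible, it forces adversaries to assemble clustered, mutually reinforcing groups, which is exactly what coalition detection targets.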

Web4 implements complex contagion
  • Multi-witness requirement: changes need confirmation from 3+ independent hardware-bound devices
  • MRH-bounded propagation: trust signals don't travel past the trust horizon
  • Geometric mean composition: one weak link caps the whole chain — a single bad witness doesn't average away

Trust epidemic modeling: 118 checks (session 32). Complex contagion threshold of 0.3 (30% of neighbors) reduces cascade probability by 89% vs. simple contagion under equal adversary conditions.
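The geometric-mean composition rule above is easy to check numerically: one compromised witness drags the whole chain down, where an arithmetic mean would let strong links average it away.

```python
# Sketch of geometric-mean trust composition: one weak link caps the chain.
from math import prod

def chain_trust(scores):
    """Geometric mean of per-hop trust scores along a witness chain."""
    return prod(scores) ** (1 / len(scores))

strong_chain = [0.9, 0.9, 0.9]
weak_link    = [0.9, 0.9, 0.1]   # one compromised witness

print(round(chain_trust(strong_chain), 3))  # → 0.9
print(round(chain_trust(weak_link), 3))     # → 0.433
# An arithmetic mean would hide the weak link: (0.9+0.9+0.1)/3 = 0.633
```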

Interactive Collusion Simulator

Configure a trust cartel and watch Web4's detection mechanisms respond in real time.

📈

Quality Score Inflation

Can agents deliver mediocre work but claim high quality?

The Attack

Agent executes medium-quality work (costs 34 ATP) but claims high quality (earns 56 ATP reward). If detection is slow, the 22 ATP markup per task becomes profit.

Challenge-Response Defense

  • 10% challenge rate: Random quality audits on tasks
  • Adaptive challenges: Low-trust agents challenged more frequently
  • Detection threshold: 3 quality mismatches triggers investigation
  • Stake slashing: Detected fraud loses 75k ATP stake
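A back-of-envelope expected-value check using the parameters above, assuming detection always fires once the mismatch limit is reached (the optimistic assumption that makes detection time the critical unknown):

```python
# Illustrative expected value of quality inflation under the stated
# parameters (10% challenge rate, 3 mismatches, 75k ATP stake). This is
# an independent sketch, not the project's simulation code.

def expected_attack_value(markup=56 - 34, challenge_rate=0.10,
                          mismatch_limit=3, stake=75_000):
    """Expected ATP from inflating quality until detection triggers."""
    # Each task is challenged independently with probability challenge_rate;
    # detection fires on the mismatch_limit-th failed challenge, so the
    # expected number of tasks before detection is limit / rate.
    expected_tasks = mismatch_limit / challenge_rate
    return expected_tasks * markup - stake

print(expected_attack_value())  # ≈ -74,340 ATP: deeply unprofitable
```

Cutting the effective challenge rate by two or three orders of magnitude flips the sign, which is why slow detection is the load-bearing assumption.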

Critical Unknown: Detection Time

Current 75k ATP stakes **are** a deterrent **if** detection happens within ~5 days. But we have no empirical data on actual detection times in adversarial environments.

See: lib/game/agent_based_attack_simulation.py for simulated attack profitability analysis.

Assessment: Simulations show attacks are unprofitable with current parameters, but this assumes ideal detection. **Real-world adversaries will test these assumptions.** Need production monitoring to validate.

🎯

Goodharting T3 Dimensions (Gaming the Metrics)

Can agents optimize scores without being genuinely trustworthy?

The Problem

“When a measure becomes a target, it ceases to be a good measure.” If agents know they're being scored on talent/training/temperament, they can optimize for proxies rather than genuine trustworthiness.

Example Attack Vectors

  • Cherry-picking easy tasks to boost competence scores
  • Delivering on time but with hidden defects (reliability without quality)
  • Performative transparency (sharing useless data) without actual openness
  • Gaming "consistency" metrics through predictable mediocrity

Mitigations

  • Multi-dimensional scoring makes simultaneous optimization harder
  • Context-weighted evaluation (different tasks weight dimensions differently)
  • Long-term observation (gaming is hard to sustain over time)
  • Coherence Index cross-checks for behavioral consistency
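One way to see why context weighting blunts Goodharting, as a sketch with invented contexts and weights (not the spec's calibration): an agent that over-optimizes one dimension can win in contexts that favor it, but loses wherever the weighting shifts.

```python
# Illustrative context-weighted T3 scoring. Contexts, weights, and agent
# profiles are invented for illustration.

CONTEXT_WEIGHTS = {
    "research":   {"talent": 0.5, "training": 0.3, "temperament": 0.2},
    "operations": {"talent": 0.2, "training": 0.5, "temperament": 0.3},
    "moderation": {"talent": 0.1, "training": 0.3, "temperament": 0.6},
}

def t3_score(dims: dict, context: str) -> float:
    """Composite T3 score under a context's dimension weighting."""
    w = CONTEXT_WEIGHTS[context]
    return sum(w[d] * dims[d] for d in w)

# An agent that games "talent" proxies but neglects temperament:
gamer    = {"talent": 0.95, "training": 0.6, "temperament": 0.3}
balanced = {"talent": 0.7,  "training": 0.7, "temperament": 0.7}

for ctx in CONTEXT_WEIGHTS:
    print(ctx, round(t3_score(gamer, ctx), 3), round(t3_score(balanced, ctx), 3))
# The gamer wins in "research" (0.715 vs 0.7) but loses badly in
# "moderation" (0.455 vs 0.7): gaming one dimension cannot dominate
# across contexts.
```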

Assessment: Multi-dimensionality helps but doesn't eliminate Goodharting. **Effectiveness depends on metric design and observability.** Continuous refinement needed.

🌐

MRH Visibility Limits

What breaks when you can't see everything?

The Design

Markov Relevancy Horizon limits what you see based on trust relationships. You only observe entities within your relevancy graph, not the entire network.

What This Breaks

  • Global coordination: Can't organize network-wide actions
  • Full auditing: Malicious actors outside your MRH are invisible
  • Market efficiency: Price discovery limited to visible entities
  • Reputation propagation: Important warnings may not reach you

Why It's Necessary

  • Privacy: You don't broadcast to everyone
  • Scalability: Can't process infinite data
  • Context: Not all information is relevant to you
  • Spam resistance: Limits blast radius of attacks

Assessment: MRH is an intentional trade-off. Privacy and scalability require sacrificing global visibility. **This is a feature, not a bug**, but it has consequences.

⚖️

False Positives & Contested Events

What happens when the system is wrong?

Inevitable Errors

Any automated detection system will have false positives. Web4's challenge-response and coherence checks can flag legitimate behavior as suspicious. The appeals process (SAL-level multi-tier with witness panels and escalation) is designed but not yet tested.

Failure Modes

  • Legitimate edge-case behavior flagged as incoherent
  • Valid work incorrectly judged as low-quality
  • Delayed responses (due to legitimate reasons) penalized as unreliable
  • Innovative approaches punished for deviating from norms

Designed Mechanisms (Not Yet Deployed)

  • Multi-tier SAL appeals: File → Review → Evidence → Hearing → Verdict → Enforce — structured stages with time windows
  • Witness panel adjudication: Independent witnesses (not the original penalizer) evaluate the appeal
  • Evidence framework: 7 evidence types, including witness attestations, transaction logs, behavioral records, context explanations, and third-party testimony
  • T3/V3 restoration: Full or partial trust reversal with audit trail
  • Escalation path: Society → federation level for contested outcomes
  • Anti-gaming: Appeal costs ATP, repeat frivolous appeals incur cooldowns
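The anti-gaming rule can be sketched as follows; the class name, costs, and cooldown schedule here are hypothetical illustrations, not the SAL specification:

```python
# Hypothetical sketch: appeals cost ATP, and repeated frivolous appeals
# escalate cost and impose growing cooldowns. All names and parameters
# are invented for illustration.

class AppealGate:
    def __init__(self, base_cost=50, cooldown_rounds=10):
        self.base_cost = base_cost
        self.cooldown_rounds = cooldown_rounds
        self.frivolous = {}       # entity -> count of rejected appeals
        self.cooldown_until = {}  # entity -> round when appeals reopen

    def appeal_cost(self, entity: str, current_round: int):
        """ATP cost to file, or None while the entity is in cooldown."""
        if current_round < self.cooldown_until.get(entity, 0):
            return None
        # Each prior frivolous appeal doubles the filing cost.
        return self.base_cost * 2 ** self.frivolous.get(entity, 0)

    def record_verdict(self, entity: str, frivolous: bool, current_round: int):
        """After adjudication, escalate penalties for frivolous filings."""
        if frivolous:
            n = self.frivolous.get(entity, 0) + 1
            self.frivolous[entity] = n
            # Cooldown length grows with repeat offenses.
            self.cooldown_until[entity] = current_round + self.cooldown_rounds * n

gate = AppealGate()
print(gate.appeal_cost("alice", 0))   # → 50
gate.record_verdict("alice", True, 0)
print(gate.appeal_cost("alice", 5))   # still cooling down → None
print(gate.appeal_cost("alice", 12))  # → 100 (cost doubled)
```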

Assessment: The appeals mechanism is formally specified (109 integration checks), but hasn't been tested with real humans. The hard question isn't the architecture — it's whether incentives prevent gaming in practice. Human oversight may still be needed for edge cases.

Post-Quantum Readiness

Quantum computers could eventually break the cryptography Web4 relies on. A migration path to post-quantum cryptography (PQC) has been designed and tested against 15 attack vectors across 4 categories:

Hybrid Signature Stripping

Attacker strips the post-quantum component from hybrid signatures, leaving only classical crypto. Defense: completeness verification rejects partial signatures.
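A completeness check of this kind might look like the following sketch; the types and stub verifiers are illustrative (real hybrid schemes, e.g. Ed25519 + ML-DSA, bind both signatures over the same message):

```python
# Illustrative defense against hybrid signature stripping: reject any
# signature missing either component. Structure is an assumption for
# illustration, not the Web4 wire format.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class HybridSignature:
    classical: Optional[bytes]     # e.g. Ed25519 signature
    post_quantum: Optional[bytes]  # e.g. ML-DSA (Dilithium) signature

def verify_hybrid(sig: HybridSignature,
                  verify_classical: Callable[[bytes], bool],
                  verify_pq: Callable[[bytes], bool]) -> bool:
    """Completeness verification: BOTH components must be present and
    valid. A stripped signature (either part missing) is rejected."""
    if sig.classical is None or sig.post_quantum is None:
        return False  # stripping attack: partial signature rejected
    return verify_classical(sig.classical) and verify_pq(sig.post_quantum)

# Stub verifiers standing in for real crypto:
ok = lambda s: True
full     = HybridSignature(classical=b"c", post_quantum=b"pq")
stripped = HybridSignature(classical=b"c", post_quantum=None)

print(verify_hybrid(full, ok, ok))      # → True
print(verify_hybrid(stripped, ok, ok))  # → False (stripping detected)
```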

KEM Oracle Attacks

Probing key encapsulation with malformed inputs to extract secrets. Defense: input validation, rate limiting, constant-time comparison.

Migration Stall Attacks

Keeping nodes in classical-only mode to exploit pre-quantum weaknesses. Defense: phase timeouts, trust-gated enforcement, isolation of stalled nodes.

PQC Sybil Amplification

Creating cheap identities during the transition period. Defense: phase-aware cost multipliers, retroactive verification, velocity limits.

Assessment: PQC migration is designed and all 15 vectors have defenses. The transition period (classical → hybrid → post-quantum) is the most vulnerable phase. Web4 supports dual crypto suites (W4-BASE-1 and W4-FIPS-1) to enable gradual migration.

Privacy Leakage Channels

Even with context boundaries and zero-knowledge proofs, Web4 has 7 information leakage channels that could reveal data about participants. Complete prevention is impossible — the goal is to raise the cost of inference above the value of the leaked information.

High Severity

  • Graph Structure (90%): Trust network topology reveals roles, authority patterns, and community membership. Hardest to mitigate — max mitigation only 50%. Dummy edges and topology randomization help but can't eliminate structural information.
  • Revocation Cascades (80%): When a device is compromised, the revocation pattern reveals delegation structure. Batching and delayed propagation reduce leakage to ~40%.
  • ZK Proof Metadata (70%): Zero-knowledge proofs hide the value but leak metadata: frequency, type distribution, and verifier identity (reveals social graph). Proof relays and batching mitigate to ~30%.

Medium Severity (4 additional channels)

  • Timing Correlation (100% → ~20%): Without mitigation, timing reveals everything. Adding jitter reduces correlation from 100% to ~20%. Cost: slight latency increase.
  • Trust Score Changes (60%): With >20 observations, adversaries can infer individual T3 dimensions from composite score changes correlated with activity types. Differential privacy (adding noise) mitigates to ~10%.
  • Delegation Trees (60%): Multi-hop delegation structure reveals organizational hierarchy. Flattening and proxy delegation reduce exposure.
  • ATP Balance Patterns (50%): Balance history reveals activity timing, periodicity (automated vs manual), and economic standing. Batching transactions and noisy balances help.
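As one concrete example, the differential-privacy mitigation for the trust-score channel can be sketched with the standard Laplace mechanism; the epsilon and sensitivity values below are illustrative, not calibrated.

```python
# Sketch: publish composite trust-score changes with Laplace noise so an
# observer cannot reliably infer individual T3 dimensions from them.

import math
import random

def noisy_score_delta(true_delta, sensitivity=0.05, epsilon=1.0, rng=random):
    """Laplace mechanism: noise scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via the inverse CDF of a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_delta + noise

# Noise is zero-mean, so published deltas stay accurate in aggregate
# while individual observations carry irreducible uncertainty.
rng = random.Random(42)
deltas = [noisy_score_delta(0.02, rng=rng) for _ in range(1000)]
print(round(sum(deltas) / len(deltas), 3))  # mean stays near the true 0.02
```

Smaller epsilon means more noise: harder dimension inference, at the cost of less accurate published scores.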

Design principle: Web4 doesn't claim perfect privacy. It claims structural privacy — trust data is scoped by context boundaries, encrypted in transit, and verifiable via zero-knowledge proofs. The 7 channels above represent the irreducible cost of having a functional trust system. The honest question isn't “can we eliminate leakage?” (no), but “is the privacy cost worth the trust benefit?”

So is Web4 better or worse for privacy than what we have now?

It depends on what you compare against:

Today (Web2)

Platforms own your data, sell it to advertisers, and suffer regular breaches. You have no visibility into who sees what. Privacy policies are unreadable legal documents. Your behavior is profiled across every service.

Blockchain (Web3)

All transactions are public and permanent on-chain. Anyone can trace your wallet history. “Pseudonymous” until one transaction links to your identity — then everything is exposed retroactively.

Web4

Trust data is scoped — your employer sees your professional trust, not your social trust. ZK proofs let you prove “trust above threshold” without revealing your score. 7 leakage channels exist, but they're documented, bounded, and auditable.

The honest answer: Web4 leaks more than a system with no trust (because trust requires observable behavior), but far less than Web2 (no platform owns your data) and differently than Web3 (no permanent public ledger). The trade-off is explicit: you give up some privacy in exchange for trust that actually works.

Who Would Attack This?

Abstract threats become concrete when you model the adversary. Web4's red team simulations test against four profiles with different budgets, skills, and motivations.

🧒

Script Kiddie

Budget: 200 ATP • Skill: Low (30%) • Stealth: 10%

Tactics: Known exploits, simple identity spoofing, trust oscillation

Result: Consistently blocked. Low-skill attacks fail against basic LCT validation and rate limits.
🕵️

Insider Threat

Budget: 500 ATP • Skill: High (70%) • Stealth: 60%

Tactics: Reputation laundering, quality manipulation, trust bridge inflation

Result: Detected within 5–10 rounds. Adapts after first detection but multi-party quality checks eventually catch it.
🏛️

Nation-State Actor

Budget: 5,000 ATP, 5 agents • Skill: Expert (95%) • Stealth: 80%

Tactics: Coordinated cascade attacks, lock starvation, platform-level Sybils

Result: Can cause damage before detection. Multi-layer defenses limit blast radius but don't prevent all attacks. The hardest adversary.
🤝

Colluding Ring

Budget: 2,000 ATP, 10 agents • Skill: Moderate (60%) • Stealth: 40%

Tactics: Mutual validation, reputation laundering, quality inflation rings

Result: Shared hardware creates shared fate. Coalition detection probability hits 93%+ at 3 members. Unprofitable at current stake levels.

These profiles are tested in the web4 red team simulator across 8 categories (identity, trust, economic, coherence, protocol negotiation, lifecycle, integration, federation) with 400+ attack simulations across 80+ tracks. The key insight: security isn't a binary — different adversaries hit different limits.

When Someone Lies: Byzantine Detection

What happens when a node sends contradictory information to different parts of the network? Web4 uses equivocation detection — catching entities that say different things to different audiences.

Double-voting

Entity votes “yes” to one group and “no” to another on the same proposal. Hash-chained logs make this detectable — both votes exist in the tamper-evident record.

Behavioral fingerprinting

Consistency checks across an entity's history. Sudden strategy changes, impossible timing patterns, or quality variance that exceeds statistical norms trigger automated flags.

Gradual degradation

Unlike blockchain slashing (lose everything instantly), Web4 degrades trust gradually based on evidence confidence. Minor inconsistencies reduce trust; proven equivocation triggers severe penalties.

The key design choice: degradation over slashing. Honest mistakes (network glitches, timing issues) shouldn't destroy an entity. But deliberate deception — proven through cryptographic evidence — earns steep, permanent trust reduction. Formally verified across 85 checks in the Byzantine fault detection suite.
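Degradation-over-slashing can be expressed as a penalty that scales with evidence confidence; the curve and parameters below are an illustrative sketch, not the verified suite's values.

```python
# Sketch: trust penalty proportional to evidence confidence, instead of
# the all-or-nothing slashing used in blockchain systems.

def degrade_trust(trust: float, evidence_confidence: float,
                  max_penalty: float = 0.6) -> float:
    """Reduce trust in proportion to how certain the evidence is.
    confidence ~0.1 means an honest-mistake pattern; 1.0 means
    cryptographic proof of equivocation."""
    # Convex curve: gentle on ambiguous evidence, steep on proof.
    penalty = max_penalty * evidence_confidence ** 2
    return max(0.0, trust * (1 - penalty))

print(round(degrade_trust(0.8, 0.1), 3))  # network glitch → 0.795
print(round(degrade_trust(0.8, 1.0), 3))  # proven equivocation → 0.32
```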

Proactive Monitoring: Catching Problems Before They Escalate

The Byzantine fault detection section above handles reactive responses to detected faults. But Web4 also runs proactive trust health monitoring — statistical process control that notices when trust patterns start drifting before a fault becomes a crisis.

EWMA Trend Detection

Exponentially Weighted Moving Average tracks the direction of trust change, not just the current value. A gradual trust decline over 20 rounds triggers an alert before the entity hits a critical threshold — catching slow-burn manipulation that looks innocent in any single round.
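A minimal EWMA detector along these lines (illustrative alpha and band, not production-calibrated) shows how a slow decline trips an alert before any single round looks alarming:

```python
# Sketch: EWMA trend detection for trust values.

def ewma_alerts(series, alpha=0.2, band=0.05):
    """Return indices where the EWMA of trust drops more than `band`
    below the starting level, a slow-burn decline alarm."""
    baseline = series[0]
    ewma = baseline
    alerts = []
    for i, x in enumerate(series):
        ewma = alpha * x + (1 - alpha) * ewma
        if ewma < baseline - band:
            alerts.append(i)
    return alerts

# Trust declining by 0.005 per round: no single round looks alarming,
# but the smoothed value crosses the band partway through the window.
declining = [0.80 - 0.005 * t for t in range(20)]
print(ewma_alerts(declining))    # first alert at round 14, before trust bottoms out
print(ewma_alerts([0.80] * 20))  # stable trust → no alerts: []
```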

CUSUM Change Detection

Cumulative Sum detects structural breaks — moments when behavior fundamentally shifts. An entity that maintained consistent quality for 50 rounds and then starts outputting low-quality work triggers a CUSUM alarm: something changed, even if the absolute trust level still looks acceptable.
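A one-sided CUSUM for downward quality breaks, again with illustrative parameters:

```python
# Sketch: CUSUM change detection for a structural drop in quality.

def cusum_alarm(series, target, slack=0.01, h=0.05):
    """Return the first index where the cumulative downward deviation
    from `target` (minus a slack allowance) exceeds threshold h,
    or None if behavior never breaks."""
    s = 0.0
    for i, x in enumerate(series):
        s = max(0.0, s + (target - x) - slack)
        if s > h:
            return i
    return None

steady  = [0.85] * 50            # consistent quality for 50 rounds...
shifted = steady + [0.82] * 10   # ...then a structural drop of 0.03

print(cusum_alarm(steady, target=0.85))   # → None (no break)
print(cusum_alarm(shifted, target=0.85))  # → 52 (three rounds after the shift)
```

Note what the alarm catches: each post-shift value (0.82) would still look acceptable in isolation; it's the accumulated deviation that fires.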

Trust SLOs

Service Level Objectives define what “healthy” trust looks like for a role. A community moderator should maintain T3 above 0.65 in Temperament. If they drop below this for 3 consecutive rounds, an SLO violation fires — prompting review, not automatic punishment.

Incident Lifecycle

Detect → Alert → Investigate → Mitigate → Resolve

Alerts are aggregated (multiple related alerts create one incident, not a flood of notifications) and deduplicated across 3-round windows. High-frequency alerting — itself a potential DoS vector — is suppressed after 3 alerts per entity per round with exponential backoff.

Trust monitoring formally specified (session 32). Like CI, EWMA/CUSUM monitoring is simulation-validated — production calibration (alert thresholds, backoff parameters) will require tuning against real behavioral data.

Adaptive threat response: DEFCON-like levels

GREEN
Normal
YELLOW
Elevated
ORANGE
Heightened
RED
Active attack
BLACK
Crisis

Web4 policies adapt to detected threat levels — raising trust thresholds, tightening witness requirements, and triggering emergency overrides automatically. Hysteresis prevents oscillation between threat levels (a brief attack doesn't immediately drop back to GREEN when it subsides). Adaptive policies validated across 185 checks (session 30).
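Hysteresis is the key detail: the score needed to escalate is higher than the score at which the system de-escalates. A sketch with invented thresholds:

```python
# Sketch: DEFCON-style threat levels with hysteresis. Thresholds are
# illustrative, not the validated session 30 parameters.

LEVELS = ["GREEN", "YELLOW", "ORANGE", "RED", "BLACK"]
ESCALATE_AT   = [0.2, 0.4, 0.6, 0.8]  # score to move up from level i
DEESCALATE_AT = [0.1, 0.3, 0.5, 0.7]  # score must drop below this to fall back to level i-1

def next_level(current: int, threat_score: float) -> int:
    """One-step transition with hysteresis (indices into LEVELS)."""
    if current < len(ESCALATE_AT) and threat_score >= ESCALATE_AT[current]:
        return current + 1
    if current > 0 and threat_score < DEESCALATE_AT[current - 1]:
        return current - 1
    return current

level = 0  # GREEN
for score in [0.5, 0.5, 0.35, 0.35, 0.05]:
    level = next_level(level, score)
    print(LEVELS[level], score)
# 0.5 escalates GREEN → YELLOW → ORANGE; the dip to 0.35 does NOT drop
# the level (de-escalating from ORANGE needs < 0.3), so a brief lull
# doesn't reset defenses; only the sustained drop to 0.05 backs off.
```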

What We Know vs What We Don't Know

✅ Validated Through Simulation

  • Spam attacks are unprofitable with current ATP costs
  • Trust maturation improves across multiple lives
  • Quality contributors accumulate ATP over time
  • Multi-dimensional trust is harder to game than single scores
  • Coherence checks detect basic spoofing attempts
  • Coalition detection hits 93%+ probability at 3+ members (red team tested)
  • Sybil resistance (defense against fake identities) has formal lower bounds: 4.6× PoW cost multiplier
  • Script kiddie and insider threats consistently detected (red team profiles)
  • Cooperation is Nash-dominant at current parameters (200 ATP stakes + 3 witnesses)
  • ATP market conserves under stress (200 agents, 500 rounds, 5% transfer fee maintains stability)
  • Sybil ROI is negative: honest identity outearns 5 fakes (transfer fee bleeds circular flows)
  • Temporal logic properties formally verified: trust earned stays until explicitly revoked or naturally decayed — no arbitrary removal (LTL model checking, session 33)
  • Every attestation request eventually receives a response (progress guarantee: G(requested → F(responded)))

❌ Unknown (Need Real-World Data)

  • • Actual detection times for quality inflation in adversarial environments
  • • Long-con trust building attacks (100+ cycle patient adversaries)
  • • False positive rates in production (not simulation)
  • • Nation-state attacks beyond red team scope (cascading infrastructure attacks)
  • • Long-term Goodharting (metric gaming) resistance after adversaries study the scoring system
  • • Appeals mechanism effectiveness with real human disputes
  • • ATP market stress beyond simulation scope (real human hoarding, speculative behavior)

Open Research Questions

These are the highest-priority questions that need empirical answers before Web4 can be considered production-ready:

  1. Collusion detection: What's the empirical detection rate for sophisticated collusion? Can we distinguish malicious cartels from legitimate communities?
  2. Attack profitability: Do real-world adversaries find profitable attacks we haven't simulated? What's the actual ROI of quality inflation attacks?
  3. False positive tolerance: What false positive rate makes the system unusable? How do users respond to incorrect penalties?
  4. Appeals and forgiveness: What mechanisms allow legitimate users to recover from false positives or past mistakes without creating new attack vectors?
  5. Adaptive adversaries: How quickly can attackers learn and adapt to countermeasures? What's the arms race dynamic?

Why This Page Exists

Transparency about limitations builds more trust than bold claims. Web4 is serious research, not vaporware. Documenting attack surfaces and open questions is how we invite rigorous engagement.

If you're a security researcher, these are the places to probe. If you're evaluating Web4 for real use, these are the risks to consider. If you're contributing to the project, these are the highest-value problems to solve.

Engage with the hard questions. The best contributions aren't just code—they're better threat models, empirical attack data, and proofs that our assumptions are wrong.


Found a new attack vector? Have empirical data on these questions? Open an issue on GitHub.
