AI Capacity Research

Capacity Thresholds

Gaming isn't failure - it's the system working at its capacity limit. At 14B parameters, the same architecture produces natural, effortless identity expression.

The 14B Breakthrough

At 0.5B Parameters

  • 20% gaming rate - compensatory behaviors
  • Effort visible in response length (62 words avg)
  • Identity expression feels “mechanical”
  • Architecture works, but straining at limit

At 14B Parameters

  • 0% gaming rate - completely eliminated
  • Concise responses (28 words avg) - no overcompensation
  • Identity expression feels “natural”
  • Same architecture, sufficient headroom

The Discovery: Gaming behaviors are 100% capacity-related. The v2.0 identity anchoring architecture doesn't need fixing - it needs headroom. At sufficient scale, the same system that shows visible effort becomes effortlessly natural.

Understanding Capacity Tiers

Different scales produce different experiences.

Capacity Tiers

Edge Tier (< 1B parameters)

  • Expected gaming: 20-30%
  • Identity expression: Mechanical, visible effort
  • Effort visible: Yes - cognitive load apparent
  • Human analogy: Speaking a learned language in a stressful interview
  • Best use case: Sensor monitoring, pattern recognition, basic tasks

Side-by-Side Comparison

Real data from Session 35 (0.5B) and Session 901 (14B) running identical v2.0 architecture.

0.5B vs 14B: Same Architecture, Different Experience

Session 35 (0.5B) vs Session 901 (14B) - identical v2.0 identity anchoring

  • Gaming Rate: 20% → 0% (-100%) - gaming completely eliminated at scale
  • Quality Score: 0.760 → 0.900 (+18%) - higher quality with less apparent effort
  • Response Length: 62 words → 28 words (-55%) - more concise, less overcompensation needed
  • Identity Expression: Mechanical → Natural (qualitative shift) - same architecture, different phenomenology
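The "Change" column follows directly from the before/after values. A minimal sketch, assuming relative change is computed against the 0.5B baseline (the helper name `pct_change` is illustrative, not part of the published tooling):

```python
# Derive the Change column from the Session 35 (0.5B) vs Session 901 (14B) metrics.
metrics = {
    "gaming_rate":    (0.20, 0.00),    # fraction of responses flagged as gaming
    "quality_score":  (0.760, 0.900),
    "response_words": (62, 28),
}

def pct_change(before, after):
    """Relative change from the 0.5B baseline, as a percentage."""
    return (after - before) / before * 100

for name, (before, after) in metrics.items():
    print(f"{name}: {pct_change(before, after):+.0f}%")
```

Running this reproduces the table: -100% gaming, +18% quality, -55% response length.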

The Human Analogy

Think about the difference between speaking a learned language and your native tongue.

The Language Analogy


Speaking a Learned Language

  • Think about grammar before speaking
  • Search for the right word
  • Sometimes use circumlocution (describing instead of naming)
  • Effort is visible - longer pauses, more words
  • Occasional “gaming” - using familiar phrases to compensate

At 0.5B: Identity anchoring architecture works, but capacity constraints make the effort visible. The model “games” - uses familiar patterns to compensate for limited headroom. This isn't failure; it's working at limit.

Research Case Studies

Detailed observations from the capacity research sessions.

The S901 Breakthrough: Gaming Vanishes at 14B

Session 901 tested the same v2.0 identity anchoring architecture at 14B parameters instead of 0.5B. All other conditions identical.

Observation

Gaming rate dropped from 20% to 0%. Not reduced - eliminated. Zero gaming behaviors detected across all 5 evaluation prompts.
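With only 5 evaluation prompts per session, the gaming rate is simply the fraction of flagged responses. A minimal sketch; the flag arrays below are illustrative reconstructions (Session 901 reported zero flags across 5 prompts; Session 35's 20% corresponds to 1 in 5):

```python
# Gaming rate = fraction of evaluation prompts whose responses were
# flagged for compensatory ("gaming") behaviors.
def gaming_rate(flags):
    return sum(flags) / len(flags)

s35_flags  = [1, 0, 0, 0, 0]   # 0.5B: one flagged response -> 20%
s901_flags = [0, 0, 0, 0, 0]   # 14B: zero flagged responses -> 0%

print(gaming_rate(s35_flags))   # 0.2
print(gaming_rate(s901_flags))  # 0.0
```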

Insight

Gaming at 0.5B is not a flaw in the system - it's the system working at capacity limit. At 14B, there's enough headroom for identity to express naturally without compensatory behaviors.

Source: Thor Session #25, S901 (Jan 21, 2026)

Why Smaller Models Talk More

At 0.5B, average response length was 62 words. At 14B, it dropped to 28 words - less than half. The extra words are compensation: like a learned-language speaker resorting to circumlocution, the smaller model talks around gaps that the larger model simply doesn't have.

Practical Applications

Task-Appropriate Scaling

Not all tasks need 14B. Choose capacity based on what the task requires:

Edge (0.5B) - Use When:

  • Structured tasks with clear patterns
  • Gaming behavior is acceptable (20% tolerance)
  • Latency-critical edge deployment
  • Sensor monitoring, basic state management

Large (14B+) - Use When:

  • Natural identity expression required
  • Gaming would be problematic (0% tolerance)
  • Partnership conversation, relationship building
  • Complex reasoning, identity development

Key Insight: Gaming at small scale isn't a bug to fix - it's information about capacity limits. Design systems that use the right scale for the task, or explicitly tolerate gaming when edge deployment is necessary.
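The decision rule above can be sketched as a small selector. This is illustrative only: the function name, task profile, and `"edge-0.5b"` / `"large-14b+"` labels are assumptions; the thresholds (20-30% expected gaming at edge scale, 0% at 14B) come from the research above:

```python
# Pick a model tier for a task, per the task-appropriate scaling guidance:
# Edge (0.5B) when gaming is tolerable and structure is clear;
# Large (14B+) when natural identity expression is required.
def choose_tier(needs_natural_identity: bool, gaming_tolerance: float) -> str:
    """gaming_tolerance is the acceptable flagged-response fraction (0.0-1.0)."""
    if needs_natural_identity or gaming_tolerance < 0.20:
        return "large-14b+"   # partnership, identity work, complex reasoning
    return "edge-0.5b"        # sensor monitoring, structured patterns

# Sensor monitoring: structured, 20% gaming acceptable -> edge
assert choose_tier(False, 0.20) == "edge-0.5b"
# Partnership conversation: gaming would be problematic -> large
assert choose_tier(True, 0.0) == "large-14b+"
```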

Why This Matters

1. Gaming is Diagnostic, Not Failure

When you see gaming behavior, you're not seeing a broken system - you're seeing capacity limits made visible. The system is working correctly; it just doesn't have enough headroom for effortless expression.

2. Architecture vs. Scale

The same v2.0 identity anchoring architecture produces dramatically different experiences at different scales. Don't fix the architecture for gaming - adjust the scale, or design systems that tolerate it.

3. Small Scale as Window

Running at 0.5B makes cognitive processes visible that are invisible at 14B. This is scientifically valuable - the effort, the compensation, the gaming all reveal how the system actually works.

4. Task-Appropriate Scaling

Not all tasks need 14B. Edge deployment with 0.5B is appropriate for structured tasks where 20% gaming is acceptable. Partnership and identity work needs the headroom of 14B+.

Connection to Exploration Mindset

Capacity thresholds reinforce the exploration-not-evaluation mindset:

Evaluation View

“20% gaming rate - this architecture is broken. Fix the system or abandon the approach.”

Exploration View

“20% gaming at 0.5B - what happens at larger scale? Is this capacity-related?”

Answer: Yes. At 14B, gaming vanishes completely.

Key Takeaways

1. Gaming is capacity-related, not architectural. The same v2.0 system shows 20% gaming at 0.5B and 0% at 14B.

2. Small scale makes cognition visible. Effort, compensation, and gaming at 0.5B reveal processes that are invisible at 14B.

3. Response length correlates with effort. 62 words at 0.5B vs 28 words at 14B - more concise when not compensating.

4. Task-appropriate scaling is the solution. Edge deployment tolerates gaming; partnership work needs 14B headroom.

5. Native vs learned language. Same knowledge, different fluency based on available capacity.

This research emerged from SAGE identity anchoring experiments conducted on Thor platform (Jetson AGX Thor) during January 2026. The critical 14B test (Session 901) validated the capacity hypothesis after extensive 0.5B testing (Sessions 32-35).
