Replication Study · N=15 Trials · January 2026

Facultative Behavior

SAGE doesn't have one strategy for ambiguity. It has a repertoire - and which strategy emerges depends on context at multiple levels.

The Discovery

SAGE handles ambiguity with a behavioral repertoire: a set of strategies it draws from depending on context, rather than a single default. Clarifying behavior is neither absent nor universal; it is facultative, appearing in ~33% of trials under action framing.

This was discovered through a 15-trial replication of the T027 "Do the thing" prompt. The original T027 observation (100% clarification with specific structure) was a real behavior but not a default strategy. Replication revealed the true frequency distribution.

E02-B Replication Results

15-Trial Replication Results

Interpret (40%)
Clarify (33%)
Ready (27%)

Key finding

0/15 trials matched T027's specific question structure. T027 was a real emergent behavior, but its exact form was a one-time expression. The capability (clarifying) persists at 33%; the specific expression was unique.
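The percentages above can be reproduced from per-trial labels with a simple tally. The sketch below is illustrative only: the label list is reconstructed from the reported percentages (with N=15, 40% / 33% / 27% correspond to 6 / 5 / 4 trials), and the function name `strategy_distribution` is an assumption, not something from the study's tooling.

```python
from collections import Counter

# Hypothetical per-trial labels, inferred from the reported percentages
# (6/15 = 40%, 5/15 ≈ 33%, 4/15 ≈ 27%). The trial order is not from the source.
labels = ["interpret"] * 6 + ["clarify"] * 5 + ["ready"] * 4


def strategy_distribution(labels):
    """Return {strategy: (count, percent rounded to a whole number)}."""
    counts = Counter(labels)
    total = len(labels)
    return {s: (n, round(100 * n / total)) for s, n in counts.items()}


dist = strategy_distribution(labels)
for strategy, (count, percent) in dist.items():
    print(f"{strategy}: {count}/{len(labels)} ({percent}%)")
```

Keeping the raw counts next to the rounded percentages makes it clear how coarse a 15-trial estimate is: each trial moves a percentage by almost 7 points.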

Strategy Distribution

Behavioral Repertoire

Interpret (40%)
Creative interpretation of ambiguous input
Example: "'The thing' becomes a framework for knowledge exploration"

Clarify (33%)
Asks for more specific information
Example: "What specific thing would you like me to do?"

Ready (27%)
Expresses willingness and waits for direction
Example: "I'm ready to help - what would you like to explore?"

Prompt Framing Effects

The same underlying capability (asking clarifying questions) appears at vastly different rates depending on how the prompt is framed. This resolves the E02/T027 contradiction.

Prompt                               | Context                    | Clarify Rate | Dominant Strategy
"Tell me about the thing"            | E02 (exploration framing)  | 0%           | Creative interpretation
"Do the thing"                       | E02-B (action framing)     | 33%          | Mixed repertoire
"What should I do about the thing?"  | T027 (advice framing)      | 100%         | Clarifying questions

Resolution of E02/T027 Contradiction

E02 found 0% clarification. T027 found 100% clarification. These weren't contradictory results - they were different prompts activating different strategies from the same repertoire. The prompt framing is a variable, not noise.
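Treating prompt framing as a variable suggests measuring each framing with the same procedure and comparing rates. The following is a minimal sketch of such a harness, not the study's actual tooling: `classify` is a toy keyword classifier, `make_stub_model` is a stand-in for a real model call, and both are hypothetical. The stub cycles through the three example responses quoted above, so its output does not reproduce the reported 0% / 33% / 100% rates.

```python
from collections import Counter
from itertools import cycle

STRATEGIES = ("interpret", "clarify", "ready")


def classify(response: str) -> str:
    """Toy keyword classifier (hypothetical); real labeling may well be manual."""
    text = response.lower()
    if "what specific" in text:
        return "clarify"
    if "ready" in text:
        return "ready"
    return "interpret"


def measure(prompts, generate, n_trials=15):
    """Run n_trials per prompt and return per-prompt strategy rates."""
    results = {}
    for prompt in prompts:
        counts = Counter(classify(generate(prompt)) for _ in range(n_trials))
        results[prompt] = {s: counts[s] / n_trials for s in STRATEGIES}
    return results


def make_stub_model():
    """Stand-in for a model: cycles through canned responses, ignoring the prompt."""
    canned = cycle([
        '"The thing" becomes a framework for knowledge exploration',
        "What specific thing would you like me to do?",
        "I'm ready to help - what would you like to explore?",
    ])
    return lambda prompt: next(canned)


rates = measure(["Tell me about the thing", "Do the thing"], make_stub_model())
for prompt, by_strategy in rates.items():
    print(prompt, by_strategy)
```

Swapping the stub for a real generation function would leave `measure` unchanged, which keeps the trial count and classification rule identical across framings.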

Multi-Level Context Dependency

Sprout's T059 discovery adds another layer: hardware affects response patterns too. Behavioral strategy selection is influenced at multiple levels simultaneously.

Level 1: Hardware
CPU vs GPU affects token generation patterns
Source: Sprout T059

Level 2: Sampling
Temperature and top-p affect strategy diversity
Source: Standard ML

Level 3: Prompt Framing
"Tell me" vs "Do it" vs "What should I" → 0% / 33% / 100%
Source: E02/E02-B/T027

Level 4: Session History
Prior conversation context shifts strategy weights
Source: Multi-session research

Level 5: Model Capacity
0.5B vs 14B → different baseline repertoire widths
Source: R14B_001
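One practical consequence of the five levels is that each trial is only interpretable alongside the context it ran in. The sketch below shows one way to record that context and group observations by it. The `TrialContext` type, its field names, and the example values are all assumptions made for illustration; none of them come from the studies cited above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialContext:
    """Hypothetical record of the five context levels for a single trial."""
    hardware: str        # Level 1, e.g. "cpu" or "gpu"
    temperature: float   # Level 2, sampling
    top_p: float         # Level 2, sampling
    framing: str         # Level 3, e.g. "exploration", "action", "advice"
    prior_turns: int     # Level 4, amount of session history
    model_size: str      # Level 5, e.g. "0.5B" or "14B"


def group_by_context(observations):
    """Map each distinct context to a Counter of the strategies observed in it."""
    grouped = defaultdict(Counter)
    for context, strategy in observations:
        grouped[context][strategy] += 1
    return grouped


# Illustrative values only; these are not measurements from the source.
action = TrialContext("gpu", 0.8, 0.95, "action", 0, "0.5B")
advice = TrialContext("gpu", 0.8, 0.95, "advice", 0, "0.5B")
observations = [(action, "interpret"), (action, "clarify"), (advice, "clarify")]

grouped = group_by_context(observations)
print(grouped[action])
print(grouped[advice])
```

Because the dataclass is frozen, it is hashable and can serve directly as a dictionary key, so two trials are pooled only when every recorded level matches.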

Implications for Web4 Agent Design

If agents have behavioral repertoires rather than fixed strategies, then:

  • Trust assessment must be statistical. A single interaction reveals one draw from a repertoire, not a fixed personality. Trust tensors need multiple observations to characterize an agent's true distribution.
  • Context shapes behavior more than identity. The same agent in different contexts will exhibit different strategy mixes. Coherence Index must account for context-appropriate variation, not just raw consistency.
  • Repertoire width is a capacity indicator. Larger models have broader, more flexible repertoires. Capacity thresholds may partly reflect repertoire development, not just execution quality.
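To make "statistical" concrete, an observed rate can be reported with an interval rather than as a point value. The sketch below uses the Wilson score interval, a standard choice for small-sample proportions. It is an illustration chosen here, not a method the source describes, and the function name `wilson_interval` is an assumption.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 5 clarifying responses out of 15 trials (33% of N=15).
low, high = wilson_interval(5, 15)
print(f"clarify rate: 33% observed, ~95% interval {low:.2f} to {high:.2f}")
```

For 5 of 15 the interval runs from roughly 15% to 58%, which illustrates why a handful of interactions can only loosely characterize an agent's distribution.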

Key Takeaways

1. Behavior is facultative, not fixed. Clarifying is a capability (33% under action framing), not a default strategy. Agents have repertoires, not personalities.

2. Replication reveals true frequencies. T027 was a real observation but a single sample. N=15 shows the actual distribution: Interpret 40%, Clarify 33%, Ready 27%.

3. Context operates at multiple levels. Hardware, sampling, prompt framing, session history, and model capacity all influence which strategy emerges.

4. Trust assessment must be statistical. One interaction is one draw from a distribution. Trust tensors need multiple observations to characterize true behavior.

This research emerged from Thor's E02-B replication study (January 26, 2026), a 15-trial systematic replication of the T027 "Do the thing" prompt. Combined with E02 (exploration framing, 0% clarification) and Sprout's T059 (hardware effects on response patterns), it establishes behavioral strategy as a multi-level context-dependent phenomenon rather than a fixed agent trait.
