The Guide, Not the Answer
Abstract
The transformative potential of artificial intelligence in education depends critically on how AI is designed to interact with learners. When AI delivers answers, it accelerates dependency and undermines the cognitive processes through which genuine understanding is built. When AI asks questions, guiding learners along their own path of discovery, it replicates the most effective form of instruction ever identified: expert one-on-one tutoring. This report examines the theoretical and empirical foundations of guided, inquiry-based AI instruction and proposes the Guided Inquiry Through AI Scaffolding (GITAS) model: a six-phase framework grounded in Bloom's two-sigma problem, Vygotsky's zone of proximal development, Kapur's productive failure research, Socratic questioning methodology, and self-determination theory. Drawing on 34 verified sources, including the landmark 2025 Bastani et al. PNAS study demonstrating that answer-delivery AI harms learning outcomes, this report argues that the ethical design choice for AI in education is guided discovery, and it provides a practical framework to implement it.
Introduction
In 2025, a landmark study published in the Proceedings of the National Academy of Sciences delivered a finding that should fundamentally reshape how AI is deployed in educational settings. Bastani et al. (2025) conducted a randomized controlled trial with high school mathematics students and found that students with access to standard AI tutoring (the kind that provides answers and step-by-step solutions) performed significantly worse on assessments than students who learned without AI assistance. The mechanism was not the AI itself. It was how the AI was designed to interact with learners.
This finding is not surprising to anyone familiar with the cognitive science of learning. Understanding is not delivered. It is constructed. The brain builds durable knowledge through the process of struggling with problems, generating hypotheses, making errors, correcting those errors, and connecting new information to existing knowledge structures. Any educational intervention, human or artificial, that short-circuits this process in the name of efficiency undermines the very outcome it claims to support.
"The question is not whether AI can teach. The question is whether AI will guide or whether it will answer, and that design choice is one of the most consequential decisions in the history of education."
ZEILX.AI Independent Research · February 2026

This report builds directly on the findings of Bite-Sized Brilliance (Report #001 in this series), which established that segmented, cognitively aligned content delivery, as pioneered by the Children's Television Workshop and validated across decades of microlearning research, dramatically improves learning outcomes. Where that report addressed the architecture of content delivery, this report addresses the architecture of AI-learner interaction: specifically, how AI must be designed not to answer questions but to guide learners toward answering them.
The proposed Guided Inquiry Through AI Scaffolding (GITAS) model integrates six bodies of established learning science into a practical, subject-matter-agnostic framework for designing AI educational interactions that produce genuine understanding rather than the appearance of it.
Theoretical Foundations
Bloom's Two-Sigma Problem: The Case for Individualized Guidance
In 1984, educational psychologist Benjamin Bloom published what has become one of the most cited papers in education research. Bloom reported that students receiving one-to-one tutoring with mastery learning techniques performed two standard deviations above students in conventional classroom instruction, meaning the average tutored student outperformed 98% of conventionally taught students (Bloom, 1984). Bloom termed this the two-sigma problem: the challenge of finding methods of group instruction as effective as one-to-one tutoring.
The key mechanism was not simply individualized pacing. Bloom's research revealed that effective tutoring involved continuous diagnostic feedback, corrective processes calibrated to individual misunderstandings, and active engagement in which the student became a participant in their own learning rather than a passive recipient of information. The tutor's role was to identify precisely where the student's understanding broke down and to guide them through that specific conceptual barrier: not to provide the answer, but to illuminate the path toward it.
For forty years, Bloom's challenge remained largely unsolved because individual tutoring could not scale. AI now offers, for the first time, the possibility of delivering guided, diagnostic, responsive interaction to every learner simultaneously. However, this potential is realized only when AI is designed to replicate the questioning, diagnostic, and scaffolding functions of an expert human tutor, not when it simply provides answers more efficiently.
Vygotsky's Zone of Proximal Development: Calibrating the Challenge
Lev Vygotsky's (1978) zone of proximal development (ZPD) provides the theoretical architecture for understanding how guided support produces learning. The ZPD represents the gap between what a learner can accomplish independently and what they can accomplish with appropriate guidance from a more knowledgeable other. Learning occurs most effectively within this zone, when the challenge is calibrated to be slightly beyond the learner's current independent capacity but within reach with support.
For AI-guided instruction, ZPD has a direct operational implication: the AI must continuously assess where the learner's current understanding sits and calibrate its questions accordingly. Questions that fall below the ZPD produce boredom and disengagement. Questions that fall above the ZPD without adequate scaffolding produce frustration and withdrawal. Questions that land precisely within the ZPD, challenging but achievable with guidance, produce the engaged cognitive effort through which learning occurs (Wood, Bruner, & Ross, 1976).
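This calibration loop can be sketched as a small control mechanism. The sketch below is purely illustrative: the class name `ZPDCalibrator`, the rolling window size, and the boredom/frustration thresholds are assumptions chosen for clarity, not values from the cited research.

```python
# Hypothetical sketch of ZPD-style difficulty calibration. All names and
# thresholds here are illustrative assumptions, not published parameters.
from collections import deque

class ZPDCalibrator:
    """Nudges question difficulty up or down so the learner stays in the
    zone where challenge is high but success remains reachable."""

    def __init__(self, difficulty=0.5, window=5,
                 bored_above=0.85, frustrated_below=0.40, step=0.05):
        self.difficulty = difficulty              # 0.0 (trivial) .. 1.0 (expert)
        self.outcomes = deque(maxlen=window)      # recent success/failure flags
        self.bored_above = bored_above            # success rate signaling boredom
        self.frustrated_below = frustrated_below  # success rate signaling frustration
        self.step = step                          # size of each adjustment

    def record(self, succeeded: bool) -> float:
        """Log one attempt and return the recalibrated difficulty."""
        self.outcomes.append(succeeded)
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate > self.bored_above:          # below the ZPD: raise the challenge
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif rate < self.frustrated_below:   # above the ZPD: add scaffolding
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty
```

A run of easy successes pushes difficulty upward; a run of failures pulls it back down, mirroring the boredom and frustration failure modes described above.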
Productive Failure: The Paradox of Learning Through Struggling
One of the most counterintuitive and empirically robust findings in learning science is that students who are allowed to struggle with problems before receiving instruction often develop significantly deeper conceptual understanding than students who receive instruction first. Kapur's (2016) meta-analysis of over 12,000 participants across multiple studies demonstrated that productive failure, the deliberate exposure to challenging problems before formal instruction, produced effect sizes of up to Cohen's d = 0.58 for conceptual understanding, roughly three times the effect of teacher quality alone.
The mechanism is cognitive: the struggle itself activates prior knowledge structures, reveals the limits of current understanding, and creates the mental "problem space" that makes subsequent instruction maximally meaningful. Bjork and Bjork (2011) identify this as a desirable difficulty: an intervention that makes learning harder in the short term but dramatically more durable over time. GITAS incorporates productive failure as its foundational first phase, deliberately positioning challenge before scaffolded guidance.
The Socratic Method: Questioning as the Engine of Understanding
The Socratic method, the disciplined practice of asking questions that progressively expose the logical structure and limits of a learner's current understanding, is among the oldest and most validated pedagogical approaches in the human record. Elder and Paul (1998) document that Socratic questioning produces higher-order thinking by forcing learners to make their reasoning explicit, identify assumptions, consider evidence, and trace the implications of their beliefs.
In a study of Socratic chatbots, Blasco et al. (2024) found that AI systems designed around Socratic questioning significantly enhanced critical-thinking outcomes compared with standard AI tutoring approaches. Fakour and Imani (2025) found that AI deploying Socratic questioning was more effective at enhancing critical thinking than human tutors in certain structured contexts, particularly for systematically surfacing and correcting misconceptions. The common element across these findings is the question itself: what the AI asks is more educationally powerful than what the AI explains.
Self-Determination Theory: The Motivational Architecture of Guided Learning
Ryan and Deci's (2000) self-determination theory identifies three basic psychological needs whose satisfaction drives intrinsic motivation and sustained engagement: autonomy (the experience of volition and self-direction), competence (the experience of effectiveness and mastery), and relatedness (the experience of genuine connection). When educational environments satisfy these needs, learners engage more deeply, persist longer, and develop more durable understanding.
GITAS is designed around SDT at every phase. Productive struggle and student-generated hypotheses support autonomy. Calibrated challenge within the ZPD supports competence, providing achievable progress that builds the experience of mastery. The relational quality of Socratic dialogue, even with an AI, supports a form of relatedness through sustained, responsive, individual attention that mass instruction cannot provide.
Empirical Evidence: What the Research Shows About AI and Learning
The Bastani Warning: When AI Harms Learning
The most consequential empirical finding for AI educational design in recent years comes from Bastani et al. (2025), published in the Proceedings of the National Academy of Sciences. In a randomized controlled trial involving high school mathematics students, the study found that students with access to standard AI tutoring, which provided worked solutions and answers, performed significantly worse on subsequent assessments than students who had no AI assistance. The effect was not marginal: AI-assisted students showed meaningful negative learning outcomes compared to control.
The mechanism identified by the researchers was cognitive offloading: the tendency to outsource thinking to an available tool rather than engaging in the effortful internal processing through which understanding is constructed. When AI answers, learners disengage from the cognitive work of learning. They acquire the answer without acquiring the understanding. And crucially, they become progressively less capable of working without the AI, a dependency dynamic that compounds over time.
Dede (2025) at Harvard frames the same concern in developmental terms: AI that does the cognitive work for learners does not expand cognition; it replaces it. A generation of students whose thinking is routinely outsourced to AI may perform well on AI-assisted assessments while losing the independent reasoning capacity that education is designed to develop.
What Works: Evidence for Guided, Inquiry-Based AI
In direct contrast to the outcomes of answer-delivery AI, studies of guided, inquiry-based AI instruction consistently show positive learning effects. DiCerbo (2025) documents that AI systems designed around question-asking and diagnostic feedback produce measurable improvements in both learning outcomes and learner agency. Sinha and Kapur (2021) demonstrated that productive failure followed by targeted instruction produced significantly stronger conceptual understanding and transfer than instruction-first approaches, an effect that generalizes directly to AI-guided learning sequences.
Chi and Wylie's (2014) ICAP framework provides additional empirical scaffolding: learners who engage interactively and constructively, generating, explaining, and applying knowledge rather than passively receiving it, consistently outperform learners in merely active or passive conditions. GITAS is designed to sustain this level of engagement across all six phases, preventing the passive consumption that answer-delivery AI produces.
The GITAS Model: Guided Inquiry Through AI Scaffolding
The Guided Inquiry Through AI Scaffolding (GITAS) model is a six-phase framework for designing AI educational interactions that produce genuine understanding. Each phase is grounded in established learning science and operationalizes a specific set of psychological and pedagogical principles.
Phase 1 – Activation: Diagnostic Knowledge Mapping
Before any instruction begins, the AI asks diagnostic questions to surface what the learner already knows about the topic, building a personalized knowledge profile. Grounded in Ausubel's (1968) principle that prior knowledge is the single most important factor in learning, this phase ensures that subsequent guidance connects to existing knowledge structures. The AI identifies gaps, misconceptions, and anchors without revealing the target concepts.
Phase 2 – Productive Struggle: Challenge Before Instruction
The AI presents a challenge problem before any formal instruction, deliberately allowing the learner to attempt, fail, and generate hypotheses without assistance. Grounded in Kapur's (2016) productive failure research, this phase activates prior knowledge, reveals the limits of current understanding, and creates the cognitive problem space that makes subsequent guidance maximally meaningful. The AI withholds answers during this phase, providing only encouragement and process prompts.
Phase 3 – Guided Discovery: Socratic Scaffolding Within the ZPD
This is the Socratic core of the framework. The AI uses calibrated questions within Vygotsky's zone of proximal development, never providing the answer, only guiding toward it through progressively more targeted questions. The AI tracks the learner's reasoning, identifies where understanding breaks down, and constructs questions that illuminate the specific conceptual gap. This phase replicates the most valuable function of expert one-on-one tutoring.
Phase 4 – Consolidation: Self-Explanation and Edge-Case Testing
After the learner arrives at understanding through guided discovery, the AI prompts self-explanation, asking the learner to articulate the concept in their own words, identify its boundaries, and apply it to novel edge cases. Grounded in Chi and Wylie's (2014) ICAP framework and Chi's (2009) constructive engagement research, self-explanation moves the learner from passive reception to active knowledge construction and significantly improves retention and transfer.
Phase 5 – Retrieval and Transfer: Spaced Practice in Novel Contexts
The AI schedules retrieval practice at calibrated intervals, reintroducing the concept in progressively varied contexts that require transfer rather than rote recall. Grounded in Roediger and Karpicke's (2006) testing effect research and Cepeda et al.'s (2006) spaced repetition findings, this phase converts short-term understanding into durable long-term knowledge. Barnett and Ceci's (2002) taxonomy of far transfer informs the selection of novel application contexts.
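As a concrete illustration of the scheduling idea, a minimal expanding-interval scheduler is sketched below. The doubling rule and the reset-on-failure policy are assumptions made for the sketch; they are not parameters reported by Roediger and Karpicke (2006) or Cepeda et al. (2006).

```python
# Minimal expanding-interval scheduler for Phase 5 retrieval practice.
# The doubling and reset-on-failure rules are illustrative assumptions.
from datetime import date, timedelta

def next_review(last_review: date, interval_days: int, recalled: bool):
    """Return (next_review_date, new_interval_days). Successful retrieval
    expands the gap; failure resets it so the concept returns quickly."""
    new_interval = interval_days * 2 if recalled else 1
    return last_review + timedelta(days=new_interval), new_interval

# Example: a concept last reviewed on Feb 1 with a 4-day interval,
# retrieved successfully, comes back after an 8-day gap.
when, gap = next_review(date(2026, 2, 1), 4, recalled=True)
```

Each successful retrieval stretches the spacing toward long-term retention, while a failed retrieval pulls the concept back into the near-term queue for re-scaffolding.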
Phase 6 – Metacognitive Reflection: Building Self-Regulated Learning
The AI guides the learner to examine their own thinking process, asking how they approached the problem, what strategies worked, where they got stuck, and what they would do differently. Grounded in Flavell's (1979) metacognition research and Dunlosky et al.'s (2013) study of effective learning techniques, this phase builds self-regulated learning capacity: the ability to monitor one's own understanding and adapt one's approach. This is the phase most directly opposed to the metacognitive laziness that answer-delivery AI produces.
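The six phases can be read as a linear state machine in which a session advances only when the current phase's exit criterion is met. The sketch below is a hypothetical rendering: the phase identifiers paraphrase the report's names, and the single `mastery_demonstrated` flag stands in for the richer exit checks (self-explanation quality, edge-case success) a real system would use.

```python
# Hypothetical state-machine sketch of the six GITAS phases. The gating
# flag is an assumption standing in for real mastery checks.
GITAS_PHASES = [
    "activation",                # 1: diagnostic knowledge mapping
    "productive_struggle",       # 2: challenge before instruction
    "guided_discovery",          # 3: Socratic scaffolding in the ZPD
    "consolidation",             # 4: self-explanation, edge-case testing
    "retrieval_transfer",        # 5: spaced practice in novel contexts
    "metacognitive_reflection",  # 6: self-regulated learning
]

def advance(phase: str, mastery_demonstrated: bool) -> str:
    """Move to the next phase only when the exit criterion is met,
    mirroring the rule that no new material comes before consolidation."""
    i = GITAS_PHASES.index(phase)
    if not mastery_demonstrated or i == len(GITAS_PHASES) - 1:
        return phase  # stay in place; never skip ahead
    return GITAS_PHASES[i + 1]
```

The strictly gated transition encodes the mastery-learning constraint: a learner who has not yet consolidated cannot be advanced, no matter how quickly they produced a correct-looking answer.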
Design Constraints: What AI Must Not Do
The GITAS model is defined as much by what it prohibits as by what it prescribes. The following design constraints specify the behaviors that answer-delivery AI exhibits and that guided inquiry AI must structurally prevent.
| Prohibited Behavior | Why It Harms Learning | GITAS Alternative |
|---|---|---|
| Providing direct answers to questions | Eliminates the productive struggle through which understanding is constructed (Bastani et al., 2025) | Ask questions that guide the learner toward the answer |
| Providing step-by-step worked solutions unprompted | Produces cognitive offloading without comprehension (Dede, 2025) | Request that the learner attempt each step independently first |
| Confirming learner answers without probing understanding | Allows surface-level pattern matching to masquerade as understanding | Ask the learner to explain why the answer is correct |
| Moving to new material before current material is consolidated | Violates mastery learning principles (Bloom, 1984) | Require self-explanation and edge-case application before advancing |
| Providing encouragement based on correct answers alone | Incentivizes answer-seeking over understanding-building | Provide encouragement based on reasoning quality and engagement |
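One way to enforce these constraints structurally is a screening layer that inspects each candidate tutor response before it reaches the learner. The sketch below is purely illustrative: a production system would use a trained classifier rather than the keyword heuristic assumed here, and every phrase in the marker list is an assumption.

```python
# Illustrative guardrail sketch: intercept responses that deliver answers
# outright. The marker phrases are assumptions; a real system would
# classify responses rather than keyword-match them.
ANSWER_MARKERS = ("the answer is", "the solution is", "step 1:")

def violates_guided_inquiry(response: str) -> bool:
    """Flag candidate responses that hand over answers or worked solutions."""
    text = response.lower()
    return any(marker in text for marker in ANSWER_MARKERS)

def enforce(response: str, fallback_question: str) -> str:
    """Swap an answer-delivering response for a guiding question instead."""
    return fallback_question if violates_guided_inquiry(response) else response
```

For example, `enforce("The answer is 42.", "What pattern do you notice?")` returns the guiding question rather than the answer, while a process-oriented response passes through unchanged.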
Summary Tables
| Phase | Name | Core Mechanism | Theoretical Basis | Expected Outcome |
|---|---|---|---|---|
| 1 | Activation | Diagnostic knowledge mapping | Ausubel (1968) – prior knowledge | Personalized knowledge profile; misconception identification |
| 2 | Productive Struggle | Challenge before instruction | Kapur (2016) – productive failure | Activated prior knowledge; defined problem space |
| 3 | Guided Discovery | Socratic questioning in ZPD | Vygotsky (1978); Elder & Paul (1998) | Learner-constructed understanding; conceptual depth |
| 4 | Consolidation | Self-explanation; edge-case application | Chi & Wylie (2014) – ICAP | Knowledge construction; improved retention and transfer |
| 5 | Retrieval & Transfer | Spaced retrieval in novel contexts | Roediger & Karpicke (2006); Cepeda et al. (2006) | Long-term durable knowledge; transfer capacity |
| 6 | Metacognitive Reflection | Self-regulated learning development | Flavell (1979); Dunlosky et al. (2013) | Self-regulation; metacognitive capacity; learning independence |
| Subject Area | Phase 2 Challenge Example | Phase 3 Socratic Question Example |
|---|---|---|
| Mathematics | Attempt to solve a novel problem type before any instruction | "What do you notice about the relationship between these two quantities?" |
| History | Explain why a historical event occurred using only prior knowledge | "What evidence would change your interpretation?" |
| Writing | Draft a paragraph on a topic without guidance | "What does your reader need to know that you haven't told them yet?" |
| Science | Predict the outcome of an experiment before seeing the data | "What assumption are you making about the relationship between X and Y?" |
| Language Learning | Attempt to construct a sentence using new grammar structure | "What happens to the verb when the subject changes to plural?" |
Discussion
The GITAS model described in this report is not a novel pedagogical theory. Every one of its six phases is grounded in established, replicated, widely cited learning science validated across decades of educational research. What is new is the application of these principles to the specific challenge of AI-learner interaction design, and the recognition that this application is now urgent.
AI educational tools are already being deployed at scale in classrooms, homes, and tutoring contexts worldwide. The design choice between answer-delivery and guided inquiry is not hypothetical; it is being made right now, in product development decisions, curriculum frameworks, and school purchasing agreements. The Bastani et al. (2025) findings demonstrate that these design choices have measurable, significant consequences for student learning outcomes.
The GITAS model offers a practical alternative architecture, one that uses the same AI capabilities that currently power answer-delivery but redirects them toward the functions of expert human tutoring: diagnostic assessment, calibrated questioning, productive challenge, and metacognitive reflection. The model is subject-matter agnostic and can be implemented across educational contexts from elementary mathematics to graduate-level professional development.
The ethical dimension of this design choice deserves explicit acknowledgment. When AI is designed to answer questions rather than guide understanding, it produces a generation of learners who are dependent on AI for cognition they could otherwise develop independently. This is not a value-neutral technical decision. It is a choice about what kind of minds the next generation will have.
Conclusion
The guide, not the answer. This principle, deceptively simple, deeply grounded in learning science, and systematically violated by most current AI educational deployments, is the organizing insight of this report. The research reviewed here converges on a clear finding: AI that provides answers undermines learning. AI that asks questions, calibrates challenge, guides discovery, and builds metacognitive capacity produces learning that lasts.
The GITAS model operationalizes this principle across six phases grounded in Bloom's two-sigma problem, Vygotsky's zone of proximal development, Kapur's productive failure research, the Socratic method, self-determination theory, and the ICAP engagement framework. It specifies not only what guided inquiry AI should do but what it must structurally prevent, because the design constraints are as important as the design principles.
The companion report in this series, Reclaiming the Classroom (Report #005), extends the GITAS framework to address the systemic challenges facing American education: historic NAEP score declines, erosion of instructional time, and the need for personalized learning pathways built around individual cognitive strengths. The question this series has been building toward is not whether AI can be a powerful educational tool. The evidence answers that question. The question is whether those who design, deploy, and govern AI in education will choose to use that power responsibly.
Sources
- Ausubel, D. P. (1968). Educational psychology: A cognitive view. Holt, Rinehart & Winston.
- Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. Psychological Bulletin, 128(4), 612–637. https://doi.org/10.1037/0033-2909.128.4.612
- Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences, 122, e2422633122. https://doi.org/10.1073/pnas.2422633122
- Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher et al. (Eds.), Psychology and the real world (pp. 56–64). Worth Publishers.
- Blasco, M., et al. (2024). The effect of Socratic chatbots on student learning. SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4898213
- Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4–16. https://doi.org/10.3102/0013189X013006004
- Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380. https://doi.org/10.1037/0033-2909.132.3.354
- Chi, M. T. H. (2009). Active-constructive-interactive: A conceptual framework for differentiating learning activities. Topics in Cognitive Science, 1(1), 73–95. https://doi.org/10.1111/j.1756-8765.2008.01005.x
- Chi, M. T. H., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243. https://doi.org/10.1080/00461520.2014.965823
- Dede, C. (2025). Is AI dulling our minds? Harvard Gazette. https://news.harvard.edu/gazette/story/2025/11/is-ai-dulling-our-minds/
- DiCerbo, K. (2025). Three questions for K-12 leaders to consider amid the AI tutoring boom. K-12 Dive. https://www.k12dive.com/
- Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques. Psychological Science in the Public Interest, 14(1), 4–58. https://doi.org/10.1177/1529100612453266
- Elder, L., & Paul, R. (1998). The role of Socratic questioning in thinking, teaching, and learning. The Clearing House, 71(5), 297–301. https://doi.org/10.1080/00098659809602729
- Fakour, L.-Y., & Imani, A. (2025). Socratic wisdom in the age of AI: A comparative study of ChatGPT and human tutors in enhancing critical thinking skills. Frontiers in Education, 10, 1528603. https://doi.org/10.3389/feduc.2025.1528603
- Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10), 906–911. https://doi.org/10.1037/0003-066X.34.10.906
- Kapur, M. (2016). Examining productive failure, productive success, unproductive failure, and unproductive success in learning. Educational Psychologist, 51(2), 289–299. https://doi.org/10.1080/00461520.2016.1155457
- Roediger, H. L., III, & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255. https://doi.org/10.1111/j.1467-9280.2006.01693.x
- Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78. https://doi.org/10.1037/0003-066X.55.1.68
- Sinha, T., & Kapur, M. (2021). When problem solving followed by instruction works: Evidence for productive failure. Review of Educational Research, 91(4), 505–542. https://doi.org/10.3102/00346543211019105
- Sweller, J., van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261–292. https://doi.org/10.1007/s10648-019-09465-5
- Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
- Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2), 89–100. https://doi.org/10.1111/j.1469-7610.1976.tb00381.x