Thursday, April 2, 2026



By Wei Xing

A student in my machine learning class chose a neural network for their final project. The reasoning seemed sound: the AI they had consulted explained that neural networks handle complex, non-linear patterns well, and provided working code to support the recommendation. The model ran without errors. The accuracy numbers looked convincing. What neither the student nor the AI flagged was that the dataset contained only 300 samples. A logistic regression would have been more appropriate, more interpretable, and likely just as accurate. When asked in the project review to justify the model choice, the student cited the AI’s recommendation without hesitation. They were not being evasive. They genuinely believed they had made an informed decision, because nothing in their process had suggested otherwise.
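
For readers who want to check the claim, here is a minimal sketch of the comparison, assuming scikit-learn; the synthetic 300-sample dataset and the network size are illustrative stand-ins, not the student’s actual project:

```python
# Sketch: on ~300 samples, a linear baseline often matches a neural network.
# The data and hyperparameters below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "neural network": make_pipeline(StandardScaler(),
                                    MLPClassifier(hidden_layer_sizes=(64, 64),
                                                  max_iter=2000,
                                                  random_state=0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```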

The problem is not that AI makes mistakes

By 2026, saying that AI systems produce errors is not news. Every educator, student, and policymaker already knows this in some abstract sense. The more specific and troubling reality is that AI errors and correct answers are delivered in exactly the same tone, with exactly the same confidence, in exactly the same format. A recent large-scale study by the BBC and EBU found that 45% of AI responses contained at least one significant inaccuracy. The deeper problem is that nothing in the output itself tells users which responses fall in that 45%. When a student receives a fluent, well-structured response, there is no signal indicating whether it belongs to the accurate half or the inaccurate one.

This is not purely a technical limitation. A 2025 OpenAI research paper demonstrated mathematically that AI benchmarks actively punish models for expressing uncertainty: nine out of ten major evaluation systems award zero points when an AI says, “I don’t know,” the same score as a completely wrong answer. When the training incentive is always to guess, the result is a system that has learned to project confidence regardless of whether that confidence is warranted.
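
The incentive the paper describes reduces to a one-line expectation, sketched below; the 0/1 grading scheme is the generic binary scoring the paper analyses, not any named benchmark’s actual code:

```python
# Under binary grading, "I don't know" scores 0 -- the same as a wrong answer.
# Guessing with any probability p > 0 of being right therefore has a higher
# expected score than abstaining, so a score-maximising model never abstains.
def expected_score(p_correct: float, abstain: bool) -> float:
    if abstain:
        return 0.0  # abstention earns nothing under 0/1 grading
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0

for p in (0.1, 0.5, 0.9):
    print(f"p={p}: guess={expected_score(p, False):.2f}, "
          f"abstain={expected_score(p, True):.2f}")
```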

Why education is the most dangerous context

The confidence problem matters more in education than in most other settings, because good teaching and persuasive AI outputs are optimised for opposite outcomes. A skilled teacher, when a student reaches a wrong conclusion, will often slow the process down rather than resolve it immediately. The point of pausing at a difficult step is not to deliver the answer but to make the confusion productive, because confusion that resolves through effort produces understanding that sticks. AI tutoring tools are optimised to resolve confusion immediately, because immediate resolution is what users rate highly and what drives engagement.

The consequence of this mismatch is subtle but serious. In the same course, a student submitted code that ran cleanly and returned strong accuracy numbers. The implementation contained a quiet error: data normalisation had been applied before the train-test split, a mistake that inflated the results without triggering any warning. The AI that helped write the code had no mechanism to flag that this particular step was worth pausing on. It produced working code in the same register it always uses. When an exam question later asked the student to explain why the order of those operations matters, they could not answer. Not because they had been careless, but because their process had never required them to stop and think about it. A Duke University survey found that 90% of students using AI tools wanted those tools to be more transparent about their own limitations. Students already sense the problem. The tools have not caught up.
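
The mistake is easy to reproduce. Below is a minimal sketch of the two orderings, assuming scikit-learn and synthetic data; how much the leak inflates the score depends on the dataset, but the principle is the ordering of the steps:

```python
# Normalising before the split lets test-set statistics leak into training;
# fitting the scaler on the training portion only (or inside a Pipeline)
# keeps the evaluation honest. Data here is synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# WRONG: normalise first, split second -- the scaler has seen the test rows.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)
leaky = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("leaky ordering:", leaky.score(X_te, y_te))

# RIGHT: split first; the pipeline fits the scaler on training data only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clean = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print("clean ordering:", clean.fit(X_tr, y_tr).score(X_te, y_te))
```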

A problem of scale, not just individual practice

When one student is misled by an AI recommendation, the situation is recoverable. When the same tool is used by millions of students making the same query in the same week, the error propagates at a scale that no individual teacher can correct. India’s National Education Policy 2020 has accelerated the integration of digital tools across universities and schools, often faster than curricula can adapt to the risks involved. For first-generation university students who lack family or peer networks to cross-check what they are learning, an AI tutor that presents wrong information with authority can go unchallenged for an entire semester. This is not a question of digital literacy alone. It is a question of whether the tools being deployed in classrooms were designed with learning outcomes in mind, or with user retention metrics.

What educators and policymakers can do now

The technical capacity to build AI systems that express calibrated uncertainty already exists and has been documented in the research literature for decades. The reason consumer AI tools do not express uncertainty is economic rather than technical: confident answers retain users, and uncertain answers do not. Waiting for AI companies to solve this voluntarily is not a realistic strategy for educators working under NEP 2020 or any other framework that has already committed to AI integration.

Teachers can address part of this directly by helping students understand that different types of AI queries carry different levels of risk. Asking an AI to retrieve a well-documented fact is different from asking it to reason through a novel problem, and students who understand this distinction are better placed to apply appropriate scepticism. At the institutional level, universities and schools procuring AI tools should require that accuracy and uncertainty communication have been evaluated in actual learning contexts, not only in consumer benchmarks designed to reward confident answers.
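
One long-documented instance of that capacity is probability calibration, which makes a model’s stated confidence track its empirical accuracy so that low-confidence cases can be flagged or abstained on. The sketch below uses scikit-learn on synthetic data; the classifier, the 0.8 threshold, and the dataset are illustrative assumptions, a classroom-scale analogy rather than how consumer chatbots are built:

```python
# Calibrate a classifier's probabilities, then abstain below a threshold --
# a toy version of "express uncertainty instead of always answering".
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Isotonic calibration on cross-validation folds of the training set.
clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                             method="isotonic", cv=5).fit(X_tr, y_tr)

confidence = clf.predict_proba(X_te).max(axis=1)
answered = confidence >= 0.8            # 0.8 is an illustrative threshold
accuracy = (clf.predict(X_te)[answered] == y_te[answered]).mean()
print(f"answers {answered.mean():.0%} of queries; "
      f"accuracy when it does answer: {accuracy:.3f}")
```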

The thing AI cannot yet model

A good teacher knows when to say: I am not certain about this, let us work through it together. That sentence is itself an act of teaching. It demonstrates that uncertainty is a normal part of reasoning and shows students what intellectual honesty looks like in practice. AI tutoring tools do not yet do this reliably. Until they do, the most important skill we can give students is the ability to recognise the difference between a confident answer and a correct one.

Wei Xing is a Lecturer in Mathematics, School of Mathematical and Physical Sciences, University of Sheffield, UK.

DISCLAIMER: The views expressed are solely of the author and ETEDUCATION does not necessarily subscribe to them. ETEDUCATION will not be responsible for any damage caused to any person or organisation directly or indirectly.
