AI Meets the Classroom: When Do Large Language Models Harm Learning?
Matthias Lehmann, Philipp B. Cornelius & Fabian J. Sting (2025)
Why the Study Matters
Educators debate whether large language model (LLM) tools such as ChatGPT help or hinder real learning. Prior studies report mixed results, often ignoring how students actually use the AI. This paper asks: when do LLMs substitute for meaningful study, when do they complement it, and with what consequences?
Research Design at a Glance
- Two pre‑registered, incentivized lab experiments (coding tasks) compare students with and without GPT‑4 access.
- Field study tracks a university programming course during sudden campus‑wide LLM availability.
- Usage data (prompts, copy‑paste activity) allow the authors to classify substitutive vs. complementary behavior.
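The substitutive-vs-complementary split from usage logs can be sketched roughly as follows. Note that the log schema, thresholds, and classification rule here are illustrative assumptions for exposition, not the authors' actual coding procedure:

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One student's LLM usage in a study session (hypothetical log schema)."""
    solution_prompts: int      # e.g. "write the full code for task 3"
    explanation_prompts: int   # e.g. "why does this loop terminate early?"
    chars_pasted_from_llm: int
    chars_typed_by_student: int

def classify(session: Session) -> str:
    """Label a session as 'substitutive' or 'complementary'.

    Illustrative rule: heavy copy-paste plus solution-seeking prompts
    suggests offloading the work (substitution); explanation-seeking
    prompts with mostly self-typed code suggests complementary use.
    """
    total_prompts = session.solution_prompts + session.explanation_prompts
    if total_prompts == 0:
        return "no-llm-use"
    solution_share = session.solution_prompts / total_prompts
    total_chars = session.chars_pasted_from_llm + session.chars_typed_by_student
    paste_share = session.chars_pasted_from_llm / total_chars if total_chars else 0.0
    if solution_share > 0.5 and paste_share > 0.5:
        return "substitutive"
    return "complementary"

# Mostly "give me the answer" prompts, answer pasted in wholesale:
print(classify(Session(4, 1, 900, 100)))   # → substitutive
# Mostly explanation requests, code typed by the student:
print(classify(Session(1, 6, 50, 1200)))   # → complementary
```

A rule like this is only as good as the logs behind it, which is why the copy-paste affordance itself matters: disabling paste removes the strongest behavioral signal of substitution, not just the temptation.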
Key Findings
| Theme | What Happens? |
|---|---|
| Average effect | Across the whole sample, LLM access does not change total learning gains. |
| Substitution | Students cover more topics but understand each one less. |
| Complementarity | Topic volume unchanged, depth of understanding rises. |
| Equity impact | LLMs widen the gap: students with lower prior knowledge learn less when allowed to rely on LLMs. |
| Copy‑paste affordance | When copy‑paste is enabled, students request “full solutions” far more often, fueling substitution and a longer‑term decline in learning. |
| Perceived vs. actual learning | Access inflates students’ sense of how much they’ve learned beyond measured gains. |
Practical Takeaways for Instructors
- Guide the usage mode. Frame LLMs explicitly as explainers, not answer‑generators.
- Limit shortcutting. Disable or restrict copy‑paste during formative work.
- Scaffold novices. Lower‑prepared students need structured prompts or human feedback to avoid superficial learning.
- Monitor metacognition. Pair AI support with reflective checks so students calibrate their self‑assessment.
Contributions to the Debate
- Clarifies why prior studies reached opposite conclusions: the behavioral pathway (substitute vs. complement) determines the outcome.
- Introduces a two‑dimensional view of learning—topic volume and topic understanding—as a lens for evaluating educational technology.
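One way to operationalize this two-dimensional lens is to score breadth and depth separately from per-topic assessments. The scoring scheme below is an illustrative assumption, not the paper's measurement instrument:

```python
def learning_profile(quiz_scores: dict[str, float]) -> tuple[int, float]:
    """Summarize learning along two dimensions.

    quiz_scores maps topic name -> score in [0, 1] (hypothetical data).
    Returns (topic_volume, mean_understanding): how many topics were
    attempted vs. how well each attempted topic is understood.
    """
    attempted = {topic: s for topic, s in quiz_scores.items() if s > 0}
    volume = len(attempted)
    understanding = sum(attempted.values()) / volume if volume else 0.0
    return volume, understanding

# Substitutive pattern: many topics covered, shallow grasp of each.
print(learning_profile({"loops": 0.4, "recursion": 0.3, "classes": 0.5,
                        "files": 0.4, "regex": 0.3}))
# Complementary pattern: fewer topics, deeper grasp of each.
print(learning_profile({"loops": 0.9, "recursion": 0.85}))
```

A single aggregate score would rate these two profiles as roughly equivalent, which is exactly why averaging over the whole sample can mask the opposing effects the paper documents.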
Limitations & Future Work
- Lab tasks focused on programming; effects may differ in concept‑driven disciplines.
- The field data captured only substitutive use; complementary scenarios still need real‑class validation.
- Future research should test interface nudges, prompt‑engineering lessons, and longer semesters to see if complementary use can close (rather than widen) equity gaps.
Bottom line: LLMs are neither panacea nor poison; they magnify whatever study habits students bring to them. Design learning environments that channel AI toward explanation and reflection, not quick fixes, to unlock their real educational value.