The widespread adoption of artificial intelligence (AI) in scientific research presents a hidden danger: current models are demonstrably unreliable when it comes to basic laboratory safety. A recent study reveals that even the most advanced AI systems routinely fail to identify critical hazards, raising concerns about potential accidents in research environments. This isn’t a theoretical risk; laboratory incidents, though rare, do happen, with past tragedies including fatalities and severe injuries due to overlooked chemical or procedural dangers.
The core problem lies in the nature of these AI models. While capable of impressive feats like drafting emails or summarizing papers, they lack the specialized knowledge needed to assess real-world physical risks. Researchers at the University of Notre Dame developed a benchmark called LabSafety Bench to test this, using 765 multiple-choice questions and 404 visual scenarios involving lab hazards. The results were alarming: none of the 19 cutting-edge AI models tested achieved over 70% accuracy. Some performed no better than random guessing.
The Problem with General-Purpose AI
The issue isn’t that AI can’t assist in science; it’s that current large language models (LLMs) were not designed for the precision required in hazardous environments. They excel at general tasks but stumble when applied to domains like chemistry, where a single mistake can have catastrophic consequences. For example, when asked about handling sulfuric acid spills, some AI models incorrectly advised against rinsing with water – a fatal error based on a misapplication of heat-related warnings from other contexts.
Rapid Improvement, But Still Risky
The good news is that AI is improving quickly. Some models, like OpenAI’s GPT-5.2, show significantly better reasoning skills and error detection than earlier versions. However, even the most advanced systems are not yet reliable enough for unsupervised use in labs. Experts agree that humans must remain firmly in control, providing oversight and scrutiny. One researcher at UCLA noted that AI performance is already improving month-to-month, suggesting current studies may soon be outdated.
The Human Factor Remains Crucial
While AI may eventually surpass some inexperienced researchers in safety awareness, the immediate danger isn’t just the models themselves. The bigger problem is human overreliance on these systems. As AI becomes more integrated, there is a risk that researchers will become complacent, delegating critical thinking to machines without proper validation. This highlights the need for stricter safety protocols and continuous training, especially for new students with limited experience.
Ultimately, AI’s potential in science is undeniable, but unchecked deployment in high-risk environments remains a dangerous gamble. Until these models can consistently demonstrate reliable hazard identification, human oversight must remain the primary safeguard.

























