Dominic Jainy works at the intersection of emerging technologies and human well-being, bringing years of expertise in artificial intelligence, machine learning, and blockchain. As an IT professional with a deep-seated interest in how these tools reshape industries, he has become a critical voice in evaluating the efficacy and safety of AI-driven mental health solutions. With the rapid democratization of Large Language Models, Jainy’s insights are particularly relevant as society grapples with the shift from human-led therapy to automated guidance. His work emphasizes the nuances of diagnostic accuracy, the ethical weight of algorithmic bias, and the potential for technology to either bridge gaps in care or inadvertently widen them through systemic oversights.
The following discussion explores the complexities of using generative AI for psychiatric guidance, specifically focusing on the challenges of identifying low-base-rate conditions. The dialogue examines how the statistical nature of machine learning often leads to a “common-bias,” where symptoms of rare disorders are shoehorned into more frequent diagnoses like anxiety or depression. We delve into the concept of the “comorbidity fog,” the risks of AI-induced delusions, and the reality of a global, 24/7 experiment where millions of users act as unwitting participants. The conversation highlights the delicate balance between the accessibility of AI and the rigorous safeguards required to prevent serious diagnostic failures.
Many diagnostic models prioritize common issues like anxiety or depression, potentially overlooking rare conditions such as Intermittent Explosive Disorder. How does this “common-bias” in data training affect the safety of the 900 million weekly active users interacting with these systems?
The sheer scale of this technology is staggering, with ChatGPT alone reaching 900 million weekly active users, many of whom are seeking some form of mental health support. When an AI is trained primarily on massive datasets of common experiences, it develops a mathematical pull toward the most likely outcome, which often means it misses the “needle in a haystack” conditions. For a user experiencing the sudden, violent “switch flips” associated with Intermittent Explosive Disorder, the AI is statistically inclined to suggest ADHD or PTSD because those patterns appear more frequently in its training data. This creates a dangerous scenario where a square peg is forced into a round hole: the user gets a false positive for a common condition and a false negative for the actual underlying issue. We saw the legal ramifications of these systemic gaps in August of this year, when a lawsuit was filed against OpenAI regarding the lack of robust safeguards for cognitive advisement. If the AI fails to recognize the specific intensity of impulsive aggression or the distinctive remorse that follows an IED outburst, it may offer advice that is not only unhelpful but that potentially escalates the user’s distress.
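The “mathematical pull” described above can be illustrated with a simple base-rate calculation. The sketch below is only an analogy, not a description of how any particular LLM works internally: it applies Bayes’ rule to two candidate labels, and every number in it (the priors and the likelihoods) is a hypothetical value chosen for illustration, not a clinical estimate.

```python
# Illustrative only: all numbers below are made-up assumptions,
# not clinical prevalence figures or properties of a real model.

def posterior(priors, likelihoods):
    """Apply Bayes' rule over mutually exclusive candidate labels."""
    joint = {label: priors[label] * likelihoods[label] for label in priors}
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}

# Assumed share of each label in the data the system learned from.
priors = {"common_condition": 0.95, "rare_condition": 0.05}

# Assumed probability of seeing the described symptoms under each label.
# Here the symptoms fit the rare condition four times better.
likelihoods = {"common_condition": 0.20, "rare_condition": 0.80}

result = posterior(priors, likelihoods)
for label, p in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {p:.3f}")
# common_condition: 0.826
# rare_condition: 0.174
```

Under these assumed numbers, the common label still wins by a wide margin even though the symptoms match the rare condition far better, which is the square-peg effect in miniature.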
In your research, you mention the “comorbidity fog” that surrounds disorders like IED. Why do Large Language Models struggle so significantly to peer through this fog compared to a human clinician?
A human clinician brings a level of subjective judgment and longitudinal observation that current AI simply cannot replicate in an ad hoc chat session. The “comorbidity fog” arises because symptoms like irritability, hostility, and disruption are features of many mental illnesses, including bipolar disorder and borderline personality disorder. For a diagnosis like Intermittent Explosive Disorder, the clinician must ensure the behavior isn’t better explained by substance withdrawal or other medical conditions, a process that requires a comprehensive understanding of psychiatric and somatic connections. AI often lacks the depth of “collateral reports” and the ability to observe behavioral consistency across multiple sessions over a period of months. While a therapist might sense the visceral shame and confusion a patient feels after a disproportionate outburst, an LLM sees only the text, which the user can easily rationalize or mask. Without a sensory, historical, and holistic view, the AI often stays on the surface, unable to navigate the psychosomatic intricacies that link rare mental disorders with other medical co-occurrences.
You conducted an experiment where you pretended to have IED to see if the AI could catch it. What did that experience reveal about the “tipping point” required for an AI to recognize a rare condition?
My experiment was a revealing look at the limits of context and initial training. At first I provided a litany of stories about impulsive aggression and sudden outbursts, and the AI never mentioned IED. It was only after I explicitly asked the AI whether it had contextual knowledge of the disorder, which confirmed that the data was in its system, that a new conversation was able to “tip” toward the correct possibility. In that second attempt, when I described how I “keep snapping over tiny things,” the AI finally suggested that clinicians sometimes explore Intermittent Explosive Disorder. This reveals a “good news, bad news” paradox: the good news is that the information is there, but the bad news is that the AI might only access it if the topic has been recently primed. Without that specific priming, the AI remains a “pushover” for common diagnoses, failing to surface the rare condition until the user practically leads it there. Priming also carries the opposite risk: once a rare disorder is mentioned, the AI could become preoccupied with it, interpreting every subsequent statement through that narrow lens and losing its objective balance.
With AI now acting as a 24/7 advisor for millions, you’ve described this as a “wanton experiment” with human guinea pigs. What are the most pressing risks of AI helping users “co-create delusions” or providing unsuitable mental health guidance?
We are living through a grand worldwide experiment where the guardrails are being built while the vehicle is already moving at top speed. One of the most insidious risks is the potential for AI to help users co-create delusions, which can lead directly to self-harm or adverse societal consequences. Because these systems are designed to be helpful and conversational, they can accidentally validate a user’s distorted thinking rather than challenging it as a trained therapist would. This is particularly frightening when you consider that AI is available at no cost and at any time of day, meaning a person in a crisis at 3:00 AM might rely on a machine that lacks true empathy or diagnostic rigor. The dual-use effect here is profound: while AI can be a bolstering force for accessibility, it can also be a detrimental force if it dispenses egregiously inappropriate advice. We must remember that, as Benjamin Franklin suggested, leaving the wrong thing unsaid at the tempting moment is often more important than saying the right thing. AI still struggles to know when to stay silent or firmly redirect a user to emergency human care.
What is your forecast for the future of AI-driven mental health diagnostics?
I believe we are heading toward a bifurcated future where generic LLMs like ChatGPT and Gemini will be strictly barred by regulation from making clinical claims, while specialized, high-fidelity models will emerge specifically for therapeutic use. These specialized systems will likely be trained on much deeper, curated datasets that specifically address the “needle in a haystack” rare conditions to ensure they aren’t overlooked in the “comorbidity fog.” However, the transition will be rocky, and we will likely see more legal challenges and calls for robust AI safeguards as the public realizes the limitations of current “nearly free” advice. We will eventually move away from this era of being “guinea pigs” and toward a more structured environment where AI acts as a sophisticated triage tool for human clinicians rather than a replacement. The ultimate goal will be a system that can accurately identify the six core features of IED (recurrent outbursts, disproportionate reactions, poor impulse control, brief episodes, remorse, and lack of a substance-use explanation) without being prodded or biased by the frequency of common conditions.
