MHINDR -- a DSM5 based mental health diagnosis and recommendation framework using LLM

📝 Paper Summary

User Modeling · Mental Health NLP · Clinical Decision Support

MHINDR aggregates unstructured social media history into clinical profiles by separating temporal symptom progression from psychological features to generate DSM-5 aligned diagnoses and recommendations.

Core Problem

Social media data is noisy and unstructured, making it difficult to extract the temporal context (duration, frequency) required for accurate clinical diagnosis according to standardized criteria like DSM-5.

Why it matters:

Mental health professionals lack tools to efficiently extract insights from large volumes of patient-generated text
Existing methods often classify posts into disorders without capturing the temporal dynamics (e.g., how long symptoms have persisted) that are crucial for distinguishing transient distress from clinical disorders
Subjective manual judgments and limited data integration hinder personalized treatment planning

Concrete Example: A user might post about 'feeling down' and 'losing sleep' in separate posts months apart. A standard sentiment classifier might simply label both as 'negative', missing the progression. MHINDR aggregates these posts to identify a '6-month duration' of symptoms (temporal), maps them to 'insomnia' and 'depressed mood' (DSM-5), and suggests a possible diagnosis of Major Depressive Disorder.

Key Novelty

Dual-Stream Clinical Profiling (MHINDR)

Separates feature extraction into 'Non-temporal' (symptoms, triggers, tone) and 'Temporal' (duration, frequency, recurrence) streams to explicitly capture the time-dimension required by DSM-5
Aggregates fragmented posts into a cohesive user chronology before feeding them to an LLM for final diagnosis, ensuring the model sees the full progression of the condition

Architecture

Figure: The MHINDR framework workflow, from data ingestion to final recommendation.

Evaluation Highlights

Generated temporal summaries for 92.46% of users by aggregating sparse time cues across posts, even though only 10.65% of individual posts contained explicit temporal references
Recommended Cognitive Behavioral Therapy (CBT) for 92.5% (185/200) of profiled users based on the automated DSM-5 analysis
Categorized 67.4% of analyzed social media entries as 'Severe' and 19.4% as 'Moderate', showing that the framework can assign severity levels without human intervention

Breakthrough Assessment

6/10

Proposes a solid framework for integrating DSM-5 criteria with LLMs, specifically addressing the temporal aspect of diagnosis. However, the evaluation lacks ground-truth validation against human clinicians, limiting claims of diagnostic accuracy.

⚙️ Technical Details

Problem Definition

Setting: Automated mental health profiling and recommendation from user-generated text

Inputs: Sequence of user posts and comments with timestamps

Outputs: Comprehensive mental health summary, DSM-5 diagnosis, and personalized therapeutic/behavioral recommendations

Pipeline Flow

Data Pre-processing: Clean → Filter (LLM)
Feature Extraction: Temporal Extraction || Non-temporal Extraction
Aggregation: User Profiling
Clinical Inference: Diagnosis → Recommendation
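The four stages can be pictured as plain function composition. This is a minimal sketch, not the authors' code: `clean`, `is_relevant`, `extract_temporal`, and `extract_non_temporal` are hypothetical keyword and regex stand-ins for the Llama 3.1 prompts, and the two extraction streams are submitted to a thread pool to mirror the `||` notation. The Diagnosis → Recommendation stage is left out; it would consume the returned per-user profiles.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Post:
    user: str
    timestamp: str  # ISO date from post metadata, e.g. "2023-05-01"
    text: str


@dataclass
class Features:
    temporal: list = field(default_factory=list)
    non_temporal: dict = field(default_factory=dict)


# --- Hypothetical stand-ins; the paper uses Llama 3.1 prompts for these ---

def clean(text: str) -> str:
    # Strip URLs and collapse whitespace.
    text = re.sub(r"https?://\S+", "", text)
    return " ".join(text.split())


def is_relevant(text: str) -> bool:
    # Stand-in for the LLM binary relevance filter: a keyword check.
    return any(w in text.lower() for w in ("sad", "anxious", "sleep", "depress"))


def extract_temporal(text: str) -> list:
    # Stand-in for the temporal stream: capture phrases like "for 6 months".
    return re.findall(r"for \d+ (?:days?|weeks?|months?|years?)", text.lower())


def extract_non_temporal(text: str) -> dict:
    # Stand-in for the non-temporal stream: a crude tone label.
    return {"tone": "negative" if "sad" in text.lower() else "neutral"}


def run_pipeline(posts: list[Post]) -> dict[str, list[tuple[str, Features]]]:
    profiles: dict[str, list[tuple[str, Features]]] = {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        for post in posts:
            text = clean(post.text)
            if not is_relevant(text):
                continue  # dropped before feature extraction
            # Run the temporal and non-temporal streams concurrently.
            t_future = pool.submit(extract_temporal, text)
            n_future = pool.submit(extract_non_temporal, text)
            feats = Features(t_future.result(), n_future.result())
            profiles.setdefault(post.user, []).append((post.timestamp, feats))
    # Aggregation: ISO date strings sort chronologically.
    for entries in profiles.values():
        entries.sort(key=lambda e: e[0])
    return profiles
```

Keeping each stage a separate function makes it easy to swap a keyword stand-in for a real model call without touching the orchestration.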

System Modules

Data Filtering

Filter out content not genuinely relevant to mental health using a binary classification prompt

Model or implementation: Llama 3.1
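A binary relevance filter of this kind usually comes down to a fixed prompt plus strict parsing of the answer. The sketch below assumes a hypothetical prompt wording (the paper's prompt is not published) and an injected `llm` callable, meaning any function that maps a prompt string to a completion string. Dropping ambiguous answers is our choice, not the paper's.

```python
from typing import Callable, Optional

# Hypothetical prompt; the paper's actual wording is not published.
FILTER_PROMPT = (
    "You are screening Reddit posts for a mental health study.\n"
    "Answer with exactly one word, YES or NO.\n"
    "Is the following post genuinely about the author's mental health?\n\n"
    "Post: {post}\nAnswer:"
)


def parse_binary(answer: str) -> Optional[bool]:
    """Map a free-form completion to True/False; None if ambiguous."""
    tokens = answer.strip().split()
    if not tokens:
        return None
    first = tokens[0].strip(".,!:").upper()
    if first == "YES":
        return True
    if first == "NO":
        return False
    return None


def filter_posts(posts: list[str], llm: Callable[[str], str]) -> list[str]:
    """Keep only posts the model labels as mental-health relevant."""
    kept = []
    for post in posts:
        verdict = parse_binary(llm(FILTER_PROMPT.format(post=post)))
        if verdict is True:  # ambiguous answers (None) are dropped
            kept.append(post)
    return kept
```

Injecting `llm` as a parameter keeps the filter testable with a fake model and independent of any particular inference provider.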

Non-Temporal Feature Extractor (Feature Extraction)

Extract static psychological features: Severity, Causal Factors, Language/Tone, DSM-5 Classifications

Model or implementation: Llama 3.1
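Because these four fields feed the later aggregation step, it helps to have the LLM emit JSON and validate it before use. This is a sketch under assumptions: the field names and the JSON contract are hypothetical, and the Mild/Moderate/Severe scale is borrowed from the severity levels reported in the evaluation rather than from a published schema.

```python
import json
from dataclasses import dataclass

# Severity scale assumed from the evaluation's reported levels.
SEVERITY_LEVELS = {"Mild", "Moderate", "Severe"}


@dataclass
class NonTemporalFeatures:
    severity: str
    causal_factors: list[str]
    tone: str
    dsm5_classifications: list[str]


def parse_features(raw: str) -> NonTemporalFeatures:
    """Validate a JSON completion from a (hypothetical) extraction prompt."""
    data = json.loads(raw)
    # Normalize case so "severe" and "SEVERE" both map to "Severe".
    severity = str(data["severity"]).strip().capitalize()
    if severity not in SEVERITY_LEVELS:
        raise ValueError(f"unexpected severity label: {severity!r}")
    return NonTemporalFeatures(
        severity=severity,
        causal_factors=[str(c) for c in data.get("causal_factors", [])],
        tone=str(data.get("tone", "")),
        dsm5_classifications=[str(d) for d in data.get("dsm5_classifications", [])],
    )
```

Rejecting out-of-scale labels early keeps later severity statistics from silently absorbing malformed model output.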

Temporal Feature Extractor (Feature Extraction)

Extract time-related information: explicit timestamps and in-text references to duration or frequency

Model or implementation: Llama 3.1
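The paper performs this step with Llama 3.1. As a swapped-in and deliberately simpler technique, the sketch below uses regular expressions to pull explicit duration and frequency phrases from text and normalize durations to days. It only illustrates the kind of output the temporal stream produces; the patterns and the 30-day month and 365-day year approximations are assumptions.

```python
import re

# Approximate unit lengths in days (assumption: 30-day months, 365-day years).
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
WORD_NUMBERS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4,
                "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# e.g. "for six months", "over the past 2 weeks", "last year"-style phrases with a count
DURATION_RE = re.compile(
    r"\b(?:for|over|past|last)\s+(?:the\s+)?"
    r"(\d+|an?|one|two|three|four|five|six|seven|eight|nine|ten)\s+"
    r"(day|week|month|year)s?\b",
    re.IGNORECASE,
)
# e.g. "every night", "daily", "twice a day"
FREQUENCY_RE = re.compile(
    r"\b(every\s+(?:day|night|week|morning)|daily|nightly|weekly|"
    r"(?:once|twice|\d+\s+times)\s+a\s+(?:day|week|month))\b",
    re.IGNORECASE,
)


def extract_temporal_cues(text: str) -> dict:
    """Return explicit durations (approximate days) and frequency phrases."""
    durations = []
    for count, unit in DURATION_RE.findall(text):
        n = int(count) if count.isdigit() else WORD_NUMBERS[count.lower()]
        durations.append(n * UNIT_DAYS[unit.lower()])
    frequencies = [m.lower() for m in FREQUENCY_RE.findall(text)]
    return {"durations_days": durations, "frequencies": frequencies}
```

A rule-based pass like this misses vague cues ("for a while", "since my divorce"), which is one reason an LLM is a natural fit for the real extractor.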

User Aggregator

Consolidate all posts and extracted features for a single user into a chronological profile

Model or implementation: Rule-based aggregation
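Rule-based aggregation needs no model: group per-post records by user and order them by timestamp. The sketch below assumes each record is a dict with hypothetical `user`, `timestamp` (ISO 8601), and `features` keys, since the paper does not specify its record format.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_users(records: list[dict]) -> dict[str, list[dict]]:
    """Group per-post records by user and sort each group chronologically."""
    by_user: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append(rec)
    return {
        user: sorted(recs, key=lambda r: datetime.fromisoformat(r["timestamp"]))
        for user, recs in by_user.items()
    }


def timeline_span_days(profile: list[dict]) -> int:
    """Days between a user's first and last post (metadata timestamps only)."""
    first = datetime.fromisoformat(profile[0]["timestamp"])
    last = datetime.fromisoformat(profile[-1]["timestamp"])
    return (last - first).days
```

Parsing timestamps instead of sorting raw strings keeps the ordering correct even if records mix date-only and full datetime formats.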

Diagnostician & Recommender

Generate final DSM-5 diagnosis and suggest therapies/behavior changes based on the aggregated profile

Model or implementation: Llama 3.1
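The final stage consumes the aggregated profile as prompt context. A minimal sketch of how such a prompt might be assembled, assuming a hypothetical profile format (a chronologically sorted list of dicts with `timestamp`, `severity`, `symptoms`, and `temporal` keys) and hypothetical instruction wording. It only builds the text and does not call a model.

```python
def build_diagnosis_prompt(user_id: str, profile: list[dict]) -> str:
    """Render a chronological profile into one diagnosis/recommendation prompt."""
    lines = [f"Clinical profile for user {user_id} (oldest entry first):"]
    for entry in profile:
        symptoms = ", ".join(entry.get("symptoms", [])) or "none noted"
        temporal = "; ".join(entry.get("temporal", [])) or "no explicit time cue"
        lines.append(
            f"- {entry['timestamp']}: severity={entry.get('severity', 'unknown')}; "
            f"symptoms: {symptoms}; time cues: {temporal}"
        )
    lines.append("")
    # Hypothetical instruction text asking for diagnosis first, then recommendations.
    lines.append(
        "Using DSM-5 criteria, including duration and frequency requirements, "
        "state the most likely diagnosis with your reasoning, then suggest "
        "therapies and behavioral changes."
    )
    return "\n".join(lines)
```

Listing entries oldest first lets the model read symptom progression in order, which is the point of the aggregation step.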

Novel Architectural Elements

Dual-stream summarization architecture that processes temporal and non-temporal mental health features in parallel before final integration
Explicit integration of LLM-extracted in-text temporal cues with metadata timestamps to construct clinical timelines
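Combining the two sources is simple date arithmetic: a post's metadata timestamp anchors the "now" of the text, so an in-text duration can be subtracted from it to estimate when symptoms began. A minimal sketch, assuming durations are already normalized to days and that the earliest implied onset across a user's posts is a reasonable estimate; the paper does not specify this rule. A minimum-duration check can then be applied, for example the DSM-5 two-week period for a major depressive episode.

```python
from datetime import date, timedelta
from typing import Optional


def estimate_onset(post_date: date, duration_days: int) -> date:
    """Onset implied by a post stating symptoms have lasted `duration_days`."""
    return post_date - timedelta(days=duration_days)


def earliest_onset(cues: list[tuple[date, int]]) -> Optional[date]:
    """Earliest onset across (post_date, duration_days) pairs; None if no cues."""
    onsets = [estimate_onset(d, n) for d, n in cues]
    return min(onsets) if onsets else None


def persisted_at_least(onset: date, last_post: date, min_days: int) -> bool:
    """True if onset-to-last-post spans `min_days` or more.

    This is only a proxy: it does not verify that symptoms were continuous.
    """
    return (last_post - onset).days >= min_days
```

For example, a post dated 2023-06-01 saying "for six months" (180 days) implies an onset around 2022-12-03.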

Modeling

Base Model: Llama 3.1

Compute: Inference only via Groq API (sub-second processing reported)

Comparison to Prior Work

vs. Illuminate: Explicitly extracts temporal features (duration, frequency) to align with DSM-5 duration criteria vs. general summarization
vs. MentSum: Generates structured clinical outputs (Diagnosis, Therapy) vs. general text summaries
vs. Standard Sentiment Analysis [not cited in paper]: Maps extracted features to specific DSM-5 disorders (e.g., Major Depressive Disorder) rather than generic negative sentiment scores

Limitations

Lack of ground-truth validation: Diagnoses were generated by the LLM and analyzed only for their distribution, not verified against human expert assessments.
Dependence on self-reported data: Relies entirely on the honesty and clarity of users' social media posts, which may be exaggerated or incomplete.
High rate of 'No Timeline' data: Only 10.65% of individual posts contained explicit temporal references, so temporal coverage depends heavily on user-level aggregation (successful for 92.46% of users).
Ethical risks: Automated diagnosis from public data raises privacy and safety concerns, especially regarding self-harm detection.

Reproducibility

No replication artifacts are mentioned in the paper: code, prompt templates, and the exact model variant (beyond 'Llama 3.1') are not provided. The dataset was sourced from public Reddit forums (r/mentalhealth, r/depression) using the Pushshift API.

📊 Experiments & Results

Evaluation Setup

Descriptive analysis of automated diagnoses generated for active Reddit users

Benchmarks:

Reddit Mental Health Dataset (Custom) (Clinical Profiling & Diagnosis) [New]

Metrics:

Prevalence of Severity Levels (Mild/Moderate/Severe)
Distribution of DSM-5 Disorders
Therapy Recommendation Frequency
Temporal Extraction Coverage
Statistical methodology: Not explicitly reported in the paper
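All four metrics are descriptive frequencies, so they reduce to counting labels and dividing by the number of users or entries. A minimal sketch, assuming one severity or disorder label per unit and a set of recommended therapies per user (since a user may receive several, therapy rates need not sum to 100%); the function names are hypothetical.

```python
from collections import Counter


def prevalence(labels: list[str]) -> dict[str, float]:
    """Percentage of units (users or entries) carrying each label."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 2) for label, n in counts.most_common()}


def recommendation_rate(recs: list[set[str]], therapy: str) -> float:
    """Percentage of users whose recommendation set includes `therapy`."""
    if not recs:
        return 0.0
    return round(100 * sum(therapy in r for r in recs) / len(recs), 2)
```

As a sanity check on the reported figure, 185 CBT recommendations among 200 users gives 100 × 185 / 200 = 92.5%.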

Main Takeaways

The framework successfully generated temporal summaries for 92.46% of users, overcoming the sparsity of time-related information in individual posts (only 10.65% had explicit cues).
Automated analysis classified a majority of the dataset (67.4%) as 'Severe', which, if the labels hold up, suggests that these forums are primarily used by individuals in significant distress.
Cognitive Behavioral Therapy (CBT) was the most frequently recommended intervention (92.5% of users), likely reflecting its broad applicability across conditions.
The system identified Major Depressive Disorder and Borderline Personality Disorder as the most frequent conditions, aligning with common themes in the source subreddits (r/mentalhealth, r/depression).
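The gap between 10.65% post-level and 92.46% user-level temporal coverage is easier to see once coverage is computed at both levels. The sketch below assumes a user counts as covered if any of their posts carries a temporal cue; that rule is our reading, not a definition given in the paper, and the data in any usage is toy data.

```python
def coverage(posts_by_user: dict[str, list[bool]]) -> tuple[float, float]:
    """Return (post-level %, user-level %) from per-post has-cue flags.

    Assumption: a user is covered if at least one of their posts has a cue.
    """
    flags = [f for posts in posts_by_user.values() for f in posts]
    if not flags:
        return (0.0, 0.0)
    post_level = 100 * sum(flags) / len(flags)
    user_level = 100 * sum(any(p) for p in posts_by_user.values()) / len(posts_by_user)
    return round(post_level, 2), round(user_level, 2)
```

With four users of ten posts each, where three users have a single cue-bearing post, post-level coverage is only 7.5% while user-level coverage is 75%.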

📚 Prerequisite Knowledge

Prerequisites

Understanding of Large Language Models (LLMs) and prompting
Familiarity with DSM-5 diagnostic criteria
Basics of Natural Language Processing (NLP) for social media

Key Terms

DSM-5: Diagnostic and Statistical Manual of Mental Disorders, 5th Edition, published by the American Psychiatric Association; the standard classification of mental disorders used by mental health professionals in the U.S.

CBT: Cognitive Behavioral Therapy, a psychosocial intervention that aims to improve mental health by changing unhelpful cognitive distortions and behaviors

DBT: Dialectical Behavior Therapy, a form of cognitive behavioral therapy that balances acceptance and change by teaching skills in mindfulness, distress tolerance, emotion regulation, and interpersonal effectiveness; originally developed for Borderline Personality Disorder

Non-temporal features: Static psychological aspects extracted from text, such as specific symptoms (e.g., anxiety), triggers (e.g., job loss), and emotional tone

Temporal features: Time-related aspects of mental health, including the duration of symptoms, frequency of episodes, and recurrence patterns, which are essential for clinical diagnosis

Prompt engineering: The practice of designing and structuring the input text (prompt) given to a generative AI model so that it produces the desired output