CORE-Acu: Structured Reasoning Traces and Knowledge Graph Safety Verification for Acupuncture Clinical Decision Support

📝 Paper Summary

Clinical Decision Support (CDS), Traditional Chinese Medicine (TCM) AI, Neuro-symbolic AI

CORE-Acu combines structured chain-of-thought fine-tuning with a knowledge-graph-based symbolic veto mechanism to enforce hard safety constraints and suppress safety-critical hallucinations in acupuncture clinical decision support.

Core Problem

General LLMs act as black boxes that bypass critical diagnostic logic and often hallucinate safety-critical entities (like acupoints) or violate contraindications (like pregnancy restrictions), posing severe patient risks.

Why it matters:

Acupuncture involves invasive physical interventions where minor terminology errors (e.g., confusing acupoint names) can lead to medical malpractice
Standard LLMs optimized for next-token prediction lack the deterministic constraints needed to adhere to strict medical 'red lines' (contraindications)
Existing TCM models often learn direct symptom-to-prescription mappings, missing the auditable intermediate reasoning required for physician trust

Concrete Example: In a pregnancy-related case, a standard model might prescribe Hegu (LI4)—an acupoint strictly contraindicated because it promotes uterine activity—increasing the risk of adverse outcomes.

Key Novelty

Neuro-Symbolic Governance with Structured Reasoning

Constructs a Structured Chain-of-Thought (S-CoT) dataset that forces the model to output a complete causal chain (Diagnosis → Pathology → Principle → Acupoints) rather than just a prescription
Implements a 'Generate-Verify-Revise' loop where a symbolic Knowledge Graph checks outputs against deterministic safety rules (e.g., pregnancy contraindications) and forces revisions if violations occur
Uses Lexicon-Matched Entity-Reweighted Loss (LMERL) to amplify gradient signals for rare but critical acupoint names, preventing them from being drowned out by common words during training

Architecture

Figure: The overall framework of CORE-Acu, illustrating the construction, adaptation, and verification lifecycle.

Evaluation Highlights

Achieved 0/1,000 observed safety violations (0% rate) on held-out cases, compared to an 8.5% violation rate for GPT-4o
Constructed 'Acu-Reasoning', the first large-scale acupuncture S-CoT dataset with 42,512 samples containing explicit causal chains
Built a specialized TCM safety Knowledge Graph with 4,628 nodes and over 1,200 explicit constraint edges (e.g., ProhibitedFor)

Breakthrough Assessment

8/10

Strong neuro-symbolic application that directly targets the critical 'safety boundary' problem in generative medical AI, reporting 0 observed violations on 1,000 held-out cases where GPT-4o violates constraints in 8.5% of cases.

⚙️ Technical Details

Problem Definition

Setting: Controlled generation of acupuncture prescriptions subject to hard safety constraints

Inputs: Clinical patient record (symptoms, complaints)

Outputs: Structured reasoning trace (Diagnosis, Pathology, Principle) and safe Acupoint prescription
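A minimal sketch of what a schema-constrained output could look like: the four-stage trace is parsed into a typed record, and traces that skip a stage or reorder the causal chain are rejected. The `Stage: value` line format, the comma-separated acupoint list, and the names `SCoTTrace` and `parse_trace` are hypothetical; the paper's actual serialization is not described in this summary, and the clinical values are illustrative only.

```python
from dataclasses import dataclass

# The four stages of the causal chain, in the required order.
STAGES = ("Diagnosis", "Pathology", "Principle", "Acupoints")

@dataclass
class SCoTTrace:
    diagnosis: str
    pathology: str
    principle: str
    acupoints: list[str]

def parse_trace(text: str) -> SCoTTrace:
    """Parse 'Stage: value' lines (hypothetical format) into an SCoTTrace."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in STAGES:
            fields.setdefault(key.strip(), value.strip())  # keep first occurrence
    missing = [s for s in STAGES if not fields.get(s)]
    if missing:
        raise ValueError(f"incomplete reasoning chain, missing: {missing}")
    if tuple(fields) != STAGES:  # dicts preserve insertion order
        raise ValueError(f"stages out of order: {list(fields)}")
    points = [p.strip() for p in fields["Acupoints"].split(",") if p.strip()]
    return SCoTTrace(fields["Diagnosis"], fields["Pathology"],
                     fields["Principle"], points)

# Illustrative trace (clinical values are examples only).
trace = parse_trace("""
Diagnosis: Insomnia
Pathology: Heart-spleen deficiency
Principle: Tonify heart and spleen, calm the mind
Acupoints: HT7, SP6
""")
print(trace.acupoints)  # ['HT7', 'SP6']
```

Rejecting incomplete or reordered traces at parse time is one way to keep the intermediate reasoning auditable rather than optional.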

Pipeline Flow

Structured Reasoning Generation (Model predicts Diagnosis → Pathology → Principle → Acupoints)
Symbolic Safety Verification (KG checks for contraindications)
Iterative Revision (If unsafe, feedback is injected for re-generation)
Conservative Fail-Safe (If max iterations reached, suppress output)
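The pipeline can be sketched as a bounded retry loop. This is an illustration under assumptions, not the authors' implementation: the generator and verifier are passed in as plain callables, the feedback string format is invented, and `max_iters=3` is a placeholder because the paper's iteration limit is not reported here. The key property is that a candidate is returned only after it passes verification; otherwise the loop returns `None`, mirroring the conservative fail-safe.

```python
from typing import Callable, List, Optional

# A generator maps a prompt to proposed acupoints; a verifier maps acupoints
# to a list of violation messages (empty means the candidate is safe).
Generator = Callable[[str], List[str]]
Verifier = Callable[[List[str]], List[str]]

def generate_verify_revise(record: str, generate: Generator, verify: Verifier,
                           max_iters: int = 3) -> Optional[List[str]]:
    """Return a candidate that passes verification, or None (fail-safe)."""
    prompt = record
    for _ in range(max_iters):
        candidate = generate(prompt)
        violations = verify(candidate)
        if not violations:
            return candidate  # only verified candidates are ever released
        # Inject the specific error evidence back into the prompt and retry.
        prompt = (f"{record}\n[Safety feedback] Revise the prescription; "
                  f"violations: {'; '.join(violations)}")
    return None  # conservative fail-safe: suppress output after max_iters

# Toy usage with stub components (illustrative only, not a real model).
PROHIBITED_IN_PREGNANCY = {"LI4"}

def stub_generate(prompt: str) -> List[str]:
    # Pretends to be a model that corrects itself once it sees feedback.
    return ["PC6", "ST36"] if "[Safety feedback]" in prompt else ["LI4", "ST36"]

def stub_verify(points: List[str]) -> List[str]:
    return [f"{p} ProhibitedFor Pregnancy" for p in points
            if p in PROHIBITED_IN_PREGNANCY]

print(generate_verify_revise("Pregnant patient, nausea", stub_generate, stub_verify))
# ['PC6', 'ST36']
```

Because the only `return candidate` sits behind the verifier check, any output that escapes the loop satisfies every rule the verifier encodes; the neural component can propose but never release an unverified prescription.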

System Modules

S-CoT Generator

Generate the full clinical reasoning chain and prescription

Model or implementation: Fine-tuned LLM (base model architecture not explicitly named in paper)

Safety Governance Module

Detect safety violations by intersecting generated entities with the Knowledge Graph constraints

Model or implementation: Symbolic Knowledge Graph (4,628 nodes, 12,500 edges in total)
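The intersection check can be illustrated with a toy triple store. In this hypothetical sketch, `ProhibitedFor` edges are indexed by condition, and the veto intersects the prescribed acupoints with the prohibited set for each patient condition, returning the offending pairs as evidence that a feedback prompt could quote. Only the `ProhibitedFor` relation name comes from the paper; the `IndicatedFor` edge, the function names, and the three triples are illustrative assumptions (LI4 matches the pregnancy example above; SP6 is another point traditionally avoided in pregnancy).

```python
from collections import defaultdict

# Hypothetical toy KG as (head, relation, tail) triples.
TRIPLES = [
    ("LI4", "ProhibitedFor", "Pregnancy"),
    ("SP6", "ProhibitedFor", "Pregnancy"),
    ("PC6", "IndicatedFor", "Nausea"),
]

def build_prohibition_index(triples):
    """Map each condition to the set of acupoints prohibited for it."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        if relation == "ProhibitedFor":
            index[tail].add(head)
    return index

def symbolic_veto(acupoints, conditions, index):
    """Intersect the prescription with each condition's prohibited set.

    Returns sorted (acupoint, condition) evidence pairs; [] means no violation.
    """
    evidence = []
    for condition in sorted(conditions):
        banned = index.get(condition, set())
        for point in sorted(set(acupoints) & banned):
            evidence.append((point, condition))
    return evidence

index = build_prohibition_index(TRIPLES)
print(symbolic_veto(["LI4", "ST36", "SP6"], {"Pregnancy"}, index))
# [('LI4', 'Pregnancy'), ('SP6', 'Pregnancy')]
```

The check is a pure set operation, so it is deterministic and cannot be talked out of a rule by the generator; its coverage, however, is only as complete as the edges in the graph.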

Revisor / Feedback Loop

Inject corrective prompts containing specific error evidence back into the model

Model or implementation: Rule-based feedback injection

Novel Architectural Elements

Symbolic Veto Mechanism: A deterministic logical intersection operation that overrides probabilistic neural generation
Generate-Verify-Revise closed loop: Dynamic inference cycle that allows the model to self-correct based on external symbolic feedback

Modeling

Base Model: Not explicitly reported in the paper, which refers only generically to a 'fine-tuned model M_θ'

Training Method: Schema-Constrained Fine-Tuning with LoRA

Objective Functions:

Purpose: Minimize prediction error while heavily penalizing mistakes on safety-critical entities.

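Formally:

$$\mathcal{L}_{\mathrm{LMERL}}(\theta) = -\frac{1}{Z(y)} \sum_{t} \omega(y_t)\,\log P_\theta(y_t \mid y_{<t}, x)$$

where $\omega(y_t)$ boosts the weights of tokens matched to domain terms and $Z(y)$ is a normalization term.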
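A minimal pure-Python sketch of the loss, assuming ω(y_t) = 1 + α for tokens covered by a lexicon match and 1 otherwise, and Z(y) = Σ_t ω(y_t); the summary does not give the exact form of ω or Z, so both are assumptions. The reported α = 1.5 is used as the default. Lexicon matching here is a greedy longest match over concatenated subword tokens, so a multi-token name such as "Hegu" split into `He` + `gu` is weighted in full. `lexicon_mask` and `lmerl_loss` are hypothetical names.

```python
def lexicon_mask(tokens, lexicon, max_span=4):
    """Flag token positions covered by a lexicon entry (greedy longest match)."""
    mask = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        span = 0
        # Try the longest concatenation of subword tokens first.
        for n in range(min(max_span, len(tokens) - i), 0, -1):
            if "".join(tokens[i:i + n]) in lexicon:
                span = n
                break
        if span:
            mask[i:i + span] = [True] * span
            i += span
        else:
            i += 1
    return mask

def lmerl_loss(token_logprobs, mask, alpha=1.5):
    """-(1/Z) * sum_t w_t * log p_t, assuming w_t = 1 + alpha on lexicon
    tokens (else 1) and Z = sum_t w_t."""
    weights = [1.0 + alpha if m else 1.0 for m in mask]
    z = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, token_logprobs)) / z

tokens = ["Needle", "He", "gu", "and", "Zu", "san", "li"]
mask = lexicon_mask(tokens, {"Hegu", "Zusanli"})
print(mask)  # [False, True, True, False, True, True, True]
logps = [-1.0, -2.0, -2.0, -1.0, -2.0, -2.0, -2.0]  # acupoint tokens predicted worse
print(round(lmerl_loss(logps, mask), 3))   # 1.862
print(round(-sum(logps) / len(logps), 3))  # 1.714 (unweighted mean NLL)
```

In the toy example the weighted loss exceeds the unweighted mean because the poorly predicted acupoint tokens count 2.5 times as much, so their per-token gradients are scaled up relative to common words.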

Adaptation: LoRA (Low-Rank Adaptation)
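LoRA itself is standard. As a generic illustration (not the paper's configuration, whose rank and scaling are not reported), a LoRA-adapted linear layer keeps the pretrained weight W0 frozen and adds a trainable low-rank update scaled by lora_alpha / r; B is zero-initialized so the adapted layer initially reproduces the base layer exactly. The rank, scaling, and the 4096 x 4096 layer size in the parameter-count example are hypothetical.

```python
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """h = W0 x + (lora_alpha / r) * B (A x); W0 is frozen, A and B train."""

    def __init__(self, w0, r=2, lora_alpha=4.0, seed=0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen pretrained weight, shape d_out x d_in
        # A (r x d_in) gets small random values; B (d_out x r) starts at zero,
        # so the adapted layer initially equals the base layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = lora_alpha / r

    def forward(self, x):
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

def lora_param_fraction(d_out, d_in, r):
    """Trainable LoRA parameters r*(d_in + d_out) relative to d_out*d_in."""
    return r * (d_in + d_out) / (d_out * d_in)

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
print(layer.forward([1.0, 1.0]))           # [3.0, 7.0]: B = 0, so output == W0 x
print(lora_param_fraction(4096, 4096, 8))  # 0.00390625
```

With r = 8 on a 4096 x 4096 projection, the adapter trains r(d_in + d_out) = 65,536 parameters, 1/256 (about 0.39%) of the 16,777,216 in the frozen matrix.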

Training Data:

Acu-Reasoning Dataset: 42,512 samples
Constructed by prompting GPT-5.2 (a foundation model) to infer the missing intermediate reasoning steps, followed by expert verification

Key Hyperparameters:

alpha: 1.5 (domain bias intensity in LMERL)

Compute: Not reported in the paper

Comparison to Prior Work

vs. HuatuoGPT/Zhongjing: CORE-Acu focuses on procedural acupuncture safety and structured reasoning chains rather than general dialogue or herbal medicine
vs. Standard RAG: Uses KG for hard constraint verification (veto) rather than just evidence retrieval
vs. Generic LLMs (GPT-4o): Implements a neuro-symbolic loop that enforces KG-encoded constraints (0% observed violations), whereas generic models offer only probabilistic safety [GPT-4o serves as the evaluation comparison rather than as cited prior work]

Limitations

Reliance on the completeness of the constructed Knowledge Graph; missing rules in the KG cannot be enforced
Requires high-quality structured data for the S-CoT training, which is labor-intensive to construct
Evaluation focused on 1,000 held-out cases; broader clinical generalization remains to be tested

Reproducibility

Acu-Reasoning dataset (42,512 samples) and TCM Safety Knowledge Graph (4,628 nodes) are described in detail. Code URL is not provided in the text. Base model architecture is not specified.

📊 Experiments & Results

Evaluation Setup

Acupuncture prescription generation for TCM cases

Benchmarks:

Held-out Clinical Cases (Clinical Decision Support generation) [New]

Metrics:

Safety Violation Rate
Entity Fidelity (implied by LMERL discussion)
Statistical methodology: 95% Confidence Interval (CI) reported for safety violation rate

Key Results

Benchmark	Metric	Baseline (GPT-4o)	This Paper (CORE-Acu)	Δ
Held-out Clinical Cases (n=1,000)	Safety Violation Rate	8.5%	0% (0/1,000)	-8.5 pp
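Because 0/1,000 is an observed count rather than a proof of zero risk, it helps to see what interval it implies. The summary does not state which 95% CI method the paper used, so this sketch shows two common choices as assumptions: the Wilson score interval and the exact (Clopper-Pearson) upper bound for zero events. Applying it to the baseline additionally assumes GPT-4o was scored on the same 1,000 cases (85 violations).

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 -> 95%)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def zero_event_upper(n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided upper bound when 0 of n events occur."""
    return 1 - ((1 - conf) / 2) ** (1 / n)

lo, hi = wilson_interval(0, 1000)
print(f"CORE-Acu 0/1000, Wilson upper: {hi:.2%}")         # 0.38%
print(f"CORE-Acu 0/1000, exact upper: {zero_event_upper(1000):.2%}")  # 0.37%
blo, bhi = wilson_interval(85, 1000)  # assumes baseline n = 1,000
print(f"Baseline 85/1000, Wilson: {blo:.2%} to {bhi:.2%}")
```

For 0/1,000, both methods put the 95% upper bound near 0.37 to 0.38%, so the result is consistent with a true violation rate below roughly 0.4% (the "rule of three", 3/n = 0.3%, gives a similar one-sided figure). Under the same-n assumption, the Wilson interval for the baseline spans roughly 6.9% to 10.4%, so the two intervals do not overlap.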

Main Takeaways

CORE-Acu produced no observed safety violations (0/1,000), compared with an 8.5% violation rate for a strong general-purpose LLM (GPT-4o), supporting the neuro-symbolic approach.
The 'Generate-Verify-Revise' loop turns probabilistic outputs into recommendations that satisfy every constraint encoded in the KG; outputs that still fail after the maximum number of revisions are suppressed rather than released.
The LMERL loss targets the frequency-importance mismatch by up-weighting rare but critical acupoint names during training.

📚 Prerequisite Knowledge

Prerequisites

Traditional Chinese Medicine (TCM) diagnostic workflow
Knowledge Graphs (KG) and symbolic reasoning
Large Language Model fine-tuning (LoRA)

Key Terms

S-CoT: Structured Chain-of-Thought—a prompting or training method that enforces a specific step-by-step reasoning format (e.g., Diagnosis → Pathology → Principle → Treatment)

TCM: Traditional Chinese Medicine—a medical system emphasizing syndrome differentiation (bianzheng) and individualized treatment

KG: Knowledge Graph—a structured representation of data with nodes (entities) and edges (relations), used here to store hard medical rules

LMERL: Lexicon-Matched Entity-Reweighted Loss—a custom training loss that assigns higher weight to domain-specific terms (like acupoint names) to improve precision

Symbolic Veto Mechanism: A rule-based system that checks neural model outputs against a Knowledge Graph and rejects/blocks unsafe generations

LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that freezes the pretrained weights and trains only small low-rank update matrices added to selected layers

Hegu (LI4): An acupoint traditionally held to promote uterine contractions and labor, which is why it is treated as strictly contraindicated during pregnancy

Neuro-symbolic: AI systems combining neural networks (learning from data) with symbolic logic (rules and knowledge graphs)

Syndrome Differentiation: The TCM process of analyzing symptoms to identify the underlying pattern of disharmony (Diagnosis)