Knowledge Tracing: The task of modeling a student's changing knowledge state over time to predict future performance on exercises.
XGBoost: eXtreme Gradient Boosting—a scalable machine learning system for tree boosting that uses a regularized objective function to prevent overfitting.
DKT: Deep Knowledge Tracing, a method that uses Recurrent Neural Networks (RNNs), typically LSTMs, to model a student's knowledge state as it evolves over a sequence of exercise interactions.
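AUC: Area Under the Receiver Operating Characteristic Curve, a performance metric for binary classification equal to the probability that the model scores a randomly chosen positive example above a randomly chosen negative one (0.5 is chance level, 1.0 is perfect separation).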
AutoInt: Automatic Feature Interaction—a deep learning model that automatically learns high-order feature interactions using a multi-head self-attentive neural network.
FM: Factorization Machines, a supervised learning algorithm that models pairwise interactions between variables using factorized parameters, which lets it estimate interactions reliably even on sparse data.
DeepFM: A model combining Factorization Machines for low-order feature interactions and deep neural networks for high-order interactions.
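Logloss: Logarithmic Loss (binary cross-entropy), the negative log-likelihood of the true labels under the predicted probabilities; used as a loss function in binary classification, it heavily penalizes confident but wrong predictions.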
BKT: Bayesian Knowledge Tracing—a classic statistical model using Hidden Markov Models to track binary knowledge states (learned/not learned).