Using Contrastive Learning to Improve Two-Way Reasoning in Large Language Models: The Obfuscation Task as a Case Study

SL Nikiema, J Samhi, MB Moumoula, AE Djiré…
University of Luxembourg
arXiv, September 2025
Reasoning Benchmark

📝 Paper Summary

Code Understanding · LLM Evaluation · Fine-tuning Techniques
The paper proposes bidirectional reasoning (performing both obfuscation and deobfuscation) as a test for true understanding and introduces Contrastive Fine-Tuning to overcome the 'cognitive specialization' that limits current models to unidirectional pattern matching.
Core Problem
Standard fine-tuning creates 'cognitive specialization,' where models learn to perform a forward task (like obfuscation) but lose or fail to develop the ability to reverse it (deobfuscation), indicating mere pattern matching rather than genuine semantic understanding.
Why it matters:
  • Models deployed in high-stakes software engineering require genuine semantic understanding, not just surface-level pattern replication, to be reliable and robust
  • Current evaluation benchmarks often fail to distinguish between sophisticated memorization of training patterns and true comprehension of underlying logic
  • Adversarial robustness studies show code models are brittle to simple semantic-preserving transformations, limiting their generalizability
Concrete Example: A model fine-tuned to obfuscate variable names (changing 'userIndex' to 'i') achieves 81% success, but when asked to reverse this process (deobfuscate 'i' back to 'userIndex' or a meaningful name), it fails completely (~0% success), even though the transformation is logically reversible.
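The transformation in this example can be made concrete with a small sketch. The rename table and code snippet below are illustrative assumptions; the paper's actual obfuscation is performed by the model on full programs, not by a lookup table. The point is that the mapping is logically invertible, so a model that understands it should be able to run it in either direction.

```python
import re

# Hypothetical rename map: meaningful identifier -> opaque identifier.
RENAME_MAP = {"userIndex": "i", "totalSum": "t"}
# The reverse mapping is well-defined, so deobfuscation is logically possible.
INVERSE_MAP = {v: k for k, v in RENAME_MAP.items()}

def rename(code: str, mapping: dict) -> str:
    # Replace whole-word identifier occurrences only, so 'i' inside
    # 'in' or 'range' is left untouched.
    for old, new in mapping.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

src = "for userIndex in range(10): totalSum += userIndex"
obf = rename(src, RENAME_MAP)           # forward: obfuscate
assert rename(obf, INVERSE_MAP) == src  # reverse: deobfuscation recovers src
```

A standard fine-tuned model masters only the forward call of this pair; the paper's finding is that it cannot perform the reverse one, even though the inverse mapping exists.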
Key Novelty
Bidirectional Reasoning Hypothesis & Contrastive Fine-Tuning
  • Proposes that true understanding implies reversibility: if a model understands a transformation (like obfuscation), it should naturally be able to reverse it (deobfuscate) without explicit training
  • Identifies 'cognitive specialization' as a pathology where models optimize for one direction at the expense of the other
  • Adapts Contrastive Fine-Tuning (CFT) from vision learning to code, using triplets (original, obfuscated, and negative examples) to force the model to learn deep semantic representations rather than surface patterns
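The triplet objective behind CFT can be sketched as follows. This is a minimal illustration of a standard triplet margin loss, not the paper's implementation: the toy 3-d embeddings, the `embed` placeholder, and the margin value are assumptions made for readability. In a real setup the embeddings would come from the model's hidden states and the loss would be backpropagated during fine-tuning.

```python
def embed(label: str) -> list[float]:
    # Placeholder encoder: maps each triplet member to a toy embedding.
    # Semantically equivalent code (original vs. obfuscated) sits nearby;
    # the negative (an unrelated program) sits far away.
    table = {
        "original":   [1.0, 0.0, 0.0],
        "obfuscated": [0.9, 0.1, 0.0],
        "negative":   [0.0, 1.0, 0.0],
    }
    return table[label]

def dist(a: list[float], b: list[float]) -> float:
    # Euclidean distance between two embeddings.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor: str, positive: str, negative: str,
                 margin: float = 0.5) -> float:
    # Pull (original, obfuscated) together, push the negative apart
    # by at least `margin`; loss is zero once the triplet is separated.
    a, p, n = embed(anchor), embed(positive), embed(negative)
    return max(dist(a, p) - dist(a, n) + margin, 0.0)

loss = triplet_loss("original", "obfuscated", "negative")
```

Because the loss depends only on relative distances between semantically equivalent and inequivalent programs, it rewards representations of meaning rather than of surface token patterns, which is the mechanism the paper credits for unlocking the reverse direction.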
Evaluation Highlights
  • Standard fine-tuning achieves ~0% success on reverse (deobfuscation) tasks despite high forward performance (>80% for some models), confirming cognitive specialization
  • Contrastive Fine-Tuning (CFT) enables 39-52% reverse (deobfuscation) performance across multiple models without explicit reverse training, compared to ~0% for standard fine-tuning
  • CFT maintains forward task capabilities while unlocking bidirectional reasoning, effectively bridging the gap between pattern matching and semantic understanding
Breakthrough Assessment
8/10
Identifies a fundamental 'cognitive specialization' failure mode in LLMs and provides a successful training fix (CFT) that unlocks zero-shot reversibility. Significant for understanding vs. memorization debates.