School of Computer Science, Peking University,
Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University
International Conference on Automated Software Engineering
(2024)
Reasoning · Agent · Factuality
📝 Paper Summary
Smart Contract Security · Automated Program Repair (APR) · LLM for Code
ContractTinker repairs complex smart contract vulnerabilities by decomposing the task via Chain-of-Thought reasoning and grounding the LLM with static analysis (dependency graphs and program slicing).
Core Problem
Existing repair tools rely on predefined patterns that fail on high-level business logic bugs, while standard LLMs suffer from hallucinations and lack context when repairing complex real-world contracts.
Why it matters:
Smart contracts manage significant financial assets, making them high-value targets for attackers
Real-world vulnerabilities often involve complex business logic (e.g., price manipulation) rather than simple low-level bugs like re-entrancy
Manual repair is labor-intensive and requires deep security expertise that many developers lack
Concrete Example: A contract might have a price manipulation vulnerability where a function calculates an asset price insecurely (e.g., from state an attacker can skew within one transaction). Pattern-based tools miss this because the flaw is logic-specific, and a standard LLM might attempt a fix but hallucinate variables not present in the code or miss dependencies in other contracts.
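To make the example concrete, here is a minimal Python stand-in for a Solidity pool contract (names and numbers are illustrative, not from the paper) showing why a spot price read directly from live reserves is manipulable:

```python
# Hypothetical constant-product pool; a toy model, not real contract code.
class ToyPool:
    def __init__(self, reserve_token: float, reserve_usd: float):
        self.reserve_token = reserve_token
        self.reserve_usd = reserve_usd

    def spot_price(self) -> float:
        # VULNERABLE pattern: price derived from current, manipulable reserves.
        return self.reserve_usd / self.reserve_token

    def swap_usd_for_token(self, usd_in: float) -> float:
        # Constant-product invariant: reserve_token * reserve_usd stays fixed.
        k = self.reserve_token * self.reserve_usd
        new_usd = self.reserve_usd + usd_in
        new_token = k / new_usd
        out = self.reserve_token - new_token
        self.reserve_usd, self.reserve_token = new_usd, new_token
        return out

pool = ToyPool(reserve_token=1_000.0, reserve_usd=1_000.0)
before = pool.spot_price()        # 1.0 USD per token
pool.swap_usd_for_token(9_000.0)  # attacker's large swap in one transaction
after = pool.spot_price()         # price inflated ~100x
```

No fixed syntactic pattern distinguishes `spot_price` from a safe getter; recognizing the bug requires reasoning about who controls the reserves, which is exactly what pattern-based repair tools cannot do.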
Key Novelty
Context-Aware Chain-of-Thought for Repair
Decomposes the repair process into steps simulating a security expert: Attack Analysis -> Strategy Generation -> Code Patching
Injects static analysis results (Dependency Graphs, Program Slices) at each reasoning step to ground the LLM's logic in the actual codebase structure
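The interleaving idea can be sketched as a prompt builder that splices static-analysis artifacts into specific reasoning steps; the step labels and prompt wording below are assumptions, not the paper's exact prompts:

```python
# Hedged sketch of interleaved CoT prompting. Static-analysis outputs are
# injected at the step where they are relevant, not dumped once up front.
def build_cot_prompts(report: str, slices: str, dep_graph: str) -> list[str]:
    return [
        # Step 1: pure reasoning over the audit finding.
        f"Q1 (attack analysis): How could this finding be exploited?\n{report}",
        # Step 2: program slices ground the strategy in the actual code.
        f"Q2 (strategy): Using these program slices, propose a fix strategy.\n{slices}",
        # Step 3: the dependency graph constrains the concrete patch.
        f"Q3 (patch): Apply the strategy, respecting these dependencies.\n{dep_graph}",
    ]

prompts = build_cot_prompts("price manipulation", "<slices>", "<graph>")
```

Each prompt's answer would be fed into the next step's context, so the patch step sees both the strategy and the dependency constraints.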
Architecture
Workflow of ContractTinker: From Audit Report/Project input -> Dependency Analysis -> Vulnerability Localization -> CoT Patch Generation -> Refinement.
Evaluation Highlights
Repairs 23 out of 48 (48%) high-risk real-world vulnerabilities with valid patches
Generates patches requiring only minor modifications for an additional 10 vulnerabilities (21%)
Achieves a high success rate in generating correct mitigation strategies before the code-patching step, suggesting the reasoning decomposition itself is effective
Breakthrough Assessment
7/10
Effective integration of static analysis with LLM CoT for a hard domain (smart contracts). Sample size (48) is small but realistic for this domain due to data scarcity.
⚙️ Technical Details
Problem Definition
Setting: Given a smart contract project and a vulnerability audit report, generate a valid code patch that fixes the vulnerability.
Model or implementation: LLM (Validator) + Compilation Checker
Novel Architectural Elements
Contextual Dependency Graph (CDG) construction guided by audit report entities to prune irrelevant code before LLM context
Interleaved Static Analysis and CoT: Static analysis results (slices/graphs) are injected specifically into relevant reasoning steps (e.g., Q2, Q3) rather than just once at the start
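The CDG pruning idea can be sketched as a bounded reachability search from the entities named in the audit report; the graph shape, hop limit, and names below are illustrative assumptions, not the paper's data structures:

```python
# Sketch of entity-guided dependency pruning: keep only code units reachable
# (within a few hops) from entities mentioned in the audit report.
from collections import deque

def prune_cdg(dep_graph: dict, report_entities: set, max_hops: int = 2) -> set:
    """BFS outward from report-mentioned entities, up to max_hops edges."""
    keep = set(report_entities)
    frontier = deque((e, 0) for e in report_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in dep_graph.get(node, []):
            if neighbor not in keep:
                keep.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return keep

# Toy dependency graph: code unit -> units/state it depends on.
graph = {
    "getPrice": ["reserve0", "reserve1"],
    "reserve0": ["sync"],
    "withdraw": ["balanceOf"],  # unrelated to the reported bug
}
context = prune_cdg(graph, {"getPrice"})
# -> {"getPrice", "reserve0", "reserve1", "sync"}; "withdraw" is pruned.
```

Pruning like this is what keeps the LLM's context window focused on code that can actually influence the vulnerability, rather than the whole project.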
Modeling
Base Model: GPT-4 and GPT-3.5
Compute: Not reported in the paper (Inference-only approach using APIs)
Comparison to Prior Work
vs. SGuard/EVMPatch: ContractTinker addresses high-level functional/business logic vulnerabilities, whereas baselines focus on low-level bugs (re-entrancy, overflow)
vs. Standard LLM (Zero-shot): ContractTinker uses CoT + Static Analysis to reduce hallucination and improve context [not cited in paper as specific baseline, but implied comparison]
vs. SCRepair: Does not rely on manually created unit tests for search guidance
Limitations
Cannot repair vulnerabilities whose business logic is extremely complex or whose audit description is too vague
Dependency on the quality of the audit report text
Static analysis (Slither) limitations may propagate to the LLM context
Dataset size (48) is relatively small due to the manual effort of collection