
Making Bielik LLM Reason (Better): A Field Report

Adam Trybus, Bartosz Bartnicki, Remigiusz Kinas
Institute of Philosophy, Jagiellonian University
arXiv (2026)
Reasoning Benchmark RL Agent

📝 Paper Summary

Reasoning in Large Language Models | National/Regional LLM Development | Multi-Agent Systems for Mathematics
The paper documents the development of Bielik-R and Bielik-M, enhancing a Polish language model's reasoning through reinforcement learning on verifiable tasks and a multi-agent solver pipeline.
Core Problem
Polish language models lag behind global frontier models in complex reasoning and formal logic, often hallucinating or failing to scale on multi-step tasks.
Why it matters:
  • Poland ranks near the bottom of the EU in AI adoption and is missing out on the transition from predictive AI to autonomous research agents
  • Generic LLMs struggle with language-specific nuance in strict formal analysis, legal reasoning, and advanced mathematics
  • Single-model approaches often hit 'lost-in-the-middle' limits on long reasoning chains, requiring orchestrated multi-component systems
Concrete Example: When solving 'Einstein's Riddles' or formal logic puzzles, the base Bielik 2.3 model hallucinates contradictions when additional variables are introduced and fails to dynamically abandon initial assumptions when instructions change.
Key Novelty
Bielik-M Multi-Agent Solver & Bielik-R Training Pipeline
  • Training a dedicated 'thinking' model (Bielik-R) using a 3-stage pipeline: SFT on distilled traces, DPO alignment, and RL (GRPO/DAPO) on 143k Polish verifiable tasks
  • Deploying a multi-agent system (Bielik-M) for mathematics that decomposes problems into Analytical (method ID), Executor (SymPy), and Summary agents to solve exam-level problems
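To make the RL stage more concrete, here is a minimal sketch of how a verifiable reward can feed a GRPO-style update. It is not the authors' training code. The exact-match reward, the function names, the sample completions, and the use of the population standard deviation are all assumptions made for illustration; the only part taken from GRPO itself is the idea of standardizing each reward against the other completions sampled for the same prompt.

```python
from statistics import mean, pstdev


def verifiable_reward(model_answer: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if the normalized answer matches, else 0.0."""
    return 1.0 if model_answer.strip().lower() == reference.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampling group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt whose reference answer is "tak".
completions = ["Tak", "nie", " tak ", "Nie wiem"]
rewards = [verifiable_reward(c, "tak") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, while a group in which every completion earns the same reward contributes no learning signal.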
Evaluation Highlights
  • Bielik-R achieved 89% accuracy on a specialized First-Order Logic benchmark
  • Bielik-R achieved 80% accuracy on Propositional Calculus tasks
  • An 11B-parameter model (Bielik-M) solves Polish matura exam-level mathematics problems by leveraging agentic decomposition and symbolic verification
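The Analytical, Executor, and Summary decomposition with symbolic verification can be sketched as a three-step pipeline. This is an illustrative toy, not the Bielik-M implementation: the agents here are plain functions rather than LLM calls, every name is hypothetical, and the Executor uses a hand-written exact solver from the standard library in place of SymPy so the example has no dependencies. It assumes integer coefficients and handles only linear and quadratic equations with rational roots.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import isqrt


@dataclass
class Plan:
    method: str
    coeffs: tuple  # (a, b, c) for a*x^2 + b*x + c = 0, integers assumed


def analytical_agent(problem: dict) -> Plan:
    """Stand-in for the LLM step that identifies the solution method."""
    a, b, c = problem["coeffs"]
    return Plan("quadratic_formula" if a != 0 else "linear", (a, b, c))


def executor_agent(plan: Plan) -> list:
    """Exact solver (stdlib substitute for the SymPy-backed Executor)."""
    a, b, c = plan.coeffs
    if plan.method == "linear":
        if b == 0:
            raise ValueError("degenerate equation")
        return [Fraction(-c, b)]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = isqrt(disc)
    if root * root != disc:
        raise ValueError("irrational roots are outside this sketch")
    return sorted({Fraction(-b - root, 2 * a), Fraction(-b + root, 2 * a)})


def verify(plan: Plan, roots: list) -> bool:
    """Verification step: substitute each root back into the equation."""
    a, b, c = plan.coeffs
    return all(a * x * x + b * x + c == 0 for x in roots)


def summary_agent(plan: Plan, roots: list) -> str:
    """Stand-in for the step that writes up the final answer."""
    if not roots:
        return "No real solutions."
    return f"Method: {plan.method}; solutions: " + ", ".join(str(x) for x in roots)


def solve(problem: dict) -> str:
    plan = analytical_agent(problem)
    roots = executor_agent(plan)
    if not verify(plan, roots):
        raise ValueError("executor result failed verification")
    return summary_agent(plan, roots)


print(solve({"coeffs": (1, -5, 6)}))  # x^2 - 5x + 6 = 0
```

Keeping the exact computation and the back-substitution check outside the language model is the design point: the model only has to choose the method and phrase the answer, while arithmetic errors are caught before the summary is written.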
Breakthrough Assessment
5/10
Significant for the regional (Polish) ecosystem and a solid application of modern post-training (RLVR) and agentic patterns, but the authors acknowledge 'lagging behind' frontier models globally.