Bielik: Poland’s Open-Source AI Revolution

Bielik is more than just a model. It represents the spirit of collaboration, innovation, and determination within Poland’s AI community. Developed by the SpeakLeash Foundation, Bielik emerged as a testament to what passionate volunteers can achieve through dedication and teamwork. This post highlights what Bielik stands for, why it’s worth exploring, and the hard work that brought it to life.

What is Bielik?

Bielik is Poland’s pioneering large language model (LLM), designed to process and understand Polish with a depth that general-purpose global models often lack. It is built on the transformer architecture and tailored for Polish-language applications that need safe, capable, and precise text processing. Like most modern generative LLMs, it uses a decoder-only design: text is generated one token at a time, and each token is conditioned only on the tokens before it. This keeps the architecture simple and efficient, and makes it a good fit for a wide range of use cases.
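To make "decoder-only" a little more concrete: such models apply a causal mask inside self-attention, so the token at position i can only attend to positions 0 through i. The minimal sketch below builds that mask and applies it to one row of attention scores in plain Python. The sequence length and scores are made-up toy values for illustration; this is not code taken from Bielik or its training stack.

```python
import math


def causal_mask(n: int) -> list[list[bool]]:
    """Return an n x n mask where mask[i][j] is True if position i may attend to j."""
    # In a decoder-only model, token i can see itself and earlier tokens only.
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores: list[float], allowed: list[bool]) -> list[float]:
    """Softmax over attention scores, giving zero weight to masked (future) positions."""
    # Disallowed positions get -inf, so exp() maps them to exactly 0.
    masked = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    mask = causal_mask(4)
    for row in mask:
        print(["x" if ok else "." for ok in row])
    # Toy scores for the token at position 1 looking over a 4-token sequence.
    weights = masked_softmax([0.5, 1.0, 2.0, 3.0], mask[1])
    print([round(w, 3) for w in weights])
```

Note that the "future" positions 2 and 3 receive zero weight even though their raw scores are the highest; that is exactly what lets the model generate text left to right without peeking ahead.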

The Role of the SpeakLeash Foundation

SpeakLeash is a grassroots initiative rooted in open-science principles. Over the past year and a half, it has become a beacon of AI innovation in Poland, achieving milestones such as:

  • Building the largest Polish text dataset, fully compliant with the EU AI Act.
  • Collaborating with research institutions like Clarin, PAN IPI, and NASK PIB.
  • Training large-scale models like Bielik on Cyfronet AGH’s Athena and Helios supercomputers.

What sets SpeakLeash apart is the collective effort of volunteers, including AI enthusiasts, researchers, students, and industry professionals, who contributed their time, skills, and resources pro bono. From data collection to model fine-tuning, every aspect of Bielik’s development reflects their dedication.

From Ambition to Reality

Bielik’s journey began with a simple yet ambitious goal: to create a model that could understand and generate Polish with precision. Here are some key milestones:

  • 1 TB Dataset Collection: In January 2024, SpeakLeash announced the completion of a 1 TB dataset, a major achievement that solidified its position as a leader in Polish AI development.
  • APT3 Models: Early successes with smaller models like APT3-1B laid the groundwork for Bielik. Trained on consumer-grade hardware over 44 days, APT3 demonstrated the potential of community-driven AI projects.
  • Supercomputing Power: Partnering with Cyfronet AGH enabled the team to scale their efforts, utilizing state-of-the-art GPUs and infrastructure to bring Bielik to life.

Bielik-11B-v2: A New Chapter

The second version of Bielik, Bielik-11B-v2, represents a leap forward. With 11 billion parameters and a 32,000-token context window, it’s a model built to tackle even more complex linguistic tasks. Developed through a unique collaboration between SpeakLeash and Cyfronet AGH, Bielik-11B-v2 showcases the power of community-driven innovation and high-performance computing.

Key features of Bielik-11B-v2:

  • Open and Free: Available on Hugging Face for anyone to use.
  • Collaborative Roots: Trained on Cyfronet AGH’s supercomputing infrastructure with the support of a computational grant.
  • Versatile Applications: From chatbots to content generation, Bielik-11B-v2 sets a new standard for Polish AI.
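To give a sense of what "11 billion parameters" means in practice, a rough weights-only memory estimate is simply parameters × bytes per parameter. The sketch below assumes exactly 11 × 10^9 parameters and four common storage precisions (32-bit float, bfloat16, 8-bit and 4-bit quantization). It deliberately ignores activations, the KV cache needed for long contexts, and framework overhead, so real memory requirements will be higher.

```python
# Approximate bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Rough weights-only memory footprint in GiB (excludes activations and KV cache)."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    params = 11e9  # assumption: treat "11B" as exactly 11 billion parameters
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: ~{weight_memory_gib(params, precision):.1f} GiB")
```

At bfloat16 this works out to roughly 20.5 GiB for the weights alone, which is why quantized variants are a common choice when running models of this size on consumer hardware.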

Further Reading and Resources

For those eager to dive deeper into the Bielik project, here are some valuable resources:

This is just the beginning of Bielik’s story. In the next post, we’ll explore how to interact with Bielik through its demo and run it locally on your PC, making advanced AI accessible to everyone!