Report: AI speaks Polish. The ecosystem of open language models in Poland

“AI speaks Polish. The ecosystem of open language models in Poland” was researched and written by Alek Tarkowski (Open Future), Kuba Piwowar (Centrum Cyfrowe), and Michał Owczarek (SWPS University).

This report serves as a case study of Poland’s ecosystem for creating open AI models tailored to the Polish language. These small language models are developed as open-source solutions to fill gaps left by large commercial models, which often fail to cater to Polish language and cultural nuances. These efforts demonstrate how effective alternatives to dominant models can be created.

Large commercial models are typically trained on vast datasets with significant computational power, driven by a vision of continuous technological scaling. Creating such large-scale language models requires enormous financial resources that only monopolistic tech giants can afford. However, alternatives are emerging. A new paradigm involving smaller language models and the availability of open foundation models is enabling the development of additional language models, particularly those addressing linguistic gaps in generative AI.

This report focuses on two key projects: the creation of the SpeakLeash language corpus and the Bielik model developed from it, and the work of the PLLuM (Polish Large Language Model) consortium, which aims to produce a large language model tailored to the Polish language.

Based on interviews with the creators of these Polish models, the authors examined the development process, challenges faced, and lessons learned from their achievements.

About the Report

Open Future is a European think tank exploring new approaches to building an open internet that maximizes social benefits from shared data, knowledge, and culture.

The Centrum Cyfrowe Foundation is a “think-and-do” tank focused on the societal dimension of technology. Its work revolves around the digital aspects of public affairs in Poland, particularly analyzing social, cultural, and economic changes linked to digital technology and supporting knowledge development in this field.

Authors

Alek Tarkowski is Strategy Director at Open Future, with a Ph.D. in sociology from the Polish Academy of Sciences. He has 15 years of experience in advocacy and building social movements for public-interest technologies. His research interests include public AI policies and data governance.

Kuba Piwowar is a sociologist and cultural scholar with a Ph.D. in cultural studies. He is a senior fellow at Humanity in Action, working on projects related to data usage and activism. He also serves as an assistant professor at SWPS University’s Department of Culture and Media in Warsaw. From 2008 to 2024, he worked at Google as an analyst and later as a strategic advisor to key business partners.

Michał Owczarek is a Ph.D. candidate in cultural studies at SWPS University, researching the history of media in Poland. He earned a master’s degree in digital sociology, focusing on conflicts between states and platforms over digital infrastructure. He is also interested in urban studies, particularly the impact of digital technologies on urban environments.

The authors thank the interviewees who contributed insights on the development of Polish LLMs: Paweł Cyrta, Adrian Gwoździej, Jan Kocoń, Sebastian Kondracki, Marek Kozłowski, Jacek Nagłowski, and Maciej Piasecki.

 

A Polish version of the report is available here.

The report is available under a Creative Commons Attribution (CC-BY) license.
