Unlocking Europe’s Digital Sovereignty: The Rise of Open Source LLMs
Large language models (LLMs) have recently taken center stage in Europe’s digital sovereignty agenda, especially with the launch of the new OpenEuroLLM initiative aimed at developing open-source LLMs for all official European Union languages. This ambitious project is set to encompass the current 24 EU languages, along with those of countries negotiating EU membership, such as Albania.
Understanding OpenEuroLLM: A Collaborative Effort
The OpenEuroLLM project is a collaboration among approximately 20 organizations, co-led by Jan Hajič, a computational linguist from Charles University in Prague, and Peter Sarlin, CEO of Silo AI, a Finnish AI lab recently acquired by AMD for $665 million. This initiative is part of a broader European effort to prioritize digital sovereignty and keep mission-critical infrastructure closer to home.
Significance of Digital Sovereignty in Europe
Europe’s growing focus on digital sovereignty has prompted major cloud companies to invest in local infrastructure so that EU data remains within Europe. Notably, OpenAI has launched a new service that allows customers to process and store data in Europe. Furthermore, the EU has signed an $11 billion deal to establish a sovereign satellite constellation to compete with Elon Musk’s Starlink.
Funding and Collaboration Challenges
OpenEuroLLM has a budget of €37.4 million, with around €20 million sourced from the EU’s Digital Europe Programme. This is a small sum compared with the investments made by large corporate AI players. The project’s partners include EuroHPC supercomputer centers in several EU countries, which collectively have a budget of approximately €7 billion for related tasks.
Key partners in OpenEuroLLM include:
- EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands.
- AI companies such as Aleph Alpha, Ellamind, Prompsit Language Engineering, and LightOn.
However, the large number of participating entities raises concerns about the project’s focus and efficiency. Anastasia Stasenko, co-founder of Pleias, emphasized that smaller, focused teams have proven successful in the AI sector, citing examples like Mistral AI.
Progress and Future Goals of OpenEuroLLM
The OpenEuroLLM project, while starting anew, builds on previous efforts like the High Performance Language Technologies (HPLT) project, coordinated by Hajič since 2022. This predecessor project aimed to develop reusable datasets and workflows using high-performance computing.
Hajič anticipates the first versions of the models will be released by mid-2026, with the final iterations expected by 2028. So far, however, the project’s only public output is a bare-bones GitHub profile.
Participating Organizations
The OpenEuroLLM initiative brings together diverse organizations from several countries, including:
- Czechia
- The Netherlands
- Germany
- Sweden
- Finland
- Norway
Notably missing from the collaboration is French AI unicorn Mistral, which has been approached for participation but has not engaged in discussions.
Core Objectives and Model Development
The primary aim of OpenEuroLLM is to create foundation models for transparent AI that uphold the linguistic and cultural diversity of all EU languages. The deliverables will likely include:
- A core multilingual LLM for general-purpose tasks.
- Smaller “quantized” models, which store their weights at reduced numerical precision, optimized for efficiency in edge applications.
Hajič emphasized the importance of quality, stating that the project is committed to avoiding half-baked results, especially given the significant public funding involved.
Open Source AI: Definitions and Challenges
The project seeks to adhere to the principles of open-source AI; however, challenges arise regarding the definition of “open source.” The Open Source Initiative has established guidelines, though there are ongoing debates about whether training data should be included as a requirement for open-source AI.
Hajič acknowledged that while the goal is to maintain transparency, some datasets may need to be kept confidential to comply with EU AI regulations.
Collaboration with Similar Initiatives
Another project, EuroLLM, launched earlier this year with similar objectives, has sparked discussions about collaboration. Andre Martins, a researcher from Unbabel, highlighted the need for different communities to work together rather than duplicating efforts.
Funding and Future Outlook
Despite skepticism surrounding funding and costs, Sarlin believes that OpenEuroLLM will secure sufficient resources to cover personnel and computational needs through EuroHPC partnerships. The project is not aimed at creating consumer-grade products but focuses on establishing an open-source foundation model for companies in Europe to build upon.
As Europe strives for digital sovereignty, the success of OpenEuroLLM could pave the way for a new era of AI development that respects linguistic diversity and fosters innovation within the continent.