The Digital Commons for Generative AI: Where Do We Stand?

Who will develop the digital commons of generative AI?

Among the many France 2030 calls for proposals, one is dedicated to this question. It is expected to yield models (generalist or specialized), deployment, evaluation and control systems, as well as databases that “valorize the national heritage.”

The application phase closed in October 2023. In May 2024, the State announced seven winners. Here is a brief progress update on those that have communicated most openly about their work.

Democratic Commons has built a team

Between 3,200 and 3,700 € gross per month depending on experience, 40 days of vacation and 5 days of RTT per year, public transport reimbursed at 50%.

These terms were offered by Sciences Po in early 2025 for two postdoctoral positions. One required solid knowledge of political science; the other, of social sciences. Both, however, fell within the same framework: Democratic Commons.

Make.org, Sciences Po, Sorbonne Université and the CNRS are behind this program, which has notably attracted Hugging Face, Mozilla.ai and the Aspen Institute network. Its main objective, in broad strokes: to design a way to evaluate and correct biases in AI so that it can be used responsibly in democratic processes. This implies:

A scientific framework for determining the democratic principles applied to AI
A model for evaluating biases of LLMs relative to these principles
Debiased LLMs and citizen-participation platforms that conform to these same principles

Envisaged uses include summarizing political debates, translation, moderation and writing assistance for contributing to democratic debate.

The first six months of activity (September 2024 to February 2025) were the subject of an official report. It notes the recruitment of 12 members (5 in the tech & coordination team, 4 postdocs, 3 junior researchers), participation in “more than a dozen” events, and an initial draft of democratic principles applied to AI.

An update to Panoramic is also mentioned. This platform, launched by Make.org in March 2024, offers a conversational interface built on the Mistral AI stack to inform the public on various topics from validated corpora. It was tested in early 2025, among other occasions, under the aegis of the National Commission for Public Debate, in connection with the project to build two nuclear reactors at the Bugey site (Ain). It is also currently being used within the framework of the Assises for the protection of French expatriates.

ArGiMi, under fire over copyright

“What lies behind the Villers-Cotterêts project?”

The iDFRights (Institute for Digital Fundamental Rights, chaired by Jean-Marie Cavada) published, in summer 2024, a post with that title. Behind it stands ArGiMi.

The project is named after the three companies leading it: Artefact (AI integration into industrial applications), Giskard (model evaluation) and Mistral AI. Officially launched in February 2025, it aims to deliver French-language LLMs tailored to the specific needs of companies. The initiative includes the development of tools designed to simplify fine-tuning, including open-source datasets. Public actors (INA, BnF) and private ones (Ardian, Cdiscount, Crédit Mutuel Arkéa) are in the loop. CentraleSupélec is also involved, through a joint research team.

The BnF is to provide a textual corpus drawn from the Gallica library. It will feed a foundation model developed by Mistral AI, which will later be fine-tuned for uses such as correcting OCR-transcribed texts. The INA, for its part, will help adapt the models to audiovisual use cases, whose particularity is spoken French.

The roadmap also includes a legal study to determine the conditions under which heritage data could be used to train models. It is hard not to see this as a reaction to the iDFRights’ statement, issued in solidarity with the APIG (Alliance of General Information Press), the SEPM (Syndicat des éditeurs de la presse magazine) and the SACD (Society of Dramatic Authors and Composers).

The institute asserts that ArGiMi intends to benefit from the Villers-Cotterêts project, named after the town that has hosted the International City of the French Language since November 2023. The project involves creating, on site, a center for the automatic processing of French and of the other languages of France. Also supported under France 2030, it is part of the European ALT-EDIC4EU initiative, which aims to support the development of a common infrastructure for language technologies.

The iDFRights expressed concern that ArGiMi could freely access data held by the INA and BnF, data that include materials still protected by copyright. The consortium, the institute explains, “cannot claim to fulfill its legal and moral obligations toward journalists, press publishers, authors and composers, or their rightful owners,” [sic] “by invoking that its activities fall under research and that its solutions for integrating language models in companies will be open source.”

OpenLLM France: LUCIE as a reference milestone

Linagora leads this 17-member consortium, formed as an extension of the community of the same name.

OpenLLM France plans to provide a family of pre-trained, multimodal LLMs (with a particular emphasis on voice processing), each under 15 billion parameters. It also aims to advance academic research to demonstrate that specialized models (ideally around 1.5B) can rival the largest LLMs as long as they are combined with the right data sources.

Among the use cases highlighted by the consortium:

Voice assistants
Evaluation and certification of LLMs for civil security
Detection of informational systemic risks
Teaching assistant for education and research settings (the national education system is a primary target)

As things stand, the main embodiment of its work is LUCIE. Many will remember the chatbot version, which drew its share of bad press in early 2025. But there is also the underlying LLM (7B), since published along with its dataset.

Since the call for proposals, OpenLLM France has been extended by OpenLLM Europe, which presents itself as complementary to ALT-EDIC4EU.

Photoroom: a work site (wide) open to the public

The company of the same name, which specializes in AI-assisted photo editing, is leading this project to develop a text-to-image model that promotes French heritage.

The company recently began sharing with the community its work on a model whose weights will be published under a permissive license. It has notably presented images from a 1.2B version trained at a resolution of 256 pixels in 9 days on 64 H200 GPUs. Various libraries have already been released upstream, for example to enable rapid image export and dataset management.