"Public data alone no longer suffices to train our code models."
With this argument, JetBrains justifies its decision to begin collecting data on the "real-world usage" of the LLMs integrated into its products. In particular, the full prompts and responses, along with any personal or commercial information they may contain.
Invoking legitimate interest, JetBrains enables this collection by default for individual users with non-commercial licenses. They retain the option to opt out in the settings.
For enterprise deployments, the collection is opt-in. If activated, it applies to all end users across the affected products. To encourage adoption, JetBrains promises early adopters a one-year free subscription to its full product pack, subject to approval after joining a waitlist.
The collection is likewise opt-in for individual users on trial versions, community licenses, or early-access builds.
Data collection alongside JetBrains AI
On most JetBrains products, collection begins with version 2025.2.4; DataSpell and DataGrip are exceptions (2025.2.3 for both). The community editions of IntelliJ IDEA and PyCharm are not involved.
The data will be retained for one year. The promised benefits: sharper detection of insecure code, more specialized models that are cheaper to operate, and, more broadly, improvements in code quality and explanations. JetBrains also commits to continuing to publish LLMs, as it has already done with Mellum.
The collection of LLM interactions comes on top of, on the one hand, anonymous telemetry (product usage: time spent, clicks…) and, on the other, the behavioral data that the JetBrains AI service can gather. The latter are used to train not the code-generating models, but the models that govern the behavior of various features.