How Data Lineage Reveals Its Secrets

Security teams are under constant and growing pressure. The 2025 edition of the ANSSI Threat Panorama, for example, reports 196 data exfiltrations in the past year, up from 130 in 2024.

At any hour, they may receive a call from their security operations center (SOC) alerting them to a transfer of files from a confidential digital vault to a personal cloud storage account. Even if analysts halt the flow immediately, some files have already left the organization. The team can only observe the breach and wait for the CISO to report the incident to the board. Yet they know the board will ask about the first person involved, and that those questions will go unanswered.

Even armed with the most modern tools and cutting-edge AI features, security teams struggle to reconstruct a traceability chain with convincing evidence. And without this information, they cannot draw lessons from such incidents, improve policies and procedures, or minimize the risk of recurrence.

Data lineage, a relatively recent mechanism that can be integrated into data-flow policies, gives CISOs the elements they need to answer the questions their boards ask most often.

To Start at the Beginning

Traditional security controls monitor the entry points to an environment and, in most cases, help identify those that have vulnerabilities. However, unlike physical structures, software allows new doors to appear, often faster than inspection systems can detect and shield them.


It is therefore better to control what passes through these doors than to fortify the doors themselves. To that end, a wide range of permission-based access checks, embedded in files and data objects, has existed for decades. Yet they remain impractical because of their complexity and poor interoperability.

Unlike an approach centered on entry doors, data lineage simplifies the development of file and data protection policies. This mechanism traces the journey of an object from its origin to its destination, recording every action and every actor involved. It thus provides detailed information about every movement and who was responsible for it.

Moreover, a simple “Save As” command, which typically erases the historical metadata of new copies, does not interrupt the process. Data lineage provides an immutable audit trail that lets security teams identify ordinary, authorized data flows. From there, they can establish protection policies that anticipate how this data might be exposed in the future.
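One common way to make an audit trail tamper-evident is to chain each entry to the hash of the previous one. The sketch below illustrates that general idea only; it is not a description of how any particular lineage product works, and the function names, field names, and file names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(body):
    # Hash a canonical JSON encoding so that key order does not matter.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(trail, actor, action, obj):
    """Append an entry whose hash also covers the previous entry's hash."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = {"actor": actor, "action": action, "object": obj, "prev": prev_hash}
    trail.append({**body, "hash": _digest(body)})
    return trail


def verify(trail):
    """Return True if no entry has been altered in place or reordered."""
    prev_hash = GENESIS
    for entry in trail:
        body = {k: entry[k] for k in ("actor", "action", "object", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append_event(trail, "alice", "download", "clients.xlsx")
append_event(trail, "alice", "save_as", "clients-copy.csv")
print(verify(trail))            # True: the chain is intact
trail[0]["actor"] = "mallory"   # simulate tampering with history
print(verify(trail))            # False: the stored hash no longer matches
```

A chain like this detects edits to past entries but not the silent removal of the most recent ones, which is why real systems typically also anchor the latest hash somewhere outside the log.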

For example, a malicious employee downloads a list of at-risk clients and emails it to a colleague. The colleague reformats the data before uploading it to a personal account. Without data lineage, these three distinct events appear unrelated.

With data lineage, information about the actors and actions remains attached to the file, regardless of its format, creating a throughline of movements and intents. Data lineage tracks the file's complete lifecycle.
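The scenario above can be sketched as a chain of records in which each new object points back to the object it was derived from. This is a minimal illustration under assumed names (`LineageEvent`, `trace_to_origin`, and the object identifiers are all hypothetical), not a real product's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageEvent:
    object_id: str            # hypothetical identifier of the resulting object
    parent_id: Optional[str]  # object it was derived from; None at the origin
    actor: str
    action: str


def trace_to_origin(events, object_id):
    """Walk parent links from an object back to its origin, oldest event first."""
    by_object = {e.object_id: e for e in events}
    chain, seen = [], set()
    while object_id is not None and object_id in by_object and object_id not in seen:
        seen.add(object_id)  # guard against accidental cycles in the records
        event = by_object[object_id]
        chain.append(event)
        object_id = event.parent_id
    return chain[::-1]


events = [
    LineageEvent("crm-export", None, "employee", "download"),
    LineageEvent("mail-attachment", "crm-export", "employee", "email"),
    LineageEvent("reformatted-copy", "mail-attachment", "colleague", "reformat"),
    LineageEvent("personal-cloud-file", "reformatted-copy", "colleague", "upload"),
]

for event in trace_to_origin(events, "personal-cloud-file"):
    print(f"{event.actor}: {event.action} -> {event.object_id}")
```

Starting from the file found in the personal account, the walk returns the whole sequence back to the original download, even though the name and format changed along the way.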

It also provides the elements needed for effective insider risk management. Companies can thus halt unauthorized flows when attackers attempt to bypass security policies by changing names or formats. The lineage graph (a visual representation of an object's lifecycle) becomes another tool in the security team's arsenal for pinpointing the root cause of an incident more quickly.
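To make the point about renamed or reformatted files concrete, here is a hedged sketch of a policy check that looks at an object's ancestors instead of its current name. The lineage table, labels, destination name, and function names are all assumptions made for illustration.

```python
# Hypothetical lineage records: object -> (parent object, label assigned at creation)
LINEAGE = {
    "vault/at-risk-clients.xlsx": (None, "confidential"),
    "inbox/list.xlsx": ("vault/at-risk-clients.xlsx", None),
    "desktop/holiday-notes.pdf": ("inbox/list.xlsx", None),
    "desktop/menu.pdf": (None, "public"),
}


def inherited_labels(object_id, lineage=LINEAGE):
    """Collect the labels of an object and of every ancestor it derives from."""
    labels, seen = set(), set()
    while object_id is not None and object_id not in seen:
        seen.add(object_id)  # guard against accidental cycles in the records
        parent, label = lineage.get(object_id, (None, None))
        if label:
            labels.add(label)
        object_id = parent
    return labels


def allow_upload(object_id, destination, lineage=LINEAGE):
    """Deny uploads to a personal cloud when any ancestor was confidential."""
    if destination == "personal-cloud":
        return "confidential" not in inherited_labels(object_id, lineage)
    return True


# The renamed, reformatted copy still inherits the label of the original list.
print(allow_upload("desktop/holiday-notes.pdf", "personal-cloud"))  # False
print(allow_upload("desktop/menu.pdf", "personal-cloud"))           # True
```

The decision depends on where the object came from, so changing the file name or extension does not change the outcome.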

Data Lineage as a Value Driver

The process also scales to ever larger environments. If a company struggles to manage data flows among 10,000 people, it will be overwhelmed when facing 10 million agents. In this context, alongside non-human identity management systems (each agent must have one), data lineage quickly becomes an indispensable approach.


It not only ensures reliable traceability of these actors and their actions but also helps companies comply with the increasingly numerous, and increasingly vague, regulations adopted worldwide. For example, applying data lineage to the training data of large language models (LLMs) helps ensure that the models comply with internal ethical policies and external regulations.

Companies thus have the evidence they need to demonstrate the security and reliability of their training data to customers, auditors, and regulatory bodies.

As with other security products, standalone data lineage tools exist. That said, deploying several of them in concert yields only marginal added value: they cannot exchange signals, and if they cannot influence the policies governing data flows, their usefulness remains limited. To be effective and enable portable protection of data and files, lineage must be an integral part of broader platforms that apply security policies to all the data that traverses them.

From private applications to a wide variety of websites, including sanctioned and unsanctioned SaaS apps, files circulate everywhere, across managed and unmanaged devices. Security teams require platforms capable of inspecting all these flows and categorizing each object. Going forward, these platforms must also track every actor and every action, so that the right people have the right access to the right resources, at the right time, and for the right reasons.

Data is very much like air: it fills every available space and escapes as soon as possible. It also tells stories. Data lineage sheds light on these stories, strengthening the security posture of enterprises as they pursue the balance between protection and innovation.

*Steve Riley is Field CTO at Netskope

Dawn Liphardt


I'm Dawn Liphardt, the founder and lead writer of this publication. With a background in philosophy and a deep interest in the social impact of technology, I started this platform to explore how innovation shapes — and sometimes disrupts — the world we live in. My work focuses on critical, human-centered storytelling at the frontier of artificial intelligence and emerging tech.