Data Minimization: Why More Data Is Not Always Better

As organizations rely on personal data across more systems, teams, and business functions, it becomes easier for data collection and retention practices to expand beyond what is necessary.

Data minimization is more than a requirement to collect less data. It requires organizations to evaluate whether personal data is necessary and proportionate across the full data lifecycle, including how it is used, shared, accessed, stored, and eventually deleted.

This article looks at how organizations can unintentionally collect and retain more data than they need, the risks that can result, and practical steps for applying data minimization in a way that supports compliance, data governance, and operational efficiency.

What Is Data Minimization?

Data minimization is a core data protection principle that limits the collection and processing of personal data to what is necessary for a defined purpose.

The GDPR states in Article 5(1)(c):

“Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);”

The California Consumer Privacy Act takes a similar approach. California Civil Code § 1798.100(c) states:

“A business’ collection, use, retention, and sharing of a consumer’s personal information shall be reasonably necessary and proportionate to achieve the purposes for which the personal information was collected or processed, or for another disclosed purpose that is compatible with the context in which the personal information was collected, and not further processed in a manner that is incompatible with those purposes.”

Other U.S. state laws follow a similar structure. Virginia’s Consumer Data Protection Act requires data to be “adequate, relevant, and reasonably necessary,” while Colorado’s Privacy Act requires it to be “limited to what is reasonably necessary.”

Brazil’s LGPD reflects the same principle:

“Limitation of the processing to the minimum required for the accomplishment of its purposes, encompassing relevant, proportional and non-excessive data in relation to data processing purposes”

Major Asian privacy laws take a similar approach. South Korea’s Personal Information Protection Act (PIPA) requires organizations to collect only the minimum personal information necessary for the stated purpose. Japan’s Act on the Protection of Personal Information (APPI) requires businesses to specify the purpose of use and generally stay within the scope necessary to achieve that purpose. Singapore’s Personal Data Protection Act (PDPA) similarly limits organizations to collecting, using, or disclosing personal data for purposes that are appropriate in the circumstances and made known to the individual where consent is required.

Regulators reinforce this principle with practical guidance. The UK Information Commissioner’s Office (ICO) makes it clear that organizations should not collect data “on the off chance that it might be useful in the future,” and France’s data protection authority, the CNIL, emphasizes that data must be necessary at the time of collection.

Across jurisdictions, the wording might differ slightly, but the expectation is consistent. Organizations must be able to connect each category of personal data to a defined purpose and justify why that data, at that level of detail, is needed.

Why More Data Can Feel Valuable

For many businesses, data is closely tied to growth, efficiency, and competitiveness. Large technology companies such as Meta and Google, often referred to as “tech giants,” have shown how extensive data collection and analysis can support personalization, recommendation systems, targeted advertising, and product improvement.

Many organizations see additional data as a way to better understand users, improve services, and compete in digital markets. The decision to collect more data is not always driven by carelessness. In some cases, it may reflect a belief that more information will lead to better insights, more tailored experiences, or stronger product performance. However, some of these tech giants have come under growing regulatory scrutiny, particularly in relation to how personal data is collected, combined, and used at scale.

Outside of large platforms, the same pressures are present in more everyday contexts. Marketing teams often want more data to better understand their audiences and improve conversion rates. Product teams rely on detailed usage data to identify friction points and refine features. In some cases, organizations choose to retain data because it may become useful later, whether for identifying trends, improving services, or supporting future initiatives.

The motivation to collect more data is understandable because additional information can provide greater visibility and, in many cases, support better decision-making. The challenge is that the perceived value of data does not always align with what is necessary or proportionate for a specific purpose. That is where data minimization becomes less theoretical and more difficult to apply in practice.

More Data, More Risk

The benefits of collecting more data are often easier to see than the risks. The risks tend to emerge later, when data accumulates across systems, teams, and time. Holding more personal data does not only increase volume. It increases exposure in multiple ways.

  • First, there is the security risk. Every additional dataset expands what needs to be protected. The more data an organization holds, the more it becomes a target, and the greater the potential impact if something goes wrong.
  • Second, there is the risk of profiling. Data collected for one purpose can take on a different meaning when combined with other data. Information that seems harmless in isolation can become identifying or sensitive when aggregated. For example, combining location data, usage patterns, and account activity may allow an individual to be identified or categorized in ways that were not originally intended.
  • There is also a practical operational impact. Large datasets make it harder to respond to data subject rights requests. When information is duplicated across systems, stored in different formats, or retained beyond its useful life, responding to access or deletion requests becomes more time-consuming and less reliable. The California Privacy Protection Agency (CPPA) has noted that excessive data collection and retention can make it more difficult to locate and process consumer requests, reinforcing that data minimization supports more efficient rights responses.
  • Retention is another area where risk builds over time. Data rarely stays in one place. It is copied into analytics tools, shared with vendors, exported into spreadsheets, and stored in backups. Even if the original collection was justified, the continued existence of that data across multiple systems can become difficult to track and even harder to delete.
  • Finally, not all data carries the same level of risk. Collecting general preference data is one thing. Collecting biometric data, health information, or political affiliation is another. The sensitivity of the data changes the level of justification required and the consequences if it is misused or exposed.

The result is that more data does not just mean more value. It often means more complexity, more exposure, and more responsibility.
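
The profiling risk described above can be made concrete with a small sketch. The records, field names, and the `unique_share` helper below are hypothetical and exist only for illustration. The function measures what share of records can be singled out by a given combination of fields, which is one rough way to see how attributes that look harmless alone become identifying together.

```python
from collections import Counter

# Hypothetical records: each field looks harmless on its own.
records = [
    {"city": "Austin", "plan": "basic", "login_hour": 9},
    {"city": "Austin", "plan": "basic", "login_hour": 22},
    {"city": "Austin", "plan": "pro", "login_hour": 9},
    {"city": "Denver", "plan": "basic", "login_hour": 9},
    {"city": "Denver", "plan": "pro", "login_hour": 22},
    {"city": "Austin", "plan": "basic", "login_hour": 9},
]


def unique_share(rows, fields):
    """Return the fraction of rows whose combination of `fields` is unique."""
    if not rows:
        return 0.0
    # Count how many rows share each combination of values.
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    singled_out = sum(
        1 for row in rows if combos[tuple(row[f] for f in fields)] == 1
    )
    return singled_out / len(rows)


for fields in (["city"], ["city", "plan"], ["city", "plan", "login_hour"]):
    print(fields, round(unique_share(records, fields), 2))
```

In this toy dataset no record is unique by city alone, half are unique once the plan is added, and two thirds are unique when all three fields are combined. Real datasets would need a more careful assessment, but the direction of the effect is the same.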

Data Minimization and AI Training Data

Data minimization is also relevant to AI development, particularly where personal data is used to train, fine-tune, evaluate, or improve AI systems. In this context, minimization does not mean using too little data. It means using data that is relevant, reliable, lawfully collected, and necessary for the intended model purpose.

Large datasets may appear valuable, but more data does not always lead to better AI outcomes. Training data that is unnecessary, outdated, duplicative, inaccurate, or poorly labeled can introduce noise, increase bias, reduce model quality, and make model behavior harder to explain. It can also increase privacy risk by expanding the amount of personal data that must be governed, secured, retained, and eventually deleted.

A minimization-based approach can therefore support both privacy compliance and model performance. Before using personal data for AI development, organizations should ask whether each category of data is genuinely needed for the intended AI use case. They should also consider whether identifiers can be removed, whether sensitive data should be excluded, and whether aggregated, anonymized, or pseudonymized data could achieve the same purpose.
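
As a minimal sketch of those questions applied to a single record, the example below drops direct identifiers and sensitive fields and replaces the user ID with a keyed hash (HMAC-SHA256 from the Python standard library). The field names, the `minimize_for_training` helper, and the choice of keyed hashing are assumptions made for illustration, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical field choices; a real project would derive these
# from its own data inventory.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}
SENSITIVE_FIELDS = {"health_notes", "political_affiliation"}


def minimize_for_training(record, secret_key, id_field="user_id"):
    """Drop identifiers and sensitive fields; replace the ID with a keyed hash."""
    kept = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key not in SENSITIVE_FIELDS
    }
    if id_field in kept:
        # A keyed hash keeps records linkable without exposing the raw ID.
        digest = hmac.new(
            secret_key, str(kept[id_field]).encode("utf-8"), hashlib.sha256
        )
        kept[id_field] = digest.hexdigest()
    return kept


example = {
    "user_id": 42,
    "email": "a@example.com",
    "health_notes": "example",
    "ticket_text": "The export button does not work.",
}
print(minimize_for_training(example, secret_key=b"replace-with-a-managed-secret"))
```

Pseudonymized data of this kind generally still counts as personal data, because anyone holding the key can re-link it, and free-text fields such as the hypothetical `ticket_text` may still contain identifiers. The sketch reduces exposure but does not replace a case-by-case assessment.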

This approach can make AI development more efficient. By focusing on data that is fit for purpose, organizations can reduce unnecessary review, remediation, storage, and governance work. They can also direct resources toward higher-quality datasets that are more likely to improve model performance and produce reliable outputs.

Where Organizations Can Accidentally Collect More Than They Should

In many cases, excessive data collection is not the result of a deliberate decision. It happens gradually, through processes that were never revisited or fully questioned.

  • One common example is the use of forms. Over time, fields are added for specific initiatives, experiments, or one-off needs. Those fields often remain long after the original purpose is gone. When no one can clearly explain why a field exists, continuing to collect that data becomes difficult to justify.
  • Human resources workflows present similar issues. It may seem efficient to use a single questionnaire for all applicants, but this can result in collecting sensitive information that is not relevant for certain roles. For example, asking for health-related information during early recruitment stages for office-based positions can easily cross into excessive collection, even if the same data might be justified in other contexts.
  • Marketing systems are another area where over-collection becomes visible over time. Organizations often retain large volumes of lead data, historical campaign information, and segmentation fields that are no longer actively used. The intention is usually to preserve value, but in practice, much of this data sits unused while still creating obligations and risk.
  • Product and engineering teams may also contribute to over-collection. Modern systems make it easy to log detailed user activity, track interactions, and store event-level data. These logs can include more information than is necessary, including data that remains identifiable long after its immediate purpose has passed.
  • Duplication is a related issue that is often overlooked. Data collected for one purpose is frequently copied into other tools, shared across teams, or exported into different formats. Each copy increases the overall volume of data being processed, even if the original collection was justified. Over time, this creates a fragmented data environment that is difficult to manage.
  • Finally, retention “just in case” is another recurring pattern. Organizations keep data because it might be useful in the future, even when there is no defined plan for using it. This directly conflicts with regulatory guidance, which consistently emphasizes that necessity must be reassessed over time, not assumed indefinitely.
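
The “just in case” pattern in the last point can be surfaced with a periodic check. The sketch below is one possible approach under stated assumptions: the categories, retention periods, and record layout are hypothetical, and the `overdue_for_review` helper only flags records for human review rather than deleting anything.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days.
RETENTION_DAYS = {"marketing_lead": 365, "support_ticket": 730}


def overdue_for_review(records, today):
    """Return records held past their retention period, or with no defined period."""
    flagged = []
    for record in records:
        limit = RETENTION_DAYS.get(record["category"])
        if limit is None:
            # No documented retention period: flag it instead of keeping by default.
            flagged.append(record)
        elif today - record["collected_on"] > timedelta(days=limit):
            flagged.append(record)
    return flagged


rows = [
    {"id": 1, "category": "marketing_lead", "collected_on": date(2023, 6, 1)},
    {"id": 2, "category": "marketing_lead", "collected_on": date(2024, 6, 1)},
    {"id": 3, "category": "event_log", "collected_on": date(2024, 12, 1)},
]
for row in overdue_for_review(rows, today=date(2025, 1, 1)):
    print(row["id"], row["category"])
```

Treating a missing retention period as a finding, rather than as permission to keep the data, mirrors the idea that necessity has to be reassessed rather than assumed.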

Is the Value of More Data Worth the Risk?

This is the central question organizations need to ask when deciding whether to collect, retain, or further use personal data. There is rarely a simple yes or no answer. Instead, it requires looking at the purpose and weighing it against the scope and sensitivity of the data.

Some practical considerations include:

  • Is the data actually used for the stated purpose?
  • Is the level of detail necessary?
  • How sensitive is the data?
  • How long is the data retained?
  • Could the same goal be achieved differently?

Facial recognition is a useful example because it can provide value while also introducing a higher level of risk. It can improve user experience and add a layer of security, but it also raises questions that go beyond convenience. Do users fully understand how their biometric data is processed? Who has access to that data? What happens if it is compromised? The risk profile is fundamentally different from other types of data.

The question is not whether data can provide value. It is whether that value justifies the level of data being collected and the risks that come with it.

How Organizations Can Apply Data Minimization in Practice

How organizations collect personal data, how much they collect, and how long they retain it often depends on their industry, business model, data uses, and applicable legal obligations. The steps below are general starting points that organizations can use when assessing their data processing practices:

  • Review collection points
    Start with forms, applications, and intake processes. Each field should have a clear and current purpose. If the purpose cannot be explained, it is a strong indicator that the field should be removed or reconsidered.
  • Remove unnecessary data fields
    Fields that exist for “future use” or historical reasons often remain long after they are needed. Removing these fields not only reduces risk but also simplifies processes and improves data quality.
  • Review existing datasets
    Data minimization does not stop at collection. Organizations should assess whether existing data is still needed. Where it is not, deletion, anonymization, or pseudonymization should be considered, depending on the context.
  • Limit internal access
    Not every team or employee needs access to the same level of data. Restricting access based on function reduces exposure and aligns with the principle of minimizing use.
  • Map data flows and duplication
    Understanding where data is stored, copied, and shared is critical. Data that moves across multiple systems can quickly exceed what is necessary, even if the original collection was justified.
  • Align retention with purpose
    Retention periods should be defined based on actual business and legal needs, not default settings or convenience. Where possible, automated deletion mechanisms can help ensure that data is not kept longer than necessary.
  • Use existing technical controls
    Many organizations already have tools that support minimization, such as access controls, retention settings, and data classification features. These should be actively used rather than left as optional configurations.
  • Establish internal rules and processes
    Policies and SOPs help ensure that data minimization is applied consistently. This includes defining what data can be collected, under what circumstances additional data can be requested, and when data must be reviewed or deleted.
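
The first two steps above, reviewing collection points and removing unnecessary fields, can be partly supported by a simple audit. The sketch below assumes a hypothetical purpose register that maps each collected field to a documented purpose. The register contents and the `audit_form_fields` helper are illustrative only.

```python
# Hypothetical purpose register: each collected field maps to a documented purpose.
PURPOSE_REGISTER = {
    "email": "account login and service notices",
    "shipping_address": "order delivery",
    "date_of_birth": "",  # purpose was never documented
}


def audit_form_fields(form_fields, register):
    """Split form fields into those with a documented purpose and those without."""
    justified, unjustified = [], []
    for field in form_fields:
        purpose = register.get(field, "").strip()
        # Fields that are missing from the register or have an empty purpose
        # are candidates for removal or review.
        (justified if purpose else unjustified).append(field)
    return justified, unjustified


signup_form = ["email", "date_of_birth", "fax_number"]
justified, unjustified = audit_form_fields(signup_form, PURPOSE_REGISTER)
print("Keep:", justified)
print("Review or remove:", unjustified)
```

A check like this does not decide whether a stated purpose is adequate. It only highlights fields for which no one has recorded a purpose at all, which is where the review conversation can start.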

Data minimization should be applied across the entire data lifecycle, not just at the point of collection. It includes how data is used, shared, accessed, and eventually deleted.

Conclusion

Data minimization is often framed as a requirement to collect less data. In practice, it is about understanding why data is collected, whether it remains necessary, and how it is managed over time. Organizations may focus on how to collect and use more data while remaining compliant, but a more effective approach is to step back and ask whether the data is needed in the first place.

When personal data is collected without a clear purpose, retained longer than necessary, or duplicated across systems, the result is not only compliance risk. It can also create operational inefficiencies, increase exposure, and make data harder to manage over time.

Data minimization brings the focus back to fundamentals. Why is this data needed? Is the level of detail justified? Could the same outcome be achieved with less? Organizations that approach data this way are not only reducing risk. They are building data practices that are easier to manage, easier to explain, and more aligned with the expectations of regulators and individuals alike.

VeraSafe can help organizations review their data mapping, identify areas where data collection or retention may need to be reduced, and develop a practical plan for applying data minimization across their operations. Book your free consultation today.
