Blog: Data broker Oracle just acquired millions’ sensitive health data

Justin Sherman

January 14, 2022

Right before the holidays, Oracle, the large data broker and cloud provider, announced it was acquiring electronic health records company Cerner for $28.3 billion.

Much of the press coverage focused on the possible delivery of better healthcare outcomes. For instance, in a Wall Street Journal news article, analysts pointed out that Oracle could create a cloud-based platform for accessing medical records that could help both healthcare providers and patients access and use data.

This reflected the line promoted by Oracle and Cerner. “This new generation of medical information systems promises to lower the administrative workload burdening our medical professionals, improve patient privacy and outcomes, and lower overall healthcare costs,” Larry Ellison, Oracle’s founder, chairman, and chief technology officer, said in a press release. David Feinberg, Cerner’s chief executive officer, said the acquisition will help with “modernizing electronic health records (EHR), improving the caregiver experience, and enabling more connected, high-quality and efficient patient care.”

Improving the quality of medical care is undoubtedly an important objective, even more so amid a still-raging pandemic. While disparities in healthcare access (between, for instance, Black and white communities or the wealthy and the poor) were not mentioned in the press release, lowering the costs of care while improving quality could also help to address some of those issues. It is quite possible Oracle and Cerner will do work in those areas.

But the majority of the media coverage missed an essential fact: Oracle is one of the largest data brokers in the country, earning millions of dollars a year from the virtually unregulated practice of data brokerage—collecting, aggregating, buying, and selling Americans’ intimate personal data on the open market. Its acquisition of Cerner, the second largest electronic health records vendor in the country, therefore presents serious implications for citizens’ privacy.

Cerner provides many software-related services for the health sector, such as cloud-based software for health providers to manage data. But Cerner also has health-related data on individuals. For instance, Cerner advertises “Cerner Real-World Data,” a “national, de-identified, person-centric data set solution that enables researchers to leverage longitudinal record data from contributing organizations. With Real-World Data, you can access volumes of de-identified information for retrospective analysis and post-market surveillance to help support health care outcomes.”

As I wrote in my most recent WIRED column, true data anonymity is a myth. While there is some value in statistical techniques like differential privacy that can help better obscure individuals’ specific information when that data is processed, labeling a wide range of tools and techniques “anonymization” or “deidentification” is patently misleading. Organizations do not need names in a dataset to cause serious harm; many times, organizations, especially data brokers, have individuals’ names anyway; and with all the data that exists in the world on individuals, it’s all too easy to link information to specific people, even if the entity doing so does not start with that person’s name or Social Security Number.
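To make the linkage point concrete, here is a minimal sketch in Python using entirely made-up records. The field names (zip, birth_year, sex), the sample values, and the link function are illustrative assumptions; they do not describe any real dataset or any method used by a particular company. The sketch only shows the general idea: a dataset with names removed can be joined to a named dataset on the attributes the two have in common.

```python
# Toy illustration with entirely made-up records; no real people or datasets.

# "De-identified" health records: names removed, other attributes kept.
health_records = [
    {"zip": "27701", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "27701", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "27514", "birth_year": 1984, "sex": "F", "diagnosis": "depression"},
]

# A separate, hypothetical marketing-style dataset that does carry names.
named_records = [
    {"name": "Person A", "zip": "27701", "birth_year": 1984, "sex": "F"},
    {"name": "Person B", "zip": "27514", "birth_year": 1984, "sex": "F"},
    {"name": "Person C", "zip": "27514", "birth_year": 1971, "sex": "M"},
]

# Attributes shared by both datasets (an assumption made for this example).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build a lookup key from the shared attributes of one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, named):
    """Return (name, diagnosis) pairs where exactly one named record matches."""
    index = {}
    for record in named:
        index.setdefault(key(record), []).append(record["name"])

    matches = []
    for record in deidentified:
        candidates = index.get(key(record), [])
        if len(candidates) == 1:  # a unique match singles out one person
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link(health_records, named_records)
for name, diagnosis in reidentified:
    print(name, diagnosis)
```

In this toy case, two of the three nameless health records end up attached to a name, even though the health dataset never contained one. Real datasets are larger and messier, but the more attributes two datasets share, the more likely a combination is unique to one person.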

Thus, there is still great reason for concern even when Cerner says its data is “de-identified.” Oracle already holds and advertises data on millions of people, including highly sensitive data like Americans’ GPS location histories. It could easily combine those immense datasets with supposedly “de-identified” data held by Cerner to learn even more information about specific people. Oracle could then fold that information into its data brokerage services—all part of an ecosystem built on the virtually unregulated collecting, aggregating, buying, selling, and sharing of people’s highly sensitive information. Companies could buy that data to target and potentially exploit individuals in all kinds of ways.

The question for Oracle is the degree to which the company will put controls in place on the sale of individuals’ data it may get in the Cerner acquisition, and whether there will be public transparency on those controls. Currently, the data broker industry does not have commonly understood best practices for many important controls, and companies that do put some controls in place are most often not transparent about them. For instance, some brokers put in place vetting processes to screen potential customers, but those processes are opaque and rarely described to the public. These vetting processes could include requesting information from customers about their organization, their employees, their internal privacy and security controls, and their intended data use cases. Many brokers also require customers to agree to terms and conditions that limit the sale, licensing, or sharing of data they acquire from the broker, including restrictions on customers’ ability to subsequently transfer or provide access to that data to third parties. However, based on our research, those terms and conditions are opaque and rarely available to the general public. Further, there is only very limited transparency on the degree to which brokers put in place detective controls to understand whether customers use data for reasons outside of their stated purpose after they purchase it, paired with accountability mechanisms to mitigate those risks and to prevent uses of data outside those previously specified or agreed upon. These are all important risk mitigation controls, and brokers should provide enough information for individuals to understand the degree to which they are implemented.

Far more importantly, however, the question for regulators is why the United States allows the exploitative collection, aggregation, buying, selling, and sharing of millions of Americans’ sensitive data in the first place.

Justin Sherman (@jshermcyber) is a fellow and research lead of the data brokerage project at Duke University’s Sanford School of Public Policy.