Big Data and Healthcare

Leveraging medical claims data while safeguarding consumer privacy

The big data versus consumer privacy debate has been shifting for at least the last ten years, and most people still aren't certain they understand the value proposition of big data in the healthcare environment. This is especially true when we introduce Protected Health Information ("PHI") into that dialogue.

The concerns are real. According to reports released earlier this year (in the wake of the OPM data breach), the black market is oversaturated with credit card data, and financial institutions are more readily identifying cybercrimes before consumers incur direct financial losses. Fraudsters are therefore turning elsewhere; for example, they may steal PHI to create false payer-provider member data that they then use to purchase goods or equipment for resale. In today's market, medical records reportedly fetch up to ten times more than a credit card on the black market, and medical identity theft has become a reality for millions of Americans.

With that in mind, it is understandable that there are very few certainties when weighing whether the benefits of using PHI to develop big data analytics outweigh the risks. Certainties, such as current federal and state privacy, security and consumer regulations, most assuredly exist to protect individuals from these "bad guys." More often than not, however, there is a good deal of ambiguity concerning the protection of PHI, particularly when that data is used to improve clinical outcomes and reduce administrative inefficiencies within our newly regulated healthcare system. If you believed that properly secured PHI could enable clinical outreach to underserved populations within the U.S., what value would you place on that record? Ten times your credit card? Priceless?

We believe - and we have demonstrated through our expertise and experience with clinical analytics - that utilizing medical claims data within a HIPAA (Health Insurance Portability and Accountability Act) compliance framework creates measurably improved clinical outcomes without jeopardizing individual privacy. We can balance big data with healthcare privacy, and there is no time like the present to improve our healthcare system for individuals, payers and providers alike.

Our personally identifiable information (PII) has long been used by credit bureaus, law enforcement and pay-per-click advertisers, to name a few. The risks to individual privacy in these fields are regulated to some degree by the FTC, FACTA, state privacy statutes, opt-out provisions and privacy guidelines. However, where, as here, we are primarily discussing PHI (a subset of PII), HIPAA is not a roadblock but rather the linchpin of success. Few of the other checks and balances come close to the protection HIPAA affords consumers.

HIPAA plays a critical role in the value proposition of these big data platforms. HIPAA ensures that our protected health information is used to make healthcare treatment, R&D and associated administrative processes possible while effectively prohibiting any other use of PHI. In addition to its litany of data privacy and security requirements, HIPAA mandates aggregation and/or de-identification of data and prohibits re-linking data in a way that would lead to the identification of an individual.

De-identification is a tool that organizations use to remove PHI from data that they collect, use, transfer or disseminate to other organizations. The term "de-identification" is often used interchangeably with "redaction," "pseudonymization" and "anonymization." Here, we use "de-identification" to mean a process through which we anonymize data that would otherwise constitute PHI under HIPAA. De-identification lets us leverage medical claims data assets in a manner that removes the association between the identifying dataset and the data subject. This way, we can help our payer-provider customers understand the predictive capability of healthcare analytics without incurring the risk of injury to the individual consumer.

One method of de-identification, HIPAA's "Safe Harbor," removes 18 categories of direct identifiers of the individual or of the individual's relatives, employers or household members. De-identification may also be established via statistical certification, under which an expert determines that the risk of re-linking or re-identification is extremely low, e.g., less than 0.5%. All clinical analytics models can and should be certified as HIPAA-compliant to protect privacy while gaining greater insight into patient population patterns, avoidable cost opportunities and potential patient outcomes.
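As an illustration, a Safe Harbor-style de-identification pass can be sketched in a few lines. The field names and rules below are a hypothetical subset of the 18 identifier categories, not a complete or production-grade implementation:

```python
from datetime import date

# Illustrative subset of the Safe Harbor identifier categories;
# the field names here are hypothetical, not a standard schema.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "medical_record_number"}

def deidentify(record: dict) -> dict:
    """Return a copy of a claims record with direct identifiers removed
    and quasi-identifiers generalized, Safe Harbor-style."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Generalize ZIP code to its first three digits; Safe Harbor further
    # requires dropping even those for sparsely populated 3-digit areas.
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "00"

    # Replace date of birth with age; ages over 89 collapse to "90+".
    if "birth_date" in out:
        age = date.today().year - out.pop("birth_date").year
        out["age"] = "90+" if age > 89 else age

    return out

claim = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "zip": "19406",
    "birth_date": date(1950, 6, 1),
    "diagnosis_code": "E11.9",
}
print(deidentify(claim))
```

Note that the clinical content (the diagnosis code) survives untouched; only the link back to the individual is severed.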

Once we understand the HIPAA "opportunity," we see that healthcare analytics holds great promise for creating meaningful clinical and financial results for consumers. Whether through a health plan member portal or an EHR, once an insurer, payer or provider has access to fully vetted and contextualized medical claims data, it is well positioned to help its members, other patients and society more generally.


Consider the ability to:

- Improve clinical outcomes, for example by addressing why 20% of the population does not return for follow-up visits or fill prescriptions.
- Enhance participation in wellness management programs through effective outreach based on behaviors and preferences.
- Reduce avoidable costs by identifying actions or treatment plans that reduce complications or the continuation of adverse outcomes.
- Assign clinicians to the patients most likely to benefit from population health intervention.
- Develop programs that focus medical management resources on the specific diseases and processes that most affect future costs.
- Minimize fraud, waste and abuse by identifying practitioner anomalies, e.g., an orthopedic surgeon overprescribing Xanax or other controlled substances in clusters.
- Enable insurers and pharmacists to identify centers of excellence, measure quality assurance metrics, reduce expenses - and reduce individual insurance premiums accordingly.
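The fraud, waste and abuse point above can be sketched as a simple statistical screen. The provider names and prescription counts below are fabricated for illustration, and a real screen would adjust for specialty, panel size and case mix:

```python
from statistics import mean, stdev

# Hypothetical monthly controlled-substance prescription counts per provider.
prescriptions = {
    "dr_adams": 10, "dr_baker": 11, "dr_chen": 9, "dr_diaz": 12,
    "dr_evans": 10, "dr_foster": 8, "dr_garcia": 11, "dr_huang": 10,
    "dr_irving": 9, "dr_jones": 58,  # anomalous volume
}

def flag_anomalies(counts: dict, z_threshold: float = 2.0) -> list:
    """Flag providers whose prescribing volume is a statistical outlier
    relative to their peer group (a simple z-score screen)."""
    values = list(counts.values())
    mu, sigma = mean(values), stdev(values)
    return [p for p, c in counts.items() if (c - mu) / sigma > z_threshold]

print(flag_anomalies(prescriptions))  # flags only dr_jones
```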

That ability - that insight into our healthcare system - is the tremendous opportunity that leveraging big data in healthcare offers consumers.

There is an innovative opportunity on the near horizon: the capability to identify potential unfavorable outcomes for patients while they are still in the hospital. Existing data on the patient is drawn from as many sources as possible - lab results, pharmacy history, patient-specific data, physician notes, nursing notes and so on - and assembled into the EMR. Combining this data with the historical data for this patient and other patients makes it possible to forecast the likelihood of readmission within 31 days, of post-discharge infection, or of several other possible outcomes. Armed with this foreknowledge, the case manager can intervene before discharge to eliminate or reduce the likelihood of the unfavorable outcome through pre-release education or other interventions. It also allows the case manager to focus follow-up visits on reacting to indications of non-compliance or other mitigating circumstances. The innovation here is using big data and analytics to build an application that performs near-real-time forecasting on a patient's data during the admission.
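A minimal sketch of such a forecasting step might look as follows. It assumes a pre-trained logistic model; the feature names and hand-set coefficients here are hypothetical illustrations, not a clinically validated model:

```python
import math

# Hypothetical, hand-set model weights for illustration only; a production
# model would be trained and validated on historical claims and EMR data.
WEIGHTS = {
    "prior_admissions": 0.45,    # inpatient admissions in the past 12 months
    "abnormal_lab_count": 0.30,  # abnormal lab results during this stay
    "active_medications": 0.08,  # current medication count
    "lives_alone": 0.60,         # 1 if no in-home support, else 0
}
BIAS = -3.0

def readmission_risk(features: dict) -> float:
    """Logistic score: estimated probability of readmission within 31 days."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

def needs_intervention(features: dict, threshold: float = 0.5) -> bool:
    """Flag patients for pre-discharge education or case-manager follow-up."""
    return readmission_risk(features) >= threshold

patient = {"prior_admissions": 3, "abnormal_lab_count": 4,
           "active_medications": 9, "lives_alone": 1}
print(round(readmission_risk(patient), 2))  # ~0.70, so flagged for intervention
```

Running the score at each EMR update, rather than once at discharge, is what makes the forecasting "near real-time."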

Privacy advocates and observers on Capitol Hill may remain concerned, but we believe that our commitment both to protecting individual privacy and to advancing data analytics in healthcare is paramount to the success of our evolving healthcare system.

Jennifer G. Smith, Esq., CIPP, is divisional lead counsel, healthcare, LexisNexis Risk Solutions. Healthcare solutions from LexisNexis combine proprietary analytics, science and technology with the industry's leading sources of provider, member, claims and public records information to improve cost savings, health outcomes, data quality, compliance and exposure to fraud, waste and abuse.


I think that we will be able to both protect individual privacy and advance data analytics in healthcare, and I agree that we must hope that the HIPAA-mandated aggregation and/or de-identification of data can prohibit "relinking data that would lead to identification of an individual."
But we need to add more security over time and future-proof against increasingly sophisticated data-matching efforts.
Consider Sweeney's now-famous re-identification of Governor Weld's hospitalization data using U.S. voter list information.
We also know that NIST concluded that "Many of the current techniques and procedures in use, such as the HIPAA Privacy Rule's Safe Harbor de-identification standard, are not firmly rooted in theory."
And we know that re-identification risk depends on data that may become available in the future but is not available now.
I think that we need a policy-driven approach that can be easily adjusted over time as more data becomes available, and I like to consider employing a combination of several approaches to mitigate re-identification risk.
I've seen two interesting technical approaches that, combined, can provide a balanced solution to the growing tension between privacy and access to data.
The first approach is based on service-oriented privacy-preserving data publishing. This service-oriented approach can provide policy-driven control over how combinations of different data are accessed and over the accumulated volume of data that is accessed.
The second approach, based on data tokenization and dynamic masking, can secure the data itself against misuse and theft. I think that a balance between the two approaches can provide an attractive data-centric solution for different sensitivity levels.
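As a rough sketch of the second approach - tokenization plus role-based dynamic masking - consider the following. The key handling and role names are illustrative assumptions, not any particular product's API; a real deployment would keep keys in an HSM or vault:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-in-a-vault"  # illustrative only; never hard-code real keys

def tokenize(value: str) -> str:
    """Replace a sensitive value with a deterministic, keyed, non-reversible
    token, so records can still be joined across datasets without exposing
    the raw identifier."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def mask(value: str, role: str) -> str:
    """Dynamic masking: reveal data according to the requester's role."""
    if role == "clinician":   # full value for treatment purposes
        return value
    if role == "analyst":     # joinable token, no raw identifier
        return tokenize(value)
    return "*" * len(value)   # everyone else sees nothing useful

ssn = "123-45-6789"
# The same input always yields the same token, so analysts can link records
# across datasets without ever seeing the underlying SSN.
assert tokenize(ssn) == tokenize(ssn)
```

The policy layer of the first approach would then decide, per request, which role's view of each field the requester receives.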

I think that strong data security can help to contain some of these issues with matching and theft of sensitive data.

Ulf Mattsson, CTO, Protegrity
December 08, 2015
New York Metro, CT



© 2016 Merion Matters

2900 Horizon Drive, King of Prussia PA 19406