Behind the scenes: GDPR and data-driven innovation in the pharmaceutical industry

Chiara Rustici

As pharma continues its journey towards obtaining the optimal data lifecycle, four industry experts contemplate what the path ahead looks like for pharma innovation and the implementation of GDPR.

Despite the increase in pharma’s R&D funding, the number of innovative products generated by the market hasn’t seen a proportionate rise.

Pharma is now looking at its vast reserves of data for actionable insights. However, many hurdles stand in the way before pharma reaches the optimal data lifecycle.

Also, May’s General Data Protection Regulation (GDPR) deadline adds a few more hurdles for manufacturers to negotiate in order to protect data subjects.

Pharmaceutical organizations handle incredibly sensitive information, including customer data, patients’ details, and internal intelligence on research and market developments.

The new EU regulation demands an operational overhaul in this regard. It would be a huge mistake for individuals or establishments to assume they are born GDPR compliant.

Here is a look behind the scenes at the panel discussion that took place at the 2017 Data Analytics in Pharma Development conference.


Nigel Hughes, Scientific Director, Janssen

Dr. Gerhard Noelken, Business Development Europe, Pistoia Alliance

Arun Bondali, Senior Architect Lead R&D Application Management, AstraZeneca

Chiara Rustici, Independent GDPR Analyst

How far is the pharma sector from the optimal data lifecycle?

Gerhard: “For good data lifecycle management, consistent data architecture combined with semantic technology that supports different ontologies in different pharmaceutical domains is needed.

“While many areas like clinical research and safety have invested heavily in excellent data management, aligning those domains with the data coming from areas like early research, commercial or manufacturing can still be difficult.

“In research, creative excellence in coming up with new ideas is not always easily compatible with the use of a consistent vocabulary or data standards. Semantic technology and intelligent data capture systems with automatic recording of experimental and contextual metadata can be very helpful.

“Good data standards can be a great relief for a whole scientific domain if they are introduced efficiently and broadly accepted. Not only will scientists benefit from more efficient data sharing, but now data analysis tools and artificial intelligence machines can digest data from very different sources much more efficiently without the need for huge data cleansing activities.”

Arun, if you could remove one big hurdle from the path to the ideal lifecycle, what would it be?

Arun: “It is necessary to identify master/golden record data entities that should be unique and non-duplicable. A few examples could be patient identity from an EHR, a result record from a sample management system, and a customer record from an ERP system. This can be achieved by a master data management tool with a technology service bus or a data warehouse that can aggregate multiple data points.

“While larger enterprises have realized the need and are investing significantly in master data management with good success, a lot of smaller companies still continue to utilize data tools like spreadsheets. This can be initially cost-effective, but eventually becomes a huge liability for integrity and compliance with the expansion of data volume and utilization. It is strongly advised to design system strategies that account for data integrity and avoid duplications.”
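As an editorial illustration of the golden-record idea Arun describes, the following minimal Python sketch consolidates records from several source systems into one entry per patient. It is a sketch under stated assumptions, not a description of any panellist's tooling: the field names (`system`, `patient_id`, `attributes`), the `GoldenRecord` class, and the simple "trim and upper-case" matching rule are all hypothetical, and real master data management tools use far richer matching logic.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenRecord:
    """Hypothetical consolidated record for one unique entity."""
    key: str
    attributes: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


def normalise_key(raw_id: str) -> str:
    # Assumed matching rule: identifiers differ only by case and whitespace.
    return raw_id.strip().upper()


def build_golden_records(source_records):
    """Merge records that share a normalised identifier into one entry."""
    golden = {}
    for rec in source_records:
        key = normalise_key(rec["patient_id"])
        entry = golden.setdefault(key, GoldenRecord(key=key))
        for name, value in rec["attributes"].items():
            # Keep the first value seen for each attribute; later
            # values for the same attribute are ignored.
            entry.attributes.setdefault(name, value)
        if rec["system"] not in entry.sources:
            entry.sources.append(rec["system"])
    return golden


# Illustrative input: the same patient appears in two systems.
records = [
    {"system": "EHR", "patient_id": " p-001 ",
     "attributes": {"birth_year": 1970}},
    {"system": "sample_management", "patient_id": "P-001",
     "attributes": {"sample_count": 3}},
    {"system": "EHR", "patient_id": "p-002",
     "attributes": {"birth_year": 1985}},
]
golden = build_golden_records(records)
```

Here the two entries for patient `P-001` collapse into a single record that remembers which systems contributed to it, which is the property a spreadsheet-based approach tends to lose as data volumes grow.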

If you were given the same opportunity to remove one large obstacle from the path to lifecycle nirvana, what would that one hurdle be for you, Gerhard?

Gerhard: “The need to constantly improve the understanding of how important data governance is.

“We typically see two different types of data: data generated around new potential drug candidates and data generated for the better understanding of basic research principles. Unfortunately, data in those two areas are often treated very differently.

“In the first case, the key driver is the success of the project, to get the drug to the patient as efficiently and safely as possible. This sometimes means that you have to drop non-efficient compounds early and cannot invest in documenting all the information for those non-efficient drugs to the same level of detail.

“When you look into these datasets ten years later to evaluate a different target mechanism, “negative” data can be as valuable as, if not more valuable than, the data of your original lead compound.

“Good data governance standards require applying the same level of attention to all the various data types. In the long term this can create a much higher value for the “Corporate knowledge repository” pharma companies have accessible today.”

What keeps Nigel, the scientific director at Janssen, awake at night?

Nigel: “I believe there are some key challenges significantly impacting the development of a 21st century real world evidence domain (in Europe).

“The impending implementation of the GDPR concerns me not only due to the complexity of the regulation, but the apparent lack of urgency expressed by many involved.

“According to the EU Commission, only Germany and Austria, out of all member states, have enshrined the GDPR within local law, and we have limited time in which all need to be prepared and ready for the differences in interpretation of consent, privacy, anonymity, etc. Of more concern is a prevalent belief that becoming ‘GDPR compliant’ is likely to be a minor undertaking, and that for the most part we are all relatively compliant already. But are we?

“Meanwhile, there are areas of positive development, as well as other challenges. A recent and encouraging trend is in the increasing pre-competitive collaboration between pharmaceutical companies, in particular via the Innovative Medicines Initiative (IMI), and projects such as EHR4CR, EMIF, GetReal, and soon EHDN, all supporting real world data/evidence infrastructure, methods and research.

“There is a saying that 80% of bioinformatics is about people, and not data. One such pivotal challenge is the inception of collaborations and the establishment of networks, bringing together the diverse actors within joint endeavours. Understanding mutual areas of interest, versus divergent cultures and agendas, is key to successfully working together to establish the infrastructure between data sources and data users, with relevant governance, standards and methods. GDPR also needs to be seen within that context.”

Chiara, as a GDPR analyst, examining the impact of this regulation on specific sectors, can you explain the effect it will have on pharma data and do you share the concerns of some in the pharma industry?

Is data-driven pharma innovation in Europe going to be penalised by data protection compliance?

Chiara: “The General Data Protection Regulation (GDPR) does not regulate pharma data per se but regulates what must and must not be done when working with personal data sets. In particular, health data, genetic data and biometric data are singled out as special categories of personal data attracting additional precautions because of their high-risk potential to infringe individuals’ rights or freedoms.

“However, it is not a piece of law that will automatically invalidate existing sector-specific laws, but it will work alongside any laws already applicable to pharma research: the effect is to “fill any gaps in between other laws”, so to speak. Thanks to the GDPR we will know how to handle personal data in all areas of research, whether previously regulated or not. What’s new, of course, is a mindset shift from believing that if a data use-case is not explicitly prohibited then we go ahead presuming it’s ok, to knowing that when a data set includes personal data like biometric, genetic and health data, it will need to be given the GDPR/Privacy-by-default treatment.

“So, a first reassurance I’d like to offer is that, even if national parliaments around the EU do not manage to incorporate the GDPR in their existing bodies of national law in time for the May 25th deadline, the black-on-white text of the GDPR will apply directly and will be your guide for any data research. It might make actual litigation slightly more difficult, as not all the legal procedures will be in place, but from a pharma researcher’s standpoint, rather than a litigator’s, overall the GDPR has been drafted clearly enough to allow it to dovetail with other sector-specific laws and rules.

“A second reassurance I’d like to offer is that the GDPR has been drafted with a number of clauses “suspending” some of the GDPR prohibitions in the context of public health and scientific research. Most areas of pharma research, in short, have little to fear from the GDPR. (See my presentation slides and sector reports).

“What will become important, however, is transparent accountability: the obligation to indicate - for any data record or data manipulation exercise - the legal basis or research-specific legal exception we rely on. Every data record and any data transformation must be accompanied, right from the start, by a description of: (1) its legal basis; (2) its purpose; (3) its retention period or timeframe; (4) its recipients/any parties it will be disclosed to; (5) whether individual profiling is carried out and its likely consequences for the individual.

“This accountability framework will turn out to be an important ally in what all other panellists see as critical investment in pharma data management: no longer a nice-to-have, a data architecture enabling excellent data lineage, purpose-driven data taxonomy and metadata management will double up as an indispensable tool to demonstrate accountability to the data protection supervisory authorities as of May 25th.
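(Editorial illustration: the five descriptive elements listed above could travel with a data set as structured metadata. The Python sketch below is a minimal example under stated assumptions, not a compliance tool. The `ProcessingRecord` class, its field names, and the sample values, including the legal-basis strings, are hypothetical placeholders rather than wording taken from the GDPR.)

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProcessingRecord:
    """Hypothetical metadata attached to a data set or transformation."""
    dataset: str
    legal_basis: str            # (1) legal basis or research exception
    purpose: str                # (2) purpose
    retention_days: int         # (3) retention period
    recipients: tuple           # (4) parties the data is disclosed to
    profiling: bool             # (5) is individual profiling carried out?
    profiling_consequences: str = ""

    def missing_fields(self):
        """Return the names of descriptive elements left undocumented."""
        missing = []
        if not self.legal_basis.strip():
            missing.append("legal_basis")
        if not self.purpose.strip():
            missing.append("purpose")
        if self.retention_days <= 0:
            missing.append("retention_days")
        # Consequences only need describing when profiling takes place.
        if self.profiling and not self.profiling_consequences.strip():
            missing.append("profiling_consequences")
        return missing

    def retention_expiry(self, collected_on: date) -> date:
        """Date on which the stated retention period runs out."""
        return collected_on + timedelta(days=self.retention_days)


complete = ProcessingRecord(
    dataset="trial_cohort_a",
    legal_basis="scientific research exemption",  # placeholder text
    purpose="dose-response analysis",
    retention_days=365,
    recipients=("internal biostatistics team",),
    profiling=False,
)
incomplete = ProcessingRecord(
    dataset="wearable_feed",
    legal_basis="",
    purpose="exploratory signal detection",
    retention_days=90,
    recipients=(),
    profiling=True,
)
```

(Calling `incomplete.missing_fields()` flags the absent legal basis and the undocumented profiling consequences, the kind of gap a metadata-driven architecture can surface before a supervisory authority asks.)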

“Real-world data and real-world evidence, however, deserve a special mention in the context of data-driven pharma innovation.

“The potential of vast troves of connected device data or social media feeds to yield pharmaceutical insights needs to be tempered with GDPR considerations about individuals’ awareness of any further uses their data is put to and understanding of which organisations it will be shared with. In most situations connected users are not intentionally donating their data to research. Until now the consumer IoT has been relying on opaque consent terms or authorisations to share data with generically described “trusted third parties”. The GDPR is a turning of the screw on these loose permission terms and will not grandfather the legality of legacy data sets if they have old-style permissions. Any real-world data sources pharma research relies on must be scrutinised for their potential to be, quite simply, obtained illegally.”

Assuming we are already living in a perfect GDPR world, where every company is compliant and you can trust your contractors and data sources, what is the key innovation we should focus on in pharma?

Arun: “Targeted real world evidence (RWE) data could be used for:

  • Improving quality of trial design and patient recruitment.
  • Research outcomes that cannot be obtained from traditional clinical trials, such as patient adoption, secondary unintended reactions, placebo impact, etc.
  • Collaborating transparently with direct patients and recruits

“The identification of RWE data sets can be modeled on:

  • Patient claims and EMRs
  • Regulatory agency compliance
  • Affinity to clinical data sets
  • Statistical significance”


Nigel, you have strong views about this same issue as pharma analytics focuses too much on outcomes/findings. What are your gripes with RWE?

Nigel: “No one data source is the whole truth and real world data is of course intrinsically ‘messy’. With those caveats in mind, we can derive evidence and insights into the real world experience for patients, their carers, the healthcare provider, et al., but we should not be too definitive about that evidence and those insights until we see improvement in the quality of real world data.

“Clearly, this is problematic when the data is collected for the primary purpose of clinical management, not research, but it is in all our interests to raise the standard of data and evidence and this may be so in time with a patient-centric, patient-involved model.

“An example of innovation here is the utilisation of placebo data, which though not real-world data per se, can be seen as a hybrid between the rigor of protocol-driven RCT data, and real-world data, enabling research into e.g. natural history of disease in such patients evaluated more diligently over time within a protocol of observation, but not on an active agent.

“Within EMIF, two EFPIA partners, GSK and Janssen, collaborated on the evaluation of liver disease, presumed NAFLD/NASH in type 2 diabetics, via studies over a few years of observation and in thousands of placebo patients (a manuscript is being submitted to a relevant journal).

“Ultimately the pharma model in this respect is decidedly odd – we start in the real world, abstract from there into an increasingly artificial paradigm from discovery onwards, and then return to the real world in post-authorisation and marketing of a new agent, while perhaps not fully appreciating what is lost in that process.

“The incorporation of real world data/evidence needs to be considered therefore from discovery, through development and into post-authorisation, not just in that latter phase, while reflecting the patient journey, trajectory and treatment pathway. We also need to improve upon that abstraction, ensuring there are longitudinal phenotypic and genotypic cohorts, facilitating access to real world data and samples in discovery, study protocol feasibility, strategic site and participant selection for development, and then outcomes and pharmacovigilance in post-authorisation.”

How can the pharma sector prevent the GDPR from becoming a litigation industry in its own right? 

Chiara: “Great question!

“The short answer is education and direct engagement with your regulators and your membership organisations. Talk to them more than you talk to your lawyers!

“The longer answer is that the GDPR is a type of legal instrument that does not offer technical specification and leaves many determinations of harm to the person whose data we are using as risk-based evaluations. In practice, we have some open-ended clauses that require an objective methodology to assess these risks, but the various industry sectors are expected to come up with these objective methodologies themselves, possibly through sector-specific Codes of Conduct and standards.

“A case in point is repurposing of personal data.

“The basic rule is that such data needs to be collected for a specified purpose or purposes and the individual offering up their data needs to know what the purposes are.

“Now, if a pharma research company comes up with a novel use for that same data it has two choices: it can go back to the individual and explain the new purpose, or it can ascertain that the new purpose is “compatible with the original collection purpose”, record its reasoning for that determination and carry on its research using the repurposed data.

“It is not difficult to see that “compatibility of a new purpose with the original collection purpose” is one such open-ended clause.

“Throughout the GDPR we are not told how to go about making these determinations. We are told, however, that we must make them on the basis of objective criteria (not ones that we create on the spot but ones that our processes have identified from the start) and that we must assess any risks from the point of view of the individual, not the organisation.

“It is also not difficult to see that these open-ended clauses are much sought-after litigation “hooks”: ask legal teams to define “compatible purposes” and they’ll dine on it for months.

“The solution is not to rely on the courts to make that determination through case law, but to invest time in developing pharma research standards with industry colleagues in pre-competitive collaboration mode.

“What is an acceptable common industry understanding of “compatible purpose” in a diabetes drug trial? And who is best placed to answer that question? The courts? Law firms? Or the industry stakeholders themselves?

“Through the encouragement of the creation of Art. 40 Codes of Conduct, the GDPR hands over to the professions themselves the freedom and responsibility to shape the four corners of a level playing field and the ground rules of their own data handling ecosystems.

“The Regulators are keen to understand what works for each sector. It’s their job to encourage and oversee the creation of standards and Codes of Conduct that will turn all those open-ended, semantically vague clauses into workable frameworks.

“So, if you do not want to see a pharma GDPR litigation industry, talk to the Privacy Regulators and talk to your colleagues and professional bodies about standards.”

Chiara Rustici is an independent GDPR analyst, author of “Applying the GDPR: Privacy Rules for the Data Economy” and publisher of sector-specific GDPR reports. She is acting as GDPR subject matter specialist for the BCS law special interest group and has been leading the efforts to create a Code of Conduct for the eDiscovery profession at EDRM/Duke University since 2016.

Chiara Rustici can be contacted via her LinkedIn profile