The biopharmaceutical industry engages with UK health data in a variety of ways (see the examples in Box 6) and works with many different datasets, some of the larger of which are set out in Box 7.
In building opportunities to support more health research, HDR UK has identified three dimensions in which health datasets can be strengthened:
- Breadth/scale: the populations that health data cover must be sufficiently wide to enable and support clinical research.
- Depth: within each health dataset, the type and volume of information consistently captured about patients must become more diverse and varied.
- Follow-up duration: the datasets must routinely collect follow-up data wherever possible, so that changes in patient outcomes over time – and what might have led to them – can be investigated.
The UK’s data landscape is illustrated in Figure 2.
Figure 2. Examples of the three key parameters of UK health datasets – breadth/scale, depth and duration of follow-up
It is worth noting that different research projects at different stages of the new medicine development process require different mixes of the parameters of breadth, depth and follow-up duration.
We welcome the initiatives that are under way to make progress towards HDR UK’s ambitions. For example, the UK Government is building on its world-leading 100,000 Genomes Project with ambitions to sequence five million genomes within five years – supported by the multi-million-pound contribution that industry is making to UK Biobank.
Northern Ireland, Scotland and Wales have all contributed to the 100,000 Genomes Project and the devolved nations maintain their role in the UK-wide ambitions.
In addition to improving the quantity and quality of UK health data in these ways, however, steps must also be taken to improve its accessibility.
Figure 3. The views of small and medium-sized companies on the NHS data landscape 
Respondents were asked: Please could you indicate how much you agree/disagree with these statements about the NHS (net agreement percentages)
As the Life Sciences Industrial Strategy noted, and formal feedback from the biopharmaceutical industry has made clear (see Figure 3 and Box 8), although the UK possesses a number of data sources with significant potential for researchers, it does not provide the deep, near-real-time access to data across multiple care settings that would make its health data resources comparable to the best in the world, such as Flatiron (see Box 9).
Furthermore, steps need to be taken to facilitate the use of health data captured during routine clinical care by bodies such as NICE, to enable better information on the potential value of a new medicine in the NHS.
We see five characteristics of the health data landscape that need to be addressed to unlock the opportunity that exists, set out below – alongside corresponding priority action areas, where industry can work with Government to deliver the required improvements.
Box 6: Examples of the ways in which biopharmaceutical companies use health data for research
The Salford Lung Study was a community-based, real-world Phase III randomised controlled trial (RCT) for a new treatment for COPD and asthma, sponsored by GSK. The RCT made use of electronic patient records, which allowed patients to be monitored during ‘normal’ clinical practice in near-real-time but with much less intrusion into their lives than typical RCTs.13 The Salford Lung Study shows the potential of establishing virtual clinical trials using the UK’s health data resources.
Research undertaken by biopharmaceutical companies BioMarin and Alexion using health data gathered through the 100,000 Genomes Project has helped researchers better understand the clinical spectrum of symptoms that people living with rare genetic diseases show – and has also helped diagnose patients unknowingly living with rare genetic disorders.
The BSRBR-RA study is a unique collaboration between the University of Manchester, the British Society for Rheumatology and the biopharmaceutical industry. It tracks the progress of over 20,000 people with rheumatoid arthritis (RA) who have been prescribed biologic medicines (including biosimilars) and other targeted therapies.
AstraZeneca is working with NHS Scotland as part of its Global Genomics Initiative to make use of patients’ genetic information to develop new treatments.
Box 7: Examples of the UK’s larger health datasets
The Clinical Practice Research Datalink (CPRD) collects data on patients from a network of GP practices across the UK (including 16 million currently registered patients).
Wales’s Secure Anonymised Data Linkage (SAIL) Databank holds a wide range of de-identified health and care datasets, from primary care to outpatient data, which can be linked and accessed via a remote gateway for approved research projects.
The 100,000 Genomes Project combines whole genome sequencing data with medical records from around 85,000 people.
England’s Hospital Episode Statistics (HES) capture a wide range of clinical information on around 20 million patients admitted to hospital a year.
The UK Biobank has been collecting increasingly detailed data on 500,000 people since 2006.
England’s National Cancer Registration and Analytics Service (NCRAS) collects data on all cases of cancer that occur in people living in England.
The Scottish Cancer Registry has been collecting population-based information on cancer since 1958 and now holds over 1.8 million records.
The National Institute for Cardiovascular Outcomes Research (NICOR) collects clinical data on cardiovascular patients across the UK. It oversees the National Cardiac Audit Programme, which had over 380,000 patient records entered in 2016-17.
The Systemic Anti-Cancer Therapy (SACT) dataset has been collecting data on the use of systemic anti-cancer therapies across all NHS trusts in England since 2012.
Box 8: Feedback collected from the biopharmaceutical industry by Health Data Research UK 
An engagement process with biopharmaceutical industry representatives led by HDR UK in 2019, in order to inform the specification of the Digital Innovation Hubs (DIHs) programme, collected the following feedback on the UK health data landscape:
Time delays and unpredictability prevent UK data access for many companies: their priorities are to see transparent, predictable, quick access to data.
Companies most frequently request health data that can support trial recruitment, help demonstrate value, and understand and stratify disease.
Companies value health data services that assist with health data discovery, offer quick and predictable access to health data once discovered, provide data curation, and are underpinned by pre-approved contracts and models.
Gaps in the UK’s health data that companies want to see addressed are: direct linkage to secondary care data to understand treatment effectiveness in detail; quick assessments of patients presenting in each site for trial feasibility; and the ability to recruit patients in real time based on automated eligibility checks.
Box 9: Flatiron 
For maximum utility, cancer datasets need to capture each patient’s stage at diagnosis, every treatment cycle (including the specific treatments delivered) and each patient’s responses and outcomes. Few health datasets anywhere in the world capture this kind of detail.
US company Flatiron, which was bought by biopharmaceutical company Roche in 2018, created a unique dataset of around two million patients with cancer.
Flatiron’s value was generated not by the sheer volume of information in its database, but instead by the way in which each entry in its database was meticulously curated to develop a clinical research-grade dataset, in an enormously labour-intensive process. 
1. The data landscape is fragmented
The NHS is seen by many as a single, national organisation – but in practice this is not the case for health data.
Health data is collected by a large number of different organisations, including over 200 legally separate NHS trusts across England, autonomous research organisations (including UK Biobank), government-owned companies (such as Genomics England), disease registries (collecting information on, for example, patients with cancer, heart disease and diabetes) and national-level bodies (such as, in England, NHS Digital).
The devolved nature of the UK health system complicates this picture further, with each devolved nation maintaining separate legal and data governance structures for its health system. Each can decide its own approach to promoting its data and responding to requests from researchers, and the risk is that – as stated above – implementation of the OLS’s second guiding principle on health data exacerbates rather than alleviates this situation.
The following anecdotal examples illustrate this fragmentation:
- The UK might have been able to take part in a real-world study in type 2 diabetes (sponsored by a global company), but the available datasets were much smaller than those accessible through US administrative claims databases, which were able to provide data on around 700,000 patients.
- The UK had the opportunity to take part in a multi-country real-world study in inflammatory bowel disease (also sponsored by a global company). However, lack of central data access across the NHS in England meant there was information on too few patients per centre for the UK to be readily included.
We recognise that important steps are being taken to address this fragmentation: for example, the UK Health Data Research Alliance was launched in February 2019 to bring together the many organisations in the UK which hold health data – including academic institutions, NHS England, NHS Scotland, NHS Wales, NHS Digital, Genomics England, Public Health England, CPRD and others – and to bring about a consistent approach to data access for research.
However, membership of the Alliance does not require specific organisational commitments to data access, and it is not comprehensive: with 18 members as of December 2019, its membership does not include many relevant UK data custodians, including many disease registries and the majority of NHS organisations. The first priority action area is to address fragmentation.
Shared aim 1: Address the fragmentation in the UK health data landscape; create linkages to enhance scale and depth
|Proposed government action||Industry action|
|Establish (or nominate) a single NHS data access organisation (DAO) which can act on behalf of the whole NHS as a single counterparty to data access agreements. For example, the||Pay reasonable costs towards the running of the DAO as part of the data access fee.|
|Encourage membership of the UK Health Data Research Alliance, and work with HDR UK to standardise access and curation processes within the Alliance.||Provide resources for the co-production of standard commercial models which can be used by NHS organisations.|
2. Processes to access data are inefficient
As Boxes 10 and 11 make clear, the biopharmaceutical industry encounters processes to access data in the UK which can be slow, bureaucratic and unpredictable, which require multiple applications and agreements, and which are guided by data controllers who are under pressure to be risk averse.
These processes can take months and sometimes years to respond to requests from researchers for health data (see Box 10), frustrating the development both of new treatments and new technologies (see Box 11). Slow and inefficient processes also impede steps to improve the quality of the UK’s health data, since the longer the time between generating and accessing data (known as ‘latency’), the more difficult it is to make corrections while the clinical team is able to recall the case.
Where processes are more efficient elsewhere in the world, some elements of global new medicine development will be done there rather than in the UK. However, significant improvement could ensure that the UK can attract more R&D investment.
We hope that the Government’s plans to create a new ‘National Centre of Expertise’ in NHSX, to provide commercial and legal expertise to NHS organisations, and tools such as standard contracts and guidance, will help the NHS respond more efficiently to requests for data from researchers. In addition, we will continue to support efforts to improve digital and data-handling skills in the NHS as recommended in Dr Eric Topol’s 2019 review for Health Education England.
Shared aim 2: Increase the efficiency of the UK’s health data access processes
|Proposed government action||Industry action|
|Task the DAO with administering quick and predictable access processes.||Provide resources for the co-production of standard legal agreements to help facilitate commercial access to NHS data on fair terms, following the model of HIPAA BAAs.iii|
|Establish NHS’s Centre of Expertise in NHSX as rapidly as possible, and make standard contracting for data access simple and quick.||Jointly fund training for relevant industry and NHS colleagues on data governance.|
Box 10: Examples of inefficient data access processes 56
Delays in HES linkage to clinical data for a rare disease specialist centre, requiring further amendments to Confidentiality Advisory Group (CAG) and Health Research Authority approvals, led to an 18-month delay in a project for direct care supporting better disease detection and referrals.
A global contract research organisation (CRO) reported that it had agreed and executed a data access arrangement in one EU country in eight months, but that the equivalent access arrangement in the UK was still under discussion two years after the CRO had first sought the data.
In 2018, a UK SME looking for linked genetic and clinical data to validate a suspected target association and raise funds to develop a new drug found a relevant dataset within two weeks but, after an unexplained delay of three months while the university concerned started the contracting process, had to give up working with that dataset.
A global company wanted to access national data on outcomes related to current treatment pathways to support a submission to NICE but found there was no way to access data across the country. After discussions with a number of trusts, this eventually resulted in the company conducting a single-centre audit which itself took six months to complete.
Box 11: AI and the need for access to high-quality data
There is much interest in the promise of AI to improve healthcare decision-taking and improve efficiency.
On 8 August 2019, for example, the Prime Minister announced £250 million of investment to help the NHS become a world leader in its use.
However, AI tools require access to high-quality data to learn from, and companies investing in AI therefore invest significantly in accessing and improving data – for example:
A joint report by the Medicines Discovery Catapult and the BioIndustry Association found that 75% of spending by companies in AI is actually on the upstream (often unseen) activities of data access, curation and data labelling, and not algorithm development and improvement.
IBM has also reported that scientists developing AI technologies spend around 80% of their time finding, cleansing and organising data – rather than developing the algorithms which actually perform any analysis.
If the NHS’s £250 million investment in AI is appropriately allocated, therefore, at least £185 million of the investment may need to be spent on accessing and improving data.
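The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration only: it assumes the 75% and 80% shares quoted above can be applied directly to the whole £250 million, and the function and variable names are hypothetical. On that assumption the 75% share comes to roughly £187.5 million, which is consistent with the "at least £185 million" figure.

```python
# Illustrative arithmetic only. The shares come from the two reports cited
# above; applying them to the whole £250 million is an assumption.
INVESTMENT_GBP_M = 250.0


def data_spend(total_gbp_m: float, data_share: float) -> float:
    """Return the part of a budget (in £ million) that goes on data work."""
    if not 0.0 <= data_share <= 1.0:
        raise ValueError("data_share must be between 0 and 1")
    return total_gbp_m * data_share


# 75% of spending (Medicines Discovery Catapult / BioIndustry Association)
spend_at_75 = data_spend(INVESTMENT_GBP_M, 0.75)
# 80% of scientists' time (IBM), used here as a rough proxy for cost
spend_at_80 = data_spend(INVESTMENT_GBP_M, 0.80)

print(f"75% share: £{spend_at_75:.1f}m; 80% share: £{spend_at_80:.1f}m")
```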
3. The NHS is not sufficiently digitised to allow data to be linked and accessed readily
A particular issue for clinicians and researchers seeking to work with UK health data is that much of the data is unlinked. For example, one global biopharmaceutical company seeking health data to support its cancer research found that the UK’s SACT and cancer registry datasets could not help because it was not possible to follow a patient between primary and secondary care.
Although commitments have been made to a ‘paperless NHS’ in England – which would allow data on an individual patient to be linked seamlessly across care settings – the timetable for delivery has repeatedly slipped: the NHS in England was first challenged to ‘go paperless’ by 2018, and then by 2020; and the NHS Long Term Plan’s current target is 2024.
We hope that, in time, the NHS’s digital infrastructure across the UK will be sufficiently mature to allow the easy use of health data for the purposes identified in Box 8 – for example, trial recruitment, demonstrating value, and understanding and stratifying disease. In relation to trial recruitment, this will create the conditions to allow further trials of the kind exemplified by the Salford Lung Study to be located in the UK. However, in order for the UK to secure such clinical trials investment, researchers must have the confidence that the UK’s digital infrastructure will be able to identify patients who might benefit from a treatment under development in near-real time, secure their consent for participation, enrol them into a trial, and report on results to the standards that traditional clinical trials would offer.
We recognise that the UK is taking important and positive steps to help digitise and link the UK’s health datasets, and achieve this goal:
- NHS England is supporting ‘global digital exemplars (GDEs)’ in secondary care and ‘local health and care record exemplars (LHCREs)’ to join together health and care records to test the most efficient approaches to achieving its aim of a ‘paperless’ NHS.
- HDR UK’s ‘Sprint Exemplar Projects’ aim to test technologies and methodologies to enable the utilisation of linked datasets, and include a project led by the University of Leicester to make patient datasets safely linkable and discoverable so that a complete patient profile can be readily located.
- Seven ‘Health Data Research Hubs’ were established in September 2019, and offer the prospect of creating rich disease-focused datasets that will enable new clinical trials and real-world evidence studies to be undertaken in the UK.
Whilst we recognise the progress that each of these initiatives will deliver, it is important that timetables for delivery do not slip, and that new initiatives are not created on top of older initiatives, which would further fragment the health data landscape.
While industry continues to support the digitisation of the NHS, the issues of linkage and accessibility are addressed through the first two action areas. The third priority action area for industry is specifically to harness health data to support clinical development of new medicines:
Shared aim 3: Harness UK health data specifically to support the efficient design, feasibility, recruitment and conduct of the full range of clinical trials (from Phase II through to real-world studies)
|Proposed government action||Industry action|
|Support the development of timely access to health data which helps the industry rapidly explore the feasibility of conducting clinical trials in the UK.||Invest in more commercial clinical trials, and a greater share of patients in multi-centre trials in the UK.|
|Develop processes to transition rapidly from feasibility to recruitment of patients.||Work with the NHS, NIHR and Health Data Research Hubs to enable and co-fund near-real-time recruitment processes in key therapeutic areas aligned with the NHS Long Term Plan.|
|Develop processes to encourage patients to be given the opportunity to participate in clinical trials using their routinely collected health data.|
4. The quality and accessibility of the UK’s health data resources are opaque
Researchers engaged in biopharmaceutical R&D frequently struggle to understand the quality and accessibility of the UK’s health data, encountering particular challenges in answering the following common questions:
- Consent: does the patient consent cover the intended use of the health data?
- Permissions: am I legally able to use the dataset in the way I intend?
- Cost/terms: what are the costs or terms of accessing the dataset?
- Time: how long will it take me to get permission and then practically access the dataset?
As a result, researchers look elsewhere in the world for health data to meet their research needs, rather than work in a UK health data environment where the discovery of relevant health datasets and information about their quality is a challenge (see Box 12).
We are nonetheless encouraged by a number of initiatives:
- In Scotland, the electronic Data Research and Innovation Service (eDRIS) offers an effective, single point of contact to assist researchers with their data access questions.
- The Health Data Research Innovation Gateway was established in January 2020, with the aim of allowing data from the Health Data Research Alliance members to be discovered quickly.
- The new National Centre of Expertise being established in NHSX is due to set clear and robust standards on transparency and reporting.
Therefore, the fourth priority action area is to make it easy for researchers to find out what datasets are available, and what the quality and utility of each dataset is.
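One way to picture what such transparency could mean in practice is a directory entry that records, for each dataset, an answer to the four questions listed above (consent, permissions, cost/terms and time). The sketch below is purely illustrative: the class name, field names and example values are hypothetical and do not describe the Gateway's actual metadata.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetAccessRecord:
    """Hypothetical directory entry; None means the answer is not published."""

    name: str
    consent_covers_intended_use: Optional[bool] = None  # Consent
    use_legally_permitted: Optional[bool] = None         # Permissions
    cost_or_terms: Optional[str] = None                  # Cost/terms
    expected_access_days: Optional[int] = None           # Time

    def unanswered(self) -> List[str]:
        """List which of the four researcher questions remain open."""
        answers = {
            "consent": self.consent_covers_intended_use,
            "permissions": self.use_legally_permitted,
            "cost/terms": self.cost_or_terms,
            "time": self.expected_access_days,
        }
        return [question for question, value in answers.items() if value is None]


# Illustrative entry only; the dataset name and values are invented.
entry = DatasetAccessRecord(
    name="Example registry",
    consent_covers_intended_use=True,
    expected_access_days=90,
)
print(entry.unanswered())
```

A record like this makes gaps visible: any question returned by `unanswered()` is one a researcher would still have to resolve by contacting the data custodian.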
Shared aim 4: Enhance the transparency of the quality and accessibility of the UK’s health data resources
|Proposed government action||Industry action|
|Monitor and report on data applications, access and turnaround times (TATs).||Capture and collate user experience and TAT from biopharmaceutical companies, through the ABPI.|
|Ensure the Health Data Research Innovation Gateway’s metadata provides information on the quality and accessibility of the data in available datasets (for example through a directory).||Fund and deliver consent codification for priority datasets.|
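As a rough illustration of how turnaround times might be monitored and reported, the sketch below computes a median TAT from pairs of application and access dates. It is a minimal sketch under stated assumptions: the function names are hypothetical, the choice of the median as the summary measure is an assumption, and the sample dates are invented rather than taken from any real application.

```python
from datetime import date
from statistics import median
from typing import List, Tuple


def turnaround_days(submitted: date, granted: date) -> int:
    """Days between a data access application and access being granted."""
    if granted < submitted:
        raise ValueError("access cannot be granted before the application date")
    return (granted - submitted).days


def median_turnaround(applications: List[Tuple[date, date]]) -> float:
    """Median turnaround across (submitted, granted) date pairs."""
    if not applications:
        raise ValueError("no applications to summarise")
    return median(turnaround_days(s, g) for s, g in applications)


# Invented dates for illustration; not drawn from any real application.
sample = [
    (date(2019, 1, 7), date(2019, 3, 8)),
    (date(2019, 2, 1), date(2019, 8, 1)),
    (date(2019, 4, 15), date(2019, 5, 15)),
]
print(median_turnaround(sample))
```

A median is used here because a small number of very long delays, such as the multi-year cases described in Box 10, would otherwise dominate an average.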
Box 12: Examples of the challenges in discovering UK health data 69
health data was incorrectly informed that comprehensive, national health data was available. However, after six months it became clear that the data available was incomplete and low quality, particularly regarding prescription data. The delay in accessing the data meant that it was nearly impossible to have the quality issues addressed.
In 2019, a global CRO requested data on the number of specific patients attending UK hospitals so that the UK could be included as a potential site for a global clinical trial. However, the data took so long to arrive that the UK was not included as a possible location.
5. Regulatory and reimbursement processes do not recognise the potential that developments in health data can offer
Many of the challenges that the biopharmaceutical industry encounters in securing approval of new medicines for use in healthcare systems – including the NHS – relate to the methods used by regulatory and reimbursement authorities to assess them. For example:
- The clinical endpoints required by regulatory authorities are naturally based upon historical experience in clinical trials of medicines with established measures of efficacy.
As innovation proceeds, new mechanisms with new measures and endpoints are discovered and novel designs for clinical trials are developed to assess these. The adoption of these new approaches in the regulatory process needs to be accelerated through early dialogue.
- NICE’s methods of evaluation rely mainly on the use of research findings in academic journals (or syntheses of these research findings), despite the growing possibility of using broader types of data – including data captured from the NHS during routine clinical care – to inform its assessments. This gap is itself recognised by NICE, which launched a consultation on making use of broader types of data in June 2019.
- Current, international approaches to the regulation of clinical trials are based on standards agreed by the International Council for Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) in 1995, before health data and healthcare institutions became digitised.
These characteristics pose particular challenges for the new categories of treatments which the industry is researching and developing. For example:
- The efficacy of a variety of new treatments for cancer may need to be tested on multiple tumour sites, leading to a need for novel trial designs (for example ‘basket’ or ‘umbrella’ trials).
- The need to address smaller patient populations due to biomarker-led stratification, personalised treatments or rare diseases also requires novel designs of clinical trials.
- Clear demonstration of the long-term value of emerging curative technologies such as cell and gene therapies is challenging and may require new approaches including long-term follow-up.
We therefore hope to see that the advances being made in biopharmaceutical R&D and healthcare systems’ digital capabilities are accompanied by advances in the approaches taken by regulators and reimbursement authorities.
This will enable the use of data captured digitally, allowing the use of real-world evidence to support the approval and adoption of new treatments. As the Life Sciences Industrial Strategy made clear, this will improve the speed and efficiency of regulatory studies, increase the cost effectiveness of trials, and reduce the cost of developing medicines.
The UK share of global biopharmaceutical R&D spending is, at 7%, relatively small – but the UK has an excellent reputation for regulation and health economic assessment. The UK could lead the way in the development of a regulatory and reimbursement environment which harnesses the potential improvements that health data can deliver – as the MHRA and NICE are already considering.
The rapid evolution of clinical trial designs and endpoints, coupled with the potential for health data to be accessed and analysed in near-real time, suggests that regulatory and health technology assessment authorities will in future have a wider variety of data to consider – some of it eventually enabling more precise estimation of both the clinical and economic value of an intervention.
At the same time, novel treatments that promise to provide long-term remission or even cure after a single (or only a few) administrations will need specific programmes of long-term data collection, to demonstrate or confirm their value and perhaps to underpin new approaches to reimbursement (for example outcomes-based payments).
Therefore, industry’s fifth priority area is to work with the authorities to understand new types of health data, how they can be factored into evaluations, and how standards can be developed.
Shared aim 5: Broaden the data considered to help demonstrate value
|Proposed government action||Industry action|
|Task the MHRA, NICE and the NHS with developing clear guidance on the increasing variety of data they use to support their regulatory, value assessment and reimbursement processes.||Work with the MHRA, NICE and the NHS to embed new UK data standards for approval, valuation and|
|Work towards developing standards for different types and sizes of patient population, and a wider variety of outcome measures.||Support the dialogue about new approaches to payment models and the generation of data that can underpin them.|
|Lead the dialogue about new approaches to payment contracts and the data required to support them.|