What is the UK Biobank, and why is it so important?
In late July of this year, the scientific community came alive with the news that the UK Biobank had opened access to a huge database of genetic information. The Biobank is a long-running study that combines genetic analysis with physical and health-related measurements from over 500,000 British residents. It is one of the largest genetic databases ever assembled, and it could lead to revolutionary discoveries in fields related to human health.
Why are studies like UK Biobank so important? They enable researchers to identify genetic variants that are associated with particular traits (a disease, body weight, or hair color, for example), which can lead to more accurate insights based on an individual’s DNA. In the case of the Biobank, this groundbreaking dataset is open to any bona fide researcher. That is a very rare level of accessibility for a study of this size, and it makes it possible for researchers around the world to join the search for these new insights.
The UK Biobank began enrolling men and women between the ages of 40 and 69 in 2006, and its genetic analysis examined over 800,000 locations in each participant’s genome. Enrollment continued until 2010, at which point researchers began to aggregate the collected data. While enrollment is no longer open, researchers continue to follow up with participants to obtain long-term health information and may collect more genetic data as technology improves. Participants have provided a range of samples, including blood, saliva, and urine; some also agreed to body scans so that researchers could monitor heart, kidney, and brain function.
Researchers hope to learn how genetics influences our lives by studying a wide array of data points, ranging from ice cream intake to more serious health concerns like liver disease and cancer. This information lets researchers look for correlations between a physical trait and a genetic variant.
To keep this information anonymous and protect participants’ privacy, the project enlisted multiple independent ethics councils to review its research practices and establish guidelines for data management. The project’s ethical guidelines include a commitment to “vigorously opposing law enforcement requests” for participant information and to ensuring that everyone takes part of their own volition.
There are several similar databases that have been released or are currently recruiting participants, but very few include such a large number of people and such a broad range of traits. For instance, the Framingham Heart Study is a long-running study that focuses on heart disease, including its links to genetics. Although the study is extremely informative, its tight focus limits researchers’ ability to use the data to explore other conditions. Similar large-scale studies are currently underway in the US, including the All of Us project, which aims to collect a broad set of health and genetic data from more than 1 million people. Other databases, like the China Kadoorie Biobank, focus on different ancestral populations, so it will be interesting to see how these large data sets complement each other and advance the fields of population genomics and human health.
What’s so exciting about a larger number of samples? Looking at more people means we can find more subtle connections between a person’s genes and different traits or diseases. Some traits, like eye and hair color, are controlled by a relatively small number of genes with big effects. Others, like height, are influenced by hundreds (or maybe thousands!) of genes, each contributing a small amount. Based on comparisons of identical twins with non-identical siblings, about 80% of the variation in height between people is estimated to come from differences in their DNA. However, the specific spots in the genome identified so far account for only about a quarter of that 80%. We need very large studies, like the UK Biobank, to identify the remaining pieces. Scientifically, finding even a little more than was known before is very exciting.
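For a rough sense of why sample size matters so much, here is a back-of-the-envelope sketch that estimates how many people a study needs to reliably detect one genetic variant that explains a given fraction of the variation in a trait. It is not the UK Biobank’s own method. It assumes a simple test of one variant at a time, the conventional genome-wide significance threshold of p < 5 × 10⁻⁸, and 80% power (an 80% chance of detecting a real effect). The function name required_sample_size is hypothetical.

```python
import math
from statistics import NormalDist

# Assumptions for this sketch (illustrative, not taken from any specific study):
GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold
TARGET_POWER = 0.80       # 80% chance of detecting a real effect


def required_sample_size(variance_explained: float,
                         alpha: float = GENOME_WIDE_ALPHA,
                         power: float = TARGET_POWER) -> int:
    """Approximate number of participants needed to detect one variant.

    `variance_explained` is the fraction of trait variation the variant
    accounts for (e.g. 0.001 for 0.1%). Uses the standard approximation for
    a single-variant test: the test's signal strength (non-centrality) is
    about N * q2 / (1 - q2), and it must reach (z_{1-alpha/2} + z_power)**2.
    """
    if not 0 < variance_explained < 1:
        raise ValueError("variance_explained must be between 0 and 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance cutoff
    z_power = std_normal.inv_cdf(power)          # extra margin for the target power
    needed_signal = (z_alpha + z_power) ** 2
    q2 = variance_explained
    # Solve N * q2 / (1 - q2) >= needed_signal for N, rounding up.
    return math.ceil(needed_signal * (1 - q2) / q2)


if __name__ == "__main__":
    for q2 in (0.01, 0.001, 0.0001):
        print(f"variant explaining {q2:.2%} of variation: "
              f"about {required_sample_size(q2):,} people")
```

Under these assumptions, a variant that explains 1% of the variation needs roughly 4,000 people, one that explains 0.1% needs roughly 40,000, and one that explains 0.01% needs roughly 400,000. Because the required number of people grows as each variant’s effect shrinks, only studies with hundreds of thousands of participants can pick up the weakest signals.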
As with all science, there are limitations to the data. One limitation is self-selection among study participants. That is to say, healthier people may have been more likely to volunteer than less healthy people, which would lead to a lower frequency of diseases within the participant cohort than in the general British population. Another limitation is the ancestral makeup of the study participants, who are overwhelmingly white (94%) and of northern European descent. This means that findings from this data set may not apply to people who do not share that European ancestry. Additionally, some of the information collected in the study relies on self-reporting, such as diet and previous mental health diagnoses. Self-reported data, unlike standardized physical measurements, can be less accurate because different people might interpret a question differently or might misremember things. The study’s designers wrote their surveys with this in mind to limit ambiguity, but there are still biases and errors in self-reported data that are difficult to correct for.
Still, the UK Biobank data release is a truly exciting occasion in the scientific community that could have a wide impact on healthcare and our understanding of genetics. Before July’s release, the Biobank had made available data on 150,000 participants, which have been used in more than 100 research articles to date. These articles contribute to our understanding of genetics in many areas, such as sleep, heart disease, obesity, and liver disease. The recent release of the full 500,000-participant database will likely serve as the basis for many, many more papers.
It’s not just scientists who will benefit from the UK Biobank, though. As new products come to the Helix Store over time, it’s likely that some of the new insights available to you will trace back to discoveries made with this database. Nearly 15 years after it was first sequenced, the human genome is still full of amazing secrets, and studies like this one aim to unlock them, one participant at a time.