By Olivia Marx
Did you know that there are more genetic differences among populations within Africa than there are between Africans and Eurasians?1 Despite the worldwide diversity of genetic sequences, most of the thousands of genomes recorded so far come from people of European background. Even with the decades of time and effort that scientists have devoted to developing a comprehensive reference genome, interpreting genomic data remains a major challenge, in part because of a glaring disparity in the amount of data from people of non-European descent. Although genetics has the power to answer age-old human questions, a major limitation to genetic studies is a lack of information from a diverse range of populations. This gap in knowledge leaves researchers with only a partial picture of human genetics.
Though researchers are working to gather more genomic data, whole genome sequencing costs over $1,000 per genome.7 Because all humans share about 99.9% of the same DNA, researchers often focus on the parts of the genome that contain single nucleotide polymorphisms (SNPs), which can be sequenced for around $50. SNPs are small differences that may not necessarily affect an individual physically, but can still be passed down to offspring in a population.2 When genes are passed down from parent to child, an event called crossing over occurs, in which regions of a parent's paired chromosomes are exchanged and combined into a single chromosome. Because of this crossing over, genes and genetic polymorphisms that are close together tend to get passed down together, while genes that are farther apart tend to separate randomly. Sections of the chromosome that tend to get passed down together are called haplotype blocks. Because a polymorphism within a haplotype block acts as a marker for the rest of the block, analyzing it is a convenient way to estimate statistically how likely an individual with that polymorphism is to have a trait. Computational tools are constantly advancing to accommodate the huge amount of human genetic data, and a common method is to focus on genetic markers or polymorphisms, which tend to be specific to ethnic groups.3 In fact, companies like 23andMe and Ancestry.com use SNPs to determine an individual’s ethnic makeup and also to predict physical characteristics and disease risk.4
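To make the idea of polymorphisms "passed down together" concrete, here is a minimal sketch of one common way to quantify it: the r² measure of linkage disequilibrium between two SNPs. The article does not name this measure, so treat it as an illustrative choice. The function name `r_squared`, the 0/1 allele coding, and the toy samples are all assumptions invented for the example, not real data.

```python
def r_squared(haplotypes):
    """Estimate r^2 between two SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, one per
    sampled chromosome, with alleles coded 0 or 1 (an assumed encoding).
    """
    n = len(haplotypes)
    # Frequency of allele 1 at each SNP.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Frequency of chromosomes carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D: how far the observed pairing departs from independent inheritance.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must vary within the sample")
    return d * d / denom


# Toy samples (made up): in `linked`, the two alleles always travel
# together, as within a haplotype block; in `unlinked`, every
# combination is equally common.
linked = [(1, 1)] * 4 + [(0, 0)] * 4
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2

print(r_squared(linked))    # 1.0
print(r_squared(unlinked))  # 0.0
```

A value near 1 means that knowing one SNP tells you almost everything about the other, which is why a single marker can stand in for a whole block. A value near 0 means the two are inherited independently.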
Haplotype blocks are a direct result of ancestry, meaning that people with different ethnic backgrounds will have very different haplotype blocks with differing genetic markers. Since the majority of available data is from individuals with European backgrounds, researchers mainly study European genomic data to get the most accurate results. Thus, most genomic research and the most significant results are limited to European studies.
Why is it that most of our human genetics data comes from less than 20% of the world’s population?5 Some reasons include the cost of genetic testing and social barriers, such as minority populations’ limited knowledge of, and trust in, genetic testing. This mistrust is not unfounded, as genetic testing does have the potential to bring to light predispositions that could lead to insurance and job discrimination. Furthermore, studies show that doctors who tend to serve minorities are less likely to recommend a genetic test.6 The disparity in genetic testing is a problem that has implications beyond the accuracy of a 23andMe ancestry composition. For example, clinical treatments are gearing up to become more personalized, so that a genetic test can reveal a person’s disease susceptibility and even predict the best drug to prescribe. In-depth genetic analyses are only possible when a strong reference is available with genetic data from a group of similar background.6 The disparity in data availability puts a huge limitation on scientific development, leaving nearly 80% of the world underserved.
The first human genome was sequenced nearly 20 years ago, but it is clear that we still have a long way to go in fully understanding genetics. Methods using machine learning are constantly being developed to improve genetic analysis so that perhaps one day, a lack of diverse data will not be a key limitation in the field. Currently, the main way to improve genetic studies is to obtain more data on different populations. To address the systemic issue of disparities in genetic testing, we can encourage education about genetic testing in underrepresented communities and include genetic testing in basic health insurance plans. Only once we have scientific references that accurately reflect the world can we have the most complete picture of the human condition and use science to improve the lives of everyone.