DNA Day 2023

DNA Day 2023: A Call for Diverse Genomic Datasets – Time for Canada to Step Up

By: Naveed Aziz, CEO, CGEn

Diverse, large-scale health genomic resources are key to realizing the true potential of research and to fully exploiting the power of new health technologies based on artificial intelligence (AI) to improve health for all. The data these resources hold is used to identify and study genetic variations associated with disease, which in turn can enable clinicians to provide personalized care to patients based on their genetic makeup, an approach known as personalized medicine.

However, the lack of diversity in population genomic data has been a systemic challenge; the research community must work collaboratively with underserved communities to integrate inclusion and diversity in all aspects of study designs to advance health equity.

Until recently, genomic datasets have been largely limited to individuals of European descent, leaving out other ancestries and populations. This lack of diversity can have serious implications for downstream health research and care. For example, treatments and therapies developed for one population may not be as effective across all populations, while groups underrepresented in the data are potentially missing out on tailored care. All of this compounds existing, socially determined health care disparities.

In Canada, researchers currently lack access to a large-scale human genomic resource that is fully representative of our population. While Canadian investigators can access data from the UK Biobank, 94% of its participants self-identify as White British or other White background. The United States’ AllofUs initiative data is more diverse, but it is currently only available to US-based researchers.

Canada is a country of immense diversity, composed of people from a variety of backgrounds, including Indigenous peoples, immigrants, and refugees. In fact, the 2021 Census reported more than 450 ethnic and cultural origins, 200 places of birth, 100 religions and 450 languages. The ability to research Canadian population-level data would provide a unique opportunity to study the effects of variations in genetic backgrounds on research outcomes and provide insights into the disparities arising from diverse cultural and social contexts.

Recently, CGEn’s HostSeq Initiative, funded through Genome Canada’s CanCOGeN network, built a national databank that includes the genomes of over 10,000 Canadian residents impacted by COVID-19 along with in-depth clinical data. HostSeq demonstrated that Canada is capable of generating human genomic data at scale and produced a blueprint for genomic health data sharing, analysis and access.

Through community partnerships and inclusion of diverse groups within study teams, Canadian researchers have begun to make inroads with previously underrepresented populations to engage in genomics and other research. These partnerships must be built upon when considering a Canadian genomic data generation project at a population scale. The AllofUs initiative, as an example, has prioritized diversity and inclusion with dedicated participant engagement teams working within communities to enrol participants from previously underrepresented groups.

We now have the tools to organize and mine genomic data at scale. AI-based tools are being developed and used across sectors – including health care. In the context of large-scale genomic and health data, AI can help maximize its use and impact by providing insights that would otherwise be difficult to uncover. For example, predictive models can be developed for the identification of individuals at risk for certain diseases or conditions, or to inform personalized treatments based on an individual’s genetic profile, including identifying new therapeutic targets and drug candidates.

Canada must ensure it has access to the necessary data to support AI research and maintain its competitive edge in this area. The accuracy of AI models depends on the data used to train them, so producing AI-based genomics tools that are effective across a population requires large-scale, high-quality and diverse data resources. At the same time, even the best AI models can be difficult to interpret and explain, and further research into the ethical and legal implications of using AI in genomics research is greatly needed to ensure our health care systems are equipped to properly and equitably implement them.

The future of genomics in personalized medicine is bright and full of possibilities for Canada. Genomic testing has already revolutionized the diagnosis and treatment of many diseases, and it will become even more valuable in health care if supported by research based on large-scale diverse data and other technology developments. The resulting treatments, more precise, targeted, and effective, will lead to better patient outcomes and to health care system efficiencies and savings.

In the last five years, the federal government has invested in CGEn, building and supporting the large-scale infrastructure required to produce health data at scale. However, Canada is lagging behind other countries in capturing its genomic diversity due to a lack of funding and resources dedicated to the collection and analysis of genomic health data at a population scale. We must build a foundation of diverse and equitable datasets and research that is inclusive and representative of the diversity within the Canadian population. I urge all of us to take action now to prioritize the ethical, legal, and social implications of genomics in health care and to work to ensure equitable access to personalized medicine for all patients.

Finally, we must collaborate across different fields and sectors to bring together the expertise and resources needed to advance personalized medicine in Canada. This includes healthcare providers, researchers, policymakers, industry leaders, patient advocates, representatives from equity-deserving groups and other stakeholders. By working together, we can unlock the full potential of genomics to improve health care and provide better outcomes for patients. On DNA Day, let us take this call to action to help create a future where personalized medicine is expected, and every Canadian receives the best possible care based on their unique genetic makeup.

DNA Day: CGEn Celebrates 10,000 Human Genomes in the Canadian HostSeq Databank

April 25, 2022 – CGEn, Canada’s national platform for genome sequencing and analysis, and its partners are marking World DNA Day by celebrating Canada’s landmark human genome sequencing initiative called HostSeq. Led by CGEn, the HostSeq databank will include the whole genomes (the full set of DNA) of 10,000 Canadians (“Hosts”) with the aim of helping to understand the genomic architecture of the host response to COVID-19. The HostSeq databank also includes linked standardized clinical information collected at multiple clinical sites across Canada over the past two years of the pandemic.

“The need for a national genomics databank grew very quickly early on in the pandemic. As soon as we started to observe differences in disease manifestation and symptoms in people infected with the SARS-CoV-2 virus, we realized that there was an urgent need to collect and analyze population-wide host genetic data,” explains Dr. Naveed Aziz, Chair of the HostSeq Implementation Committee and Chief Executive Officer, CGEn. “The Government of Canada’s prior investment in CGEn allowed us to deploy our capacity to sequence genetic information of Canadians who were experiencing vastly different health outcomes in response to infection by the virus.”

The national HostSeq initiative, funded by the federal government through Genome Canada’s Canadian COVID-19 Genomics Network (CanCOGeN), is accessible to Canadian and international researchers to help study the health outcomes of those affected by the virus and to identify the genetic contributors to disease severity. This first-of-its-kind Canadian genomics databank may help to better diagnose and treat those predisposed to severe illness, backed by their genomic data.

Owing to a concerted effort led by CGEn and its team, 14 clinical and research studies across the country were brought together to contribute to the databank that includes 10,000 study participants who have consented to the use of their genomic and clinical data for future research. The HostSeq databank will not only help to answer current research questions related to the virus, but will also serve as a resource when responding to future infections and preparing for future pandemics.

“We are thrilled to have a population-based whole genome sequencing effort in Canada,” says Dr. Lisa Strug, Director of the University of Toronto’s Data Sciences Institute and the Ontario regional Centre of the Canadian Statistical Sciences Institute; Senior Scientist, Genetics and Genome Biology, and Associate Director, The Centre for Applied Genomics, The Hospital for Sick Children (SickKids). “Many countries have whole genome sequencing cohorts that they have made available and have had an important impact in health research broadly. The HostSeq databank will provide the Canadian perspective, and reflect data from individuals living in Canada,” says Strug, who is part of the HostSeq Implementation Committee and Chair of the Genetic Epidemiology Committee.

The COVID-19 Host Genome Sequencing project (HostSeq) is Canada’s largest national genomic databank to date, containing genome sequences linked to medical and clinical data from 10,000 individuals affected by COVID-19.

The HostSeq databank gives Canadian scientists access to a rich dataset to help analyze and identify genetic determinants of susceptibility, severity and outcomes of COVID-19. It will also enable researchers to investigate population-level risks for many other diseases, including the potential to deliver new biomarkers that support prediction of risk, and novel therapeutic strategies.

This DNA Day, there are countless scientific advancements to celebrate, and Canadians can count the HostSeq databank as a major step in how important population-level data can be collected and shared to help advance research and health care in Canada.

“The delivery of the HostSeq project would not have been possible without the effort and commitment of its Implementation Committee, whose tireless efforts since the beginning of the project have allowed CGEn to play a key role in Canada’s response to the COVID-19 pandemic. The HostSeq project has proven that large-scale genomics is a present reality in Canada. The time is now for all partners and stakeholders to come together and build on the years of investment and effort towards a Canadian ecosystem where research is enabled by large datasets,” says Aziz.


Linked Resources

HostSeq: A Canadian Whole Genome Sequencing and Clinical Data Resource by Dr. Lisa Strug et al.

Read about the importance of Big Data in enabling data-driven Canadian research and discovery in Dr. Naveed Aziz’s latest blog post.

To apply for HostSeq data access, please visit the CGEn website.

About CGEn

CGEn is a federally funded national platform for genome sequencing and analysis. Established in 2014, CGEn employs over 200 staff and is funded primarily by the Canada Foundation for Innovation (CFI) through its Major Science Initiatives Fund (MSI), leveraging investments from Genome Canada and other stakeholders. CGEn operates as an integrated national platform with nodes in Toronto (The Centre for Applied Genomics at The Hospital for Sick Children), Montréal (McGill Genome Centre at McGill University) and Vancouver (Canada’s Michael Smith Genome Sciences Centre), providing genomic services, including genome sequencing and analysis, that enable research in agriculture, forestry, fishery, the environment, health sciences, and many other disciplines of interest to Canadians.