Blue Whale genomics

Whale Tales: Blue Whale Genomes, Resilience to Genetic Bottleneck, and Interbreeding Insights

CGEn sat down with Mark Engstrom, Associate Professor at the University of Toronto and Deputy Director, Collections & Research at the Royal Ontario Museum (ROM), and Oliver Haddrath, Research Technician at the ROM, to go behind the science, with a special focus on the genomics work in their recent research paper that demystifies the blue whale genome. This research garnered significant interest, including coverage in a New York Times article.

Image: North Atlantic blue whale, source:

Blue whales, the largest animals on Earth, continue to captivate the scientific community with their enigmatic behaviours and ecological importance. Although a century of industrial whaling decimated their populations, new genomic research is painting a surprisingly resilient picture of these majestic creatures.

“One of the challenges early on was obtaining high-quality DNA from decomposed whale tissues.” – Oliver Haddrath

The genesis of this pioneering study traces back to a poignant event in 2014, when nine blue whales perished after being trapped by ice near Newfoundland, Canada. Engstrom’s team was focused on whale conservation in Canada, particularly on the East Coast, and sought to model new data. They aimed to map the blue whale genome, for which no reference existed. Engstrom’s team used samples from the stranded whales and 26 others to conduct genomic analysis of the North Atlantic population. “One of the challenges early on was obtaining high-quality DNA from decomposed whale tissues. We finally devised strategies such as focusing on submerged parts of the remains for high-quality tissue,” Haddrath says.

Working with long-standing collaborators at CGEn-Toronto (The Centre for Applied Genomics (TCAG) at The Hospital for Sick Children), the team sequenced the samples using a mix of long- and short-read technologies, partially supported through Canseq150, a CGEn and partners-driven initiative to sequence 150 new genomes for Canada’s 150th birthday.

Engstrom, Haddrath, and their team faced the challenging task of assembling the genome of the blue whale. Supported by TCAG, they chose a de novo genome assembly approach. This method allowed them to build the genome without relying on a ‘template genome’ from a closely related species, minimizing the risk of missing unique blue whale genome features. Long-read sequencing, though less accurate and requiring more specialized infrastructure (PacBio Sequel), was essential for producing large sequence fragments that could be pieced together more easily. Complementary short-read sequencing produced more precise sequences to refine the constructed genome. To help researchers in the community, the authors made the blue whale genome publicly accessible.
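
To make the assembly idea above concrete, here is a minimal toy sketch in Python. It is not the pipeline the authors or TCAG used. It greedily merges reads by their longest suffix-to-prefix overlap, which hints at why longer reads are easier to piece together. The function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the example reads are illustrative assumptions; production assemblers rely on overlap or de Bruijn graphs and explicit error correction.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no remaining pair overlaps by at least `min_len` bases and
    returns whatever contigs are left (possibly more than one).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the two sequences, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Hypothetical error-free reads covering a made-up 15-base sequence.
    toy_reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
    print(greedy_assemble(toy_reads))
```

In this toy setting, reads that share only a base or two cannot be joined reliably, so shorter fragments tend to leave the sequence split into several contigs. That is the intuition behind using long reads for the scaffold and more accurate short reads for polishing.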

Building the blue whale genome might have been a complete research project in itself, but for the authors it was the starting point for answering their questions. The collaborative effort yielded unprecedented findings, challenging conventional wisdom and shedding light on hidden aspects of blue whale demographics. Despite the severe impacts of historical whaling, genomic analysis found no signs of a genetic bottleneck, indicating the resilience of blue whales, possibly due to their slow reproductive rate, which offers hope for future survival. The study also revealed unexpected patterns of gene flow, contradicting traditional notions of east-west population divisions: genetic evidence shows extensive intermingling in one direction, west to east, driven by ocean currents carrying whales and their prey, krill, across vast distances. Additionally, the research uncovered evidence of interspecies mating between blue whales and fin whales, tracing approximately 3.5 percent of the blue whale genome back to fin whales. This challenges previous beliefs about hybrid sterility and raises questions about the functional significance of these genes in blue whale physiology.

“You can’t conserve what you don’t know. This is why this information is so important.” – Mark Engstrom

The implications of these findings extend beyond scientific curiosity to inform conservation strategies crucial for the preservation of the blue whale population, which is currently estimated to be below 3,500. “You can’t conserve what you don’t know. This is why this information is so important,” explains Engstrom. Understanding whether there are distinct units or one metapopulation informs plans for conservation efforts across North American and European waters. The movement and intermixing of blue whales across the Atlantic provides a glimmer of hope for population recovery. However, the authors caution that while the current absence of a genetic bottleneck is promising, ongoing population decline could eventually lead to its emergence.

By linking natural history with genomic analysis, this work shows the value of collaboration and interdisciplinary approaches in unravelling nature’s puzzles and guiding conservation efforts in an ever-changing world. Engstrom, Haddrath, and their collaborators remain committed to pushing the boundaries of genomic research and leveraging their findings to advocate for the protection of blue whales. They hope to extend their research to other populations of blue whales and to look in more depth at the other player in this scenario: the fin whales.

A ‘behind-the research’ conversation with René L. Warren on unravelling the Genetic Clues to COVID-19 Severity

By: Shantala Hari Dass


As the world grappled with the complexities of the COVID-19 pandemic, scientists raced to comprehend the factors influencing the severity of the disease. Researchers from Dr. Inanc Birol’s group at Vancouver’s Michael Smith Genome Sciences Centre, led by Bioinformatics Team Leader René L. Warren, set out to understand the spectrum of COVID-19 severity, including possible links to immune genes.

The Human Leukocyte Antigens (HLAs) are such genes and, for the central role they play in human immunity, they are disease determinants. Because versions of HLA genes (called HLA alleles) are found at different frequencies within human populations, they could be a molecular clue to explain why certain groups might be more susceptible to infection and exhibit more or less severe symptoms. HLAs can be likened to servers presenting food in a restaurant. When the body encounters a pathogen, it breaks it down into protein fragments, presented by HLA ‘servers’ to immune cells, triggering an immune response. However, these ‘servers’ are selective in what they present. Several HLA variants, such as HLA-C*04:01, bind to viral degradation fragments from the SARS family, which includes the coronavirus responsible for COVID-19. In our analogy of ‘server’ genes, HLA-C*04:01, contrary to most HLA variants, can be seen as serving only a restricted number of viral fragments to the host immune system, potentially limiting the immune system’s ability to combat the COVID-19 virus.

At the start of the COVID-19 pandemic, multiple research groups began probing for possible associations between HLAs and COVID-19 disease outcomes. Such studies had to proceed in real time, alongside efforts to collect data and build datasets. Warren’s team aimed to investigate whether our HLA makeup influences susceptibility to and severity of COVID-19, and they had just the tool to do it.

“We had developed a bioinformatics tool called HLAminer a decade ago, aimed at determining HLA types from next-generation sequencing datasets. We saw potential in applying this tool to explore this research question,” Warren explains.

The research team began analyzing existing data, starting with whole genome sequences from eight COVID-19 patients at the Wuhan seafood market, leading to the first publication of this research in 2020. They expanded their analyses to include a genomic dataset with meta/clinical variables from COVID-19 patients in New York. While these initial studies revealed an intriguing (but weak) association between an HLA allele, C*04:01, and severe COVID-19, they were limited by small sample sizes of 100 patients.

Since the initial report of an association between HLA-C*04:01 alleles and COVID-19 severity, many other research teams have confirmed these findings in patient cohorts worldwide, including in Europe, India, Armenia, Spain, and the United Arab Emirates. However, these studies were also limited by small (in the range of 100s) sample sizes.

A pivotal step for Warren was analyzing the HostSeq dataset. HostSeq is a first-of-its-kind Canadian genomic and clinical databank, driven by CGEn and COVID-19 studies from across Canada, that is broadly consented for health research. The databank contains genomes and health metadata for 10,000+ participants affected by COVID-19. “Working on the HostSeq dataset was our plan from the start,” he says. This collaboration enabled the team to leverage a larger dataset, essential for robust conclusions.

Their findings were confirmatory: certain HLA alleles, such as C*04:01 and A*11:01, emerged as potential markers for severe COVID-19 cases (measured as the requirement for mechanical ventilation). However, the authors remain cautious. Merely having these genes does not necessarily mean one will develop severe COVID-19, just that the probability is higher.
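
As a rough illustration of what "the probability is higher" can mean statistically, the sketch below computes an odds ratio and a one-sided Fisher's exact p-value from a 2x2 table of allele carriage versus severe outcome. This is not the analysis from the published study, which may have used different models and covariates. The function names and the counts are hypothetical and chosen only for illustration.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a: carriers with severe disease      b: carriers without severe disease
    c: non-carriers with severe disease  d: non-carriers without severe disease
    Raises ZeroDivisionError if b or c is zero.
    """
    return (a * d) / (b * c)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value.

    Probability of seeing at least `a` severe cases among carriers if allele
    carriage and severity were unrelated (hypergeometric null, fixed margins).
    """
    carriers = a + b
    severe = a + c
    n = a + b + c + d
    total = comb(n, severe)
    upper = min(carriers, severe)
    tail = sum(comb(carriers, x) * comb(n - carriers, severe - x)
               for x in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts, NOT taken from HostSeq or any real cohort.
    a, b, c, d = 12, 38, 20, 130
    print("odds ratio:", odds_ratio(a, b, c, d))
    print("one-sided p-value:", fisher_one_sided(a, b, c, d))
```

An odds ratio above 1 means severe outcomes are relatively more common among carriers in the table, but as the authors stress, it describes a shift in probability across a group, not a prediction for any individual.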

What might the clinical implications of this study be? Characterizing the HLA alleles of COVID-19 patients could serve as a valuable prognostic tool. While COVID-19 is no longer a pandemic, these findings could help in guiding public health decisions for COVID-19 — such as vaccine dissemination strategies and disease management.

Using existing publicly available datasets, Warren and the research team were able to extend their expertise to address a topical and timely challenge. Looking ahead, the researchers hope to validate their findings on an even larger scale and with diverse datasets. 


  1. The research story captured here was published in “Establishing association between HLA-C*04:01 and severe COVID-19”, HLA. 2024 Jan;103(1).
  2. The HostSeq databank (genomic data and health metadata) is accessible to the Canadian and international research community. The data has been consented for use in any approved health research project. More details about the phenotype variables can be found on the data portal. Currently, over 30 research projects have been approved to use this data.

Unleashing the Potential: The Transformative Role of Artificial Intelligence in Precision Medicine

By: Shantala Hari Dass

 From left to right: Dr. Naveed Aziz, Fanny Sie, Dr. Devin Singh, Dr. Tracie Risling


The landscape of health care is on the cusp of transformation, and Artificial Intelligence (AI) holds the key to revolutionizing patient outcomes, refining clinical decision-making, and streamlining costs. However, this transformative potential is accompanied by a set of challenges and ethical considerations that demand careful attention.

CGEn curated an insightful panel discussion on November 13, 2023, at the Canadian Science Policy Centre’s annual conference in Ottawa, titled ‘The Role of Artificial Intelligence in Delivering Precision Medicine of the Future’, moderated by Dr. Naveed Aziz, CEO at CGEn. Renowned panelists—Dr. Devin Singh, Emergency Medicine Physician and Clinical Lead in Artificial Intelligence and Machine Learning at SickKids, Dr. Tracie Risling, Associate Professor in the Faculty of Nursing at The University of Calgary, and Vice-President of the Canadian Nurses Association, and Fanny Sie, Head of AI and Emerging Technology External Collaborations at Roche Global Integrated Informatics—shared profound insights into the opportunities and challenges presented by the integration of AI into precision medicine, with a focus on ethical, regulatory, and societal implications. This report encapsulates key take-aways and poignant quotes from the discussion.

Three Key Takeaways:

1. Innovation in Policy for AI in Precision Medicine:

There is an urgency to innovate policies governing the implementation of AI solutions in precision medicine. This sets the tone for the broader discussion on adapting regulatory frameworks to unlock potential benefits. A unified approach would contribute to mitigating potential risks and fostering a secure pipeline for AI industry development.

“It would be nice for the Federal Government to come up with a broad implementation strategy that can be adjusted and adopted in Provinces and Territories across the country.” – Fanny Sie

“There is an opportunity for Canada to learn from policy models in the US and EU to mitigate potential harms and build a pipeline for the industry that is deeply aligned with citizen privacy.” – Devin Singh

2. Collaborating to Build AI Tools for Precision Medicine:

How well-informed is the general public in Canada about AI and precision medicine? This question is pivotal, considering the collaborative efforts required between patient partners and practitioners in co-designing solutions.  These AI tools should be built in collaboration and cooperation with patients and healthcare professionals such as doctors and nurses. In this process, we must caution against making assumptions based on social determinants of health, stressing that individuals facing challenges like food insecurity or lack of housing can still be valuable contributors to the development and implementation of emerging technologies.

The gaps extend to the workforce as well, with major shortfalls in education around ethical AI use. Concerns were raised about creating a workforce ill-equipped to implement AI solutions effectively and equitably, stressing the need for comprehensive education.

“We need a better handoff between academic research outputs and industry to get Canadian-made technology/AI solutions into the Canadian healthcare system.” – Fanny Sie

“For an effective implementation of AI in health care, you need practitioners with IA – intelligence amplification. AI won’t replace practitioners, but it will be on the shift with them.” – Tracie Risling

3. Diversity and Sustainability in AI Implementation:

AI is pushing the diversity question to the forefront, necessitating attention in ways that previous technologies did not. However, the lack of representative data from diverse communities has made it difficult to capitalize on this potential.

“Canada has some of the most diverse communities in the world whose expertise and data can be used to build equitable AI tools.” – Devin Singh

“If you want scale and sustainability, you need long-term engagement of patients and practitioners, and trust.” – Tracie Risling


The CGEn panel has illuminated the intricate landscape of AI’s role in the future of precision medicine. The discourse navigated through the urgent need for policy innovation, the imperative of closing educational gaps, and the importance of data reciprocity to empower patients. The panelists stressed that diversity is not just a checkbox but a catalyst for developing equitable AI tools. Furthermore, multidisciplinary collaboration and trust emerged as critical elements for scaling AI initiatives in healthcare. As we forge ahead, learning from global policy models and establishing a broad federal implementation strategy could position Canada at the forefront of AI-driven precision medicine. Summarising the need for such discussions, Naveed Aziz highlights, “In the realm of precision medicine, the transformative alliance of AI beckons for a thoughtful discourse, a vital conversation that navigates the potential benefits, ethical considerations, and necessary policies to ensure responsible and impactful integration for the betterment of health care. These cannot be one-off events but instead, ongoing dialogues that inform policy, practice, and implementation at all levels.” The panel discussion at the Canadian Science Policy Centre’s annual conference has not only fostered an essential dialogue but also charted a course for the future of healthcare outcomes for all Canadians.

DNA Day 2023

DNA Day 2023: A Call for Diverse Genomic Datasets – Time for Canada to Step Up

By: Naveed Aziz, CEO, CGEn

Diverse, large-scale health genomic resources are key to realizing the true potential of research, and to fully exploiting the power of new health technologies based on artificial intelligence (AI), to improve health for all. The data these resources hold is used to identify and study genetic variations associated with disease, which in turn can enable clinicians to provide care tailored to patients’ genetic makeup, an approach known as personalized medicine.

However, the lack of diversity in population genomic data has been a systemic challenge; the research community must work collaboratively with underserved communities to integrate inclusion and diversity in all aspects of study designs to advance health equity.

Until recently, genomic datasets have been largely limited to individuals of European descent, leaving out other ancestries and populations. This lack of diversity can have serious implications for downstream health research and care. For example, treatments and therapies developed for one population may not be as effective across all populations, while groups underrepresented in the data are potentially missing out on tailored care. All of this compounds existing socially determined health care disparities.

In Canada, researchers currently lack access to a large-scale human genomic resource that is fully representative of our population. While Canadian investigators can access data from the UK Biobank, 94% of its participants self-identify as White British or other White background. Data from the United States’ All of Us initiative is more diverse, but it is currently only available to US-based researchers.

Canada is a country of immense diversity, composed of people from a variety of backgrounds, including Indigenous peoples, immigrants, and refugees. In fact, the 2021 Census reported more than 450 ethnic and cultural origins, 200 places of birth, 100 religions and 450 languages. The ability to research Canadian population-level data would provide a unique opportunity to study the effects of variations in genetic backgrounds on research outcomes and provide insights into the disparities arising from diverse cultural and social contexts.

Recently, CGEn’s HostSeq Initiative, funded through Genome Canada’s CanCOGeN network, built a national databank that includes the genomes of over 10,000 Canadian residents impacted by COVID-19 along with in-depth clinical data. HostSeq demonstrated that Canada is capable of generating human genomic data at scale and produced a blueprint for genomic health data sharing, analysis and access.

Through community partnerships and inclusion of diverse groups within study teams, Canadian researchers have begun to make inroads with previously underrepresented populations to engage in genomics and other research. These partnerships must be built upon when considering a Canadian genomic data generation project at a population scale. The All of Us initiative, as an example, has prioritized diversity and inclusion with dedicated participant engagement teams working within communities to enrol participants from previously underrepresented groups.

We now have the tools to organize and mine genomic data at scale. AI-based tools are being developed and used across sectors – including health care. In the context of large-scale genomic and health data, AI can help maximize its use and impact by providing insights that would otherwise be difficult to uncover. For example, predictive models can be developed for the identification of individuals at risk for certain diseases or conditions, or to inform personalized treatments based on an individual’s genetic profile, including identifying new therapeutic targets and drug candidates.

Canada must ensure it has access to the necessary data to support AI research and maintain its competitive edge in this area. The accuracy of AI models depends on the data used to train them, so producing AI-based genomics tools that are effective across a population requires large-scale, high-quality and diverse data resources. At the same time, even the best AI models can be difficult to interpret and explain, and further research into the ethical and legal implications of using AI in genomics research is greatly needed to ensure our health care systems are equipped to properly and equitably implement them.

The future of genomics in personalized medicine is bright and full of possibilities for Canada. Genomic testing has already revolutionized the diagnosis and treatment of many diseases, and it will become even more valuable in health care if supported by research based on large-scale diverse data and other technology developments. The resulting more precise, targeted, and effective treatments will lead to better outcomes for patients and to health care system efficiencies and savings.

In the last 5 years, the Federal government has invested in CGEn, building and supporting the large-scale infrastructure required to produce health data at scale. However, Canada is lagging behind other countries in capturing its genomic diversity due to a lack of funding and resources dedicated to the collection and analysis of genomic health data at a population-scale. We must build a foundation of diverse and equitable datasets and research that is inclusive and representative of the diversity within the Canadian population. I urge all of us to take action now to prioritize the ethical, legal, and social implications of genomics in health care and to work to ensure equitable access to personalized medicine for all patients.

Finally, we must collaborate across different fields and sectors to bring together the expertise and resources needed to advance personalized medicine in Canada. This includes healthcare providers, researchers, policymakers, industry leaders, patient advocates, representatives from equity-deserving groups and other stakeholders. By working together, we can unlock the full potential of genomics to improve health care and provide better outcomes for patients. On DNA Day, let us take this call to action to help create a future where personalized medicine is expected, and every Canadian receives the best possible care based on their unique genetic makeup.

Dr. Edward M. Rubin joins prestigious Royal Society of Canada as International Fellow

Chair of CGEn’s Scientific Advisory Board, Dr. Edward M. (“Eddy”) Rubin has been named a Fellow of the prestigious Royal Society of Canada (RSC) in recognition of his long-term contributions to advancing genomics internationally.

RSC Fellows are elected by their peers for their outstanding scholarly, scientific and artistic achievement. Recognition by the RSC is the highest honour an individual can achieve in the Arts, Social Sciences and Sciences. Dr. Rubin is one of 102 new 2022 RSC Fellows. 

“Edward Rubin pioneered laboratory and computational technologies as part of the Human Genome Project, to sequence and analyze human chromosomes 5, 16 and 19. He then decoded these complex data, comparing DNA sequences between species to discover genes of pivotal evolutionary and biomedical importance. Throughout, Professor Rubin has generously shared his knowledge with Canadian scientists and been a champion of their research worldwide,” states the RSC.

“Canadian science is indebted to Eddy’s long-term contributions to advancing genomics internationally through his own research, and more specifically in Canada through sharing his wisdom and vision while sitting on many national advisory boards in our country, including CGEn’s Scientific Advisory Board,” says Dr. Naveed Aziz, CEO, CGEn. “On behalf of CGEn, I congratulate Eddy on this well-deserved honour by the Royal Society of Canada, and thank him for his continued leadership and commitment to CGEn’s scientific excellence and impact.”

The RSC will welcome the new 2022 RSC Fellows in November at the RSC Celebration of Excellence and Engagement in Calgary, Alberta.

CGEn receives $48.9 million in federal funding through the Canada Foundation for Innovation’s Major Science Initiatives Fund

August 19, 2022, TORONTO – From deepening our understanding of climate change to diagnosing rare diseases, to developing targeted treatments for hard-to-cure cancers, new research will unlock the potential for a healthier, more resilient and sustainable future for Canadians thanks to renewed investments in genomics infrastructure in Canada. As announced today by the Honourable François-Philippe Champagne, Minister of Innovation, Science and Industry, CGEn, Canada’s national platform for genome sequencing and analysis, has been granted $48.9 million from the Canada Foundation for Innovation (CFI) through the 2023 Major Science Initiatives (MSI) Fund competition.

“This new investment from the Government of Canada will enable CGEn to harness the power of large-scale genomics and support the scientific community from coast to coast,” says Dr. Naveed Aziz, Chief Executive Officer, CGEn.

As Canada’s national platform for genome sequencing and analysis, CGEn provides genomics services to a wide range of projects led by principal investigators across the country, working in a wide variety of sectors including health, environment, forestry, fishing, agriculture, and other areas important to Canada and its research community.

“Genomics is transforming medical and life science research here at The Hospital for Sick Children (SickKids) and across the country and will underpin Canada’s greatest scientific discoveries to come. Today’s investment will enable Canadian researchers to access and expand the use of genome analysis to address important biological questions at a pace we’ve never seen before,” says Dr. Stephen Scherer, Scientific Director, CGEn-Toronto, Director, The Centre for Applied Genomics (TCAG) at SickKids and Professor of Genome Sciences, University of Toronto.

“The Government of Canada’s investment in CGEn today will not only help to sustain current research, but will also serve as a catalyst for innovative genomics research that may lead to new discoveries and breakthroughs that will be of value to all Canadians in the future,” says Dr. Mark Lathrop, Scientific Director, CGEn-Montreal, Professor of Human Genetics and Scientific Director, McGill Genome Centre, McGill University.

“This renewed funding for CGEn will help to ensure that Canada’s scientific community will continue to have access to the latest, cutting-edge technology, advanced training and expertise, and ensure Canada is poised to handle future challenges requiring large-scale genomics,” says Dr. Steven Jones, Scientific Director, CGEn-Vancouver, Director, Head of Bioinformatics and Distinguished Scientist, Canada’s Michael Smith Genome Sciences Centre, BC Cancer.

With this renewed support from the Government of Canada through the CFI, CGEn is poised to embark upon the next stage of its strategic vision, continuing to provide the leading genomic infrastructure for Canada’s research and innovation community, while also helping to solidify Canada’s position as a genomics world leader by training the next generation of Canadian genome scientists and collaborating with other major centres to push the field even further.

“We remain committed to our vision and core values of providing essential infrastructure and expertise to enhance, accelerate and support genomics research in Canada. We are grateful for CFI’s support to continue our work with partners and stakeholders to forge a path towards a healthier and more sustainable future for all Canadians through the power of genomics,” says Aziz.

For more information about the CFI MSI 2023 awards, please visit the CFI website.

About CGEn

Established in 2017 as a CFI MSI, CGEn is comprised of three nodes: The Centre for Applied Genomics, The Hospital for Sick Children (SickKids), Toronto; McGill Genome Centre, McGill University, Montreal; Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver. More than 200 CGEn staff serve close to 3,000 distinct laboratory teams and have authored over 670 scientific publications. The data and support CGEn has provided have helped to train more than 16,000 highly-qualified personnel since CGEn’s inception. With a mandate to make cost-effective and high-quality genome sequencing a reality for all Canadian researchers, CGEn provides services to projects funded by Canadian funding agencies like Genome Canada, the Canadian Institutes of Health Research (CIHR), and the Natural Sciences and Engineering Research Council (NSERC), as well as other internationally-funded projects. CGEn’s intensive work with academics, industry and government has substantiated the power of genomics, enabling research and development in Canada and beyond.

Media contact:
Hillete Warner, CGEn

DNA Day: CGEn Celebrates 10,000 Human Genomes in the Canadian HostSeq Databank

April 25, 2022 – CGEn, Canada’s national platform for genome sequencing and analysis, and its partners are marking World DNA Day by celebrating Canada’s landmark human genome sequencing initiative called HostSeq. Led by CGEn, the HostSeq databank will include the whole genomes (the full set of DNA) of 10,000 Canadians (“Hosts”) with the aim to help understand the genomic architecture of the host response to COVID-19. The HostSeq databank also includes linked standardized clinical information collected at multiple clinical sites across Canada over the past two years of the pandemic.

“The need for a national genomics databank grew very quickly early on in the pandemic. As soon as we started to observe differences of disease manifestation and symptoms in people infected with SARS-CoV-2 virus, we realized that there was an urgent need to collect and analyze population-wide host genetic data,” explains Dr. Naveed Aziz, Chair of the HostSeq Implementation Committee and Chief Executive Officer, CGEn. “The Government of Canada’s prior investment in CGEn allowed us to deploy our capacity to sequence genetic information of Canadians who were experiencing vastly different health outcomes in response to infection by the virus.”

The national HostSeq initiative, funded by the Federal government through Genome Canada’s Canadian COVID-19 Genomics Network (CanCOGeN), is accessible to Canadian and international researchers to help study the health outcomes of those affected by the virus and to identify the genetic contributors of disease severity. This first-of-its-kind Canadian genomics databank may help to better diagnose and treat those predisposed to severe illness, backed by their genomic data.

Owing to a concerted effort led by CGEn and its team, 14 clinical and research studies across the country were brought together to contribute to the databank that includes 10,000 study participants who have consented to the use of their genomic and clinical data for future research. The HostSeq databank will not only help to answer current research questions related to the virus, but will also serve as a resource when responding to future infections and preparing for future pandemics.

“We are thrilled to have a population-based whole genome sequencing effort in Canada,” says Dr. Lisa Strug, Director of the University of Toronto’s Data Sciences Institute and the Ontario regional Centre of the Canadian Statistical Sciences Institute; Senior Scientist, Genetics and Genome Biology, and Associate Director, The Centre for Applied Genomics, The Hospital for Sick Children (SickKids). “Many countries have whole genome sequencing cohorts that they have made available and have had an important impact in health research broadly. The HostSeq databank will provide the Canadian perspective, and reflect data from individuals living in Canada,” says Strug, who is part of the HostSeq Implementation Committee and Chair of the Genetic Epidemiology Committee.

The COVID-19 Host Genome Sequencing project (HostSeq) is Canada’s largest national genomic databank to date, containing genome sequences linked to medical and clinical data from 10,000 individuals affected by COVID-19.

The HostSeq databank gives Canadian scientists access to a rich dataset to help analyze and identify genetic determinants of susceptibility, severity and outcomes of COVID-19. It will also enable researchers to investigate population-level risks for many other diseases, including the potential to deliver new biomarkers that support prediction of risk, and novel therapeutic strategies.

This DNA Day, there are countless scientific advancements to celebrate, and Canadians can count the HostSeq databank as a major step in how important population-level data can be collected and shared to help advance research and health care in Canada.

“The delivery of the HostSeq project would not have been possible without the effort and commitment of its Implementation Committee, whose tireless efforts since the beginning of the project have allowed CGEn to play a key role in Canada’s response to the COVID-19 pandemic. The HostSeq project has proven that large-scale genomics is a present reality in Canada. The time is now for all partners and stakeholders to come together and build on the years of investment and effort towards a Canadian ecosystem where research is enabled by large datasets,” says Aziz.

Linked Resources

HostSeq: A Canadian Whole Genome Sequencing and Clinical Data Resource by Dr. Lisa Strug et al.

Read about the importance of Big Data in enabling data-driven Canadian research and discovery in Dr. Naveed Aziz’s latest blog post.

To apply for HostSeq data access, please visit the CGEn website.

About CGEn

CGEn is a federally funded national platform for genome sequencing and analysis. Established in 2014, CGEn employs over 200 staff, and is funded primarily by the Canada Foundation for Innovation (CFI) through its Major Science Initiatives Fund (MSI), leveraging investments from Genome Canada and other stakeholders. CGEn operates as an integrated national platform with nodes in Toronto (The Centre for Applied Genomics at The Hospital for Sick Children), Montréal (McGill Genome Centre at McGill University) and Vancouver (Canada’s Michael Smith Genome Sciences Centre), providing genomic services, including genome sequencing and analysis, that enable research in agriculture, forestry, fishery, the environment, health sciences, and many other disciplines of interest to Canadians.

Dr. Bartha Maria Knoppers joins CGEn’s Board of Directors

CGEn is pleased to announce the appointment of Dr. Bartha Maria Knoppers as a member of our Board of Directors.

Bartha Maria Knoppers, PhD (Comparative Medical Law: Sorbonne, FR), is a Full Professor, Canada Research Chair in Law and Medicine and Director of the Centre of Genomics and Policy of the Faculty of Medicine at McGill University. Since 2005, she has led the Policy Committee of the Canadian Stem Cell Network and chaired the Ethics Working Party of the International Stem Cell Forum (2005-2015). Additionally, she was the founder of the Public Population Project in Genomics (P3G) and of CARTaGENE, Quebec’s population biobank. She was the Chair of the Ethics and Governance Committee of the International Cancer Genome Consortium (2009-2017) and of the Ethics Advisory Panel of WADA (2015-2021), and Co-Chair of the Regulatory and Ethics Workstream of the Global Alliance for Genomics and Health (GA4GH) (2013-2019). In 2015-2016, she was a member of the Drafting Group for the Recommendation of the OECD Council on Health Data Governance, and she gave the Galton Lecture in November 2017. She holds four Doctorates Honoris Causa and is a Fellow of the American Association for the Advancement of Science (AAAS), the Hastings Center (bioethics), the Canadian Academy of Health Sciences (CAHS), and the Royal Society of Canada (RSC). She is also an Officer of the Order of Canada and of Quebec, and was awarded the 2019 Henry G. Friesen International Prize in Health Research, the Till and McCulloch Award for science policy (2020) and the Lifetime Achievement Award from the Canadian Bioethics Society (2021). She served on the International Commission on the Clinical Use of Human Germline Genome Editing in 2020. Currently, she serves on Canada’s Vaccine Task Force (-2021), the Health Data Strategy Expert Advisory Group, CanCOGeN’s HostSeq Steering Committee and the COVID Cloud (DNAstack); she is also a Member of the Scientific Advisory Board of the Human Pangenome Reference Consortium, USA, and Ethics Advisor to the Big Data Big Heart, IMI Project, Europe.

CGEn CEO Announcement Aug 4, 2021

The Board of Directors and Executive Committee of CGEn, Canada’s national platform for genome sequencing and analysis, are pleased to announce the appointment of Dr. Naveed Aziz as Chief Executive Officer effective June 1, 2021.

Dr. Aziz has served as CGEn’s Chief Administrative & Chief Scientific Officer since 2017. Under his stewardship, CGEn has grown into a multi-million-dollar genomics enterprise that supports genome sequencing studies across many human diseases and genome analysis of all other species. As one of the largest data producers in the world, CGEn has also built the underlying informatics infrastructure to house and decode complex genomic data. Most recently, CGEn has been leading the $20M HostSeq project funded by the Government of Canada, sequencing the genomes of 10,000 people affected by COVID-19 to search for the genetic factors influencing the wide range of responses to infection.

Dr. Aziz holds a PhD in Gene Targeting from the University of Dundee, UK, an MPhil in Biotechnology and an Executive MBA from Bradford School of Management, UK. His previous roles include serving as the Director of Technology Programs at Genome Canada, Head of Genomics at the University of York, UK, and Research Fellow at the Noble Research Institute, USA.


CGEn-led Canadian genomics team joins international initiative to study and protect global biodiversity.

Every species on Earth possesses a characteristic genome, shaped by millions of years of evolution. Through the study of genomes (the complete genetic and heritable information of an organism), we can explore life’s diversity, better understand how species are related and how they develop together to create ecosystems, foster conservation, and uncover the biology of health and disease for all living things.

The Earth BioGenome Project aims to resolve in detail the genomes of all complex life on Earth. With new funding of approximately $6.5 million, Canada is now joining this global initiative through the Canadian BioGenome Project, led by Dr. Steven Jones, Scientific Director of the CGEn-Vancouver node and Co-Director and Head of Bioinformatics for Canada’s Michael Smith Genome Sciences Centre (GSC; part of the BC Cancer Research Institute), and Dr. Maribeth Murray, Director of the Arctic Institute of North America at the University of Calgary. Other CGEn scientists on the project team are Dr. Stephen Scherer, Scientific Director of the CGEn-Toronto node and Chief of Research at The Hospital for Sick Children, Toronto, and Dr. Ioannis Ragoussis of the CGEn-Montréal node at the McGill Genome Centre.

“Sequencing the genomes of Canada’s plants and animals is a massive proposition that requires significant scientific collaboration, one with enormous benefits not only for better understanding the evolution of life itself but in uncovering fundamental principles of health and disease, for individuals and populations,” says Dr. Jones. CGEn is a federally funded national platform for genome sequencing and analysis with nodes at the GSC, The Centre for Applied Genomics at The Hospital for Sick Children, Toronto, and the McGill Genome Centre. “We are proud to contribute CGEn’s expertise and technology to this important endeavor, which has been made possible through substantial recent advances in genome sequencing technology and computational biology.”

Canada possesses significant biodiversity, with approximately 80,000 plant and animal species in environments ranging from desert to the Arctic. Many of these species are under threat due to rapid changes in climate and other human-led impacts on our environment. In collaboration with scientists, Indigenous peoples and conservation groups, this project will embark on the task of determining the genetic diversity of Canada’s plants and animals through genomic sequencing.

In 2018, CGEn launched the CanSeq150 program to perform de novo genome assemblies for 150 species deemed important to Canada’s biodiversity and conservation. The program has sequenced species ranging across the various classes of animals (vertebrate and invertebrate) and plants with economic, cultural, social or environmental significance to Canada. More than 100 species have already been selected for sequencing through the program. CanSeq150 has provided a platform for biologists, ecologists, population geneticists and other scientists with in-depth knowledge of and expertise in species of interest to work with genomic scientists to generate valuable data that can help advance research in important biological and conservation-related areas. The addition of this Canadian arm of the Earth BioGenome Project will lead to tangible benefits to Canada in wildlife conservation, recovery and monitoring.

“Genome BC recognizes the urgent need to develop and address international systems to monitor and protect our rapidly changing environment,” says Dr. Federica di Palma, Chief Scientific Officer and Vice President, Sectors. “Applications of this data are real-time, and it builds on our strengths in genome sequencing in this province.”

Initially the project will identify approximately 400 species that would benefit from a fully sequenced genome. The species will be selected based on existing and established priorities of Indigenous peoples, federal and provincial organizations, academic scientists and other conservation and wildlife groups.

Through a case study approach, the team will also work with partners to establish priorities for genomics tools development, policy recommendations for the use of genomics to maintain biodiversity and support conservation and management, and a user-friendly geospatial platform of genomics data and information from the project. The data generated will also be freely available to scientists in Canada and worldwide.

This project was funded through Genome Canada’s 2020 Large-Scale Applied Research Project Competition: Genomic Solutions for Natural Resources and the Environment.