Blue Whale genomics

Whale Tales: Blue Whale Genomes, Resilience to Genetic Bottleneck, and Interbreeding Insights

CGEn sat down with Mark Engstrom, Associate Professor at the University of Toronto and Deputy Director, Collections & Research at the Royal Ontario Museum (ROM), and Oliver Haddrath, Research Technician from the ROM to go behind the science with a special focus on the genomics work in their recent research paper that demystifies the blue whale genomes. This research garnered significant interest, including coverage in a New York Times article.

Image: North Atlantic blue whale

Blue whales, the largest animals, continue to captivate the scientific community with their enigmatic behaviours and ecological importance. Despite facing a century of industrial whaling that decimated their populations, new genomic research is painting a surprisingly resilient picture of these majestic creatures.

“One of the challenges early on was obtaining high-quality DNA from decomposed whale tissues”- Oliver Haddrath

The genesis of this pioneering study traces back to a poignant event in 2014, when nine blue whales perished after being trapped by ice near Newfoundland, Canada. Engstrom’s team was focused on whale conservation in Canada, particularly on the East Coast, and sought to model new data. They aimed to map the blue whale genome, for which no reference existed. Engstrom’s team used samples from the stranded whales and 26 others to conduct genomic analysis of the North Atlantic population. “One of the challenges early on was obtaining high-quality DNA from decomposed whale tissues. We finally devised strategies such as focusing on submerged parts of the remains for high-quality tissue,” Haddrath says.

Working with long-standing collaborators at CGEn-Toronto (The Centre for Applied Genomics (TCAG) at The Hospital for Sick Children), the team had the samples sequenced using a mix of long- and short-read technologies. The work was partially supported through Canseq150, a CGEn and partners-driven initiative to sequence 150 new genomes for Canada’s 150th birthday.

Engstrom, Haddrath, and their team faced the challenging task of assembling the blue whale genome. Supported by TCAG, they chose a de novo genome assembly approach. This method allowed them to build the genome without relying on a ‘template genome’ from a closely related species, minimizing the risk of missing features unique to the blue whale. Long-read sequencing (PacBio Sequel), though less accurate and requiring more specialized infrastructure, was essential for producing large sequence fragments that could be pieced together more easily. Complementary short-read sequencing produced more precise sequences to refine the constructed genome. To help researchers in the community, the authors made the blue whale genome publicly accessible.

The building of the blue whale genome might have been the end of a research project in itself but for the authors, it was the starting point towards answering their question. The collaborative effort yielded unprecedented findings, challenging conventional wisdom and shedding light on hidden aspects of blue whale demographics. Despite the severe impacts of historical whaling, genomic analysis found no signs of a genetic bottleneck, indicating the resilience of blue whales, possibly due to their slow reproductive rate, which offers hope for future survival. The study also revealed unexpected patterns of gene flow, contradicting traditional notions of east-west population divisions, with genetic evidence showing extensive intermingling in one direction—west to east—driven by ocean currents carrying whales and their prey, krill, across vast distances. Additionally, the research uncovered evidence of interspecies mating between blue whales and fin whales, tracing back approximately 3.5 percent of the blue whale genome to fin whales, challenging previous beliefs about hybrid sterility and raising questions about the functional significance of these genes in blue whale physiology. 

“You can’t conserve what you don’t know. This is why this information is so important.” - Mark Engstrom

The implications of these findings extend beyond scientific curiosity to inform conservation strategies crucial for the preservation of the blue whale population, which is currently estimated to be below 3,500. “You can’t conserve what you don’t know. This is why this information is so important,” explains Engstrom. Understanding whether there are distinct units or one metapopulation informs plans for conservation efforts across North American and European waters. The movement and intermixing of blue whales across the Atlantic provides a glimmer of hope for population recovery. However, the authors caution that while the current absence of a genetic bottleneck is promising, ongoing population decline could eventually lead to its emergence.

By linking natural history with genomic analysis, this work shows the value of collaboration and interdisciplinary approaches in unravelling nature’s puzzles and guiding conservation efforts in an ever-changing world. Engstrom, Haddrath, and their collaborators remain committed to pushing the boundaries of genomic research and leveraging their findings to advocate for the protection of blue whales. They hope to carry forward their research to study other populations of blue whales and to look in more depth at the other player in this scenario: the fin whales.

A ‘behind-the-research’ conversation with René L. Warren on unravelling the genetic clues to COVID-19 severity

By: Shantala Hari Dass


As the world grappled with the complexities of the COVID-19 pandemic, scientists raced to comprehend the factors influencing the severity of the disease. Intrigued by the wide spectrum of COVID-19 severity, researchers from Dr. Inanc Birol’s group at Vancouver’s Michael Smith Genome Sciences Centre, led by Bioinformatics Team Leader René L. Warren, set out to understand what drives it, including possible links to immune genes.

The Human Leukocyte Antigens (HLAs) are such genes and, because of the central role they play in human immunity, they are disease determinants. Because versions of HLA genes (called HLA alleles) are found at different frequencies within human populations, they could be a molecular clue to explain why certain groups might be more susceptible to infection and exhibit more or less severe symptoms. HLAs can be likened to servers presenting food in a restaurant. When the body encounters a pathogen, it breaks it down into protein fragments, which HLA ‘servers’ present to immune cells, triggering an immune response. However, these ‘servers’ are selective in what they present. Several HLA variants, such as HLA-C*04:01, bind to viral degradation fragments from the SARS family, which includes the coronavirus responsible for COVID-19. In our analogy of ‘server’ genes, HLA-C*04:01, unlike most HLA variants, can be seen as serving only a restricted number of viral fragments to the host immune system, potentially limiting the immune system’s ability to combat the COVID-19 virus.

At the start of the COVID-19 pandemic, multiple research groups began probing for possible associations between HLAs and COVID-19 disease outcomes. Such studies had to proceed in real time alongside efforts to collect data and build datasets. Warren and his team aimed to investigate whether our HLA makeup influences susceptibility to and severity of COVID-19, and they had just the tool to do it.

“We had developed a bioinformatics tool called HLAminer a decade ago, aimed at determining HLA types from next-generation sequencing datasets. We saw potential in applying this tool to explore this research question,” Warren explains.

The research team began analyzing existing data, starting with whole genome sequences from eight COVID-19 patients at the Wuhan seafood market, leading to the first publication of this research in 2020. They expanded their analyses to include a genomic dataset with meta/clinical variables from COVID-19 patients in New York. While these initial studies revealed an intriguing (but weak) association between an HLA allele, C*04:01, and severe COVID-19, they were limited by small sample sizes of 100 patients.

Since the initial report of an association between HLA-C*04:01 alleles and COVID-19 severity, many other research teams have confirmed these findings in patient cohorts worldwide, including in Europe, India, Armenia, Spain, and the United Arab Emirates. However, these studies were also limited by small (in the range of 100s) sample sizes.

A pivotal step for Warren was analyzing the HostSeq dataset. HostSeq is a first-of-its-kind Canadian genomic and clinical databank, driven by CGEn and COVID-19 studies from across Canada, that is broadly consented for health research. The databank contains genomes and health metadata for 10,000+ participants affected by COVID-19. “Working on the HostSeq dataset was our plan from the start,” he says. This collaboration enabled the team to leverage a larger dataset, essential for robust conclusions.

Their findings were confirmatory: certain HLA alleles, such as C*04:01 and A*11:01, emerged as potential markers for severe COVID-19 cases (measured as the requirement for mechanical ventilation). However, the authors remain cautious. Merely having these alleles does not mean one will develop severe COVID-19, only that the probability is higher.
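To make "the probability is higher" concrete, here is a minimal sketch of how an association between carrying an allele and a severe outcome is often summarized with an odds ratio. The counts below are hypothetical and chosen only for illustration; they are not HostSeq results, and the function name and the simple log-based confidence interval are assumptions of this sketch, not the authors' method.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% confidence interval for a 2x2 table.

    a: carriers with severe disease      b: carriers without severe disease
    c: non-carriers with severe disease  d: non-carriers without severe disease
    """
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio), the usual large-sample approximation.
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = exp(log(ratio) - z * se_log)
    high = exp(log(ratio) + z * se_log)
    return ratio, low, high


# HYPOTHETICAL counts for illustration only (not data from the study).
a, b, c, d = 30, 170, 70, 730

ratio, low, high = odds_ratio_with_ci(a, b, c, d)
risk_carriers = a / (a + b)        # share of carriers with severe disease
risk_noncarriers = c / (c + d)     # share of non-carriers with severe disease

print(f"odds ratio: {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"severe among carriers: {risk_carriers:.1%}")
print(f"severe among non-carriers: {risk_noncarriers:.1%}")
```

In this made-up table the odds of severe disease are higher for carriers, yet most carriers still do not develop it, which is the distinction the authors draw between a marker and a certainty.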

What might the clinical implications of this study be? Characterizing the HLA alleles of COVID-19 patients could serve as a valuable prognostic tool. While COVID-19 is no longer a pandemic, these findings could help in guiding public health decisions for COVID-19 — such as vaccine dissemination strategies and disease management.

Using existing publicly available datasets, Warren and the research team were able to extend their expertise to address a topical and timely challenge. Looking ahead, the researchers hope to validate their findings on an even larger scale and with diverse datasets. 


  1. The research story captured here was published as “Establishing association between HLA-C*04:01 and severe COVID-19”, HLA. 2024 Jan;103(1) 
  2. The HostSeq databank (genomic data and health metadata) is accessible to the Canadian and international research community. The data has been consented for use in any approved health research project. More details about the phenotype variables can be found on the data portal. Currently, over 30 research projects have been approved to use this data.

Unleashing the Potential: The Transformative Role of Artificial Intelligence in Precision Medicine

By: Shantala Hari Dass

 From left to right: Dr. Naveed Aziz, Fanny Sie, Dr. Devin Singh, Dr. Tracie Risling


The landscape of health care is on the cusp of transformation, and Artificial Intelligence (AI) holds the key to revolutionizing patient outcomes, refining clinical decision-making, and streamlining costs. However, this transformative potential is accompanied by a set of challenges and ethical considerations that demand careful consideration.

CGEn curated an insightful panel discussion on November 13, 2023, at the Canadian Science Policy Centre’s annual conference in Ottawa, titled ‘The Role of Artificial Intelligence in Delivering Precision Medicine of the Future’, moderated by Dr. Naveed Aziz, CEO at CGEn. Renowned panelists—Dr. Devin Singh, Emergency Medicine Physician and Clinical Lead in Artificial Intelligence and Machine Learning at SickKids, Dr. Tracie Risling, Associate Professor in the Faculty of Nursing at The University of Calgary, and Vice-President of the Canadian Nurses Association, and Fanny Sie, Head of AI and Emerging Technology External Collaborations at Roche Global Integrated Informatics—shared profound insights into the opportunities and challenges presented by the integration of AI into precision medicine, with a focus on ethical, regulatory, and societal implications. This report encapsulates key take-aways and poignant quotes from the discussion.

Three Key Takeaways:

1. Innovation in Policy for AI in Precision Medicine:

There is an urgency to innovate policies governing the implementation of AI solutions in precision medicine. This sets the tone for the broader discussion on adapting regulatory frameworks to unlock potential benefits. A unified approach would contribute to mitigating potential risks and fostering a secure pipeline for AI industry development.

“It would be nice for the Federal Government to come up with a broad implementation strategy that can be adjusted and adopted in Provinces and Territories across the country.”– Fanny Sie

“There is an opportunity for Canada to learn from policy models in the US and EU to mitigate potential harms and build a pipeline for the industry that is deeply aligned with citizen privacy.”- Devin Singh

2. Collaborating to Build AI Tools for Precision Medicine:

How well-informed is the general public in Canada about AI and precision medicine? This question is pivotal, considering the collaborative efforts required between patient partners and practitioners in co-designing solutions.  These AI tools should be built in collaboration and cooperation with patients and healthcare professionals such as doctors and nurses. In this process, we must caution against making assumptions based on social determinants of health, stressing that individuals facing challenges like food insecurity or lack of housing can still be valuable contributors to the development and implementation of emerging technologies.

The gaps extend to the workforce as well, with major shortfalls in education around ethical AI use. Concerns were raised about creating a workforce ill-equipped to implement AI solutions effectively and equitably, stressing the need for comprehensive education.

“We need a better handoff between academic research outputs and industry to get Canadian-made technology/AI solutions into the Canadian healthcare system.” - Fanny Sie

“For an effective implementation of AI in health care, you need practitioners with IA – intelligence amplification. AI won’t replace practitioners, but it will be on the shift with them,” – Tracie Risling

3. Diversity and Sustainability in AI Implementation:

AI is pushing the diversity question to the forefront, necessitating attention in ways that previous technologies did not. However, a lack of data that reflects diverse communities has hindered efforts to realize this potential.

“Canada has some of the most diverse communities in the world whose expertise and data can be used to build equitable AI tools.” - Devin Singh

“If you want scale and sustainability, you need long-term engagement of patients and practitioners, and trust.” – Tracie Risling


The CGEn panel has illuminated the intricate landscape of AI’s role in the future of precision medicine. The discourse navigated through the urgent need for policy innovation, the imperative of closing educational gaps, and the importance of data reciprocity to empower patients. The panelists stressed that diversity is not just a checkbox but a catalyst for developing equitable AI tools. Furthermore, multidisciplinary collaboration and trust emerged as critical elements for scaling AI initiatives in healthcare. As we forge ahead, learning from global policy models and establishing a broad federal implementation strategy could position Canada at the forefront of AI-driven precision medicine. Summarising the need for such discussions, Naveed Aziz says, “In the realm of precision medicine, the transformative alliance of AI beckons for a thoughtful discourse, a vital conversation that navigates the potential benefits, ethical considerations, and necessary policies to ensure responsible and impactful integration for the betterment of health care. These cannot be one-off events but instead, ongoing dialogues that inform policy, practice, and implementation at all levels.” The panel discussion at the Canadian Science Policy Centre’s annual conference has not only fostered an essential dialogue but also charted a course for the future of healthcare outcomes for all Canadians.