Sustainable Development Goals for Agriculture
Modern agriculture faces extraordinary demands to become both more productive and sustainable amidst the pressures of climate change and the world’s growing population. Land areas experiencing higher temperatures than ever are seeing changes in crop productivity whilst becoming friendlier to varieties of pests and mold that thrive in these higher temperatures.
Agricultural innovation is essential to broader efforts named in the United Nations’ interdependent Sustainable Development Goals, especially food security (SDG2), good health and well-being (SDG3), responsible consumption and production (SDG12), and climate action (SDG13).
Specifically, we aim for plants with 1) high yields of high-quality product, whether grain, seed, or fiber; 2) resilience against climate-related stresses such as heat, cold, wind, drought, and flood; 3) resistance to pests and molds, whose ranges expand as the earth warms; and 4) reduced strain on existing resources, e.g. lower water consumption and the ability to grow in degraded soils.
Breeding takes multiple generations, and we have already learned through genomics that domesticated crops have more limited gene pools than those of ancient or ‘forefather’ genomes. Thus, developing new breeds and strains of plants that may meet these demands sooner rather than later requires publicly available knowledge of their genomes and their more diverse ancestral genomes so that they can be bred more efficiently.
First, What is De Novo Sequencing?
De novo sequencing, literally the sequencing of a species’ genome for the first time, is key to understanding genetic diversity and bringing this information into the public domain for more effective breeding programs. We will take a look at three of the world’s most economically important crops whose genomes presented the technology world with unique challenges, and the advances that gave us the genetic insights we have now.
In the sequencing laboratory, the workflow involves grinding up tissue, breaking the DNA into segments short enough for the instruments to read with minimal errors, and making many copies of each DNA segment to give confidence in the sequences read out. De novo assembly begins once we have the sequences of thousands to millions of segments. Segments with matching ends are first ‘stitched’ together computationally to form longer continuous sequences called contigs. Contigs are then roughly ordered and oriented to produce scaffolds. Eventually, the mixture of contigs and scaffolds is disentangled into individual chromosomes, and the complete set forms the genome.
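To make the ‘stitching’ step concrete, here is a minimal sketch of a greedy overlap merge in Python. It is a toy illustration only: the reads, the function names, and the minimum overlap of 3 bases are all assumptions made for this example, and it ignores sequencing errors, reverse-complement strands, and repeats, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy sketch: assumes error-free reads from a single strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing left to stitch together
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three hypothetical reads whose ends match
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

With these three reads the ends chain together into a single contig. Reads that share no sufficiently long overlap are simply left as separate contigs, which is roughly why gaps remain in draft assemblies.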
While the first de novo assemblies of plants, animals, and humans began decades ago, it has taken multiple generations of sequencing technologies, bioinformatic strategies, and dedicated global consortia to resolve the genomes of the world’s most important crops. Over millions of years, rice, wheat, and cotton have adapted their genomes to confer versatility – resulting in long, interesting, and complicated genomes that have challenged the industry.
Rice
Rice provides food for half of the world’s population. Naturally, it remains the most researched crop, according to the Consortium of International Agricultural Research (CGIAR), the world’s largest global agricultural innovation network.
The rice genome was first drafted in 2002, astonishingly by four research teams working independently, including BGI’s team. Rice was the second plant genome ever sequenced, after Arabidopsis thaliana, the plant most commonly used in laboratory research. At 466 megabasepairs, it was also the longest plant genome sequenced at the time, 3.7 times longer than that of A. thaliana. Thanks to simultaneous efforts on the two main subspecies, indica and japonica, rice became the first species for which comparative genomics on two subspecies was possible.
A feature rare among animal genomes yet common among plant genomes is polyploidy, in which an organism has more than two sets of chromosomes. (We humans have diploid genomes.) Plants often become polyploid after two closely related species are bred together. This often results in sterile offspring, but on occasion the hybrid plant will duplicate its genome. The plant may then accommodate this massive gain in DNA by dropping many segments while copying others more. If successful, the process creates a tetraploid organism and a new species that can reproduce.
For plants, polyploidy is a means of evolution and protection - extra copies of each chromosome can serve as ‘back-ups’ in case mutations occur in one copy. The extra copies can shield the plant from negative effects of the mutation, while keeping mutations that may prove beneficial during sudden and/or drastic stresses. Polyploidy can therefore confer adaptability to new conditions while maintaining robustness under conditions where the plant already thrives.
Polyploidy, high heterozygosity (i.e. variation among different copies of the same chromosome), and high proportions of repeating elements go hand-in-hand. They also greatly complicate de novo assembly. If the pieces of DNA do not have enough unique identifying features, it is like having four jigsaw puzzles that portray nearly the same scene with only a handful of differences, and with many pieces of the same color and shape: there is no way to tell which puzzle a given piece belongs to.
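The jigsaw problem can be sketched in a few lines of Python. The four sequences below are invented stand-ins for four nearly identical chromosome copies (the names and bases are assumptions for illustration, not real data). A read can be assigned to a copy only if it happens to span a position where the copies differ.

```python
def candidate_copies(read: str, copies: dict[str, str]) -> list[str]:
    """Names of every copy that contains `read` as an exact substring."""
    return [name for name, seq in copies.items() if read in seq]


# Hypothetical copies of one region; they differ only at a single position
copies = {
    "copy1": "GATTACAGGCTTACG",
    "copy2": "GATTACATGCTTACG",
    "copy3": "GATTACAGGCTTACG",  # identical to copy1
    "copy4": "GATTACACGCTTACG",
}

# A read from the shared stretch fits all four "puzzles"
print(candidate_copies("GATTACA", copies))

# A read spanning the variable position narrows the choice
print(candidate_copies("ACATGC", copies))
print(candidate_copies("ACAGGC", copies))
```

The first read matches every copy, the second matches exactly one, and the third still matches two because those two copies are identical across the region. Longer reads help because they are more likely to span a distinguishing position.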
Multiple rice genomes (diploid) were drafted in 2002, but the contradictory assertions that rice is an ancient aneuploid and an ancient polyploid were not resolved until 2005, when BGI sequenced indica with a more than 1,000-fold improvement in contiguity over these drafts. With the greatest contiguity and longest scaffolds ever achieved for rice, the team identified 18 distinct duplicated segments that cover 65.7% of the genome and found that 17 of the 18 segments were duplicated over 50 million years ago, before rice and other grass species diverged.
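Contiguity is commonly summarized with a statistic called N50: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of all assembled bases. The text above does not say which metric was used for the rice comparison, so the sketch below is only an illustration of the general idea, with made-up contig lengths.

```python
def n50(lengths: list[int]) -> int:
    """Length of the contig at which half of the total bases are covered.

    Returns 0 for an empty assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths (in kilobases) for two toy assemblies
fragmented = [2, 2, 2, 3, 3, 4, 8, 8]
contiguous = [10, 20, 30, 40]
print(n50(fragmented), n50(contiguous))
```

A higher N50 means more of the genome sits in long, uninterrupted pieces, which is what makes large duplicated segments visible in the first place.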
Technological breakthroughs and dedicated collaborations have expanded our understanding of rice’s subpopulations and their preferred geographies. The 3,000 Rice Genomes Project, an international collaboration among the Chinese Academy of Agricultural Sciences, BGI Shenzhen, and the International Rice Research Institute, funded by the Bill & Melinda Gates Foundation, has shed light on population structure and diversity using 3,010 publicly available rice genomes from 89 countries. The project has produced a pangenome (a gene set representing all strains of a species) and pinpointed the locations of genes that breeders could use to improve strains for grain length, grain width, and resistance to bacterial blight.
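The pangenome idea maps neatly onto set operations. The sketch below assumes that gene presence or absence has already been determined for each strain, which is the hard part in practice; the strain and gene names are hypothetical placeholders, not data from the 3,000 Rice Genomes Project.

```python
def pangenome(strains: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Split genes into pan (found in any strain), core (found in every
    strain), and variable (found in some strains but not all)."""
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets) if gene_sets else set()
    return pan, core, pan - core


# Hypothetical gene content for three strains
strains = {
    "strain_a": {"gene1", "gene2", "gene3"},
    "strain_b": {"gene1", "gene2", "gene4"},
    "strain_c": {"gene1", "gene3", "gene5"},
}

pan, core, variable = pangenome(strains)
print(sorted(pan))
print(sorted(core))
print(sorted(variable))
```

The variable genes are often the interesting ones for breeders, since traits that differ between strains are more likely to trace back to genes that only some strains carry.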
Wheat
The innovations developed with rice set the stage for the genomes of wheat and cotton, giants in agricultural significance, genome size and complexity, and technical challenges.
Wheat provides ~20% of the calories consumed worldwide, is grown on more land area than any other food crop, and has evolved to thrive in a huge diversity of climates, environments, and altitudes. We now know that the ancestor of bread wheat has over 50% more cold-related genes than rice, sorghum, and maize and over 30% more cold-related genes than its ancestral grass species, Brachypodium. Wheat owes its versatility to enormous genetic variation among its polyploid species: Triticum turgidum, or durum wheat, is tetraploid and used to make couscous and pasta, while Triticum aestivum, or bread wheat, is hexaploid and used to make bread and wheat noodles. The structure of the bread wheat genome can be thought of as AABBDD, where A, B, and D correspond to ancestral genomes with distinct structure and content. To assemble it, teams led by BGI first assembled draft genomes of the diploid ancestral species Triticum urartu and Aegilops tauschii, reported in two articles published concurrently in Nature in 2013.
Each subgenome was nearly 10 times longer than the full rice genome and consisted of more than 65% repeating elements (rice has ~42% repeating elements). Along the way, the teams examined genomes of wheat plants from different geographies throughout the Fertile Crescent and found that plants growing at >1,000 meters above sea level had more genes for resistance against powdery mildew, a destructive mold, than plants in lower-altitude regions.
Sequencing this many plants was made possible by Next Generation Sequencing (NGS) approaches, which are more high-throughput and cost-effective than the Sanger-based approaches of the early 2000s. SOAPdenovo, BGI’s open-source de novo assembler built specifically for NGS data, was a key tool for these and many other genomes over the past dozen years.
The fully annotated bread wheat genome published in 2018 by the International Wheat Genome Sequencing Consortium (IWGSC) ended up being 15.8 gigabasepairs long and had 85% repeating elements.
Cotton
Like the wheat genome, the genome of cotton, the world’s most valuable non-food agricultural product, is highly repetitive and shows evidence of a complicated evolution. Cotton is the principal natural resource for the textile industry, with more than 25 million tons produced annually and consumption increasing with the world’s population, and the push for resilient breeds with lower demands on resources is stronger than ever. Upland cotton (Gossypium hirsutum), which accounts for 90% of the world’s cotton production, is an allotetraploid, denoted AADD, a hybrid between an ‘Old World’ A-species and a ‘New World’ D-species. As with wheat, the A- and D-subgenomes had to be sequenced separately. These subgenomes alone revealed that different ancestors contributed genes for the length of cotton fibers and for resistance to a soil-borne fungus that causes widespread and destructive cotton disease.
Many mysteries remain about which genes and groups of genes make some varieties of cotton more adaptable to different climates, soil conditions, and stresses. As with the wheat genome, third-generation sequencing technologies that are simultaneously high-throughput, rapid, and cost-effective, and that achieve longer reads than NGS methods, will enable us to sequence these and many more species more completely and with higher signal-to-noise.
Conclusion
Progress towards sustainability will be driven by technological advances that 1) allow us to sequence these long, highly varying, and repetitive genomes more completely and confidently, and 2) enable the public to access and use the terabyte volumes of genomic data. BGI is active in multiple consortia devoted to these goals, including the 3,000 Rice Genomes Project, the IWGSC, and the African Orphan Crops Consortium, a project to annotate reference genomes for 101 traditional food crops in Africa. Yet, there is still much more work to be done!