After completing the Mt3.5 assembly and publishing the genome, we set out to generate a new, improved genome sequence. We generated a limited amount of 454 mate-pair sequence and extensive Illumina paired-end (PE) and mate-pair (MP) sequence (see below). Together with the previously generated BAC- and fosmid-end sequences, these reads were used to produce an ALLPATHS-LG assembly.
The substantially improved Medicago truncatula genome assembly, Mt4.0 (annotation release Mt4.0v2), was produced by de novo whole-genome shotgun assembly with ALLPATHS-LG, using mostly Illumina and 454 reads. The ALLPATHS-LG scaffolds were anchored onto the eight linkage groups based on alignments to both the optical map (OM) and a genetic map derived from genotyping-by-sequencing (GBS) data. Where possible, high-quality contiguous BAC sequences were patched into the new Mt4.0 pseudomolecules to close gaps and reduce polymorphisms between the assembly versions.
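To illustrate the anchoring idea, here is a minimal sketch that uses only genetic-map evidence: each scaffold is assigned to the linkage group supported by the majority of its GBS markers and then ordered within that group by the median genetic position (cM) of those markers. The input format (scaffold_id, linkage_group, position_cM) and the function name anchor_scaffolds are hypothetical; the sketch ignores optical-map alignments, scaffold orientation, and conflict resolution, so it should not be read as the procedure actually used for Mt4.0.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_scaffolds(marker_hits):
    """Hypothetical sketch: place scaffolds on linkage groups from GBS marker hits.

    marker_hits: iterable of (scaffold_id, linkage_group, position_cM) tuples
    (an assumed input format, not taken from the original pipeline).
    Returns {linkage_group: [scaffold_id, ...]} ordered by median cM.
    """
    by_scaffold = defaultdict(list)
    for scaffold, lg, cm in marker_hits:
        by_scaffold[scaffold].append((lg, cm))

    groups = defaultdict(list)
    for scaffold, hits in by_scaffold.items():
        # Majority vote over linkage groups; on a tie, Counter.most_common
        # keeps the group that was encountered first.
        lg, _ = Counter(g for g, _ in hits).most_common(1)[0]
        # Use only markers agreeing with the chosen group to position the scaffold.
        positions = [cm for g, cm in hits if g == lg]
        groups[lg].append((median(positions), scaffold))

    # Order scaffolds along each linkage group by their median genetic position.
    return {lg: [s for _, s in sorted(entries)] for lg, entries in sorted(groups.items())}


hits = [
    ("scaf1", 1, 10.0), ("scaf1", 1, 12.0), ("scaf1", 3, 50.0),  # one discordant marker
    ("scaf2", 1, 2.0),
    ("scaf3", 2, 30.0), ("scaf3", 2, 31.0),
]
print(anchor_scaffolds(hits))  # {1: ['scaf2', 'scaf1'], 2: ['scaf3']}
```

A majority vote is one simple way to tolerate occasional mis-mapped markers, such as the linkage group 3 hit on scaf1 above; a real pipeline would also need to weigh optical-map evidence and flag scaffolds whose markers split across groups as possible misassemblies.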
This new version of the genome was re-annotated using our in-house structural and functional annotation pipelines, which integrate several tiers of evidence: ab initio gene predictions, the legacy Mt3.5v5 annotation, EST/RNA-seq assemblies, and proteomic data. The result is the annotation release Mt4.0v2. The pipeline annotated a total of 50,376 gene loci (encompassing 57,585 gene models), which were then binned into high-confidence (HC) and low-confidence (LC) classes (~32k HC and ~19k LC loci) based on different levels of EST/RNA-seq/protein support and synteny to related plant genomes.
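The HC/LC binning step can be pictured as a rule applied to per-locus evidence. The sketch below is illustrative only: the Locus fields, the 0.5 support threshold, and the rule "HC if transcript or protein support reaches the threshold, or the locus is syntenic" are all assumptions, not the published Mt4.0v2 criteria.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    locus_id: str
    transcript_support: float  # hypothetical: fraction of the model covered by EST/RNA-seq
    protein_support: float     # hypothetical: fraction covered by protein alignments
    syntenic: bool             # hypothetical: has a syntenic ortholog in a related genome


def classify(locus, min_support=0.5):
    """Return 'HC' or 'LC' using an illustrative rule (assumed, not the real criteria)."""
    evidence = max(locus.transcript_support, locus.protein_support)
    if evidence >= min_support or locus.syntenic:
        return "HC"
    return "LC"


def bin_loci(loci, min_support=0.5):
    """Group locus IDs into HC and LC bins, preserving input order."""
    bins = {"HC": [], "LC": []}
    for locus in loci:
        bins[classify(locus, min_support)].append(locus.locus_id)
    return bins


loci = [
    Locus("g1", 0.9, 0.0, False),  # strong transcript support
    Locus("g2", 0.1, 0.2, True),   # weak support, but syntenic
    Locus("g3", 0.1, 0.0, False),  # little evidence of any kind
]
print(bin_loci(loci))  # {'HC': ['g1', 'g2'], 'LC': ['g3']}
```

Keeping the threshold as a parameter makes it easy to see how the HC/LC split shifts as the evidence requirement is tightened or relaxed.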