Assembling the Medicago truncatula Pseudomolecules

All relevant genomic sequences (BACs, fosmids, etc.) were downloaded from GenBank following a data freeze. BACs whose sequences were entirely contained within other clones, along with any contaminants of non-Medicago origin, were eliminated first. The remaining sequences were then aligned all-against-all with MUMmer (Delcher et al., 2002) to find overlapping regions.


Overlap rules

  • Overlaps of at least 2,000 bp and 99% identity were considered valid.
  • For terminal overlaps, the redundant region between two BACs was removed to form contiguous sequence using a left-greedy rule.
  • When the left-hand BAC was phase 1, precedence was given to phase 2 or phase 3 sequence.
  • For non-terminal overlaps (involving at least one phase 1 BAC), the complete sequence of each BAC was used, and 5,000 Ns were inserted into the pseudomolecule sequence to mark a non-overlapping join with some redundant sequence in the adjacent BACs.
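
The overlap rules above can be sketched in Python. This is only an illustration, not the pipeline that was actually run: the names Overlap, is_valid_overlap and join_pair are hypothetical, a terminal overlap is assumed to cover the same number of bases at the end of the left BAC and the start of the right BAC, and "left-greedy" is read here as keeping the left-hand copy of the redundant region.

```python
from dataclasses import dataclass

MIN_OVERLAP_BP = 2_000        # minimum overlap length from the rules above
MIN_IDENTITY_PCT = 99.0       # minimum percent identity from the rules above
NON_TERMINAL_SPACER = 5_000   # Ns marking a non-overlapping join


@dataclass
class Overlap:
    length: int       # overlap length in bp
    identity: float   # percent identity of the aligned region
    terminal: bool    # True if the overlap runs to the end of left / start of right


def is_valid_overlap(ov: Overlap) -> bool:
    """Apply the length and identity thresholds."""
    return ov.length >= MIN_OVERLAP_BP and ov.identity >= MIN_IDENTITY_PCT


def join_pair(left: str, right: str, ov: Overlap,
              left_phase: int, right_phase: int) -> str:
    """Join two adjacent BAC sequences according to the overlap rules."""
    if not is_valid_overlap(ov):
        raise ValueError("overlap does not meet the validity thresholds")
    if not ov.terminal:
        # Non-terminal overlap: keep both BACs whole, separated by 5,000 Ns.
        return left + "N" * NON_TERMINAL_SPACER + right
    if left_phase == 1 and right_phase in (2, 3):
        # Assumed reading of the precedence rule: keep the better-finished
        # right-hand copy of the overlap and trim the left BAC instead.
        return left[:-ov.length] + right
    # Left-greedy default: keep the left copy, trim the right BAC.
    return left + right[ov.length:]
```

With two 5,000 bp BACs sharing a 2,000 bp terminal overlap, join_pair returns 8,000 bp of sequence. With a non-terminal overlap it returns 15,000 bp, of which 5,000 are Ns.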

After all available sequence had been assembled into contigs, the contigs were scaffolded using all BAC and fosmid end sequences, which were aligned with BLAT (Kent et al., 2002).

Criteria for BAC end sequences (BESs)

  • Only low-copy BES pairs were used, i.e. pairs in which each end hit two or fewer times in the genome.
  • Only matches with >= 99% identity and >= 90% coverage were used.
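
A minimal Python sketch of these two filters follows. The EndHit record and the function names are hypothetical, and it assumes that the copy count is taken over hits that already pass the identity and coverage thresholds. The text does not say whether the count was made before or after that filter.

```python
from dataclasses import dataclass

BES_MAX_HITS = 2               # "two or fewer" hits per end
BES_MIN_IDENTITY_PCT = 99.0    # minimum percent identity of a match
BES_MIN_COVERAGE_PCT = 90.0    # minimum percent of the end sequence aligned


@dataclass
class EndHit:
    identity: float   # percent identity of the match
    coverage: float   # percent of the end sequence covered by the match


def qualifying_hits(hits: list[EndHit]) -> list[EndHit]:
    """Keep only hits that meet both the identity and coverage thresholds."""
    return [h for h in hits
            if h.identity >= BES_MIN_IDENTITY_PCT
            and h.coverage >= BES_MIN_COVERAGE_PCT]


def pair_is_usable(forward_hits: list[EndHit], reverse_hits: list[EndHit]) -> bool:
    """A pair is usable if each end has one or two qualifying hits."""
    for hits in (forward_hits, reverse_hits):
        good = qualifying_hits(hits)
        if not 1 <= len(good) <= BES_MAX_HITS:
            return False
    return True
```

Under these assumptions, a pair is rejected as soon as either end has no qualifying hit or more than two.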

Where a scaffold was supported by at least two pairs of end sequences, 50,000 Ns were inserted between the neighboring contigs. Where a contig could potentially be extended by a defined but unsequenced BAC, 50,000 Ns were added to the end of the contig.
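
The two-pair support rule can be illustrated with a short Python sketch. The function names and the representation of each end-sequence pair as a tuple of two contig names are assumptions made for this example. Contig orientation and insert-size checks, which a real scaffolder would need, are ignored.

```python
from collections import Counter

MIN_SUPPORTING_PAIRS = 2   # at least two end-sequence pairs per link
SCAFFOLD_GAP = 50_000      # Ns inserted between neighboring contigs


def supported_links(pair_links: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the contig pairs linked by enough end-sequence pairs.

    Each element of pair_links names the two contigs hit by the two ends
    of one clone; the order within a tuple does not matter.
    """
    counts = Counter(tuple(sorted(link)) for link in pair_links)
    return {link for link, n in counts.items() if n >= MIN_SUPPORTING_PAIRS}


def scaffold_sequence(ordered_contigs: list[str]) -> str:
    """Concatenate already-ordered contig sequences with 50,000 Ns between them."""
    return ("N" * SCAFFOLD_GAP).join(ordered_contigs)
```

For example, if two clones link contigs c1 and c2 and only one clone links c2 and c3, only the c1/c2 link is kept.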

The final step of assembly was anchoring and ordering scaffolds onto the eight chromosomes by reference to genetic maps composed of DNA-based genetic markers (Choi et al., 2004; Thoquet et al., 2002).

Ordering and anchoring the scaffolds: inserting gaps

  • A spacer of 100,000 Ns was inserted between scaffolds that could not be spanned by paired end sequences.
  • Because some sequence contigs already terminate in 50,000 Ns due to the recognition of an unsequenced BAC extension, this process can result in gaps of 100,000, 150,000 or 200,000 Ns between sequence-defined contigs.
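
The three possible gap sizes follow from simple arithmetic: the 100,000 N spacer plus 50,000 Ns for each flanking contig that ends in an unsequenced BAC extension. A small Python sketch, with hypothetical names, makes this explicit:

```python
SCAFFOLD_SPACER = 100_000   # Ns between scaffolds not spanned by paired ends
BAC_EXTENSION = 50_000      # Ns already appended for an unsequenced BAC


def gap_between(left_has_extension: bool, right_has_extension: bool) -> int:
    """Total run of Ns between two sequence-defined contigs."""
    extensions = int(left_has_extension) + int(right_has_extension)
    return SCAFFOLD_SPACER + BAC_EXTENSION * extensions
```

With no extension on either side the gap is 100,000 Ns. With one extension it is 150,000 Ns, and with an extension on both sides it is 200,000 Ns.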

All singleton BACs that could not be anchored to the genetic map were collected and designated as chromosome 0. Following the assembly, the Mt 3.0 pseudomolecules were rearranged to conform as closely as possible to the optical map.