This set of Bioinformatics Multiple Choice Questions & Answers (MCQs) focuses on “Genome Sequence Assembly”.
1. The major challenges in genome assembly are sequence errors, contamination by bacterial vectors, and repetitive sequence regions.
Explanation: Sequence errors can often be corrected by drawing a consensus from an alignment of multiple overlapping sequences. Bacterial vector sequences can be removed using filtering programs prior to assembly. To overcome the problem of sequence repeats, programs such as RepeatMasker can be used to detect and mask repeats. Additional constraints on the sequence reads can be applied to avoid misassembly caused by repeat sequences.
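As an illustration of drawing a consensus from overlapping reads, the following is a minimal sketch. It assumes the reads have already been aligned and padded to equal length with '-' for uncovered positions; the function name `consensus` and the sample reads are made up for this example and do not come from any particular assembler.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, pre-aligned reads.

    '-' marks positions a read does not cover; those entries are ignored.
    Columns with no coverage at all are reported as 'N'.
    """
    length = len(aligned_reads[0])
    result = []
    for i in range(length):
        column = [read[i] for read in aligned_reads if read[i] != "-"]
        if not column:
            result.append("N")
            continue
        # The most frequent base in the column wins the vote.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)

# Hypothetical reads; the second read carries one error (A instead of T).
reads = [
    "ACGTTGCA--",
    "-CGTAGCATG",
    "--GTTGCATG",
]
print(consensus(reads))
```

The erroneous base in the second read is outvoted by the two reads that agree at that position.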
2. When a sequence is generated from ____ ends of a single clone, the distance between the two opposing fragments of a clone is fixed to ________ meaning that they are always separated by a distance defined by a _____ length (normally 1,000 to 9,000 bases).
a) both, an uncertain range, clone
b) one, an uncertain range, clone
c) both, a certain range, clone
d) both, a certain range, gene
Explanation: A commonly used constraint to avoid errors caused by sequence repeats is the so-called forward–reverse constraint. When the constraint is applied, even if one of the fragments perfectly matches a repetitive element outside the allowed range, it cannot be moved to that location and cause misassembly.
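The forward–reverse constraint can be sketched as a simple distance check. This is only an illustration: the function names, the coordinates, and the default range (taken from the 1,000 to 9,000 base clone length mentioned in the question) are assumptions for the example, not the behaviour of a specific program.

```python
def satisfies_constraint(fwd_pos, rev_pos, min_insert=1000, max_insert=9000):
    """True if the two reads from one clone lie within the expected distance."""
    distance = abs(rev_pos - fwd_pos)
    return min_insert <= distance <= max_insert

def allowed_placements(fwd_pos, candidate_rev_positions,
                       min_insert=1000, max_insert=9000):
    """Keep only candidate positions for the reverse read that respect the constraint."""
    return [p for p in candidate_rev_positions
            if satisfies_constraint(fwd_pos, p, min_insert, max_insert)]

# Hypothetical case: the reverse read matches two places equally well,
# one nearby and one inside a distant copy of a repeat.
print(allowed_placements(12000, [15000, 250000]))
```

The distant match is rejected even though it is a perfect sequence match, because it lies outside the range defined by the clone length.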
3. Which of the following is untrue about base calling and assembly programs?
a) The first step toward genome assembly includes deriving base calls
b) The first step toward genome assembly includes assigning associated quality scores
c) One of the steps is to assemble the sequence reads into contiguous sequences
d) It does not involve identifying overlaps between sequence fragments
Explanation: One of the steps includes identifying overlaps between sequence fragments, assigning the order of the fragments and deriving a consensus of an overall sequence.
Assembling all shotgun fragments into a full genome is a computationally very challenging step. There are a variety of programs available for processing the raw sequence data.
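Identifying an overlap between two fragments and joining them can be sketched as follows. This is a deliberately naive exact-match version, assuming error-free reads and a `min_overlap` of at least 1; the function names and sequences are invented for illustration, and real assemblers use alignment rather than exact matching.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0

def merge(left, right, min_overlap=3):
    """Join two fragments, keeping the overlapping region only once."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:] if k else None

print(overlap_length("ACGTTGCA", "TGCATGGA"))
print(merge("ACGTTGCA", "TGCATGGA"))
```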
4. Which of the following is incorrect?
a) Initial DNA sequencing reactions generate short sequence reads from DNA clones
b) To assemble a whole genome sequence, these short fragments are joined to form larger fragments
c) The average length of the reads is about 50 bases
d) A number of overlapping contigs can be further merged to form scaffolds
Explanation: The average length of the reads is about 500 bases. To assemble a whole genome sequence, these short fragments are joined to form larger fragments after removing overlaps. These longer, merged sequences are termed contigs, which are usually 5,000 to 10,000 bases long. A number of overlapping contigs can be further merged to form scaffolds (30,000–50,000 bases, also called supercontigs), which are unidirectionally oriented along a physical map of a chromosome.
5. Which of the following is incorrect about Phred?
a) It is a UNIX program
b) It doesn’t give a probability score in output
c) It is used for base calling
d) It uses a Fourier analysis to resolve fluorescence traces and predict actual peak locations of bases
Explanation: It also gives a probability score for each base call that may be attributable to error. The commonly accepted score threshold is twenty, which corresponds to a 1% chance of error.
The higher the score, the better the quality of the sequence reads. If the score value falls below the threshold, human intervention is required.
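The relationship between a Phred quality score Q and the error probability P is Q = -10 log10(P), which is why a score of 20 corresponds to a 1% chance of error. The sketch below shows the conversion; the function names and the sample scores are assumptions made for this example.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def low_quality_positions(scores, threshold=20):
    """Indices of base calls whose score falls below the threshold."""
    return [i for i, q in enumerate(scores) if q < threshold]

print(phred_to_error_prob(20))
print(low_quality_positions([35, 40, 12, 28, 19]))
```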
6. Which of the following is incorrect about Phrap?
a) It aligns individual fragments in a pairwise fashion using the Smith–Waterman algorithm
b) It doesn’t take input from Phred
c) It is used for sequence assembly
d) It is a UNIX program
Explanation: It takes Phred base-call files with quality scores as input and aligns individual fragments in a pairwise fashion using the Smith–Waterman algorithm. The base quality information is taken into account during the pairwise alignment. After all pairwise sequence similarities are identified, the program performs assembly by progressively merging sequence pairs in order of decreasing similarity score while removing overlapped regions. Consensus contigs are derived after joining all possible overlapped reads.
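For reference, the core of the Smith–Waterman local alignment can be sketched as below. This is a plain textbook version with a linear gap penalty; the scoring values are arbitrary assumptions, and unlike Phrap it does not use base quality information.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            # Scores never drop below zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("AAAA", "TTTT"))
```

Computing this score for every pair of reads gives the similarity ranking that a progressive merging step could then follow.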
7. VecScreen is primarily aimed at sequence assembly.
Explanation: VecScreen is a web-based program that helps detect contaminating bacterial vector sequences. It scans an input nucleotide sequence and compares it with a database of known vector sequences by using the BLAST program.
8. Which of the following is incorrect about EULER?
a) It is an assembly algorithm
b) It uses an Eulerian Superpath approach, which is a polynomial algorithm
c) In this approach, a sequence fragment is broken down to tuples of five nucleotides
d) The tuples are distributed in a diagram with numerous nodes that are all interconnected
Explanation: In this approach, a sequence fragment is broken down to tuples of twenty nucleotides, not five. The tuples are converted to binary vectors in the nodes. By using a Viterbi algorithm, the shortest path among the vectors can be found, which is the best way to connect the tuples into a full sequence. Because this approach does not directly rely on detecting overlaps, it may be advantageous in assembling sequences with repeat motifs.
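The general idea of breaking reads into tuples and connecting them along a path can be sketched with a small de Bruijn-style graph and Hierholzer's algorithm for an Eulerian path. This is a generic illustration, not EULER's actual implementation (it does not use binary vectors or a Viterbi step); the tuple length k is kept tiny for readability, and all names and reads are assumptions for the example.

```python
from collections import defaultdict

def k_tuples(seq, k):
    """All overlapping tuples of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_graph(reads, k):
    """Each distinct k-tuple becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in sorted({t for r in reads for t in k_tuples(r, k)}):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph actually has an Eulerian path."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for t in targets:
            in_deg[t] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next((n for n in sorted(nodes)
                  if len(graph.get(n, [])) - in_deg.get(n, 0) == 1), min(nodes))
    remaining = {n: list(v) for n, v in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

def spell(path):
    """Reconstruct the sequence spelled by a path of overlapping nodes."""
    return path[0] + "".join(n[-1] for n in path[1:])

reads = ["ATGGCGT", "GCGTGCA"]  # hypothetical overlapping reads
print(spell(eulerian_path(build_graph(reads, 4))))
```

No pairwise overlap between the two reads is computed explicitly; the shared tuple links them in the graph.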
9. TIGR Assembler is a UNIX program from TIGR for assembly of large shotgun sequence fragments.
Explanation: It treats the sequence input as clean reads without consideration of the sequence quality. A main feature of the program is the application of the forward–reverse constraints to avoid misassembly caused by sequence repeats. The sequence alignment in the assembly stage is performed using the Smith–Waterman algorithm.
10. Which of the following is incorrect about ARACHNE?
a) It accepts base calls with associated quality scores assigned by Phred as input
b) It is a free UNIX program
c) It is for the assembly of whole-genome shotgun reads
d) It doesn’t involve a heuristic approach
Explanation: Its unique features include using a heuristic approach similar to FASTA to align overlapping fragments, evaluating alignments using statistical scores, correcting sequencing errors based on multiple sequence alignment, and using forward–reverse constraints. It accepts base calls with associated quality scores assigned by Phred as input and produces scaffolds or a fully assembled genome.