This set of Bioinformatics Multiple Choice Questions & Answers (MCQs) focuses on “Categories of Gene Prediction Programs”.
1. Which of the following is true regarding the methods of gene prediction?
a) They solely consist of a type called ab initio–based methods
b) The ab initio–based approach predicts genes based on the given sequence alone
c) The ab initio–based approach predicts genes based on the given sequence and relative homology data
d) They solely consist of a type called homology-based approaches
Explanation: The current gene prediction methods can be classified into two major categories, ab initio–based and homology-based approaches. The ab initio–based approach predicts genes based on the given sequence alone.
2. The ab initio–based approaches rely on two major features associated with genes. One of them is the existence of gene signals, which include start and stop codons, intron splice signals, transcription factor binding sites, etc.
Explanation: Gene signals also include ribosomal binding sites and polyadenylation (poly-A) sites. In addition, the triplet codon structure limits the coding frame length to multiples of three, which can be used as a condition for gene prediction.
3. The ab initio–based approaches rely on two major features associated with genes. One of them is gene content, which is a statistical description of coding regions.
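The signal-based idea above can be illustrated with a minimal sketch that scans the forward strand for open reading frames. The function name `find_orfs` and the `min_codons` threshold are assumptions made for illustration; real gene finders combine such signals with statistical models rather than relying on them alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    `end` is exclusive and includes the stop codon, so every reported
    length is a multiple of three. `min_codons` counts the start and
    stop codons as well (an illustrative threshold, not a standard one).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATAGGG"))
```

Because each frame is walked in steps of three, the multiple-of-three condition holds by construction. The reverse strand is not scanned in this sketch.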
Explanation: It has been observed that nucleotide composition and statistical patterns of the coding regions tend to vary significantly from those of the non-coding regions. The unique features can be detected by employing probabilistic models such as Markov models or hidden Markov models to help distinguish coding from non-coding regions.
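As a rough sketch of the content-based idea, two Markov chains can be trained on example coding and non-coding sequences, and a query scored by the log-odds of the two models. The training strings, the first-order default, and the function names below are illustrative assumptions; the programs discussed in this set use higher-order models trained on real genomes.

```python
import math
from collections import defaultdict
from itertools import product

def train_markov(seqs, order=1, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) with pseudocounts."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    model = {}
    for ctx in ("".join(p) for p in product("ACGT", repeat=order)):
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in "ACGT"}
    return model

def log_odds(seq, coding, noncoding, order=1):
    """Sum of log(P_coding / P_noncoding) over all transitions in `seq`."""
    score = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        score += math.log(coding[ctx][base] / noncoding[ctx][base])
    return score

# Toy training data (hypothetical, chosen only to make the contrast visible).
coding_model = train_markov(["ATGGCGGCGGCGGCG"])
noncoding_model = train_markov(["ATATATATATATAT"])

print(log_odds("GCGGCG", coding_model, noncoding_model))  # positive: coding-like
print(log_odds("ATATAT", coding_model, noncoding_model))  # negative: non-coding-like
```

A positive score means the query looks more like the coding training data than the non-coding data. The sketch assumes uppercase input containing only A, C, G, and T.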
4. The homology-based method makes predictions based on significant matches of the query sequence with sequences of known genes.
Explanation: For instance, if a translated DNA sequence is found to be similar to a known protein or protein family from a database search, this can be strong evidence that the region codes for a protein. Alternatively, when possible exons of a genomic DNA region match a sequenced cDNA, this also provides experimental evidence for the existence of a coding region.
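The translated-search idea can be sketched by translating a DNA query in its three forward frames and measuring ungapped identity against a known protein. This is a simplified stand-in for a real database search; `best_frame_identity` and the toy sequences are hypothetical, and only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate DNA from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def best_frame_identity(dna, known_protein):
    """Return (frame, identity) for the forward frame that best matches
    `known_protein` at any ungapped offset; identity is in [0, 1]."""
    best = (None, 0.0)
    n = len(known_protein)
    for frame in range(3):
        protein = translate(dna[frame:])
        for off in range(len(protein) - n + 1):
            same = sum(a == b for a, b in zip(protein[off:off + n], known_protein))
            if same / n > best[1]:
                best = (frame, same / n)
    return best

print(translate("ATGGCCTAA"))
print(best_frame_identity("GGATGGCCAAAGGG", "MAKG"))
```

A high identity in one frame is the kind of evidence the explanation describes. Production tools use gapped alignment and statistical significance rather than raw identity, and also search the reverse strand.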
5. FGENESB is a web-based program that is also based on fifth-order HMMs for detecting coding regions.
Explanation: The program is specifically trained for bacterial sequences. It uses the Viterbi algorithm to find an optimal match for the query sequence with the intrinsic model. A linear discriminant analysis (LDA) is used to further distinguish coding signals from non-coding signals.
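To make the decoding step concrete, here is a minimal sketch of the Viterbi algorithm on a two-state toy HMM (coding "C" and non-coding "N"). All probabilities are invented for illustration, and the model is far simpler than the fifth-order HMMs the program is described as using; it is not FGENESB's actual model.

```python
import math

# Hypothetical toy parameters: coding regions are GC-rich, non-coding AT-rich.
STATES = ("C", "N")
START_P = {"C": 0.5, "N": 0.5}
TRANS_P = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
EMIT_P = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(obs, states=STATES, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable state path for `obs`, computed in log space."""
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

print("".join(viterbi("AAAAGGGG")))
```

Working in log space avoids numerical underflow on long sequences, which matters for genome-scale input.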
6. Which of the following is untrue about GeneMark?
a) It is a suite of gene prediction programs based on the fifth-order HMMs
b) The main program is trained on a number of complete microbial genomes
c) A GeneMark heuristic program can be used to improve accuracy
d) If the sequence to be predicted is from a non-listed organism, the most closely related organism can be chosen as the basis for computation
Explanation: Another option for predicting genes from a new organism is to use the self-training program GeneMarkS, as long as the user can provide at least 100 kbp of sequence on which to train the model. If the query sequence is shorter than 100 kbp, a GeneMark heuristic program can be used with some loss of accuracy. In addition to predicting prokaryotic genes, GeneMark also has a variant for eukaryotic gene prediction using HMMs.
7. Which of the following is untrue about Glimmer?
a) It stands for Gene Locator and Interpolated Markov Modeler
b) It is a UNIX program from TIGR
c) It does not necessarily use the IMM algorithm
d) It is used to predict potential coding regions
Explanation: The computation consists of two steps, namely model building and gene prediction. The model building involves training on the input sequence, which optimizes the parameters of the model. In an actual gene prediction, the overlapping frames are “flagged” to alert the user for further inspection. Glimmer also has a variant, GlimmerM, for eukaryotic gene prediction.
8. RBS finder is a UNIX program that uses the prediction output from Glimmer and searches for the Shine–Dalgarno sequences in the vicinity of predicted start sites.
Explanation: If a high-scoring site is found by the intrinsic probabilistic model, the start codon is confirmed. Otherwise, the program moves to other putative translation start sites and repeats the process.
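The confirm-or-move-on loop described above can be sketched as follows. This is a consensus-matching simplification, not the probabilistic model the program itself uses; the consensus string `AGGAGG`, the 20-nucleotide window, the `min_matches` threshold, and both function names are assumptions made for illustration.

```python
SD_CONSENSUS = "AGGAGG"  # commonly cited Shine-Dalgarno core (assumed here)

def best_sd_match(seq, start, window=20):
    """Return (matches, position) of the best consensus match that lies
    entirely within `window` bases upstream of `start`."""
    k = len(SD_CONSENSUS)
    best = (0, None)
    for i in range(max(0, start - window), start - k + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + k], SD_CONSENSUS))
        if matches > best[0]:
            best = (matches, i)
    return best

def confirm_start(seq, candidate_starts, min_matches=5, window=20):
    """Return the first candidate start with an SD-like site upstream,
    or None if no candidate qualifies."""
    for start in candidate_starts:
        matches, _ = best_sd_match(seq, start, window)
        if matches >= min_matches:
            return start  # start codon confirmed
    return None  # every putative start was tried without success

seq = "TTTTAGGAGGTTTTTTATGAAACCCATGTTT"
print(confirm_start(seq, [25, 16]))
```

In this toy sequence the candidate at position 25 has no convincing site upstream, so the search moves on to the candidate at position 16, mirroring the repeat step in the explanation.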
Sanfoundry Global Education & Learning Series – Bioinformatics.