This set of Bioinformatics Multiple Choice Questions & Answers (MCQs) focuses on “Gene Prediction in Eukaryotes-1”.
1. Which of the following is untrue?
a) Eukaryotic nuclear genomes are much larger than prokaryotic ones
b) They tend to have a very high gene density
c) Eukaryotic nuclear genomes’ sizes range from 10 Mbp to 670 Gbp (1 Gbp = 10^9 bp)
d) They tend to have a very low gene density
Explanation: In humans, for instance, only 3% of the genome codes for genes, with about 1 gene per 100 kbp on average. The space between genes is often very large and rich in repetitive sequences and transposable elements.
2. Which of the following is untrue about translation and transcription?
a) The first event is capping at the 5’ end of the transcript, which involves methylation at the initial residue of the RNA
b) The splicing process involves a large RNA-protein complex called spliceosome
c) The second event is splicing, which is the process of removing exons and joining introns
d) The second event is splicing, which is the process of removing introns and joining exons
Explanation: The reaction requires intermolecular interactions between a pair of nucleotides at each end of an intron and the RNA component of the spliceosome. To make the matter even more complex, some eukaryotic genes can have their transcripts spliced and joined in different ways to generate more than one transcript per gene. This is the phenomenon of alternative splicing.
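The splicing step described above (remove introns, join exons) and alternative splicing can be illustrated with a minimal Python sketch. The toy transcript, the helper name `splice`, and the 0-based half-open exon coordinates are all assumptions made for illustration, not part of any real annotation format.

```python
def splice(pre_mrna, exons):
    """Join the exon segments in order, dropping the introns between them.

    exons: list of (start, end) pairs, 0-based and half-open
    (a convention assumed for this sketch).
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy pre-mRNA: exons in upper case, introns (starting gu, ending ag) in lower case.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guccccag" + "UGA"

# Constitutive splicing keeps all three exons.
full_transcript = splice(pre_mrna, [(0, 6), (16, 22), (30, 33)])

# Alternative splicing: skipping the middle exon yields a second transcript
# from the same gene.
skipped_transcript = splice(pre_mrna, [(0, 6), (30, 33)])

print(full_transcript)     # AUGGCCGAUUACUGA
print(skipped_transcript)  # AUGGCCUGA
```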
3. The main issue in prediction of eukaryotic genes is the identification of exons, introns, and splicing sites.
Explanation: From a computational point of view, it is a very complex and challenging problem. Because of the presence of split gene structures, alternative splicing, and very low gene densities, the difficulty of finding genes in such an environment is likened to finding a needle in a haystack.
4. Most vertebrate genes use ____ as the translation start codon and have a uniquely conserved flanking sequence called a Kozak sequence (CCGCCATGG).
Explanation: In addition, most of these genes have a high density of CG dinucleotides near the transcription start site. This region is referred to as a CpG island (p refers to the phosphodiester bond connecting the two nucleotides), which helps to identify the transcription initiation site of a eukaryotic gene. The poly-A signal can also help locate the final coding sequence.
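As a rough illustration of how these two signals might be searched for, the Python sketch below reports GC content and the observed/expected CpG ratio of a sequence, and locates exact matches to the Kozak consensus quoted in the question. The function names are hypothetical, and exact matching is a simplifying assumption: real Kozak sites vary around the consensus, and CpG island finders apply length and ratio thresholds that are not shown here.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    The ratio is (number of CG dinucleotides * length) / (C count * G count).
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def kozak_start_positions(seq, motif="CCGCCATGG"):
    """Return 0-based positions of the ATG inside each exact motif match."""
    seq = seq.upper()
    atg_offset = motif.index("ATG")
    positions = []
    hit = seq.find(motif)
    while hit != -1:
        positions.append(hit + atg_offset)
        hit = seq.find(motif, hit + 1)
    return positions


print(cpg_stats("CGCGCGCG"))                   # (1.0, 2.0)
print(kozak_start_positions("AACCGCCATGGTT"))  # [7]
```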
5. Which of the following is untrue about Ab Initio–Based Programs for Gene Prediction?
a) The goal of the ab initio gene prediction programs is to discriminate exons from noncoding sequences
b) The goal is joining exons together in the correct order
c) The main difficulty is correct identification of exons
d) To predict exons, the algorithms rely solely on gene signals
Explanation: To predict exons, the algorithms rely on two features, gene signals and gene content. Signals include gene start and stop sites and putative splice sites, as well as recognizable consensus sequences such as poly-A sites.
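A small Python sketch shows why signals alone are not enough. It scans a toy sequence for start codons, stop codons, and the dinucleotides that conventionally mark intron boundaries (GT at the donor site, AG at the acceptor site). The GT-AG patterns and the function name `find_signals` are assumptions added for illustration; they are not taken from the question.

```python
import re

# Signal patterns assumed for this sketch.
SIGNALS = {
    "start": "ATG",
    "stop": "TAA|TAG|TGA",
    "donor": "GT",      # typical first two bases of an intron
    "acceptor": "AG",   # typical last two bases of an intron
}


def find_signals(dna):
    """Return 0-based positions of every candidate signal, overlaps included."""
    dna = dna.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?=({pattern}))", dna)]
        for name, pattern in SIGNALS.items()
    }


hits = find_signals("ATGGTAAGTCCAGTGA")
for name, positions in hits.items():
    print(name, positions)
```

Even this 16-base toy sequence yields several donor, acceptor, and stop candidates, most of them spurious, which is why programs combine signals with content statistics.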
6. In Ab Initio–Based Programs for Gene Prediction, gene content refers to coding statistics, which include nonrandom nucleotide distribution, amino acid distribution, synonymous codon usage, and hexamer frequencies.
Explanation: Among these features, the hexamer frequencies appear to be most discriminative for coding potentials. To derive an assessment for this feature, HMMs can be used, which require proper training. In addition to HMMs, neural network-based algorithms are also common in the gene prediction field.
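To make the idea of hexamer-based coding potential concrete, here is a minimal Python sketch. It is not an HMM: it substitutes a simpler log-odds score that compares hexamer frequencies learned from coding and noncoding training sequences. The training sequences are toy data, and the function names and pseudocount are assumptions made for illustration.

```python
import math
from collections import Counter


def hexamer_counts(sequences):
    """Count every overlapping 6-mer in a list of training sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 5):
            counts[seq[i:i + 6]] += 1
    return counts


def coding_log_odds(seq, coding, noncoding, pseudo=1.0):
    """Sum of log(P(hexamer | coding) / P(hexamer | noncoding)) over seq.

    A pseudocount keeps unseen hexamers from producing log(0).
    """
    seq = seq.upper()
    coding_total = sum(coding.values()) + pseudo * 4 ** 6
    noncoding_total = sum(noncoding.values()) + pseudo * 4 ** 6
    score = 0.0
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        p_coding = (coding[hexamer] + pseudo) / coding_total
        p_noncoding = (noncoding[hexamer] + pseudo) / noncoding_total
        score += math.log(p_coding / p_noncoding)
    return score


# Toy training sets (hypothetical, far too small for real use).
coding_model = hexamer_counts(["ATGGCCATGGCCATGGCC"])
noncoding_model = hexamer_counts(["TTTTTTTTTTTTTTTTTT"])

print(coding_log_odds("ATGGCCATGGCC", coding_model, noncoding_model) > 0)  # True
print(coding_log_odds("TTTTTTTTTT", coding_model, noncoding_model) < 0)    # True
```

A positive score means the hexamer composition looks more like the coding training set, a negative score more like the noncoding one.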
7. Which of the following is untrue about Prediction Using Neural Networks for Gene Prediction?
a) A neural network is a statistical model with a special architecture for pattern recognition and classification
b) It is composed of a network of mathematical variables
c) They resemble ab initio approaches
d) The variables in neural networks resemble the biological nervous system, with variables or nodes connected by weighted functions that are analogous to synapses
Explanation: The aspect of the model that makes it look like a biological neural network is its ability to “learn” and then make predictions after being trained. The network is able to process information and modify the parameters of the weight functions between variables during the training stage. Once it is trained, it is able to make automatic predictions about the unknown. This is quite different from the ab initio methods.
8. Which of the following is untrue about Prediction Using Neural Networks for Gene Prediction?
a) A neural network is constructed with multiple layers: the input, output, and hidden layers
b) The input is the gene sequence with intron and exon signals
c) The model is not fed with a sequence of known gene structure
d) The output is the probability of an exon structure
Explanation: Between input and output, there may be one or several hidden layers where the machine learning takes place. The machine learning process starts by feeding the model with a sequence of known gene structure. The gene structure information is separated into several classes of features such as hexamer frequencies, splice sites, and GC composition during training. The weight functions in the hidden layers are adjusted during this process to recognize the nucleotide patterns and their relationship with known structures.
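The training process described above can be sketched with a very small network in plain Python. Everything here is an illustrative assumption: two made-up input features per window (a hexamer score and a splice-site score, both scaled to 0 to 1), one hidden layer with two nodes, fixed starting weights, and per-example gradient descent on a cross-entropy loss. It is not the architecture of any particular gene finder.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """2 inputs -> 2 hidden nodes -> 1 output, all with sigmoid activations."""

    def __init__(self):
        # Fixed, arbitrary starting weights so the run is reproducible.
        self.w_hidden = [[0.1, -0.2], [0.3, 0.2]]  # one row per hidden node
        self.b_hidden = [0.0, 0.0]
        self.w_out = [0.2, -0.1]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def train_step(self, x, y, lr=0.5):
        hidden, out = self.forward(x)
        d_out = out - y  # cross-entropy gradient at the output node
        for j, h in enumerate(hidden):
            # Back-propagate through the old output weight before updating it.
            d_hidden = d_out * self.w_out[j] * h * (1.0 - h)
            self.w_out[j] -= lr * d_out * h
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * d_hidden * xi
            self.b_hidden[j] -= lr * d_hidden
        self.b_out -= lr * d_out


def mean_loss(net, data):
    total = 0.0
    for x, y in data:
        p = net.forward(x)[1]
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


# Hypothetical training windows: (hexamer score, splice-site score) -> is exon?
DATA = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]

net = TinyNet()
loss_before = mean_loss(net, DATA)
for _ in range(500):
    for features, label in DATA:
        net.train_step(features, label)
loss_after = mean_loss(net, DATA)

print(loss_before, loss_after)
print(net.forward([0.9, 0.9])[1], net.forward([0.1, 0.1])[1])
```

The output node plays the role of the "probability of an exon structure" mentioned in the question, and the weight updates in `train_step` correspond to the adjustment of weight functions during training.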
9. GRAIL is a web-based program that is based on a neural network algorithm, which is trained on several statistical features such as splice junctions, start and stop codons, poly-A sites, promoters, and CpG islands.
Explanation: The program scans the query sequence with windows of variable lengths, scores them for coding potential, and finally outputs a list of exon candidates. The program is currently trained for human, mouse, Arabidopsis, Drosophila, and Escherichia coli sequences.
10. Which of the following is untrue about Prediction Using Discriminant Analysis for Gene Prediction?
a) QDA draws a curved line based on a quadratic function
b) LDA works by drawing a diagonal line that best separates coding signals from noncoding signals based on knowledge learned from training data sets of known gene structures
c) Some gene prediction algorithms rely on discriminant analysis, either LDA or quadratic discriminant analysis (QDA), to improve accuracy
d) LDA works by plotting a three-dimensional graph of coding signals versus all potential 3’ splice site positions
Explanation: QDA draws a curved line based on a quadratic function instead of drawing a straight line to separate coding and noncoding features. This strategy is designed to be more flexible and to provide a better separation between the data points.
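The straight-line case (LDA) can be sketched in plain Python for two features. The training points below are invented, the function names are hypothetical, and equal class priors are assumed. QDA would differ by estimating a separate covariance matrix for each class, which turns the straight boundary into a quadratic curve.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def pooled_covariance(class_a, class_b):
    """2x2 covariance matrix shared by both classes (the LDA assumption)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in (class_a, class_b):
        mu = mean(points)
        for p in points:
            d = [p[0] - mu[0], p[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(class_a) + len(class_b) - 2
    return [[value / dof for value in row] for row in s]


def fit_lda(coding, noncoding):
    """Return (w, threshold) for the line w . x = threshold."""
    mu_c, mu_n = mean(coding), mean(noncoding)
    (a, b), (c, d) = pooled_covariance(coding, noncoding)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [mu_c[0] - mu_n[0], mu_c[1] - mu_n[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # With equal priors the boundary passes through the midpoint of the means.
    mid = [(mu_c[0] + mu_n[0]) / 2, (mu_c[1] + mu_n[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]


def is_coding(point, w, threshold):
    return w[0] * point[0] + w[1] * point[1] > threshold


# Hypothetical (coding score, splice-site score) pairs from known gene structures.
coding_pts = [(0.8, 0.7), (0.9, 0.9), (0.7, 0.9), (0.85, 0.6)]
noncoding_pts = [(0.2, 0.3), (0.1, 0.1), (0.3, 0.15), (0.25, 0.4)]

w, threshold = fit_lda(coding_pts, noncoding_pts)
print(is_coding((0.9, 0.8), w, threshold))  # True
print(is_coding((0.1, 0.2), w, threshold))  # False
```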
Sanfoundry Global Education & Learning Series – Bioinformatics.