This set of Bioinformatics Questions and Answers for Aptitude test focuses on “Gene Prediction in Eukaryotes – 2”.
1. Which of the following is untrue about FGENES?
a) It stands for FindGenes
b) It is a web-based program that uses LDA
c) It is used to determine whether a signal is an exon
d) It does not make use of HMMs
Explanation: In addition to FGENES, there are many variants of the program. Some programs, such as FGENESH, make use of HMMs. There are others, such as FGENESH C, that are similarity based. Some programs, such as FGENESH+, combine both ab initio and similarity-based approaches.
2. GENSCAN is a web-based program that makes predictions based on fifth-order HMMs.
Explanation: It combines hexamer frequencies with coding signals (initiation codons, TATA box, cap site, poly-A, etc.) in prediction. Putative exons are assigned a probability score (P) of being a true exon. Only predictions with P > 0.5 are deemed reliable. This program is trained for sequences from vertebrates, Arabidopsis, and maize. It has been used extensively in annotating the human genome.
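The link between hexamer frequencies and a fifth-order model can be illustrated with a toy example: in a fifth-order Markov chain, each base is conditioned on the five bases before it, so the model parameters are estimated from hexamer counts. The sketch below is only an illustration under stated assumptions, not GENSCAN's actual model (which also handles signals such as initiation codons and poly-A sites). The function names, the pseudocount, and the training sequences are all hypothetical.

```python
import math
from collections import defaultdict

def train_fifth_order(seqs, pseudocount=1.0):
    """Estimate a fifth-order Markov model from hexamer counts.

    Maps each 5-base context to counts of the base that follows it.
    Assumes sequences contain only A, C, G, T (a simplification).
    """
    counts = defaultdict(lambda: {b: pseudocount for b in "ACGT"})
    for s in seqs:
        for i in range(5, len(s)):
            counts[s[i - 5:i]][s[i]] += 1
    return counts

def log_prob(seq, counts, pseudocount=1.0):
    """Log-probability of seq (from the 6th base on) under the model."""
    uniform = {b: pseudocount for b in "ACGT"}
    total = 0.0
    for i in range(5, len(seq)):
        row = counts.get(seq[i - 5:i], uniform)  # unseen context: uniform
        total += math.log(row[seq[i]] / sum(row.values()))
    return total

# Hypothetical training data, for illustration only.
coding_model = train_fifth_order(["ATGGCCATGGCCATGGCC"])
noncoding_model = train_fifth_order(["TTTTTTTTTTTTTTTTTT"])

query = "ATGGCCATGGCC"
# A positive log-odds score favours the "coding" model for this query.
log_odds = log_prob(query, coding_model) - log_prob(query, noncoding_model)
```

A real gene finder would turn scores like this into exon probabilities and, as the explanation notes, keep only predictions with P > 0.5.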
3. Which of the following is wrong about HMMgene?
a) It is also an HMM-based web program
b) It uses a criterion called the conditional maximum likelihood to discriminate coding from non-coding features
c) HMM prediction is unbiased toward the locked region
d) If a sequence already has a sub-region identified as coding region, which may be based on similarity with cDNAs or proteins in a database, these regions are locked as coding regions
Explanation: An HMM prediction is subsequently made with a bias toward the locked region and is extended from the locked region to predict the rest of the gene coding regions and even neighboring genes. The program is in a way a hybrid algorithm that uses both ab initio-based and homology-based criteria.
4. Which of the following is untrue about Homology-Based Programs?
a) They are based on the fact that exon structures and exon sequences of related species are less conserved
b) This approach assumes that the database sequences are correct
c) It is a reasonable assumption in light of the fact that many homologous sequences to be compared with are derived from cDNA or expressed sequence tags (ESTs) of the same species
d) Potential coding frames in a query sequence are translated and used to align with closest protein homologs found in databases
Explanation: Homology-based programs are based on the fact that exon structures and exon sequences of related species are highly conserved. When potential coding frames in a query sequence are translated and used to align with closest protein homologs found in databases, near perfectly matched regions can be used to reveal the exon boundaries in the query.
5. The drawback of the homology-based approach is its reliance on the presence of homologs in databases.
Explanation: If the homologs are not available in the database, the method cannot be used. Novel genes in a new species cannot be discovered without matches in the database. A number of publicly available programs use this approach.
6. GenomeScan is a web-based server that combines GENSCAN prediction results with BLASTX similarity searches.
Explanation: The user provides genomic DNA and protein sequences from related species. The genomic DNA is translated in all six frames to cover all possible exons. The translated exons are then used to compare with the user-supplied protein sequences.
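The six-frame translation step can be sketched as follows. This is a minimal illustration using the standard genetic code, not GenomeScan's own implementation; the function names are hypothetical, and codons containing characters other than A, C, G, T are translated as "X" by assumption.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translation(dna):
    """Translate three forward and three reverse-strand reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

frames = six_frame_translation("ATGGCCTAA")
```

Each translated frame can then be compared against the user-supplied protein sequences, which is the similarity step the explanation describes.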
7. Which of the following is untrue about EST2Genome?
a) It is a web-based program purely based on the sequence alignment approach to define intron–exon boundaries
b) It compares an EST (or cDNA) sequence with a genomic DNA sequence containing the corresponding gene
c) The alignment is rarely done using a dynamic programming–based algorithm
d) An advantage of the approach is its ability to find very small exons and alternatively spliced exons, which are very difficult to predict with any ab initio–type algorithm
Explanation: The alignment is done using a dynamic programming–based algorithm. Another advantage is that there is no need for model training, which provides much more flexibility for gene prediction. The limitation is that EST or cDNA sequences often contain errors or even introns if the transcripts are not completely spliced before reverse transcription.
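To show what a dynamic programming alignment looks like, here is a minimal sketch of a local (Smith-Waterman style) alignment score between two nucleotide sequences. It is not EST2Genome's algorithm, which additionally models introns as long gaps bounded by splice sites; the scoring values and the function name are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Fills the dynamic programming matrix one row at a time, keeping
    only the previous row in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical EST fragment and genomic fragment sharing the core "ACGT".
score = local_alignment_score("GGACGTGG", "TTACGTTT")
```

Because no trained model is involved, the same routine works on any pair of sequences, which reflects the flexibility mentioned in the explanation.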
8. Which of the following is untrue about SGP-1?
a) The program translates all potential exons in each sequence and does pairwise alignment for the translated protein sequences using a dynamic programming approach
b) The near-perfect matches at the protein level define coding regions
c) It is a similarity-based web program that aligns two genomic DNA sequences from distantly related organisms
d) It stands for Syntenic Gene Prediction
Explanation: It aligns two genomic DNA sequences from closely related organisms. Similar to EST2Genome, there is no training needed. The limitation is the need for two homologous sequences having similar genes with similar exon structures; if this condition is not met, a gene escapes detection from one sequence when there is no counterpart in another sequence.
9. TwinScan is also a similarity-based gene-finding server; it is similar to GenomeScan in that it uses GENSCAN to predict all possible exons from the genomic sequence.
Explanation: The putative exons are used for BLAST searching to find closest homologs. The putative exons and homologs from BLAST searching are aligned to identify the best match. Only the closest match from a genome database is used as a template for refining the previous exon selection and exon boundaries.
10. Because different prediction programs have different levels of sensitivity and specificity, it makes sense to combine results of multiple programs based on consensus. This idea has prompted development of consensus-based algorithms.
Explanation: These programs work by retaining common predictions agreed on by most programs and removing inconsistent predictions. Such an integrated approach may improve specificity by correcting false positives and the problem of overprediction. However, since this procedure punishes novel predictions, it may lead to lowered sensitivity and missed predictions.
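The consensus idea can be sketched as a simple majority vote. The example below assumes each program reports exons as (start, end) coordinate pairs and that two predictions agree only if their coordinates match exactly; real consensus tools are more tolerant of boundary differences. The function name and the coordinates are hypothetical.

```python
from collections import Counter

def consensus_exons(predictions, min_votes=None):
    """Keep exons predicted by at least min_votes programs.

    predictions: one list of (start, end) tuples per program.
    By default, a strict majority of programs is required.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # set() ensures each program votes at most once per exon.
    votes = Counter(exon for program in predictions for exon in set(program))
    return sorted(exon for exon, n in votes.items() if n >= min_votes)

# Hypothetical output of three gene prediction programs.
program_a = [(100, 200), (300, 400)]
program_b = [(100, 200), (500, 600)]
program_c = [(100, 200), (300, 400), (700, 800)]

consensus = consensus_exons([program_a, program_b, program_c])
```

Here the exons at (500, 600) and (700, 800) are dropped because only one program predicted each. If either were a genuine novel exon, it would be a missed prediction, which is the loss of sensitivity the explanation warns about.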