This set of Bioinformatics Assessment Questions and Answers focuses on “Prediction Algorithms – 2”.
1. The eukaryotic transcription initiation is less dependent on transcription factors.
Explanation: Eukaryotic transcription initiation requires the cooperation of a large number of transcription factors. This cooperativity means that promoter regions tend to contain a high density of protein-binding sites. Thus, finding a cluster of transcription factor binding sites often raises the probability that an individual binding site prediction is correct.
2. CpGProD is a web-based program that predicts promoters containing a high density of CpG islands ______
a) in archaea genomic sequences
b) in mammalian genomic sequences
c) in eukaryotic and bacterial genomic sequences
d) only in bacterial genomic sequences
Explanation: It calculates moving averages of GC% and CpG ratios (observed/expected) over a window of a certain size (usually 200 bp). When the values are above a certain threshold, the region is identified as a CpG island.
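The moving-window calculation described above can be sketched as follows. This is a minimal illustration, not CpGProD's actual code: the function names and the thresholds (GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6) are assumptions chosen for the example, since the explanation only says "a certain threshold". The expected CpG count is taken as (number of C × number of G) / window length.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c = w.count("C")
    g = w.count("G")
    cpg = w.count("CG")  # observed CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0  # expected CpG count from base composition
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def find_cpg_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows that pass both thresholds.

    min_gc and min_ratio are illustrative values, not the program's defaults.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc >= min_gc and ratio >= min_ratio:
            hits.append(start)
    return hits


if __name__ == "__main__":
    demo = "CG" * 100 + "AT" * 100
    print(find_cpg_windows(demo, window=200, step=100))
```

Adjacent or overlapping windows that pass would normally be merged into a single island; that step is left out here to keep the sketch short.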
3. Which of the following is incorrect regarding Eponine?
a) It is a web-based program that predicts transcription start sites
b) It is a web-based program that particularly predicts transposons and retrotransposons
c) The regulatory sites include the TATA box, the CCAAT box, and CpG islands
d) It is based on a series of pre-constructed PSSMs of several regulatory sites
Explanation: The query sequence from a mammalian source is scanned through the PSSMs. The sequence stretches with high-score matching to all the PSSMs, as well as matching of the spacing between the elements, are declared transcription start sites. A Bayesian method is also used in decision making.
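Scanning a sequence against a PSSM, as mentioned above, can be illustrated with a short sketch. It is not Eponine's implementation: the log-odds construction, the pseudocount, the uniform background of 0.25, and the function names `build_pssm` and `scan` are all assumptions made for the example, and the spacing constraints and Bayesian decision step are omitted. Sequences are assumed to be uppercase and to contain only A, C, G, and T.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def scan(seq, pssm):
    """Score every window of seq against the PSSM; return (position, score) pairs."""
    width = len(pssm)
    return [
        (i, sum(pssm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]


if __name__ == "__main__":
    # Hypothetical TATA-box-like training sites, for illustration only.
    tata = build_pssm(["TATAAA", "TATAAA", "TATATA"])
    best = max(scan("GGTATAAAGG", tata), key=lambda hit: hit[1])
    print(best)
```

A full predictor of this kind would scan with one PSSM per regulatory site and keep only the stretches where all matrices score highly at compatible spacings.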
4. Which of the following is incorrect regarding Cluster-Buster?
a) It is an HMM-based web-based program
b) A query sequence is scanned with a window size of 1 kb for putative regulatory motifs using motif HMMs
c) It works by detecting a region of high concentration of unknown transcription factor binding sites and regulatory motifs at the initiation
d) It is designed to find clusters of regulatory binding sites
Explanation: It works by detecting a region of high concentration of known transcription factor binding sites and regulatory motifs. If multiple motifs are detected within a window, a positive score is assigned to each motif found. The total score of the window is the sum of each motif score subtracting a gap penalty, which is proportional to the distances between motifs. If the score of a certain region is above a certain threshold, it is predicted to contain a regulatory cluster.
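The window score described above (sum of motif scores minus a gap penalty proportional to the distance between motifs) can be written out as a small sketch. The function names, the penalty of 0.01 per base pair, and the threshold are hypothetical values picked for illustration; Cluster-Buster itself derives motif scores from HMMs, which is not reproduced here.

```python
def window_score(motif_hits, gap_penalty_per_bp=0.01):
    """Score one window from its motif hits.

    motif_hits: list of (position, score) pairs found inside the window.
    The penalty grows linearly with the distance between neighbouring hits.
    """
    hits = sorted(motif_hits)
    total = sum(score for _, score in hits)
    gaps = sum(b[0] - a[0] for a, b in zip(hits, hits[1:]))
    return total - gap_penalty_per_bp * gaps


def is_regulatory_cluster(motif_hits, threshold=5.0, gap_penalty_per_bp=0.01):
    """Call a cluster when the window score exceeds an (assumed) threshold."""
    return window_score(motif_hits, gap_penalty_per_bp) > threshold


if __name__ == "__main__":
    hits = [(100, 5.0), (150, 4.0), (400, 3.0)]
    print(window_score(hits), is_regulatory_cluster(hits))
```

With these numbers, tightly packed motifs keep most of their summed score, while the same motifs spread across the window lose more to the gap penalty.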
5. Which of the following is incorrect regarding FirstEF?
a) It is a program that predicts promoters for bacterial DNA
b) It is a web-based program that predicts promoters for human DNA
c) It stands for First Exon Finder
d) It integrates gene prediction with promoter prediction
Explanation: It uses quadratic discriminant functions to calculate the probabilities of the first exon of a gene and its boundary sites. A segment of DNA (15 kb) upstream of the first exon is subsequently extracted for promoter prediction on the basis of scores for CpG islands.
6. McPromoter, a web-based program, uses a neural network to make promoter predictions.
Explanation: It has a unique promoter model containing six scoring segments. The program scans a window of 300 bases for the likelihoods of being in each of the coding, noncoding, and promoter regions.
7. The input for the neural network includes parameters for sequence physical properties, such as ______
a) DNA bendability
b) Signals such as the TATA box
c) Signals such as initiator box
d) Signals such as CpAA islands
Explanation: Of the options, only DNA bendability is a physical property of the sequence; the TATA box, the initiator box, and CpG islands are signals that are also fed to the network (option d should read CpG islands, not CpAA islands). The hidden layer combines all the features to derive an overall likelihood for a site being a promoter. Another unique feature is that McPromoter does not require any particular pattern to be present; what matters is the combination of all features. For instance, even if the TATA box score is very low, a promoter prediction can still be made if the other features score highly. The program is currently trained for Drosophila and human sequences.
8. TSSW is a web program that distinguishes promoter sequences from non-promoter sequences based on a combination of unique content information such as hexamer/trimer frequencies and signal information such as the TATA box in the promoter region.
Explanation: As stated, TSSW uses unique content information such as hexamer/trimer frequencies and signal information such as the TATA box in the promoter region. The values are fed to a linear discriminant function to separate true motifs from background noise.
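A linear discriminant function is a weighted sum of the feature values compared against a cutoff. The sketch below shows only that general form; the feature order, weights, bias, and threshold are made-up numbers for illustration and are not TSSW's trained parameters.

```python
def linear_discriminant(features, weights, bias=0.0):
    """Weighted sum of feature values plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_promoter(features, weights, bias=0.0, threshold=0.0):
    """Classify as promoter when the discriminant exceeds the threshold."""
    return linear_discriminant(features, weights, bias) > threshold


if __name__ == "__main__":
    # Hypothetical features: [hexamer content score, TATA box signal score]
    features = [1.0, 2.0]
    weights = [0.5, 0.25]
    print(linear_discriminant(features, weights, bias=-0.5))
    print(is_promoter(features, weights, bias=-0.5))
```

In practice the weights are fitted on known promoter and non-promoter sequences so that the two classes fall on opposite sides of the threshold.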
9. Which of the following is incorrect regarding CONPRO?
a) It is a web-based program that uses a consensus method
b) It is used to identify promoter elements for human DNA
c) cDNA does not play a role in prediction
d) The program uses the information to search the human genome database for the position of the gene
Explanation: To use the program, a user supplies the transcript sequence of a gene (cDNA). It then uses the GENSCAN program to predict 5’ untranslated exons in the upstream region. Once the 5’-most exon is located, a further upstream region (1.5 kb) is used for promoter prediction, which relies on a combination of five promoter prediction programs: TSSG, TSSW, NNPP, PROSCAN, and PromFD.
10. In CONPRO, for each program, the highest score prediction is taken as the promoter in the region.
Explanation: If three predictions fall within a 100-bp region, this is considered a consensus prediction. If no three-way consensus is achieved, the TSSG and PromFD predictions are taken. Because no coding sequence is used in prediction, specificity is improved relative to each individual program.
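The consensus rule above can be sketched in a few lines. This is an illustration under stated assumptions, not CONPRO's code: each program is represented only by the position of its highest-scoring prediction, "within a 100-bp region" is read as a spread of at most 100 bp between the three positions, and the function name and return format are hypothetical.

```python
from itertools import combinations


def consensus_prediction(predictions, span=100, fallback=("TSSG", "PromFD")):
    """Apply a three-way consensus rule to per-program top predictions.

    predictions: dict mapping program name -> position of its top-scoring hit.
    Returns a dict saying whether a consensus was reached and which
    programs and positions were used.
    """
    for trio in combinations(sorted(predictions), 3):
        positions = [predictions[p] for p in trio]
        if max(positions) - min(positions) <= span:
            return {"consensus": True, "programs": trio, "positions": positions}
    # No three programs agree: fall back to the TSSG and PromFD predictions.
    return {
        "consensus": False,
        "programs": fallback,
        "positions": [predictions[p] for p in fallback if p in predictions],
    }


if __name__ == "__main__":
    top_hits = {"TSSG": 1200, "TSSW": 1230, "NNPP": 1290,
                "PROSCAN": 300, "PromFD": 900}
    print(consensus_prediction(top_hits))
```

Here TSSG, TSSW, and NNPP agree within 90 bp, so their predictions form the consensus; if every position were far apart, only the TSSG and PromFD positions would be returned.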
Sanfoundry Global Education & Learning Series – Bioinformatics.