Bioinformatics Questions and Answers – Statistical Methods for Aiding Alignment

This set of Bioinformatics Multiple Choice Questions & Answers (MCQs) focuses on “Statistical Methods for Aiding Alignment”.

1. The Expectation Maximization algorithm has been used to identify conserved domains in unaligned proteins only.
a) True
b) False

Answer: b
Explanation: This algorithm has been used to identify both conserved domains in unaligned proteins and protein-binding sites in unaligned DNA sequences (Lawrence and Reilly 1990), including sites that may include gaps (Cardon and Stormo 1992). The input is a set of sequences that are expected to share a common sequence pattern, which may not be easily recognizable by eye.

2. Which of the following is untrue regarding the Expectation Maximization algorithm?
a) An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequence are aligned
b) The alignment provides an estimate of the base or amino acid composition of each column in the site
c) The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences
d) The row-by-column composition of the site already available is used to estimate the probability

Answer: d
Explanation: The EM algorithm then consists of two steps, which are repeated consecutively. In step 1, the expectation step, the column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences. These probabilities are used in turn to provide new information as to the expected base or amino acid distribution for each column in the site.

3. Of the two repeated steps in the EM algorithm, step 2 is ________
a) the maximization step
b) the minimization step
c) the optimization step
d) the normalization step

Answer: a
Explanation: In step 2, the maximization step, the new counts of bases or amino acids for each position in the site found in step 1 are substituted for the previous set. Step 1 is then repeated using these new counts. The cycle is repeated until the algorithm converges on a solution and does not change with further cycles. At that time, the best location of the site in each sequence and the best estimate of the residue composition of each column in the site will be available.
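The two repeated steps can be sketched in Python as follows. This is a minimal illustration, not the implementation of any published program: it assumes exactly one site per DNA sequence, a uniform background composition, a pseudocount of 0.5, and a fixed number of cycles in place of a convergence test. The names `em_motif`, `init_starts`, `n_iter`, and `pseudo`, and the four short example sequences, are hypothetical.

```python
import random

BASES = "ACGT"

def em_motif(seqs, width, init_starts=None, n_iter=20, pseudo=0.5, seed=0):
    """Sketch of EM for one site per sequence.

    Returns (most probable start in each sequence, column frequencies).
    """
    rng = random.Random(seed)
    if init_starts is None:
        # Hypothetical default: a random initial guess for each site.
        init_starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Align the guessed sites and count bases per column (plus pseudocounts).
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    for s, start in zip(seqs, init_starts):
        for j in range(width):
            counts[j][s[start + j]] += 1.0

    background = {b: 0.25 for b in BASES}  # assumption: uniform background
    best = list(init_starts)
    for _ in range(n_iter):
        freqs = [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]
        new_counts = [{b: pseudo for b in BASES} for _ in range(width)]
        best = []
        for s in seqs:
            # Step 1 (expectation): probability of the site at each position.
            weights = []
            for p in range(len(s) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= freqs[j][s[p + j]] / background[s[p + j]]
                weights.append(w)
            total = sum(weights)
            probs = [w / total for w in weights]
            best.append(max(range(len(probs)), key=probs.__getitem__))
            # Expected base counts, weighted by the position probabilities.
            for p, pr in enumerate(probs):
                for j in range(width):
                    new_counts[j][s[p + j]] += pr
        # Step 2 (maximization): the new counts replace the previous set.
        counts = new_counts

    freqs = [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]
    return best, freqs

# Hypothetical sequences that each contain the pattern TTGACA once.
seqs = ["ACGTTGACAGC", "TTGACACCGGA", "GGCCATTGACA", "CATTGACAGGC"]
starts, freqs = em_motif(seqs, width=6, init_starts=[3, 0, 5, 2])
```

Here `starts` holds the 0-based position of the most probable site in each sequence, and `freqs` holds the estimated base composition of each column of the site.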

4. As an example of the EM algorithm, suppose that there are 10 DNA sequences having very little similarity with each other, each about 100 nucleotides long and thought to contain a binding site near the middle 20 residues, based on biochemical and genetic evidence. The following steps would be used by the EM algorithm to find the most probable location of the binding sites in each of the ______ sequences.
a) 30
b) 10
c) 25
d) 20

Answer: b
Explanation: With the EM program MEME, the size and number of binding sites, their location in each sequence, and whether or not the site is present in each sequence do not necessarily have to be known in advance. For the present example, the following steps would be used by the EM algorithm to find the most probable location of the binding sites in each of the 10 sequences.

5. In the initial step of the EM algorithm, the 20-residue-long binding motif patterns in each sequence are aligned as an initial guess of the motif.
a) True
b) False

Answer: a
Explanation: The base composition of each column in the aligned patterns is then determined. The composition of the flanking sequence on each side of the site provides the surrounding base or amino acid composition for comparison. Each sequence is assumed to be the same length and to be aligned by the ends.

6. In the intermediate steps of the EM algorithm, the number of each base in each column is determined and then converted to fractions.
a) True
b) False

Answer: a
Explanation: For example, if there are four Gs in the first column of the 10 sequences, then the frequency of G in the first column of the site is f_SG = 4/10 = 0.4. This procedure is repeated for each base and each column.
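The count-to-fraction conversion can be sketched in a few lines of Python. The function name `column_frequencies` and the ten 3-residue sites are hypothetical; the sites are chosen only so that four of them begin with G, matching the example above.

```python
from collections import Counter

BASES = "ACGT"

def column_frequencies(aligned_sites):
    """Fraction of each base in each column of equal-length aligned sites."""
    n = len(aligned_sites)
    freqs = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        freqs.append({b: counts[b] / n for b in BASES})
    return freqs

# Hypothetical 3-residue sites from 10 sequences; four start with G.
sites = ["GAT", "GCT", "GAT", "GTT", "AAT",
         "ACT", "CAT", "CGT", "TAT", "TAC"]
freqs = column_frequencies(sites)
print(freqs[0]["G"])  # 0.4
```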

7. For the 100-residue DNA sequence example, there are _______ possible starting sites for a 20-residue-long site.
a) 30
b) 21
c) 81
d) 60

Answer: c
Explanation: For the 100-residue DNA sequence example, there are 100 - 20 + 1 = 81 possible starting sites for a 20-residue-long site. The first begins at position 1 and ends at position 20, and the last begins at position 81 and ends at position 100 (there is not enough sequence for a 20-residue-long site beyond position 81).
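The count can be checked by listing every window explicitly. The helper name `site_windows` is hypothetical, and positions are 1-based to match the explanation above.

```python
def site_windows(seq_len, site_len):
    """1-based (start, end) positions of every window of site_len residues."""
    return [(start, start + site_len - 1)
            for start in range(1, seq_len - site_len + 2)]

windows = site_windows(100, 20)
print(len(windows))   # 81
print(windows[0])     # (1, 20)
print(windows[-1])    # (81, 100)
```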

8. An alternative method is to produce an odds scoring matrix calculated by dividing each base frequency by the background frequency of that base.
a) True
b) False

Answer: a
Explanation: In this method, the probability of each location is then found by multiplying the odds scores from each column. An even simpler method is to use log odds scores in the matrix. The column scores are then simply added. In this case, the log odds scores must be converted to odds scores before position probabilities are calculated.
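Both scoring variants can be compared in a short Python sketch. The column frequencies, the uniform background, the 2-column site, and the function names are all assumptions made for illustration; the frequencies contain no zeros, so the logarithm is always defined.

```python
import math

def odds_matrix(col_freqs, background):
    """Divide each base frequency by the background frequency of that base."""
    return [{b: f[b] / background[b] for b in background} for f in col_freqs]

def site_probs_odds(seq, odds):
    """Multiply the odds scores of each column, then normalize over positions."""
    width = len(odds)
    scores = [math.prod(odds[j][seq[p + j]] for j in range(width))
              for p in range(len(seq) - width + 1)]
    total = sum(scores)
    return [s / total for s in scores]

def site_probs_log_odds(seq, odds):
    """Add log odds scores, convert back to odds, then normalize."""
    log_odds = [{b: math.log2(v) for b, v in col.items()} for col in odds]
    width = len(log_odds)
    scores = [2 ** sum(log_odds[j][seq[p + j]] for j in range(width))
              for p in range(len(seq) - width + 1)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical 2-column site favoring "AG", with a uniform background.
col_freqs = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
             {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
odds = odds_matrix(col_freqs, background)

p_odds = site_probs_odds("CAGT", odds)
p_log = site_probs_log_odds("CAGT", odds)
```

The two lists agree up to floating-point rounding, and the highest probability falls on the window "AG" at 0-based position 1.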

9. Which of the following about MEME is untrue?
a) It is a Web resource for performing local MSAs (multiple sequence alignments) by the expectation maximization method
b) It stands for Multiple EM for Motif Elicitation
c) It was developed at the University of California at San Diego Supercomputing Center
d) The Web page has multiple versions for searching blocks by an EM algorithm

Answer: d
Explanation: The Web page has two versions of MEME: ParaMEME, a Web program that searches for blocks by an EM algorithm, and a similar program, MetaMEME, which searches for profiles using HMMs. The Motif Alignment and Search Tool (MAST) searches through databases for matches to motifs.

10. Which of the following about the Gibbs sampler is untrue?
a) It is a statistical method for finding motifs in sequences
b) It is dissimilar to the principle of the EM method
c) It searches for the statistically most probable motifs
d) It can find the optimal width and number of given motifs in each sequence

Answer: b
Explanation: The Gibbs sampler is another statistical method for finding motifs in sequences. The method is similar in principle to the EM method described above, but the algorithm is different. A combination of the Gibbs sampler and MOTIF may be used to make blocks at the BLOCKS Web site.
