This set of Bioinformatics Multiple Choice Questions & Answers (MCQs) focuses on “Phylogenetic Tree Evaluation”.
1. Which of the following is incorrect about Bootstrapping?
a) It is a statistical technique that tests the sampling errors of a phylogenetic tree
b) It does the tests by repeatedly sampling trees through slightly perturbed datasets
c) A newly constructed tree is not biased at all
d) The robustness of the original tree can be assessed here
Explanation: The rationale for bootstrapping is that a newly constructed tree is possibly biased owing to incorrect alignment or chance fluctuations of distance measurements. To determine the robustness or reproducibility of the current tree, trees are repeatedly constructed with slightly perturbed alignments that have some random fluctuations introduced.
2. A truly robust phylogenetic relationship should have enough characters to support the relationship even if the dataset is perturbed in such a way.
Explanation: Otherwise, the noise introduced in the resampling process is sufficient to generate different trees, indicating that the original topology may be derived from weak phylogenetic signals. Thus, this type of analysis gives an idea of the statistical confidence of the tree topology.
3. Which of the following is incorrect about nonparametric bootstrapping?
a) A new multiple sequence alignment of the same length is generated with random duplication of some of the sites
b) A new multiple sequence alignment of a different length is generated with random duplication of some of the sites
c) Certain sites are randomly replaced by other existing sites
d) Certain sites may appear multiple times, and other sites may not appear at all in the new alignment
Explanation: In nonparametric bootstrapping, a new multiple sequence alignment of the same length is generated with random duplication of some of the sites (i.e., the columns in an alignment) at the expense of some other sites. This process is repeated 100 to 1,000 times to create 100 to 1,000 new alignments that are used to reconstruct phylogenetic trees using the same method as the originally inferred tree.
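The resampling step described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the alignment is held as a list of equal-length strings; the function name `bootstrap_alignment` and the toy sequences are hypothetical and not taken from any phylogenetics package.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(alignment[0])
    # Draw n_sites column indices with replacement: some sites are duplicated,
    # others are left out, and the new alignment has the same length.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

# Hypothetical toy alignment: three sequences, eight sites.
alignment = ["ACGTACGT", "ACGTTCGT", "ACGAACGA"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# 100 pseudo-replicate alignments; each would then be passed to the same
# tree-building method that produced the original tree.
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

Whole columns are resampled rather than individual characters, so the relationship between sequences at each site is preserved.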
4. Which of the following is incorrect about nonparametric bootstrapping?
a) All the bootstrapped trees are summarized into a consensus tree based on a majority rule
b) The most supported branching patterns shown at each node are labeled with bootstrap values
c) The bootstrap values are the percentage of appearance of a particular clade
d) This test doesn’t provide a measure for evaluating the confidence levels of the tree topology.
Explanation: The bootstrap test provides a measure for evaluating the confidence levels of the tree topology. Analysis has shown that a bootstrap value of 70% approximately corresponds to 95% statistical confidence, although the issue is still a subject of debate.
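As a rough illustration of how a bootstrap value arises, the sketch below counts how often each clade of a reference tree reappears among the bootstrapped trees. It assumes, purely for simplicity, that each tree is represented as a set of clades and each clade as a `frozenset` of taxon names; the function `clade_support` and the toy trees are hypothetical.

```python
from collections import Counter

def clade_support(reference_clades, bootstrap_trees):
    """Return the percentage of bootstrap trees that contain each reference clade."""
    counts = Counter()
    for tree in bootstrap_trees:
        for clade in reference_clades:
            if clade in tree:
                counts[clade] += 1
    n_trees = len(bootstrap_trees)
    return {clade: 100.0 * counts[clade] / n_trees for clade in reference_clades}

# Hypothetical clades of the original tree over taxa A to E.
reference = [frozenset({"A", "B"}), frozenset({"C", "D"})]

# Four hypothetical bootstrap trees, each given as a set of clades.
trees = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "C"}), frozenset({"D", "E"})},
]

support = clade_support(reference, trees)
# Clade {A, B} appears in 3 of 4 trees (75%), clade {C, D} in 2 of 4 (50%).
```

In practice the trees would come from 100 to 1,000 replicates rather than four, and tree-handling software would extract the clades.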
5. Which of the following is incorrect about the caveats of bootstrapping?
a) Unusually high GC content in the original dataset is a potential cause of biased trees
b) Unusually accelerated evolutionary rates are a potential cause of biased trees
c) Unusually accelerated evolutionary rates are a potential cause of biased bootstrap estimates
d) A large number of bootstrap resampling steps is not needed to achieve meaningful results
Explanation: From a statistical point of view, a large number of bootstrap resampling steps are needed to achieve meaningful results. It is generally recommended that a phylogenetic tree should be bootstrapped 500 to 1,000 times. However, this presents a practical dilemma, because every replicate requires a complete tree reconstruction and the computing time multiplies accordingly.
6. Which of the following statements about jackknifing is incorrect?
a) In this method one half of the sites in a dataset are randomly deleted
b) It creates datasets half as long as the original
c) Each new dataset is subjected to phylogenetic tree construction using a different method from the original
d) One criticism of this approach is that the size of the datasets has been halved and that the datasets are no longer considered replicates
Explanation: Each new dataset is subjected to phylogenetic tree construction using the same method as the original. The advantage of jackknifing is that sites are not duplicated relative to the original dataset and that computing time is much shortened because of shorter sequences.
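For comparison with bootstrapping, a minimal sketch of the jackknife step is shown below. It assumes the alignment is a list of equal-length strings; the function name `jackknife_alignment` and the `keep_fraction` parameter are hypothetical choices made for this illustration.

```python
import random

def jackknife_alignment(alignment, rng, keep_fraction=0.5):
    """Randomly delete sites so that only keep_fraction of the columns remain."""
    n_sites = len(alignment[0])
    n_keep = int(n_sites * keep_fraction)
    # Sample column indices WITHOUT replacement, so no site is duplicated,
    # then sort them to preserve the original column order.
    cols = sorted(rng.sample(range(n_sites), n_keep))
    return ["".join(seq[c] for c in cols) for seq in alignment]

# Hypothetical toy alignment: three sequences, eight sites.
alignment = ["ACGTACGT", "ACGTTCGT", "ACGAACGA"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

half = jackknife_alignment(alignment, rng)  # each sequence now has four sites
```

The only difference from the bootstrap sketch is sampling without replacement and the shorter output, which is why the resulting datasets are faster to analyze but are not true replicates of the original.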
7. Which of the following is incorrect about Bayesian Simulation?
a) It does not require bootstrapping
b) It requires bootstrapping
c) The MCMC procedure itself involves thousands or millions of steps of resampling
d) Posterior probabilities are assigned at each node of a best Bayesian tree as statistical support
Explanation: Because of the fast computational speed of MCMC tree searching, the Bayesian method offers a practical advantage over regular ML and makes the statistical evaluation of ML trees more feasible. Unlike bootstrap values, Bayesian probabilities are normally higher because most trees are sampled near a small number of optimal trees. Therefore, they have a different statistical meaning from bootstrap values.
8. In phylogenetic analysis, it is also important to test whether two competing tree topologies can be distinguished and whether one tree is significantly better than the other.
Explanation: The task is different from bootstrapping in that it tests the statistical significance of the entire phylogeny, not just portions of it. For that purpose, several statistical tests have been developed specifically for each of the three types of tree reconstruction methods: distance, parsimony, and likelihood. A test devised specifically for MP trees is called the Kishino–Hasegawa (KH) test.
9. The KH test sets out to test the null hypothesis that the two competing tree topologies are not significantly different.
Explanation: A paired Student t-test is used to assess whether the null hypothesis can be rejected at a statistically significant level. In this test, the difference of branch lengths at each informative site between the two trees is calculated.
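The paired t statistic mentioned above can be sketched as follows. This is not an implementation of the KH test as shipped in any phylogenetics package; it only shows the paired t computation on per-site values, and the per-site scores for the two trees are hypothetical numbers.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired per-site scores of two trees."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Note: if every difference is identical, sd_diff is 0 and t is undefined.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical per-site scores for tree A and tree B at eight informative sites.
site_scores_a = [2, 1, 3, 2, 1, 2, 3, 1]
site_scores_b = [1, 1, 2, 2, 1, 1, 2, 1]

t, df = paired_t_statistic(site_scores_a, site_scores_b)
```

The resulting t value would be compared against a Student t distribution with `df` degrees of freedom to decide whether the null hypothesis of no difference can be rejected.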
10. In the Shimodaira–Hasegawa test, the degrees of freedom used for the analysis depend on the substitution model used. The test relies on the formula d = 2(ln L_A - ln L_B) = 2 ln(L_A/L_B), where d is the log likelihood ratio score and ln L_A and ln L_B are the log likelihood scores for tree A and tree B, respectively.
Explanation: A frequently used statistical test for ML trees is the Shimodaira–Hasegawa (SH) test (likelihood ratio test). It tests the goodness of fit of two competing trees using the χ² test. For this test, log likelihood scores of two competing trees have to be obtained first.
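A small worked example of the likelihood ratio score may help. The log likelihood values below are hypothetical, and the choice of one degree of freedom is an assumption made only for illustration, since the appropriate degrees of freedom depend on the substitution model.

```python
def likelihood_ratio_statistic(ln_l_a, ln_l_b):
    """Compute d = 2(ln L_A - ln L_B) from two log likelihood scores."""
    return 2.0 * (ln_l_a - ln_l_b)

# Hypothetical log likelihood scores for two competing trees.
ln_l_a = -1234.5
ln_l_b = -1238.0

d = likelihood_ratio_statistic(ln_l_a, ln_l_b)  # 2 * 3.5 = 7.0

# Critical value of the chi-square distribution at the 0.05 level,
# ASSUMING one degree of freedom (illustrative assumption only).
CHI2_CRITICAL_DF1_P05 = 3.841
tree_a_fits_better = d > CHI2_CRITICAL_DF1_P05
```

With these made-up numbers d exceeds the critical value, so under the stated assumption tree A would be judged to fit significantly better than tree B.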
Sanfoundry Global Education & Learning Series – Bioinformatics.