This set of Bioinformatics Multiple Choice Questions & Answers (MCQs) focuses on “Statistical Significance of Sequence Alignment”.

1. The truly statistically significant sequence alignment will be able to provide evidence of homology between the sequences involved.

a) True

b) False

Explanation: Given a sequence alignment showing a certain degree of similarity, it is often important to determine whether the observed alignment could have occurred by random chance or is statistically significant. A truly statistically significant sequence alignment provides evidence of homology between the sequences involved.

2. By calculating alignment scores of a large number of ______ sequence pairs, a distribution model of the ______ sequence scores can be derived.

a) related, randomized

b) unrelated, randomized

c) unrelated, unrandomized

d) related, unrandomized

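Explanation: Solving the statistical significance problem requires a statistical test of the alignment scores of two unrelated sequences of the same length. By calculating alignment scores for a large number of unrelated sequence pairs, a distribution model of the randomized sequence scores can be derived. From that distribution, a statistical test can be performed based on the number of standard deviations by which a score differs from the average score.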
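As a rough illustration of "number of standard deviations from the average score", the sketch below computes a Z-score for an observed alignment score against a pool of scores from randomized pairs. The function name `z_score` and the sample numbers are made up for illustration and are not taken from any particular program.

```python
import statistics

def z_score(observed, random_scores):
    """Number of sample standard deviations by which `observed`
    exceeds the mean of the randomized-pair scores."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (observed - mean) / sd

# Hypothetical scores from randomized (unrelated) sequence pairs.
random_scores = [18, 20, 22, 19, 21, 20, 23, 17]
print(z_score(45, random_scores))
```

A large positive value suggests the observed score lies far from the random scores, although, because the score distribution is skewed, a Z-score alone is only a first approximation.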

3. Many studies have demonstrated that the distribution of similarity scores assumes a peculiar shape that resembles a highly skewed normal distribution with a long tail on one side. The distribution matches the _______

a) Gumbel elective value distribution

b) Gumbel extreme void distribution

c) Gumbel end value distribution

d) Gumbel extreme value distribution

Explanation: The distribution pattern described matches the Gumbel extreme value distribution, for which a mathematical expression is available. This means that, given a sequence similarity score, the statistical significance can be accurately estimated by using the mathematical formula for the extreme value distribution.
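For reference, the standard form of the Gumbel (type I extreme value) distribution can be written as below. The symbols are assumptions introduced here: $S$ is a random alignment score, $\mu$ is the location parameter, and $\beta > 0$ is the scale parameter. Alignment programs often use a different but equivalent parameterization.

```latex
F(x) = P(S \le x) = \exp\!\left(-e^{-(x-\mu)/\beta}\right),
\qquad
P(S \ge x) = 1 - \exp\!\left(-e^{-(x-\mu)/\beta}\right)
```

The second expression is the upper-tail probability, which is the quantity used to judge how unusual a high score is.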

4. Which of the following is a part of the statistical test of sequences?

a) An optimal alignment between two chosen sequences is obtained at the end

b) Unrelated sequences of the same length are then generated through a randomization process

c) Unrelated sequences of different lengths are then generated through a randomization process

d) Related sequences of the same length are then generated through a randomization process

Explanation: Unrelated sequences of the same length are generated through a randomization process in which one of the two sequences is randomly shuffled. A new alignment score is then computed for the shuffled sequence pair.

5. In the statistical test, the randomization process randomly shuffles one of the two given sequences.

a) True

b) False

Explanation: After the shuffling step, the alignment score for the shuffled sequence pair is computed. More such scores are then obtained in the same way through repeated shuffling.
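The shuffle-and-rescore loop described above can be sketched as follows. This is a minimal illustration, not the implementation of any real program: `sw_score` is a simplified Smith-Waterman scorer with assumed match, mismatch, and linear gap values, and the example sequences are arbitrary.

```python
import random

def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def shuffled_scores(seq1, seq2, n=200, seed=0):
    """Keep seq1 fixed, shuffle seq2 n times, and score each shuffled pair."""
    rng = random.Random(seed)
    residues = list(seq2)
    scores = []
    for _ in range(n):
        rng.shuffle(residues)  # same length and composition, random order
        scores.append(sw_score(seq1, "".join(residues)))
    return scores

scores = shuffled_scores("HEAGAWGHEE", "PAWHEAE", n=100, seed=1)
print(len(scores), min(scores), max(scores))
```

Shuffling preserves the length and residue composition of the sequence, so the random scores reflect what unrelated sequences of the same makeup would produce.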

6. What is used to generate parameters for the extreme value distribution?

a) The pool of alignment scores from the shuffled sequences

b) A single score of a shuffled sequence

c) The pool of alignment scores from the unshuffled sequences

d) The basic optimal score computed at the beginning of the test

Explanation: Many alignment scores are obtained through repeated shuffling. The pool of alignment scores from the shuffled sequences is then used to generate parameters for the extreme value distribution. The original alignment score is compared against this distribution of random alignments to determine whether the score is beyond random chance.
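One simple way to turn a pool of shuffled scores into distribution parameters is the method of moments, sketched below. This is an assumption made for illustration: actual programs may fit the parameters differently (for example by maximum likelihood), and the names `fit_gumbel`, `mu`, and `beta` are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel(scores):
    """Method-of-moments estimates of the Gumbel location (mu) and scale (beta).

    Uses: variance = (pi^2 / 6) * beta^2 and mean = mu + gamma * beta.
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta

# Hypothetical pool of scores from shuffled sequence pairs.
pool = [18, 20, 22, 19, 21, 20, 23, 17]
mu, beta = fit_gumbel(pool)
print(mu, beta)
```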

7. If the score is located in the extreme margin of the distribution, that means that the alignment between the two sequences is ______ due to random chance and is thus considered ______

a) unlikely, significant

b) unlikely, insignificant

c) very likely, insignificant

d) very likely, significant

Explanation: A score located in the extreme margin (the long tail) of the distribution is unlikely to have arisen by random chance, so the alignment is considered significant. A P-value is given to indicate the probability that the original alignment is due to random chance.
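Given fitted Gumbel parameters, the tail probability of an observed score can be computed as in the sketch below. The function name and the parameter values are assumptions chosen for illustration, with `mu` as the location and `beta` as the scale of the distribution.

```python
import math

def gumbel_p_value(score, mu, beta):
    """P(S >= score) for a Gumbel distribution with location mu and scale beta.

    Computes 1 - exp(-exp(-(score - mu) / beta)) using expm1 for accuracy
    when the probability is very small.
    """
    z = (score - mu) / beta
    return -math.expm1(-math.exp(-z))

# Hypothetical parameters: a score far in the tail gets a small P-value.
print(gumbel_p_value(45, mu=19.1, beta=1.56))
print(gumbel_p_value(20, mu=19.1, beta=1.56))
```

The smaller the P-value, the less plausible it is that a score this high would come from aligning unrelated sequences.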

8. It is not known whether the Gumbel distribution applies equally well to gapped alignments.

a) True

b) False

Explanation: The statistics in the test were derived from ungapped local sequence alignments. Hence, it is not known whether the Gumbel distribution applies equally well to gapped alignments. However, for all practical purposes, it is reasonable to assume that scores for gapped alignments essentially fit the same distribution. A frequently used software program for assessing the statistical significance of a pairwise alignment is the PRSS program.

9. Which of the following is untrue about the PRSS program?

a) It stands for Probability of Random Shuffles

b) It is a web-based program that can be used to evaluate the statistical significance of DNA or protein sequence alignment

c) It first aligns two sequences using the Needleman-Wunsch algorithm and calculates the score

d) It holds one sequence in its original form and randomizes the order of residues in the other sequence.

Explanation: PRSS first aligns the two sequences using the Smith–Waterman algorithm, not the Needleman-Wunsch algorithm, and calculates the score. It then shuffles one sequence and realigns the shuffled sequence with the unshuffled sequence, recording the resulting alignment score. This process is iterated many (normally 1,000) times to generate data for fitting the Gumbel distribution.

10. The major disadvantage of the PRSS program is that it doesn’t allow partial shuffling.

a) True

b) False

Explanation: The major feature of the program is that it allows partial shuffling. For example, shuffling can be restricted to residues within a local window of 25–40, whereas the residues outside the window remain unchanged.
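Partial shuffling as described here, where only the residues inside one local window are reordered, might look like the sketch below. The function name, the `start` argument, and the example sequence are hypothetical, and this is one reading of the description rather than the behavior of PRSS itself.

```python
import random

def shuffle_window(seq, start, window=30, seed=None):
    """Shuffle only seq[start:start + window]; leave all other residues in place."""
    rng = random.Random(seed)
    block = list(seq[start:start + window])
    rng.shuffle(block)
    return seq[:start] + "".join(block) + seq[start + window:]

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
partly_shuffled = shuffle_window(seq, start=10, window=30, seed=42)
print(partly_shuffled)
```

Restricting the shuffle to a window keeps the local residue composition intact, which gives a more conservative set of random scores than shuffling the whole sequence.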

