This set of Bioinformatics Interview Questions and Answers for Experienced people focuses on “Assessing the Significance of Sequence Alignments”.
1. Analysis of the alignment scores of random sequences reveals that the scores follow a distribution different from the normal distribution, called the _________
a) Gumbel equal value distribution
b) Gumbel extreme value distribution
c) Gumbel end value distribution
d) Gumbel distribution
Explanation: Originally, the significance of sequence alignment scores was evaluated on the assumption that alignment scores follow a normal statistical distribution. If sequences are randomly generated in a computer by a Monte Carlo or sequence-shuffling method, much like generating a sequence by picking marbles representing the four bases or 20 amino acids out of a bag, the distribution of alignment scores may look normal at first glance. Further analysis, however, shows that the scores follow the Gumbel extreme value distribution.
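To make the difference from the normal distribution concrete, the following is a minimal sketch that compares upper-tail probabilities of a Gumbel extreme value distribution and a normal distribution with the same mean and standard deviation. The parameter names `u` (mode) and `lam` (decay constant) and the helper `compare_tails` are illustrative assumptions, not taken from the quiz.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_tail(x, u=0.0, lam=1.0):
    """P(S >= x) for an extreme value distribution with mode u and decay lam."""
    return 1.0 - math.exp(-math.exp(-lam * (x - u)))


def normal_tail(x, mean, sd):
    """P(S >= x) for a normal distribution with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def compare_tails(k, u=0.0, lam=1.0):
    """Tail probabilities k standard deviations above the mean.

    The normal distribution is given the same mean and standard
    deviation as the Gumbel distribution, so only the shape differs.
    """
    mean = u + EULER_GAMMA / lam
    sd = math.pi / (lam * math.sqrt(6.0))
    x = mean + k * sd
    return gumbel_tail(x, u, lam), normal_tail(x, mean, sd)


if __name__ == "__main__":
    for k in (2, 3, 4):
        g, n = compare_tails(k)
        print(f"{k} sd above mean: Gumbel {g:.2e}, normal {n:.2e}")
```

Far above the mean the extreme value tail is much heavier, so treating scores as normally distributed would overstate the significance of a high score.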
2. The statistical analysis of alignment scores is much better understood for ________ than for _______
a) global alignments, local alignments
b) local alignments, global alignments
c) global alignments, any other alignment method
d) Needleman-Wunsch alignment, Smith-Waterman alignment
Explanation: The Smith-Waterman alignment algorithm and the scoring system used to produce a local alignment are designed to reveal regions of closely matching sequence with a positive alignment score. In alignments of random or unrelated sequences, such regions are rarely found. Hence, their presence in real sequence alignments is significant, and the probability of their occurring by chance in an alignment of unrelated sequences can be readily calculated.
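As an illustration of why unrelated sequences score low under a local method, here is a minimal sketch of a Smith-Waterman score calculation. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for the example; real programs use scoring matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Cell scores are never allowed to drop below zero, so only
    positively scoring regions contribute to the result.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared ACGT region
    print(smith_waterman_score("AAAA", "TTTT"))          # nothing in common
```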
3. When random or unrelated sequences are compared using a global alignment method, they can have ____________ reflecting the tendency of the global algorithm to match as many characters as possible.
a) very low scores
b) very high scores
c) moderate scores
d) low scores
Explanation: The significance of global alignment scores is more difficult to determine. Using the Needleman-Wunsch algorithm and a suitable scoring system, there are many ways to produce a global alignment between any pair of sequences, and the scores of many different alignments may be quite similar. Because the algorithm matches as many characters as possible, even random or unrelated sequences can receive unusually high scores.
4. Which of the following is not related to the Needleman-Wunsch alignment algorithm?
a) Global alignment programs use this algorithm
b) The output is a positive number
c) Small changes in the scoring system can produce a different alignment
d) Changes in the scoring system can produce the same alignment
Explanation: In general, global alignment programs use the Needleman-Wunsch alignment algorithm and a scoring system that scores the average match of an aligned nucleotide or amino acid pair as a positive number. Hence, the score of the alignment of random or unrelated sequences grows proportionally to the length of the sequences. In addition, there are many possible different global alignments depending on the scoring system chosen, and small changes in the scoring system can produce a different alignment.
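The growth of global scores with sequence length can be checked with a small simulation. The sketch below computes Needleman-Wunsch scores for pairs of random DNA sequences; the default scoring values (match +1, mismatch 0, gap 0), the helper `mean_random_score`, and the trial count are assumptions for illustration only, not the parameters used by any particular program.

```python
import random


def needleman_wunsch_score(a, b, match=1, mismatch=0, gap=0):
    """Return the optimal global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def mean_random_score(length, trials=20, seed=0):
    """Average global score over pairs of random DNA sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(length))
        b = "".join(rng.choice("ACGT") for _ in range(length))
        total += needleman_wunsch_score(a, b)
    return total / trials


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, mean_random_score(n))
```

With these settings the mean score of unrelated sequences should rise roughly in proportion to the length, which is why a raw global score says little about relatedness on its own.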
5. Waterman, in 1989, provided a set of means and standard deviations of global alignment scores between random DNA sequences, using mismatch and gap penalties that produce a linear increase in score with _______, a distinguishing feature of global alignments.
a) alignment score
b) sequence score
c) sequence length
d) scoring system
Explanation: In the algorithm provided by Waterman, the score of the alignment of random or unrelated sequences grows proportionally to the length of the sequences. However, these values are of limited use because they are based on a simple gap scoring system.
6. Who suggested that the global alignment scores between unrelated protein sequences followed the extreme value distribution, similar to local alignment scores? And when?
a) Abagyan and Batalov, in 1981
b) Chvátal and Lipman, in 1984
c) Abagyan and Batalov, in 1997
d) Chvátal and Sankoff, in 1995
Explanation: Abagyan and Batalov, in 1997, made this suggestion. However, since the scoring system that they used favored local alignments, the alignments they produced may have been local rather than global. Unfortunately, there is no theory on which to base an analysis of global alignment scores equivalent to the one available for local alignment scores.
7. _______ analyzed the distribution of scores among 100 vertebrate nucleic acid sequences and compared these scores with randomized sequences prepared in different ways.
a) Lipman, in 1984
b) Batalov, in 1964
c) Waterman, in 1987
d) Lipman, in 1967
Explanation: This study was carried out by Lipman and colleagues in 1984. When the randomized sequences were prepared by shuffling the sequence to conserve base composition, as was done by Dayhoff and others, the standard deviation of their scores was approximately one-third less than that of the scores of the natural sequences. Thus, natural sequences are more variable than randomized ones, and using such randomized sequences for a significance test may lead to an overestimation of the significance.
8. If the random sequences were prepared in a way that maintained the local base composition by producing them from overlapping fragments of sequence, the distribution of scores has a _______ standard deviation that is closer to the distribution of the natural sequences.
Explanation: The conclusion from the above is that the presence of conserved local patterns can influence the score in statistical tests such that an alignment can appear to be more significant than it actually is. Although this study was done using the Smith-Waterman algorithm with nucleic acids, the same cautionary note applies for other types of alignments.
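The difference between conserving overall composition and conserving local composition can be sketched as follows. Note the swap: the study described above built random sequences from overlapping fragments, whereas this example uses a simpler stand-in, shuffling within non-overlapping windows. The function names and the window size are assumptions for illustration.

```python
import random
from collections import Counter


def shuffle_global(seq, rng):
    """Shuffle the whole sequence: conserves overall base composition only."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def shuffle_in_windows(seq, window, rng):
    """Shuffle within consecutive windows: conserves local base composition."""
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        pieces.append("".join(chunk))
    return "".join(pieces)


if __name__ == "__main__":
    rng = random.Random(1)
    seq = "AAAAAAAATTTTTTTTGGGGGGGGCCCCCCCC"  # strongly patterned sequence
    print(shuffle_global(seq, rng))          # local pattern is destroyed
    print(shuffle_in_windows(seq, 8, rng))   # local pattern is kept
```

A sequence with strong local bias keeps that bias under the windowed shuffle, so the randomized set stays closer to the natural sequences than a whole-sequence shuffle does.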
9. The GCG alignment programs have a RANDOMIZATION option, which shuffles the second sequence and calculates similarity scores between the unshuffled sequence and each of the shuffled copies.
Explanation: If the new similarity scores are significantly smaller than the real alignment score, the alignment is considered significant. This analysis is only useful for providing a rough approximation of the significance of an alignment score and can easily be misleading.
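A shuffle test of this kind can be sketched as below. This is not the GCG implementation: the local alignment scorer, its scoring values, the number of shuffles, and the function names are all assumptions made for the example.

```python
import random
import statistics


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_test(seq1, seq2, n_shuffles=100, seed=0):
    """Compare the real score with scores against shuffled copies of seq2.

    Returns (real score, mean shuffled score, sd of shuffled scores,
    real score expressed in sd units above the shuffled mean).
    """
    rng = random.Random(seed)
    real = local_score(seq1, seq2)
    letters = list(seq2)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled.append(local_score(seq1, "".join(letters)))
    mean = statistics.mean(shuffled)
    sd = statistics.stdev(shuffled)
    z = (real - mean) / sd if sd > 0 else float("inf")
    return real, mean, sd, z


if __name__ == "__main__":
    s = "ACGTTGCAAGCTTAGGCTAACGTAGCTAGGATCC"
    print(shuffle_test(s, s))
```

Expressing the result in standard deviation units implicitly assumes normally distributed scores, which is one reason such a test gives only a rough approximation.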
10. Dayhoff (1978-1983) devised a second method for testing the relatedness of two protein sequences that can accommodate some local variation. Where is this method useful?
a) For finding repeated regions within a sequence
b) For finding similar regions that are in a different order in two sequences
c) For finding a small conserved region such as an active site
d) For finding huge regions within sequences
Explanation: As used in a computer program called RELATE (Dayhoff 1978), all possible segments of a given length from one sequence are compared with all segments of the same length from another. An alignment score using a scoring matrix is obtained for each comparison to give a score distribution among all of the segments. A segment comparison score in standard deviation units is calculated as the value for the real sequences minus the average value for random sequences, divided by the standard deviation of the scores from the random sequences.
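The segment comparison idea can be sketched as follows. This is not the RELATE program: the sketch scores segments by simple identity instead of a scoring matrix, summarizes each sequence pair by its single highest segment score, and builds the random set by shuffling both sequences. All of these choices, along with the function names and defaults, are assumptions for illustration.

```python
import random
import statistics


def segment_scores(a, b, k):
    """Score every length-k segment of a against every length-k segment of b."""
    scores = []
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            # Identity scoring: +1 per matching position (assumed, no matrix).
            scores.append(sum(1 for x, y in zip(a[i:i + k], b[j:j + k]) if x == y))
    return scores


def segment_comparison_score(a, b, k=8, n_random=50, seed=0):
    """(real value - mean random value) / sd of random values."""
    rng = random.Random(seed)
    real = max(segment_scores(a, b, k))
    letters_a, letters_b = list(a), list(b)
    random_values = []
    for _ in range(n_random):
        rng.shuffle(letters_a)
        rng.shuffle(letters_b)
        shuffled = segment_scores("".join(letters_a), "".join(letters_b), k)
        random_values.append(max(shuffled))
    mean = statistics.mean(random_values)
    sd = statistics.stdev(random_values)
    return (real - mean) / sd if sd > 0 else float("inf")


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    print(segment_comparison_score(seq, seq))
```

Because every segment is compared with every other segment regardless of position, a high score can arise from repeats or from similar regions that occur in a different order in the two sequences.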