This set of Bioinformatics Interview Questions and Answers for freshers focuses on “Motif Discovery in Unaligned Sequences”.
1. For what type of sequences is Gibbs sampling used?
a) Closely related sequences
b) Distantly related sequences
c) Distantly related sequences that share common motifs
d) Closely related sequences that share common motifs
Explanation: Often, distantly related sequences that share common motifs cannot be readily aligned. For example, the sequences for the helix-turn-helix motif in transcription factors can be subtly different enough that traditional multiple sequence alignment approaches fail to generate a satisfactory answer. For detecting such subtle motifs, more sophisticated algorithms such as expectation maximization (EM) and Gibbs sampling are used.
2. Which of the following is untrue about the Expectation Maximization (EM) method?
a) It is used to find hidden motifs
b) The method works by first making a random or guessed alignment of the sequences to generate a trial PSSM
c) The trial PSSM is used to compare with each sequence individually
d) The log odds scores of the PSSM are modified at the end of the process
Explanation: The log odds scores of the PSSM are modified in each iteration to maximize the alignment of the matrix to each sequence. During the iterations, the sequence pattern for the conserved motifs is gradually “recruited” to the PSSM.
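The iteration described in this explanation can be sketched in Python. This is only a minimal illustration under stated assumptions: DNA sequences, exactly one motif occurrence per sequence, a uniform background of 0.25 per base, and a pseudocount of 1. The names `build_pssm`, `score`, and `em_motif` are hypothetical, and this is not the implementation used by any particular program.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: DNA input only

def build_pssm(weighted_sites, width, pseudo=1.0):
    """Build a log-odds PSSM from (segment, weight) pairs.

    Assumes a uniform background frequency of 0.25 per base.
    """
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for segment, weight in weighted_sites:
        for i, residue in enumerate(segment):
            counts[i][residue] += weight
    pssm = []
    for col in counts:
        total = sum(col.values())
        pssm.append({a: math.log2((col[a] / total) / 0.25) for a in ALPHABET})
    return pssm

def score(pssm, segment):
    """Sum of log-odds scores of a segment against the PSSM."""
    return sum(col[r] for col, r in zip(pssm, segment))

def em_motif(seqs, width, iterations=50, seed=0):
    """Return (pssm, best_start_per_sequence) after a fixed number of EM rounds."""
    rng = random.Random(seed)
    # Random initial alignment: one start position per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    pssm = build_pssm([(s[p:p + width], 1.0) for s, p in zip(seqs, starts)], width)
    for _ in range(iterations):
        weighted = []
        for s in seqs:
            # E-step: compare the trial PSSM with each sequence individually
            # and weight every window by its normalized likelihood ratio.
            segs = [s[p:p + width] for p in range(len(s) - width + 1)]
            likes = [2.0 ** score(pssm, seg) for seg in segs]
            total = sum(likes)
            weighted.extend((seg, like / total) for seg, like in zip(segs, likes))
        # M-step: the log-odds scores are re-estimated in every iteration.
        pssm = build_pssm(weighted, width)
    best = [
        max(range(len(s) - width + 1), key=lambda p: score(pssm, s[p:p + width]))
        for s in seqs
    ]
    return pssm, best
```

Calling `em_motif(["TTGACGTAA", "CCACGTGGT", "GGGTACGTC"], 4)` returns the final PSSM and one start position per sequence. Because the starting alignment is random, different `seed` values may lead to different results, which is why running several starts and keeping the best-scoring one is a common precaution.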
3. Which of the following is true about the Expectation Maximization (EM) method?
a) The log odds scores of the PSSM are modified at the end of the process
b) The procedure stops prematurely if the scores reach convergence
c) The final result is not sensitive to the initial alignment
d) Converging to a local optimum is an advantage of the EM method
Explanation: The final result is sensitive to the initial alignment, and the log odds scores are modified in each iteration rather than at the end. Convergence to a local optimum is a drawback of the EM method, not an advantage: the procedure stops prematurely once the scores converge, even if a better solution exists.
4. MEME stands for _____________
a) Multiple Expectation Maximization for Motif Elicitation
b) Multiple Expectation Maximization for Motif Extraction
c) Mega Expectation Maximization for Motif Elicitation
d) Micro Expectation Maximization for Motif Extraction
Explanation: Multiple Expectation Maximization for Motif Elicitation (MEME) is a web-based program that uses the EM algorithm to find motifs in either DNA or protein sequences. It uses a modified EM algorithm to avoid the local optimum problem.
5. In the web-based program MEME, the computation is a _____ step procedure.
Explanation: In constructing a probability matrix, MEME allows multiple starting alignments and does not assume that there are motifs in every sequence. The computation is a two-step procedure: first a sequence motif is generated, and then the segment with the highest score is found.
6. Gibbs is a web-based program that uses the Gibbs sampling approach to look for _____ gap-free segments for either DNA or protein sequences.
a) short, partially conserved
b) long, partially conserved
c) long, conserved
d) short, not conserved
Explanation: Gibbs uses the Gibbs sampling approach to look for short, partially conserved gap-free segments in either DNA or protein sequences. To ensure accuracy, more than twenty sequences of the exact same length should be used.
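As a rough illustration of how a Gibbs sampling search for short gap-free segments can work, here is a minimal Python sketch. It assumes DNA input, one motif occurrence per sequence, a uniform background, and a pseudocount of 1; the names `profile_from`, `segment_prob`, and `gibbs_motif` are hypothetical and do not describe the Gibbs web program itself.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA input only

def profile_from(segments, width, pseudo=1.0):
    """Column-wise residue probabilities (with pseudocounts) for gap-free segments."""
    profile = []
    for i in range(width):
        counts = {a: pseudo for a in ALPHABET}
        for seg in segments:
            counts[seg[i]] += 1
        total = sum(counts.values())
        profile.append({a: counts[a] / total for a in ALPHABET})
    return profile

def segment_prob(profile, segment):
    """Probability of a segment under the profile (product over columns)."""
    p = 1.0
    for col, residue in zip(profile, segment):
        p *= col[residue]
    return p

def gibbs_motif(seqs, width, sweeps=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            # Hold out sequence i and build a profile from all the others.
            others = [
                t[p:p + width]
                for j, (t, p) in enumerate(zip(seqs, starts))
                if j != i
            ]
            profile = profile_from(others, width)
            # Weight every window of the held-out sequence by the profile.
            weights = [
                segment_prob(profile, s[p:p + width])
                for p in range(len(s) - width + 1)
            ]
            # Sample (rather than pick the maximum) a new start position.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Sampling the new position instead of always taking the best-scoring window is what distinguishes this from a greedy update: it lets the search leave a poor alignment occasionally. The result is stochastic, so the returned positions depend on `seed`.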
7. A multiple sequence alignment or a motif is often represented by a graphic representation called a ______
Explanation: This graphic representation is called a logo. In a logo, each position consists of stacked letters representing the residues that appear in a particular column of a multiple alignment.
8. The overall height of a logo position reflects how conserved the position is, and the _____ of each letter in a position reflects the _______ of the residue in the alignment.
a) height, relative frequency
b) width, relative frequency
c) height, amplitude
d) width, amplitude
Explanation: The overall height of a position expresses how conserved that position is, and the height of each letter shows the relative frequency of that particular residue. Amplitude is not a relevant option here.
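The relationship between conservation, overall height, and letter height can be made concrete with a small Python sketch. It assumes the common information-content formulation, where the height of a column in bits is log2(alphabet size) minus the Shannon entropy of the column, and each letter's height is its relative frequency times that total. The function name `logo_column` is hypothetical, and gaps and small-sample corrections are ignored.

```python
import math
from collections import Counter

def logo_column(column, alphabet_size=4):
    """Return (total_height_bits, {residue: letter_height_bits}) for one column.

    Assumes a gap-free column and no small-sample correction.
    """
    counts = Counter(column)
    n = len(column)
    freqs = {residue: c / n for residue, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    total = math.log2(alphabet_size) - entropy
    return total, {residue: f * total for residue, f in freqs.items()}

# A fully conserved DNA column reaches the maximum of 2 bits,
# while an even mix of all four bases has height 0.
conserved_height, _ = logo_column("AAAA")
mixed_height, _ = logo_column("ACGT")
```

For a column such as `"AACC"`, the total height is 1 bit and the letters A and C each get 0.5 bits, so fewer distinct residues in a column give taller stacks and bigger symbols.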
9. Conserved positions have _____ residues and bigger symbols.
Explanation: The options "maximum" and "minimum" do not fit here, because the comparison is between positions of an alignment. Conserved positions have fewer residues and bigger symbols, whereas less conserved positions have a more heterogeneous mixture of smaller symbols stacked together. In general, a sequence logo provides a clearer description of a motif than a consensus sequence does.
10. _______ is an interactive program for generating sequence logos.
Explanation: In WebLogo, a user needs to enter the sequence alignment in FASTA format to allow the program to compute the logos. A graphic file is returned to the user as a result.
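To show what an alignment in FASTA format looks like and how its columns can be extracted before a logo is computed, here is a minimal Python sketch. The function names `read_fasta_alignment` and `alignment_columns` and the example records are hypothetical; this does not call WebLogo or reproduce its interface.

```python
def read_fasta_alignment(text):
    """Parse FASTA text into {name: sequence} and check that rows are aligned."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = parts[0] if parts else f"seq{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records

def alignment_columns(records):
    """Return the alignment column by column, one string per position."""
    return ["".join(col) for col in zip(*records.values())]

# Hypothetical example input: three aligned six-base sites.
example = """>site1
TATAAT
>site2
TATGAT
>site3
TACAAT
"""
cols = alignment_columns(read_fasta_alignment(example))
```

Each entry of `cols` holds the residues of one alignment position across all sequences, which is the unit a logo is drawn from.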