This set of Bioinformatics written test Questions & Answers focuses on “Comparative Genomics – 2”.
a) The problem of deciding which sequences to include in the same group or cluster and which to separate into different groups or clusters is a recurring one
b) Divergence is necessary, but the sequences chosen should be clearly related based on inspection of each pair-wise alignment and a statistical analysis
c) The conservative approach is to group distinct sequences
d) The adventurous approach is to choose a set of marginally alignable sequences to pursue the difficult task of making a multiple sequence alignment and then to make profile models that may recognize divergence but will also give false predictions
Explanation: The conservative approach is to group only very similar sequences together. However, in making a conservative multiple sequence alignment with only very alike sequences, it is not possible to analyze the evolutionary divergence that may have occurred in a family of proteins. Furthermore, if a matrix or profile model is made from this alignment, that model will not be useful for identifying more divergent members of a family.
2. Which of the given statements is incorrect about Clusters of orthologous groups?
a) Using the protein from one of the organisms to search the proteome of the other for high-scoring matches should identify the ortholog as the highest-scoring match, or best hit
b) When entire proteomes of the two organisms are available, orthologs may be identified
c) A pair of orthologous genes in two organisms share so much sequence similarity that they may be assumed to have arisen from a common ancestor gene
d) each of the orthologs belongs to a family composed of paralogous sequences but irrelevant or not related to each other
Explanation: In many cases, each of the orthologs belongs to a family composed of paralogous sequences related to each other by gene duplication events. Hence, in the above database search, the ortholog will match not only the orthologous sequence in the second proteome but also these other paralogous sequences. The objective of the clusters of orthologous groups (COG) approach is to identify all matching proteins in the organisms, which are defined as an orthologous group related by both speciation and gene duplication events.
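The best-hit idea in this explanation can be illustrated with a minimal sketch. It assumes similarity scores have already been computed in both directions between two proteomes; the gene names and scores below are hypothetical, and this is not the actual COG pipeline, only the reciprocal best-hit step.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): score}; a higher score is better.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: geneA1 and geneA2 are paralogs in proteome A,
# so both of them match geneB1, but only geneA1 is matched back by geneB1.
a_vs_b = {("geneA1", "geneB1"): 950, ("geneA1", "geneB2"): 400,
          ("geneA2", "geneB1"): 420, ("geneA2", "geneB2"): 380}
b_vs_a = {("geneB1", "geneA1"): 940, ("geneB1", "geneA2"): 410,
          ("geneB2", "geneA1"): 390, ("geneB2", "geneA2"): 370}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('geneA1', 'geneB1')]
```

Here the paralog geneA2 also matches geneB1, which is why a one-directional search alone cannot separate orthologs from paralogs.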
3. Which of the given statements is incorrect about Clusters of orthologous groups?
a) Paralogs may include a best hit or a high-scoring match of one of the sequences by another, but the reciprocal match can have low similarity that does not have to be significant
b) Paralogs defined by sets of three matching sequences in the selected organisms were kept separated from the clusters
c) Orthologous pairs were first defined by the best hits in reciprocal searches
d) To produce COGs, similarity searches were performed among the proteomes of phylogenetically distinct clades of prokaryotes
Explanation: Paralogs defined by sets of three matching sequences in the selected organisms were also added to these clusters. Sixty percent of the original set of 720 COGs does not include paralogs, or includes paralogs from one lineage only, suggesting that there has not been extensive duplication of this group.
4. Which of the given statements is incorrect about the Comparison of proteomes to EST databases of an organism?
a) ESTs are single DNA sequence reads that contain a small fraction of incorrect base assessments, insertions, and deletions
b) Many sequences arise from near the 5’ end of the mRNA, although every effort is usually made to read as far 3’ as possible into the upstream portion of the cDNA
c) EST libraries are useful for preliminary identification of genes by database similarity searches
d) An EST database of an organism can be analyzed for the presence of gene families, orthologs, and paralogs
Explanation: Many sequences arise from near the 3’ end of the mRNA, although every effort is usually made to read as far 5’ as possible into the upstream portion of the cDNA. Because not all of the genes may be expressed in the tissues chosen for analysis, the library will often not be complete.
a) Searches of EST databases for matches to a query sequence routinely produce minimal amounts of output that must be searched manually for significant hits
b) ESTs with a high percent identity with the query sequence, a long alignment with the query sequence, and a very low E value of the alignment score represent groups of paralogous and orthologous genes
c) To identify orthologs as the most closely related sequence, ESTs were aligned using the amino acid alignment as a guide
d) To identify orthologs as the most closely related sequence, a phylogenetic tree was produced by the maximum likelihood method
Explanation: Searches of EST databases for matches to a query sequence routinely produce large amounts of output that must be searched manually for significant hits. An automatic method was described in 1999 utilizing a computer script, FAST-PAN, that scans EST databases with multiple queries from a protein family, sorts the alignment scores, and produces charts and alignments of the matches found.
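As a rough illustration of the filtering criteria named in option b (high percent identity, long alignment, very low E value), the sketch below filters and sorts a list of hits. It is not FAST-PAN; the `Hit` record, the thresholds, and the EST identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    est_id: str
    percent_identity: float
    alignment_length: int
    e_value: float


def significant_hits(hits, min_identity=80.0, min_length=100, max_e_value=1e-20):
    """Keep hits that pass all three thresholds, most significant first."""
    kept = [
        h for h in hits
        if h.percent_identity >= min_identity
        and h.alignment_length >= min_length
        and h.e_value <= max_e_value
    ]
    # A lower E value means a more significant alignment.
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical search output for one query sequence.
hits = [
    Hit("EST001", 95.0, 210, 1e-60),
    Hit("EST002", 62.0, 180, 1e-12),  # identity too low
    Hit("EST003", 88.0, 40, 1e-8),    # alignment too short
    Hit("EST004", 91.0, 150, 1e-45),
]

top = significant_hits(hits)
print([h.est_id for h in top])  # ['EST001', 'EST004']
```

The thresholds are arbitrary defaults; in practice they would be tuned to the protein family and database being searched.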
6. Which of the given statements is incorrect about Family and Domain Analysis?
a) Gene identification of predicted proteins in the genome is designed to discover the metabolic features of an organism
b) In a particular organism or group of organisms, one particular domain can be expanded to perform a particular function
c) Comparison of the domain content of an entire proteome with that of another proteome cannot help in revealing the biological roles of diverse domains in different organisms
d) Different proteins are mosaics of domains that occur in different combinations in a given protein
Explanation: In a particular organism or group of organisms, various domains can be expanded to perform a particular function. More than 2000 fly and worm proteins are multidomain proteins, compared to about one-third this number in yeast.
7. Which of the given statements is incorrect about Ancient Conserved Regions?
a) The method involves database similarity searches of the SwissProt database with human, worm, yeast, or E. coli genes and identification of matches with sequences from a different phylum than the query sequence
b) An analysis of ACRs that predate the radiation of the major animal phyla some 580–540 million years ago suggested that 50–60% of coding sequences are ACRs
c) These ACRs may represent proteins present at the time of the prokaryotic–eukaryotic divergence
d) Phylogenetically diverse groups of organisms have been analyzed for the presence of conserved proteins and protein domains that have been conserved over long periods of evolutionary time, called ancient conserved regions or ACRs
Explanation: The analysis of ACRs that predate the radiation of the major animal phyla some 580–540 million years ago suggested that 20–40% of coding sequences are ACRs, not 50–60%. For example, a search with 1916 E. coli proteins detected 266 ACRs found in 439 sequences, roughly one-quarter of the SwissProt database.
8. Which of the given statements is incorrect about Horizontal Gene Transfer?
a) The genomes of most organisms are derived by vertical transmission, the inheritance of chromosomes from parents to offspring from one generation to the next
b) It is the acquisition of genetic material from a different organism
c) The transferred material becomes a temporary addition to the recipient genome
d) An extreme example is the proposed endosymbiont origin of mitochondria in eukaryotic cells and chloroplasts in plants
Explanation: The transferred material becomes a permanent addition to the recipient genome. Although these exchanges do not occur very often on a generation-to-generation basis, a significant number can occur over a period of hundreds of millions of years.
a) It is a significant source of genome variation in bacteria, allowing them to exploit new environments
b) Such transfer is rendered possible by a variety of natural mechanisms in bacteria for transferring DNA from one species to another
c) Detection of HT is made possible by the fact that the genome of each bacterial species has a unique base composition
d) The time of transfer of DNA cannot be estimated by the composition of the HT DNA
Explanation: The time of transfer of DNA may be estimated by the degree to which the composition of the HT DNA has blended into that of the recipient genome. Transfer of a portion of a genome from one organism to another can generally be detected as an island of sequence of different composition in the recipient. If the amino acid composition of transferred genes is typical, these islands may be detected by a codon usage analysis.
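The idea of an island of different composition can be sketched with a simple GC-content scan. This is a minimal illustration under stated assumptions: the window size, step, deviation threshold, and toy sequence are hypothetical, and real analyses typically also use codon usage and much longer windows.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=20, step=20, max_deviation=0.25):
    """Return (start, gc) for windows whose GC content deviates from the genome-wide value."""
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > max_deviation:
            flagged.append((start, gc))
    return flagged


# Toy genome: AT-rich background with one GC-rich island in the middle.
background = "ATATTAGCAT" * 4  # 40 bases, 20% GC
island = "GCGCGGCCGC" * 2      # 20 bases, 100% GC
genome = background + island + background

print(atypical_windows(genome))  # [(40, 1.0)]
```

An island that has been in the recipient genome for a long time would drift toward the background composition and deviate less, which is the basis for estimating the time of transfer.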
10. Annotation is based on finding significant alignment to sequences of known function in database similarity searches.
Explanation: Accurate annotation of genome sequences is an important first step in genome analysis. Matches of lesser significance provide only a tentative or hypothetical prediction and should be used as a working hypothesis of function.
Sanfoundry Global Education & Learning Series – Bioinformatics.