This set of Tough Bioinformatics Interview Questions & Answers focuses on “Sequence Assembly and Gene Identification – 3”.
1. Which of the given statements is incorrect about proteome analyses?
a) For BLAST, setting an effective database size appropriate for each search and program is important
b) Due to the large number of comparisons that must be made in these types of analyses and due to the volume of program output, the procedure must be automated on a local machine using Perl scripts or a similar method and a database system
c) BLAST is used for obtaining a correct statistical evaluation of alignment scores
d) BLAST does not give statistical evaluation of alignment scores
Explanation: Each protein encoded by the genome is used as a query in database similarity searches to identify similar database proteins, some having a known structure or function. Additional searches of EST databases can be used to identify additional relatives of the query sequence.
2. An all-against-all analysis requires first making a database of the proteome. This database is then sequentially searched by each individual protein sequence of the proteome using a rapid database similarity search tool such as _______
Explanation: P values of WU-BLAST are similar to E values of NCBI BLAST (Rubin et al. 2000) for values of P and E < 0.05. This analysis generates a matrix of alignment scores, each with an E value and a corresponding alignment for each pair of proteins. The E value of an alignment score is the number of alignments with a score at least as good that would be expected by chance between random or unrelated sequences in a search of a database of the same size; the P value is the corresponding probability of observing at least one such alignment.
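The agreement between P and E at small values follows from the usual Poisson assumption for chance hits, under which P = 1 - exp(-E). A minimal sketch of that conversion, assuming this relationship holds (the function name e_to_p is illustrative and not part of any BLAST package):

```python
import math


def e_to_p(e_value: float) -> float:
    """Convert an expect value to a P value, assuming P = 1 - exp(-E)."""
    # expm1 keeps precision when E is very small.
    return -math.expm1(-e_value)


# Compare E and P over a range of values.
for e in (10.0, 1.0, 0.05, 0.001):
    print(f"E = {e:<6} P = {e_to_p(e):.6f}")
```

Under this assumption the two values nearly coincide below about 0.05 and diverge for larger E, since P can never exceed 1.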
3. Evolutionary modeling can include various types of analyses. Which of the following is not typically one of them?
a) The prediction of chromosomal rearrangements
b) Eu/Hetero-chromatin structures
c) Duplications at gene, chromosomal and full genome level
d) Duplications at the protein domain level
Explanation: Option b concerns structural studies rather than evolutionary modeling. Evolutionary modeling does include predicting the chromosomal rearrangements that preceded the present arrangement (e.g., a comparison of mouse and human chromosomes).
4. Which of the given statements is incorrect about clusters of functionally related genes?
a) In microbial genomes, genes specifying a metabolic pathway may be contiguous on the genome where they are coregulated transcriptionally in an operon by a common promoter
b) In related organisms, gene order on the chromosome is least likely to be conserved
c) As the relationship between the organisms decreases, local groups of genes remain clustered together, but chromosomal rearrangements move the clusters to other locations
d) The function of a particular gene can sometimes be predicted, given the known function of a neighboring, closely linked gene
Explanation: In related organisms, both gene content of the genome and gene order on the chromosome are likely to be conserved.
5. Which of the given statements is incorrect about orthologs?
a) In comparing two proteomes, a common standard is to require that for each pair of orthologs, the first of the pair is the best hit when the second is used to query the proteome of the first
b) To identify orthologs, each protein in the proteome of an organism is used as a query in a similarity search of a database comprising the proteomes of only one different organism
c) The best hit in each proteome is likely to be with an ortholog of the query gene
d) Orthologs are genes that are so highly conserved by sequence in different genomes that the proteins they encode are strongly predicted to have the same structure and function and to have arisen from a common ancestor through speciation
Explanation: To identify orthologs, each protein in the proteome of an organism is used as a query in a similarity search of a database comprising the proteomes of one or more different organisms.
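The reciprocal best-hit standard described in option a can be sketched as follows. This is a minimal illustration, assuming the similarity search results have already been parsed into (query, subject, score) tuples; the function names and the hit tables are made up for the example and do not come from any particular tool.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hit tables for proteome A searched against B, and B against A.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 248.0), ("b1", "a3", 118.0),
          ("b2", "a2", 175.0), ("b2", "a1", 88.0)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a3 has b1 as its best hit, but b1's best hit is a1, so the pair (a3, b1) is not reported as a candidate ortholog pair.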
6. In protein/domain analysis, each protein in the predicted proteome is again used as a query of a curated protein sequence database such as ____ in order to locate similar domains and sequences. To find orthologs, very low E value scores (E < 10^-20) for the alignment score and an alignment that includes 60–80% of the query sequence are generally required in order to avoid matches to paralogs.
Explanation: The domain composition of each protein is also determined by searching for matches in domain databases such as InterPro. The analysis reveals how many domains and domain combinations are present in the proteome, and reveals any unusual representation that might have biological significance. The number of expressed genes in each family can also be compared to the number in other organisms to determine whether or not there has been an expansion of the family in the genome.
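The two ortholog criteria in this question, a very low E value and an alignment covering most of the query, can be applied as a simple filter. The sketch below assumes hits are already parsed into records; the Hit class, its field names, the default thresholds (E below 1e-20, at least 60% query coverage) and the sample data are illustrative choices based on the figures quoted in the question, not a fixed standard.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    e_value: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_length: int


def query_coverage(hit: Hit) -> float:
    """Fraction of the query sequence included in the alignment."""
    return (hit.query_end - hit.query_start + 1) / hit.query_length


def candidate_orthologs(hits, max_e=1e-20, min_coverage=0.6):
    """Keep hits with E below max_e that cover at least min_coverage of the query."""
    return [h for h in hits
            if h.e_value < max_e and query_coverage(h) >= min_coverage]


# Made-up hits for a single 400-residue query protein.
hits = [
    Hit("q1", "s1", 1e-45, 10, 390, 400),   # low E, about 95% coverage
    Hit("q1", "s2", 1e-30, 150, 250, 400),  # low E, but only about 25% coverage
    Hit("q1", "s3", 1e-5, 1, 400, 400),     # full coverage, but E too high
]

print([h.subject for h in candidate_orthologs(hits)])
```

Only s1 passes both tests; s2 looks like a shared-domain match and s3 is too weak a match, which is the kind of hit the thresholds are meant to exclude.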
7. In an all-against-all self-comparison, every protein is used as a query in a similarity search against a database composed of the rest of the proteome, and the significant matches are identified by a low expect value.
Explanation: Because many proteins comprise different combinations of a common set of domains, proteins that align along most of their lengths (80% identity is a conservative choice) are chosen, in order to select those that have a conserved domain structure.
8. Processed pseudogenes are also derived from a functional gene and they contain introns and a promoter.
Explanation: Processed pseudogenes are also derived from a functional gene, but they do not contain introns and lack a promoter; hence, they are not expressed. The origin of these pseudogenes is probably due to reverse transcription of the mRNA of the functional gene and insertion of the cDNA copy into a new chromosomal location by a LINE1 reverse transcriptase.
9. Pseudogenes are DNA sequences that were derived from distinct genes but which have acquired mutations that are deleterious to function in the same period of time.
Explanation: Pseudogenes are DNA sequences that were derived from a functional copy of a gene but which have acquired mutations that are deleterious to function. For example, the pseudogene TRY5 is similar to the nearby functional gene TRY4.
10. New gene functions are thought to be gained by duplication of an existing gene creating two tandem copies.
Explanation: Functional differentiation then occurs between the copies by mutation and selection. However, because most mutations are deleterious, and because only one gene copy may be needed for function, there is a strong tendency of one copy to accumulate mutations that render the gene nonfunctional.