119 + Interview Questions in Sequence Alignment in Bioinformatics Page 2 InterviewSolution

51. Which of the following is not a disadvantage of the Needleman-Wunsch algorithm?
(a) The method is comparatively slow (b) It needs intensive memory (c) It cannot be applied to genome-sized sequences (d) It can be applied even to large sequences
Topic: Global Sequence Alignment
Answer» The right answer is (d) It can be applied even to large sequences. Explanation: Because the method is slow and memory intensive, it cannot be applied to genome-sized sequences. It is nonetheless useful for determining similarities and evolutionary relationships.


52. In one type of probability analysis, the aim is to calculate the odds score for one event OR a second event, or for one of a series of events. In this case, the odds scores are _______
(a) multiplied (b) subtracted (c) added and multiplied (d) added
Topic: Bayesian Statistics
Answer» The right option is (d) added. Explanation: An example is the calculation of the odds score for a given sequence alignment using a series of alternative PAM scoring matrices. The alignment scores are calculated in log odds units and then converted into odds scores so that they can be added.
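As a rough illustration of the OR rule, the sketch below converts log odds scores to odds and adds them. It assumes the log odds are in base 2 (bits); the function name and the example scores are hypothetical and do not come from any real PAM matrix.

```python
def combined_odds(log_odds_scores):
    """Convert log2 odds scores to odds, then add them (the OR rule)."""
    return sum(2.0 ** score for score in log_odds_scores)


# Hypothetical alignment scores (in bits) under three alternative matrices.
scores_in_bits = [3, 1, 0]
print(combined_odds(scores_in_bits))  # odds of 8, 2 and 1 are added
```

The log odds scores themselves are not added here, since adding logarithms would correspond to multiplying the odds.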


53. The more conserved amino acids in similar proteins from different species are the ones that play an essential role in structure and function, and the less conserved ones are at sites that can vary without a significant effect on function.
(a) True (b) False
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct option is (a) True. Explanation: Many factors influence both the location and the types of amino acid changes that occur in proteins. Wilbur (1985) tested the Markov model of evolution and showed that it can be valid if certain changes are made in the way the PAM matrices are calculated.


54. Which of the following is false about CDART and its algorithm?
(a) CDART is a domain search program that combines the results from RPS-BLAST, SMART, and Pfam (b) The program is now an integral part of the regular BLAST search function (c) CDART is a substitute for individual database searches (d) It stands for Conserved Domain Architecture
Topic: Motif and Domain Databases Using Statistical Models
Answer» The right option is (c) CDART is a substitute for individual database searches. Explanation: CDART is a domain search program that combines the results from various database searches. As with InterPro, CDART is not a substitute for individual database searches because it often misses certain features that can be found in SMART and Pfam.


55. Which of the following is false about the InterPro database and its algorithm?
(a) InterPro is an integrated pattern database designed to unify multiple databases for protein domains and functional sites (b) This database integrates information from the PROSITE, Pfam, PRINTS, ProDom, and SMART databases (c) Only overlapping motifs and domains in a protein sequence derived by all five databases are included (d) All the motifs and domains in a protein sequence derived by all five databases are included
Topic: Motif and Domain Databases Using Statistical Models
Answer»


56. If the random sequences are prepared in a way that maintains the local base composition, by producing them from overlapping fragments of sequence, the distribution of scores has a _______ standard deviation that is closer to the distribution of the natural sequences.
(a) lowest (b) higher (c) lower (d) moderate
Topic: Assessing the Significance of Sequence Alignments
Answer» The right option is (c) lower. Explanation: The conclusion is that the presence of conserved local patterns can influence the score in statistical tests, so that an alignment can appear more significant than it actually is. Although this study was done using the Smith-Waterman algorithm with nucleic acids, the same cautionary note applies to other types of alignments.


57. Which of the following is not related to the Needleman-Wunsch alignment algorithm?
(a) Global alignment programs use this algorithm (b) The output is a positive number (c) Small changes in the scoring system can produce a different alignment (d) Changes in the scoring system can produce the same alignment
Topic: Assessing the Significance of Sequence Alignments
Answer» The right answer is (d) Changes in the scoring system can produce the same alignment. Explanation: In general, global alignment programs use the Needleman-Wunsch alignment algorithm and a scoring system that scores the average match of an aligned nucleotide or amino acid pair as a positive number. Hence, the score of the alignment of random or unrelated sequences grows proportionally to the length of the sequences. In addition, there are many possible global alignments depending on the scoring system chosen, and small changes in the scoring system can produce a different alignment.


58. The statistical analysis of alignment scores is much better understood for ________ than for _______
(a) global alignments, local alignments (b) local alignments, global alignments (c) global alignments, any other alignment method (d) Needleman-Wunsch alignment, Smith-Waterman alignment
Topic: Assessing the Significance of Sequence Alignments
Answer» The correct choice is (b) local alignments, global alignments. Explanation: The Smith-Waterman alignment algorithm and the scoring system used to produce a local alignment are designed to reveal regions of closely matching sequence with a positive alignment score. In random or unrelated sequence alignments, these regions are rarely found. Hence, their presence in real sequence alignments is significant, and the probability of their occurring by chance alignment of unrelated sequences can be readily calculated.


59. A feature of the dynamic programming algorithm is that the alignments obtained depend on the choice of a scoring system for comparing character pairs and on the penalty scores for gaps.
(a) True (b) False
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The right answer is (a) True. Explanation: The output of the algorithm depends on the choice of scoring system. For protein sequences, the simplest system of comparison is one based on identity: a match in an alignment is scored only if the two aligned amino acids are identical. However, one can also examine related protein sequences that can be aligned easily and find which amino acids are commonly substituted for each other.


60. Which of the following is not a software tool for dot plot analysis?
(a) SIMMI (b) DOTLET (c) DOTMATCHER (d) LALIGN
Topic: Dot Matrix Sequence Comparison
Answer» The correct option is (a) SIMMI. Explanation: Several programs are currently available for dot plot interpretation. Among them, SIM is used for this kind of alignment through the dot plot method; SIMMI is a wrong rendering of that name.


61. Which of the following does not describe BLAST?
(a) It stands for Basic Local Alignment Search Tool (b) It uses word matching like FASTA (c) It is one of the tools of the NCBI (d) Even if no words are similar, there is an alignment to be considered
Topic: Local Sequence Alignment
Answer» The right answer is (d) Even if no words are similar, there is an alignment to be considered. Explanation: If no words are similar, there is no alignment, i.e., BLAST will not find matches for very short sequences. It is still considerably accurate compared with other tools and is therefore quite popular.


62. Local alignments are more used in all of the following situations except _____________
(a) There are totally similar and equal length sequences (b) Dissimilar sequences are suspected to contain regions of similarity (c) Similar sequence motif with larger sequence context (d) Partially similar, different length and conserved region containing sequences
Topic: Local Sequence Alignment
Answer» The right choice is (a) There are totally similar and equal length sequences. Explanation: This description suits global alignment, which attempts to align as much of the entire sequences as possible, unlike local alignment, where partially similar sequences are analyzed.


63. Which of the following does not describe the global alignment algorithm?
(a) The score can be negative in this method (b) It is based on the dynamic programming technique (c) For two sequences of length m and n, the matrix to be defined should be of dimensions m+1 and n+1 (d) For two sequences of length m and n, the matrix to be defined should be of dimensions m and n
Topic: Global Sequence Alignment
Answer» The correct answer is (d) For two sequences of length m and n, the matrix to be defined should be of dimensions m and n. Explanation: For two sequences of length m and n, the matrix should be of dimensions m+1 and n+1, so that an extra first row and column provide a margin from which scores can be added along the diagonal. The score is then accumulated cell by cell, and the final alignment score is read at the end.
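The matrix dimensions described above can be seen in a minimal Needleman-Wunsch sketch. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration only.

```python
def needleman_wunsch_matrix(a, b, match=1, mismatch=-1, gap=-2):
    """Fill the global alignment score matrix for sequences a and b."""
    m, n = len(a), len(b)
    # (m+1) x (n+1) cells: row 0 and column 0 hold cumulative gap penalties.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F


F = needleman_wunsch_matrix("ACG", "AC")
print(len(F), len(F[0]))  # 4 rows and 3 columns for lengths 3 and 2
print(F[-1][-1])          # global score is read from the bottom-right cell
```

Unlike local alignment, nothing clamps the cells at zero, so the global score of two dissimilar sequences can be negative.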


64. Which of the following statements about the features of the PANTHER and TIGRFAMs databases is incorrect?
(a) TIGRFAMs provides a tool for identifying functionally related proteins based on sequence homology (b) TIGRFAMs is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation (c) Hidden Markov models (HMMs) are not used in PANTHER (d) PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise
Topic: Protein Family Databases
Answer» The right option is (c) Hidden Markov models (HMMs) are not used in PANTHER. Explanation: In PANTHER, the subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (human-curated molecular function and biological process classifications and pathway diagrams), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences.


65. What is the source of protein structures in SCOP and CATH?
(a) Uniprot (b) Protein Data Bank (c) Ensemble (d) InterPro
Topic: Protein Family Databases
Answer» The correct option is (b) Protein Data Bank. Explanation: The source of protein structures in SCOP and CATH is the PDB (Protein Data Bank). The PDB is a primary archive of experimentally determined structures, whereas SCOP and CATH are secondary (derived) databases that classify those structures. UniProt is a protein sequence database, not a source of structures.


66. Another difficulty in Bayesian methods is deciding on the length of the sequence that was duplicated.
(a) True (b) False
Topic: Bayesian Statistics
Answer» The correct choice is (a) True. Explanation: In genomes, the presence of repeats may be revealed by long regions of matched sequence positions dispersed among regions of sequence positions that do not match. However, as the frequency of mismatches increases, it becomes difficult to determine the extent of the repeated region.


67. If the purpose is to calculate the probability of one event AND a second event, the odds scores for the events are _________
(a) added (b) multiplied (c) multiplied and added (d) subtracted
Topic: Bayesian Statistics
Answer» The right choice is (b) multiplied. Explanation: An example is the calculation of the odds of an alignment of two sequences from the alignment scores for each of the matched pairs of bases or amino acids in the alignment. The odds scores for the pairs are multiplied. In practice, the log odds score for the first pair is usually added to that for the second, and so on, until the scores for every pair have been added, which is equivalent to multiplying the odds.
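The equivalence between multiplying odds and adding log odds can be checked with a short sketch. The per-pair odds values below are hypothetical, and base 2 logarithms are an assumption.

```python
import math


def alignment_odds(pair_odds):
    """AND rule: multiply the odds scores of all aligned pairs."""
    return math.prod(pair_odds)


def alignment_log_odds(pair_odds):
    """Equivalent computation: add the log2 odds of all aligned pairs."""
    return sum(math.log2(odds) for odds in pair_odds)


pair_odds = [4.0, 0.5, 2.0]  # hypothetical odds for three aligned pairs
print(alignment_odds(pair_odds))           # product of the odds
print(2 ** alignment_log_odds(pair_odds))  # same value recovered from the sum of logs
```

Working in log odds keeps the numbers in a convenient range for long alignments, where a product of many odds would become very large or very small.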


68. Gibbs is a web-based program that uses the Gibbs sampling approach to look for _____ gap-free segments in either DNA or protein sequences.
(a) short, partially conserved (b) long, partially conserved (c) long, conserved (d) short, not conserved
Topic: Motif Discovery in Unaligned Sequences
Answer» The right choice is (a) short, partially conserved. Explanation: The program uses the Gibbs sampling approach to look for short, partially conserved gap-free segments in either DNA or protein sequences. To ensure accuracy, more than twenty sequences of exactly the same length should be used.


69. For what type of sequences is Gibbs sampling used?
(a) Closely related sequences (b) Distantly related sequences (c) Distantly related sequences that share common motifs (d) Closely related sequences that share common motifs
Topic: Motif Discovery in Unaligned Sequences
Answer» The correct option is (c) Distantly related sequences that share common motifs. Explanation: Often, distantly related sequences that share common motifs cannot be readily aligned. For example, the sequences for the helix-turn-helix motif in transcription factors can be subtly different enough that traditional multiple sequence alignment approaches fail to generate a satisfactory answer. For detecting such subtle motifs, more sophisticated algorithms such as expectation maximization (EM) and Gibbs sampling are used.


70. Which of the following does not describe the local alignment algorithm?
(a) The score can be negative (b) A negative score is set to 0 (c) The first row and first column are set to 0 in the initialization step (d) In the traceback step, the beginning is at the highest score, and it ends when 0 is encountered
Topic: Local Sequence Alignment
Answer» The correct choice is (a) The score can be negative. Explanation: The score cannot be negative in local alignment. When any element would have a score lower than zero, it means that the sequences up to this position have no similarities; this element is then set to zero to eliminate the influence of the previous alignment. In this way, the calculation can continue to find an alignment at any later position.
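The three properties listed in the options (zero initialization, negative scores reset to 0, traceback from the highest score until 0) can be sketched as follows. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b, matrix) for a local alignment."""
    m, n = len(a), len(b)
    # First row and first column are initialized to 0.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Any negative candidate is replaced by 0.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Traceback starts at the highest score and stops at the first 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), H


score, top, bottom, H = smith_waterman("TTTACGTTT", "GGACGGG")
print(score, top, bottom)  # the shared ACG segment is recovered
```

Because every cell is floored at 0, an unrelated prefix cannot drag down the score of a conserved region that starts later.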


71. In which of the following multipurpose packages is the Gibbs sampling algorithm used?
(a) Consensus (b) BEST (c) AlignACE (d) PhyloCon
Topic: Motif and Domain Databases Using Statistical Models
Answer» The correct choice is (c) AlignACE. Explanation: The Gibbs sampling algorithm can identify multiple motifs in a sequence set using an iterative masking procedure. It is used in AlignACE, whereas BEST is a suite of four motif discovery tools integrated in a graphical user interface. The Consensus program finds motifs in a set of unaligned sequences, and PhyloCon builds on this framework by modeling conservation across orthologous genes from multiple species.


72. Which of the following statements about the features of SCOP is incorrect?
(a) Proteins with the same shapes but having little sequence or functional similarity are placed in different superfamilies, and are assumed to have only a very distant common ancestor (b) Proteins having the same shape and some similarity of sequence and/or function are placed in 'families', and are assumed to have a closer common ancestor (c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London (d) It aims to determine the evolutionary relationship between proteins
Topic: Protein Family Databases
Answer» The correct answer is (c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London. Explanation: SCOP, the Structural Classification of Proteins, was created in 1994 in the Centre for Protein Engineering and the Laboratory of Molecular Biology. It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010, and subsequently at the Laboratory of Molecular Biology in Cambridge, England.


73. The GCG alignment programs have a randomization option, which shuffles the second sequence and calculates similarity scores between the unshuffled sequence and each of the shuffled copies.
(a) True (b) False
Topic: Assessing the Significance of Sequence Alignments
Answer» The right option is (a) True. Explanation: If the new similarity scores are significantly smaller than the real alignment score, the alignment is considered significant. This analysis is only useful for providing a rough approximation of the significance of an alignment score and can easily be misleading.
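A shuffle test of this kind can be sketched in a few lines. This is not the GCG implementation; the local scoring function, the scoring values, the number of shuffles and the fixed random seed are all assumptions made for illustration.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (score only, keeping one row at a time)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def shuffle_test(seq1, seq2, n_shuffles=100, seed=0):
    """Compare the real score with scores against shuffled copies of seq2."""
    rng = random.Random(seed)
    real = local_score(seq1, seq2)
    letters = list(seq2)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps the composition, destroys the order
        shuffled_scores.append(local_score(seq1, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    z = (real - mean) / sd if sd > 0 else float("inf")
    return real, mean, sd, z


real, mean, sd, z = shuffle_test("ACGTTGCAAGCTTAGGCTAC", "ACGTTGCAAGCTTAGGCTAC")
print(real, round(mean, 2), round(sd, 2), round(z, 2))
```

A plain shuffle like this one ignores local composition, which is one reason the result should be read only as a rough guide to significance.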


74. The assumption in the PAM evolutionary model is that the amino acid substitutions observed over short periods of evolutionary history can be extrapolated to longer distances.
(a) True (b) False
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The correct choice is (a) True. Explanation: The purpose of this assumption is to allow predictions over longer evolutionary distances. The BLOSUM matrices, by contrast, are based on scoring substitutions found over a range of evolutionary periods, and they reveal that substitutions are not always as predicted by the PAM model.


75. For significantly aligning sequences, what is the resulting structure on the dot plot?
(a) Intercrossing lines (b) Crosses everywhere (c) Vertical lines (d) A diagonal and lines parallel to the diagonal
Topic: Dot Matrix Sequence Comparison
Answer» The right option is (d) A diagonal and lines parallel to the diagonal. Explanation: If the sequences align, a clearly bold diagonal is visible on the plot. If the alignment is somewhat imperfect, the diagonal is broken up to an extent and forms short lines parallel to it.
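A text-only dot plot is enough to see the diagonal and its displacement. The sketch below marks identical residues without any window or threshold filtering, which is a simplification; the function name and the example sequences are made up for illustration.

```python
def dot_matrix(seq_x, seq_y):
    """Rows follow seq_y, columns follow seq_x; '*' marks identical residues."""
    return ["".join("*" if x == y else "." for x in seq_x) for y in seq_y]


# Identical sequences give an unbroken main diagonal.
for row in dot_matrix("ACGT", "ACGT"):
    print(row)

print()

# Removing one residue from the second sequence shifts the rest of the diagonal.
for row in dot_matrix("ACGT", "AGT"):
    print(row)
```

Real dot plot programs usually compare windows of several residues and apply a score threshold to suppress the random matches that a single-residue comparison produces.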


76. Which of the following does not describe local alignment?
(a) A local alignment aligns a substring of the query sequence to a substring of the target sequence (b) A local alignment is defined by maximizing the alignment score, so that deleting a column from either end would reduce the score, and adding further columns at either end would also reduce the score (c) Local alignments have terminal gaps (d) The substrings to be examined may be all of one or both sequences; if all of both are included, then the local alignment is also global
Topic: Local Sequence Alignment
Answer» The right choice is (c) Local alignments have terminal gaps. Explanation: Local alignments never have terminal gaps, because a higher score could be obtained by deleting the gaps (which always have negative scores, i.e., penalties). Global alignments, in contrast, can have terminal gaps.


77. Which of the following does not describe PAM matrices?
(a) These matrices are used in optimal alignment scoring (b) It stands for Point Altered Mutations (c) It stands for Point Accepted Mutations (d) It was first developed by Margaret Dayhoff
Topic: Global Sequence Alignment
Answer» The correct option is (b) It stands for Point Altered Mutations. Explanation: PAM stands for Point Accepted Mutations. PAM matrices are calculated by observing the differences in closely related proteins. One PAM unit (PAM1) specifies one accepted point mutation per 100 amino acid residues, i.e., a 1% change while 99% remains unchanged.


78. In Bayesian methods, a difficulty with making estimations is that the estimate depends on the following assumption: the mutation rate in sequences has been constant with time, and the rate of mutation of all nucleotides is the same.
(a) True (b) False
Topic: Bayesian Statistics
Answer» The correct answer is (a) True. Explanation: The assumption mentioned above (the molecular clock hypothesis) is made to reduce complications. Such problems may be solved by scoring different portions of a sequence with a different scoring matrix, and then using the Bayesian methods to calculate the best evolutionary distance.


79. Which of the following is untrue about the Expectation Maximization (EM) method?
(a) It is used to find hidden motifs (b) The method works by first making a random or guessed alignment of the sequences to generate a trial PSSM (c) The trial PSSM is used to compare with each sequence individually (d) The log odds scores of the PSSM are modified at the end of the process
Topic: Motif Discovery in Unaligned Sequences
Answer» The correct option is (d) The log odds scores of the PSSM are modified at the end of the process. Explanation: The log odds scores of the PSSM are modified in each iteration to maximize the alignment of the matrix to each sequence. During the iterations, the sequence pattern for the conserved motifs is gradually "recruited" to the PSSM.


80. Dayhoff (1978, 1983) devised a second method for testing the relatedness of two protein sequences that can accommodate some local variation. For which of the following is this method not useful?
(a) For finding repeated regions within a sequence (b) For finding similar regions that are in a different order in two sequences (c) For finding a small conserved region such as an active site (d) For finding huge regions within sequences
Topic: Assessing the Significance of Sequence Alignments
Answer» The right answer is (d) For finding huge regions within sequences. Explanation: As used in a computer program called RELATE (Dayhoff 1978), all possible segments of a given length of one sequence are compared with all segments of the same length from another. An alignment score using a scoring matrix is obtained for each comparison to give a score distribution among all of the segments. A segment comparison score in standard deviation units is calculated as the value for the real sequences minus the average value for random sequences, divided by the standard deviation of the scores from the random sequences.


81. The Dayhoff model of protein evolution is not a Markov process.
(a) True (b) False
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer» The right answer is (b) False. Explanation: The Dayhoff model of protein evolution, as used in the PAM matrices, is a Markov process. In the Dayhoff model, each amino acid site in a protein can change at any time to any other amino acid with probabilities given by the PAM table. The changes that occur at each site are independent of the amino acids found at other sites in the protein and depend only on the current amino acid at the site.
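Because each change depends only on the current residue, longer evolutionary distances follow from repeated multiplication of the one-step transition matrix. The sketch below uses a toy two-letter alphabet with a made-up 1% change rate; it is not the real 20 x 20 Dayhoff PAM1 table, and the function names are hypothetical.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def extrapolate(pam1, n):
    """Raise the one-step transition matrix to the n-th power (n >= 1)."""
    result = pam1
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Toy transition matrix: each residue stays the same with probability 0.99.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

pam2 = extrapolate(TOY_PAM1, 2)
pam250 = extrapolate(TOY_PAM1, 250)
print(pam2[0])    # probabilities after two steps
print(pam250[0])  # after 250 steps the rows approach the equilibrium frequencies
```

Each row remains a probability distribution after every multiplication, which is what makes the extrapolation from short to long evolutionary distances well defined.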


82. Which of the following is untrue regarding the dynamic programming algorithm?
(a) The method compares every pair of characters in the two sequences and generates an alignment (b) The output alignment will include matched and mismatched characters and gaps in the two sequences that are positioned so that the number of matches between identical or related characters is the maximum possible (c) The dynamic programming algorithm provides a reliable computational method for aligning DNA and protein sequences (d) It doesn't allow making evolutionary predictions on the basis of sequence alignments
Topic: Dynamic Programming Algorithm for Sequence Alignment
Answer» The correct choice is (d) It doesn't allow making evolutionary predictions on the basis of sequence alignments. Explanation: Optimal alignments provide useful information to biologists concerning sequence relationships by giving the best possible information as to which characters in a sequence should be in the same column in an alignment, and which are insertions in one of the sequences (or deletions in the other). This information is important for making functional, structural, and evolutionary predictions on the basis of sequence alignments.


83. Which of the following is untrue regarding the gap penalty used in dynamic programming?
(a) A gap penalty is subtracted for each gap that has been introduced (b) A gap penalty is added for each gap that has been introduced (c) The gap score defines a penalty given to the alignment when there is an insertion or deletion (d) Gap open and gap extension penalties were introduced for cases with continuous gaps (five or more)
Topic: Local Sequence Alignment
Answer» The correct option is (b) A gap penalty is added for each gap that has been introduced. Explanation: Dynamic programming algorithms use gap penalties to maximize the biological meaning of the alignment. The gap opening penalty is always applied at the start of a gap, and each gap position that follows it is given a gap extension penalty, which is smaller than the opening penalty. Typical values are -12 for gap opening and -4 for gap extension.
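The opening and extension scheme can be written as a small function. The sketch follows the convention described above (opening penalty for the first gap position, extension penalty for each further position) and uses the typical values quoted; other programs define the affine penalty slightly differently, so treat the exact formula as an assumption.

```python
def affine_gap_penalty(length, gap_open=-12, gap_extend=-4):
    """Penalty for one contiguous gap of the given length (in positions)."""
    if length <= 0:
        return 0
    # Opening penalty for the first position, extension for each later one.
    return gap_open + (length - 1) * gap_extend


for k in (1, 2, 3, 5):
    print(k, affine_gap_penalty(k))
```

With these values a single gap of length 3 costs -20, whereas three separate gaps of length 1 cost -36, so the scheme favors fewer, longer gaps.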


84. Which of the following is untrue regarding the scoring system used in dynamic programming?
(a) If the residues are the same in both sequences, the match score is assumed to be +5, which is added to the score of the cell positioned diagonally to the current cell (b) If the residues are not the same, the mismatch score is assumed to be -3 (c) If the residues are not the same, the mismatch score is assumed to be 3 (d) The score should be added to the score of the cell positioned diagonally to the current cell
Topic: Global Sequence Alignment
Answer» The correct choice is (c) If the residues are not the same, the mismatch score is assumed to be 3. Explanation: If the residues are not the same, the mismatch score is assumed to be -3, and it has to be negative. These scores are not fixed; they can also be user defined, but the mismatch score and the gap penalty should be negative values.


85. Waterman, in 1989, provided a set of means and standard deviations of global alignment scores between random DNA sequences, using mismatch and gap penalties that produce a linear increase in score with _______, a distinguishing feature of global alignments.
(a) alignment score (b) sequence score (c) sequence length (d) scoring system
Topic: Assessing the Significance of Sequence Alignments
Answer» The correct answer is (c) sequence length. Explanation: In the analysis provided by Waterman, the score of the alignment of random or unrelated sequences grows proportionally to the length of the sequences. However, these values are of limited use because they are based on a simple gap scoring system.


86. Vertical frame shifts show ______ while the horizontal ones show _______
(a) insertion, insertion (b) insertion, deletion (c) deletion, deletion (d) deletion, insertion
Topic: Dot Matrix Sequence Comparison
Answer» The right choice is (b) insertion, deletion. Explanation: Deletions and insertions of nucleotides are quite common in the alignment process. The dot plot represents them clearly as vertical and horizontal shifts. Mutations, in contrast, fall entirely outside the diagonal zone.


87. With the application of Bayesian methods, the most probable repeat length and the evolutionary time since the repeat was formed may be derived.
(a) True (b) False
Topic: Bayesian Statistics
Answer» The correct choice is (a) True. Explanation: Sequences of this type originated from gene duplication events in the yeast and Caenorhabditis elegans genomes. When there are multiple mismatches between such repeated sequences, it is difficult to determine the most likely length of the repeats. Bayesian methods can be used in such cases.


88. Which of the following is untrue about the modification of PAM matrices?
(a) At one time, the PAM250 scoring matrix was modified in an attempt to improve the alignment obtained (b) All scores for matching a particular amino acid were normalized to the same mean and standard deviation, and all amino acid identities were given the same score to provide an equal contribution for each amino acid in a sequence alignment (c) This took place in 1976 (d) These modifications were included as the default matrices for the GCG sequence alignment programs in versions 8 and earlier and are optional in later versions
Topic: Use of Scoring Matrices and Gap Penalties in Sequence Alignments
Answer»

89.	A multiple sequence alignment or a motif is often represented by a graphic representation called a ______ (a) logo (b) motto (c) algorithm (d) algo. The question was posed to me by my college professor while I was skipping class. It is from the Motif Discovery in Unaligned Sequences topic in the Sequence Alignment section of Bioinformatics.
Answer»

90.	Which one of the following is not an approach to local alignment? (a) Smith-Waterman algorithm (b) K-tuple method (c) Words method (d) Needleman-Wunsch algorithm. I was asked this question by my school teacher while I was skipping class. It is from the Local Sequence Alignment topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct choice is (d) Needleman-Wunsch algorithm. Explanation: Local alignment follows two broad approaches: the Smith-Waterman algorithm and word methods, also known as k-tuple methods, which are implemented in the well-known FASTA and BLAST families of programs. Needleman-Wunsch is a global alignment algorithm.
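To make the local approach concrete, here is a minimal sketch of Smith-Waterman scoring. The function name and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration. It returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty.
    Cell values are floored at 0, so an alignment may start and end anywhere."""
    cols = len(b) + 1
    previous = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * cols
        for j in range(1, cols):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# The shared core GATTACA is found even though the flanks differ completely.
print(smith_waterman_score("TTTGATTACATTT", "CCCGATTACACCC"))
```

The floor at zero is what separates this from a global method: unrelated flanking regions cannot drag the score of a well-matched core below zero.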

91.	In the web-based program MEME, the computation is a _____ step procedure. (a) one (b) two (c) three (d) four. This question was addressed to me during an online exam. It is from the Motif Discovery in Unaligned Sequences topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct option is (b) two. Explanation: In constructing a probability matrix, MEME allows multiple starting alignments and does not assume that there are motifs in every sequence. The computation is a two-step procedure that includes generating a sequence motif and finding the highest score.

92.	Which of the following is true about the Expectation Maximization (EM) method? (a) The log odds scores of the PSSM are modified at the end of the process (b) The procedure stops prematurely if the scores reach convergence (c) The final result is not sensitive to the initial alignment (d) Local optimum is an advantage of EM method. This question was posed to me in a class test. It is taken from the Motif Discovery in Unaligned Sequences topic in the Sequence Alignment section of Bioinformatics.
Answer» The right answer is (b) The procedure stops prematurely if the scores reach convergence. To explain: The final result is sensitive to the initial alignment. Converging on a local optimum is a drawback of the EM method, not an advantage: the procedure stops as soon as the scores converge, even if a better solution exists.

93.	When did Needleman and Wunsch first describe the algorithm for global alignment? (a) 1899 (b) 1970 (c) 1930 (d) 1950. I got this question in a class test. It is from the Global Sequence Alignment topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct answer is (b) 1970. Explanation: Needleman and Wunsch were among the first to describe a dynamic programming algorithm for global sequence alignment. In global sequence alignment, an attempt is made to align the entirety of two different sequences, up to and including the ends of the sequences.
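The global dynamic programming recurrence can be sketched as follows. The function name and the scoring scheme (match +1, mismatch -1, linear gap -1) are assumptions for illustration rather than the parameters of the original publication. Only the optimal score is returned.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # The bottom-right cell covers both sequences end to end.
    return score[-1][-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # identical sequences
print(needleman_wunsch_score("GATTACA", "GATACA"))   # one residue deleted
```

The full table has (len(a) + 1) x (len(b) + 1) cells, which is why both time and memory grow with the product of the sequence lengths.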

94.	Which of the following does not describe dynamic programming? (a) The approach compares every pair of characters in the two sequences and generates an alignment, which is the best or optimal (b) Global alignment algorithm is based on this method (c) Local alignment algorithm is based on this method (d) The method can be useful in aligning protein sequences to protein sequences only. I got this question in an exam. It is based on the Global Sequence Alignment topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct choice is (d) The method can be useful in aligning protein sequences to protein sequences only. Explanation: The method can be useful in aligning nucleotide sequences to protein sequences as well. Programs built on this method first perform pairwise alignment on each pair of sequences. Then they perform local rearrangements on these results in order to optimize overlaps between multiple sequences.

95.	Which of the following statements about the features of the PRINTS and ProDom databases is incorrect? (a) PRINTS is a compendium of protein fingerprints (b) Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space (c) Current versions of ProDom are built using a novel procedure based on recursive BLAST searches (d) ProDom domain database consists of an automatic compilation of homologous domains. The question was asked in a unit test. It is from the Protein Family Databases topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct answer is (c) Current versions of ProDom are built using a novel procedure based on recursive BLAST searches. Explanation: Current versions of ProDom are built using a novel procedure based on recursive PSI-BLAST searches, not just BLAST searches. PRINTS is indeed a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of UniProt.

96.	For palindromic sequences, what is the structure of the dot plot? (a) 2 intersecting diagonal lines at the midpoint (b) One diagonal (c) Two parallel diagonals (d) No diagonal. I was asked this question in an online interview. It is from the Dot Matrix Sequence Comparison topic in the Sequence Alignment section of Bioinformatics.
Answer» The right option is (a) 2 intersecting diagonal lines at the midpoint. To elaborate: For perfectly aligned sequences, the dot plot forms a single diagonal. For palindromic sequences, i.e. sequences that are symmetrical about their midpoint, there are 2 intersecting diagonals on the plot.
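The two intersecting diagonals can be checked with a short sketch. It uses the definition given in the answer (a sequence that reads the same from both ends), not the reverse-complement sense of "palindrome" used for restriction sites. The function names are hypothetical.

```python
def self_dot_plot(seq):
    """Dot plot of a sequence against itself: plot[i][j] is True when seq[i] == seq[j]."""
    return [[a == b for b in seq] for a in seq]


def full_diagonals(seq):
    """Return (main, anti): whether the main diagonal and the anti-diagonal
    of the self dot plot are completely filled with dots."""
    plot = self_dot_plot(seq)
    n = len(seq)
    main = all(plot[i][i] for i in range(n))
    anti = all(plot[i][n - 1 - i] for i in range(n))
    return main, anti


print(full_diagonals("ACGTTGCA"))  # symmetrical about the midpoint
print(full_diagonals("ACGTACGT"))  # not symmetrical
```

The main diagonal is always filled in a self comparison. The anti-diagonal is filled only when the sequence mirrors itself, and the two lines cross at the midpoint.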

97.	Which of the following statements about the features of the CATH-Gene3D and HAMAP databases is incorrect? (a) CATH-Gene3D describes protein families and domain architectures in complete genomes (b) In CATH-Gene3D the functional annotation is provided to proteins from single resource (c) HAMAP profiles are manually created by expert curators; they identify proteins that are part of well-conserved bacterial, archaeal and plastid-encoded protein families or subfamilies (d) HAMAP stands for High-quality Automated and Manual Annotation of microbial Proteomes. I was asked this question during an online interview. It is from the Protein Family Databases topic in the Sequence Alignment section of Bioinformatics.
Answer»

98.	Which of the following wrongly describes protein domains? (a) They are made up of one secondary structure (b) Defined as independently foldable units (c) They are stable structures as compared to motifs (d) They are separated by linker regions. I was asked this question in an interview. It is from the Protein Motifs and Domain Prediction topic in the Sequence Alignment section of Bioinformatics.
Answer» The right choice is (a) They are made up of one secondary structure. To elaborate: Protein domains are made up of two or more motifs, i.e. secondary structure elements, that form stable, folded 3-D structures. They are conserved parts of the protein sequence and can evolve, function, and exist independently of the rest of the protein chain.

99.	The helix-loop-helix protein structural motif is contained in all of the following except ________ (a) Scleraxis (b) Neurogenins (c) Transcription Factor 4 (d) Leucine zipper. The question was asked by my school teacher while I was skipping class. It is from the Protein Motifs and Domain Prediction topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct answer is (d) Leucine zipper. Explanation: The leucine zipper is associated with gene regulation and contains an alpha helix with a leucine at every 7th amino acid. The rest belong to one of the largest families of dimerizing transcription factors.
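The "leucine at every 7th amino acid" pattern can be illustrated with a deliberately naive check. The function name, the `repeats` threshold, and the toy sequence are made up for this sketch. Real leucine zipper prediction uses more than this spacing rule.

```python
def has_leucine_heptad(seq, repeats=4):
    """True if some start position has 'L' there and at every 7th residue
    after it, for `repeats` leucines in total."""
    span = 7 * (repeats - 1)
    for start in range(len(seq) - span):
        if all(seq[start + 7 * k] == "L" for k in range(repeats)):
            return True
    return False


# Toy sequence (not a real protein): leucines at positions 0, 7, 14 and 21.
toy = "AAAAAA".join("LLLL")
print(has_leucine_heptad(toy))
print(has_leucine_heptad("A" * 22))
```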

100.	What is the length of a motif, in terms of amino acid residues? (a) 30-60 (b) 10-20 (c) 70-90 (d) 1-10. I got this question in a job interview. It is from the Protein Motifs and Domain Prediction topic in the Sequence Alignment section of Bioinformatics.
Answer» The correct choice is (b) 10-20. Explanation: A typical motif is 10-20 amino acids long, e.g. the Zn-finger motif. Hence it is also referred to as super secondary structure. This motif is seen in transcription factors.

Which of the following is not a software for dot plot analysis? (a) SIMMI (b) DOTLET (c) DOTMATCHER (d) LALIGN. This question was addressed to me by my college professor while I was skipping class. It is from the Dot Matrix Sequence Comparison topic in the Sequence Alignment section of Bioinformatics.

What is the source of protein structures in SCOP and CATH? (a) Uniprot (b) Protein Data Bank (c) Ensemble (d) InterPro. This question was posed to me in an examination. It is from the Protein Family Databases topic in the Sequence Alignment section of Bioinformatics.

Another difficulty in Bayesian methods is deciding on the length of sequence that was duplicated. (a) True (b) False. I was asked this question in a unit test. It is from the Bayesian Statistics topic in the Sequence Alignment section of Bioinformatics.

In which of the following multipurpose packages is the Gibbs sampling algorithm used? (a) Consensus (b) BEST (c) AlignACE (d) PhyloCon. The question was asked in an examination. It is from the Motif and Domain Databases Using Statistical Models topic in the Sequence Alignment section of Bioinformatics.

The Dayhoff model of protein evolution is not a Markov process. (a) True (b) False. The question was posed to me during an internship interview. It is from the Use of Scoring Matrices and Gap Penalties in Sequence Alignments topic in the Sequence Alignment section of Bioinformatics.
