Biostatistics and Bioinformatics 2020
1. a) What is Bioinformatics?
Ans→ Bioinformatics is the use of computational tools and techniques to analyze and interpret biological data such as DNA sequences, protein structures and genetic information.
b) Define negative correlation.
Ans→ Negative correlation is a statistical relationship in which one variable decreases as the other variable increases.
c) What do you mean by bimodal distribution?
Ans→ A bimodal distribution is a probability distribution in which the dataset has two distinct modes or peaks.
d) What is the full form of OMIM?
Ans→ Online Mendelian Inheritance in Man
e) What is the significance of p<0.05 in a t-test?
Ans→ A p-value less than 0.05 (p<0.05) indicates that the result is statistically significant at the 5% level, i.e. the observed difference is unlikely to be due to chance alone.
f) Name one specialized biological database.
Ans→ Flybase
g) What do you mean by correlation coefficient?
Ans→ The correlation coefficient is a numerical measure that quantifies the strength and direction of the relationship between two variables.
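As a minimal sketch of how such a coefficient can be computed, the Python snippet below implements Pearson's r from its textbook formula. The function name pearson_r and the sample values are illustrative assumptions, not part of the original answer.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases as hours increase
falling = [10, 8, 6, 4, 2]    # decreases as hours increase

r_pos = pearson_r(hours, rising)    # close to +1: strong positive correlation
r_neg = pearson_r(hours, falling)   # close to -1: strong negative correlation
```

The sign of the result gives the direction of the relationship and its magnitude gives the strength.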
h) SWISS-PROT is a ______ database.
Ans→ Secondary
i) What will a researcher conclude if the value of a calculated correlation coefficient is near zero?
Ans→ If the value of a calculated correlation coefficient is near zero, a researcher would conclude that there is little to no linear relationship between the two variables.
j) Write down the names of two primary nucleotide sequence databases.
Ans→ GenBank and EMBL (European Molecular Biology Laboratory)
k) What do you mean by Secondary data?
Ans→ Secondary data refers to existing data that has been collected by others or for another purpose, which researchers analyze for their own studies.
l) Comment on the distribution when a given dataset has mean=median= mode.
Ans→ When a dataset has mean= median= mode, it typically indicates a symmetric distribution or normal distribution where the data points are evenly distributed around the central value.
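A small check of this property, sketched with Python's standard statistics module. The dataset is a made-up symmetric example chosen for illustration.

```python
import statistics

# Made-up symmetric dataset: values are mirrored around 3
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)

# For a symmetric, single-peaked dataset all three measures coincide
all_equal = mean_value == median_value == mode_value
```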
m) According to GenBank what is the meaning of [ACCN].
Ans→ In GenBank, ACCN stands for Accession Number, which is a unique identifier assigned to each sequence.
n) Comment, if a regression line can be straight or curved?
Ans→ A regression line can be either straight or curved depending on the nature of relationship between two variables.
2. a) Differentiate between discrete and continuous data with a suitable diagram.
Ans→
Discrete Data | Continuous Data |
1. Discrete data consists of distinct values with clear separations between them. | 1. Continuous data can take any value within a range or interval. |
2. Countable | 2. Measurable |
3. Shows isolated points | 3. Shows connected points |
4. Days of the week | 4. Market price of a product |
b) What is skewed distribution? How does it differ from normal distribution?
Ans→ A skewed distribution is an asymmetrical distribution in which one tail extends further from the mean than the other.
In a skewed distribution the mean, median and mode do not coincide, whereas a normal distribution is symmetric about its centre, so the mean, median and mode are equal.
c) How could you differentiate interval and ratio scale.
Ans→
Interval Scale | Ratio Scale |
1. The zero point is arbitrary and does not represent the absence of the measured attribute. | 1. The zero point represents the absence of the measured attribute. |
2. Ratios between values are not meaningful. | 2. Ratios between values are meaningful. |
3. Example: Temperature measured in Celsius or Fahrenheit. | 3. Example: Weight measured in kilograms. |
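The point about ratios can be illustrated with a short Python sketch. The Kelvin conversion is an addition for illustration (Kelvin has a true zero, so it behaves as a ratio scale), and all numbers are made up.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (a scale with a true zero)."""
    return celsius + 273.15

# Interval scale: the ratio depends on the arbitrary zero point
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c   # 2.0, yet 20 degrees C is not "twice as hot" as 10 degrees C
ratio_kelvin = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)   # roughly 1.035

# Ratio scale: zero kilograms means no weight, so the ratio is meaningful
w1_kg, w2_kg = 10.0, 20.0
ratio_weight = w2_kg / w1_kg   # 2.0: the second object really is twice as heavy
```

The same pair of temperatures gives different ratios in Celsius and Kelvin, which is why ratios on an interval scale carry no meaning.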
d) Write down the two major goals of Bioinformatics.
Ans→ 1. Bioinformatics aims to develop tools and techniques for managing and analyzing large volumes of biological data.
2. Bioinformatics seeks to gain insights into biological processes and systems by integrating computational methods with experimental data.
e) Explain “And/Or” situation in the light of sum rule and product rule.
Ans→ Sum rule: Used for the “or” situation, where the probability of either event A or event B occurring (when the two events are mutually exclusive) is calculated by adding their individual probabilities.
Product rule: Used for the “and” situation where the probability of both events A and B occurring together independently is calculated by multiplying their individual probabilities.
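Both rules can be sketched with exact fractions in Python. The fair die and fair coin are illustrative assumptions.

```python
from fractions import Fraction

# Sum rule ("or"): rolling a 1 OR a 2 on one fair die.
# The two outcomes are mutually exclusive, so the probabilities add.
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two          # 1/3

# Product rule ("and"): getting heads on a fair coin AND a six on a fair die.
# The two events are independent, so the probabilities multiply.
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)
p_heads_and_six = p_heads * p_six     # 1/12
```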
f) What are the contents of GenBank and PubMed?
Ans→ GenBank: A database for genetic research maintained by NCBI containing DNA and RNA sequences along with associated metadata.
PubMed: A database of biomedical literature containing abstracts and citations from scientific journals in fields such as medicine, biology and health science.
g) How does correlation differ from regression?
Ans→ Correlation measures the strength and direction of a linear relationship between two variables; it does not distinguish a dependent variable from an independent one.
Regression models the relationship between one dependent variable and one or more independent variables, and estimates how much the dependent variable changes when an independent variable changes.
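To make the regression side concrete, here is a minimal sketch of simple linear regression by ordinary least squares, using only built-in Python. The function name fit_line and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: x is the independent variable, y the dependent variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
predicted_at_6 = intercept + slope * 6   # prediction for a new x value
```

Unlike a correlation coefficient, the fitted line can be used to predict the dependent variable for a new value of the independent variable.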
h) In which situation is the t-test used?
Ans→ The t-test is used to check whether there is a statistically significant difference between the means of two groups, particularly when the sample size is relatively small (<30).
Biostatistics and Bioinformatics 2021
1.a) What does ‘FASTA’ stand for?
Ans→ FAST-All
b) Write one limitation of Bioinformatics.
Ans→ One limitation of Bioinformatics is the quality and completeness of biological data where inaccurate or poorly annotated data can lead to erroneous conclusions.
c) Give an application of Bioinformatics.
Ans→ An application of Bioinformatics is in identifying genetic conditions associated with diseases, which can help in developing targeted therapies.
d) What is the significance of correlation?
Ans→ Correlation measures the strength and direction of a linear relationship between two variables.
e) What is sample space?
Ans→ The sample space is the set of all possible outcomes of a random experiment or event.
Example: Flipping a coin {Head,Tail}
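The coin example can be extended with a short Python sketch that enumerates a sample space. The two-coin experiment is an added illustration.

```python
from itertools import product

# Sample space for flipping one coin
one_coin = {"Head", "Tail"}

# Sample space for flipping two coins: every ordered pair of outcomes
two_coins = set(product(one_coin, repeat=2))

# Number of possible outcomes in each experiment
size_one = len(one_coin)     # 2
size_two = len(two_coins)    # 4
```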
f) What is the importance of ‘GenBank’?
Ans→ GenBank is a comprehensive database of genetic (nucleotide) sequences from various organisms providing information for biomedical research.
g) What is the full form of EMBL?
Ans→ European Molecular Biology Laboratory
h) What is degree of freedom?
Ans→ Degree of freedom is the number of independent pieces of information that are used to make a statistical estimate.
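A common instance is the sample variance, which divides by n - 1 rather than n. The sketch below uses Python's statistics module on made-up data to show why only n - 1 deviations are free to vary.

```python
import statistics

data = [2, 4, 6]                         # made-up sample, n = 3
mean_value = statistics.mean(data)       # 4

# Deviations from the mean always sum to zero, so once n - 1 of them
# are known the last one is fixed: only n - 1 are independent.
deviations = [x - mean_value for x in data]

sample_variance = statistics.variance(data)        # divides by n - 1 = 2
population_variance = statistics.pvariance(data)   # divides by n = 3
```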
i) What is ‘BLAST’ used for?
Ans→ BLAST (Basic Local Alignment Search Tool) is used to compare nucleotide or protein sequences against databases to find similar sequences.
j) What is pharmacogenomics?
Ans→ Pharmacogenomics is the study of how genes or the genome affect a person’s response to drugs, aiming to personalize medical treatment.
k) What do you mean by ‘degree of freedom’?
Ans→ Degree of freedom refers to the number of independent pieces of information that are used to make a statistical estimate.
l) Who is the father of Bioinformatics?
Ans→ Margaret Oakley Dayhoff
m) What is mode?
Ans→ Mode refers to the value or values that occur most frequently in a dataset.
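A brief sketch with Python's statistics module (multimode requires Python 3.8 or later). The datasets are made up for illustration.

```python
import statistics

# One most frequent value
single_peak = [1, 2, 2, 3]
mode_value = statistics.mode(single_peak)

# Two values tie for most frequent, so the dataset has two modes (bimodal)
two_peaks = [1, 2, 2, 3, 3, 4]
modes = statistics.multimode(two_peaks)
```

multimode returns every value that shares the highest frequency, which covers the "value or values" part of the definition.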
n) Name one proteomic database.
Ans→ UniProt
o) What is the content of the SRS database?
Ans→ The SRS (Sequence Retrieval System) database contains biological data such as nucleotide sequences, protein sequences and other molecular biology information.
2. a) What is normal distribution? Explain with a diagram.
Ans→ The normal distribution is a symmetric, bell-shaped probability distribution in which the data cluster around a central mean value.
b) How is t-test different from ANOVA?
Ans→ A t-test is used to compare the means of two groups and determine whether there is a significant difference between them.
On the other hand, ANOVA (Analysis of Variance) is used to compare the means of three or more groups simultaneously.
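The two-group comparison can be sketched as a pooled (equal-variance) Student's t statistic in plain Python. The function name pooled_t and the data are hypothetical, and the critical value 2.306 is the two-tailed 5% value for 8 degrees of freedom taken from a standard t table rather than computed here.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t, df

# Made-up measurements for two groups
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t_stat, df = pooled_t(group_a, group_b)

CRITICAL_T = 2.306   # assumption: table value for df = 8, two-tailed alpha = 0.05
significant = abs(t_stat) > CRITICAL_T
```

With three or more groups, repeating pairwise t-tests inflates the chance of a false positive, which is the usual reason for switching to ANOVA.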
c) What do you mean by kurtosis?
Ans→ Kurtosis is a statistical measure that describes the tailedness and peak of a distribution.
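One common way to compute kurtosis is from the second and fourth central moments. The sketch below uses the population (moment-based) formula, which is an assumption about which estimator is intended; the data are made up.

```python
def kurtosis(data):
    """Population kurtosis: fourth central moment divided by squared variance."""
    n = len(data)
    mean_value = sum(data) / n
    m2 = sum((x - mean_value) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean_value) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

data = [1, 2, 3, 4, 5]          # flat, evenly spread values
k = kurtosis(data)
excess_k = k - 3                # a normal distribution has kurtosis 3, excess 0
```

A negative excess kurtosis, as here, indicates lighter tails and a flatter peak than a normal distribution.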
d) What are the scopes of Bioinformatics?
Ans→ Scopes of Bioinformatics are:
- Genomics: Study of genomes and genetic variations.
- Proteomics: Analysis of protein structures, functions and interactions.
- Pharmacogenomics: Personalized medicine based on genetic data.
- Bioinformatics databases: Development and management of biological data.
e) What is probability and its importance?
Ans→ Probability is the chance of an event occurring, expressed as a number between 0 and 1.
Importance: It helps to make predictions and informed decisions in various fields.
f) What are random variables? Name the different types of random variables.
Ans→ Random variables are variables whose values depend on the outcomes of a random event or experiment.
Two types: 1. Discrete random variables 2. Continuous random variables.
g) What is Entrez? Which organization developed and maintains the database?
Ans→ Entrez is a molecular biology database system that provides integrated access to nucleotide and protein sequence data.
NCBI developed and maintains the database.
h) Write about the rule of thumb for skewness.
Ans→ Rule of thumb for skewness:
- If skewness is less than -1 or greater than +1, the distribution is highly skewed.
- If skewness is between -1 and -0.5 or between 0.5 and 1, the distribution is moderately skewed.
- If skewness is between -0.5 and 0.5 the distribution is approximately symmetric.
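The rule can be sketched in Python using the thresholds -1, -0.5, 0.5 and 1. The moment-based skewness formula, the function names and the handling of values that fall exactly on a threshold are assumptions made for illustration.

```python
def skewness(data):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(data)
    mean_value = sum(data) / n
    m2 = sum((x - mean_value) ** 2 for x in data) / n
    m3 = sum((x - mean_value) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def classify_skewness(g):
    """Apply the rule of thumb to a skewness value g."""
    if g < -1 or g > 1:
        return "highly skewed"
    if g < -0.5 or g > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

symmetric_label = classify_skewness(skewness([1, 2, 3, 4, 5]))
long_tail_label = classify_skewness(skewness([1, 1, 1, 2, 10]))
```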
Biostatistics and Bioinformatics 2022
1.a) What do you mean by an “outlier”?
Ans→ Outlier refers to a data point that differs significantly from other observations in a dataset.
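One widely used convention for flagging such points is the 1.5 x IQR rule. This rule is an added assumption (the answer above does not name a method), sketched here with Python's statistics.quantiles (Python 3.8 or later) on made-up data.

```python
import statistics

def iqr_outliers(data):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # quartile cut points
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

# Made-up measurements with one value far from the rest
measurements = [10, 12, 11, 13, 12, 11, 50]
outliers = iqr_outliers(measurements)
```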
b) Define negative correlation.
Ans→ Negative correlation is a statistical relationship in which one variable decreases as the other variable increases.
c) What are secondary databases?
Ans→ Secondary databases comprise data derived from the results of analysing primary data.
d) What is the content of PubMed?
Ans→ PubMed is a database of biomedical literature containing abstracts and citations from scientific journals in fields such as medicine, biology and health science.
e) What do you understand by the term ‘positive correlation’ ?
Ans→ Positive correlation refers to a relationship between two variables where they both tend to increase together.
f) Name two primary sequence databases.
Ans→ GenBank and EMBL.
g) Mention one importance of the t-test.
Ans→ It helps to determine if there is a significant difference between the means of two groups.
h) Write one limitation of Bioinformatics.
Ans→ One limitation of Bioinformatics is the quality and completeness of biological data where inaccurate or poorly annotated data can lead to erroneous conclusions.
i) What is the ideal range of coefficient of correlation?
Ans→ -1 to +1
j) Primary protein structure database is available in which type?
Ans→ Sequence database such as – PIR (Protein Information Resource), PDB (Protein Data Bank)
k) What is the search field for keyword according to GenBank?
Ans→ Sequence titles, accession numbers, gene names, organism names and other associated annotations.
l) When mode is greater than mean or median, the skewness is known to be ________.
Ans→ Negative (a negatively skewed frequency distribution).
m) How could you define bimodal distribution?
Ans→ Bimodal distribution is a statistical distribution characterized by two distinct peaks or modes.
n) Explain z-score.
Ans→ z-score represents the number of standard deviations of a data point from the mean of a dataset.
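A minimal sketch of the calculation, z = (x - mean) / standard deviation, using Python's statistics module. The data are made up and the population standard deviation is assumed.

```python
import statistics

def z_score(x, data):
    """Number of population standard deviations that x lies from the mean of data."""
    mean_value = statistics.mean(data)
    std_dev = statistics.pstdev(data)
    return (x - mean_value) / std_dev

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, population standard deviation 2
z_high = z_score(9, scores)          # two standard deviations above the mean
z_mean = z_score(5, scores)          # exactly at the mean
```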
o) State the “Sum Rule” of probability.
Ans→ The probability that one or the other of two mutually exclusive events occurs is the sum of their individual probabilities.