
Statistics for Data Science

Statistics and Visualizations: Topics we have to cover!


What is Statistics?

Statistics is the science of collecting, organizing, and analyzing data.

What is a Population?

A population is the complete dataset, that is, the entire set of observations we are interested in.

What is a Sample?

A sample is a subset of the population.

What are the sampling techniques?
1. Simple Random Sampling: Every member of the population (N) has an equal chance of being selected for the sample (n).
2. Stratified Random Sampling: The population (N) is split into non-overlapping groups, called strata, and a random sample is drawn from each stratum.
3. Systematic Random Sampling: Members are selected from a numbered population at a fixed interval, starting from a randomly chosen point.
4. Convenience Sampling: A non-probability sampling method where units are selected for inclusion in the sample because they are the easiest for the researcher to access.
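
The three random sampling techniques above can be sketched in a few lines of Python. This is only an illustrative sketch: the numbered population of 100, the two strata, and the helper names (`simple_random_sample`, `stratified_sample`, `systematic_sample`) are assumptions made up for this example. Convenience sampling is left out because it involves no random selection.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every member has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample from each non-overlapping group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, step, seed=0):
    """Pick a random starting point, then take every `step`-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical numbered population of N = 100 members.
population = list(range(1, 101))
strata = {"low": population[:50], "high": population[50:]}

srs = simple_random_sample(population, 10)
strat = stratified_sample(strata, 5)
syst = systematic_sample(population, 10)

print(srs)
print(strat)
print(syst)
```

A fixed seed is used only so that the output is repeatable; drop it to get a fresh sample each run.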

What is a variable?
A variable is a property that can take on different values. We have two types of variables: quantitative and qualitative.

What are the four types of measurement scales for variables?
1. Nominal
2. Ordinal
3. Interval
4. Ratio

What is Central Tendency? What are its measures?

Central tendency refers to the measures used to determine the center of the distribution of the data. The common measures are the mean, the median, and the mode.

What is a mean?

Average value of the data.

What is a median?

Middle value of the ordered data.

What is a mode?

Most frequent value in the data.
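
As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The small dataset here is made up for the example.

```python
import statistics

# Hypothetical data, chosen so the three measures differ.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle of the sorted data (average of 3 and 5)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```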

What is Descriptive Statistics?

It consists of organizing and summarizing the data.

What is Inferential Statistics?

It is a technique in which we use the data we have measured (a sample) to draw conclusions about the population.

What is a Range?

The difference between the largest and the smallest values in the data.
Range = Max - Min.

What is Quartile?

Quartiles are the three values (Q1, Q2, Q3) that split sorted data into four parts, each with an equal number of observations.


What is an Interquartile Range(IQR)?

The interquartile range is the difference between upper and lower quartiles.

IQR = Q3 - Q1.
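
A minimal sketch of the range, the quartiles, and the IQR in Python, using `statistics.quantiles` from the standard library. The dataset is hypothetical. Note that there are several conventions for computing quartiles; `statistics.quantiles` uses its "exclusive" method by default, so other tools may give slightly different Q1 and Q3 values for the same data.

```python
import statistics

# Hypothetical data, already sorted for readability.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

data_range = max(data) - min(data)          # Range = Max - Min

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                               # IQR = Q3 - Q1

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

Q2 is the same as the median of the data.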


What is an Outlier?

It is a value that 'lies outside' most of the other values in a set of data. For example, in the scores 25, 29, 3, 32, 85, 33, 27, 28, both 3 and 85 are "outliers".
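
One common way to flag outliers (an assumption here, since the text above does not name a rule) is the 1.5 × IQR rule: any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is treated as an outlier. A sketch applied to the scores from the example, with `iqr_outliers` as a made-up helper name:

```python
import statistics

scores = [25, 29, 3, 32, 85, 33, 27, 28]

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sorted(v for v in values if v < lower or v > upper)

outliers = iqr_outliers(scores)
print(outliers)  # [3, 85]
```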


What is Variance?

The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells us the degree of spread in our data set.


What is Standard Deviation?

A standard deviation is a measure of how dispersed the data is in relation to the mean. Low standard deviation means data are clustered around the mean, and high standard deviation indicates data are more spread out.



What is the difference between Variance and Standard Deviation?
Variance and standard deviation are two important measures of spread in statistics. Variance is the average of the squared deviations from the mean, whereas standard deviation is the square root of the variance. The basic difference between them is in their units: variance is expressed in squared units, while standard deviation is in the same units as the data.
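
To make the relationship concrete, here is a small sketch that computes both by hand and checks the result against the standard `statistics` module. The data is hypothetical, and the formulas shown are the population versions (dividing by n); the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead.

```python
import math
import statistics

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Variance: average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance (same units as the data).
std_dev = math.sqrt(variance)

print(variance, std_dev)          # 4.0 2.0
print(statistics.pvariance(data)) # same value from the library
```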

What are the types of data?

We have two types of data: numerical (quantitative) and categorical (qualitative). Numerical data can be further divided into discrete and continuous data.

Concept of Probability:


What is an Experiment?
An experiment (random experiment) is a process whose outcome cannot be predicted with certainty.

What is Sample Space?
The sample space is the set of all possible outcomes of a random experiment.

What is an event?

An event is an outcome, or a set of outcomes, of an experiment.


What is Probability?

A measure of the likelihood of an event, expressed as a number between 0 and 1.
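
A small sketch that ties sample space, event, and probability together, using the roll of a fair six-sided die as a hypothetical experiment. It assumes all outcomes are equally likely, so the probability of an event is the number of favourable outcomes divided by the size of the sample space.

```python
from fractions import Fraction

# Sample space: all possible outcomes of rolling one die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event: "the roll is even" is a subset of the sample space.
event_even = {x for x in sample_space if x % 2 == 0}

def probability(event, space):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

p_even = probability(event_even, sample_space)
print(p_even)  # 1/2
```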


Probability vs Statistics?

Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.


What are the types of Probabilities?

The three types of probability used most often are conditional, marginal, and joint probability.

What is a Conditional Probability?

The likelihood of an event or outcome occurring, given that another event or outcome has already occurred. It is written P(A|B).


What is a Marginal Probability?

Marginal probability is the probability of an event irrespective of the outcome of another variable.


What is the Joint Probability?
Joint probability is the probability of two events occurring simultaneously.
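
The three kinds of probability can be read off a simple table of counts. The sketch below uses a made-up table of 100 days cross-classified by weather and by whether a train was late; the numbers and names are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical counts: (weather, arrival) -> number of days.
counts = {
    ("rain", "late"): 15,
    ("rain", "on_time"): 25,
    ("dry", "late"): 10,
    ("dry", "on_time"): 50,
}
total = sum(counts.values())

# Joint probability: both events happen together, P(rain and late).
p_rain_and_late = Fraction(counts[("rain", "late")], total)

# Marginal probability: P(rain), ignoring the arrival outcome.
p_rain = Fraction(
    sum(c for (weather, _), c in counts.items() if weather == "rain"), total
)

# Conditional probability: P(late | rain) = P(rain and late) / P(rain).
p_late_given_rain = p_rain_and_late / p_rain

print(p_rain_and_late, p_rain, p_late_given_rain)  # 3/20 2/5 3/8
```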

What is Bayes Theorem?

Bayes' Theorem allows you to find reverse probabilities, that is, P(A|B) from P(B|A), and to revise original probabilities based on new information.
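
In formula form, Bayes' Theorem is P(A|B) = P(B|A) × P(A) / P(B). Below is a rough sketch with a classic style of example, a medical test. All the numbers (prevalence, sensitivity, false positive rate) are hypothetical and chosen only to show the calculation.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Total probability of the evidence B.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs:
p_disease = 0.01            # P(A): prior probability of having the disease
p_pos_given_disease = 0.95  # P(B|A): test is positive if diseased
p_pos_given_healthy = 0.05  # P(B|not A): test is positive if healthy

posterior = bayes_posterior(p_disease, p_pos_given_disease, p_pos_given_healthy)
print(round(posterior, 3))  # about 0.161
```

Even with a fairly accurate test, the revised probability stays low because the prior probability is small.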


What is a random variable?

In probability, a real-valued function, defined over the sample space of a random experiment, is called a random variable.

What are the types of Distributions?

There are two main types of distributions: discrete distributions and continuous distributions.

What is Discrete Distribution?

A discrete probability distribution models the probability of each outcome of a discrete random variable. It is used when the random variable can take only a countable number of distinct values.

It is described by a probability mass function (PMF).
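
As an example of a probability mass function, the sketch below builds the distribution of the sum of two fair dice by counting outcomes. The dice experiment is an assumption chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Random variable: the sum of the two faces.
counts = Counter(a + b for a, b in outcomes)

# PMF: probability of each possible value of the random variable.
pmf = {total: Fraction(c, len(outcomes)) for total, c in sorted(counts.items())}

for total, p in pmf.items():
    print(total, p)
```

The probabilities in a PMF always add up to 1.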


What is Continuous Distribution?

A continuous distribution is one in which the variable can take on any value within a specified range (which may be infinite). It is described by a probability density function (PDF).


What is Normal Distribution?


What is Bernoulli Distribution?


What is Poisson Distribution?


What is Binomial Distribution?



What is Skewness?

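A measure of the asymmetry of the shape of the curve: whether the distribution has a longer tail on the left or on the right.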



What is Kurtosis?

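A measure of how heavy the tails of the distribution are, often loosely described as the peakedness (height) of the curve.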
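
One common way to put numbers on skewness and kurtosis is through standardized central moments. The sketch below uses the simple population (moment-based) definitions; statistical packages often apply bias corrections or report "excess" kurtosis (kurtosis minus 3), so their values can differ. The helper names and datasets are made up for the example.

```python
def central_moment(values, k):
    """Average of (x - mean)**k over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Third standardized moment: 0 for symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth standardized moment: about 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # long tail on the right

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive
print(kurtosis(symmetric))
```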

What is Z Distribution?

The Z-distribution is the standard normal distribution, that is, a normal distribution with mean 0 and standard deviation 1.
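
Any normally distributed value can be moved onto the Z-distribution with the z-score, z = (x - μ) / σ. A minimal sketch using `statistics.NormalDist` from the standard library; the exam score, mean, and standard deviation are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical exam: mean 70, standard deviation 10, a score of 85.
z = z_score(85, 70, 10)

# NormalDist() with no arguments is the standard normal (mean 0, sd 1).
p_below = NormalDist().cdf(z)

print(z)                  # 1.5
print(round(p_below, 4))  # share of scores expected below 85
```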

What is the Central Limit Theorem?

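Suppose we have a population with mean μ and standard deviation σ. If we take sufficiently large random samples from the population with replacement, so that the observations are independent and identically distributed (i.i.d.), then the distribution of the sample means will be approximately normal, even if the population itself is not normally distributed.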
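
A short simulation can make the theorem easier to believe. The sketch below draws samples from an exponential distribution (heavily skewed, with mean 1 and standard deviation 1) and looks at the sample means. The choice of distribution, the sample size of 50, and the 2000 repetitions are all assumptions for this illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from a skewed (exponential) population."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

print("mean of sample means:", round(mean_of_means, 3))   # close to 1
print("sd of sample means:  ", round(sd_of_means, 3))     # close to 1/sqrt(50)
print("theory, sigma/sqrt(n):", round(1 / n ** 0.5, 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve, even though the population is far from normal.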

What is Confidence Interval?

A range of values, computed from sample data, that is expected to contain the true population parameter with a stated level of confidence.


What is Confidence Level?
Confidence Level: The percentage of all possible samples that are expected to include the true population parameter.
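
A sketch of a confidence interval for a population mean, using the normal approximation: sample mean ± z × (s / √n), where z comes from the chosen confidence level. This assumes a reasonably large sample; for small samples a t-distribution is normally used instead. The function name and the simulated sample are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # For level=0.95 this is the 97.5th percentile, about 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * std_err
    return mean - margin, mean + margin

# Hypothetical sample: 200 draws from a population with mean 50, sd 10.
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(200)]

lower, upper = mean_confidence_interval(sample, level=0.95)
print(round(lower, 2), round(upper, 2))
```

A higher confidence level (say 99%) gives a wider interval for the same data.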

What is Hypothesis Testing?

A hypothesis is a statement or claim that we want to test. Hypothesis testing uses sample data to decide whether the evidence supports that claim.

What are the types of Hypotheses?

We have two types of hypotheses: the null hypothesis and the alternative hypothesis.

What is Null Hypothesis?

The null hypothesis states that there is no significant difference, for example between the population means being compared.

What is Alternative Hypothesis?

The alternative hypothesis states that there is a significant difference, for example between the population means being compared.

What is a Significance Level?

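The significance level (α) is the probability of rejecting the null hypothesis when it is actually true. It is the threshold for deciding that a result is unlikely to have occurred by chance; a common choice is α = 0.05.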


What is Margin of Error?

Margin of Error: The maximum distance the true population mean is expected to be from the sample estimate, at a given confidence level.

What is a Critical Region?

A critical region, also known as the rejection region, is a set of values for the test statistic for which the null hypothesis is rejected.
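
As a rough illustration, here is how the critical region could be found for a z-test in which extreme values in either direction count against the null hypothesis. The choice of a z-test, the significance level of 0.05, and the helper names are assumptions for this sketch.

```python
from statistics import NormalDist

def critical_value(alpha=0.05):
    """Cut-off z such that P(|Z| > z) = alpha under the null hypothesis."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_critical_region(z_stat, alpha=0.05):
    """True if the test statistic falls in the rejection region."""
    return abs(z_stat) > critical_value(alpha)

print(round(critical_value(0.05), 2))  # about 1.96
print(in_critical_region(2.5))         # True: reject the null hypothesis
print(in_critical_region(1.0))         # False: fail to reject
```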

What is a One-tailed test?



What is a Two-tailed test?



What is a Degree of Freedom?



What is Chi-square Distribution?




What is the Chi-square test  and Use Cases?



What is F Distribution?





What is the F-test and Use Cases?

