Introduction

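  • The Z distribution is the standard normal distribution, with mean 0 and standard deviation 1.
  • Quantiles split the distribution into intervals based on probability mass: the $p$-quantile is the value below which a fraction $p$ of the mass lies.
  • SEM (standard error of the mean) is the standard deviation of the sample mean, estimated as $s / \sqrt{n}$. It is not the same as the variance of the sample.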
  • About 95% of the normal distribution lies within 2$\sigma$ (two standard deviations) of the mean; more precisely, within 1.96$\sigma$.
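  • 0.025 quantile: for the standard normal this is about $-1.96$, so 95% of the mass lies between the 0.025 and 0.975 quantiles.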
  • The median is the 50% quantile.
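
The quantile and SEM statements above can be checked numerically. The following is a minimal sketch in base R; the sample `x` is made up purely for illustration and is not data from the lecture.

```r
# Standard normal (Z) distribution: mean 0, standard deviation 1.
z_low  <- qnorm(0.025)   # 0.025 quantile, about -1.96
z_high <- qnorm(0.975)   # 0.975 quantile, about +1.96
z_med  <- qnorm(0.5)     # median = 50% quantile

# Probability mass within 2 standard deviations of the mean (about 0.954).
within_2sd <- pnorm(2) - pnorm(-2)

# SEM for a hypothetical sample (values are an assumption, for illustration).
x   <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.6)
sem <- sd(x) / sqrt(length(x))   # standard deviation of the sample mean
```

Note that `sem` shrinks as the sample grows, while `var(x)` does not, which is one way to see that the two are different quantities.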

Lecture 2

  • $\mu$ and $\sigma$ are usually unknown. That's why we use Greek letters: they are for the gods to know.

  • $E(X)$ is the expected value, which we estimate with the sample mean. For $m$ samples, each weighted by $f_i = 1/m$: $E(X) = \sum_{i=1}^{m} f_i X_i = \frac{1}{m} \sum_{i=1}^{m} X_i$

  • The variance is defined as $\sigma^2 = E\big[(X - \mu)^2\big]$. From $m$ samples it is estimated by the sample variance $S^2 = \frac{1}{m-1} \sum_{i=1}^{m} (X_i - \bar{X})^2$, where $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$ is the sample mean.

  • In R:

    • dnorm : the Probability Density Function (PDF).
    • pnorm : the Cumulative Distribution Function (CDF).
    • qnorm : the inverse CDF (quantile function). Get $x$ from $p = P(X \le x)$.
  • For the CLT, we need independent and identically distributed (iid) random variables. But for large $n$, with finite variance for each of the variables, the identically distributed requirement can be relaxed.

  • For small sample sizes, we use the quantiles of the t-distribution instead of the z-distribution.
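
As a rough illustration of the points in this lecture, the sketch below computes the sample mean and variance by hand, exercises `dnorm`, `pnorm` and `qnorm`, and compares t and z quantiles. The simulated sample, the seed and all variable names are assumptions made for the example.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(10, mean = 5, sd = 2)  # hypothetical sample of m = 10 values
m <- length(x)

# Sample mean and sample variance written out from the formulas.
x_bar <- sum(x) / m
s2    <- sum((x - x_bar)^2) / (m - 1)
# These agree with the built-ins mean(x) and var(x).

# dnorm: density, pnorm: CDF, qnorm: inverse CDF.
dens_at_0 <- dnorm(0)      # equals 1 / sqrt(2 * pi)
p         <- pnorm(1.5)    # P(X <= 1.5)
x_back    <- qnorm(p)      # recovers 1.5

# Two-sided 95% critical values: t (m - 1 degrees of freedom) vs. z.
t_crit <- qt(0.975, df = m - 1)
z_crit <- qnorm(0.975)
# t_crit is larger than z_crit and approaches it as m grows.

# 95% confidence interval for the mean using the t quantile.
ci <- x_bar + c(-1, 1) * t_crit * sqrt(s2 / m)
```

Using `qt` instead of `qnorm` widens the interval for small samples, which reflects the extra uncertainty from estimating the standard deviation.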

Lecture 3

Maximum Likelihood Estimation

  • We consider the exponential distribution: $f(x, \lambda) = \lambda e^{-\lambda x}$

  • Maximum Likelihood Estimation finds the parameter value (here $\lambda$) under which the observed data are most probable. [Curve Fitting]
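
For the exponential distribution, the log-likelihood of $n$ observations is $n \log \lambda - \lambda \sum_i x_i$, and setting its derivative to zero gives $\hat{\lambda} = 1 / \bar{x}$. The sketch below compares that closed form with a numerical fit; the simulated data, the "true" rate and the search interval are all assumptions chosen for illustration.

```r
set.seed(42)                       # arbitrary seed
true_rate <- 2                     # hypothetical "unknown" lambda
x <- rexp(500, rate = true_rate)   # simulated data

# Negative log-likelihood of the exponential model:
# log L(lambda) = n * log(lambda) - lambda * sum(x)
neg_log_lik <- function(lambda) {
  -(length(x) * log(lambda) - lambda * sum(x))
}

# Numerical minimisation over an assumed search interval (0.01, 20).
fit <- optimize(neg_log_lik, interval = c(0.01, 20))
lambda_numeric <- fit$minimum

# Closed-form MLE for comparison: 1 / sample mean.
lambda_closed <- 1 / mean(x)
```

Both estimates should agree closely, and both should land near `true_rate` for a sample of this size.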

Hypothesis Testing

  • A $\chi^2$ distribution is defined as the distribution of a sum of squares of $k$ independent standard normal variables $Z_i \sim N(0, 1)$. $k$ is also the degrees of freedom, which is the only parameter that the distribution takes as input: $\chi^2 = \sum_{i=1}^{k} Z_i^2$

  • Chi squared distribution

  • Empirical Cumulative Distribution Graph

  • Difference between one-tailed and two-tailed tests

  • X

    non-parametric

    parametric

    proportions

    Wilcoxon test

    Kruskal-Wallis...

    ANOVA

    t-test

  • Why are proportions a unique data type?

  • Fisher's exact test is mostly used for 2×2 tables (especially with small counts), whereas the $\chi^2$ test can be used for tables with any number of rows and columns.

  • Fisher's exact test: with the table margins fixed, the cell counts follow a hypergeometric distribution under the null hypothesis.

  • In R the functions are fisher.test and chisq.test.

  • Fisher Test: Odds ratio
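
The notes on proportions tests can be tried out on a small table. This is a minimal sketch using `fisher.test` and `chisq.test` from base R; the counts, group labels and seed are made up for illustration, and the last few lines simulate the sum-of-squares definition of the $\chi^2$ distribution.

```r
# Hypothetical 2x2 table of counts (made up for illustration).
tab <- matrix(c(12, 5,
                 3, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("treated", "control"),
                              outcome = c("yes", "no")))

# Fisher's exact test: p-value from the hypergeometric distribution,
# plus a conditional maximum likelihood estimate of the odds ratio.
ft <- fisher.test(tab)
ft$p.value
ft$estimate      # odds ratio estimate

# Sample odds ratio computed directly from the cells, for comparison.
or_sample <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Chi-squared test on the same table (Yates' continuity correction
# is applied by default for 2x2 tables).
ct <- chisq.test(tab)
ct$p.value
ct$parameter     # degrees of freedom: (rows - 1) * (cols - 1)

# One-tailed alternative: is the odds ratio greater than 1?
ft_one <- fisher.test(tab, alternative = "greater")

# Chi-squared as a sum of k squared standard normals: the simulated
# mean should be close to k.
set.seed(7)      # arbitrary seed
k <- 4
sims <- replicate(10000, sum(rnorm(k)^2))
mean(sims)
```

The odds ratio reported by `fisher.test` is a conditional estimate, so it will generally differ somewhat from `or_sample`.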