Statistics
- Should falsify an explatory model, not a null model.
Bayesian Statistics
It is a framework to build statistical models.
It extends proportional logic (true/flase) to continuous plausability.
More complex to obtain analytical results: but easy to compute computationally (MCMC technique).
Models with more ways to realize the data are more plausable.
What is a Random-walk Metropolis ?
Outliers: Robust linear regression
- Instead of discarding outliers, we can do Rank-tests (Mann-Whitney test)
- For outliers: Use distributions with fatter tails. The outliers are then penalised less, and a better linear regression model is achieved which includes outliers.
- Robust linear regression is just a linear regression, but with Student T-distribution.
Pooling of Data
- Complete pooling: ignore group structure and estimate global effects.
- Unpooled data: estimate effect in each group seperately.
- This has low quality of estimates in smaller groups.
- Partial pooling: assume each group can be different, but they are all part of some larger group. There could be deviations from the large group but not arbitrary.
Poisson Regression Coefficients
- Link function for probabilities: log-odds to map probabilities to a linear
$$
- \infty < log \frac {p}{1-p} < \infty $$
- Inverse of log-odds or the intercept:
$$ \frac {1}{1 + e^{\beta_0}} $$
Zero-inflated models
$$ ZIP(k, \lambda, p_0) = \begin{array} { & \begin{array} \ p_{0} + (1-p_0)Poi(0, \lambda) & k=0 \ (1-p_0)Poi(k, \lambda) & k > 0 \end{array} \end{array} $$
We now have 2 generalised linear models:
- One for the probability of zeros (binomial)
- One for the number of eggs (Poisson)
In R:
obs = pm.ZeroInflatedPoisson("eggs", psi=1-p0, theta=mu, observed=data)
Survival Analysis
- Track subject and “death” events.
- Right censoring: the event has not occured at the end of the study.
- Left censoring: TODO
- Traditionally used to study lifespan, but can be used to study other things as well:
- Survival Function:
$$ S(t) = P(T > t) $$
How to included censoring in your model? | Kaplan-Meier
- Includes Right-censoring events.
$$ \hat S = \sum_{t_i < t} \frac {n_i - d_i}{n_i} $$
$n_i$ : number of at-risk individuals at time.
$d_i$ : number of deaths until time $t_i$.
Standard Error formula: $$ SE(S(t)) = S(t) \sqrt{\sum \frac {d_i}{n_i(n_i - d_i)}} $$