🎲 Probability Mass Function (Discrete Data)
Used when outcomes are countable — like coin flips, number of visitors, or email opens.
| Distribution | Variable Type | Shape / Nature | Typical Use Case | Data Science Example |
|---|---|---|---|---|
| Bernoulli | Binary (0/1) | Two outcomes only | Success/Failure | Spam (1) or Not Spam (0) |
| Binomial | Count of successes in n trials | Symmetric for p=0.5 | Series of coin flips | A/B test results |
| Poisson | Count of events in time or space | Skewed right | Number of arrivals | Website hits/hour |
| Geometric | Trials until first success | Exponentially decreasing | Waiting for first event | First customer purchase |
| Negative Binomial | Trials until r-th success | Right-skewed; reduces to geometric when r = 1 | Repeat experiments | Marketing conversions |
| Multinomial | Categorical outcomes (>2) | Multiple discrete bars | Multi-class probabilities | NLP topic modeling |
| Hypergeometric | Sampling without replacement | Finite population | Defects in batch | Quality control |
🧩 PMF Graph Look: Bars or spikes — each bar = probability of one discrete value.
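As a rough illustration of the "one bar per value" idea, the sketch below computes binomial and Poisson PMFs using only Python's standard library. The function names (`binomial_pmf`, `poisson_pmf`) and the numbers (10 fair coin flips, an average of 3 hits per hour) are illustrative assumptions, not values from the tables above.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Assumed example: 10 fair coin flips, one bar per possible number of heads.
coin_bars = [binomial_pmf(k, 10, 0.5) for k in range(11)]
for k, prob in enumerate(coin_bars):
    # Crude text "bar chart": one '#' per percentage point.
    print(f"{k:2d} heads: {prob:.4f} {'#' * round(prob * 100)}")

# The bars of a PMF add up to 1 (up to floating-point rounding).
print("sum of bars:", sum(coin_bars))

# Assumed example: a site averaging 3 hits/hour; chance of exactly 5 hits.
print("P(5 hits):", poisson_pmf(5, 3.0))
```

Each list entry is the height of one bar, so summing the list is the discrete counterpart of "total probability = 1".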
🌊 Probability Density Function (Continuous Data)
Used when values are measurable and continuous, such as height, time, or temperature.
| Distribution | Variable Type | Shape / Nature | Typical Use Case | Data Science Example |
|---|---|---|---|---|
| Normal (Gaussian) | Continuous | Bell curve, symmetric | Natural phenomena | Regression errors, GMM |
| Uniform | Continuous | Flat line | Equal probability | Random sampling, simulation |
| Exponential | Continuous (positive) | Rapid decay | Time between events | Time-to-failure prediction |
| Log-Normal | Continuous (positive) | Right-skewed | Positive-only variables | Income, transaction values |
| Gamma | Continuous (positive) | Right-skewed, flexible | Duration modeling | Reliability analysis |
| Beta | Continuous (0–1) | Bounded (0–1) | Probabilities, ratios | Bayesian modeling |
| Chi-Square | Continuous (positive) | Right-skewed | Variance modeling | Hypothesis testing |
| Weibull | Continuous (positive) | Variable skew | Lifetimes, failure | Survival analysis |
| Pareto | Continuous (positive) | Heavy-tailed | Inequality, extremes | Customer lifetime value |
| Cauchy / t-distribution | Continuous | Heavy tails | Outlier-robust modeling | Bayesian inference |
🧩 PDF Graph Look: Smooth curve — area under curve = total probability = 1.
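To make "area under curve = 1" concrete, here is a minimal sketch that evaluates the normal PDF and integrates it numerically with the trapezoidal rule. The helper names (`normal_pdf`, `area_under`), the integration range, and the step count are assumptions chosen for illustration.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, a: float, b: float, steps: int = 10_000) -> float:
    """Trapezoidal approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Nearly all of the standard normal's mass lies in [-8, 8].
total_area = area_under(normal_pdf, -8.0, 8.0)

# Probability of an interval = area over that interval.
within_one_sigma = area_under(normal_pdf, -1.0, 1.0)

# A density is not a probability: a narrow curve can rise above 1.
narrow_peak = normal_pdf(0.0, sigma=0.1)

print(f"total area       ~ {total_area:.6f}")
print(f"P(-1 <= X <= 1)  ~ {within_one_sigma:.4f}")
print(f"peak, sigma=0.1  ~ {narrow_peak:.3f}")
```

The last value is the practical difference from a PMF: the height of the curve at a point is a density, and only areas over intervals are probabilities.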
💡 Quick Comparison Table
| Category | Used For | Key Examples | Graph Shape |
|---|---|---|---|
| PMF | Discrete probability | Bernoulli, Poisson, Binomial | Spikes / Bars |
| PDF | Continuous probability | Normal, Exponential, Beta | Smooth curve |
🧠 In Data Science Pipelines
| Model Type | Underlying Distribution | Function Used |
|---|---|---|
| Classification (binary/multi-class) | Bernoulli, Multinomial | PMF |
| Regression | Normal (residuals) | PDF |
| Clustering | Gaussian (each cluster) | PDF |
| Anomaly Detection | Gaussian, Pareto | PDF/CDF |
| Reliability / Survival | Weibull, Gamma | PDF + CDF |
| Logistic Regression | Logistic | CDF |
| Bayesian Inference | Beta, Normal | PDF |
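The logistic regression and classification rows can be sketched together: the sigmoid used by logistic regression is the CDF of the standard logistic distribution, and the label it predicts follows a Bernoulli PMF. The weights, bias, and feature values below are hypothetical numbers picked for illustration, not a fitted model.

```python
import math

def logistic_cdf(z: float) -> float:
    """CDF of the standard logistic distribution (the sigmoid function)."""
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_pmf(y: int, p: float) -> float:
    """P(Y = y) for Y ~ Bernoulli(p), with y in {0, 1}."""
    return p if y == 1 else 1.0 - p

# Hypothetical model parameters and one hypothetical email's features.
weights = [0.8, -1.2]
bias = 0.1
features = [2.0, 0.5]

# Linear score, then squash it to a probability with the logistic CDF.
score = bias + sum(w * x for w, x in zip(weights, features))
p_spam = logistic_cdf(score)

# Negative log-likelihood of the observed label under the Bernoulli PMF
# (the per-example log loss used to train binary classifiers).
observed_label = 1
log_loss = -math.log(bernoulli_pmf(observed_label, p_spam))

print(f"score = {score:.2f}, P(spam) = {p_spam:.4f}, log loss = {log_loss:.4f}")
```

In this sketch the CDF turns an unbounded score into a probability, and the PMF scores how well that probability matches the observed 0/1 label.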