Multiple Regression Calculator
Calculate multiple linear regression to model the relationship between one dependent variable and multiple independent variables (2-3 predictors).
What is multiple regression and when should I use it?
Multiple regression models the relationship between one outcome (Y) and multiple predictors (X₁, X₂, ...): Y = a + b₁X₁ + b₂X₂ + ... Use when: (1) Multiple factors affect the outcome, (2) You want to control for confounds, (3) You want to improve prediction accuracy. Example: House price from size, bedrooms, location. Each coefficient shows the effect of one predictor holding the others constant. Benefits: (1) More accurate predictions, (2) Partial effects (isolate each variable), (3) Control of confounding. Better than running several simple regressions, which ignore the relationships among the variables.
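The model above can be sketched as an ordinary least-squares fit. The data below are made up for illustration (salary from education and experience, exactly linear, so the fit recovers the coefficients exactly); numpy is assumed to be available:

```python
import numpy as np

# Illustrative data (assumed, not from the calculator): predict salary
# from years of education and years of experience.
X1 = np.array([12, 16, 14, 18, 12, 16, 20, 14])   # education (years)
X2 = np.array([5, 3, 10, 2, 15, 8, 4, 12])        # experience (years)
Y = 20_000 + 3_000 * X1 + 2_000 * X2              # exactly linear outcome

# Design matrix with an intercept column: Y = a + b1*X1 + b2*X2
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coef
print(round(a), round(b1), round(b2))  # → 20000 3000 2000
```

Here b1 = 3,000 is the partial effect of one extra year of education with experience held constant, exactly the interpretation described above.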
How do I interpret coefficients in multiple regression?
Each coefficient (b) = change in Y for 1-unit increase in that X, HOLDING ALL OTHER Xs CONSTANT (partial effect). Example: Salary = 20,000 + 3,000*Education + 2,000*Experience. Education coefficient: Each extra year of education adds $3,000, assuming same experience. Experience coefficient: Each extra year adds $2,000, assuming same education. This differs from simple regression which ignores confounding. Intercept = Y when all Xs = 0 (often meaningless). Sign: positive coefficient = positive effect, negative = inverse effect.
What is R-squared vs Adjusted R-squared?
R^2: Proportion of variance explained (0-1). Always increases when adding variables, even random ones! ADJUSTED R^2: Penalizes adding weak predictors. Better for comparing models. Formula: R^2_adj = 1 - [(1-R^2)(n-1)/(n-p-1)] where p = number of predictors. Example: 2 predictors, n=50, R^2=0.80 → R^2_adj=0.79. Add a useless 3rd predictor: R^2 creeps up to 0.801 (higher!), but R^2_adj drops to 0.788 (lower, correctly showing the model got worse). Use R^2 to describe a single model. Use adjusted R^2 for model comparison. Rule: R^2_adj much lower than R^2 suggests overfitting.
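The adjustment formula is a one-liner, and plugging in the n=50, two-predictor numbers shows how a barely-better R^2 from an extra predictor can still lower the adjusted value:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Two predictors, n = 50, R^2 = 0.80:
print(round(adjusted_r2(0.80, 50, 2), 3))   # → 0.791

# A useless 3rd predictor nudges R^2 up but pulls adjusted R^2 down:
print(round(adjusted_r2(0.801, 50, 3), 3))  # → 0.788
```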
What is multicollinearity and why is it a problem?
Multicollinearity = predictors highly correlated with each other. Problems: (1) Unstable coefficients (small data changes cause huge coefficient changes), (2) Large standard errors (reduced significance), (3) Wrong signs, (4) Hard to interpret partial effects. Example: Predict weight from height_inches and height_cm (r=1.0) → coefficients meaningless. Detection: VIF (variance inflation factor) >10, correlation matrix, condition number >30. Solutions: (1) Remove redundant variables, (2) Combine correlated variables (average, PCA), (3) Ridge regression. Note: Multicollinearity affects interpretation, not prediction accuracy!
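The VIF check described above can be computed directly: regress each predictor on the others and take 1/(1 - R^2). The simulated data below are an assumption for illustration (one predictor is nearly a copy of another, so its VIF blows up); numpy is assumed:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R^2_j), where R^2_j
    comes from regressing predictor j on all the other predictors."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        yhat = A @ np.linalg.lstsq(A, y, rcond=None)[0]
        r2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)              # independent of x1 → VIF near 1
x3 = x1 + 0.05 * rng.normal(size=100)  # nearly a copy of x1 → VIF >> 10
vifs = vif(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vifs])     # x1 and x3 far above the 10 cutoff
```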
How do I test overall model significance (F-test)?
F-test: Tests if ANY predictors are useful vs all coefficients = 0. H₀: β₁=β₂=...=βₚ=0 (no relationship). F = (R^2/p) / [(1-R^2)/(n-p-1)]. Large F, small p → model significant. Example: 3 predictors, n=50, R^2=0.60 → F = (0.60/3)/(0.40/46) = 23.0, p<0.001 (highly significant model). Compare to t-tests for individual coefficients (can have significant F but no significant individual t). F significant but all t non-significant suggests multicollinearity. Always report F-statistic, df, and p-value.
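The F formula can be checked with the worked numbers (3 predictors, n = 50, R^2 = 0.60):

```python
def f_statistic(r2, n, p):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

print(round(f_statistic(0.60, 50, 3), 1))  # → 23.0
```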
How do I test individual predictor significance (t-tests)?
Each predictor gets a t-test: H₀: βᵢ=0 (no effect after controlling for the others). t = b/SE_b with df=n-p-1. Example: Education coefficient b=3,000, SE=800, n=100, p=3 → t=3.75, df=96, p<0.001 (significant). Confidence interval: b +/- t*SE_b. If CI includes 0 → not significant. Common issue: Significant in simple regression but not multiple (due to confounding or multicollinearity). Or non-significant alone but significant when controlling others (suppressor variable). Report: coefficient, SE, t, p-value, CI.
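A minimal sketch of the worked example (b = 3,000, SE = 800, n = 100, p = 3), assuming scipy is available for the t distribution:

```python
from scipy import stats

b, se, n, p = 3_000, 800, 100, 3
df = n - p - 1                       # 96

t = b / se                           # 3.75
p_value = 2 * stats.t.sf(abs(t), df)   # two-sided p-value
t_crit = stats.t.ppf(0.975, df)        # 95% two-sided critical value
ci = (b - t_crit * se, b + t_crit * se)

print(round(t, 2), p_value < 0.001)  # → 3.75 True
print(round(ci[0]), round(ci[1]))    # CI excludes 0 → significant
```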
How many predictors can I include in my model?
Rule of thumb: Need at least 10-20 observations per predictor. Example: 100 observations → maximum 5-10 predictors. More predictors: Pros: Better R^2, control more confounds. Cons: Overfitting, multicollinearity, harder to interpret, need more data. Too many predictors (p ≈ n): Perfect fit on training data but terrible predictions on new data. Solutions: (1) Variable selection (stepwise, LASSO), (2) Domain knowledge (include only theory-based predictors), (3) Cross-validation, (4) Regularization (ridge/LASSO). Quality over quantity - 3 good predictors beat 10 mediocre ones!
What is the difference between standardized and unstandardized coefficients?
UNSTANDARDIZED (b): Original units. "1-unit increase in X increases Y by b." Compare within same units only. Example: b_height=2 means 1 inch adds 2 lbs. STANDARDIZED (β, beta): In standard deviation units. "1 SD increase in X increases Y by β SDs." Allows comparing importance across different units. Example: β_height=0.6, β_age=0.3 → height twice as important. Calculate: β = b * (SD_x/SD_y). Use unstandardized for: interpretation, prediction. Use standardized for: comparing relative importance. Report both in publications.
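The conversion is simple arithmetic; the SD values below are assumed for illustration (b = 2 lbs per inch of height, SD_height = 3 in, SD_weight = 10 lbs):

```python
def standardized_beta(b, sd_x, sd_y):
    """beta = b * (SD_x / SD_y): the effect in standard-deviation units."""
    return b * sd_x / sd_y

print(round(standardized_beta(2, 3, 10), 1))  # → 0.6
```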
How do I select the best subset of predictors?
Methods: (1) FORWARD: Start empty, add best predictor, repeat. (2) BACKWARD: Start with all, remove worst, repeat. (3) STEPWISE: Combination of forward/backward. (4) ALL SUBSETS: Try every combination, pick best. (5) THEORY-DRIVEN: Include based on domain knowledge (best!). Criteria: Adjusted R^2, AIC (lower better), BIC, cross-validation error. Example: 5 potential predictors → backward elimination removes 2 non-significant → final model with 3 predictors, R^2_adj=0.75. Caution: Stepwise prone to overfitting. Always validate on holdout data. Theory beats algorithms!
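The all-subsets approach with an adjusted-R^2 criterion can be sketched in a few lines. The simulated data below are an assumption for illustration (only two of four candidate predictors carry signal); numpy is assumed:

```python
import itertools
import numpy as np

def adj_r2(X, y):
    """Fit OLS of y on X (with intercept) and return adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    yhat = A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
# Only predictors 0 and 2 actually matter in this simulated data:
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=60)

# Try every non-empty subset of the 4 candidates, keep the best:
best = max(
    (c for r in range(1, 5) for c in itertools.combinations(range(4), r)),
    key=lambda c: adj_r2(X[:, c], y),
)
print(best)  # the subset with the highest adjusted R^2
```

The winning subset contains the two real predictors; as the answer above warns, automated selection like this should still be validated on holdout data.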
What assumptions does multiple regression require?
LINEAR: Y linear in parameters (can have X^2). INDEPENDENCE: Observations independent. HOMOSCEDASTICITY: Constant error variance. NORMALITY: Residuals normally distributed (for inference). NO MULTICOLLINEARITY: Predictors not too correlated. NO OUTLIERS: Influential points distort results. Check: (1) Scatter plots (linearity), (2) Residual plots (patterns, heteroscedasticity), (3) Q-Q plot (normality), (4) VIF (multicollinearity <10), (5) Cook's distance (outliers <1). Violations: Transform variables, robust regression, bootstrap, remove outliers (with justification).
How do I interpret and use dummy variables for categorical predictors?
Dummy variables encode categories as 0/1. Need k-1 dummies for k categories (avoid dummy trap). Example: Color (Red/Blue/Green) → Dummy_Blue (0/1), Dummy_Green (0/1), Red=reference (both=0). Coefficient = difference from reference. Salary = 50,000 + 5,000*Blue + 8,000*Green. Blue workers earn $5,000 more than Red. Green earn $8,000 more than Red. Blue vs Green difference = 8,000-5,000 = $3,000. Test overall effect: F-test on all category dummies together. Interactions: Can include Dummy*Continuous (different slopes per group).
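The salary example above can be reproduced with two 0/1 dummies (Red as reference); the toy data below are assumed so the group means come out to the stated figures, and numpy is assumed:

```python
import numpy as np

# Toy data: two workers per color group (Red is the reference category).
colors = ["Red", "Red", "Blue", "Blue", "Green", "Green"]
salary = np.array([50_000, 50_000, 55_000, 55_000, 58_000, 58_000])

# k - 1 = 2 dummies for k = 3 categories; Red = both dummies 0.
blue = np.array([1.0 if c == "Blue" else 0.0 for c in colors])
green = np.array([1.0 if c == "Green" else 0.0 for c in colors])

A = np.column_stack([np.ones(len(salary)), blue, green])
intercept, b_blue, b_green = np.linalg.lstsq(A, salary, rcond=None)[0]
print(round(intercept), round(b_blue), round(b_green))  # → 50000 5000 8000
```

Each dummy coefficient is that group's difference from the Red reference mean, exactly as interpreted above.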
What is the difference between prediction interval and confidence interval?
CONFIDENCE INTERVAL: For average Y at given Xs. Narrower. "We're 95% confident the average house price for 2000 sqft is $180k-$220k." PREDICTION INTERVAL: For individual Y. Wider (includes individual variation). "A specific 2000 sqft house will likely cost $150k-$250k." Formula: PI = ŷ +/- t*SE*sqrt(1 + 1/n + leverage), CI uses sqrt(1/n + leverage). Prediction interval always wider than confidence interval. Use CI for: estimating population means. Use PI for: predicting individual cases (more realistic for most applications).
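The "+1 inside the square root" is the whole difference between the two intervals. A quick numeric check of the width multipliers, with n and leverage values assumed for illustration:

```python
import math

# Width multipliers from the formulas above (assumed n = 50, leverage h = 0.04):
n, h = 50, 0.04
se_ci = math.sqrt(1 / n + h)       # confidence interval for the mean response
se_pi = math.sqrt(1 + 1 / n + h)   # prediction interval for an individual Y

print(round(se_ci, 3), round(se_pi, 3))  # the PI factor is always the larger
```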