Final answer:
The correct statement is that linear regression should be used to predict continuous outcomes and logistic regression to predict binary outcomes. Logistic regression can also be extended to predict more than two outcomes, and linear regression assumes that the errors, not the dependent variable, are normally distributed.
Step-by-step explanation:
The statement that is true among the provided options is: a. We should use linear regression to predict continuous outcomes and logistic regression to predict binary outcomes. Linear regression is employed when the dependent variable is continuous, and it models the relationship between the dependent and independent variables through a linear equation. Logistic regression, on the other hand, is used for binary or categorical dependent variables and can indeed be adapted to predict more than two outcomes (multinomial logistic regression).
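To make the contrast concrete, here is a minimal sketch of how the two models produce predictions. The coefficients and variable names are hypothetical and chosen only for illustration; nothing is fitted to real data.

```python
import math

def linear_predict(x, intercept, slope):
    """Linear regression prediction: an unbounded continuous value."""
    return intercept + slope * x

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict_proba(x, intercept, slope):
    """Binary logistic regression: probability that the outcome is 1."""
    return sigmoid(intercept + slope * x)

def softmax(scores):
    """Multinomial extension: turns one score per class into probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients, chosen only for illustration.
price = linear_predict(3.0, intercept=10.0, slope=2.5)          # continuous outcome
p_yes = logistic_predict_proba(3.0, intercept=-4.0, slope=1.5)  # P(outcome = 1)
label = 1 if p_yes >= 0.5 else 0                                # binary decision
class_probs = softmax([0.2, 1.1, -0.3])                         # three outcomes
```

The linear model returns a number on the scale of the outcome itself, while the logistic model returns a probability that is then thresholded into a class. The softmax step is one common way to handle more than two outcomes.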
When conducting a linear regression analysis, the following are typically assessed:
- Independent and dependent variables: The independent variable is the predictor (in an experiment, the factor that you manipulate), while the dependent variable is the outcome you measure.
- Scatter plot: This graphically shows the relationship between the two variables, where each point represents an observation.
- Line of best fit and correlation coefficient: Regression analysis calculates these to summarize the relationship between the variables; the line of best fit minimizes the sum of squared vertical distances (residuals) between itself and the points, while the correlation coefficient r, which ranges from -1 to 1, measures the strength and direction of this linear relationship.
- Interpretation of the correlation coefficient: The magnitude of r indicates how strongly the two variables are linearly related, and its sign gives the direction. A significance test indicates whether the observed correlation can be distinguished from zero.
- Linear relationship analysis: Based on the scatter plot and the statistical measures, we can judge whether a linear relationship is plausible.
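As a rough illustration of the correlation coefficient described in the list above, the sketch below computes Pearson's r by hand for two small made-up data sets. The names `hours`, `score_up`, and `score_down` are hypothetical, and the numbers are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and the variation of each on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: one outcome rises with x, the other falls.
hours = [1, 2, 3, 4, 5]
score_up = [2.1, 3.9, 6.2, 7.8, 10.1]
score_down = [10.1, 7.8, 6.2, 3.9, 2.1]

r_up = pearson_r(hours, score_up)      # close to +1: strong positive relationship
r_down = pearson_r(hours, score_down)  # close to -1: strong negative relationship
```

Both values have nearly the same magnitude, so the relationships are equally strong; only the sign, and therefore the direction, differs.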
The slope of the regression line indicates the expected change in the dependent variable for a one-unit change in the independent variable, while the y-intercept is the predicted value of the dependent variable when the independent variable is zero. A goodness-of-fit measure such as R-squared, the proportion of the variance in the dependent variable that the model explains, indicates how well the regression line fits the data.
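The slope, intercept, and R-squared can be computed directly from their least-squares definitions. The following is a minimal sketch on invented data, with hypothetical helper names `fit_line` and `r_squared`; a statistics library would normally do this for you.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Invented data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
predicted_at_6 = intercept + slope * 6  # prediction for a new x value
```

Here the slope is about 1.99, so each extra unit of x is associated with roughly a two-unit increase in y, and the intercept of about 0.05 is the predicted y at x = 0.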
Regarding errors, linear regression assumes that the residuals (errors) are normally distributed with a mean of zero; it does not require the dependent variable itself to be normally distributed. Expecting the dependent variable to be normal is a common misunderstanding.
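A small simulation can make this distinction visible. In the sketch below, which uses invented data and a fixed random seed, the predictor takes only two values, so the dependent variable falls into two well-separated clusters and is clearly not normal, yet the residuals from the fitted line are centered on zero.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Two clusters of x values make y bimodal, even though the noise is normal.
xs = [0.0] * 50 + [10.0] * 50
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]

# Ordinary least-squares fit with one predictor.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / n  # zero up to floating-point rounding

# Count the y values that fall between the two clusters.
ys_in_gap = sum(1 for y in ys if 8.0 < y < 14.0)
```

The empty gap in the middle of the y values shows that the dependent variable is far from bell-shaped, while the residuals average to zero. Checking whether the residuals look normal (for example with a histogram or Q-Q plot) is a separate step not shown here.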