Final answer:
If as many dummy variables as there are categories are included, their sum equals exactly one for every observation, because each observation belongs to exactly one category. Dummy variables represent the different categories of a categorical variable; one category is left out as the reference category in regression models to avoid perfect multicollinearity.
Step-by-step explanation:
When we include as many dummy variables as there are categories in a regression model, their sum equals one for every observation. Dummy variables are used in statistical analysis to represent categorical data with two or more categories. Typically, one category is left out as the reference group, and the other categories are compared to it. For example, if a variable has three categories (A, B, C), we create two dummy variables: one for B (equal to 1 if the observation is in category B, 0 otherwise) and one for C (equal to 1 if the observation is in category C, 0 otherwise). The reference category A needs no dummy variable of its own, since it corresponds to the case where both dummies are 0.
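As a minimal sketch of this coding scheme (assuming Python with pandas; the data frame and column names here are hypothetical), `pandas.get_dummies` with `drop_first=True` creates exactly these two dummies and treats A as the reference category:

```python
import pandas as pd

# Hypothetical data: one categorical variable with categories A, B, C.
df = pd.DataFrame({"group": ["A", "B", "C", "B", "A"]})

# drop_first=True omits the first category (A), making it the
# reference group: only dummies for B and C are created.
dummies = pd.get_dummies(df["group"], prefix="group", drop_first=True, dtype=int)
print(dummies)
#    group_B  group_C
# 0        0        0   <- category A: both dummies are 0
# 1        1        0   <- category B
# 2        0        1   <- category C
# 3        1        0   <- category B
# 4        0        0   <- category A
```

Regression software that accepts categorical variables directly (for example, R's `lm` or statsmodels' formula interface) applies this drop-one coding automatically.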
However, if a researcher creates a dummy variable for every category, those dummies cannot all be included in the same regression model (alongside an intercept) without modification, because doing so causes perfect multicollinearity. Multicollinearity arises when one independent variable in a regression model can be linearly predicted from the others; here the dependence is exact, since summing the dummies for any single observation yields one (each observation belongs to exactly one category at a time), which exactly reproduces the intercept column of ones. This situation is often called the dummy variable trap. When running the regression, one therefore excludes one dummy to serve as the baseline; this prevents the perfect multicollinearity that would occur if all were included.
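To see the sum-to-one property concretely, here is a small sketch (again assuming pandas and the same hypothetical data) that creates a dummy for every category and sums across each row:

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "B", "C", "B", "A"]})

# One dummy per category; no reference category is dropped.
all_dummies = pd.get_dummies(df["group"], prefix="group", dtype=int)

# Each observation belongs to exactly one category, so the dummies
# in every row sum to exactly 1.
print(all_dummies.sum(axis=1))  # prints 1 for every row

# This row-wise sum equals the intercept column (all ones), an exact
# linear dependence: the dummy variable trap described above.
```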