Regression and Dummy Variables

A Reason to thank SPSS

A. What is a dummy variable, and why use one?

A dummy variable is a variable for which all cases falling into a specific category assume the value of 1 and all cases not falling into that category assume a value of zero.

In other words, dummy variables are used to examine group differences.

 Thus, for the variable region:

South would be 1 and all other regions would be zero. When we regress attitudes toward abortion on the dummy variable "south", we are trying to measure the difference in attitudes toward abortion between people from the south and everybody else.

The idea is very simple:

  • Transform a categorical/nominal variable into a binary variable.

  • Coding convention: 0 for the value that does not have the characteristic and 1 for the value that has the characteristic.

  • Example: 1 for women and 0 for men for the variable "dumgender" or "female".
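
As a minimal sketch of this coding convention, the Python snippet below turns a list of gender labels into a 0/1 variable. The labels are made up for illustration and are not taken from any dataset.

```python
# Hypothetical gender labels, invented for illustration only.
gender = ["woman", "man", "woman", "woman", "man"]

# 1 for cases that have the characteristic (being a woman), 0 otherwise.
female = [1 if g == "woman" else 0 for g in gender]

print(female)  # [1, 0, 1, 1, 0]
```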

Example:

What impact does being from the South have on one’s attitude toward abortion?

Use NES2000 file for this one.

Let us examine this relationship by converting the nominal/categorical variable “region” into a dummy variable that separates “south” individuals from the rest. Thus, we are going to create a dummy variable coded 1 for south and 0 for non-south.
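
The same recode can also be sketched outside SPSS. The Python function below assumes that South is coded 3 on “region” (the value used in the SPSS steps of section B) and represents a missing value as None. The function name recode_to_dummy and the sample codes are hypothetical.

```python
def recode_to_dummy(codes, target):
    """Return 1 where a code equals target, 0 for any other valid code,
    and None where the original value is missing."""
    dummy = []
    for code in codes:
        if code is None:        # missing stays missing
            dummy.append(None)
        elif code == target:    # has the characteristic
            dummy.append(1)
        else:                   # every other valid value
            dummy.append(0)
    return dummy

# Made-up region codes; 3 is assumed to mean South.
region = [1, 3, 2, None, 3, 4]
dumsouth = recode_to_dummy(region, target=3)
print(dumsouth)  # [0, 1, 0, None, 1, 0]
```

Keeping missing values missing, rather than coding them 0, prevents respondents with no recorded region from being counted as non-Southerners.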

B.  How to create a dummy variable based on a nominal variable?

1-      In SPSS, Transform => Recode => Recode into Different Variables => Type the name of the output variable (e.g., Dumsouth) and click Change.

2-     Click “Old and New Values” => Enter the old value of South (3) and the new value (1) => Add.

3-     Click “All other values” and then make the New Value (zero) => Add.

4-     Click “System- or user-missing” on Old Value and “System-missing” on New Value => Add.

5-     Continue => OK.

 

 

6-     Now look at the last column in your dataset. It is called “dumsouth.” It is coded 1 for south in the region column and coded 0 for anything else.

7-     Now you can regress attitudes toward abortion (the DV) on the south dummy variable “dumsouth” (the IV).

8-    We will end up with a result similar to the following:

C.  Interpretation:

People who belong to the south have attitudes that are statistically significantly different from the attitudes of people who do not belong to the south. To be more specific, people who do not belong to the south show higher support for abortion than people who belong to the south. We infer that from the negative coefficient of the dummy variable for the south, “dumsouth.”

To show that, have a look at the following regression equation:

DV = constant + coefficient * IV

Attitudes toward abortion = 3.089 - .308 * (being from the south)

 

 

 

 

Thus, for a person who is not from the south, the equation will be:

DV = constant + coefficient * IV

Attitudes toward abortion = 3.089 - .308 * (0)

                          = 3.089 - 0

                          = 3.089

Thus, the average non-Southerner has a predicted score of 3.089, or about 3, which corresponds to supporting abortion if there is a clear need (according to the ordinal values of the variable abortion as shown below).

 

 

However, if the respondent were from the south, he/she would be less supportive of abortion according to the same equation:

DV = constant + coefficient * IV

Attitudes toward abortion = 3.089 - .308 * (1)

                          = 2.781
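
The two predictions can be checked with a short Python sketch. The constant 3.089 and the coefficient -.308 are the values quoted in the equations above; the function name predicted_attitude is hypothetical.

```python
CONSTANT = 3.089      # intercept quoted in the text
COEFFICIENT = -0.308  # coefficient of dumsouth quoted in the text

def predicted_attitude(dumsouth):
    """Predicted attitude toward abortion for dumsouth = 0 or 1."""
    return CONSTANT + COEFFICIENT * dumsouth

print(round(predicted_attitude(0), 3))  # 3.089 (not from the south)
print(round(predicted_attitude(1), 3))  # 2.781 (from the south)
```

The gap between the two predictions is exactly the coefficient, which is why the coefficient of a dummy variable is read as a group difference.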

 

Note 1: This is not the final answer. We still need to control for other variables that may intervene in the relationship between “dumsouth” and the dependent variable.

Note 2: There are other techniques for creating (k-1) dummy variables from a k-value nominal variable, for example 3 dummy variables from a 4-value nominal variable. But we will not do that in this class.
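
For curious readers, the (k-1) idea can be sketched in Python, although it is not needed for this class. The category labels, the helper name k_minus_1_dummies, and the choice of "West" as the omitted reference category are assumptions made only for illustration.

```python
def k_minus_1_dummies(values, reference):
    """Build one 0/1 dummy per category except the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical 4-category variable; "West" is the omitted reference group.
region = ["South", "West", "Midwest", "Northeast", "South"]
dummies = k_minus_1_dummies(region, reference="West")

for name, column in dummies.items():
    print(name, column)
```

A case in the reference category scores 0 on all three dummies, so each coefficient would be read as a difference from that group.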

Note 3: The constant (intercept) is the "expected value of Y when X is zero." Traditionally, the coefficient (slope) is the "change in Y for a one-unit change in X." But since X here is a dummy variable, the coefficient is the difference in the dependent variable between cases that have the characteristic of the dummy variable (1) and cases that do not (0).
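
A small Python sketch can make Note 3 concrete: with a single dummy variable as the only predictor, the least-squares constant equals the mean of Y in the 0 group, and the coefficient equals the difference between the two group means. The scores below are invented for illustration and are not NES2000 data; simple_ols is a hypothetical helper.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data: dummy (1 = has the characteristic) and an ordinal score.
dummy = [0, 0, 0, 0, 1, 1, 1, 1]
score = [3, 4, 3, 2, 2, 3, 2, 3]

intercept, slope = simple_ols(dummy, score)
mean_0 = sum(s for d, s in zip(dummy, score) if d == 0) / dummy.count(0)
mean_1 = sum(s for d, s in zip(dummy, score) if d == 1) / dummy.count(1)

# The constant matches the 0-group mean; the slope matches the group difference.
print(round(intercept, 3), round(mean_0, 3))
print(round(slope, 3), round(mean_1 - mean_0, 3))
```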

Moataz's Formula to Interpret a Dummy Variable:

In case of a (statistically significant) positive relationship:

People who have the (1) characteristic of the independent variable tend to score higher on the dependent variable, that is, closer to its MAXIMUM value.

Ex. People who belong to the south tend to be more religious.

Note: This interpretation indicates that there is a group difference between Southern people's attitudes toward abortion and non-Southern people's attitudes toward abortion.

In case of a (statistically significant) negative relationship:

People who have the (1) characteristic of the independent variable tend to score lower on the dependent variable, that is, further from its MAXIMUM value.

Ex. People who belong to the south tend to be less supportive of abortion.

 D. Exercise:

1-      Try to see the impact of the dummy variable “dumgender” (for the impact of being female) on the attitudes toward abortion.

2-     Try to see the impact of the dummy variable “dumrace” (for the impact of being white) on the attitudes toward abortion.

3-     Try to put the three dummy variables for south, female and white together in the same model. Compare the coefficients of the full-effect models (for each variable alone) and the partial effect model (for all of them together). This comparison is our introduction to how to control for the effect of certain variables.

 

The full effect model for being white:

 The full effect model for being female:

The full effect model for being from the south:

The three variables in one model (the partial effects model):

 

A graphical example of the impact of the dummy variable.