Scale and Index Development

A scale is a cluster of items (questions) that taps into a unitary dimension or single domain of behavior, attitudes, or feelings.

They are sometimes called composites, subtests, schedules, or inventories.

Aptitude, attitude, interest, performance, and personality tests are all measuring instruments based on scales.

A scale is always unidimensional, which means it has construct and content validity.

A scale is always at the ordinal or interval level, but it's conventional for researchers to treat scales as interval or higher.

Scales are predictive of outcomes (like behavior, attitudes, or feelings) because they measure underlying traits (like introversion, patience, or verbal ability).

It's probably an overstatement, but scales are primarily used to predict effects, as the following example shows:

 

An Example of a Scale Measuring Introversion:

I blush easily.                    (Strongly Agree .....................Strongly disagree)
At parties, I tend to be a wallflower.                (Strongly Agree .....................Strongly disagree)
Staying home every night is all right with me.           (Strongly Agree .....................Strongly disagree)
I prefer small gatherings to large gatherings.           (Strongly Agree .....................Strongly disagree)
When the phone rings, I usually let it ring at least a couple of times.           (Strongly Agree .....................Strongly disagree)

 

    A great many scales can be found in the literature or in handbooks (Brodsky & Smitherman 1983), and beginning researchers are well-advised to borrow or use an established scale before attempting to create one of their own.

However, most researchers are interested in breaking new ground, and have at least some hunch about what are variously called "tipping points", "the last straw", "going over the edge", or "snapping."

There are four ways to construct scales:

  • Thurstone scales
  • Likert scales
  • Guttman scaling
  • Semantic differential    

In this class, we focus only on Likert and Guttman scales.

Likert scales were developed in 1932 and introduced the five-point bipolar response format most people are familiar with today.

These scales always ask people to indicate how much they agree or disagree, approve or disapprove, or believe something to be true or false.

There's really no wrong way to do a Likert scale; the most important thing is to have at least five response categories (for ordinal-treated-as-interval measurement). Some appropriate examples appear below:

Never     Seldom     Sometimes     Often     Always
Strongly Agree     Agree     About 50/50     Disagree     Strongly Disagree     Don't Know
Strongly Approve     Approve     Need more information     Disapprove     Strongly Disapprove
Strongly Opposed    Definitely Opposed    A bit of both     Definitely Unopposed     Strongly Unopposed

 

    The "Don't Know" in the second example is optional, and some people prefer not to use it since it's an odd response category.

The examples showing "About 50/50", "Need more information", or "A bit of both" are preferable to use.

You can extend the ends of the scale by adding "very" to create 7-point scales, which tend to reach the upper limits of reliability (Nunnally 1978).

 It's best to use as wide a scale as possible since you can always collapse the responses into condensed categories later on for analysis purposes. 
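Collapsing a wide scale after the fact takes only a few lines. Here is a minimal sketch in Python, assuming a 7-point agreement item coded 1 (very strongly disagree) through 7 (very strongly agree). The coding, the cut points, and the names `collapse` and `responses` are illustrative assumptions, not a standard.

```python
from collections import Counter

def collapse(code):
    """Collapse an assumed 7-point agreement code into three categories."""
    if not 1 <= code <= 7:
        raise ValueError(f"expected a code from 1 to 7, got {code}")
    if code <= 3:
        return "disagree"
    if code == 4:
        return "neutral"
    return "agree"

# Hypothetical answers from seven respondents to one item.
responses = [1, 4, 7, 5, 2, 6, 4]
collapsed = [collapse(r) for r in responses]

# Frequency of each condensed category.
counts = Counter(collapsed)
print(counts)
```

Going the other way is not possible: if you only collect three categories, you cannot recover seven later.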

Guttman scaling was developed in the 1940s and is a technique of mixing up the sequence in which questions are asked so that respondents don't see that several questions are related. A lot of irrelevant questions surround the important questions.

The scoring system is based on how closely respondents follow a pattern of ever-hardening attitude toward some topic across the important questions. Let's take the example of attitude toward capital punishment:

 

For each of the following, indicate if you SA, A, 50/50, D, or SD: 
1. Crime is a serious problem in the United States.           (Strongly Agree .....................Strongly disagree)
2. Police should be given more powers.           (Strongly Agree .....................Strongly disagree)
3. More criminals should be given the death penalty.            (Strongly Agree .....................Strongly disagree)
4. The U.S. ought to do something about drug exporting countries.           (Strongly Agree .....................Strongly disagree)
5. The military ought to be used to patrol our streets.            (Strongly Agree .....................Strongly disagree)
6. Inmates on death row ought to be executed quickly.           (Strongly Agree .....................Strongly disagree)
7. Most politicians are too soft on crime.            (Strongly Agree .....................Strongly disagree)
8. Lethal injection is too merciful for those who deserve it.            (Strongly Agree .....................Strongly disagree)
9. Crime is destroying the social fabric of our society.            (Strongly Agree .....................Strongly disagree)
10. They ought to jack up the voltage when they electrocute criminals.           (Strongly Agree .....................Strongly disagree)

    In the above example, items #3, 6, 8, and 10 make up the scale for attitude toward capital punishment.

Everything else is irrelevant. You should see how the relevant items lead progressively to a harder and harsher attitude.

If most of the respondents you study (or the top 27% of them) hold fast to this hierarchical pattern, you've captured a very one-dimensional aspect of your construct. In addition, you can calculate something called the coefficient of reproducibility, which is simply 1 minus the ratio of errors (breaks with the hardened response pattern) to the total number of responses.
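The coefficient of reproducibility can be sketched in Python as follows. This is only one way to count "breaks", and it rests on several assumptions not stated above: each five-point answer has already been dichotomized (1 = endorses the item, 0 = does not), the items are listed from mildest to harshest, and a respondent who endorses s items is expected to endorse the s mildest ones. The function name and the sample data are hypothetical.

```python
def coefficient_of_reproducibility(responses):
    """Return 1 - errors / total responses.

    responses: one row per respondent; each row holds 0/1 endorsements
    with items ordered from mildest to harshest (an assumption).
    """
    errors = 0
    total = 0
    for row in responses:
        score = sum(row)
        # Ideal cumulative pattern for this score: the mildest items first.
        ideal = [1] * score + [0] * (len(row) - score)
        errors += sum(1 for got, want in zip(row, ideal) if got != want)
        total += len(row)
    return 1 - errors / total

# Hypothetical dichotomized answers to items 3, 6, 8, and 10.
answers = [
    [1, 1, 1, 1],  # endorses everything: fits the pattern
    [1, 1, 0, 0],  # stops after the second item: fits the pattern
    [1, 0, 1, 0],  # skips item 6 but endorses item 8: two breaks
    [0, 0, 0, 0],  # endorses nothing: fits the pattern
]
cr = coefficient_of_reproducibility(answers)
print(cr)
```

In this made-up data, 2 of the 16 responses break the pattern, so the coefficient is 1 - 2/16.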

Guttman scaling is very appealing, but it's not all that well-received by the scientific community. 

         

INDEX DEVELOPMENT

    An index is a set of items (questions) that structures or focuses multiple yet distinctly related aspects of a dimension or domain of behavior,  attitudes, or feelings into a single indicator or score. They are sometimes called composites, inventories, tests, or questionnaires.

Like scales, they can measure aptitude, attitude, interest, performance, and personality, but the only kinds of validity they have are convergent (hanging together), content, and face validity.

It is possible to use some statistical techniques (like factor analysis) to give them better construct validity (or factor weights), but it is a mistake to think of indexes as multidimensional (no such word exists) since even the most abstract constructs are assumed to have unidimensional characteristics.

Indexes are sometimes at the ordinal level, but mostly at the interval level.

Indexes can be predictive of outcomes (again, using statistical techniques like regression), but they are designed mainly for exploring the relevant causes or underlying symptoms of traits (like criminality, psychopathy, or alcoholism).

It's probably an overstatement, but indexes are used primarily to collect causes or symptoms, as the following example shows:

 

An Example of an Index Measuring Delinquency:

I have defied a teacher's authority to their face.              (Strongly Agree .....................Strongly disagree)
I have purposely damaged or destroyed public property.           (Strongly Agree .....................Strongly disagree)
I often skip school without a legitimate excuse.           (Strongly Agree .....................Strongly disagree)
I have stolen things worth less than $50.           (Strongly Agree .....................Strongly disagree)
I have stolen things worth more than $50.           (Strongly Agree .....................Strongly disagree)
I use tobacco.           (Strongly Agree .....................Strongly disagree)
I like to fight.           (Strongly Agree .....................Strongly disagree)
I like to gamble.           (Strongly Agree .....................Strongly disagree)
I drink beer, wine, or other alcohol.           (Strongly Agree .....................Strongly disagree)
I use illicit drugs.           (Strongly Agree .....................Strongly disagree)

 

    Indexes are usually administered in the form of surveys or questionnaires. It's only at the time of report writing that you claim  to have developed an index. You'll need an ideal  response rate of 35% on your questionnaire, and at least a 5-point Likert scale  for the response categories. How we create good questionnaires is the subject of another lecture. There are a variety of ways to do surveys. Factor analysis, cluster analysis, or other advanced statistical techniques are typically used for item analysis of surveys.

FACTOR ANALYSIS AND CLUSTER ANALYSIS

    These are advanced methods of data analysis that require special training and proficiency with computerized statistics programs like SPSS. Factor analysis can help develop an index, test the unidimensionality of a scale, assign weights (factor loadings) to items in an index, and statistically reduce a large number of indicators to a smaller set. It works by placing all the numbers in a variance-covariance (or correlation) matrix and then performing multiple iterations (repeats) on this matrix until the most statistically meaningful common denominators can be found. These common denominators may or may not be theoretically significant. If you're lucky, only one factor, or common denominator, will be produced. Ordinarily, factor analysis produces 4-5 such factors, and the researcher then has to justify discarding them in favor of the core set of items for their index or scale.
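To get a feel for the "iterate on the matrix" idea, here is a deliberately simplified Python sketch. It is not what SPSS does in full: it extracts only the first principal component of a correlation matrix by power iteration, with no rotation and no further factors. It also assumes the items are positively correlated, so the all-ones starting vector is not orthogonal to the first component. The function name and the correlation values are hypothetical.

```python
def first_factor(corr, iterations=100):
    """Return (eigenvalue, loadings) of the first principal component.

    corr: a square correlation matrix given as a list of lists.
    """
    k = len(corr)
    vec = [1.0] * k
    for _ in range(iterations):
        # Multiply the matrix by the current vector, then renormalize.
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    # Rayleigh quotient: the eigenvalue that goes with the converged vector.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    # Loadings are the eigenvector scaled by the square root of the eigenvalue.
    loadings = [v * eigenvalue ** 0.5 for v in vec]
    return eigenvalue, loadings

# Hypothetical correlations among three items that all correlate 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
eigenvalue, loadings = first_factor(corr)
share = eigenvalue / len(corr)  # proportion of total variance explained
print(eigenvalue, loadings, share)
```

When the first component carries most of the variance and every item loads highly on it, that is the "lucky" one-factor situation described above.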

    Cluster analysis is a similar technique, but more in keeping with the way reliability coefficients are produced. It involves iterative computer runs on your data matrix that continually re-sort and reclassify your groupings and categories into the most elegant mathematical matrix. The result is a tree and branch diagram which shows you which items are most connected to the others.
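The repeated regrouping can be illustrated with a small Python sketch of agglomerative (bottom-up) clustering. Several things here are assumptions made for illustration: the distance between two items is taken as 1 minus their correlation, clusters are compared by average linkage, and the item names and distances are invented. The list of merges it returns is the information a tree and branch diagram would draw.

```python
def agglomerate(names, dist):
    """Average-linkage clustering; returns merges as (cluster_a, cluster_b, distance)."""
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]

    def linkage(a, b):
        # Average distance over every cross-cluster pair of items.
        pairs = [dist[index[x]][index[y]] for x in a for y in b]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters in the current grouping.
        d, i, j = min(
            (linkage(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical distances (1 - r) among four survey items.
names = ["q3", "q6", "q8", "q1"]
dist = [
    [0.00, 0.20, 0.30, 0.80],
    [0.20, 0.00, 0.25, 0.90],
    [0.30, 0.25, 0.00, 0.85],
    [0.80, 0.90, 0.85, 0.00],
]
merges = agglomerate(names, dist)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

With these invented numbers, q3 and q6 join first, q8 joins them next, and q1 joins last, which is how an item that does not belong with the others shows up.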

Both factor and cluster analysis are avoided by many researchers in favor of plain, old-fashioned inspection of inter-item correlations.
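Looking at inter-item correlations needs nothing more than Pearson's r for every pair of items. A minimal Python sketch follows; the item names and the five respondents' answers are made up for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cross / (sx * sy)

def inter_item_correlations(items):
    """Return {(item_a, item_b): r} for every distinct pair of items."""
    names = list(items)
    return {
        (a, b): pearson(items[a], items[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical 5-point answers from five respondents.
items = {
    "blush": [5, 4, 2, 1, 3],
    "wallflower": [4, 5, 1, 2, 3],
    "stay_home": [5, 5, 2, 1, 2],
}
correlations = inter_item_correlations(items)
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Items that correlate weakly with all the others are the first candidates to drop.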

PRACTICUM:
1. Construct a short Guttman scaling series of questions on a topic of your choice.
2. Construct some Likert scale items on a series of questions of your choice.

 

How to do factor analysis in SPSS:

 

1-  You should have some theoretical reasoning why a couple of scales (columns in SPSS, questions in a survey) may be indicators of the same factor (i.e. independent or dependent variable).

2-  Analyze ==> Data reduction

3-  Factor

4-  Select the variables that you suspect to measure/reflect the same dimension.

5-  Rotation.

6-  Varimax.

7-  Click OK.

8-  Look at the table titled “Rotated Component Matrix”.

9-  The columns of “Components” represent the number of factors you can extract from the variables you suggested.

10-    Each group of variables with high and close coefficients falls into the same dimension and measures the same factor. (So you can safely combine them to come up with one variable (scale/index) instead of many.)

11-    If you have another column (2), then there is another factor (scale or index) that you can extract.

12-   Your theoretical mind should be always present in deciding what scale you are building.

13-    To make sure that your inferences are OK, go to Analyze ==> Scale ==> Reliability Analysis, add the scales that you inferred to be measuring the same dimension, and click OK. The alpha should be at least .65 to claim that the indicators are measuring the same factor.

14-    Now you can safely combine the scales that measure the same thing, provided they use the same point scale (5-point scale, 7-point scale, or 100-point thermometer), using Transform ==> Compute to add them together in a new variable.

15-    If they do not follow the same point scale, we will need to standardize them first through Analyze ==> Descriptives ==> “Save standardized values as variables”.

16-    The standardized variables will appear as new columns whose names start with “Z”, and each will have a mean of 0 and a standard deviation of 1.

You can add them together to create your new variable.
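For readers without SPSS, steps 13 through 16 can be approximated in Python. The sketch below is an illustration under stated assumptions, not a reproduction of SPSS output: alpha is computed with the usual Cronbach formula, k/(k-1) times (1 minus the sum of item variances over the variance of the total score), standardization uses the sample standard deviation, and all names and data are hypothetical.

```python
from statistics import mean, stdev, variance

def cronbach_alpha(items):
    """items: one list of scores per item, all in the same respondent order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance / variance(totals))

def zscores(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(items):
    """Standardize each item, then add the z-scores for each respondent."""
    standardized = [zscores(scores) for scores in items]
    return [sum(row) for row in zip(*standardized)]

# Step 13: reliability of three hypothetical 5-point items.
likert_items = [
    [5, 4, 2, 1, 3],
    [4, 5, 1, 2, 3],
    [5, 5, 2, 1, 2],
]
alpha = cronbach_alpha(likert_items)
print("alpha:", round(alpha, 3), "meets .65 threshold:", alpha >= 0.65)

# Steps 15-16: items on different point scales are standardized before adding.
five_point = [1, 2, 3, 4, 5]
thermometer = [10, 30, 50, 70, 90]
index = composite([five_point, thermometer])
print("composite:", [round(v, 2) for v in index])
```

Adding z-scores gives each item equal weight regardless of its original range, which is the reason for standardizing before you combine.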

 

Ex. Index of the “Democratic party appeal,” or “post-materialist attitudes such as  feminism and environmentalism and human rights.” 

 Use NES2000 for this purpose.