Chapter 16
I. Correlation
A. Definition
1. Abbreviated as r.
2. Used to measure and describe a relationship between 2 variables.
3. Usually requires 2 scores for each individual (1 score for each of 2 variables).
4. Scores are often identified as X and Y.
5. Correlations are often depicted in scatterplots.
B. What Does a Correlation Tell You?
1. Direction of the relationship:
2. Form of the relationship:
3. Degree of the relationship:
C. How Are Correlations Used?
1. Prediction:
2. Validity:
3. Reliability:
4. Theory Verification:
II. Pearson Correlation (or Pearson's r)
A. Definition
1. The most common type of correlation
2. Used for linear relationships between variables
B. Conceptual formula:
$$r = \frac{\text{the degree to which X and Y vary together (covariability)}}{\text{the degree to which X and Y vary separately (variability)}}$$
C. Covariance is the extent to which two variables vary together: the degree to which a change in one is accompanied by a predictable change in the other.
D. With a perfect r (+1.00 or -1.00), every change in one variable is accompanied by a perfectly predictable change in the other.
E. With r = 0, a change in one variable is not accompanied by any predictable linear change in the other.
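The conceptual formula can be made concrete with a minimal sketch. It assumes the usual computational form r = SP / sqrt(SSX * SSY), where SP is the sum of products of deviations and SSX, SSY are the sums of squared deviations. The function name `pearson_r` is illustrative only, and the data are the five XY pairs from the regression computation in section VI.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed as SP / sqrt(SSX * SSY)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: how much X and Y vary together (covariability)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SSX, SSY: how much X and Y vary separately (variability)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sp / sqrt(ss_x * ss_y)

xs = [7, 4, 6, 3, 5]
ys = [11, 3, 5, 4, 7]
r = pearson_r(xs, ys)
print(round(r, 2))  # SP = 16, SSX = 10, SSY = 40, so r = 16 / 20
```

For these scores the sketch gives r = 0.80, a strong positive linear relationship.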
III. Correlation and Causation:
A. r tells us that 2 variables are related, not that one causes
the other.
B. A high correlation means that two things tend to occur together; it does not mean that one is causing the other to occur.
C. Restriction of Range -
D. Outliers -
E. Coefficient of determination: r^2 -
IV. Hypothesis Tests - OMIT
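V. Spearman's rS
A. Definition - measures the degree of relationship between two variables measured on an ordinal scale (ranks); equivalently, the degree of linear relation between the ranks. Can be used when data are interval or ratio by converting the scores to ranks.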
(Fig. 15.12 overhead)
1. Summary
a. Spearman is used when the original data are ordinal (or when one variable is ordinal and the other is interval or ratio).
b. Spearman is used with interval or ratio data when the relation is monotonic but not linear; ranking the scores converts a monotonic relation to a linear one.
B. Calculation using Pearson formula
1. Rank data - it doesn't matter whether you rank from highest-to-lowest or lowest-to-highest, as long as X and Y are ranked separately and both are ranked in the same direction
a. ranking tied scores - all tied scores get the mean of the ranks they would otherwise occupy
| Original Data | | Ranked Data | | | | |
| X | Y | X | X^2 | Y | Y^2 | XY |
| 3 | 12 | 1 | 1 | 5 | 25 | 5 |
| 4 | 5 | 2 | 4 | 3 | 9 | 6 |
| 5 | 6 | 3 | 9 | 4 | 16 | 12 |
| 10 | 4 | 4 | 16 | 2 | 4 | 8 |
| 13 | 3 | 5 | 25 | 1 | 1 | 5 |
| Σ | | 15 | 55 | 15 | 55 | 36 |
2. Compute SSX, SSY, and SP
3. Compute rS
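The three steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names `mean_ranks` and `spearman_via_pearson` are hypothetical, ranking runs from lowest to highest, and the computational formulas SS = ΣX^2 - (ΣX)^2/n and SP = ΣXY - (ΣX)(ΣY)/n are assumed.

```python
from math import sqrt

def mean_ranks(scores):
    """Rank from lowest to highest; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_via_pearson(xs, ys):
    """Apply the Pearson formula to the ranks of X and Y."""
    rx, ry = mean_ranks(xs), mean_ranks(ys)  # X and Y are ranked separately
    n = len(rx)
    ss_x = sum(v * v for v in rx) - sum(rx) ** 2 / n
    ss_y = sum(v * v for v in ry) - sum(ry) ** 2 / n
    sp = sum(a * b for a, b in zip(rx, ry)) - sum(rx) * sum(ry) / n
    return sp / sqrt(ss_x * ss_y)

x = [3, 4, 5, 10, 13]
y = [12, 5, 6, 4, 3]
r_s = spearman_via_pearson(x, y)
print(mean_ranks(y))   # ranks of Y
print(round(r_s, 2))   # SSX = 10, SSY = 10, SP = 36 - 45 = -9
```

With the table's sums (ΣX = ΣY = 15, ΣX^2 = ΣY^2 = 55, ΣXY = 36), this gives SSX = SSY = 10, SP = -9, and rS = -9 / 10 = -0.90.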
C. Calculation using Spearman formula - use only when there are few ties; otherwise use Pearson formula
1. Rank data - same as with Pearson
a. compute D, the difference between X and Y ranks
| Original Data | | Ranked Data | | | |
| X | Y | X | Y | D | D^2 |
| 3 | 12 | 1 | 5 | -4 | 16 |
| 4 | 5 | 2 | 3 | -1 | 1 |
| 5 | 6 | 3 | 4 | -1 | 1 |
| 10 | 4 | 4 | 2 | 2 | 4 |
| 13 | 3 | 5 | 1 | 4 | 16 |
| Σ | | 15 | 15 | 0 | 38 |
2. Spearman formula
3. Compute rS
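The outline does not write the formula out. Assuming the standard Spearman formula for ranked data, with n the number of XY pairs and D the difference between each pair of ranks, the computation for the table above would run:

$$r_S = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(38)}{5(5^2 - 1)} = 1 - \frac{228}{120} = 1 - 1.90 = -0.90$$

This agrees with the value obtained by applying the Pearson formula to the same ranks, as expected when there are no ties.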
VI. Regression
A. Definition - regression is the statistical technique for finding the best-fitting straight line for a set of XY pairs; the resulting line is called the regression line.
B. Equation for regression line using the slope-intercept form:
1. slope, b:
2. intercept, a:
(Fig. 15.14, 15.16 & 15.17 overheads)
C. Computation of the regression equation:
| X | Y | X^2 | Y^2 | XY |
| 7 | 11 | 49 | 121 | 77 |
| 4 | 3 | 16 | 9 | 12 |
| 6 | 5 | 36 | 25 | 30 |
| 3 | 4 | 9 | 16 | 12 |
| 5 | 7 | 25 | 49 | 35 |
| ΣX = 25 | ΣY = 30 | ΣX^2 = 135 | ΣY^2 = 220 | ΣXY = 166 |
| MX = 5 | MY = 6 | | | |
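The computation for this table can be sketched as below. The outline leaves the slope and intercept formulas blank, so the sketch assumes the common least-squares forms b = SP / SSX and a = MY - b * MX for the line Y-hat = bX + a. The function name `predict` is illustrative only.

```python
xs = [7, 4, 6, 3, 5]
ys = [11, 3, 5, 4, 7]
n = len(xs)

# Column sums, matching the bottom row of the table
sum_x, sum_y = sum(xs), sum(ys)                # 25, 30
sum_x2 = sum(x * x for x in xs)                # 135
sum_xy = sum(x * y for x, y in zip(xs, ys))    # 166

# Computational formulas (assumed): SSX = ΣX^2 - (ΣX)^2/n, SP = ΣXY - (ΣX)(ΣY)/n
ss_x = sum_x2 - sum_x ** 2 / n                 # 135 - 125 = 10
sp = sum_xy - sum_x * sum_y / n                # 166 - 150 = 16

b = sp / ss_x                                  # slope: 16 / 10 = 1.6
a = sum_y / n - b * (sum_x / n)                # intercept: 6 - 1.6 * 5 = -2

def predict(x):
    """Predicted Y for a given X on the regression line Y-hat = bX + a."""
    return b * x + a

print(f"Y-hat = {b:.1f}X + ({a:.1f})")
print(round(predict(5), 2))  # the line passes through (MX, MY) = (5, 6)
```

Under these assumptions the regression equation is Y-hat = 1.6X - 2. Predictions should stay within the observed range of X (3 to 7 here).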
D. Limitations for prediction
1. Do not use for scores outside the range of the original data
2. Prediction will not be perfect unless r = +/- 1.0
3. Do not use if relation is nonlinear
E. Partitioning of variance
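The outline gives no detail for this item. One common way to partition the variability of Y, assumed here, is into a part predicted by the regression line (r^2 * SSY) and a residual part ((1 - r^2) * SSY). The sketch below checks that split directly on the five XY pairs from the computation table; all variable names are illustrative.

```python
xs = [7, 4, 6, 3, 5]
ys = [11, 3, 5, 4, 7]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

ss_x = sum((x - mean_x) ** 2 for x in xs)                       # 10
ss_y = sum((y - mean_y) ** 2 for y in ys)                       # 40
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))   # 16

b = sp / ss_x               # slope of the regression line
a = mean_y - b * mean_x     # intercept
r_squared = sp ** 2 / (ss_x * ss_y)   # coefficient of determination

predicted = [b * x + a for x in xs]
# Variability of the predicted scores around the mean of Y
ss_regression = sum((p - mean_y) ** 2 for p in predicted)
# Variability of the actual scores around their predicted scores
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))

print(round(r_squared, 2))
print(round(ss_regression, 1), round(ss_residual, 1))
```

For these data r^2 = 0.64, so 25.6 of the 40 points of SSY are predicted by the regression line and 14.4 are left unpredicted.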