**REGRESSION ANALYSIS:**

(Note: The following comments and examples of regression analysis are meant to complement the
readings on regression analysis: SPSS Applications Guide, Chapter 12 and Munro, Chapter 12 up
to p.274. They are not meant as a complete guide to regression analysis.)

**(A) Regression and Analysis of Variance:**

Like Analysis of Variance and T-Test, Regression Analysis is also part of the group of statistical
models known as the General Linear Model (GLM). They all have in common that they start with
a measure of the total variation of the scores in the outcome/dependent variable. This measure is
known as the total sum of squares (TSS), which is defined as: Sum (Y_{i} - Y(bar))^{2}. In other
words, we start with the sum of all squared deviations around the overall (sample or population)
mean. Incidentally, since the GLM starts with the squared deviations from the mean of the
outcome variable, it must be possible to compute a mean and the distances/deviations from the
mean must be defined. This requires that the outcome variable is measured at the interval or ratio
level. (Together, the two kinds of variables are often called 'continuous' variables).

The differences between analysis of variance models and regression models primarily have to do
with the properties of the independent or predictor variables. In analysis of variance models, the
independent variables tend to be nominal-level (or categorical) variables with arbitrary values
assigned to the categories. (Note: independent variables in analysis of variance models are often
called 'factors'.) Independent variables in regression analysis are often themselves continuous
(interval or ratio-level) variables. Both regression and analysis of variance models can handle all
types of independent variables; however, it is generally easier to use regression, when most of the
independent variables are continuous (since categorical variables must be represented through the
often cumbersome method of 'dummy-coding'), and to use analysis of variance, when most of the
independent variables are categorical factors. (In analysis of variance models, continuous
independent variables are treated as 'covariates', in which case the models are also called 'analysis
of covariance'.)

One last piece of terminology should be mentioned here. We started by saying that all GLM
models begin with the TSS. Then the fundamental question becomes: how much of the variation
in the particular outcome measure considered can be associated with or attributed to any and all
of the independent or predictor variables? This question is answered by decomposing the TSS into
two fundamental components: the 'explained' sum of squares and the 'unexplained' or 'error'
sum of squares. The 'explained' sum of squares is the amount of variation in the dependent
variable 'associated with' all the independent variables or factors. The 'unexplained' sum of
squares is the amount of variation in the dependent variable that is completely independent of the
independent variables or factors. Unexplained or error sum of squares may represent measurement
error in the outcome variable or it may represent systematic variation that is NOT associated with
the predictor variables that are part of the model. It is always true, by definition, that TSS =
explained SS + unexplained SS. It is important to note here, that in analysis of variance and
regression we use slightly DIFFERENT terminology for the SAME sums of squares. In analysis
of variance, the unexplained sum of squares is called 'within group sum of squares' or WGSS,
while the same measure is called 'residual sum of squares' or 'error sum of squares' (ESS) in
regression analysis. Likewise, in analysis of variance, the systematic variation associated with the
independent variable(s) is usually called 'between-group-sum-of-squares' or BGSS; this same sum
of squares is called 'regression sum of squares' or RSS in regression analysis. Thus, we get in
analysis of variance: TSS = BGSS + WGSS, and in regression analysis: TSS = RSS + ESS.
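This decomposition can be sketched in a few lines of Python. The grouped data below are purely illustrative (made up for this sketch), mirroring an analysis-of-variance setup with two groups:

```python
# Illustrative sketch: decomposing TSS into between-group (explained)
# and within-group (unexplained) sums of squares, using made-up data.
scores = {"group_a": [4.0, 6.0, 5.0], "group_b": [9.0, 11.0, 10.0]}

all_y = [y for ys in scores.values() for y in ys]
grand_mean = sum(all_y) / len(all_y)

# Total sum of squares: squared deviations around the grand mean
tss = sum((y - grand_mean) ** 2 for y in all_y)

# Between-group SS: squared deviations of group means from grand mean
bgss = sum(len(ys) * ((sum(ys) / len(ys)) - grand_mean) ** 2
           for ys in scores.values())
# Within-group SS: squared deviations of scores from their group mean
wgss = sum((y - sum(ys) / len(ys)) ** 2
           for ys in scores.values() for y in ys)

print(tss, bgss + wgss)  # the two numbers agree: TSS = BGSS + WGSS
```

Whatever the data, BGSS + WGSS reproduces TSS exactly, which is the identity the text describes.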

**(B) Simple Regression:**

So much for the preliminary terminology. Now, let's focus on regression analysis. Our first
example introduced in class was one of a SIMPLE (linear) regression. In a simple regression,
there is only one independent variable. Being a regression model, BOTH the dependent and
independent variables are continuous. Here, we focus on patient depression as the
outcome/dependent variable and the count of symptoms reported by the cancer patients as the
independent variable. It goes without saying that our implied research hypothesis is: as the
number of reported symptoms (like fatigue, nausea, vomiting, bleeding, swelling, etc.) increase,
we expect the depression score to increase as well.

The tables below show the output from the SPSS regression run.

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| PCESD depression score (patient) | 11.02 | 7.73 | 783 |
| PSYMCNT count of all reported symptoms | 7.96 | 4.42 | 783 |

This table shows the descriptive information about both the dependent and independent variables.

It tells us that the mean CESD depression score in this sample of 783 cancer patients is 11.02, with considerable variation in scores as indicated by the standard deviation of 7.73. (Incidentally, this table does not show that the depression scores range from a minimum of 0 to a maximum of 42 among these 783 cases.) The number of reported symptoms averages almost 8 for the sample of 783 cases (they range from 0 to 27 - not shown in this table) and also displays considerable variation: the standard deviation equals 4.42.

Correlations

| | | PSYMCNT count of all reported symptoms |
|---|---|---|
| Pearson Correlation | PCESD depression score (patient) | .494 |
| Sig. (1-tailed) | | .000 |
| N | | 783 |

The correlation table shows that, as expected, symptom count and depression scores correlate positively
(which means: higher numbers of symptoms are associated with higher depression scores) and fairly
strongly. The correlation is .494. It is highly significant (p-value < .0005), which means that we reject the
null-hypothesis that the observed sample correlation is the result of a sampling fluke.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .494 | .244 | .243 | 6.7245 |

a Predictors: (Constant), PSYMCNT count of all reported symptoms

The model summary provides useful information about the regression analysis. First, in the
'model' column, it tells us that there is only one independent variable that was entered into this
model. The next column represents the 'multiple R'. This is the correlation between the actually
observed dependent variable and the dependent variable as it is predicted by the regression
equation. In a simple regression with only one independent variable, this is the same as the simple
Pearson's correlation between the dependent and independent variable. 'R square' is the square of
R and is also known as the 'coefficient of determination'. It tells us what proportion (or
percentage) of the (sample) variation in the dependent variable can be attributed to the
independent variable(s). In our example, we can say that 24.4% of the variation in depression
scores among these cancer patients appears to be accounted for by the variation in their reported
number of symptoms. (The 'adjusted R square' refers to the best estimate of R square for the
population from which the sample was drawn.) Finally, the 'standard error of estimate' tells us
that, on average, observed depression scores deviate from the predicted regression line by a score
of 6.7. This is not surprising: since our regression model explains only 24.4% of the variation, it can NOT account for the other 75.6%, which most likely represents both measurement error in depression and other factors influencing depression that we have not considered.
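The model-summary statistics can be recomputed from the sums of squares that SPSS reports in the ANOVA table below. A sketch in Python, using the standard formulas with n = 783 cases and k = 1 predictor:

```python
import math

# Reconstructing the model-summary statistics from the ANOVA table's
# regression and residual sums of squares.
rss, ess = 11412.06, 35315.61
tss = rss + ess                 # 46727.67
n, k = 783, 1                   # cases, predictors

r_square = rss / tss
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
see = math.sqrt(ess / (n - k - 1))   # standard error of the estimate

print(round(r_square, 3), round(adj_r_square, 3), round(see, 4))
# Reproduces (up to rounding) R square = .244, adjusted = .243, SEE = 6.7245
```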

ANOVA

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 11412.06 | 1 | 11412.06 | 252.38 | .000 |
| | Residual | 35315.61 | 781 | 45.22 | | |
| | Total | 46727.67 | 782 | | | |

a Predictors: (Constant), PSYMCNT count of all reported symptoms

b Dependent Variable: PCESD depression score (patient)

This table represents the results from the analysis of variance associated with the regression
analysis. In the 'sum of square' column you find the decomposition of the total sum of squares
into regression sum of squares (= 'explained') and residual sum of squares (= 'error'). If you
divide the regression SS by the TSS, or 11,412.06/46,727.67 = .244, you get 'R square', the
proportion of variation in the dependent variable accounted for by the independent variable. The
'mean square' column shows the average variation associated with the regression and the
residuals and is computed by dividing the SSs by their respective degrees of freedom. (Note: The
full explanation of the concept of degrees of freedom is beyond this course. It can be shown that,
if one already knows the equation for the regression line and one knows N-2 individual observed
sample values for the CESD score, one can reconstruct the final 2 sample values from this
knowledge. Thus only N-2 sample values (=residual df.) are truly free to vary in the sampling
process. The mathematical proof requires calculus and is a bit involved.) Important for your
understanding in interpreting this output is the next column, containing the F-statistic. The F-statistic is a ratio of two numbers: the mean square (or average variation) associated with the regression and the mean square (or average variation) associated with the residuals or errors. In other words, the F-statistic represents the ratio of explained to unexplained variance. It is
clear that if the independent variable(s) is/are not at all associated with or predict variation in the
dependent variable, then the regression sum of squares equals 0 and F equals 0 also. The larger the
regression sum of squares in relation to the residual sum of squares, the more of the variation in
the dependent variable is explained by the independent variable(s). Of course, if RSS grows
relative to ESS, so does the F-value. This is the basis for an important statistical test. The null-hypothesis underlying the F-test is: all independent variables combined have NO effect on the
dependent variable, thus F = 0 in the population from which the sample was drawn. However,
because of sampling fluctuation, we would never expect to observe an F-value of exactly zero. So
our usual question becomes: Is the observed sample F-value so large, that it is unlikely that mere
random sampling fluctuation could have produced it? Our observed F-value of 252.38 has a p-value of .000 associated with it. Thus fewer than 1 in a thousand samples would randomly
produce such a large F-value if the samples come from a population in which the true F-value is 0.
Thus, we reject the null-hypothesis that the independent variable(s) don't explain any variation in
the dependent/outcome variable. Conclusion: in the population of cancer patients from which this
sample was drawn, depression is associated with symptom counts. (Note: since in a simple regression there is only one independent variable, a significant F-test already tells us that the (only) independent variable has a significant effect.)
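The F-value itself is easy to verify from the reported sums of squares and degrees of freedom (small rounding differences aside):

```python
# The F-statistic is the ratio of the two mean squares from the
# ANOVA table: MS(regression) / MS(residual).
rss, ess = 11412.06, 35315.61
df_reg, df_res = 1, 781

ms_reg = rss / df_reg        # 11412.06
ms_res = ess / df_res        # about 45.22
f = ms_reg / ms_res
print(round(f, 2))           # close to 252.38, as in the ANOVA table
```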

Coefficients

| | | (Constant) | PSYMCNT count of all reported symptoms |
|---|---|---|---|
| Unstandardized Coefficients | B | 4.14 | .865 |
| | Std. Error | .496 | .054 |
| Standardized Coefficients | Beta | | .494 |
| t | | 8.35 | 15.886 |
| Sig. | | .000 | .000 |
| 95% Confidence Interval for B | Lower Bound | 3.16 | .758 |
| | Upper Bound | 5.12 | .971 |

a Dependent Variable: PCESD depression score (patient)

This table gives the actual estimates for the regression equation. The row of 'unstandardized
coefficients' or 'Bs' gives us the necessary coefficient values for the simple regression model. The
'constant' of 4.14 represents the intercept in the equation and the coefficient in the column labeled
by the independent variable (X = symptom count) represents the slope coefficient. At the bottom
of the table, we are told that the dependent variable (= Y) is the depression score. Thus, this
regression equation is:

Y(hat) = 4.14 + .865X, where Y(hat) is the predicted value of Y (or the predicted depression score), and X = the symptom count, which is the predictor variable.

(Comment: How was this regression equation arrived at? The values for the intercept and the slope coefficient were chosen in such a way that the squared deviations of observed scores from the regression line are minimized. In this sense, the regression line is the 'line of best fit'. Through the application of calculus, it is actually possible to compute the exact values for the intercept and the slope, based solely on sample information on the Xs and Ys; any advanced textbook on regression analysis contains the derivation of these so-called 'normal equations'. Since the method leads to minimum squared deviations of observed from predicted values, it has come to be known as the method of 'ordinary least squares' estimation.)
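For a simple regression, the normal equations reduce to two familiar formulas: slope b = r(s_y/s_x) and intercept a = ȳ − b·x̄. Plugging in the rounded descriptive statistics from the tables above nearly reproduces the SPSS coefficients (the tiny discrepancy in the slope is due to rounding):

```python
# Simple-regression OLS coefficients from the summary statistics:
#   slope b = r * (s_y / s_x),  intercept a = mean_y - b * mean_x.
r = 0.494
mean_y, sd_y = 11.02, 7.73   # depression score
mean_x, sd_x = 7.96, 4.42    # symptom count

b = r * sd_y / sd_x
a = mean_y - b * mean_x
print(round(b, 3), round(a, 2))  # close to the reported .865 and 4.14
```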

Let's go back to the regression equation.

The equation Y(hat) = 4.14 + .865 X is a sample estimate of the true population equation. Thus,
before we can interpret it, we need to answer our usual question: are the observed sample values
indicative of real effects in the population. As usual, we start with a null-hypothesis of 'no effect'.
In this case of a simple regression, the null-hypothesis would be: symptoms do not influence
depression in cancer patients. This verbal null-hypothesis would translate into a population
regression equation in which the slope equals zero. Why? Because if the slope coefficient (associated with X) equals zero, then any changes in X would have no effect on Y, the
dependent depression score. As we know by now, even though the slope coefficient may be
exactly equal to zero in the population, repeated sampling from this population will likely result in
sample estimates of the slope coefficient that differ from one sample to the next. Thus our 'eternal question' is again: are the sample estimates of the slope coefficient so large that it is unlikely that mere sampling fluctuations could have produced them? As always, we decide this question by comparing the observed sample estimate to the size of its standard error (which
indicates by how much such sample estimates vary, on average, from one sample to the next). In
our case, the standard error associated with the slope coefficient is .054 which is extremely small
in relation to the sample estimate of .865. In fact, if you divide .865 by .054, you get (up to rounding) the t-value of 15.886. It tells you that the observed sample slope coefficient is almost 16 standard errors larger
than zero. Since the sampling distribution of the slope coefficient follows the t-distribution, we
only need to ask the question, how likely is it that mere random sampling produces a slope
coefficient that differs from zero by almost 16 standard errors. The probability of that happening
by chance is practically non-existent. As the associated p-value of .000 tells us, we reject the null-hypothesis that this sample was drawn from a population in which the symptom count does not
affect the depression score. The same logic applies to the intercept coefficient. It too differs
significantly from zero since it is more than 8 standard errors from zero. (Note: since regression
coefficients follow the t-distribution, researchers actually have adopted a simple rule of thumb:
any coefficient that is larger than twice its standard error is 'statistically significant' at the .05
level.) At the bottom of the 'coefficients' table, you also see information about the 95%
confidence intervals. What do they tell us? We are 95% confident that the true population
intercept coefficient lies somewhere between 3.16 and 5.12 and the true population slope
coefficient lies somewhere between .758 and .971.
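The t-value and confidence interval for the slope can be sketched directly from the reported B and its standard error (using the large-sample critical value of 1.96; SPSS works with unrounded values, so its t of 15.886 and lower bound of .758 differ slightly):

```python
# Sketch of the slope's t-value and 95% CI from the reported B and SE.
b, se = 0.865, 0.054

t = b / se                 # about 16 standard errors from zero
lower = b - 1.96 * se      # 95% CI, large-sample critical value
upper = b + 1.96 * se
print(round(t, 2), round(lower, 3), round(upper, 3))
```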

Finally, we are ready to interpret the regression equation. Again, our best estimate of the
population regression equation is:

Y(hat) = 4.14 + .865X, with both estimated coefficients differing significantly from zero.

In our example, the observed range of the independent variable actually includes zero. Thus, there
are cancer patients who do not report any of these symptoms. This regression equation now tells
us, that we expect such cancer patients to have, on average, a depression score of 4.14 (equal to
the intercept since X = 0). Now, let's look at a cancer patient who reports 10 symptoms. We
predict that such a patient will typically have a depression score of 12.79 (= 4.14 + .865 x 10).
Thus, we see that the slope coefficient provides us with the most important information: it shows us by how much the dependent (depression) score changes for a one-unit change in the independent (symptom) score.
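As a sketch, this prediction rule is a one-line function:

```python
# Predicting depression from symptom count with the estimated equation.
def predict_depression(symptom_count: float) -> float:
    """Y(hat) = 4.14 + .865 * X, from the coefficients table above."""
    return 4.14 + 0.865 * symptom_count

print(round(predict_depression(0), 2))    # 4.14  (the intercept)
print(round(predict_depression(10), 2))   # 12.79 (the worked example)
```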

**(C) Multiple Regression:**

The following tables represent results from a multiple regression analysis. It again has only one dependent variable, but contains more than one independent or predictor variable. The dependent
or outcome variable is again the CES-D depression scale. This time, however, we have added
patient sex (1=female, 0=male), patient age (in years, ranging from 64 to 98), number of
comorbid conditions (a count of chronic diseases, such as arthritis, diabetes, etc., ranging from 0
to a maximum of 9), the physical functioning subscale of the SF-36 (ranging from 0 = complete
immobility to 100 = 'perfect' functioning) and the patient symptom count (0-27). The first table,
again shows the descriptive sample statistics for all these variables.

Descriptive Statistics

| | Mean | Std. Deviation | N |
|---|---|---|---|
| PCESD depression score (patient) | 10.84 | 7.65 | 746 |
| PSEX2 patient sex (recoded) | .47 | .50 | 746 |
| PAGE patient age (in years) | 72.20 | 4.99 | 746 |
| PCOMORBI Patient Comorbity Count | 2.71 | 1.68 | 746 |
| MOSPF: Pt. Phys.Functioning | 63.98 | 28.16 | 746 |
| PSYMCNT count of reported symptoms | 7.87 | 4.42 | 746 |

Notice that, except for patient sex, all of the independent variables are continuous variables with
meaningful, interpretable means and standard deviations. (E.g., the mean physical functioning
score in the sample is 63.98 with the average deviation around the mean being 28.16, etc.) The
only variable that is NOT an interval level variable is sex. Regression can accommodate such
nominal-level categorical variables, if they have only two categories and are 'dummy-coded', that
is to say, one category takes on the value '1', the other the value '0'. (Which one is coded one or
zero is arbitrary.) In this special case, the mean of .47 simply indicates that 47% of the sample are
female (since we coded 1 = female). As we will see below, regression coefficients of such dummy
variables also have simple interpretations.
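The fact that a dummy variable's mean equals the proportion coded '1' is easy to verify on a made-up mini-sample:

```python
# A dummy-coded variable's mean is just the proportion coded 1.
# Made-up mini-sample: 1 = female, 0 = male.
sex = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

mean_sex = sum(sex) / len(sex)
print(mean_sex)   # 0.5, i.e., 50% of this toy sample is female
```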

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .520 | .271 | .266 | 6.5565 |

a Predictors: (Constant), PSYMCNT count of all reported symptoms, PSEX2 patient sex (recoded),
PAGE patient age (in years), PCOMORBI Patient Physical Comorbity (Count) , MOSPFCU SF-36:
Pt.Physical Functioning-(CU)

This 'Model Summary' table looks exactly like the one for simple regression, and the statistics in
it have exactly the same interpretation. This time, we have a model with five independent or
predictor variables of the dependent variable, which is depression. The multiple 'R' again
indicates size of the correlation between the observed outcome variable and the predicted
outcome variable (based on the regression equation). R^{2} or the coefficient of determination again
indicates the amount of variation in the dependent scores attributable to ALL independent
variables combined, with the 'adjusted R^{2 }' again giving an estimate for the population value of
R^{2}. Finally, the 'standard error of estimate' gives us an indication of the average spread of
observed depression scores around the predicted regression line. When you compare these results
to the same table from the simple regression above, you will see great similarities, except that 'R square' is now a bit larger (.271 instead of .244) and the standard error of the estimate is slightly smaller (6.5565 instead of 6.7245). What this means is that the additional independent variables
allow us to predict cancer patient depression a little bit better (now, we can account for 27.1% of
the variation in depression scores instead of 24.4%). As a result, the average 'error' or
unexplained variation around the regression line is a bit smaller.

ANOVA

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 11803.415 | 5 | 2360.683 | 54.915 | .000 |
| | Residual | 31811.282 | 740 | 42.988 | | |
| | Total | 43614.697 | 745 | | | |

a Predictors: (Constant), PSYMCNT count of all reported symptoms, PSEX2 patient sex (recoded), PAGE patient age (in years), PCOMORBI Patient Physical Comorbity (Count) , MOSPFCU SF-36: Pt.Physical Functioning-(CU)

b Dependent Variable: PCESD depression score (patient)

The ANOVA table again has the same interpretation as the one for a simple regression. It
decomposes the total sum of squares into regression (=explained) SS and residual (=unexplained)
SS. The ratio of regression SS over total SS, or 11,803.415/43,614.697 = .271, is, of course,
identical to R^{2}. Finally, the F-test is again the ratio of the average deviations of the regression line
from the sample mean (mean regression SS) and the squared deviations from the regression line
(= mean residual SS). Thus it represents the relative magnitude of explained to unexplained
variation. The F-statistic is highly significant (p=.000), thus we reject the null-hypothesis that
none of the independent variables predicts the depression scores in the population.
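Both R² and F can be verified from this table's entries:

```python
# Verifying R-square and F from the multiple-regression ANOVA table.
rss, ess, tss = 11803.415, 31811.282, 43614.697
df_reg, df_res = 5, 740

r_square = rss / tss                      # explained share of TSS
f = (rss / df_reg) / (ess / df_res)       # ratio of the mean squares
print(round(r_square, 3), round(f, 3))    # close to .271 and 54.915
```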

Coefficients

| | | (Constant) | PSEX2 patient sex (1=female, 0=male) | PAGE patient age (in years) | PCOMORBI Patient Physical Comorbity | MOSPF: Pt.Physical Functioning | PSYMCNT count of all reported symptoms |
|---|---|---|---|---|---|---|---|
| Unstandardized Coefficients | B | 8.978 | 1.514 | -.02411 | .07200 | -.04414 | .701 |
| | Std. Error | 3.666 | .493 | .049 | .154 | .010 | .062 |
| Standardized Coefficients | Beta | | .099 | -.016 | .016 | -.162 | .405 |
| t | | 2.449 | 3.069 | -.489 | .467 | -4.474 | 11.271 |
| Sig. | | .015 | .002 | .625 | .640 | .000 | .000 |
| 95% Confidence Interval for B | Lower Bound | 1.781 | .545 | -.121 | -.230 | -.064 | .579 |
| | Upper Bound | 16.175 | 2.483 | .073 | .374 | -.025 | .824 |

a Dependent Variable: PCESD depression score (patient)

Again, we use the 'coefficients' table to construct the regression equation. It is:

Y(hat) = 8.978 + 1.514 X_{1} - .024 X_{2} + .072 X_{3} - .044 X_{4} + .701 X_{5},

where X_{1} = Patient Sex, X_{2} = Patient Age, X_{3} = Count of Patient Comorbidities, X_{4} = Patient
Physical Functioning Score, and X_{5} = Symptom Count.

Do any of these regression or slope coefficients differ significantly from zero? We answer this question by looking at the magnitude of the coefficients in relation to their standard errors. The row of t-values gives us the ratio of the regression coefficients to their standard errors; for instance, the t-value for patient sex is 3.069, which equals 1.514/.493. What does this tell us? Right below the t-values, you see the p-values or significance values associated with the t-values. This gives us all the pieces we need to draw statistical inferences about the population.

We start with a null-hypothesis for each independent variable, namely, that it has no effect on the outcome variable (here: depression). This is the same as saying that the particular regression coefficient we are focusing on is assumed to be zero in the population from which the sample is drawn. Now we proceed with our familiar reasoning: let us assume that patient sex has no effect on depression. In that case, the true regression coefficient in the population associated with the sex variable ought to be zero. In our sample, however, we observe a sex regression coefficient of 1.514. How likely is that to happen as a result of mere sampling chance? The answer is that a coefficient of 1.514 is actually more than 3 standard errors larger than zero, and that occurs by chance in only 2 out of 1,000 samples drawn from this population. Thus we reject the null-hypothesis (conventionally, as long as p < .05): we are quite confident that sex does have a real effect on depression.

The same logic applies to all the other regression coefficients. As you can see, two of them, the coefficients for patient age and the number of comorbid conditions, do NOT differ significantly from zero. In fact, there is a more than 60% likelihood that each of these observed sample coefficients is the result of sampling chance. Thus, we don't take them to be 'real' and conclude that the population regression coefficients for these variables equal zero.

This simplifies our regression equation to:

Y(hat) = 8.978 + 1.514 X_{1} - .044 X_{4} + .701 X_{5}, since only X_{1}, X_{4} and X_{5} are significant predictors of depression.

If we now substitute particular values for these variables, we can compute the predicted
depression score. For instance, a female cancer patient with moderate physical functioning (score
of 60) and 5 reported cancer symptoms will, on average, have a depression score of 11.357 (=
8.978 + 1.514 x 1 - .044 x 60 + .701 x 5).
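A sketch of this simplified prediction rule (the patient profile is, of course, hypothetical):

```python
# Predicting depression from the simplified (significant-terms-only)
# equation for a hypothetical patient.
def predict_depression(female: int, phys_func: float, symptoms: float) -> float:
    """Y(hat) = 8.978 + 1.514*sex - .044*phys.functioning + .701*symptoms."""
    return 8.978 + 1.514 * female - 0.044 * phys_func + 0.701 * symptoms

# Female patient, physical-functioning score 60, 5 reported symptoms:
print(round(predict_depression(1, 60, 5), 3))   # 11.357, as in the text
```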

The 'coefficients' table also contains the standardized regression coefficients. These are useful
only in multiple regressions with at least two independent variables. The problem with the
unstandardized coefficients is that they are measured in different units of measurement (an increment of one unit means, for instance, the difference between female and male in the sex variable, one additional symptom in the symptom variable, or one unit score on the physical functioning score). With these different units of measurement, we can not directly answer the
question, which of these variables has the strongest effect on depression because we are
comparing 'apples' and 'oranges'. This answer is given by the standardized coefficients, often also
called 'betas'. They tell us by how many standard deviations the dependent variable changes for a
change in the independent variables by one standard deviation. Using this common yardstick, we
easily see that the reported symptoms have the strongest effect on depression.
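The betas can be recovered from the unstandardized Bs and the standard deviations in the descriptives table, via beta = B × (s_x / s_y):

```python
# A standardized coefficient ('beta') rescales B by the predictor's
# and outcome's standard deviations: beta = B * (sd_x / sd_y).
sd_y = 7.65                                # depression score SD

beta_symptoms = 0.701 * (4.42 / sd_y)      # B and SD for symptom count
beta_physfunc = -0.04414 * (28.16 / sd_y)  # B and SD for phys. functioning

print(round(beta_symptoms, 3), round(beta_physfunc, 3))
# Reproduces the table's betas of .405 and -.162
```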

**(D) Hierarchical Regression Models:**

In this section we briefly discuss hierarchical ordering of (groups of) independent variables in multiple regressions. If we include the same independent variables in our regression model, it does not matter in which order we enter them into the equation: the regression or slope coefficients will be exactly the same. (They only change if we add or omit an independent variable, because the multiple regression procedure adjusts all estimates for the presence of the other variables in the equation.) However, even though the regression coefficients do not change as long as we have the same independent variables, the order in which they are entered does affect the amount of variation that is 'explained' by (or attributed to) the individual independent variables.

Except for the limiting case where all correlations among the independent variables are zero, it is actually impossible to attribute variation in the dependent variable uniquely to each of the independent variables. When there is overlap (i.e., correlated independent variables), part of the explained variation in the dependent variable is explained jointly by two or more independent variables. In this case, regression analysis (as well as analysis of variance) always attributes the joint variation to the variables entered earlier into the equation. The result is that changing the order of entry changes the amount of variation attributed to the various independent variables.

The following two summary tables show this; they are from the same multiple regression model as before, only this time the variables are entered block-wise or in groups, with age and sex entered first, followed by comorbid conditions and physical functioning and, finally, by the symptom count. Concentrate on two columns, 'R Square' and 'R Square Change': they tell the essential story. In the first table, the model attributes an additional 12.9% of the variation in depression scores to comorbid conditions and physical functioning and another 12.5% to symptoms. Altogether, all 5 variables (with the demographics included) account for 27.1% of the variation in depression scores. Now look at the second summary table. The only change made was that the symptom variable was entered BEFORE the comorbid conditions and physical functioning. Now, 23.3% of the variation in depression is attributed to symptoms and only an additional 2.2% to comorbid conditions and physical functioning.

Clearly, the way in which one presents one's tables may sway the uninitiated reader in one direction or another. Just remember the most important result: except in the case of uncorrelated independent variables or factors (which usually occurs only in clinical trials, where factors are unrelated by design as a result of random assignment), it is NOT possible to attribute variation in the outcome variable uniquely to one or the other independent variable, and research reports using regression or analysis of variance models on observational data should generally NOT emphasize 'amounts of variation attributed to one or the other independent variable'.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .128 | .016 | .014 | 7.5990 | .016 | 6.151 | 2 | 743 | .002 |
| 2 | .381 | .145 | .141 | 7.0922 | .129 | 55.988 | 2 | 741 | .000 |
| 3 | .520 | .271 | .266 | 6.5565 | .125 | 127.028 | 1 | 740 | .000 |

a Predictors: (Constant), PAGE patient age (in years), PSEX2 patient sex (recoded)

b Predictors: (Constant), PAGE patient age (in years), PSEX2 patient sex (recoded), PCOMORBI Patient Physical Comorbity (Count) , MOSPFCU SF-36: Pt.Physical Functioning-(CU)

c Predictors: (Constant), PAGE patient age (in years), PSEX2 patient sex (recoded), PCOMORBI
Patient Physical Comorbity (Count) , MOSPFCU SF-36: Pt.Physical Functioning-(CU), PSYMCNT
count of all reported symptoms

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .128 | .016 | .014 | 7.5990 | .016 | 6.151 | 2 | 743 | .002 |
| 2 | .499 | .249 | .246 | 6.6435 | .233 | 230.090 | 1 | 742 | .000 |
| 3 | .520 | .271 | .266 | 6.5565 | .022 | 10.907 | 2 | 740 | .000 |

a Predictors: (Constant), PAGE patient age (in years), PSEX2 patient sex (recoded)

b Predictors: (Constant), PAGE patient age (in years), PSEX2 patient sex (recoded), PSYMCNT count of all reported symptoms

c Predictors: (Constant), PAGE patient age (in years), PSEX2 patient sex (recoded), PSYMCNT count
of all reported symptoms, PCOMORBI Patient Physical Comorbity (Count) , MOSPFCU SF-36:
Pt.Physical Functioning-(CU)
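The change statistics in these tables follow a simple logic: each step's R-square change is that step's R² minus the previous step's, and the F-change tests whether that increment is significant. A sketch using the rounded values from step 2 and step 3 of the first table (because of the rounding, the results differ slightly from SPSS's .125 and 127.028):

```python
# Sketch of the 'change statistics' for one hierarchical step:
#   R-square change = new R-square minus previous R-square;
#   F change = (R2 change / df1) / ((1 - new R2) / df2).
r2_prev, r2_new = 0.145, 0.271   # steps 2 and 3 of the first table
df1, df2 = 1, 740                # predictors added; residual df

r2_change = r2_new - r2_prev
f_change = (r2_change / df1) / ((1 - r2_new) / df2)
print(round(r2_change, 3), round(f_change, 1))
```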