Summer 1999
Faculty
H.S. Teitelbaum, D.O., Ph.D., M.P.H.
Department of Internal Medicine
B319 W. Fee Hall
Telephone: 355-3361 (Office) 332-1881 (Home)
Office Hours: Monday 3:00-7:00.
Course Format:
This is a two (2) credit course which consists of:
1. Lectures.
2. Required text readings.
3. Readings from the scientific, medical and other journals.
4. Two examinations.
5. An electronic literature search.
The instructional mode will be primarily lecture with class participation
in discussing the various articles as the need arises.
Course Goals:
This is an introductory course. It assumes a working knowledge of arithmetic
and elementary algebra and a college reading level. The main emphasis will
be on interpreting the medical literature from both a substantive and a
statistical perspective. In practice, most complex computational work is
done with computer packages, so only rudimentary computations will be
required. Emphasis will be placed on how statistics are used, whether a given
statistic is appropriate, and understanding the statistics that appear most
commonly in the professional and popular literature. Published articles will
be used to supplement the topics discussed, and reference will also be made
to portions of the required text.
Required Text:
A Study Guide to Epidemiology and Biostatistics. Morton, Richard
F., Hebel, J. Richard, McCarter, Robert J., Fourth Edition, Aspen Publication,
1996.
Evaluation and Grading:
There will be one (1) midterm examination and one (1) final examination.
The material will be drawn from lectures and readings. The examinations
will be primarily, but not exclusively, multiple choice; there will also
be computational problems and short essay questions. The midterm will be
weighted 40% and the final 60% of the final grade. An average of 70% is
the minimum passing level for the course. An electronic literature
search must also be submitted to fulfill the course requirements.
Remediation:
For students who do not reach the minimum passing level, remediation
can be accomplished by:
1. Taking and passing another examination covering the entire course;
or
2. Re-enrolling in and subsequently passing OSS 512 the next time it is
offered; or
3. Enrolling in an independent study course that meets objectives equivalent
to those of OSS 512. The determination of equivalency is at the discretion of the
instructor of record.
N.B. The date of any remediation examination will be at the discretion
of the instructor of record.
Expectations
Basic Terminology
Sensitivity and Specificity
Levels of Measurement
Prospective Studies
Case-Control Studies
Establishing a Statistical Association or Statistical Relationship
Sample Size
Readings
Supplementary Explanations
Homework Set 1
Homework Set 2
Homework Set 3
Homework Set 4
Suggested Answers to Homework Set 1
Suggested Answers to Homework Set 2
Suggested Answers to Homework Set 3
Suggested Answers to Homework Set 4
Index
Expectations
1. Become facile with the terms risk, incidence (and its cognates),
case fatality, survival, and prevalence, as well as independent and dependent
variables. Recognize the correct usage of these terms in articles that use
them. Be able to calculate these epidemiological
measures.
2. Be able to define common indices of medical and health care.
3. Be able to compute common measures of central tendency.
4. Identify indices of dispersion and apply them in reading articles.
5. Be able to identify the correct levels of measurement associated
with common medical variables.
6. Be able to identify a case-control study when presented with an article
or an abstract of an article.
7. Become familiar with the advantages, disadvantages and biases associated
with case-control studies.
8. Be able to critique a case-control study.
9. Be knowledgeable about the strict criteria of causation as used in
medicine.
10. Be able to compute an odds ratio when presented with a 2x2 table,
a description of a medical situation and/or an article.
11. Be able to identify a prospective study when presented with an article
or an abstract of an article.
12. Become familiar with the advantages, disadvantages and biases associated
with prospective studies.
13. Be able to critique a prospective study.
14. Be able to compute a relative risk for a prospective study.
15. Be able to interpret tables of association from retrospective and
prospective studies.
16. Be able to compute and interpret sensitivity, specificity and predictive
values from tables, articles or contrived problems.
17. Be able to discern if the sample size a study uses is sufficient
to support the conclusions the author proffers.
18. Be able to deduce the null and alternative hypotheses from an article
or paper case.
19. Be able to define statistical significance and apply it to articles.
20. Be knowledgeable about the framework for statistical testing.
21. Be able to interpret confidence intervals.
22. Be knowledgeable about the correct use of common statistical tests.
23. Be able to identify the common disease states in the United States
and Michigan.
24. Be able to solve elementary probability problems.
Epidemiology
What it is!
Prevalence (Point)
Generally speaking
1. Existing Cases
2. In a defined population
3. A time period needs to be specified (it is often implied, e.g., one year or a given day).
4. Formula for computation:
$$P = \frac{C}{N}$$
where C = existing cases in the time period and N
= number of persons in the population.
Incidence
Generally speaking
1. NEW cases
2. In a defined population
Incidence Rate
1. New Cases
2. In a defined population
3. A time period must be specified.
4. Denominator is PERSON-TIME.
5. Formula for Computation:
$$IR = \frac{A}{PT}$$
where A = number of NEW cases and PT = total person-time observed
(person-weeks, person-months, person-years, etc.).
Cumulative Incidence or Risk
1. A probability
2. New Cases
3. In a defined population
4. Formula for computation:
$$CI = \frac{A}{N}$$
where A = the number of NEW cases of disease and
N represents the number of people in the population under study. Unless
specifically stated, the time period is usually implied to be 1 year. BE CAREFUL
to note the time period.
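The three formulas above can be written as small Python functions, which makes it easier to see that they differ only in their numerators and denominators. This is a minimal sketch and is not part of the course materials. The function names and the example counts are hypothetical.

```python
def point_prevalence(existing_cases, population):
    """P = C / N: proportion of the population with the disease at the time of interest."""
    return existing_cases / population


def incidence_rate(new_cases, person_time):
    """IR = A / PT: new cases per unit of person-time (e.g., per person-year)."""
    return new_cases / person_time


def cumulative_incidence(new_cases, population):
    """CI = A / N: probability (risk) of developing the disease over the period."""
    return new_cases / population


# Hypothetical counts, chosen only for illustration.
print(point_prevalence(50, 1000))        # 50 existing cases among 1,000 people
print(incidence_rate(20, 4000))          # 20 new cases in 4,000 person-years
print(cumulative_incidence(20, 1000))    # 20 new cases among 1,000 people in 1 year
```

Only the incidence rate carries units (cases per person-time). The other two are unitless proportions.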
Example:
1. How many female medical students are currently
pregnant?
2. How many medical students become pregnant during
the first year of medical school?
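There is a relationship between PREVALENCE and INCIDENCE.
Sometimes this will be written as:
$$P \approx I \times D$$
where I is the incidence and D is the average duration of the disease. The approximation holds when the disease is relatively rare and the population is in a steady state.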
Think what this equation means in illness:
Vaccination
Antibiotics
Isolation
Can Prevalence decrease yet Incidence increase?
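The relationship P ≈ I × D can be tried out with a few numbers. This is a sketch under the stated steady-state, rare-disease assumption. It treats I as an incidence rate per person-year and D as a mean duration in years. The function name and the scenario are hypothetical.

```python
def steady_state_prevalence(incidence_per_year, mean_duration_years):
    """Approximate prevalence as P = I x D (steady state, relatively rare disease)."""
    return incidence_per_year * mean_duration_years


# Hypothetical scenario: incidence doubles, but a new treatment shortens the illness.
before = steady_state_prevalence(0.01, 5.0)   # 1% per year, lasting 5 years
after = steady_state_prevalence(0.02, 1.0)    # 2% per year, lasting 1 year
print(f"prevalence before: {before:.3f}, after: {after:.3f}")
```

Varying the duration term is one way to think about how interventions such as antibiotics or isolation act on prevalence compared with incidence.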






Characteristic | Risk | Prevalence | Incidence Rate
What is measured | Probability of disease | Percent of population with disease | Rapidity of disease occurrence
Units | None | None | Cases/person-time
Time of disease diagnosis | Newly diagnosed | Existing | Newly diagnosed
Synonyms | Cumulative Incidence | | Incidence Density
Example 1:
Consider the following example:
A 60-year-old white male refinery worker recently
developed shortness of breath and nosebleeds. On physical examination,
he was pale and his pulse was elevated at 110 beats per minute. His hematocrit
was 20% (low), indicating anemia; his white blood cell count was 20,000/µL
(elevated), his platelet count was 15,000/µL (low), and examination of his
peripheral blood smear revealed atypical myeloblasts. He was hospitalized
for suspected acute myelocytic leukemia. The diagnosis was confirmed and
chemotherapy was started. About 3 weeks later, his temperature rose abruptly
to 102°F and his granulocyte count dropped to 100/µL (abnormally low). Cultures
of his blood and urine were taken, since no apparent source of infection
was evident.
Since we don't have the cultures back, SHOULD ANTIBIOTICS
BE STARTED NOW, AND IF SO, AGAINST WHAT ORGANISM?
A strategy to answer the above question may be:
1. Are cancer patients prone to develop infections
(i.e., bacterial infections) that are treatable by antibiotics?
2. Are patients with a profile similar to the above
patient more or less likely to develop a NOSOCOMIAL infection?
3. If they are, what are the most likely organisms?
4. Which of the available antibiotics are best
for treating the most probable organism?
A literature review turned up a study of over 5,000
patients with cancer, some of whom developed a nosocomial infection. The DEFINITION
of a case was a culture-proven infection that began at least 48 hours after admission
and no more than 48 hours after discharge. Results:
5,031 patients
596 cases meeting the definition.
Find the RISK:
Conclusion:
Our patient had fever and granulocytopenia. Thus,
if we can find a subset of the population studied that shares these features,
we may be able to refine our RISK estimate. A study showed the following information:
1,022 cancer patients were studied.
530 developed a clinically documented bacterial infection.
WHAT IS THE RISK NOW?
Conclusion relative to treatment with an antibiotic?
Recall your microbiology and your clinical correlations
and determine the most likely organism. In the rare event that you can't
recall, you can again use the literature to help out. One study of
patients at an outpatient clinic reported the following:
96 patients were chosen for study because they had
no apparent skin infection.
62 patients were positive for Staphylococcus aureus.
WHAT IS THE PREVALENCE OF S. aureus?
The 5,031 patients remained under observation for
a total of 127,859 patient-days (an average of 25.4 days). A total of 596
patients developed an infection that met the definition of a
hospital-acquired infection.
WHAT IS THE INCIDENCE RATE FOR THIS POPULATION?
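You can check your hand calculations for Example 1 with a few lines of Python. The script uses only the counts quoted in the text. The variable names are illustrative.

```python
# Counts quoted in Example 1.
risk_all = 596 / 5031           # nosocomial infection among 5,031 cancer patients
risk_subset = 530 / 1022        # documented bacterial infection in the second study
prevalence_s_aureus = 62 / 96   # S. aureus carriage among 96 clinic patients
incidence_rate = 596 / 127_859  # new infections per patient-day

print(f"Risk (all patients):     {risk_all:.3f}")
print(f"Risk (subset):           {risk_subset:.3f}")
print(f"Prevalence of S. aureus: {prevalence_s_aureus:.3f}")
print(f"Incidence rate:          {incidence_rate * 1000:.2f} per 1,000 patient-days")
```

The last line is scaled by 1,000 only to make the person-time units easier to read. The underlying rate is per patient-day.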
Survival:
Probability of remaining alive for a specified period
of time.
Calculated by:
$$S = \frac{A - D}{A}$$
where S = survival
A = number of newly diagnosed cases
D = number of deaths observed in the newly diagnosed
cases after following them for a specified period of time.
For acute leukemia, the 5-year survival rate has been
reported to be only 9% on average. For those younger than 65 years of age,
the survival rate is approximately 14%; for those 65 years or older, the
figure is only 2%. We calculate these figures by means of a life table.
Case Fatality
The proportion of people diagnosed with a given disease who die of it.
It is calculated by:
$$CF = \frac{D}{A}$$
where D = number of deaths
A = number of diagnosed patients
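Both formulas can be sketched in Python. This is only an illustration. The function names and cohort numbers are hypothetical. The simple survival ratio also assumes that every patient is followed for the full period. When patients are lost to follow-up, the life-table method mentioned above is needed instead.

```python
def survival(newly_diagnosed, deaths):
    """S = (A - D) / A: proportion of newly diagnosed cases alive at the end of follow-up."""
    return (newly_diagnosed - deaths) / newly_diagnosed


def case_fatality(deaths, diagnosed):
    """CF = D / A: proportion of diagnosed patients who die of the disease."""
    return deaths / diagnosed


# Hypothetical cohorts, numbers invented for illustration.
print(f"Survival:      {survival(200, 182):.2f}")       # 200 diagnosed, 182 deaths
print(f"Case fatality: {case_fatality(30, 200):.2f}")   # 30 deaths among 200 diagnosed
```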
Below are some of the commonly occurring measures
of Natality (measures concerning birth-related events), Morbidity
(measures concerning illness-related events) and Mortality (measures
concerning death-related events).
Rate or ratio | Numerator (x) | Denominator (y) | Expressed per number at risk
Death rate | Total number of deaths reported during a specified time interval. | Estimated mid-interval population. | per 1,000, per 10,000, per 100,000
Birth rate | Number of LIVE births reported during a specified time interval. | Estimated mid-interval population. | per 1,000
Fertility rate | Number of live births reported during a specified time interval from mothers 15-44 years. | Estimated number of women in age group 15-44 at mid-interval. | per 1,000
Low birth weight ratio (This is really a proportion) | Number of live births under 2,500 grams (5 1/2 lb.) during a specified time interval. | Number of LIVE births reported during the same time interval. | per 100
Incidence rate | Number of NEW cases of a specified disease reported during a specified time interval. | Estimated mid-interval population at risk. | per 100, per 1,000, etc.
Attack rate (WARNING! This is really a proportion) | Number of new cases of a specified disease reported during a specified time interval. | Susceptible population at risk during the same time interval. | per 100, per 1,000, etc.
Point prevalence ratio | Number of current cases of a disease existing at a specified point in time. | Estimated population at risk at the same point in time. | per 100, per 1,000, per 10,000, etc.
Period prevalence ratio | Number of current cases of a specified disease during a specified time interval. | Estimated mid-interval population at risk. | per 100, per 1,000, etc.
Proportionate mortality ratio (PMR) | Number of DEATHS assigned to a SPECIFIC cause. | TOTAL number of deaths from ALL causes reported during the same interval. | per 100 or 1,000
Infant mortality rate | Number of deaths UNDER 1 yr. of age reported during a specified time interval, usually a calendar year. | Number of LIVE births reported during the same time interval. | per 1,000
Fetal death rate | Number of fetal deaths of 28 weeks or more gestation reported during a specified time interval, usually a calendar year. | Number of fetal deaths of 28 weeks or more gestation reported during the same time interval PLUS the number of live births occurring during the same time interval. | per 1,000
Fetal death RATIO (often confused with the above) | Number of fetal deaths of 28 weeks or more gestation reported during a specified time interval. | Number of live births reported during the same time interval. | per 1,000
Cause-specific death rate | The number of deaths assigned to a SPECIFIED CAUSE during a specified time interval. | Mid-interval population. | per 100,000
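The table entries are all some numerator divided by some denominator and scaled to a base. The sketch below applies that pattern to the fetal death rate and ratio, which are often confused. The helper name `rate_per` and all of the counts are hypothetical.

```python
def rate_per(numerator, denominator, base=1000):
    """Express numerator / denominator per `base` (e.g., per 1,000 or per 100,000)."""
    return numerator / denominator * base


# Hypothetical calendar year, numbers invented for illustration.
live_births = 10_000
fetal_deaths = 50       # fetal deaths of 28 weeks or more gestation
infant_deaths = 80      # deaths under 1 year of age

infant_mortality_rate = rate_per(infant_deaths, live_births)
fetal_death_rate = rate_per(fetal_deaths, fetal_deaths + live_births)
fetal_death_ratio = rate_per(fetal_deaths, live_births)

print(f"Infant mortality rate: {infant_mortality_rate:.2f} per 1,000 live births")
print(f"Fetal death rate:      {fetal_death_rate:.2f} per 1,000")
print(f"Fetal death ratio:     {fetal_death_ratio:.2f} per 1,000")
```

The rate's denominator includes the fetal deaths themselves. For the same counts, the fetal death rate is therefore always slightly smaller than the fetal death ratio.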

Additional terms used in the medical literature.
Ratio
Proportion
Rate
Some things from your past:



If the value of the Fraction is GREATER than 1, then
the NUMERATOR is ____________________ the DENOMINATOR.
If the value of the Fraction is LESS than 1, then
the NUMERATOR is ____________________ the DENOMINATOR.
If the value of the Fraction is equal to 1, then
the NUMERATOR is ____________________ the DENOMINATOR.
Consider the following statement:
Of the people who turned in the Biographical Sheet
at the first lecture, __ were men and __ were women.
What is the ratio of MEN to WOMEN?
What is the ratio of WOMEN TO MEN?
What proportion of the responders were
WOMEN?
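The three questions above differ only in what goes into the denominator. The class counts are left blank in the text, so the sketch below uses hypothetical numbers (40 men, 60 women).

```python
men, women = 40, 60  # hypothetical counts; substitute the actual class numbers

ratio_men_to_women = men / women          # ratio: the numerator is not part of the denominator
ratio_women_to_men = women / men
proportion_women = women / (men + women)  # proportion: the numerator is part of the denominator

print(f"men:women   = {ratio_men_to_women:.2f}")
print(f"women:men   = {ratio_women_to_men:.2f}")
print(f"prop. women = {proportion_women:.2f}")
```

A proportion can never exceed 1, but a ratio can, as the women-to-men figure shows.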
Consider the following statement:
The crude death rate in Florida is 10.9, the crude
death rate in Alaska is 4.4. Therefore, if you want to live longer, go
to Alaska.
We must be able to account for variables that
affect the DEPENDENT variable of a study. The phrase you will see in the
literature is ADJUSTED FOR. The most common variables ADJUSTED FOR are
age, sex, and race. This technique SHOULD be used any time you are COMPARING
two groups that differ on a variable (like age) that is related to your
outcome.
There are several techniques that one can use
to ADJUST FOR a variable. One is a statistical
technique called analysis of covariance, which we will not discuss; another is called
DIRECT ADJUSTMENT and is used when reporting certain indices of health.
The following examples illustrate the difference
between unadjusted (crude) and adjusted figures and how a direct standardization
is done.





[Table: Infants born, by birth weight (<1500 g, 1500-2499 g, >2499 g, Total); N in 1000s]






Crude Infant mortality rate for Country A =
Crude Infant mortality rate for Country B =
Is this reasonable, or is there something that
confuses the comparison?
Weight-specific rates for the developed country:
<1500 g = (870/20,000) = .0435 = 43.5 per 1000
1500-2499 g = (480/30,000) = .016 = 16 per 1000
>2499 g =
We now need a common WEIGHT distribution linking the
two countries. For simplicity, let us use the weight distribution of
Country A (the developed country). The question now becomes:
If the developing country had the same weight distribution
as the developed country, how would the death rates compare? THIS COMMON
DISTRIBUTION ALLOWS FOR AN "EQUAL" COMPARISON.
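Adjusted Rate:
$$\text{Adjusted rate} = \frac{(62 \times 20) + (20 \times 30) + (9 \times 150)}{(20 + 30 + 150) \times 1000} = \frac{3190}{200{,}000} = .01595 = 15.95 \text{ per } 1000$$
Here 62, 20, and 9 are the developing country's weight-specific death rates per 1000. The values 20, 30, and 150 are Country A's births (in 1000s) in each weight group, so each product in the numerator is an expected number of deaths.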
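Direct adjustment applies one population's stratum-specific rates to a standard population's counts and divides by the standard total. A minimal Python sketch of that calculation for the birth-weight example follows. It assumes that rates are per 1,000 and that the standard counts are individual births, not thousands. `direct_adjusted_rate` is a hypothetical helper name.

```python
def direct_adjusted_rate(specific_rates, standard_counts, base=1000):
    """Directly standardized rate.

    specific_rates: stratum-specific rates per `base` in the population being adjusted
    standard_counts: number of people in each stratum of the standard population
    Returns (adjusted_rate_per_base, expected_events_in_standard_population).
    """
    expected = sum(r * n / base for r, n in zip(specific_rates, standard_counts))
    return expected / sum(standard_counts) * base, expected


# Developing country's weight-specific rates per 1,000: <1500 g, 1500-2499 g, >2499 g.
rates = [62, 20, 9]
# Country A's births in the same weight groups (the standard distribution).
standard_births = [20_000, 30_000, 150_000]

adjusted, expected_deaths = direct_adjusted_rate(rates, standard_births)
print(f"Expected deaths: {expected_deaths:.0f}")
print(f"Adjusted rate:   {adjusted:.2f} per 1,000")
```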
Table 1. Calculation of the Age-adjusted Mortality Rates from All Causes by the Direct Method: United States, 1950 and 1960.
Columns (1)-(2): Mortality from All Causes per 100,000 Population. Column (3): Standard Population: Total U.S. Enumerated Population, 1940, per 1,000,000. Columns (4)-(5): Expected Number of Deaths that would Occur in Standard Population at Rates in 1950 and 1960.
Age Group (Years) | 1950 (1) | 1960 (2) | 1940 (3) | 1950 (4) | 1960 (5)
<1 | 3,299.2 | 2,696.4 | 15,343 | 506.2 | 413.7
1-4 | 139.4 | 109.1 | 64,718 | 90.2 | 70.6
5-14 | 60.1 | 46.6 | 170,355 | 102.4 | 79.4
15-24 | 128.1 | 106.3 | 181,677 | 232.7 | 193.1
25-34 | 178.7 | 146.4 | 162,066 | 289.6 | 237.6
35-44 | 358.7 | 299.4 | 139,237 | 499.4 | 416.9
45-54 | 853.9 | 756.0 | 117,811 | 1,006.0 | 890.7
55-64 | 1,901.0 | 1,735.1 | 80,294 | 1,526.4 | 1,393.2
65-74 | 4,104.3 | 3,822.1 | 48,426 | 1,987.5 | 1,850.9
75-84 | 9,331.1 | 8,745.2 | 17,303 | 1,614.6 | 1,513.2
85+ | 20,196.9 | 19,857.5 | 2,770 | 559.5 | 550.4
Total death rate, all ages | 963.8 | 954.7 | | |
Total population | | | 1,000,000 | |
Total expected number of deaths | | | | 8,414.5 | 7,609.7
Age-adjusted death rate per 100,000 | | | | 841.45 | 760.97
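The same direct method scales to Table 1, where the rates are per 100,000 and the standard population totals 1,000,000. The sketch below recomputes the 1950 column (4) total and the age-adjusted rate from columns (1) and (3). To repeat the calculation for 1960, substitute column (2).

```python
# Age-specific death rates per 100,000 for 1950 (column 1 of Table 1), youngest to oldest.
rates_1950 = [3299.2, 139.4, 60.1, 128.1, 178.7, 358.7,
              853.9, 1901.0, 4104.3, 9331.1, 20196.9]
# 1940 U.S. standard population per 1,000,000 (column 3 of Table 1).
standard_1940 = [15343, 64718, 170355, 181677, 162066, 139237,
                 117811, 80294, 48426, 17303, 2770]

# Expected deaths in each age group if the standard population experienced 1950 rates.
expected = [r * n / 100_000 for r, n in zip(rates_1950, standard_1940)]
total_expected = sum(expected)
adjusted_rate = total_expected / sum(standard_1940) * 100_000

print(f"Expected deaths:   {total_expected:.1f}")
print(f"Age-adjusted rate: {adjusted_rate:.2f} per 100,000")
```

The result can be compared with the crude 1950 rate of 963.8 per 100,000 in the table.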
In trying to determine the etiology of illness (diagnosis),
it is often necessary to use the laboratory for additional information.
This is so common that many practicing physicians and other health professionals
take the tests for granted. Two points MUST be kept in mind:
1. The laboratory test can (should) only CONFIRM
the diagnosis.
2. BASIC ASSUMPTION is that the lab test can be trusted.
For your own benefit and that of your patient check out:
a. Is anything going on that would alter the lab
test:
1. Medications that conflict with the lab test?
2. Comorbidity that will interfere with the lab test?
3. Proper procedure followed by the PATIENT, PHYSICIAN,
LAB PERSONNEL?
b. Is the test valid?
Validity is loosely defined as "appropriate for the task". That is to say, is the right test being ordered for the question you have in mind? Should you do a brain biopsy for suspected diabetes? If you determine that the test is appropriate, can you trust the results? This question of trust is what makes the notion of sensitivity and specificity important.
The table below is the reference point for all the
terms that follow:
 | Disease + | Disease - | Total
Test + | a | b | a + b
Test - | c | d | c + d
Total | a + c | b + d | a + b + c + d
1. Sensitivity:
The ability to correctly identify those who have the condition or disease. This translates statistically as THE PROPORTION OF THOSE WITH THE DISEASE WHO TEST POSITIVE:
$$\text{Sensitivity} = \frac{a}{a + c}$$
2. Specificity:
The ability to correctly identify those who DO
NOT HAVE the condition or disease. This translates statistically
as THE PROPORTION OF THOSE WHO DO NOT HAVE THE DISEASE WHO TEST NEGATIVE:
$$\text{Specificity} = \frac{d}{b + d}$$
3. Prevalence:
The proportion of those people in the "at risk" population
who currently have the disease. This translates statistically as
$$\text{Prevalence} = \frac{a + c}{a + b + c + d}$$
4. Positive Predictive Value:
The proportion of those people WHO TEST POSITIVE
who actually have the disease. This translates statistically as
$$PPV = \frac{a}{a + b}$$
5. Negative Predictive Value:
The proportion of those people WHO TEST NEGATIVE
who are actually free of disease. This translates statistically as
$$NPV = \frac{d}{c + d}$$
6. False Positive Rate:
The proportion of those people WHO ARE DISEASE FREE
who have positive tests. This translates statistically as
$$FPR = \frac{b}{b + d}$$
7. False Negative Rate:
The proportion of those people WHO ARE SICK whose
lab test is negative. This translates statistically as
$$FNR = \frac{c}{a + c}$$
Steps to take in determining indices of diagnostic tests
 | Target Disorder + | Target Disorder - | Total
Lab Test + | (4) | (7) |
Lab Test - | (6) | (5) |
Total | (2) | (3) | (1)
STEP:
1 = Arbitrary Sample Size
2 = Prevalence x (1)
3 = (1) - (2)
4 = Sensitivity x (2)
5 = Specificity x (3)
6 = (2) - (4)
7 = (3) - (5)
EXAMPLE: Assume you are looking for a disease
that has a PREVALENCE of 2% in the population of interest. The test you
are going to use has a SENSITIVITY of 90% and a SPECIFICITY of 95%. What
are the PPV and NPV of the test?
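Steps 1-7 can be followed in code for this example. In the sketch, the function name and the default sample size of 10,000 are arbitrary choices. As Step 1 says, the size is arbitrary, and it cancels out of both PPV and NPV.

```python
def predictive_values(prevalence, sensitivity, specificity, n=10_000):
    """Fill in the 2 x 2 table with Steps 1-7, then return (PPV, NPV)."""
    step1 = n                      # (1) arbitrary sample size
    step2 = prevalence * step1     # (2) people with the target disorder
    step3 = step1 - step2          # (3) people without the disorder
    step4 = sensitivity * step2    # (4) true positives
    step5 = specificity * step3    # (5) true negatives
    step6 = step2 - step4          # (6) false negatives
    step7 = step3 - step5          # (7) false positives
    ppv = step4 / (step4 + step7)
    npv = step5 / (step5 + step6)
    return ppv, npv


ppv, npv = predictive_values(prevalence=0.02, sensitivity=0.90, specificity=0.95)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With n = 10,000, the table holds 180 true positives, 490 false positives, 20 false negatives and 9,310 true negatives. A different n changes these counts but not the two predictive values.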
Points to Help Interpret Test Results
1. Sensitivity and Specificity do not depend
on Prevalence of Disease.
2. Predictive value positive and Predictive Value
Negative DO depend on Prevalence of disease.
3. If the prevalence of disease is low (rare) in your patient
population, then most of your positive test results will
be FALSE POSITIVES.
4. If the prevalence of disease is high in your patient
population, then most of your positive test results will be
TRUE
POSITIVES.
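Points 1-4 can be seen directly by holding the test fixed and varying the prevalence. This sketch computes the predictive values with Bayes' theorem written in proportions, which gives the same answer as the 2 x 2 step method. The function names and the prevalence values are illustrative.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity, specificity):
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Same test (sensitivity 90%, specificity 95%) in populations with different prevalence.
for p in (0.001, 0.02, 0.20, 0.50):
    print(f"prevalence {p:>5.1%}: PPV = {ppv(p, 0.90, 0.95):.3f}, "
          f"NPV = {npv(p, 0.90, 0.95):.3f}")
```

Sensitivity and specificity stay at 90% and 95% throughout, but PPV rises and NPV falls as the disease becomes more common.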
Final Comments
While it is desirable to be able to calculate the
various indices of a test, it is critical that you understand their application
to a medical or health situation. As physicians, we want to RULE IN or
RULE OUT disease states or conditions. That is to say, we want to say to
someone YES, you have had a heart attack, or NO, you did NOT have a heart
attack. When presented with a patient, you do a history and physical (to
the extent possible), and while doing so you begin to generate a list of
possible etiologies for the signs or symptoms elicited from the patient.
This is loosely called generating a differential diagnosis. As you go along,
you begin to eliminate explanations from the list. When you have reduced
the list as far as you can, you then appeal to ancillary sources, such as
the laboratory, consultants, or other medical students. The goal
is to reduce the list so that fewer and fewer possible explanations for
the signs, symptoms or hypothesized disease state of the patient remain.
This elimination process is called
RULING OUT etiologies. Those
that remain are ruled in and must be investigated thoroughly. Since
we do not know for sure what the disease is, we might seek additional laboratory
results to help. WE THUS WILL BE IN A POSITION TO KNOW ONLY THE LAB TEST
RESULTS. If the lab test is POSITIVE, we would like to say you have disease
X. The statistical translation of this process is to determine the probability
that someone with a positive test actually has the suspected
disease. This is the positive predictive value of the test. What
we have to agonize over is the fact that, UNFORTUNATELY, some people will
test positive for the disease and yet WILL ACTUALLY NOT HAVE THE DISEASE
IN QUESTION. We will erroneously tell someone YOU HAVE THE DISEASE when
in fact they do not. They were falsely positive, or FALSE POSITIVES.
Similarly, if their test results were NEGATIVE, we would like to declare
them to be FREE OF DISEASE. The probability that someone who tests negative
is really free of disease is called the negative predictive value
of the test. Again, we have to agonize over the fact that some people who test
negative WILL ACTUALLY HAVE THE DISEASE. These people are falsely negative,
or FALSE NEGATIVES. In the 2 x 2 table given in lecture, which precedes
this explanation, the FALSE POSITIVES are indicated by the letter b.
The FALSE NEGATIVES are indicated by the letter c. The cell
identified as a is usually called the TRUE POSITIVES, since
these individuals have POSITIVE test results AND are indeed diseased. Similarly,
the cell identified as d is termed the TRUE NEGATIVES, since
these people have NEGATIVE test results and are indeed free of disease.
To appreciate these terms, let us consider the disease
Acquired Immunodeficiency Syndrome (AIDS). The ultimate definition of
the disease rests on certain lab tests as well as physical presentation.
Detecting the precursor of the disease rests on a laboratory test that detects the
presence or absence of antibodies to the virus itself. Because detecting
the antibody is cheaper than growing the virus, and because
one develops an antibody only after exposure to a virus, one infers that
if one is antibody positive, then one has been exposed to the "AIDS VIRUS"
(which is more appropriately referred to as HIV, the Human Immunodeficiency
Virus). If I tell someone YOU are HIV positive, based on a positive lab
test, but the person is really FALSELY POSITIVE, what are the consequences?
We know that the disease engenders panic and often despair in the
individual as well as the public. There are consequences for the person's
sexual partner, potential childbearing, employment, insurance, long-term
plans, and a plethora of social as well as medical disasters. On the other
hand, if the person's test result is negative, and I tell the person they
are not infected, but the person is FALSELY NEGATIVE, what are the consequences?
Certainly false hope is engendered. The person may donate blood, engage
in risky behavior, and, from a social point of view, become an unknowing
risk to society. Thus the consequences of a testing situation must be evaluated
before a decision to run a lab test is made. While I have chosen a real
situation, it is admittedly dramatic. It is easy to forget that the same thought
processes apply even when the test is, as some people say,
routine. There is always the risk to the patient that must be weighed against
the benefit to the patient. Will the test CHANGE YOUR TREATMENT PLAN? If
the answer is NO, then you should not do the test. If the answer is YES,
then at least you have a context within which to interpret the test. There
is much talk about defensive medicine these days. You will hear repeatedly,
"I did the test for legal reasons, not for medical reasons." Studies done
to evaluate this claim estimate that only about 10% of the lab work done
can be termed defensive. The rest may really be unnecessary. You will be
taking a course in medical jurisprudence later in your stay here; ask the
instructor about this figure and for advice on how to proceed.
[Figure: choice of a cutoff point for a positive test. Labels: cutoff point set too low; cutoff point of greater sensitivity; cutoff point of greater specificity; true positives for cutoff point X.]
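The cutoff trade-off in the figure can be explored numerically. The test values below are invented for illustration. The sketch also assumes that higher values are more abnormal and that "positive" means a value at or above the cutoff.

```python
# Hypothetical test values; "positive" means value >= cutoff.
diseased = [6, 7, 8, 9, 10]
healthy = [2, 3, 4, 5, 6, 7]


def sens_spec(cutoff):
    """Sensitivity and specificity of the test at a given cutoff."""
    sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
    specificity = sum(v < cutoff for v in healthy) / len(healthy)
    return sensitivity, specificity


for cutoff in (5, 6, 7, 8):
    sens, spec = sens_spec(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cutoff catches more of the diseased, which raises sensitivity. It also calls more healthy people positive, which lowers specificity. Raising the cutoff does the reverse.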
The term VARIABLE implies something that changes. How much does it change? This
implies something that is:
1. Measurable
2. Discernible (or Observable)
Independent Variable:
Those features that describe or distinguish individuals or groups of individuals
prior
to the start of the study: AGE, SEX, PRIOR LAB VALUES, EXPOSURE TO A DRUG,
ABSENCE OF EXPOSURE, ETC. Statistical texts usually define these
as the variables under the control of the investigator.
This is true if you understand that it means YOU can choose how to categorize
or pick patients for a study.
Dependent Variable:
That which serves to assess the OUTCOME of the study.
Synonyms for Independent Variables: predictors, precursors, "cause"
Synonyms for Dependent Variables: predicted, outcome, result, consequent,
effect
Example:
1) Married men will incur more physician visits than single men.
Independent variable ___________________________________________
Dependent variable ____________________________________________
Marital status → # of physician visits
# of physician visits → Marital status
2) What is the relationship between menstrual cycle phase at the time
of surgery and the onset of the first postoperative menses?
Independent variable __________________________________________
Dependent variable ___________________________________________
1. Nominal (Attribute, Qualitative)
Naming, categorical
a. Dichotomous
b. Polychotomous
2. Ordinal (Qualitative)
Ranked or ordered along some property of the variable.
Example: Stages of Cancer
Stage I
Stage II
Stage III
Example: Satisfaction with Personal Physician.
1. Very Satisfied
2. Somewhat Satisfied
3. Somewhat Unsatisfied
4. Very Unsatisfied
3. Interval (Quantitative, Continuous)
"Ratio" = Interval + a real (or meaningful) zero. This course will
treat ratio data as interval data. From here on, we will also use the term
ratio for results formed when one number is divided by another.
For purposes of this course and for general medicine, INTERVAL
will be the most precise form of measurement we will need. You can think
of this as forming an interval by subtracting the value of one end of a
spectrum from the opposite end of the spectrum in question. Consider the
variable temperature. A temperature of 101°F yesterday and a
current temperature of 98°F today give an interval of 3
degrees. The INTERVALS between successive values on the scale are EQUAL. There
are two things implicit in this level of measurement:
a. The variable is continuous. That is to say, we can be as
precise as we want in the measurement of the variable.
b. An interval of a given length is interpretable no matter where it
is on the scale.
Example: Temperature, Height, Weight.
Example: Exercise levels.
0. No exercise
1. Moderate exercise, no sweating.
2. Exercise to the point of sweating.
3. Strenuous exercise to the point of sweating 30 minutes a day.
4. Strenuous exercise for at least 1 hour per day.
Teaching points to remember for LEVELS OF MEASUREMENT
1. All categories of measurement must be mutually exclusive.
2. All categories of measurement must be jointly exhaustive.
3. Levels of measurement are a "one-way street".
Independent Variables should temporally and logically precede Dependent
Variables.
Measures of Central Tendency
A single number that best represents a group of observations.
For the definitions below consider the following example:
A group of children are brought to an emergency room after a flood.
Their ages are: 1,1,1,6,4,6.
MODE
The most frequently occurring number in a series of numbers. There may
be NO mode (all numbers occur only once), or there may be several modes
(more than one number occurs with greater frequency than all the other
numbers).
The mode is generally used for nominal data. It is the quickest calculation
to make.
Calculate by noting the frequency with which each value in a series
occurs.
In the example above, the MODE = 1.
MEDIAN
This is a number which divides the frequency of observations into two
equal parts. This number may be an actual observation, or a contrived one.
The median is generally used for ordinal data. It is easy to calculate,
but I suggest the following approach:
1. Arrange the numbers in ORDER from Low to High.
2. If the total number of observations is EVEN
add the two middle numbers together and divide by 2.
If the total number of observations is ODD
the median is the middle number.
Consider the example above:
Step 1. Order the numbers from Low to High
1,1,1,4,6,6
Step 2. Is there an ODD or EVEN number of observations? There are six
observations, so we add the two middle numbers together (1 + 4) and divide
by 2. Thus the median is 2.5. Here the median is a CONTRIVED number in
the sense that it is created and never appears as a real observation.
If there had been an ODD number of observations, for example:
1,1,6,4,6
We would again ORDER the set of observations:
1,1,4,6,6
and then choose the number that divides the set in half. In this example
the median would be 4, since 2 observations lie to the left of 4 and 2
observations lie to the right of 4.
You have encountered the MEDIAN before, perhaps not by name, but by
application. The 50% mark divides a group of observations in half; thus
it is the median. The next time you see a grade sheet, the score that
divides the class in half (half the marks above it, half below) is the
median mark. You can judge whether you are above or below the median by
comparing your mark to this reference point.
The big advantage is that the median is unaffected by extreme scores.
For example consider the following set of observations:
1,1,4,6,6000
The median is still 4, since there are 2 observations to the left of
4 and two observations to the right of 4.
MEAN
This is usually taken to be the arithmetic mean. It is defined as the
sum of all the observations divided by the number of observations in the
series.
It is most often used with INTERVAL data.
The biggest problem with the MEAN is that it is markedly affected by
extreme scores.
In the example above, the mean is calculated as follows:
$$\sum x = 1 + 1 + 1 + 4 + 6 + 6 = 19$$
N = 6, since there are 6 observations. Thus the mean is equal to 19/6
= 3.17. Notice that this number is contrived: there is no 3.17 among
our observations. It is nonetheless the best single number to represent the average
of the observations.
As a point of emphasis, consider the following set of observations:
1,1,2,6. The mean is 10 divided by 4 = 2.5
1,1,2,6000. Here the mean is 6004 divided by 4 = 1501. Note how the
mean is markedly affected. IT MOVES IN THE DIRECTION OF THE EXTREME SCORE.
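Python's standard `statistics` module can check the central-tendency calculations for the flood example. The sketch assumes Python 3.8 or later, which is required for `statistics.multimode`.

```python
import statistics

ages = [1, 1, 1, 6, 4, 6]  # the flood example above

print("mode:  ", statistics.multimode(ages))  # all most-frequent values, as a list
print("median:", statistics.median(ages))     # even n: average of the two middle values
print("mean:  ", round(statistics.mean(ages), 2))

# Extreme scores: the median stays put, while the mean moves toward 6000.
print(statistics.median([1, 1, 4, 6, 6000]), statistics.mean([1, 1, 2, 6000]))
```

`statistics.median` sorts the data itself, so the values do not need to be put in order first.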
Measures of Dispersion
Range
The lowest number in a series subtracted from the highest number in
the series.
It is quick to compute but of limited utility. It may be used to calculate
sample size.
For the series 1,1,1,4,6,6, the range is 6 - 1 = 5.
Variance
This is one of the most commonly used terms to describe the spread of
a set of observations. It is calculated by formulae in your readings. I
will not ask you to calculate it for this class.
Interpretation:
A. Large variance
Wide range of numbers
Number of people or observations in the study is small
B. Small variance
Narrow range of numbers
Number of people or observations in the study is LARGE
BIGGEST APPLICATION IS IN THE COMPUTATION OF THE
STANDARD DEVIATION.
STANDARD DEVIATION
This is the positive SQUARE ROOT of the Variance. Again I will not ask
you to calculate this for the class. The interpretation is important, however.
Application:
In certain circumstances (for example, when the observations are roughly normally distributed), it determines the NUMBER of OBSERVATIONS lying within a given distance of the mean.
Accuracy of a set of readings
Stability
Range of numbers
Number of observations
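You will not be asked to compute these quantities, but a short sketch may make the definitions concrete. The Python code below is an illustration only. It uses the standard-library `statistics` module on the series 1, 1, 1, 4, 6, 6 from the range example. `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator; whether this matches the formula in your readings is an assumption.

The code also shows, for a normal distribution, the proportion of observations expected within 1, 2 and 3 standard deviations of the mean. This is the sense in which the standard deviation determines the number of observations around the mean.

```python
from statistics import NormalDist, stdev, variance

values = [1, 1, 1, 4, 6, 6]

# Sample variance: sum of squared deviations from the mean divided by (n - 1).
var = variance(values)
# Standard deviation: positive square root of the variance.
sd = stdev(values)
print(f"variance = {var:.2f}, standard deviation = {sd:.2f}")

# For a normal distribution, the fraction of observations within k SDs of the mean.
std_normal = NormalDist()  # mean 0, SD 1
for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD of the mean: {within:.1%}")
```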
Table 2. Summary of the Characteristics of 93 Patients Admitted to the Hospital for Suspected MI
  (N = 43)  (N = 50)
No. of patients assigned to CCU  29  22
No. of patients assigned to floor  14  28
No. of patients with MI
No. of patients without MI
Age, years (± SD)
HDPI score (± SD)
CCU denotes cardiac care unit; MI, myocardial infarction; HDPI, Acute Ischemic Heart Disease Predictive Instrument.
Source: Green and Ruffin, MI Treatment in Men vs. Women,
The Journal of Family Practice, Vol. 36, No. 4, 1993
COEFFICIENT OF VARIATION
This is calculated by dividing the standard deviation
of a set of observations by the mean of the observations.
This is a quick way of comparing the dispersion of
two different sets of observations.
Males  Females
Mean =  Mean =
s.d. =  s.d. =
Coef. of Variation:
Males =
Females =
This is only an exercise! We would use the
C.V. only when the variable of interest is measured on different
scales in two different studies.
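A minimal Python sketch of the calculation follows. As assumed example inputs, it borrows the male and female height summaries (mean and standard deviation, in inches) from Example 1 later in these notes. The exercise above may have intended different numbers. The C.V. is often multiplied by 100 and reported as a percentage.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Return the coefficient of variation: standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("the C.V. is undefined when the mean is 0")
    return sd / mean

# Height summaries (inches) from Example 1, used here only as illustrative inputs.
cv_males = coefficient_of_variation(sd=2.74, mean=70.42)
cv_females = coefficient_of_variation(sd=3.248, mean=64.97)

print(f"C.V. males   = {cv_males:.3f} ({cv_males:.1%})")
print(f"C.V. females = {cv_females:.3f} ({cv_females:.1%})")
```

Because the units cancel, the C.V. has no units. The same comparison would therefore work if one study had recorded height in centimetres and the other in inches.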
Researchers in the medical field may be motivated by different factors,
but it is undeniable that people anticipate the results of their investigations.
This anticipated result, sometimes vague, leads one to inquire systematically,
that is, to initiate a set of activities which, hopefully, will lead to an
unbiased determination of whether the ideas have merit or not. The basic
ground rule for entering into these activities is usually that the initial
idea is testable. This means it is capable of being shown true
or false by empirical data. This initial notion is sometimes called the
SCIENTIFIC or RESEARCH HYPOTHESIS. It is really what the anticipated results might
look like. The scientific hypothesis (or hypotheses) is then translated into a SET
of specific statements called STATISTICAL HYPOTHESES, which are
then evaluated by statistical techniques. The plausibility of these statistical
hypotheses, decided after this evaluation process, gives credence to the
research hypothesis. This evaluation process is called
STATISTICAL TESTING.
The following are examples of scientific hypotheses:
A) Early detection of breast cancer will increase the proportion of
women who survive 5 years.
B) ACE inhibitors will have fewer side effects in hypertensive African-American
males.
C) Combination therapy in HIV-positive patients will persistently decrease
viral load during the period of administration.
Notice that these statements reflect only the anticipated findings of
the research. There may be no mention of comparison groups, etc., although
these can certainly be included. These details must be attended to in the
translation of these ideas into statistical hypotheses. For example:
Example: Lumpectomy vs. Lumpectomy plus radiation
Scientific hypothesis: Among women with ductal carcinoma in situ
who undergo a lumpectomy alone or lumpectomy plus radiation, a difference
will exist between the proportions experiencing tumor recurrence in the
treated breast within five years after treatment.
Statistical Hypotheses: Eight hundred women with ductal carcinoma in
situ were sampled and assigned randomly to have lumpectomy alone or
lumpectomy plus radiation. Within 5 years, 56 of the 400 women who had
lumpectomy alone had recurrence of cancer in the treated breast; 16 of
400 women who had lumpectomies plus radiation had recurrence in the treated
breast. Let π represent the proportion of women who experience tumor
recurrence, L indicate lumpectomy alone, and L/R designate
lumpectomy plus radiation. The STATISTICAL HYPOTHESES are:
H_{0}: π_{L/R} - π_{L} = 0
H_{1}: π_{L/R} - π_{L} ≠ 0
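The notes do not say which test was used to evaluate these hypotheses. As an assumption, the sketch below applies a common large-sample choice, the two-proportion z-test with a pooled standard error. It uses the 56/400 and 16/400 recurrence counts and only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test of H0: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # estimate of the common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# First group: lumpectomy plus radiation (L/R); second group: lumpectomy alone (L).
z, p = two_proportion_z_test(16, 400, 56, 400)
print(f"recurrence: L/R = {16/400:.2f}, L = {56/400:.2f}")
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

Under this assumed test, the p-value is far below conventional α levels such as 0.05, so H_{0} would be rejected.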
I. Decision Making
a. Clinical Situation: Choose between alternatives.
1. At least 2 alternatives: Not Sick or Sick.
2. Not Treat or Treat.
b. Literature: same notion.
1. No difference between Drug A and Drug B, or Drug A is different from
Drug B.
2. Cases no different from Controls, or they are different.
3. Experimental group is the same as the comparison group, or it is different.
4. Exposed group is not different from the unexposed group, or it is.
II. Sampling
a. In the clinical situation or the literature, conclusions are based
on sampling.
e.g., HTN on one visit: is the person really hypertensive?
THE UNDERLYING CONCERN IS ALWAYS WHETHER OUR SAMPLES TRULY REFLECT
THE TRUE STATE OF AFFAIRS, for it is from these samples that we will
infer what goes on in the population from which these samples are theoretically
obtained.
III. Hypotheses: Something to be shown correct or incorrect.
a. From a literature (as well as clinical) point of view, there are two statements:
* The groups do not differ from each other (they are equal to
each other; the apparent difference is due to chance).
** The groups in fact differ from each other (something is going on;
it is due to something other than chance).
b. Convention calls the statement of NO DIFFERENCE the NULL HYPOTHESIS
(H_{0}).
c. The alternative statement is called the ALTERNATIVE HYPOTHESIS (H_{1}).
d. Mathematically:
H_{0}: μ_{1} = μ_{2} (equivalently, μ_{1} - μ_{2} = 0)
H_{1}: μ_{1} ≠ μ_{2}
Explicit Null Hypotheses Accompanying Decision Situations^{*}  
Decision Situation  Null Hypothesis 
Diagnostic Tests  This patient's test is no different from the test of the group called well.
Clinical Trials  This experimental treatment is no different from the treatment it is compared with.
Quality Control  This batch of production is no different from the usual high-quality products of this company.
Patient Satisfaction  This patient is no different from those who have benefitted from this therapy in the past.
Judicial  This defendant is no different from the group of people whom we call not guilty.
Graduate education  This candidate for graduate school is no different from those who have succeeded in the past.
Used cars  This car is no different from those that have proved dependable in the past. 
Connubial  This spouse is no different from faithful spouses. 
^{*} No different indicates having no important difference.
a. You always enter the situation ASSUMING THE NULL HYPOTHESIS IS
TRUE (i.e., there is no relationship between the variables: the difference
is equal to 0, or the ratio is equal to 1).
b. The p-value is a measure of the compatibility of YOUR data with the
NULL hypothesis.
THUS
If the p-value is large, "accept" the Null Hypothesis.
If the p-value is small, accept the Alternative Hypothesis.
c. Large vs. small is YOUR DECISION. This is referred to as the α level.
Think of this as the point BEYOND REASONABLE DOUBT.
d. For purposes of this course, the formal definition of a p-value is
the probability of obtaining a result as large as or larger than the one you
observed, assuming the null hypothesis is true.
e. Type I error.
Because the p-value is a probability (a proportion), it can range from
0 to 1. It is calculated on the assumption that the null hypothesis is
true. Thus you are saying: I assume the null hypothesis is TRUE, but the
finding I see is so weird that I am going to conclude the null hypothesis
is false and accept the alternative. IF THE NULL HYPOTHESIS REALLY IS TRUE, IT IS A MISTAKE TO DO SO, BUT YOU
DO IT ANYWAY, because your subjective estimate of reasonable doubt has been exceeded.
Thus you have made an error: YOU have rejected a true null hypothesis.
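The link between the α level and Type I error can be illustrated by simulation. The sketch below is an illustration built on assumptions, not a method from these notes:

- It repeatedly draws two groups from the same normal population, so the null hypothesis is true by construction.
- It applies a z-test with a known standard deviation of 1.
- It counts how often the null is rejected at α = 0.05.

The rejection rate should come out close to 5%.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(trials: int = 10_000, n: int = 20, alpha: float = 0.05,
                        seed: int = 1) -> float:
    """Fraction of trials that reject a TRUE null hypothesis of equal means."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    se = sqrt(1 / n + 1 / n)                          # population SD is 1 in both groups
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]  # same population: H0 is true
        z = (fmean(group_a) - fmean(group_b)) / se
        if abs(z) > critical:
            rejections += 1                           # each rejection is a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Proportion of true null hypotheses rejected: {rate:.3f}")
```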
V. How the Statistical Testing Process proceeds.
1. State the hypothesis of what you think will happen.
2. Generate the NULL hypothesis.
3. Decide on your α level.
4. Collect data.
5. Apply a statistical test.
6. Decide on the truth status of the NULL Hypothesis.
a. Point Estimation
b. INTERVAL Estimation
Depends on 3 things:
1. Sample size
2. Sample variability
3. α level
c. Length of the interval
d. Decision Rule:
If the confidence interval contains THE NULL VALUE (0 for a difference,
such as a difference in means, or 1 for a ratio, such as a relative risk
or odds ratio), then DO NOT reject H_{0}.
If the confidence interval DOES NOT CONTAIN THE NULL VALUE, then
REJECT THE NULL HYPOTHESIS.
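A minimal Python sketch of both ideas follows, under stated assumptions:

- The first function computes a large-sample interval for a mean as mean ± z × SD/√n. This normal-approximation formula is chosen here for illustration. It lets you see the width shrink with larger samples, grow with more variability, and grow with a stricter α level.
- The second function applies the decision rule above. It is checked against a few intervals reported in Table A and Table 6 below.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean: float, sd: float, n: int,
                             alpha: float = 0.05) -> tuple[float, float]:
    """Large-sample interval for a mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

def reject_null(lower: float, upper: float, null_value: float) -> bool:
    """Decision rule: reject H0 only if the interval does NOT contain the null value."""
    return not (lower <= null_value <= upper)

# Width depends on sample size, variability, and alpha (illustrative inputs).
for n, sd, alpha in [(25, 10, 0.05), (100, 10, 0.05), (100, 20, 0.05), (100, 10, 0.01)]:
    lower, upper = mean_confidence_interval(50, sd, n, alpha)
    print(f"n={n:3d}, sd={sd}, alpha={alpha}: width = {upper - lower:.2f}")

# Ratio measures (null value 1) taken from Table A and Table 6.
print(reject_null(0.7, 3.0, 1))    # caffeine 1-150 mg/day: False (do not reject)
print(reject_null(1.1, 5.2, 1))    # caffeine 151-300 mg/day: True (reject)
print(reject_null(0.93, 2.86, 1))  # rural hospitals, Table 6: False (do not reject)
```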
Table A
Effects of caffeine consumption and other risk factors on low birth
weight according to logistic regression for term deliveries, Yale-New Haven
Hospital, 1980-1982
Parameter  Adjusted relative risk  95% CI^{*}  p value
Caffeine intake^{†} (mg/day)
1-150  1.4  0.7-3.0  0.33
151-300  2.3  1.1-5.2  0.04
≥301  4.6  2.0-10.2  0.0004
Nonwhite ethnicity^{‡}  4.0  2.4-6.6  0.0000
Parity 0^{§}  2.0  1.2-3.3  0.007
Cigarette smoking^{||}  1.7  1.1-2.9  0.02
Gestational age^{¶}  5.6  4.3-7.2  0.0000
^{*} 95% CI (categorical) = exp[β ± 1.96(SE)]; 95% CI (continuous) = exp[βX_{1} ± 1.96(SE)X_{1}] /
exp[βX_{0} ± 1.96(SE)X_{0}], where β is the logistic regression coefficient, X_{1} is the value of interest of the variable, and X_{0} the reference value.
^{†} Reference category is 0 mg/day.
^{‡} Black and other compared with white ethnicity to calculate relative risk.
^{§} Compared with parity 1 or more to calculate the relative risk.
^{||} One or more cigarettes/day compared with none to calculate the relative risk.
^{¶} Continuous variable.
Thirty-seven weeks compared with 40 weeks gestation to calculate the relative
risk.
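The categorical footnote formula, 95% CI = exp[β ± 1.96(SE)], can be run in reverse to see how the reported numbers fit together. In the sketch below, the code:

1. Recovers β and SE for the 151-300 mg/day caffeine category.
2. Rebuilds the interval.
3. Computes an approximate Wald p-value.

This is an illustration. β and SE are not reported in Table A, so back-calculating them from the rounded relative risk and interval is an assumption. Because the published figures are rounded, the results only approximately match the table.

```python
from math import exp, log
from statistics import NormalDist

Z95 = 1.96

def ci_from_coefficient(beta: float, se: float) -> tuple[float, float]:
    """95% CI for a relative risk from a logistic coefficient: exp(beta +/- 1.96*SE)."""
    return exp(beta - Z95 * se), exp(beta + Z95 * se)

def se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate SE from a reported 95% CI on the ratio scale."""
    return (log(upper) - log(lower)) / (2 * Z95)

# Table A, caffeine 151-300 mg/day: RR 2.3, 95% CI 1.1-5.2, reported p = 0.04.
beta = log(2.3)
se = se_from_ci(1.1, 5.2)
lower, upper = ci_from_coefficient(beta, se)
p_wald = 2 * (1 - NormalDist().cdf(abs(beta / se)))

print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"rebuilt 95% CI: {lower:.2f}-{upper:.2f}")
print(f"approximate Wald p = {p_wald:.3f}")
```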
Table 3. Patient Characteristics and Site of Care by Race at Hospitalization for Angiography.  
Characteristic  Whites, %  Blacks, %  p* 
Sociodemographic factors  
Female  37.6  52.0  <.001 
Medicaid eligible  3.8  23.3  <.001 
Principal diagnosis  
Myocardial infarction  19.0  22.5  .003 
Unstable angina  25.0  27.0  ns 
Angina pectoris  11.5  13.1  ns 
Chronic ischemia  44.5  37.4  <.001 
Secondary diagnoses  
Congestive heart failure  7.7  11.9  <.001 
Diabetes mellitus  14.2  26.2  <.001 
Chronic renal failure  0.7  2.8  <.001 
Peripheral vascular disease  3.4  4.1  ns 
Cerebrovascular disease  3.6  2.7  ns 
Chronic obstructive lung disease  8.0  5.4  .002 
Type of hospital  
Public  9.6  15.1  <.001 
Teaching  66.1  71.0  .001 
Urban/suburban  92.7  90.2  .002 
Revascularization procedures available  84.2  77.7  <.001 
^{*}χ^{2} test; ns indicates not significant.
Table 4. Unadjusted Rates of Revascularization Procedures Within 90 Days After Angiography, Stratified by Race and Type of Hospital.  
Type of Hospital Where Angiography Performed  Whites, %  Blacks, %  Relative Risk  90% Confidence Interval
Public  55.4  35.2  1.58  1.28-1.94
Private  53.9  37.5  1.44  1.32-1.56
Teaching  54.6  37.1  1.47  1.34-1.62
Nonteaching  52.9  37.3  1.42  1.22-1.64
Urban/suburban  54.1  37.0  1.46  1.35-1.59
Rural  53.9  38.3  1.41  1.10-1.80
Revascularization procedures available  56.0  39.7  1.41  1.30-1.54
Revascularization procedures not available  43.4  28.3  1.53  1.25-1.88
Table 5. Significant Multivariate Predictors of Revascularization Procedures Within 90 Days After Angiography.*  
Variable  Adjusted Odds Ratio  95% Confidence Interval
Sociodemographic factors
White  1.78  1.56-2.03
Male  1.28  1.22-1.35
Medicaid eligibility  0.80  0.71-0.91
Principal diagnosis†
Myocardial infarction  2.14  1.94-2.35
Unstable angina  2.78  2.54-3.04
Chronic ischemia  2.17  1.99-2.35
Secondary diagnoses
Congestive heart failure  0.76  0.69-0.83
Peripheral vascular disease  0.74  0.65-0.85
Cerebrovascular disease  1.20  1.05-1.37
Chronic obstructive lung disease  0.79  0.72-0.87
Type of hospital
Revascularization procedures available  1.63  1.52-1.75
Public  1.11  1.02-1.21
Region‡
Northeast  0.73  0.67-0.80
South  0.72  0.66-0.77
Midwest  0.80  0.74-0.87
* Using logistic regression to adjust for
all listed variables and age, secondary diagnoses of diabetes mellitus
and chronic renal failure, and the teaching status and urban or rural location
of the hospital in which angiography was performed.
† Relative to angina pectoris. ‡ Relative to West. 
Table 6. Adjusted White-to-Black Odds Ratios for Revascularization Procedures Within 90 Days After Coronary Angiography by Type of Hospital.*
Type of Hospital Where Angiography Performed  White-to-Black Odds Ratio  95% Confidence Interval
Public  2.11  1.51-2.95
Private  1.73  1.49-1.99
Teaching  1.84  1.58-2.16
Nonteaching  1.63  1.28-2.08
Urban/suburban  1.79  1.56-2.05
Rural  1.63  0.93-2.86
Revascularization procedures available  1.79  1.55-2.08
Revascularization procedures not available  1.72  1.28-2.32
*Using logistic regression to adjust for age; sex; region of residence; Medicaid eligibility; principal coronary diagnosis; secondary diagnoses of congestive heart failure, diabetes mellitus, chronic obstructive pulmonary disease, chronic renal failure, cerebrovascular disease, and peripheral vascular disease; and the ownership, teaching status, location, and availability of revascularization procedures at the hospital in which angiography was performed. 
Examples of Two Statistical Tests
Example 1. Does the average height of the male medical students
in the class of 1996 differ from the average height of the female medical
students?
  MALES  FEMALES
Mean (inches)  70.42  64.97
Standard deviation  2.74  3.248
Number  53  39
The resulting t value can be interpreted as how many "standard deviations"
(strictly, standard errors) you are from the middle of a distribution which has its center at 0. Based
on this value you conclude that the null hypothesis of NO DIFFERENCE BETWEEN
THE HEIGHTS OF MEN AND WOMEN IN THE CLASS OF 1996 SHOULD BE REJECTED, and
the alternative to that null hypothesis should be accepted: there
is a statistically significant difference in the height of male medical
students and female medical students.
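The notes do not show the t value itself. The sketch below computes it from the summary statistics above. It assumes the classical two-sample t-test with a pooled standard deviation; Welch's unequal-variance version would give a slightly smaller value. The code uses only the Python standard library, so it reports the t statistic and degrees of freedom rather than an exact p-value.

```python
from math import sqrt

def pooled_t_statistic(mean1: float, sd1: float, n1: int,
                       mean2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Two-sample t statistic with a pooled SD; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean1 - mean2) / se_diff, df

t, df = pooled_t_statistic(70.42, 2.74, 53, 64.97, 3.248, 39)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The result is a t value near 8.7. With about 90 degrees of freedom, the cut-off for α = 0.05 (two-sided) is roughly 2, so this value lies far beyond it. That is consistent with rejecting the null hypothesis.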
Example 2.
Question: Is there any difference between men and women in their ranking
of "Knowledge and Competency" as the highest ranking expectation
of the American public regarding their physician?
H_{0}: There is no association between gender of the respondent
and the ranking of Knowledge and Competency as the highest ranking expectation
of the American people relative to their physician.
H_{1}: There is an association between gender and ranking.


  Highest  Second  Third  Fourth  Total
Male  33  9  7  4  53 (57.6%)
Female  27  6  6  0  39 (42.4%)
Total  60 (65.2%)  15 (16.3%)  13 (14.1%)  4 (4.3%)  92 (100%)
Expected Number of Observations Under the Null Hypothesis
  Highest  Second  Third  Fourth  Total
Male  34.6  8.6  7.5  2.3  53 (57.6%)
Female  25.4  6.4  5.5  1.7  39 (42.4%)
Total  60 (65.2%)  15 (16.3%)  13 (14.1%)  4 (4.3%)  92 (100%)
This information is combined in a statistic called the Chi-Squared Statistic.
Note the p-value. It is greater than the criterion level of .05; therefore
the data are compatible with the null hypothesis. The null hypothesis is
supported, provided we have enough people. Thus, there is no association between
gender of the respondent and the ranking of "Knowledge and Competency" as
the highest ranking expectation of the American people relative to their
physician.
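The chi-squared statistic sums (observed - expected)² / expected over every cell. The sketch below recomputes the expected counts and the statistic from the observed table, using only the Python standard library. The p-value comes from a small series implementation of the chi-squared upper tail, written for this example; in practice a statistics package would supply it.

Two expected counts (2.3 and 1.7, in the Fourth column) fall below 5. That is a common rule-of-thumb threshold for trusting the chi-squared approximation.

```python
from math import exp, lgamma, log

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    # Series for the regularized lower incomplete gamma function P(s, z).
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (s + n)
        total += term
    lower = exp(-z + s * log(z) - lgamma(s + 1)) * total
    return 1.0 - lower

observed = {
    "Male":   [33, 9, 7, 4],
    "Female": [27, 6, 6, 0],
}
row_totals = {sex: sum(counts) for sex, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

chi_square = 0.0
for sex, counts in observed.items():
    for obs, col_total in zip(counts, col_totals):
        expected = row_totals[sex] * col_total / grand_total   # count expected under H0
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi2_sf(chi_square, df)
print(f"chi-squared = {chi_square:.2f}, df = {df}, p = {p_value:.2f}")
```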
I. Synonyms
A. Cohort
B. Follow-up
II. Key Points
A. Disease-free
B. Assignment of patients done by "Nature" or Investigator
C. YOU decide what information to collect and when
to collect it.
III. Advantages
A. Can estimate the incidence of a disease (or whatever
dependent variable you are studying) with a high degree of accuracy.
B. Reduces the emphasis on RECALL
C. Can obtain information on changes in habits.
D. Provides the opportunity to study the whole spectrum
of morbidity and/or mortality.
E. Avoids the "late-look" bias.
IV. Disadvantages
A. Difficult and expensive.
B. May induce changes in habits.
C. Selection bias may be more difficult to detect.
D. Very inefficient for rare diseases.
Questions to be asked when faced with a
PROSPECTIVE STUDY
1. Study Population: How is the study
population selected? Can you determine if the population studied is similar
to your own population? Is the study population composed of individuals
who have special characteristics that would select them for membership
in the study? Where did they come from (referral center, general practice,
general population, etc.)?
2. Sampling Procedures:
Can you tell how the individuals were picked? Can you detect any SELECTION
BIAS? How do patients who were asked to participate
BUT REFUSED
differ from those who participated?
3. Follow-up: Is there loss to follow-up? Are the reasons for attrition likely to be
related to the outcome of the study?
4. Habits: Did subjects change their habits while the study was in progress? If
yes, were these individuals put into separate subgroups for analysis? Do
the authors periodically re-examine the cohorts to see if habits change?
5. Surveillance bias: Is surveillance bias operating? Are the cohorts being followed with equal
intensity? Or is high-powered scrutiny being applied to certain subjects,
which may bias results?
I. Reasons for a Case-Control Study
A. Efficiency
B. Rare Diseases
C. Ethics
II. Problems in Case-Control Studies
A. Adequacy of Information
B. Biased Recall
C. Selection of Controls
D. Selection of Cases
III. Control Group Considerations
A. Multiple Controls
B. Community Controls
C. Matching
1) Wasted Matching
2) Overmatching
Questions to ask when faced with a
Case-Control Study
1. Are the Data Dependable?
Since the data are obtained from the past, are
the records complete?
2. Is Recall Bias a serious danger here?
Have attempts been made to assess or control
for such a bias?
3. How alike are the cases and controls?
Do they differ ONLY in the presence or absence of disease?
Are other differences that MIGHT bear on the risk factor, the
outcome, or both present? If yes, did the study control for these differences
(i.e., by matching or a statistical technique)?
4. What kind of population do the cases represent?
Heterogeneous population (high generalizability)
Homogeneous population (low generalizability)
5. Are other biases evident? (Can apply to both PROSPECTIVE and
CASE-CONTROL STUDIES.)
a. Detection bias (heightened awareness)
b. Late-look bias (Neyman bias)
c. Non-response bias
d. Volunteer bias
e. **** SELECTION ******
f. Admission bias
Establishing a Statistical Association or Statistical Relationship
Assume we want to know if two variables are statistically
related. This is also referred to as a statistical association. The idea
is to determine if the presence of one variable affects the occurrence
of a second variable.
Example:
Is the taking of oral contraceptives related to thrombophlebitis?
1) Independent variable ___________________________________
2) Dependent variable ____________________________________
Level of measurement of the independent variable _______________
Level of measurement of the dependent variable _________________
Let us look at two approaches: Prospective and
"Retrospective" or CASE-CONTROL.
PROSPECTIVE
  Thrombophlebitis  NO Thrombophlebitis  Total
Birth Control Pills  a = 30  b = 970  a + b = 1000
No Birth Control Pills  c = 3  d = 997  c + d = 1000
CASE-CONTROL
  Thrombophlebitis  NO Thrombophlebitis
Birth Control Pills  a = 90  b = 45
NO Birth Control Pills  c = 10  d = 55
Total  a + c = 100  b + d = 100
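A short Python sketch of the two analyses these tables lead to:

- In the prospective table, a risk can be computed for each exposure group, and the two risks compared as a risk ratio.
- In the case-control table, the group totals were fixed by the investigator, so only the odds ratio (a × d) / (b × c) is computed.

As an extra illustration, the odds ratio is also computed for the prospective table. It comes out close to the risk ratio there because the outcome is rare.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed, a / (a + b), divided by risk in the unexposed, c / (c + d)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a * d) / (b * c)."""
    return (a * d) / (b * c)

# PROSPECTIVE table: exposure = birth control pills, outcome = thrombophlebitis.
print(f"Prospective risk ratio:  {risk_ratio(30, 970, 3, 997):.1f}")
print(f"Prospective odds ratio:  {odds_ratio(30, 970, 3, 997):.1f}")

# CASE-CONTROL table: only the odds ratio is meaningful here.
print(f"Case-control odds ratio: {odds_ratio(90, 45, 10, 55):.1f}")
```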
The letters A through D appearing below represent the numbers
of subjects in the four possible combinations of exposure and outcome status
(in this instance, death^{1}).
A. Exposed persons who later die.
B. Exposed persons who do not die.
C. Unexposed persons who later die.
D. Unexposed persons who do not die.
The total number of subjects in this study is the
sum of A + B + C + D. The total number of exposed persons is A + B, and
the total number of unexposed persons is C + D.
^{1}In some studies, the outcome is development
of disease rather than death.
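Among exposed persons the risk (R) of death is defined
as:
R_{exposed} = A / (A + B)
Among unexposed persons the risk (R) of death is
defined as:
R_{unexposed} = C / (C + D)
The Risk Ratio (RR), or Relative Risk,
is:
RR = R_{exposed} / R_{unexposed} = [A / (A + B)] / [C / (C + D)]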
^{1}Data used, with permission, from Nelson
KB, Ellenberg JH: Apgar scores as predictors of chronic neurologic disability.
Pediatrics 1981; 68:36.
The risk among exposed newborns is:
The risk among "less exposed" newborns is:
Quantification of the magnitude of this effect is
achieved by calculating the risk ratio:
Attributable Risk and Attributable Risk Percent
Attributable Risk or Risk Difference (excess risk)
= RD. This is defined as:
RD = R_{exposed} - R_{unexposed}
Using the previously cited data relating 10-minute
Apgar scores (0-3) vs. (4-6) to the risk of death in the first year
of life, the risk difference is:
Another measure of interest is the attributable
risk percent (ARP), in which the risk difference is expressed as a
percentage of the total risk experienced by the exposed group:
ARP = (RD / R_{exposed}) × 100
For the Apgar score-infant mortality data, the attributable
risk percent is:
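The Apgar figures did not survive in these notes. The sketch below therefore applies the same definitions (risk ratio, risk difference, and attributable risk percent) to a different table: the PROSPECTIVE birth-control-pill table shown earlier (a = 30, b = 970, c = 3, d = 997). It is plain Python with no external libraries.

```python
def risk_measures(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Risk ratio, risk difference and attributable risk percent from a cohort 2 x 2 table."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rd = risk_exposed - risk_unexposed                # excess risk due to exposure
    return {
        "RR": risk_exposed / risk_unexposed,
        "RD": rd,
        "ARP": 100 * rd / risk_exposed,               # % of the exposed group's risk
    }

measures = risk_measures(30, 970, 3, 997)
print(f"RR  = {measures['RR']:.1f}")
print(f"RD  = {measures['RD']:.3f} ({1000 * measures['RD']:.0f} extra cases per 1000 exposed)")
print(f"ARP = {measures['ARP']:.0f}%")
```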
Sometimes studies are constructed to take into account
how long someone stays in the study, for example, how many years a person
is studied. When this information is provided and the duration of study
is important, a common framework for analysis is as follows:
Table 8a: Summary format of rate data from a cohort study
  Number of Outcomes  Person-time (PT), usually in years
Exposed Persons    PT_{(exposed)}
Unexposed Persons    PT_{(unexposed)}
Total    PT_{(total)}
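With person-time, each group's incidence rate is its number of outcomes divided by its person-time. The two rates can then be compared as a rate ratio. The Python sketch below shows the arithmetic. The counts and person-years are hypothetical, invented only to illustrate the calculation; they are not taken from any study in these notes.

```python
def incidence_rate(outcomes: int, person_time: float) -> float:
    """Outcomes per unit of person-time (e.g., per person-year)."""
    return outcomes / person_time

# HYPOTHETICAL numbers, for illustration only.
exposed_cases, pt_exposed = 120, 200_000.0      # person-years
unexposed_cases, pt_unexposed = 90, 300_000.0   # person-years

rate_exposed = incidence_rate(exposed_cases, pt_exposed)
rate_unexposed = incidence_rate(unexposed_cases, pt_unexposed)
rate_ratio = rate_exposed / rate_unexposed

print(f"Exposed:   {rate_exposed * 1000:.2f} per 1000 person-years")
print(f"Unexposed: {rate_unexposed * 1000:.2f} per 1000 person-years")
print(f"Rate ratio = {rate_ratio:.1f}")
```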
Several studies have looked at the existence of risk
factors for heart attacks among working individuals. Since people come
and go from the work force, how long someone stays in the study is critical
for forming inferences about the safety of a job or the potential effect
heart attacks have on labor force questions. Assume you have access to
information that looks at baseline cholesterol levels in a workforce and
then whether workers subsequently develop an MI. The baseline measure is like
a screen; the outcome is MI. Assume you study approximately 40,000 individuals
for an average of 15 years (some longer, some shorter). Your data may look
like this:
Table 8b. Baseline Cholesterol Levels in a Cohort of Men Followed for an Average of 15 Years and Subsequently Developing an MI.
Chol. Lvl <5.1 mmol/L^{3}
Chol. Lvl 5.2-6.2 mmol/L^{3}
Total
Unmatched Design
A Cases who were exposed
B Controls who were exposed
C Cases who were not exposed
D Controls who were not exposed
Although the summary tables for cohort and case-control
studies are similar, it is important to remember that the underlying approaches
to sampling differ, and the analysis must account for these differences.
In a cohort study sampling is based upon exposure status, and the investigator
thus determines the total numbers of exposed (A + B) and unexposed
(C + D) subjects that are included in the study. Risk of disease development then
can be estimated separately for exposed and unexposed groups, and these
two risks can be compared in a risk ratio (RR).
A case-control study, on the other hand, begins with sampling
of persons with and without the disease of interest ((A + C) and (B + D),
respectively). With this approach, the proportion of persons in the study
who have the disease is no longer determined by the disease risk in the
source population but rather by the choice of the investigator. That is,
a disease that occurs infrequently in the source population can be oversampled,
so that affected individuals constitute a large proportion of the study
sample. This ability to oversample affected individuals is why case-control
studies are statistically efficient for the study of rare diseases.
Once the investigator determines the ratio of persons
with and without the disease of interest in a case-control study, risk
of disease no longer can be estimated. As shown in the following section,
however, an indirect estimate of the incidence rate ratio can still be
obtained in a case-control study.
With the notation introduced in Table 9, the probability
that a case was exposed previously is estimated by:
A / (A + C)
The odds of exposure for cases represent
the probability that a case was exposed divided by the probability that
a case was not exposed. The odds then are estimated by:
[A / (A + C)] / [C / (A + C)] = A / C
Similarly, the odds of exposure among controls are
estimated by:
B / D
The odds of exposure for cases divided by the odds
of exposure for controls are expressed as the
odds ratio (OR). Substituting
from the preceding equations, the OR is estimated by:
OR = (A / C) / (B / D) = (A × D) / (B × C)
The OR is sometimes termed the exposure odds ratio,
or the cross-product ratio of Table 9, because it results from dividing the product
of entries on one diagonal of this table by the product of entries on the
cross diagonal.
In other words, the odds of aspirin use for patients
with Reye's Syndrome were almost ten times greater than the odds of aspirin
use among controls. This will often be reported as: "To the extent
that the OR provides a valid estimate of the relative risk, one could
conclude from this investigation that use of aspirin for a preceding viral
illness increased the likelihood of developing Reye's Syndrome tenfold."
Editorial Note: The quoted statement above must be interpreted carefully. It implies that a retrospective
study can establish a "cause-effect" relationship. Statements like this try
to ease this leap of faith by using the phrase "To the extent...", but
it is a push nonetheless. In point of fact, these particular data were taken
at face value: clinicians no longer prescribe aspirin for fever
and headaches in children. Many subsequent studies have tried to verify
the coincident occurrence of aspirin use, viral illness and serious disease
outcomes. Thus, a case-control study provides an efficient means of INITIALLY
looking at a serious disease, but does not establish a definitive cause
and effect relationship.
Table 11. Numbers of cases and controls and relative risk
according to a history of use of dietetic beverages and
sugar substitutes by sex.
EXPOSURE  MEN  WOMEN
  CASES  CONTROLS  RELATIVE RISK  CI^{*}  CASES  CONTROLS  RELATIVE RISK  CI^{*}
DIETETIC BEVERAGES  144  155  0.8  0.6-1.1  69  46  1.6  0.9-2.7
SUGAR SUBSTITUTES  101  113  0.8  0.5-1.1  54  39  1.5  0.9-2.6
NO EXPOSURE  224  193  1    74  80  1
^{*} CI denotes 95 percent confidence
interval.
Nonexposed subjects reported never using dietetic
beverages or sugar substitutes and no current use of artificially sweetened
foods.
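The "relative risk" figures in Table 11 are case-control estimates, that is, odds ratios computed against the NO EXPOSURE row of the same sex. The sketch below reproduces them and adds a 95% confidence interval by Woolf's logit method, exp[ln(OR) ± 1.96 √(1/a + 1/b + 1/c + 1/d)]. The notes do not say which interval method the authors used, so that choice is an assumption. Some of the intervals it prints may differ slightly from the published ones.

```python
from math import exp, log, sqrt

def odds_ratio_with_ci(exp_cases: int, exp_controls: int,
                       unexp_cases: int, unexp_controls: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Case-control odds ratio with a Woolf (logit) confidence interval."""
    a, b, c, d = exp_cases, exp_controls, unexp_cases, unexp_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Table 11: each exposure is compared with the NO EXPOSURE row of the same sex.
no_exposure = {"MEN": (224, 193), "WOMEN": (74, 80)}
exposures = {
    ("DIETETIC BEVERAGES", "MEN"): (144, 155),
    ("DIETETIC BEVERAGES", "WOMEN"): (69, 46),
    ("SUGAR SUBSTITUTES", "MEN"): (101, 113),
    ("SUGAR SUBSTITUTES", "WOMEN"): (54, 39),
}

results = {}
for (exposure, sex), (cases, controls) in exposures.items():
    ref_cases, ref_controls = no_exposure[sex]
    results[(exposure, sex)] = odds_ratio_with_ci(cases, controls, ref_cases, ref_controls)
    or_value, lower, upper = results[(exposure, sex)]
    print(f"{exposure:<19} {sex:<6} OR = {or_value:.1f}  95% CI {lower:.1f}-{upper:.1f}")
```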
Table 12. Numbers of cases and controls and relative risk according to current frequency of use of dietetic beverages, sugar substitutes and artificially sweetened foods, by sex.
Exposure  Men  Women  
Cases  Controls  Relative Risk  Cases  Controls  Relative Risk  
Dietetic Beverages 

22  12  1.9  6  9  0.5 
Drinks/day 

18  23  0.9  11  9  1.6 

64  77  0.7  33  13  2.5  
Sugar Substitutes
Powder packets 
3+  21  20  1.0  10  8  1.3
or equivalent/day 

16  28  0.5  15  11  1.2 
Tablets/day 

8  7  1.3  4  5  0.8 

13  10  1.5  
Dietetic Foods 

31  33  0.9  20  16  1.4 
Servings/wk 

13  18  0.6  9  16  0.5 

12  13  0.8  
NO Exposure  224  193  1  74  80  1 
Associations (Continued)
A. Positive
B. Negative
II. Causation
A. Criteria
1. Strong Design
2. Evidence from Human Experiments
3. STRENGTH OF ASSOCIATION
4. Consistency
5. TEMPORALITY
6. Dose-Response
7. Epidemiologic Sense
8. Biologic Sense
9. Analogous to previously shown studies of causal association
Briefly: Statistical Association
Temporal Association
Alternative explanations ruled out
MAKES COMMON SENSE
B. Correlation DOES NOT PROVE CAUSATION
III. Applications
A. Below are statements which suggest association  are they valid?
1. If you find that 60% of students who develop infectious mononucleosis
are habitual smokers, this shows the presence of an association between
the disease and smoking. (T F)
2. If you find that 5% of students who smoke develop infectious mononucleosis
during a one-year follow-up period, this shows the presence of an association
between the disease and smoking. (T F)
B. A suggested mechanism for "Association".
1. 2 x 2 table
Outcome  
Positive  Negative  
Exposed  A  B 
Unexposed  C  D 
2. You can only determine association if ALL THE CELLS CAN BE FILLED
IN FROM THE INFORMATION PROVIDED BY THE AUTHOR.
3. If 60% of a large sample of male students and 30% of a large sample
of female students smoke, there is an association between gender and smoking.
(T F)
4. BE CAREFUL: terms used to express a relationship,
like incidence or proportion (percent), imply knowledge of a denominator
or a total. Thus, you can deduce the missing cells in these instances.
I. Consider the following statement:
The violent crime rate in City A in 1988 was 30%.
The violent crime rate in 1989 decreased to 15%. This shows a decrease
of 15%.
Is the conclusion true or false?
II. Types of changes reported in the medical literature.
A. Absolute Change
B. Percentage change (proportional change)
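The two kinds of change can give very different-sounding numbers for the same data. The Python sketch below computes both for the City A crime-rate example above.

```python
def absolute_change(before: float, after: float) -> float:
    """Difference in the rates themselves (percentage points when rates are in %)."""
    return after - before

def percentage_change(before: float, after: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return 100 * (after - before) / before

rate_1988, rate_1989 = 30.0, 15.0   # violent crime rate in City A, in %
print(f"Absolute change:   {absolute_change(rate_1988, rate_1989):+.0f} percentage points")
print(f"Percentage change: {percentage_change(rate_1988, rate_1989):+.0f}%")
```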
III. Power
The ability to find a significant difference if
it really exists.
You apply it to instances when the authors show
NO statistical significance.
The question reduces to: Is
my sample size large enough to find a clinically important difference?
Using a Nomogram for CONTINUOUS Variables
Perform these Steps:
1. Decide what size difference between the two groups
is clinically important.
2. Locate the difference on the horizontal axis.
3. Extend a vertical line to the diagonal line representing
the standard deviation.
4. Extend a horizontal line to the vertical axis and read the required sample size.
Using a Nomogram for DICHOTOMOUS Variables
Perform these steps:
1. Identify one of the two groups as the control group.
2. Decide what difference between the two groups
would be considered clinically important. Express this difference as a
% change in the response rate.
3. Locate the % change on the horizontal axis.
4. Extend a vertical line to intersect with the diagonal
line representing the response rate.
5. Extend a horizontal line from the intersection point to the vertical axis and read the required sample size.
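Nomograms are graphical shortcuts for sample-size formulas. As a swapped-in alternative (not the nomogram used in class), the Python sketch below uses the standard normal-approximation formulas for two equal-sized groups. By default it assumes a two-sided α of 0.05 and power of 0.80. The nomogram in your handout may be drawn for different α and power values, so its readings need not match these results exactly. The example inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_continuous(difference: float, sd: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n to detect a difference in means: 2 * (z_a + z_b)^2 * sd^2 / diff^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sd**2 / difference**2)

def n_per_group_proportions(p_control: float, p_treated: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n to detect a difference between two response rates (no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated)))
    return ceil(numerator**2 / (p_treated - p_control) ** 2)

# Hypothetical examples.
print(n_per_group_continuous(difference=10, sd=20))             # means differ by 10, SD 20
print(n_per_group_proportions(p_control=0.50, p_treated=0.70))  # 50% vs 70% response (a 40% change)
```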
A physician wants to assess the effects of calcium
supplements on blood pressure. The physician wants to be able to detect
a 5 mm Hg difference between the treatment and the control group when the
standard deviation is 15 mm Hg. Use a nomogram to determine the required sample
size.
Answer: The required sample size of each group is
__________.
What is the likely outcome of the above experiment
if the research is conducted with 100 patients in each group?
Answer: ___________________.
EXERCISE #2
A researcher is trying to assess the effectiveness
of a new therapy. The standard therapy has a cure rate of 30%. The researcher
is interested in whether the new therapy will cure 45%. The research is done with
90 patients in the treatment and control groups. The difference in cure
rates is found to be nonsignificant. Use a nomogram to determine if the
sample size was adequate.
The sample size was _____________. Approximately
__________ should have been in each group.
Suppose the above research was interested in detecting
a 100% increase in cure rate. Under these conditions was the sample size
adequate if 60 patients were present in each group?
Answer: ____________________________________________________
____________________________________________________________.
We conducted a double-blind, randomized, placebo-controlled
trial in 40 patients to evaluate the need for antibiotics in acute exacerbations
of chronic bronchitis. All patients were sufficiently ill to require hospitalization
although none needed ventilatory support; the presence of pneumonia was
excluded. Treatment consisted of bronchodilators, corticosteroids, and
either tetracycline, 500 mg, or placebo by mouth every 6 hours for 1 week.
Arterial blood gases, spirometric tests, bacteriologic evaluation of sputum,
and patient and physician evaluation of the severity of illness were assessed
at the beginning and end of the study. All patients improved both symptomatically
and by objective measures of lung function. At the end of the study period
there were no differences between those patients receiving tetracycline
and those receiving placebo. We conclude that antibiotic therapy is not
needed in moderately ill patients with exacerbations of chronic bronchitis.
From the Nicotra et al. article:
PaO2 on Day 7     TCN (tetracycline)     Placebo
Mean              74.1                   68.1
St. Dev.          13.6                   17.5
Sample size = 20 per group
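The Nicotra summary data can be turned into a test statistic. The minimal sketch below assumes independent groups with roughly equal variances, which is the pooled two-sample t test, and computes t from the means, standard deviations and group sizes alone. The helper name `two_sample_t` is hypothetical.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic from summary data.
    Returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (mean1 - mean2) / se, df


# Day 7 PaO2: tetracycline vs. placebo, 20 patients per group.
t, df = two_sample_t(74.1, 13.6, 20, 68.1, 17.5, 20)
print(f"t = {t:.2f} on {df} df")  # t = 1.21 on 38 df
```

With 38 degrees of freedom, the two-sided 5% critical value of t is about 2.02. The difference of 6.0 in mean PaO2 is therefore not statistically significant, which agrees with the authors' report of no difference. Whether 20 patients per group could have detected a clinically important difference is the question the nomogram exercises raise.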
Checklist to be used by Authors when preparing or
by Readers when analyzing a report of a randomized controlled trial (RCT)^{1}

Yes  No  Unable to determine 
1. State the unit of assignment.  
2. State the method used to generate the intervention assignment schedule.  
3. Describe the method used to conceal the intervention assignment schedule from participants and clinicians until recruitment was complete and irrevocable.  
4. Describe the method(s) used to separate the generator and executor of the assignment.  
5. Describe an auditable process of executing the assignment method.  
6. Identify and compare the distributions of important prognostic characteristics and demographics at baseline.  
7. State the method of masking.  
8. State how frequently care providers were aware of the intervention allocation, by intervention group.  
9. State how frequently participants were aware of the intervention allocations, by intervention group.  
10. State whether (and how) outcome assessors were aware of the intervention allocation, by intervention group.  
11. State whether the investigator was unaware of trends in the study at the time of participant assignment.  
12. State whether masking was successfully achieved for the trial.  
13. State whether the data analyst was aware of the intervention allocation.^{*}  
14. State whether individual participant data were entered into the trial database without awareness of intervention allocation.  
15. State whether the data analyst was masked to intervention allocation.  
16. Describe fully the numbers and flow of participants, by intervention group, throughout the trial.  
17. State clearly the average duration of the trial, by intervention group, and the start and closure dates for the trial.^{†}  
18. Report the reason for dropout clearly, by intervention group.  
19. Describe the actual timing of measurements, by intervention group.  
20. State the predefined primary outcome(s) and analyses clearly.  
21. Describe clearly whether the primary analysis has used the intention-to-treat principle.
22. State the intended sample size and its justification.  
23. State and explain why the trial is being reported now.  
24. Describe and/or compare trial dropouts and completers.  
25. State or reference the reliability, validity, and standardization of the primary outcome.^{‡}  
26. Define what constitutes adverse events and how they were monitored by intervention group.  
27. State the appropriate analytical techniques applied to the primary outcome measure(s).  
28. Present appropriate measures of variability (e.g., confidence intervals for primary outcome measures).  
29. Present sufficient simple (unadjusted) summary data on primary outcome measures and important side effects so that the reader can reproduce the results.  
30. State the actual probability values and the nature of the significance test.  
31. Present appropriate interpretations (e.g., NS does not necessarily indicate no effect; P<.05 does not necessarily indicate proof).  
32. Present the appropriate emphasis in displaying and interpreting the statistical analysis, in particular controlling for unplanned comparisons.  
^{*}If the data analyst is not
masked as to the interventions, new treatments may be grossly favored over
standard treatments.
^{†}This information may sometimes reveal duplicate publication rather than two separate trials by the same author(s).
^{‡}Many trials are longitudinal and require several follow-up assessments. These assessments may be subjective, based on responses to questionnaires or scales. There is wide variation in how scales and questionnaires are constructed, which may influence the assessment, reliability, validity, and responsiveness of the treatment outcome of interest. Providing information or references about the development of these outcome measures will enable readers to judge how confident they should be about the results.
1. Standards of Reporting Trials Group. A proposal
for structured reporting of randomized controlled trials. JAMA. 1994;272:1926-1931.
Worksheet for Paper Review
1. Title:
2. Source:
3. Objective (Purpose):
4. Design:
5. Setting:
6. Patients:
7. Intervention:
8. Main Outcome Measures:
9. Main Results:
10. Conclusion:
Selection
Comparison group
Recall
Volunteer
Surveillance
Detection
Loss to F/U
Data Source
Readings
Adatto K. Behavioral factors in urinary tract infection. JAMA. 1979;241:2525-26. R1-R2
Hunter RS. Antecedents of child abuse and neglect in premature infants: A prospective study in a newborn intensive care unit. Pediatrics. 1978;61:629-35. R3-R9
Nicotra MB. Antibiotic therapy of acute exacerbations of chronic bronchitis: A controlled study using tetracycline. Annals of Internal Medicine. 1982;97:18-21. R10-R13
Ramond M. A randomized trial of prednisolone in patients with severe alcoholic hepatitis. NEJM. 1992;326:507-12. R14-R19a
Schrock CG. Clarithromycin vs penicillin in the treatment of streptococcal pharyngitis. J Fam Pract. 1992;35:622-626. R20-R25
Spitzer WO. The use of beta-agonists and the risk of death and near death from asthma. NEJM. 1992;326:501-6. R26-R32
Young MJ. Sample size nomograms for interpreting negative clinical studies. Annals of Internal Medicine. 1983;99:248-51. R33-R36
Sauve JS. Does this patient have a clinically important carotid bruit? JAMA. 1993;270:2843-45. R37-R39
Williams JW. Randomized controlled trial of 3 vs. 10 days of trimethoprim/sulfamethoxazole for acute maxillary sinusitis. JAMA. 1995;273:1015-1021. R40-R48
Supplementary Explanations
Levine MA. Readers' guide for causation: Was a comparison group for those at risk clearly identified? ACP Journal Club. 1992. S1-S2
Altman DG. Confidence intervals in research evaluation. ACP Journal Club. 1992. S3-S4
Laupacis A. How should the results of clinical trials be presented to clinicians? ACP Journal Club. 1992. S5-S7
Cook D. On the clinically important difference. ACP Journal Club. 1992. S8-S9
Oxman AD. Users' guides to the medical literature. I. How to get started. JAMA. 1993;270:2093-5; 2096. S10-S13
Guyatt GH. Users' guides to the medical literature. II. How to use an article about therapy or prevention: Are the results of the study valid? JAMA. 1993;270:2598-2601. S14-S17
Jaeschke R. Users' guides to the medical literature. III. How to use an article about a diagnostic test: Are the results of the study valid? JAMA. 1994;271:389-91. S18-S20
The following titles are from articles published
in the medical literature. Determine whether the terms INCIDENCE and
PREVALENCE are used correctly.
1. Incidence of Blood Group O In An Earlier Series
of Myocardial Infarction Patients.
2. The Prevalence of Cardiovascular Disease In Different
Ethnic and Socioeconomic Groups in Beit Shemesh, Israel.
3. Incidence of Rheumatic Fever: Summary of an Eight-Year
Study of Incoming Freshmen at the University of North Dakota.
4. Rheumatic Heart Disease Epidemiology. Part III.
The San Luis Valley Prevalence Study.
5. Incidence of Primary Carcinoma of the Liver in
the West of Scotland between 1965 and 1975.
6. Prevalence of Undiagnosed Cancer of the Large
Bowel Found at Autopsy in Different Races.
7. The Rising Incidence of Cancer of the Pancreas: Further
Epidemiologic Studies.
8. Incidence of Cancer in Men on a Diet High in Polyunsaturated
Fat.
9. Age and Sex Variations in the Prevalence and Onset
of Diabetes Mellitus.
10. The Incidence of Chronic Peptic Ulcer Found at
Necropsy.
GIVEN THE FOLLOWING STATEMENTS IDENTIFY THE INDEPENDENT
AND DEPENDENT VARIABLES:
11. The compliance rate with prescribed medical regimen
will be greater among cancer patients who have high self-esteem and low
anxiety compared with those with low self-esteem and high anxiety levels.
12. The effect of restraint systems on the incidence
of injury to children in automobile accidents.
CLASSIFY THE FOLLOWING VARIABLES AS TO THEIR LEVEL
OF MEASUREMENT (NOMINAL, ORDINAL, INTERVAL):
13. Age
14. Blood pressure
15. Ethnicity
16. Number of cups of coffee per day
17. You are working in an Emergency Room on your first
clerkship and a patient comes in complaining of chest pain. You give the
patient nitroglycerin, wait a few minutes, and then ask the patient, "If
the pain you came in with was a 10, what number would you assign to your
discomfort now?" The patient responds 5. How are you going to interpret
this answer?
a) The pain is less. (An ordinal interpretation)
b) A 50% decrease in pain. (An interval interpretation).
18. Characteristics of a normal distribution include
the following:
a. The total area under the curve represents 100% of all values.
b. The mean and median and mode coincide.
c. Approximately 5% of the values lie beyond 2 standard deviations from the median.
d. The curve is symmetrical.
e. All the above.
19. In a study of 250 students taken from the general
student population of a southern university, the mean systolic blood pressure
was 116 mm Hg, with a standard deviation of 4 mm Hg. From this information,
approximately 99% of the general student population will have systolic
blood pressure (mm Hg) in the range of:
a. 110-130 mm Hg
b. 104-128 mm Hg
c. 112-120 mm Hg
d. 116-124 mm Hg
e. 118-122 mm Hg
20. In a study involving 150 health providers, the
mean serum cholesterol level was found to be 176 mg/dL with a sample variance
of 25 (mg/dL)². From this information, approximately 1/3 of these providers
will NOT have a cholesterol level in the range of:
a. 161-191 mg/dL
b. 166-186 mg/dL
c. 171-181 mg/dL
d. 172-180 mg/dL
e. 175-177 mg/dL
21. From the following information compute the mean,
median and mode.
1, 5, 1, 2, 5, 6.
22. T-cell counts from a series of newly diagnosed HIV-positive males were collected. The mean was 176, the median was 200, and the mode was 224.
From this information the distribution of these data
can be described as: (May be more than one correct answer).
a. Normally distributed.
b. Positively skewed.
c. Negatively skewed.
d. Skewed right.
e. Skewed left.
Homework Set 2
1. All of the following statements are true of the
NORMAL distribution except:
a. The mean = median = mode
b. Approximately 50 percent of the observations are
greater than the mode.
c. Approximately 68 percent of observations fall
within 1 standard deviation of the mean
d. The number of observations between 0 and 1 standard
deviations from the mean is the same as the number of observations between
1 and 2 standard deviations from the mean.
e. The shape of the curve does not depend on the
value of the mean.
2. Randomization is a procedure used for assignment
or allocation of subjects to treatment and control groups in experimental
studies. Randomization ensures
a. that assignment occurs by chance.
b. that treatment and control groups are alike in all respects except treatment.
c. that bias in observations is eliminated.
d. that placebo effects are eliminated.
e. none of the above.
3. In comparing the difference between two means,
the value of p is found to be .20. The correct interpretation of this result
is:
a. the null hypothesis is rejected.
b. the difference is statistically significant.
c. the difference is compatible with the null hypothesis.
d. the sample size is small.
e. sampling variation is an unlikely explanation
of the difference.
4. Correct statements concerning statistical inference
include which of the following? (Choose all that are correct)
a. If the p value is very low, the difference between
the groups must be very large.
b. If the sample size is large enough, it is easy
to achieve statistical significance at the .05 level.
c. The confidence interval is dependent on the sample
size, the variance of the sample and the degree of confidence.
d. The 95% confidence interval is longer than the
99% confidence interval for the same data.
For each case history that follows, select the study
design that it most appropriately illustrates.
(A) Case series report
(B) Case-control study
(C) Clinical trial (Randomized Clinical Trial)
(D) Cohort study (Prospective study)
(E) Case report
5. A total of 300 newly diagnosed patients with laryngeal
cancer are randomly allocated to treatment with either surgical excision
alone or surgical excision with radiation treatment.
6. A 39-year-old man who presents with a mild sore
throat, fever, malaise, and headache is treated with penicillin for presumed
streptococcal infection. He returns after a week with hypotension, fever,
rash, and abdominal pain. He responds favorably to chloramphenicol, after
a diagnosis of Rocky Mountain spotted fever is made.
7. A total of 3500 patients with thyroid cancer are
identified and surveyed by patient interviews regarding past exposure to
radiation.
8. A total of 10,000 Vietnam veterans, half of whom
are known from combat records to have been in areas where Agent Orange was
used and half of whom are known to have been in areas where no Agent Orange
was used, were asked to give a history of cancer since discharge.
9. Patients admitted for carcinoma of the stomach
are age- and sex-matched with fellow patients without a diagnosis of cancer
and surveyed as to smoking history to assess the possible association of
smoking and gastric cancer.
Homework Set 3
1. Identify the independent variable(s) and the dependent
variable(s) in the Adatto study.
2. Assuming you are a family physician and are seeing
a 51-year-old woman who presents with a urinary tract infection, are
you justified in using the results of the Adatto study in counseling your
patient? What assumptions are you making if you do; conversely, what assumptions
are you making if you don't?
3. Given the table below, compute the ODDS RATIO
and interpret the finding.
                  Risk Factor
                  Present     Absent
Present           6           4
Absent            112         242
4. Apply the questions covered in lecture regarding
the problems that are inherent in a case-control study to the Adatto study.
5. A patient asks, "Doc, what are my chances of getting
through this operation?" What would you need to know in order to answer the
patient factually?
6. The following questions refer to the paper entitled
"Antecedents of child abuse and neglect in premature infants: A prospective
study in a newborn intensive care unit" by Hunter et al.
6a. What type of study do the authors cite as a foundation
for their study?
6b. Justify your answer to Question 6a.
6c. Identify any sources of Study Population bias
and sampling bias.
6d. Is there loss to follow-up and, if so, do the
authors address the problem?
6e. In the 24-item inventory cited on page 630, what
level of measurement is used?
6f. What is the dependent variable in the study?
6g. What evidence did the authors collect to document
the occurrence of the dependent variable?
6h. Given the evidence you cite in the preceding
question, does this suggest any SURVEILLANCE BIAS?
6i. In the abstract, the authors use the term incidence.
Is this a correct or incorrect use of the term? Why?
The following questions refer to the table below:
                        Acne Present    Acne Absent
Eat Breakfast           20              50
Did Not Eat Breakfast   60              110
7a. Put in the correct margins for a retrospective
study and calculate an odds ratio.
7b. Put in the correct margins for a prospective
study and calculate a risk ratio.
7c. Interpret each calculation.
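Questions 7a and 7b differ only in which margins each ratio uses. The minimal sketch below assumes the acne table is oriented as printed, with the exposure (eating breakfast) in the rows and the outcome (acne) in the columns. The helper names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for a 2 x 2 table laid out as
                   Outcome+  Outcome-
        Exposed        a         b
        Unexposed      c         d
    Appropriate for a retrospective (case-control) design."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in the exposed, a/(a+b), divided by risk in the unexposed, c/(c+d).
    Uses the row (exposure) margins, so it needs a prospective design."""
    return (a / (a + b)) / (c / (c + d))


# Acne table: rows = ate breakfast / did not; columns = acne present / absent.
a, b, c, d = 20, 50, 60, 110
print(f"OR = {odds_ratio(a, b, c, d):.2f}, RR = {risk_ratio(a, b, c, d):.2f}")
```

Compare each ratio with 1.0 when interpreting it. A ratio below 1.0 points toward an inverse association.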
8. Read the descriptions of the studies below. Determine
the scientific hypothesis, the independent and dependent variables, and
indicate whether a one-tailed or a two-tailed statistical test is suggested
by the description.
a) Alcohol is assumed to be the causative agent in
many accidents. It is further assumed that alcoholics have an increased
risk of dying from accidents involving severe burns. A study was undertaken
to evaluate the mortality of alcoholics and nonalcoholics admitted to a
burn unit of a major hospital. Nine of 28 alcoholics died; 8 of 75 nonalcoholic
patients died.
b) The relationship between parental smoking and
number of colds per year was examined in nonsmoking teenagers. Nonsmoking
teens in households where both parents smoke had 3 times the number of
colds in 1 year compared to nonsmoking teens in households where neither
parent smokes.
Homework Set 4
1. Given the 2 x 2 table below, identify the following
terms by cell identification.
                    Disease
                    +          -
Test     +          a          b          a + b
         -          c          d          c + d
                    a + c      b + d      a + b + c + d
i. Prevalence
ii. Sensitivity
iii. Specificity
iv. False positives
v. False Negatives
vi. Negative Predictive Value
vii. Positive Predictive Value
2. A test is said to have a Positive Predictive value
of 77%. What does this mean?
3. A test has a negative predictive value of 90%.
Your patient's test result is negative. What would you probably conclude?
4. For the table in question 1: Assume the disease
you are looking for is a Myocardial Infarction (heart attack) or MI for
short. The test you decide to use is Creatine Kinase (CK), or, as it is sometimes
called, CPK (Creatine Phosphokinase). This is thought to rise
early in an infarction. Recall your biochemistry and remember where CPK
comes from and under what conditions it is released. Assume your study
will take place in a Coronary Care Unit. This unit receives all patients
suspected of having an MI. For purposes of this example assume you look
at 360 patients. Assume you have a prevalence rate of 64%. The CPK has
a sensitivity of 93% and a specificity of 88%. I'll admit that these figures
are not that easy to work with, but they are actually from a study and
not made up for ease in computation.
a) What are the respective PPV and NPV?
b) Would you rely on this test to identify whether your patient
i) has a heart attack?
ii) doesn't have a heart attack?
5. Assume you agree with my answer to question 4.
You then get energetic and suggest that this test be used as a screening
tool for general admissions to the hospital rather than just the CCU. There
are 2300 admissions to the hospital and the community prevalence rate is
10%. What are the calculated PPV and NPV?
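You can work Questions 4 and 5 by filling in the 2 x 2 table for 360 or 2300 patients. The number of patients cancels out, though, so PPV and NPV depend only on prevalence, sensitivity and specificity, as Bayes' theorem shows. The sketch below is one way to do the arithmetic. The function name is hypothetical, and the cell letters in the comments follow the table in Question 1.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV). Each cell is expressed as a fraction of all patients."""
    tp = sensitivity * prevalence               # cell a: test +, disease +
    fp = (1 - specificity) * (1 - prevalence)   # cell b: test +, disease -
    fn = (1 - sensitivity) * prevalence         # cell c: test -, disease +
    tn = specificity * (1 - prevalence)         # cell d: test -, disease -
    ppv = tp / (tp + fp)  # a / (a + b)
    npv = tn / (tn + fn)  # d / (c + d)
    return ppv, npv


# CCU setting (Question 4) vs. general admissions (Question 5); CPK sens 93%, spec 88%.
for prev in (0.64, 0.10):
    ppv, npv = predictive_values(prev, sensitivity=0.93, specificity=0.88)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

The test's sensitivity and specificity stay the same in both runs. Only the prevalence changes, and that alone moves the predictive values.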
6. From the diagram on page 36 of the notes, answer
the following questions with the correct letters:
A. Cutoff point set too low.
B. Cutoff point of greater sensitivity.
C. Cutoff point of greater specificity.
D. Cutoff point of greater false positive rate.
E. True positives for cutoff point X.
F. True negatives for cutoff point X.
Suggested Answers to Homework Set 1
1. Incorrect. The number of patients with blood group
O is PREVALENCE.
2. This could be correct. If this was a survey of
the patients in this geographic area and the purpose was to establish an
estimate of existing disease then PREVALENCE is correct.
3. Incorrect. The title indicates that the authors
are reporting on how many total cases of rheumatic fever were accumulated
in the group of freshmen (better they should have said first-year students!)
over the eight years of the study.
4. This could be correct, assuming the intent of
the study is to see how many people in the San Luis Valley have rheumatic
fever.
5, 6, 7. Correct.
8. This could be correct. The title indicates that
the men are placed on a diet and then followed to see how many develop
cancer. That is to say, how many NEW cases of cancer are diagnosed.
9. The term prevalence could be used correctly here
giving one a baseline against which to compare. The term onset could
be used as a synonym for incidence since the term connotes newly developed
disease.
10. Incorrect. You don't develop a disease at necropsy!
These would clearly be existing cases of chronic peptic ulcer.
11. The independent variables are self-esteem and
anxiety. The dependent variable is compliance. The point I want you to
think about is what data you would accept as evidence of compliance (e.g.,
appointments kept, number of pills taken, a diary, someone to verify that
the regimen was followed). Similarly, for the independent variables: would you
accept a standardized psychological scale for anxiety and self-esteem, or perspiration
when talking to you about the disease?
12. This was the title of an article. Presumably
the independent variable is the type of restraint system used on the child.
Here is a problem: how old is the child? Do you use a car seat, a lap
belt, a lap and shoulder combination, or all of these and maybe more?
Those of you with children can probably come up with some new ones. Do
you use the same restraint with all children, or only up to a certain age,
and only one type of device? The dependent variable is injury due
to automobile accidents. Here again, the precision of the term looms large.
A fender bender? A high-speed crash?
13. Age: interval. The difference between age intervals
is theoretically the same (i.e., the difference between 45 and 42 is three years, and the difference
between 25 and 22 is the same distance, three years).
14. Blood pressure: interval. The clinical significance
of the same distance between measurements can differ remarkably, but the interval
is the same theoretical distance. This is why clinicians use the terms
low, normal, high and OH MY GOD! to help describe the pressure of the patient.
If you have a patient with a blood pressure of 220/150 with headache, visual
disturbances etc. you are in an emergency situation. Dropping the pressure
to 210/150 is ten points but clinically you are still in trouble. A patient
who is 130/80 and loses weight due to your persuasive style and Doctor-Patient
Relationship training and now measures 120/75, has also experienced a drop
of 10 points systolic but wasn't in trouble at either time. THE TEACHING
POINT TO REMEMBER IS THAT IT IS THE CLINICAL INTERPRETATION OF THE DATA
THAT IS IMPORTANT.
15. Ethnicity is nominal.
16. Number of cups of coffee/day is interval.
17. There is no real right answer here, since either could be used. I would suggest that an ORDINAL interpretation be used for the following reasons:
a) The purpose of the question. It is usually the
case that you want to assess the direction of change in the pain, that is, whether it is improving.
The ordinal interpretation does this.
b) An interval response assumes a lot about the patient.
If the patient is experiencing a first episode, the answer to a question requiring a high
level of precision usually cannot be duplicated the next day (a statement
like "Hey doc, great stuff, my pain is only 2 and a half today" is very
unlikely).
18. The answer is E (all of the above).
19. This question turns on the properties of the Normal Distribution. The lines below summarize the percentage of values lying within a given number of standard deviations of the mean. (NOTE: Because the distribution is normal, one could also say standard deviations from the median or the mode, since the mean, median and mode coincide.)
Mean ± 1 s.d. ≈ 68%: 116 ± 4 = 112-120 mm Hg.
Mean ± 2 s.d. ≈ 95%: 116 ± 8 = 108-124 mm Hg.
Mean ± 3 s.d. ≈ 99.7% (roughly 99%): 116 ± 12 = 104-128 mm Hg.
Thus, the answer is B.
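The ranges in answer 19 come from adding and subtracting multiples of the standard deviation. The short sketch below uses Python's standard `statistics.NormalDist` (Python 3.8+) to reproduce those ranges and to show the exact normal-curve coverage for each one. The `normal_range` helper is hypothetical.

```python
from statistics import NormalDist


def normal_range(mean, sd, k):
    """Return (low, high, coverage) for mean ± k*sd under a normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)  # area between -k and +k
    return mean - k * sd, mean + k * sd, coverage


# Systolic blood pressure: mean 116 mm Hg, standard deviation 4 mm Hg.
for k in (1, 2, 3):
    low, high, coverage = normal_range(116, 4, k)
    print(f"mean ± {k} s.d.: {low}-{high} mm Hg, about {coverage:.1%} of values")
```

The coverage for 3 standard deviations comes out near 99.7%, which the question rounds to 99%.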
20. Be careful here. The information gives the mean
and the VARIANCE. The normal curve properties are based on the standard
deviation. The standard deviation is the SQUARE ROOT of the variance. Thus
the standard deviation is √25 = 5 mg/dL.
Using the properties above, approximately 68% of values lie within
1 standard deviation of the mean (about 34% on each side), so approximately
32%, or roughly 1/3, lie outside that range. We don't know in which direction
a given provider will fall, so we have to include both directions.
Below the mean: mean - 1 s.d. = 176 - 5 = 171.
Above the mean: mean + 1 s.d. = 176 + 5 = 181.
Thus about 1/3 of the providers will NOT have a cholesterol level in the range 171-181 mg/dL.
The answer is C.
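The key step in answer 20 is converting the variance to a standard deviation before applying the normal-curve properties. Below is a minimal sketch, assuming the cholesterol values are approximately normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mean = 176           # mg/dL
variance = 25        # (mg/dL)^2: the question gives the VARIANCE, not the SD
sd = sqrt(variance)  # standard deviation = 5 mg/dL

low, high = mean - sd, mean + sd
inside = NormalDist().cdf(1) - NormalDist().cdf(-1)  # about 0.68 within ±1 s.d.
outside = 1 - inside                                  # about 0.32, roughly 1/3
print(f"{low:.0f}-{high:.0f} mg/dL; fraction outside = {outside:.2f}")
```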
21. The mode is the most frequent value. There are two numbers that
each have a frequency of 2: the number 1 and the number 5. Thus the distribution is bimodal
(1, 5).
The median is the middle point of the ordered data: 1, 1, 2, 5, 5, 6.
There is an even number of observations (six of them), so add
the middle two together and divide by two: (2 + 5)/2 = 7/2 = 3.5.
The mean is the arithmetic average = Σx/N = 20/6 ≈
3.33.
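Python's standard `statistics` module reproduces the hand calculation in answer 21. The sketch uses `multimode` (Python 3.8+) because `mode` reports only one of the tied values, or raises an error in versions before 3.8.

```python
from statistics import mean, median, multimode

data = [1, 5, 1, 2, 5, 6]
print("mean  :", round(mean(data), 2))  # 3.33  (20/6)
print("median:", median(data))          # 3.5   (average of the two middle values, 2 and 5)
print("modes :", multimode(data))       # [1, 5] (every value tied for most frequent)
```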
22. Since mean < median < mode (176 < 200 < 224), the distribution is NOT symmetric (or normally distributed). The mean is most affected by extreme scores and is thus pulled in the direction of the extreme scores. The median is affected less by the extreme score(s), and the mode least of all. Because the mean lies below the median and the mode, the long tail is on the left: the distribution is NEGATIVELY SKEWED or SKEWED LEFT. The literature uses both terms, so I put them both in; the answer is C and E.
Suggested Answers to Homework Set 2
1. D
2. A
3. C
4. B, C
5. C
6. E
7. A
8. D
9. B
Suggested Answers to Homework Set 3
1. The authors are somewhat slippery here. They ostensibly
want to look at behavioral factors and see if they are related to recurrent
urinary tract infections (UTI). Then they want to institute a regimen that
will reduce recurrent UTI's. They never address the first question directly.
The authors first determine if the groups differ on certain behavioral
factors just from a descriptive point of view. They limit the factors to
sexual habits, voiding behavior and personal hygiene. Thus, for the first
part of their work, they have used these 3 behavioral factors as
INDEPENDENT
variables. After finding a difference in voiding habits they then posit
a "cause" and effect proposition that voiding habits lead to recurrent
UTI's and design a program to encourage women in the study group to void
regularly. Thus the independent variable in the second part of the study
is the behavioral program and the dependent variable is recurrent UTI's. The
authors then use the following logic: the behavioral program encouraged
people to void more frequently than their usual habits, and now they have
fewer recurrent UTI's; therefore voiding frequently MAY be protective against
UTI's.
2. The authors indicate that two groups of people
were excluded from the study: individuals who had a history of UTI or
any serious chronic illness, and women older than 40 (see page 2525). Your
patient is thus excluded from the scope of the study by her age. Are you
still tempted? If so, you are assuming that age and any chronic illness
your patient might have have no bearing on UTI's. This may be true; I will
not answer that for the time being, since this may keep your interest piqued
when you get to the Renal system next year. If you don't know the influence
of age or chronic disease, then you would be justified in NOT applying the
results of this study to your patient, and you must search the literature
to see if a study has been done that includes people like your patient (matched,
in a sense) or look to the basic sciences to fill the gaps of the study.
3. Assuming the study to be a case-control study, you will be using the ODDS RATIO, and the definition is best cited as the cross-product ratio (the product of one diagonal of the table divided by the product of the other):
OR = (6 × 242)/(4 × 112) = 1452/448 ≈ 3.24
Since the odds ratio is larger than 1.0, one might
be willing to conclude that the risk factor is associated with the outcome.
Later on we will see that we need to look at more than the calculated figure
to interpret the association, but for now make sure that you can calculate
the figure and that you know that the number against which you compare
it is 1.0.
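The odds ratio in answer 3 is a cross-product of the table's cells. The minimal sketch below reads the Question 3 table row by row, with a and b as the first row and c and d as the second. The helper name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for a 2 x 2 table with cells
        a  b
        c  d
    """
    return (a * d) / (b * c)


# Homework Set 3, Question 3 table, read row by row.
print(round(odds_ratio(6, 4, 112, 242), 2))  # 3.24
```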
4. The following is a suggested response to applying
the questions posited in lecture regarding a case-control study. You may
have additional points but these remarks may help clarify the points covered
in lecture.
1. Are the data dependable?
The data were collected from a University Health
Service which (think for a moment about your undergraduate college) generally
handles uncomplicated medical problems. These records should be fairly
complete. The outcome in question (a UTI) can be defined in an unambiguous
way; note the criteria stipulated in the article. Thus, if anyone has
a question about whether someone is a "case" or not, they should
be able to go to the medical record and verify that the criteria are met.
This is the acid test in practice: send two people to independently review
the same material and see if they reach the same conclusion. The outcome
here is not subtle, and the women should know if they have an infection
because of the manifestations of the illness: dysuria, urgency and frequency.
I would be satisfied as to the dependability of the data.
2. Is recall bias operating?
YES. This is a danger in all retrospective studies.
The major teaching point here is that one tends to focus on noxious stimuli
or major outcomes. Thus the cases may remember their voiding histories
more clearly than people who have no particular reason to recall
earlier behavior or action. A WORD OF WARNING HERE, HOWEVER. There are
certain things that force individuals to suppress recollection. One example
has been the recollection of mothers who give birth to children who present
with birth defects, either mental or physical. Histories of child abusers
are often devoid of clues when the parent is asked. One can generate
explanations for these absences of memory on a common sense basis, but
I would like to stress that people can overestimate as well as underestimate
their actions and thus some corroboration of information is necessary.
This corroboration can come from medical records (previous treatment for
a specific illness or symptoms of an illness, neighbors, family, etc.).
The authors have tried to address the problem of estimation of behavior
by asking the patients in each group about other habits WHICH, BIOLOGICALLY
SPEAKING, MAKE SENSE !!!!! In this instance, hygiene and sexual habits.
The reasoning here would be that if one tended to overestimate the times
they waited to relieve their bladder they would also tend to overestimate
other relevant habits. The article reports these habits in some
detail so one could be persuaded that the interviewers were pretty good
about getting information. Since there was a similar response pattern between
both groups in these other areas, one could assume that the recall bias
was not a major problem in the study.
3. How alike are the cases and controls?
This study uses university women who use the
health service of the university. This provides a common setting for all
subjects. Since I claimed (in Q1) that the service usually handles routine
instances of illness, we can probably assume that the groups are generally
in pretty good health. Notice that the heavy-duty problems have been eliminated
prior to the start of the study: no chronic illness was allowed in either
group. Thus the health spectrum could be assumed to be comparable between
groups. Since women had to have been in the health service in order to
be chosen, health-seeking behaviors could also be assumed to be equal.
If a woman decided to go to her family osteopathic physician at home, she
would not have been in the study because she uses services beyond the university.
So matching was done here, but on a GROUP basis rather than an individual
basis.
4. What kind of populations do the cases represent?
These are university women. They are generally better
educated and more concerned about their health than the general public.
The age range is also truncated (18-39). While this population may be
representative of college populations, it may not necessarily be suitable
for all populations. To answer this question, you must satisfy yourself
that the factors that make this study group special have no bearing on
the disease (or outcome) in question. The three big ones here are age (young),
no chronic illness, and education which might encourage better compliance
with treatment regimens. You will be able to answer these concerns when
you have studied the genitourinary system but now I would like to make
sure you are aware enough to raise the questions. The TEACHING POINT TO
REMEMBER is that a special population MAY be a good population to use if
their "specialness" does not interfere with the natural disease process.
If there is doubt you would be better off using the results ONLY on similar
populations and not generalizing to other groups in other settings.
5. Are other BIASES evident?
a. Detection bias: In brief, are people looking for this disease now more than they did in the past?
Uncomplicated UTIs are not a new phenomenon and
are a commonly occurring illness, relatively speaking. Because of the long
standing nature of the disease entity I would not suspect this bias to
be present.
b. Late-look bias: In brief, are you looking
at the disease close to the exposure? Looking late would exclude individuals
who died soon after the exposure or who recovered
without seeking medical care. Thus you would be looking at people who may be
better off, from a health perspective, simply because they are still alive,
although ill.
With this definition in mind, I would not suspect
this to be a problem here. Usually people do not die from a UTI. The symptoms
of dysuria, frequency and urgency occur soon after bacterial colonization,
thus attention is sought quickly. This should be contrasted with something
like CORONARY ARTERY DISEASE which may not manifest itself clinically until
years after exposure.
c. Nonresponse bias: In brief, this occurs
when people fail to reply to solicitations for information. The important
question to ask here is whether the respondents differ in any way from the nonrespondents
that could influence the outcome or exposure history. An example of this
is obtaining information from study subjects via questionnaires. There
have been numerous examples of differences in history of alcohol use, drug
use and prescription compliance between people who respond and people who fail to
respond to such inquiries. This was found out only after the nonrespondents
were literally tracked down and questioned on the spot, so to speak, about
certain facts relative to exposure status.
Applied to this study, the investigators followed
only the last 37 case patients, but all 84 controls, for a more detailed
look at urinary retention. That is less than 50% of the cases. It is not
clear why this cut took place. While it is unlikely that all of the unstudied
cases voided quickly after the urge to do so, it is conceivable
that the difference between the case and control groups would have been
smaller had they been included. This is a problem here.
d. Volunteer bias: In brief, this is a bias that
creeps in if volunteers display different patterns of behavior than
nonvolunteers. We know that volunteers, over the short haul, are compliant
and excited to be part of a study and hence are different from the average
patient. DO NOT BE TOO QUICK TO INVALIDATE ALL STUDIES THAT USE VOLUNTEERS,
HOWEVER. The question is whether the characteristics that differentiate volunteers
from other people are RELEVANT to the outcome or exposure status. If the purpose
of a study is to look at the pathogenesis of disease, then what matters is
that the subjects are human, not that they volunteered. If compliance
is important in the study, then you must question the generalizability of
the study.
Applied to this study, both groups were volunteers,
so volunteering would not account for any difference in the histories given by
the respondents. The fact that a behavioral regimen was part of the study
raises concern for generalizability. One could say that if volunteers have
a better behavioral compliance record than nonvolunteers, then this study
should show the absolute best that the behavioral treatment can do. NOTICE
that only 65% of the patients experienced no reinfection if they followed
the regimen. If, as a practicing physician, you see average patients, then you
should expect a lower success rate than this if you try to duplicate
their program. Also, what happened to the 12 people who were "lost to
follow-up"?
e. Selection bias: This is really a collection
of many of the biases above. It focuses your attention on any characteristics
of the cases or controls that impinge on the outcome or exposure of the
study. I think the best example would be if the cases and controls were
inherently so different that one could not help but find a difference.
One example would be a study of the nutritional status of children in which
an author took one group from the inner city of Brooklyn, New York, and a
second group from Bloomfield Hills, Michigan. This would show differences
all right, but they would make sense only to a legislative initiative, not a medical
intervention.
Applied to this study, selection bias is evident only in the ways
described above. I did not catch any other
error.
f. Admission bias: I include this because
it appears in the literature as Berkson's Paradox or Berkson's Fallacy.
The bias, essentially, is that certain diseases bring individuals into the
hospital in greater numbers than others: heart attack as opposed to influenza,
for example. If heart attack is also related to the outcome of interest,
say kidney disease, and there is a high probability that the two diseases
occur together, then the chance of finding kidney disease in the hospital
is increased. It is increased simply because kidney disease can get in two ways: by itself
and riding on the coattails of a jointly occurring disease. Thus the relationship
between exposure and outcome is distorted, since if the heart attack patients serve
as controls, they also have kidney disease.
Applied to this study, the presence of a UTI is the
only criterion for admission to the study, since complicating factors are
eliminated (p. 2525). Thus a fair admission rate into the study for both
controls and cases is assured.
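The distortion Berkson described can be sketched with expected counts rather than real data. The model below is a hypothetical illustration, not taken from the study: two diseases, A and B, occur independently in the population (so the population odds ratio is 1), each one on its own raises the chance of hospital admission, and people with neither disease are admitted for unrelated reasons at a low rate. All prevalences and admission probabilities are made-up values.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as
                   B present   B absent
    A present          a           b
    A absent           c           d
    """
    return (a * d) / (b * c)


def berkson_tables(n, p_a, p_b, h_a, h_b, h_other):
    """Return (population, hospital) expected counts as (a, b, c, d) tuples.

    Hypothetical model: A and B are independent in the population; A leads
    to admission with probability h_a, B with probability h_b (independently),
    and people with neither are admitted with probability h_other.
    """
    population = (
        n * p_a * p_b,              # A and B
        n * p_a * (1 - p_b),        # A only
        n * (1 - p_a) * p_b,        # B only
        n * (1 - p_a) * (1 - p_b),  # neither
    )
    h_both = 1 - (1 - h_a) * (1 - h_b)  # admitted because of A, B, or both
    admit = (h_both, h_a, h_b, h_other)
    hospital = tuple(count * prob for count, prob in zip(population, admit))
    return population, hospital


pop, hosp = berkson_tables(100_000, 0.10, 0.10, 0.50, 0.20, 0.05)
print(f"Population odds ratio: {odds_ratio(*pop):.2f}")   # 1.00
print(f"Hospital odds ratio:   {odds_ratio(*hosp):.2f}")  # 0.30
```

With these assumed numbers, a comparison made only among hospitalized patients suggests an association (odds ratio 0.30) that does not exist in the population (odds ratio 1.00). Other admission probabilities can push the hospital odds ratio above 1 instead, so the direction of the distortion is not fixed.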
5. You have two choices: explain it to him in terms of
risk or in terms of odds. First, you have to assume that the patient's
operation is going to be identical to others that the surgeon has done
in the past. This, of course, rests on your skill at doing a good History
and Physical examination. The information gleaned from the H&P will
let you know how similar the patient is to those the surgeon has operated
on previously. Then, from a statistical point of view, the problem takes
on the following form:
SURGICAL HISTORY
Surgeries performed by surgeon on patients like your patient
  Successful    Unsuccessful    Total
      50             30           80
Thus if you choose the RISK approach you would say
your chances are 50/80, or about 63% (actually 62.5%). If you choose the
ODDS approach you would say 50/30, or 5 to 3. The patient would in
all likelihood understand the RISK approach a little better (unless s/he
plays the horses). But what you can see here is that this is high-risk
surgery: your chances are only a little better than 50%. Does the
patient really need surgery?
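The risk and odds arithmetic can be checked with exact fractions. A minimal sketch, assuming (as the table shows) 50 successful and 30 unsuccessful operations out of 80; the variable names are illustrative only.

```python
from fractions import Fraction

successes = 50  # operations on similar patients that went well
failures = 30   # operations on similar patients that did not

risk = Fraction(successes, successes + failures)  # 50/80 reduces to 5/8
odds = Fraction(successes, failures)              # 50/30 reduces to 5/3

# Risk and odds carry the same information: odds = risk / (1 - risk)
assert odds == risk / (1 - risk)

print(f"Risk of success: {float(risk):.1%}")                       # 62.5%
print(f"Odds of success: {odds.numerator} to {odds.denominator}")  # 5 to 3
```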
6a,b. The authors cite a RETROSPECTIVE study as the
basis for their current work. This makes sense and is not an unusual occurrence.
Recall that while child abuse is an abhorrent event, it is fortunately rare.
The use of a retrospective technique gives at least a clue as to whether
certain variables occur together. If they do then further study might be
indicated. The retrospective study is quicker and less expensive than a
prospective study and thus is a reasonable place to start.
6c. The study population is from the INTENSIVE CARE
UNIT of a hospital. This hospital draws from a wide geographic area.
The authors note that the distance one has to travel often impedes the
amount of interaction between the infant and family. Since this is a regional
center, it suggests that there are not that many intensive care units for
infants (neonatal intensive care units) or that these kids are really sick!
Indeed, children were excluded from the study if they rapidly recovered
or died. Thus the prolonged stay, and the effect it might have on the
family, might make this a tough population from which to generalize.
Children who rapidly recovered were not included; these kids may be a better
group from which to make generalizations, since most infants in the general
population DO NOT go to an ICU. The explanation of the 21 families
who were in the unit, but not long enough for contact to be made, is a curiosity
to me. How long did they have to be there? The authors then proceeded to enroll
the remaining families into the study. This would preclude any sampling
bias. FROM AN ETHICAL point of view, the authors should be commended for
providing full hospital services to all families. THE TEACHING POINT
is that either the STUDY population may be a problem, or the SAMPLING may
be a problem, or both. Attention must be paid to each one. Here the sampling
is OK but the population may be suspect.
6d. The authors report data on all 255 families.
Thus there is no loss to follow-up that I can find.
6e. The scale is: 0 = absent
1 = present to some degree
2 = strongly present
This is an example of an ORDINAL SCALE.
6f. The dependent variable is THE INCIDENCE OF REPORTED
MALTREATMENT of BABIES AFTER APPROXIMATELY ONE YEAR POST DISCHARGE FROM
HOSPITAL.
Although I didn't ask, the independent variable is
the INVENTORY SCORE of the family.
6g. The authors have, as they say in the game, operationally
defined the incidence of maltreatment as reports made to the local department
of social services. I want you to go further than this, however, in your
analysis, to understand WHAT was reported. The evidence consists of serious
physical abuse, 2 cases, and neglect, 8 cases. The 8 neglect cases
involved failure to treat chronic or acute medical problems and failure
to comply with minimal well-child care (e.g., immunizations). BUT
THE MOST FREQUENT complaint was inadequate parental supervision.
6h. There is SURVEILLANCE BIAS. The study involved
the department of social services immediately after the family inventory
was administered and the family was identified as high risk, and it made
a special effort to make hospital support services available to these families. These
families are, in a sense, marked. Since follow-up visits are done by social
services, instances of abuse are more easily found, because they are looking
for it. I have few qualms about the detection of physical abuse, since
any child who has these problems will most times come to the attention
of medical personnel. My concern is with the nonphysical instances
of what has been termed maltreatment, in particular, lack of adult supervision.
It is conceivable that the unreported group has left their children alone
as well, and maybe as frequently. IT IS JUST THAT NO ONE IS LOOKING IN ON THEM
TO CHECK. This is what I mean by surveillance bias: the high-risk
group is scrutinized more carefully than the "comparison group". A way
around this would be to take a random sample of the unreported families
and visit them on a regular basis to see if there are instances of unreported
child abuse.
6i. The term incidence is used correctly. The children
are all "abuse free" since they have just been born. They will BECOME
abused. They will thus be NEW cases of abuse.
7a. Retrospective study.
                         Acne Present   Acne Absent
Eat Breakfast                 20             50
Did not Eat Breakfast         60            110
Total                         80            160
7b. Prospective study.
                         Acne Present   Acne Absent   Total
Eat Breakfast                 20             50          70
Did not Eat Breakfast         60            110         170
7c. The rules for interpreting the ratios are the
same. Here both are less than 1.0, which suggests that a protective
effect is operating. Prospectively, then, your chances of getting acne are
lower if you eat breakfast than if you do not. Retrospectively,
one would interpret this as: those who had acne were less likely to have
eaten breakfast than those who did not.
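Assuming the ratio for the retrospective table (7a) is the odds ratio and the ratio for the prospective table (7b) is the relative risk, both can be computed from the same four cells. A minimal sketch:

```python
# Cells from tables 7a/7b: rows = breakfast exposure, columns = acne status
a, b = 20, 50    # ate breakfast:         acne present, acne absent
c, d = 60, 110   # did not eat breakfast: acne present, acne absent

# Retrospective (case-control) measure: ratio of the odds of exposure
odds_ratio = (a * d) / (b * c)
# Prospective (cohort) measure: ratio of the risks of acne in each exposure group
relative_risk = (a / (a + b)) / (c / (c + d))

print(f"Odds ratio:    {odds_ratio:.2f}")     # 0.73
print(f"Relative risk: {relative_risk:.2f}")  # 0.81
```

Both values fall below 1.0, matching the protective interpretation in 7c.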
8. A) Scientific Hypothesis: The proportion of alcoholics who die
following accidents in which severe burns are sustained is greater than
that of nonalcoholics.
Independent Variable: Alcoholic status (alcoholic vs. nonalcoholic)
Dependent Variable: Death (Yes/No)
One thing that is not described is the seriousness
of the burn. Were the alcoholics and nonalcoholics matched on severity
of burn? What comorbidities (accompanying illnesses) does each patient
present with? These are some of the things that must also be considered.
One-tailed vs. Two-tailed: Because of the assumptions,
the authors would want to argue that a one-tailed test is warranted.
B) Scientific Hypothesis: A relationship exists between
parental smoking (smokers vs. nonsmokers) and number of colds per year
in nonsmoking teenagers.
Independent Variable: Smoking status of parents (smoker vs. nonsmoker)
Dependent Variable: Number of colds in a 12-month
period.
One-tailed vs. Two-tailed: Two-tailed. A relationship
is suggested but no direction is stated. For example, it is not hypothesized
that nonsmoking teens whose parents both smoke will have a GREATER number
of colds per year than nonsmoking teens whose parents are nonsmokers.
Question: What if only one parent smokes? What if there is smoking, but it is not done in the house?
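Whether a test is one-tailed or two-tailed changes the P-value attached to the same result. A minimal sketch, assuming a large-sample test whose statistic is approximately standard normal; the value z = 1.8 is hypothetical and is not taken from either study.

```python
import math


def normal_p_values(z):
    """One- and two-tailed P-values for a standard normal test statistic z.

    The one-tailed value assumes the hypothesized direction is the upper tail.
    """
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z >= z)
    two_tailed = math.erfc(abs(z) / math.sqrt(2))     # P(|Z| >= |z|)
    return one_tailed, two_tailed


one, two = normal_p_values(1.8)
print(f"one-tailed P = {one:.4f}, two-tailed P = {two:.4f}")
# one-tailed P = 0.0359, two-tailed P = 0.0719
```

At the conventional 0.05 level, this hypothetical result would be significant under a one-tailed test but not under a two-tailed test, which is why the direction has to be stated in the hypothesis before the data are examined.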
Suggested Answers to Homework Set 4
1.
i. (a+c) / (a+b+c+d)
ii. a / (a+c)
iii. d / (b+d)
iv. b
v. c
vi. d / (c+d)
vii. a / (a+b)
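The seven formulas can be collected into one function. The names attached to each formula (prevalence, sensitivity, and so on) are the conventional ones and assume the layout in the docstring, with test results in rows and disease status in columns; the example counts are the ones worked out in answer 4 (215, 16, 15, 114).

```python
def two_by_two_measures(a, b, c, d):
    """Screening-test measures for a 2x2 table laid out as
                      Disease present   Disease absent
    Test positive            a                 b
    Test negative            c                 d
    """
    n = a + b + c + d
    return {
        "prevalence": (a + c) / n,   # i
        "sensitivity": a / (a + c),  # ii
        "specificity": d / (b + d),  # iii
        "false_positives": b,        # iv
        "false_negatives": c,        # v
        "npv": d / (c + d),          # vi
        "ppv": a / (a + b),          # vii
    }


measures = two_by_two_measures(215, 16, 15, 114)
for name, value in measures.items():
    print(f"{name}: {round(value, 3)}")
```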
2. This means that of the people who test positive
on your lab test, 77% actually have the disease. You can infer from
this that 23% of the positive results will be false positives.
3. With a high NPV I would probably conclude that
the patient DOES NOT HAVE the disease. I must recognize that I could be
wrong 10% of the time, since that is the percentage of falsely negative
tests.
4.
                 MI Present   MI Absent   Total
CK Positive          215           16       231
CK Negative           15          114       129
Total                230          130       360
In brief:
1) Prevalence x Sample size = .64 x 360 = 230
2) Sensitivity x 230 = .93 x 230 = 215
3) Appropriate subtraction yields 15 for the false
negatives (230 - 215) and 130 for the number of disease-free people in the CCU (360 - 230).
4) Specificity x 130 = .88 x 130 = 114.
5) Subtract for the false positives: 130 - 114 = 16.
a) The PPV is equal to 215 / (215 + 16) = 215/231, or about 93%.
The NPV is equal to 114 / (114 + 15) = 114/129, or about 88%.
b) Given these high values I would feel comfortable
concluding that if the test result is POSITIVE the patient is having an
INFARCTION; if the test result is NEGATIVE I would feel comfortable concluding
the patient DOES NOT have an infarction.
5. Notice that the prevalence has dropped considerably.
Since SENSITIVITY AND SPECIFICITY are INDEPENDENT of the prevalence, I should
not expect them to change, and thus they will be the same for my calculations.
The PPV and NPV are absolutely dependent on the prevalence, and this is
the major teaching point. Upon recalculation:
PPV is 46%
NPV is 99%
Thus, for the general population of this hospital,
I would be on extremely shaky ground concluding that a person has an MI
based solely on a positive CPK. However, for this population, with this
prevalence, if the patient had a NEGATIVE test I would conclude that
there is NO MI. This is why you must study an article carefully: if the
study population has a prevalence different from yours, the PPV and NPV are going
to change. Are your chances of correctly diagnosing a medical problem
significantly improved by the LAB test? If not, don't run the test. It contributes
little to your knowledge base and runs up medical costs.
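The dependence of PPV and NPV on prevalence follows from Bayes' theorem. A minimal sketch, holding the sensitivity (.93) and specificity (.88) from answer 4 fixed; the lower prevalence of 10% is a hypothetical value chosen for illustration. With these inputs, a 64% prevalence gives a PPV of about 93% and an NPV of about 88%, while a 10% prevalence gives about 46% and 99%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a test with the given sensitivity and specificity,
    applied to a population with the given prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 0.64 is the CCU prevalence from answer 4; 0.10 is a hypothetical lower value
for prev in (0.64, 0.10):
    ppv, npv = predictive_values(0.93, 0.88, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
# prevalence 64%: PPV 93%, NPV 88%
# prevalence 10%: PPV 46%, NPV 99%
```

Sensitivity and specificity never change inside the loop; only the prevalence does, and that alone moves the PPV from about 93% down to about 46%.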
6a. A 6b. A 6c. B 6d. A 6e. F 6f. E
Please fill in the information requested. The
answers will be used for demonstration purposes in the next few meetings.
1. Initials (First, Last) ___ ___
2. Age at last birthday (in years, please) ____
3. Sex ___ M ___ F
4. Height (in inches, please) _____
5. Weight (in pounds, please) _____
6. What year do you plan to graduate? ___ 2000
___ 2001 ___ 2002 ___ Grad Student
7. Which major division of the medical field do
you think you want to enter, at this time? (Does not apply to Graduate
Students)
___ Medicine ___ Surgery ___ Don't Know
8. Please rank order the following expectations
of physicians held by a sample of American people, 1 being the highest,
followed by 2, 3, etc. (This is from a national survey entitled "A Report
Card on Americans' Primary Care Physicians.")
A. ___ Be knowledgeable and competent
B. ___ Have a friendly personality
C. ___ Counsel patients on steps they could take
to enjoy good health
D. ___ Really care about a patient's health