A B
C D E
F G H
I J K L
M N O
P Q R
S T U
V W X
Y Z Home
|
|
A |
|
alphanumeric or string variables |
Chapter 2 |
|
Alphanumeric variables are variables whose
values may be stored as letters, digits or other characters or a combination
of them. Alphanumeric variables are also known as string variables and
are not available for arithmetic operations in SPSS.
|
|
antecedent |
Chapter 9 |
|
An antecedent variable is one that has
a causal influence on another variable.
|
|
association |
Chapter 9 |
|
A relationship between two categorical
variables. Two variables are associated when the proportion in each
category of one variable differs according to the categories of the
other.
|
|
asymmetric |
Chapter 9 |
|
A characteristic of a measure of association.
A measure is asymmetric when the value it takes depends on which variable
is considered to be the independent and which the dependent
variable.
|
|
|
B |
|
backward elimination |
Chapter 12 |
|
A procedure for exploratory analysis that
successively deletes variables from the regression equation until there
is a significant reduction in the variance explained.
|
|
bar chart |
Chapter 4 |
|
A graphical method appropriate for nominal
and ordinal variables where the proportion in each category of
a variable is represented as a vertical or horizontal bar.
|
|
bivariate analysis |
Chapter 8 |
|
Statistical analysis involving two variables.
|
|
bivariate relationship |
Chapter 1 |
|
A statistical relationship between two
variables.
|
|
boxplot |
Chapter 6 |
|
An exploratory data analytic method to
display the distribution of a single variable where the box is defined
by the upper and lower quartiles and the 'whiskers' extend
to the highest (or lowest) values that are not outliers. Sometimes
known as a box and whisker plot.
|
|
|
C |
|
categorical variable |
Chapter 1 |
|
Both nominal and ordinal
variables are categorical variables whose attributes have simply been
categorised. For instance sex is a categorical variable where respondents
have been classified or categorised as either male or female.
|
|
cell |
Chapter 9 |
|
The 'hole' into which a number (a frequency,
proportion or percentage) is put when constructing a table.
|
|
census |
Chapter 10 |
|
A survey of every case in a population.
|
|
Central Limit Theorem |
Chapter 10 |
|
The theorem states that the distribution
of the means of numerous samples all taken from the same distribution
tends to the normal curve (the larger each sample,
the better the approximation to a normal curve). The theorem
holds regardless of the form of the distribution from which the samples
are taken.
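
A small simulation can illustrate the theorem. The sketch below is only an illustration, not part of the original text: the helper name sample_means, the choice of an exponential distribution, the sample size of 30 and the seed are all assumptions made for the example.

```python
import random
import statistics

def sample_means(draw, sample_size, number_of_samples):
    """Mean of each of `number_of_samples` samples of `sample_size` draws."""
    return [
        statistics.mean([draw() for _ in range(sample_size)])
        for _ in range(number_of_samples)
    ]

rng = random.Random(1)  # fixed seed so the run is repeatable
# Exponential distribution with mean 1: strongly skewed, far from normal
means = sample_means(lambda: rng.expovariate(1.0), sample_size=30, number_of_samples=2000)

print(round(statistics.mean(means), 2))   # close to the population mean of 1
print(round(statistics.stdev(means), 2))  # close to 1 / sqrt(30), about 0.18
```

Plotting a histogram of the means would show a roughly bell-shaped curve even though the individual values come from a skewed distribution.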
|
|
chi squared |
Chapter 11 |
|
A test statistic used when the data consists
of counts.
|
|
cluster analysis |
Chapter 12 |
|
A form of analysis that groups variables
(or cases) into clusters according to their similarity (the correlations
between variables, or common patterns of values for cases).
|
|
codebook |
Chapter 3 |
|
A document that lists the dictionary information
about all the variables in a data file. This usually includes the original
question text, the SPSS variable names if appropriate,
and the value labels for each coded response. A codebook may
also contain information about derived variables and notes given
to interviewers or coders when preparing the data file.
|
|
coding |
Chapter 2 |
|
The process by which numbers are ascribed
to responses to a survey questionnaire in preparation for computer analysis.
For example, to the question "Are you male or female?", male may be
coded with the number 1 and female with the number 2.
|
|
coefficient of determination |
Chapter 8 |
|
See R square
|
|
column percentage |
Chapter 9 |
|
A percentage calculated by finding the
count in a cell compared with the total count for the column (the column
marginal count). For example, if there are 530 unemployed women
in a sample including 1192 women in all, the column percentage of unemployed
women is (530/1192) × 100, or about 44%.
|
|
concordant |
Chapter 9 |
|
If there are two cases, A and B, and case
A has a higher value on one variable than case B, and case A also has
a higher value on another variable than case B, the two cases are concordant
for the two variables.
|
|
confidence interval |
Chapter 10 |
|
The range within which it can be inferred
that a population mean lies, with some specified degree of confidence.
For example, the 95 per cent confidence interval is the range within
which we can be 95 per cent confident that the population mean can be found. It
is equal to the sample mean plus or minus 1.96 standard errors.
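
As a rough sketch of the calculation, the Python below computes a 95 per cent confidence interval. It assumes the standard error is estimated as the sample standard deviation divided by the square root of the sample size; the function name confidence_interval_95 and the ages are made up for illustration.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Return (lower, upper): the sample mean plus or minus 1.96 standard errors."""
    mean = statistics.mean(sample)
    # Assumed estimate of the standard error: sample SD / sqrt(n)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

ages = [23, 35, 41, 29, 52, 38, 44, 31, 27, 40]  # made-up illustrative data
lower, upper = confidence_interval_95(ages)
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```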
|
|
confidence level |
Chapter 10 |
|
The probability that a population mean
lies within an interval. For example, at the 95 per cent confidence
level, the population mean lies within plus or minus 1.96 standard
errors of the sample mean (see confidence interval).
|
|
confirmatory analysis |
Chapter 12 |
|
A statistical analysis that aims to test
a pre-specified hypothesis or model.
|
|
contingency table |
Chapter 9 |
|
A table of two or more variables cross-classified,
consisting of cells showing the number of cases in each combination
of categories from the variables. Also called a cross-tabulation.
|
|
continuous variables |
Chapter 1 |
|
A continuous variable is one which can
be measured at any point on a continuous scale. For instance age may
seem to be a discrete variable because it is usually measured
to the nearest whole year. However, since time is a continuum, age can
in principle be measured down to a fraction of a second.
|
|
control variable |
Chapter 9 |
|
A variable that specifies which are the
partial tables in a tabulation of three variables.
|
|
Cook's distance |
Chapter 12 |
|
A measure of the extent to which an outlier
has an influence on a regression model's coefficients.
|
|
correlation coefficient |
Chapter 8 |
|
A measure of association for continuous
variables obtained by dividing the covariance by the product
of the standard deviations of the two variables. (So for variables
X and Y, the product of the standard deviations is sx
times sy.)
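
The calculation can be sketched in a few lines of Python, using the glossary's definitions of cross-product and covariance. The function names and the data are hypothetical, chosen only to illustrate the steps.

```python
import math

def covariance(xs, ys):
    """Sum of the cross-products divided by one less than the number of cases."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    s_x = math.sqrt(covariance(xs, xs))  # a variable's covariance with itself is its variance
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)

x = [1, 2, 3, 4, 5]  # made-up illustrative data
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 3))  # 0.775
```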
|
|
count |
Chapter 9 |
|
The number of times cases with a particular
combination of attributes occurs in a dataset. For example, the count
of unemployed men within a dataset might equal 421.
|
|
covariance |
Chapter 8 |
|
A measure which indicates how the values
of two continuous variables vary together. Obtained by dividing the sum
of the cross-products by one less than the number of cases.
|
|
Cramer's V |
Chapter 9 |
|
A measure of association appropriate
for measuring the strength of a relationship between two variables,
one or both of which has more than two categories.
|
|
critical region |
Chapter 11 |
|
The region of the sampling distribution
between the critical value and positive infinity (or between
the positive critical value and positive infinity, plus the region between
the negative critical value and negative infinity for a two-tailed test
- see Exhibit 11.6).
|
|
critical value |
Chapter 11 |
|
The value (a number of standard errors)
that defines the boundary of the critical region. If the test
statistic falls within this region, the null hypothesis is assumed
to be false.
|
|
cross-product |
Chapter 8 |
|
Obtained by multiplying the deviations
from the means of pairs of values of two variables. The cross-products
are obtained as a first step in the calculation of the covariance.
|
|
crosstabulation |
Chapter 4 |
|
A table of the joint frequency distributions
of two nominal or ordinal variables. Also called a contingency table.
|
|
|
D |
|
data matrix |
Chapter 1 |
|
The name given to the column by row organisation
of numerical responses which results from coding survey questionnaires.
The rows correspond to the cases and the columns contain the responses
to each variable.
|
|
deduction |
Chapter 1 |
|
A method of analysis that proceeds by
formulating a theory and testing it with data. See induction for
an alternative approach.
|
|
degrees of freedom |
Chapter 11 |
|
The number of values free to vary in the
calculation of a statistic. For example, if we have 4 cells in
a table and we know the marginal frequencies, we would need to
know at least one value in order for the other three values to be completely
determined. Such a 2x2 table has one degree of freedom.
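
The 2x2 example can be made concrete with a short Python sketch. The function name complete_2x2 and the counts are made up for illustration; the point is that one known cell plus the marginals fixes the other three cells.

```python
def complete_2x2(top_left, row_totals, column_totals):
    """Fill in a 2x2 table from one known cell and the marginal frequencies."""
    top_right = row_totals[0] - top_left
    bottom_left = column_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]

# Made-up marginals: rows sum to 60 and 40, columns sum to 70 and 30
table = complete_2x2(45, row_totals=(60, 40), column_totals=(70, 30))
print(table)  # [[45, 15], [25, 15]]
```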
|
|
dependent variable |
Chapter 6 |
|
A dependent variable is a variable whose
values are predicted by another variable or variables, the independent
variable(s). The dependent variable is usually the subject of primary
interest in a study.
|
|
derived variable |
Chapter 3 |
|
A derived variable is one that has been
created out of the responses from the original variables in the datafile.
Thus one may derive the variable agegroup from data collected about
age.
|
|
dichotomous |
Chapter 9 |
|
A variable with only two categories, such
as male and female, is called dichotomous.
|
|
discordant |
Chapter 9 |
|
If there are two cases, A and B, and case
A has a higher value on one variable than case B, and case A has a lower
value on another variable than case B, the two cases are discordant
for the two variables. They are also discordant if case A has a lower
value than case B on one variable and a higher value on the other.
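
A minimal Python sketch of the rule for a pair of cases, covering both concordant and discordant pairs. The function name classify_pair and the "tied" label for pairs that share a value are assumptions made for this example.

```python
def classify_pair(case_a, case_b):
    """Each case is a (value on first variable, value on second variable) pair."""
    # The product is positive when both differences have the same sign
    product = (case_a[0] - case_b[0]) * (case_a[1] - case_b[1])
    if product > 0:
        return "concordant"
    if product < 0:
        return "discordant"
    return "tied"  # the cases share a value on at least one variable

print(classify_pair((3, 5), (1, 2)))  # concordant: A is higher on both variables
print(classify_pair((3, 1), (1, 2)))  # discordant: A is higher on one, lower on the other
```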
|
|
discrete variable |
Chapter 1 |
|
A discrete variable is one which can only
be counted in whole numbers. For instance, number of people
in a family; number of cars in a household. For comparison, see continuous
variable.
|
|
dummy variables |
Chapter 12 |
|
A dichotomous variable used in
a regression equation. Categorical variables with more
than two categories need to be converted into a set of equivalent dummy
variables (with one less dummy variable than there are categories) in
order to include them in the regression model.
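
The conversion can be sketched in Python as follows. The function name dummy_code and the choice of "married" as the omitted (reference) category are illustrative assumptions, not something the text prescribes.

```python
def dummy_code(values, reference):
    """Turn one categorical variable into k - 1 dummy (0/1) variables.

    The reference category gets no dummy of its own: it is the case
    where every dummy equals 0.
    """
    categories = sorted(set(values))
    categories.remove(reference)
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }

marital = ["married", "single", "divorced", "single", "married"]
dummies = dummy_code(marital, reference="married")
print(dummies)  # {'divorced': [0, 0, 1, 0, 0], 'single': [0, 1, 0, 1, 0]}
```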
|
|
|
E
|
|
elaboration |
Chapter 9 |
|
A method for exploring and testing ideas
about causal relationships between three or more variables. It involves
examining tables of two variables, controlling for a third.
|
|
expected |
Chapter 9 |
|
For a table of two variables, the count
which would be obtained if there were no association between
the two variables.
|
|
|
expected counts |
Chapter 11 |
|
The counts that would be obtained if a
null hypothesis were true (usually the hypothesis is that there
is no association between two variables).
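
For a two-way table, the expected count under "no association" is usually calculated as the row total times the column total, divided by the grand total. The Python below is a sketch of that standard formula; the function name and the observed counts are made up for illustration.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

observed = [[30, 20], [10, 40]]  # made-up counts
print(expected_counts(observed))  # [[20.0, 30.0], [20.0, 30.0]]
```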
|
|
explanation |
Chapter 1 |
|
An independent variable explains a dependent
variable if knowledge of the value of the independent variable provides
good predictions of the value of the dependent variable for all cases.
|
|
exploratory analysis |
Chapter 12 |
|
A statistical analysis that aims to discover
theoretically interesting hypotheses or models describing the data.
|
|
|
F
|
|
factor analysis |
Chapter 12 |
|
A form of statistical analysis that is
built on the assumption that the data were generated by some small number
of unmeasured latent variables that in combination created the measured
variables. The analysis recreates the latent variables as 'factors'.
|
|
falsification |
Chapter 11 |
|
The idea that while it is impossible to
prove that a hypothesis is true, it is possible to show that a hypothesis
is false (since only one piece of evidence that is counter to the hypothesis
is sufficient to show that it is false). Acceptance of a theory only
means that we have not yet disproved it.
|
|
fitted values |
Chapter 8 |
|
The fitted or expected values are obtained
by substituting values for the independent variables into the regression
equation. For example, if the regression equation is Y = 1 + 3X, substituting
a value of 2 for X will give a fitted value for Y of 7 (i.e. 1 + 3 times
2). Denoted as ŷ,
pronounced y hat.
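
The substitution in the example can be written as a one-line Python function. The name fitted_value is illustrative; the default coefficients are those of the example equation Y = 1 + 3X.

```python
def fitted_value(x, a=1, b=3):
    """Fitted value from the regression equation Y = a + bX."""
    return a + b * x

print(fitted_value(2))  # 7, as in the worked example
```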
|
|
forward selection |
Chapter 12 |
|
A procedure for exploratory analysis that
successively adds more variables to the regression equation until there
is no further significant improvement in the variance explained.
|
|
frequency distribution |
Chapter 3 |
|
A frequency distribution is a table of
frequencies of occurrence of each value of a variable. In SPSS
9.0 a frequency distribution is obtained by selecting Analyse, Descriptive
statistics, Frequencies... and usually includes a percentage calculation
for each value.
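
Outside SPSS, the same kind of table can be sketched in Python. The function name frequency_distribution and the coded responses are made up for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Return rows of (value, frequency, percentage), sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total) for value, count in sorted(counts.items())]

responses = [1, 2, 2, 1, 2, 3, 2, 1]  # made-up coded responses
for value, count, percent in frequency_distribution(responses):
    print(value, count, f"{percent:.1f}%")
```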
|
|
frequency polygon |
Chapter 4 |
|
The graphical representation of the distribution
of an interval or ratio variable where the midpoint of each interval
is represented by a marker which is then joined by a line to the next
midpoint marker.
|
|
|
G
|
|
gamma |
Chapter 9 |
|
A measure of association appropriate
to variables where one or both are measured at the ordinal level
of measurement.
|
|
Goodman and Kruskal's tau |
Chapter 9 |
|
An asymmetric measure of association
between two variables appropriate when one variable is considered to
be the cause and the other the effect.
|
|
grouped data |
Chapter 5 |
|
Data which has been recoded or
grouped into fewer categories than when originally collected. For instance,
age, measured on an interval scale, could be grouped into agegroups
in order to create a histogram.
|
|
|
H
|
|
histogram |
Chapter 4 |
|
The graphical representation of the distribution
of an interval or ratio variable where the frequency of occurrence of
each interval is represented by the height of a bar.
|
|
|
I
|
|
independent variable |
Chapter 6 |
|
An independent variable is one that predicts
the values of another variable, the dependent variable. Sometimes
called a predictor variable.
|
|
indicator |
Chapter 1 |
|
A method intended to measure a concept.
For instance, a common indicator of the concept of social class is a
person's occupation. Since class is not something we can measure directly,
an indicator has to be used. See reliability and validity.
|
|
induction |
Chapter 1 |
|
A method of analysis that derives theories
by generalising from evidence, usually from a large number of cases.
See deduction for an alternative approach.
|
|
inner fence |
Chapter 6 |
|
In a boxplot, the inner fences
are the boundaries of the main body of the data beyond which lie the
outliers. They are positioned at 1.5 times the IQR above
the upper quartile and below the lower quartile.
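
A short Python sketch of the fence calculation. The quartiles are passed in directly, since there are several conventions for computing them; the function names and the values are illustrative assumptions.

```python
def inner_fences(lower_quartile, upper_quartile):
    """Return (lower fence, upper fence), each 1.5 x IQR beyond its quartile."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def find_outliers(values, lower_quartile, upper_quartile):
    """Values lying beyond either inner fence."""
    low, high = inner_fences(lower_quartile, upper_quartile)
    return [value for value in values if value < low or value > high]

# Made-up data with quartiles of 20 and 30, so the IQR is 10
print(inner_fences(20, 30))                                  # (5.0, 45.0)
print(find_outliers([4, 18, 22, 25, 31, 50], 20, 30))        # [4, 50]
```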
|
|
inter-quartile range |
Chapter 5 |
|
The value of the upper quartile
minus the value of the lower quartile. Denoted by IQR.
|
|
interaction |
Chapter 9 |
|
See specification.
|
|
interval variable |
Chapter 1 |
|
An interval variable is one whose categories
may be ranked or ordered and for which the distance between the categories
is precisely defined, e.g. salary, age. See level of measurement.
|
|
intervening |
Chapter 9 |
|
A variable is intervening if another variable
has an effect on it and it in turn affects a third variable.
|
|
|
K
|
|
kurtosis |
Chapter 5 |
|
A property of a distribution which reflects
the 'peakedness' of the plotted curve. A tall, peaked distribution is
called leptokurtic while a flat, plateau-like shaped distribution is
called platykurtic. A distribution with the same peakedness as the normal curve is called mesokurtic.
|
|
|
L
|
|
level of significance |
Chapter 11 |
|
The probability that the research outcome
could have happened by chance if the null hypothesis were true.
|
|
level of measurement |
Chapter 1 |
|
The name given to the classification scheme
which distinguishes the relationships between the categories or attributes
of a variable. See nominal, ordinal, interval and ratio
variables.
|
|
leverage |
Chapter 12 |
|
A measure of the extent to which an outlier
is both distant from the regression line and has a large value
on an independent variable.
|
|
logistic regression |
Chapter 12 |
|
A type of regression used when
the dependent variable is categorical.
|
|
longitudinal data |
Chapter 9 |
|
Data collected from the same people over
a period of time.
|
|
lower quartile |
Chapter 5 |
|
The value of the category which defines
the upper boundary of the bottom 25 per cent of cases when they are
arranged in rank order. Denoted by Q1.
|
|
|
M
|
|
marginal |
Chapter 9 |
|
The sum of the counts for a particular
category of a variable. These sums are often placed in the right-most
column or bottom row of a table and are therefore in the margins of
the table.
|
|
mean |
Chapter 5 |
|
A measure of central tendency appropriate
for interval and ratio variables. It is the arithmetic average of the
values of a distribution. Denoted by x̄
or µ.
|
|
measure of central tendency |
Chapter 5 |
|
A single statistic that summarises the
distribution of one variable. See mode, median and mean.
|
|
measure of association |
Chapter 8 |
|
A statistic which measures the degree
of association or relationship between two variables. Examples include
phi for nominal variables and the correlation coefficient
for interval variables.
|
|
measures of dispersion |
Chapter 5 |
|
Statistics that describe the spread of
a distribution or how the values of a distribution are scattered around
the mean. The simplest is the range, but see also variance
and standard deviation.
|
|
median |
Chapter 5 |
|
A measure of central tendency,
most appropriate for ordinal variables. It is the value of the
category that occurs in the middle of a ranked distribution. Also known
as the 50th percentile.
|
|
missing value |
Chapter 3 |
|
Values are assigned as missing values
when the researcher wishes to exclude that value from statistical analysis.
For instance, some respondents may have refused to answer a question
about their age. Every response must be given a code, so you may decide
to code this 'no response' with the number 999. However, you would want
to ensure that this code is not treated as a valid age response. If
it was, the average age for the sample would be incorrect.
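
The effect of treating the 'no response' code as a real age can be seen in a small Python sketch. The function name and the ages are made up; 999 is the missing value code from the example.

```python
MISSING_AGE_CODE = 999  # the 'no response' code used in the example

def mean_excluding_missing(values, missing_code):
    """Mean of the valid values only, ignoring the missing value code."""
    valid = [value for value in values if value != missing_code]
    return sum(valid) / len(valid)

ages = [25, 40, 999, 31]  # made-up data; one respondent refused to answer
print(sum(ages) / len(ages))                           # 273.75, a nonsense average
print(mean_excluding_missing(ages, MISSING_AGE_CODE))  # 32.0
```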
|
|
mode |
Chapter 5 |
|
A measure of central tendency,
most appropriate for nominal variables. It is the value or label of
the most frequently occurring category.
|
|
model |
Chapter 12 |
|
A theory that proposes relationships between
two or more variables.
|
|
multi-dimensional scaling |
Chapter 12 |
|
A form of analysis that positions points
(representing variables and cases) according to their similarity, with
the most similar being nearest to each other.
|
|
multi-variate analysis |
Chapter 8 |
|
Statistical analysis involving three or
more variables.
|
|
multiple regression analysis |
Chapter 8 |
|
Statistical methods concerned with explaining
or predicting the variability of a continuous dependent
variable using information from two or more continuous independent
variables.
|
|
|
N
|
|
nominal variable |
Chapter 1 |
|
The categories of a nominal variable bear
no relationship to one another, other than that they are different.
For example, marital status is a nominal variable with the categories,
married, single, divorced etc. Nominal variables are measured at the
lowest level of measurement.
|
|
normal distribution |
Chapter 7 |
|
A theoretical, continuous probability
distribution where the horizontal axis plots all possible values of
the variable and the vertical axis is the frequency of occurrence of
each value.
|
|
normal curve |
Chapter 7 |
|
A symmetric, bell-shaped curve describing
a normal distribution.
|
|
null hypothesis |
Chapter 11 |
|
The converse of a working hypothesis:
if the null hypothesis is found to be false, this can be taken as indirect
support for the working hypothesis.
|
|
|
O
|
|
observed counts |
Chapter 11 |
|
The counts obtained from the data.
|
|
one-sample test |
Chapter 11 |
|
A test comparing a sample statistic with
a population parameter.
|
|
one-tailed test |
Chapter 11 |
|
An inferential test in which the null
hypothesis will be rejected if the test statistic is greater
than the critical value.
|
|
ordinal variable |
Chapter 1 |
|
The categories of an ordinal variable
can be ordered or ranked but the distance between the categories is
unknown. For instance, educational level is an ordinal variable ranging
from none through higher. How close one category is to another is unknown.
See nominal and interval levels of measurement.
|
|
Ordinary Least Squares [OLS] |
Chapter 8 |
|
The statistical procedure used in regression analysis to arrive
at the 'best fitting' regression line. It is a mathematical method that
ensures that the squares of the deviations from the regression line
are minimised.
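
For a single independent variable, the least squares line Y = a + bX has a well-known closed form: b is the sum of cross-products divided by the sum of squared deviations of X, and a is the mean of Y minus b times the mean of X. The Python below is a sketch under that assumption; the function name ols_fit and the data are illustrative.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the line Y = a + bX that minimises the squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_squares_x = sum((x - mean_x) ** 2 for x in xs)
    b = sum_cross_products / sum_squares_x  # slope
    a = mean_y - b * mean_x                 # intercept
    return a, b

# Made-up points lying exactly on Y = 1 + 3X
a, b = ols_fit([0, 1, 2, 3], [1, 4, 7, 10])
print(a, b)  # 1.0 3.0
```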
|
|
outer fences |
Chapter 6 |
|
In a boxplot, the outer fences define the boundary beyond which
lie the extreme outliers at 3.0 times the IQR above and
below the upper and lower quartiles.
|
|
outlier |
Chapter 6 |
|
A case with an extremely high or extremely low value compared with
the rest of a distribution. An outlier has a value more than 1.5 times
the IQR above the upper quartile or below the lower quartile.
|
|
|
P
|
|
partial correlation coefficient |
Chapter 12 |
|
The correlation coefficient between two variables when other
variables have been controlled.
|
|
partial regression coefficient |
Chapter 12 |
|
A regression coefficient describing the relationship between
a dependent and an independent variable, obtained when
controlling for one or more other independent variables.
|
|
partial table |
Chapter 9 |
|
A table that displays counts for part of a dataset, for example, just
for the men.
|
|
percentile |
Chapter 5 |
|
The Nth percentile is the value of the category
that occurs N per cent through a ranked distribution.
|
|
phi |
Chapter 9 |
|
A measure of association appropriate for measuring the strength
of a relationship between two dichotomous variables.
|
|
pie chart |
Chapter 4 |
|
A graphical method appropriate for nominal and ordinal variables where
the proportion in each category of a variable is represented as a segment
of a circle.
|
|
pooled variance t test |
Chapter 11 |
|
A t test for two samples that uses the variance estimated from all
the cases in both samples. A pooled variance t test is appropriate when
both samples are drawn from the same population.
|
|
population |
Chapter 11 |
|
The set of all those to whom the research hypothesis is assumed
to apply (a synonym for universe).
|
|
pre-coded |
Chapter 1 |
|
A question in a survey questionnaire is pre-coded when a number has
been assigned to every possible response in advance of the survey. See
coding.
|
|
prediction |
Chapter 1 |
|
On discovering an association or relationship between two or
more variables in a sample, a prediction may be made about how
other cases in the population, that were not measured, may behave under
the same conditions.
|
|
primary analysis |
Chapter 1 |
|
Analysis of data collected by the researcher. See secondary analysis
|
|
proportional reduction in error |
Chapter 9 |
|
The increased accuracy (i.e. reduction in error) in predicting the
characteristics of a sample on one variable that one obtains if the values
of a second variable are known, compared with not knowing the second
variable. For example, three-quarters of all those in work, work full-time
(see Exhibit 9.11). Knowing only this, one can predict that there is
a chance of 3 in 4 that any person in the sample is a full-timer. The
chances of error in the prediction are reduced if one also knows that
a person is female, because we know that, for women, 55 per cent are
in full-time work. The proportional reduction of error is related to
the association between the two variables.
|
|
pseudo-random number |
Chapter 10 |
|
A number generated from a complex formula designed for the purpose,
usually by computer, which has the properties of a random number.
|
|
|
Q
|
|
quartiles |
Chapter 5 |
|
The values of the categories of a ranked distribution which divide
it into four equal parts each containing 25% of the cases. See lower
and upper quartiles.
|
|
|
R
|
|
R square |
Chapter 8 |
|
A measure that describes how well a regression line fits the scatter
of data points. It describes the proportion of variance of the
dependent variable that is 'explained' by the independent
variable. In simple regression, it is the square of the correlation
coefficient, r. Also known as the coefficient of determination.
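
One common way to compute R square is as one minus the ratio of the unexplained (residual) variation to the total variation in the dependent variable. The Python sketch below assumes that formula; the function name and the data are made up, with fitted values taken from the least squares line Y = 2.2 + 0.6X for X = 1 to 5.

```python
def r_square(ys, fitted):
    """Proportion of the variation in Y 'explained' by the fitted values."""
    mean_y = sum(ys) / len(ys)
    residuals = [y - f for y, f in zip(ys, fitted)]  # data value minus fitted value
    unexplained = sum(r ** 2 for r in residuals)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total

ys = [2, 4, 5, 4, 5]                 # made-up data values
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]   # fitted values from Y = 2.2 + 0.6X
print(round(r_square(ys, fitted), 2))  # 0.6
```

For these data the correlation coefficient r is about 0.775, and its square is 0.6, in line with the definition.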
|
|
random number seed |
Chapter 10 |
|
The starting value for a generator of pseudo-random numbers.
The stream of pseudo-random numbers from the generator is always the
same for the same starting seed.
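
This behaviour is easy to demonstrate with Python's standard random module (used here purely as an illustration; SPSS has its own generator). The function name draw is hypothetical.

```python
import random

def draw(seed, how_many=5):
    """Return the first few pseudo-random integers produced from a given seed."""
    generator = random.Random(seed)
    return [generator.randint(1, 100) for _ in range(how_many)]

first_run = draw(42)
second_run = draw(42)
print(first_run == second_run)  # True: same seed, same stream of numbers
```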
|
|
random sample |
Chapter 10 |
|
A sample in which the cases are selected from the population
at random and in which every case has a chance of being selected.
|
|
range |
Chapter 5 |
|
The difference between the lowest and highest values of a distribution.
|
|
ratio variable |
Chapter 1 |
|
A ratio variable is one that can be measured on an interval
scale and also has a meaningful zero. For instance, income is a ratio
variable because it is possible to have a zero income. See level
of measurement.
|
|
real class intervals |
Chapter 3 |
|
When grouping data into more convenient categories, it is usual to
declare real class intervals that are one level of precision greater
than the original data. This ensures that every possible value can be
accounted for in the new coding scheme. For example, if age data have
been collected to the nearest whole year, the real class intervals would
be reported to one decimal place. For example, the stated interval of
10 to 19 years would be expressed as a real class interval of 9.5 to
19.5.
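
The conversion amounts to extending each stated limit by half of the unit of measurement. A minimal Python sketch, with a hypothetical function name and the unit of one year taken from the example:

```python
def real_class_interval(stated_lower, stated_upper, unit=1):
    """Extend each stated limit by half the unit of measurement."""
    half_unit = unit / 2
    return stated_lower - half_unit, stated_upper + half_unit

print(real_class_interval(10, 19))  # (9.5, 19.5)
```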
|
|
real class limits |
Chapter 3 |
|
Real class limits are the upper and lower values that contain a real
class interval.
|
|
recoding |
Chapter 2 |
|
Recoding in SPSS is the procedure used to change or regroup
the numeric codes of a variable. For example, if age is coded using
respondents' ages in years and you wish to group individuals into those
above and below the age of 40, you could recode age so that codes from
0 through 40 are changed to code 1 and all other codes are changed to
code 2.
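
Outside SPSS, the same recode can be sketched in Python. The function name recode_age and the ages are made up for illustration.

```python
def recode_age(age):
    """Codes 0 through 40 become 1; all other codes become 2."""
    return 1 if 0 <= age <= 40 else 2

ages = [18, 40, 41, 67]  # made-up ages in years
print([recode_age(age) for age in ages])  # [1, 1, 2, 2]
```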
|
|
regression equation |
Chapter 8 |
|
An equation that describes the relationship between a dependent
variable (Y) and one or more independent variables (X). In
its simplest form, it is written Y = a + bX, where a and b are the regression
coefficients.
|
|
regression line |
Chapter 8 |
|
The regression line is a graphical representation of the regression
equation. It summarises the relationship between a continuous
dependent variable and one independent variable using the
OLS criterion to obtain the best fitting line. The regression
line may be obtained by substituting two values of X into the regression
equation, thus obtaining two pairs of X,Y coordinates.
|
|
regression coefficient |
Chapter 8 |
|
Part of the regression equation which describes the extent of
the relationship between the dependent and an independent
variable. In the regression equation Y = a + bX, a, the constant, is the
intercept on the Y axis and b is a measure of the slope. The standardised
b coefficient is called the beta coefficient.
|
|
regression plane |
Chapter 12 |
|
The analogue of a regression line when there are two independent
variables. The plane plots in three dimensional space the expected values
of the dependent variable for all values of the two independent
variables.
|
|
reliability |
Chapter 1 |
|
Reliability is concerned with whether the indicator we use to measure
a concept gives the same answer each time it is used. See validity.
|
|
representative sample |
Chapter 10 |
|
A sample in which cases are included in proportion to the number
in the population that resemble them.
|
|
research hypothesis |
Chapter 11 |
|
A proposition to be investigated that can be assessed as either true
or false, expressed in terms of theoretical concepts.
|
|
residual |
Chapter 8 |
|
In regression, residuals are obtained by subtracting the fitted
values from the data values. The residuals are also called deviations
from the fitted values.
|
|
resistant measure |
Chapter 5 |
|
A statistic that is less likely to be affected by a few extreme high
or low values (outliers) in a distribution. The median is a resistant
measure which is not affected by extreme values. The mean is
not a resistant measure.
|
|
rounding |
Chapter 3 |
|
A method of simplifying numbers to fewer significant figures than in
the original number. There are several ways to round numbers but the
most common is to round down a number ending in 1 through 4 and round
up numbers ending in 5 through 9. For example, 41.4 rounds down to 41
and 41.5 rounds up to 42.
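
Note that Python's built-in round() uses a different rule (halves go to the nearest even number), so this sketch uses the decimal module to reproduce the common rule described here. The function name round_half_up is illustrative, and the sketch is intended for positive numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(number):
    """Round a positive number to the nearest whole number, halves rounding up."""
    return int(Decimal(str(number)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(round_half_up(41.4))  # 41
print(round_half_up(41.5))  # 42
```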
|
|
row percentages |
Chapter 9 |
|
A percentage calculated by finding the count in a cell compared with
the total count for the row (the row marginal count). For example,
if there are 530 unemployed women in a sample including 648 unemployed
people in all, the row percentage of unemployed women is (530/648) × 100,
or 82%.
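
The arithmetic in the example can be checked with a short Python sketch; the function name row_percentage is illustrative.

```python
def row_percentage(cell_count, row_total):
    """Cell count as a percentage of the row marginal count."""
    return 100 * cell_count / row_total

# 530 unemployed women out of 648 unemployed people in the row
print(round(row_percentage(530, 648)))  # 82
```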
|
|
|
S
|
|
sample |
Chapter 1 |
|
A set of people selected from a population, usually with a random
method that ensures that everyone has an equal chance of selection.
|
|
sample distribution |
Chapter 10 |
|
The frequency distribution of an empirical sample. Not to be confused
with sampling distributions.
|
|
sampling distribution |
Chapter 10 |
|
A theoretical frequency distribution of any statistic calculated for
a sample. For example, the sampling distribution of the mean could be
generated by collecting an infinite number of similar size samples and
calculating the means for each. These means would then form a distribution
which could be plotted and would form a normal curve. Sampling
distributions are the basis of inferential statistics.
|
|
sampling error |
Chapter 10 |
|
The difference between an estimate derived from a sample and
the true value measured in the population.
|
|
saturated |
Chapter 12 |
|
A model that includes all possible effects.
|
|
secondary analysis |
Chapter 1 |
|
Reanalysis of data collected by another researcher or organisation
that may originally have been collected for other purposes. See primary
analysis.
|
|
separate variance t test |
Chapter 11 |
|
A t test for two samples that uses the variance estimated
from the cases in each sample separately. A separate variance t test
is appropriate when the samples are drawn from different populations.
|
|
simple regression analysis |
Chapter 8 |
|
Statistical methods concerned with explaining or predicting the variability
of a continuous dependent variable using information from
one continuous independent variable.
|
|
skewed distribution |
Chapter 5 |
|
A distribution which has either predominately low values and a few
extreme high values (positively skewed) or one where there are predominately
high values and a few extreme low values (negatively skewed). When plotted,
such a distribution produces a non-symmetric curve where the mean,
mode and median do not coincide.
|
|
specification |
Chapter 9 |
|
If the strength of the association between two variables differs
according to the level of a third variable, the third variable specifies
the relationship. Also called interaction.
|
|
SPSS |
Chapter 2 |
|
A computer program used for the management and analysis of social science
data.
|
|
spurious |
Chapter 9 |
|
A relationship between two variables is spurious if it disappears when
a third variable is controlled.
|
|
standard deviation |
Chapter 5 |
|
The square root of the variance.
|
|
standard deviation or SD line |
Chapter 8 |
|
The line obtained by plotting the means and standard deviations
of two continuous variables.
|
|
standard error of the mean |
Chapter 10 |
|
The standard deviation of the distribution of sample means.
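In practice the standard error of the mean is usually estimated from a single sample as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that estimator; the function name and data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical sample of eight values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(sample), 4))
```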
|
|
standard error of the difference |
Chapter 11 |
|
The standard error of a test statistic obtained from comparing two
samples.
|
|
Standard Normal Curve |
Chapter 7 |
|
A normal curve after standardisation to z scores.
|
|
standardisation |
Chapter 7 |
|
The procedure to create standard scores (z scores) in order
to compare variables measured in different units.
|
|
stem and leaf diagram |
Chapter 6 |
|
An exploratory data analytic method to show the distribution of a continuous
variable.
|
|
stepwise selection |
Chapter 12 |
|
A procedure for exploratory analysis that alternates using forward
selection and backward elimination in order to find the most
satisfactory set of variables to use in a regression equation.
|
|
stub |
Chapter 9 |
|
The left-hand column of a table, in which the labels describing each
row are placed.
|
|
subsample |
Chapter 2 |
|
A subsample is a selection of cases from a sample. For example,
if you remove all the males from a sample of both males and females,
you are left with a subsample of females.
|
|
T
|
|
test statistic |
Chapter 11 |
|
The statistic used to test a null hypothesis. The test statistic
must be one whose distribution is known; common test statistics are
the z score, the t statistic and chi-square.
|
|
transformation |
Chapter 7 |
|
A mathematical procedure employed to change the scale of a variable
from one set of values into another set of values, without changing
their relative order. This procedure is often employed in order to produce
a more linear relationship between two continuous variables. Examples
include transforming proportions into percentages by multiplying each
value by 100 and, in regression analysis, transforming raw scores into
logarithmic scores.
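The two examples in this definition can be sketched in Python as follows. The values are hypothetical; base-10 logarithms are used here as an assumption, since the definition does not say which base is intended.

```python
import math

# Hypothetical raw scores spanning several orders of magnitude.
raw_scores = [1000, 10000, 100000]
log_scores = [math.log10(v) for v in raw_scores]

# Proportions converted to percentages by multiplying each value by 100.
proportions = [0.25, 0.5, 0.75]
percentages = [p * 100 for p in proportions]


def rank_order(values):
    """Return the indices of the values sorted from lowest to highest."""
    return sorted(range(len(values)), key=values.__getitem__)


# Both transformations change the scale but keep the relative order of the values.
print(rank_order(raw_scores) == rank_order(log_scores))
print(rank_order(proportions) == rank_order(percentages))
```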
|
|
two-sample test |
Chapter 11 |
|
A test comparing two statistics derived from the characteristics of
different samples.
|
|
two-tailed test |
Chapter 11 |
|
An inferential test in which the null hypothesis will be rejected
if the test statistic falls into either the critical region above the
mean value or the critical region below the mean value (see Exhibit 11.6).
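A rough sketch of the decision rule for a two-tailed test, assuming the test statistic is a z score and a significance level of 0.05 (both are assumptions for illustration; the function name is hypothetical). The significance level is split equally between the two tails.

```python
from statistics import NormalDist


def two_tailed_z_test(z, alpha=0.05):
    """Return (reject, critical_value, p_value) for a two-tailed z test."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Half of alpha goes into each tail of the distribution.
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    # Reject if z lies in the upper or the lower critical region.
    reject = abs(z) >= critical_value
    return reject, critical_value, p_value


for z in (2.1, -2.1, 1.5):
    reject, critical_value, p_value = two_tailed_z_test(z)
    print(z, reject, round(critical_value, 2), round(p_value, 4))
```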
|
|
type I error |
Chapter 11 |
|
Rejecting a null hypothesis although it is actually true. The
probability of making a type I error is denoted by alpha (the
significance level).
|
|
type II error |
Chapter 11 |
|
Accepting (failing to reject) a null hypothesis although it is in fact
false. The probability of making a type II error is denoted by beta.
|
|
U
|
|
unexplained variance |
Chapter 8 |
|
In regression, unexplained variance is the variance remaining after
the variance 'explained' by the regression is subtracted from the total
variance of the dependent variable. Often called the variance of the
residuals.
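The subtraction described here can be sketched in Python. The function name and data are hypothetical, and the predicted values are assumed to come from a least-squares line with an intercept (here 46.5 + 4.5x), since the split into explained and unexplained parts relies on that. Variances are computed with an n denominator for simplicity.

```python
def variance_decomposition(observed, predicted):
    """Return (total, explained, unexplained) variance of the dependent variable."""
    n = len(observed)
    mean_observed = sum(observed) / n
    total = sum((y - mean_observed) ** 2 for y in observed) / n
    # Variance of the residuals (observed minus predicted values).
    unexplained = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n
    # Assumes least-squares predictions, so explained = total - unexplained.
    explained = total - unexplained
    return total, explained, unexplained


# Hypothetical data with predictions from the line 46.5 + 4.5 * x.
x = [1, 2, 3, 4, 5]
observed = [52, 54, 61, 63, 70]
predicted = [46.5 + 4.5 * xi for xi in x]
total, explained, unexplained = variance_decomposition(observed, predicted)
print(total, explained, unexplained)  # 42.0 40.5 1.5
```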
|
|
univariate statistics |
Chapter 3 |
|
Univariate statistics are statistics that describe the characteristics
of a single variable.
|
|
universe |
Chapter 11 |
|
The set of all those to whom the research hypothesis is assumed
to apply (a synonym for population).
|
|
upper quartile |
Chapter 5 |
|
The value of the category which defines the lower boundary of the top
25 per cent of cases when they are arranged in rank order. Denoted by
Q3.
|
|
V
|
|
validity |
Chapter 1 |
|
Validity concerns how well an indicator measures the concept it is
designed to measure. For instance, if we wanted to measure height and
all we had was a set of bathroom scales, we would not have a valid indicator
since the scales measure weight. The scales are nevertheless a very reliable
measure, since we get the same weight each time. Conversely, if we used
an elastic tape measure to measure height, we might get a different height
each time we used it: it is very unreliable, but it is a valid
indicator since it does measure height. See reliability.
|
|
value label |
Chapter 2 |
|
Value labels in SPSS are an optional way of explaining the numeric
codes ascribed to each response of a variable. For example, code 1,
indicating those respondents who are married, may be given the value
label 'Married'.
|
|
variable name |
Chapter 2 |
|
Required by SPSS to identify each variable. Variable names may
be up to 8 characters long and must be unique in any one data file.
For example, marital status may be given the variable name marstat.
|
|
variable label |
Chapter 2 |
|
A label that may be created in SPSS to describe each variable.
Variable labels, with a limit of 255 characters, allow you to expand
on the 8-character variable name. The variable label for the variable
marstat could be "Respondent's marital status".
|
|
variable-centred |
Chapter 1 |
|
A type of data analysis concerned with exploring relationships between
variables, rather than exploring relationships between cases.
|
|
variance |
Chapter 5 |
|
A measure of dispersion calculated as the average of the squared
differences of each value from the mean value. A distribution
with a low variance has most values clustered around the mean; a distribution
with a high variance has a wide spread of values around the mean. Denoted
by s² (sample) or σ² (population).
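A short sketch of the calculation in Python. The function name and data are hypothetical. It assumes the common convention that the population variance divides the sum of squared differences by the number of cases, while the sample variance divides by one fewer.

```python
import math


def variance(values, sample=True):
    """Return the variance: sum of squared deviations over n - 1 (sample) or n."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    return sum_of_squares / (n - 1) if sample else sum_of_squares / n


# Hypothetical data with a mean of 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]
population_variance = variance(values, sample=False)
print(population_variance)             # 4.0
print(math.sqrt(population_variance))  # 2.0, the standard deviation
print(round(variance(values), 4))      # sample variance, 32 / 7
```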
|
|
W
|
|
working hypothesis |
Chapter 11 |
|
A reformulation of a research hypothesis in terms of indicators
rather than concepts.
|
|
X
|
|
X variable |
Chapter 8 |
|
An independent variable in a regression analysis.
|
|
Y
|
|
Y variable |
Chapter 8 |
|
The dependent variable in a regression analysis.
|
|
Z
|
|
z score |
Chapter 7 |
|
Standardisation of a value by subtracting the variable's mean
from the value and dividing by the variable's standard deviation.
A distribution of z scores always has a mean of zero and a standard
deviation of unity.
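The calculation can be sketched in Python as below. The function name and data are hypothetical, and the standard deviation is computed with an n denominator (an assumption), which makes the resulting z scores have a standard deviation of exactly one.

```python
import statistics


def z_scores(values):
    """Standardise each value: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # n denominator (population formula)
    return [(v - mean) / sd for v in values]


# Hypothetical data with mean 5 and standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
standardised = z_scores(values)
print(standardised)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(standardised), statistics.pstdev(standardised))
```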
|