Glossary

A  B  C  D  E  F  G  H  I  J   K  L  M  N  O  P  Q  R  S   T  U  V  W  X  Y  Z  Home

A
alphanumeric or string variables Chapter 2
Alphanumeric variables are variables whose values may be stored as letters, digits or other characters or a combination of them. Alphanumeric variables are also known as string variables and are not available for arithmetic operations in SPSS.
antecedent Chapter 9
An antecedent variable is one that has a causal influence on another variable.
association Chapter 9
A relationship between two categorical variables. Two variables are associated when the proportion in each category of one variable differs according to the categories of the other.
asymmetric Chapter 9
A characteristic of a measure of association. A measure is asymmetric when the value it takes depends on which variable is considered to be the independent and which the dependent variable.

B
backward elimination Chapter 12
A procedure for exploratory analysis that successively deletes variables from the regression equation, stopping when the deletion of any further variable would produce a significant reduction in the variance explained.
bar chart Chapter 4
A graphical method appropriate for nominal and ordinal variables where the proportion in each category of a variable is represented as a vertical or horizontal bar.
bivariate analysis Chapter 8
Statistical analysis involving two variables.
bivariate relationship Chapter 1
A statistical relationship between two variables.
boxplot Chapter 6
An exploratory data analytic method to display the distribution of a single variable where the box is defined by the upper and lower quartiles and the 'whiskers' extend to the highest (or lowest) values that are not outliers. Sometimes known as a box and whisker plot.

C
categorical variable Chapter 1
Nominal and ordinal variables are both categorical variables: their attributes have simply been sorted into categories. For instance, sex is a categorical variable where respondents have been classified or categorised as either male or female.
cell Chapter 9
The 'hole' into which a number (a frequency, proportion or percentage) is put when constructing a table.
census Chapter 10
A survey of every case in a population.
Central Limit Theorem Chapter 10
The theorem states that the distribution of the means of numerous samples of the same size, all taken from the same distribution, tends to the normal curve as the size of each sample increases (the larger each sample, the better the approximation to a normal curve). The theorem holds regardless of the form of the distribution from which the samples are taken, provided that distribution has a finite variance.
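A minimal Python sketch of the theorem, assuming a uniform(0, 1) population (chosen only because it is clearly not normal), samples of size 30 and an arbitrary random seed. It compares the mean and spread of 2000 sample means with the population mean of 0.5 and the standard error σ/√n that the theorem predicts; a histogram of `means` would look roughly bell-shaped.

```python
import random
import statistics

def sample_means(n_samples, sample_size, seed=1):
    """Draw repeated samples from a uniform(0, 1) population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means = sample_means(n_samples=2000, sample_size=30)

# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12);
# the theorem predicts the sample means spread with sd = sigma / sqrt(n).
expected_se = (1 / 12) ** 0.5 / 30 ** 0.5
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 4), round(expected_se, 4))
```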
chi squared Chapter 11
A test statistic used when the data consist of counts.
cluster analysis Chapter 12
A form of analysis that groups variables (or cases) into clusters according to their similarity (the correlations between variables, or common patterns of values for cases).
codebook Chapter 3
A document that lists the dictionary information about all the variables in a data file. This usually includes the original question text, the SPSS variable names if appropriate, and the value labels for each coded response. A codebook may also contain information about derived variables and notes given to interviewers or coders when preparing the data file.
coding Chapter 2
The process by which numbers are ascribed to responses to a survey questionnaire in preparation for computer analysis. For example, to the question "Are you male or female?", male may be coded with the number 1 and female with the number 2.
coefficient of determination Chapter 8
See R square.
column percentage Chapter 9
A percentage calculated by dividing the count in a cell by the total count for its column (the column marginal count) and multiplying by 100. For example, if there are 530 unemployed women in a sample including 1192 women in all, the column percentage of unemployed women is (530 / 1192) × 100, or 44%.
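The arithmetic behind column and row percentages can be sketched in Python as follows. `percentage` is a hypothetical helper name, and the counts are the ones quoted in the column percentage and row percentage entries.

```python
def percentage(cell_count, marginal_count):
    """Express a cell count as a percentage of a row or column marginal count."""
    return 100 * cell_count / marginal_count

# Figures from the glossary examples: 530 unemployed women,
# 1192 women in all (column marginal), 648 unemployed people in all (row marginal).
column_pct = percentage(530, 1192)
row_pct = percentage(530, 648)
print(f"column % = {column_pct:.1f}, row % = {row_pct:.1f}")
```

With these figures the column percentage is about 44.5 and the row percentage about 81.8.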
concordant Chapter 9
If there are two cases, A and B, and case A has a higher value on one variable than case B, and case A also has a higher value on another variable than case B, the two cases are concordant for the two variables.
confidence interval Chapter 10
The range within which it can be inferred that a population mean lies, with some specified degree of confidence. For example, the 95 per cent confidence interval is the range within which we can be 95 per cent confident that the population mean can be found. For large samples it is equal to the sample mean plus or minus 1.96 standard errors.
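A sketch of the large-sample calculation in Python, using made-up data. The 1.96 multiplier assumes the sampling distribution of the mean is approximately normal; for a sample as small as this one, a multiplier from the t distribution would normally be used instead.

```python
import math
import statistics

def confidence_interval_95(values):
    """Large-sample 95% CI for the mean: sample mean +/- 1.96 standard errors."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # standard error of the mean
    return mean - 1.96 * se, mean + 1.96 * se

low, high = confidence_interval_95([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
print(round(low, 2), round(high, 2))
```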
confidence level Chapter 10
The probability that a population mean lies within an interval. For example, at the 95 per cent confidence level, the population mean lies within plus or minus 1.96 standard errors of the sample mean (see confidence interval).
confirmatory analysis Chapter 12
A statistical analysis that aims to test a pre-specified hypothesis or model.
contingency table Chapter 9
A table of two or more variables cross-classified, consisting of cells showing the number of cases in each combination of categories from the variables. Also called a cross-tabulation.
continuous variables Chapter 1
A continuous variable is one which can be measured at any point on a continuous scale. For instance, age may seem to be a discrete variable because it is usually measured to the nearest whole year. However, since time is a continuum, age can in principle be measured down to a fraction of a second.
control variable Chapter 9
A variable whose categories define the partial tables in a tabulation of three variables.
Cook's distance Chapter 12
A measure of the extent to which a case, typically an outlier, influences a regression model's coefficients.
correlation coefficient Chapter 8
A measure of association for continuous variables obtained by dividing the covariance by the product of the standard deviations of the two variables. So for variables X and Y, with standard deviations sx and sy, r = covariance(X, Y) / (sx × sy).
count Chapter 9
The number of cases in a dataset with a particular combination of attributes. For example, the count of unemployed men within a dataset might equal 421.
covariance Chapter 8
A measure which indicates how the values of two continuous variables vary together. It is obtained by dividing the sum of the cross-products by one less than the number of cases.
Cramer's V Chapter 9
A measure of association appropriate for measuring the strength of a relationship between two variables, one or both of which has more than two categories.
critical region Chapter 11
The region of the sampling distribution between the critical value and positive infinity for a one-tailed test. For a two-tailed test, it is the region between the positive critical value and positive infinity plus the region between the negative critical value and negative infinity (see Exhibit 11.6).
critical value Chapter 11
The value (a number of standard errors) that defines the boundary of the critical region. If the test statistic falls within this region, the null hypothesis is rejected, that is, assumed to be false.
cross-product Chapter 8
Obtained by multiplying the deviations from the means of pairs of values of two variables. The cross-products are obtained as a first step in the calculation of the covariance.
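The chain from cross-products to covariance to the correlation coefficient can be sketched in Python. The data are hypothetical, and both the covariance and the standard deviations divide by n - 1, following the covariance entry.

```python
import math

def covariance_and_correlation(x, y):
    """Covariance from cross-products, then Pearson's r = cov / (s_x * s_y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Cross-products: the product of each pair of deviations from the means.
    cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    cov = sum(cross_products) / (n - 1)  # divide by one less than the number of cases
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov, cov / (s_x * s_y)

cov, r = covariance_and_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(cov, round(r, 4))
```

For these values the covariance is 1.5 and r is about 0.77.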
crosstabulation Chapter 4
A table of the joint frequency distribution of two nominal or ordinal variables. Also called a contingency table.

D
data matrix Chapter 1
The name given to the row-by-column organisation of numerical responses that results from coding survey questionnaires. The rows correspond to the cases and the columns contain the responses to each variable.
deduction Chapter 1
A method of analysis that proceeds by formulating a theory and testing it with data. See induction for an alternative approach.
degrees of freedom Chapter 11
The number of values free to vary in the calculation of a statistic. For example, if a table has 4 cells and we know the marginal frequencies, we need to know only one cell value for the other three values to be completely determined. Such a 2x2 table therefore has one degree of freedom.
dependent variable Chapter 6
A dependent variable is a variable whose values are predicted by another variable or variables, the independent variable(s). The dependent variable is usually the subject of primary interest in a study.
derived variable Chapter 3
A derived variable is one that has been created out of the responses from the original variables in the data file. Thus one may derive the variable agegroup from data collected about age.
dichotomous Chapter 9
A variable with only two categories, such as male and female, is called dichotomous.
discordant Chapter 9
If there are two cases, A and B, and case A has a higher value on one variable than case B, and case A has a lower value on another variable than case B, the two cases are discordant for the two variables. Likewise, they are discordant if case A has a lower value than case B on the first variable and a higher value than case B on the second.
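A Python sketch that counts concordant and discordant pairs for two ordinal variables, using hypothetical rankings; pairs tied on either variable are counted as neither. The last lines plug the counts into the usual Goodman and Kruskal formula for gamma, (C - D) / (C + D), which is assumed here rather than taken from this glossary.

```python
from itertools import combinations

def count_pairs(x, y):
    """Count concordant and discordant pairs of cases; tied pairs count as neither."""
    concordant = discordant = 0
    for (xa, ya), (xb, yb) in combinations(zip(x, y), 2):
        direction = (xa - xb) * (ya - yb)
        if direction > 0:
            concordant += 1   # same ordering on both variables
        elif direction < 0:
            discordant += 1   # opposite ordering on the two variables
    return concordant, discordant

c, d = count_pairs([1, 2, 3, 4], [1, 3, 2, 4])  # hypothetical ranks for four cases
gamma = (c - d) / (c + d)
print(c, d, round(gamma, 3))
```

Here 5 pairs are concordant and 1 is discordant, giving a gamma of about 0.667.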
discrete variable Chapter 1
A discrete variable is one which can only be counted in whole numbers. For instance, number of people in a family; number of cars in a household. For comparison, see continuous variable.
dummy variables Chapter 12
A dichotomous variable used in a regression equation. Categorical variables with more than two categories need to be converted into a set of equivalent dummy variables (one fewer dummy variable than there are categories) in order to include them in the regression model.
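A sketch of dummy coding in Python, using the marital status categories from the nominal variable entry and treating 'married' as the reference category (an arbitrary choice for illustration). Each case gets one 0/1 variable per non-reference category, so three categories yield two dummies; `dummy_code` is a hypothetical helper name.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 dummy variables for every category except the reference category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {cat: int(value == cat) for cat in categories if cat != reference}

categories = ["married", "single", "divorced"]
print(dummy_code("single", categories, reference="married"))   # {'single': 1, 'divorced': 0}
print(dummy_code("married", categories, reference="married"))  # {'single': 0, 'divorced': 0}
```

A case in the reference category is identified by having 0 on every dummy variable.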

E
elaboration Chapter 9
A method for exploring and testing ideas about causal relationships between three or more variables. It involves examining tables of two variables, controlling for a third.
expected Chapter 9
For a table of two variables, the count which would be obtained if there were no association between the two variables.
expected counts Chapter 11
The counts that would be obtained if a null hypothesis were true (usually the hypothesis is that there is no association between two variables).
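Expected counts, chi squared, its degrees of freedom and Cramer's V can be computed together. This Python sketch uses a hypothetical 2x2 table and assumes the standard formulas: expected count = (row total × column total) / n, degrees of freedom = (rows - 1) × (columns - 1), and Cramer's V = sqrt(chi squared / (n × (smaller of rows and columns - 1))).

```python
import math

def chi_squared(observed):
    """Return (chi-squared, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for each cell if there were no association.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected

def cramers_v(observed):
    """Cramer's V; for a 2x2 table this equals the absolute value of phi."""
    stat, _, _ = chi_squared(observed)
    n = sum(map(sum, observed))
    k = min(len(observed), len(observed[0]))
    return math.sqrt(stat / (n * (k - 1)))

table = [[20, 30], [30, 20]]  # hypothetical observed counts
stat, df, expected = chi_squared(table)
print(stat, df, expected, cramers_v(table))
```

For this table every expected count is 25, chi squared is 4.0 with 1 degree of freedom, and Cramer's V is 0.2.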
explanation Chapter 1
An independent variable explains a dependent variable if knowledge of the value of the independent variable provides good predictions of the value of the dependent variable for all cases.
exploratory analysis Chapter 12
A statistical analysis that aims to discover theoretically interesting hypotheses or models describing the data.

F
factor analysis Chapter 12
A form of statistical analysis that is built on the assumption that the data were generated by some small number of unmeasured latent variables that in combination created the measured variables. The analysis recreates the latent variables as 'factors'.
falsification Chapter 11
The idea that while it is impossible to prove that a hypothesis is true, it is possible to show that a hypothesis is false (since only one piece of evidence that is counter to the hypothesis is sufficient to show that it is false). Acceptance of a theory only means that we have not yet disproved it.
fitted values Chapter 8
The fitted or expected values are obtained by substituting values for the independent variables into the regression equation. For example, if the regression equation is Y = 1 + 3X, substituting a value of 2 for X will give a fitted value for Y of 7 (i.e. 1 + 3 × 2). Denoted by Ŷ, pronounced 'y hat'.
forward selection Chapter 12
A procedure for exploratory analysis that successively adds more variables to the regression equation until there is no further significant improvement in the variance explained.
frequency distribution Chapter 3
A frequency distribution is a table of frequencies of occurrence of each value of a variable. In SPSS 9.0 a frequency distribution is obtained by selecting Analyse, Descriptive statistics, Frequencies... and usually includes a percentage calculation for each value.
frequency polygon Chapter 4
The graphical representation of the distribution of an interval or ratio variable where the midpoint of each interval is represented by a marker which is then joined by a line to the next midpoint marker.

G
gamma Chapter 9
A measure of association appropriate to variables where one or both are measured at the ordinal level of measurement.
Goodman and Kruskal's tau Chapter 9
An asymmetric measure of association between two variables appropriate when one variable is considered to be the cause and the other the effect.
grouped data Chapter 5
Data that have been recoded or grouped into fewer categories than when originally collected. For instance, age, measured on an interval scale, could be grouped into age groups in order to create a histogram.

H
histogram Chapter 4
The graphical representation of the distribution of an interval or ratio variable where the frequency of occurrence of each interval is represented by the height of a bar.

I
independent variable Chapter 6
An independent variable is one that predicts the values of another variable, the dependent variable. Sometimes called a predictor variable.
indicator Chapter 1
A method intended to measure a concept. For instance, a common indicator of the concept of social class is a person's occupation. Since class is not something we can measure directly, an indicator has to be used. See reliability and validity.
induction Chapter 1
A method of analysis that derives theories by generalising from evidence, usually from a large number of cases. See deduction for an alternative approach.
inner fence Chapter 6
In a boxplot, the inner fences are the boundaries of the main body of the data beyond which lie the outliers. They are positioned at 1.5 times the IQR above the upper quartile and below the lower quartile respectively.
inter-quartile range Chapter 5
The value of the upper quartile minus the value of the lower quartile. Denoted by IQR.
interaction Chapter 9
See specification.
interval variable Chapter 1
An interval variable is one whose categories can not only be ranked or ordered but are also separated by precisely defined distances, e.g. salary, age. See level of measurement.
intervening Chapter 9
A variable is intervening if another variable has an effect on it and it in turn affects a third variable.

K
kurtosis Chapter 5
A property of a distribution which reflects the 'peakedness' of the plotted curve. A tall, peaked distribution is called leptokurtic, while a flat, plateau-like distribution is called platykurtic. A distribution with the same degree of peakedness as the normal curve is called mesokurtic.

L
level of significance Chapter 11
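The probability, set in advance, of rejecting the null hypothesis when it is in fact true; in other words, the accepted risk that the research outcome could have happened by chance.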
level of measurement Chapter 1
The name given to the classification scheme which distinguishes the relationships between the categories or attributes of a variable. See nominal, ordinal, interval and ratio variables.
leverage Chapter 12
A measure of the extent to which a case has unusual values on the independent variables, that is, how far its values lie from the means of those variables. Cases with high leverage can exert a strong pull on the regression line.
logistic regression Chapter 12
A type of regression used when the dependent variable is categorical.
longitudinal data Chapter 9
Data collected from the same people over a period of time.
lower quartile Chapter 5
The value of the category which defines the upper boundary of the bottom 25 per cent of cases when they are arranged in rank order. Denoted by Q1.

M
marginal Chapter 9
The sum of the counts for a particular category of a variable. These sums are often placed in the right-most column or bottom row of a table and are therefore in the margins of the table.
mean Chapter 5
A measure of central tendency appropriate for interval and ratio variables. It is the arithmetic average of the values of a distribution. Denoted by X̄ (X bar) for a sample or µ for a population.
measure of central tendency Chapter 5
A single statistic that summarises the distribution of one variable. See mode, median and mean.
measure of association Chapter 8
A statistic which measures the degree of association or relationship between two variables. Examples include phi for nominal variables and the correlation coefficient for interval variables.
measures of dispersion Chapter 5
Statistics that describe the spread of a distribution or how the values of a distribution are scattered around the mean. The simplest is the range, but see also variance and standard deviation.
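A Python sketch of the three measures named above for a hypothetical set of values. It assumes the sample variance (dividing by n - 1, as in the covariance entry); `statistics.pvariance` would give the population version instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical distribution

value_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std_dev = statistics.stdev(values)      # square root of the sample variance
print(value_range, round(variance, 3), round(std_dev, 3))
```

For these values the range is 7, the variance about 4.571 and the standard deviation about 2.138.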
median Chapter 5
A measure of central tendency, most appropriate for ordinal variables. It is the value of the category that occurs in the middle of a ranked distribution. Also known as the 50th percentile.
missing value Chapter 3
Values are assigned as missing values when the researcher wishes to exclude that value from statistical analysis. For instance, some respondents may have refused to answer a question about their age. Every response must be given a code, so you may decide to code this 'no response' with the number 999. However, you would want to ensure that this code is not treated as a valid age response. If it were, the average age for the sample would be incorrect.
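A Python sketch of why the 999 code must be excluded, using four hypothetical coded ages. `MISSING_AGE` is an illustrative name; in SPSS the same effect comes from declaring the code as a missing value rather than filtering by hand.

```python
import statistics

MISSING_AGE = 999  # hypothetical 'no response' code

ages = [25, 40, MISSING_AGE, 35]  # hypothetical coded responses

naive_mean = statistics.fmean(ages)  # wrongly treats 999 as a real age
valid_ages = [age for age in ages if age != MISSING_AGE]
mean_age = statistics.fmean(valid_ages)  # missing code excluded
print(naive_mean, round(mean_age, 2))
```

Including the code gives a mean of 274.75; excluding it gives about 33.33.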
mode Chapter 5
A measure of central tendency, most appropriate for nominal variables. It is the value or label of the most frequently occurring category.
model Chapter 12
A theory that proposes relationships between two or more variables.
multi-dimensional scaling Chapter 12
A form of analysis that positions points (representing variables and cases) according to their similarity, with the most similar being nearest to each other.
multi-variate analysis Chapter 8
Statistical analysis involving three or more variables.
multiple regression analysis Chapter 8
Statistical methods concerned with explaining or predicting the variability of a continuous dependent variable using information from two or more continuous independent variables.

N
nominal variable Chapter 1
The categories of a nominal variable bear no relationship to one another, other than that they are different. For example, marital status is a nominal variable with the categories married, single, divorced, etc. Nominal variables are measured at the lowest level of measurement.
normal distribution Chapter 7
A theoretical, continuous probability distribution where the horizontal axis plots all possible values of the variable and the vertical axis plots the relative frequency (probability density) of each value.
normal curve Chapter 7
A symmetric, bell-shaped curve describing a normal distribution.
null hypothesis Chapter 11
The converse of a working hypothesis: if the null hypothesis is found to be false, this can be taken as indirect support for the working hypothesis.

O
observed counts Chapter 11
The counts obtained from the data.
one-sample test Chapter 11
A test comparing a sample statistic with a population parameter.
one-tailed test Chapter 11
An inferential test in which the null hypothesis will be rejected if the test statistic is greater than the critical value.
ordinal variable Chapter 1
The categories of an ordinal variable can be ordered or ranked but the distance between the categories is unknown. For instance, educational level is an ordinal variable ranging from none through higher. How close one category is to another is unknown. See nominal and interval levels of measurement.
Ordinary Least Squares [OLS] Chapter 8
The statistical procedure used in regression analysis to arrive at the 'best fitting' regression line. It is a mathematical method that chooses the line for which the sum of the squared deviations of the data points from the line is as small as possible.
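For simple regression, the OLS criterion has a closed-form solution: b = (sum of cross-products) / (sum of squared deviations of X), and a = mean of Y - b × mean of X. This Python sketch, on hypothetical data, also computes the fitted values, residuals and R square defined elsewhere in this glossary.

```python
def ols_fit(x, y):
    """Simple OLS: b = sum of cross-products / sum of squared x deviations, a = mean_y - b * mean_x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]                   # Y hat
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # data values minus fitted values
mean_y = sum(y) / len(y)
r_square = 1 - sum(e ** 2 for e in residuals) / sum((yi - mean_y) ** 2 for yi in y)
print(round(a, 2), round(b, 2), round(r_square, 2))
```

Here a is 2.2, b is 0.6 and R square is 0.6, which equals the square of the correlation coefficient, as expected in simple regression.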
outer fences Chapter 6
In a boxplot, the outer fences lie 3.0 times the IQR above the upper quartile and below the lower quartile; beyond them lie the extreme outliers.
outlier Chapter 6
A case with an extremely high or extremely low value compared with the rest of a distribution. An outlier has a value more than 1.5 times the IQR above the upper quartile or below the lower quartile.
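A Python sketch of the inner and outer fences and the resulting outlier labels, on hypothetical data. Quartile conventions differ between packages, so this uses the default 'exclusive' method of `statistics.quantiles`; SPSS may place the quartiles slightly differently.

```python
import statistics

def boxplot_fences(values):
    """Return quartiles, inner fences (1.5 IQR) and outer fences (3.0 IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return q1, q3, inner, outer

def classify(value, inner, outer):
    """Label a value relative to the fences."""
    if value < outer[0] or value > outer[1]:
        return "extreme outlier"
    if value < inner[0] or value > inner[1]:
        return "outlier"
    return "within fences"

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical values
q1, q3, inner, outer = boxplot_fences(data)
print(q1, q3, inner, outer, classify(30, inner, outer))
```

For these values Q1 is 3, Q3 is 9 and the IQR is 6, so the inner fences are -6 and 18, the outer fences -15 and 27, and the value 30 is an extreme outlier.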

P
partial correlation coefficient Chapter 12
The correlation coefficient between two variables when other variables have been controlled.
partial regression coefficient Chapter 12
A regression coefficient describing the relationship between a dependent and an independent variable obtained when controlling for one or more other independent variables.
partial table Chapter 9
A table that displays counts for part of a dataset, for example, just for the men.
percentile Chapter 5
The Nth percentile is the value of the category that occurs N per cent through a ranked distribution.
phi Chapter 9
A measure of association appropriate for measuring the strength of a relationship between two dichotomous variables.
pie chart Chapter 4
A graphical method appropriate for nominal and ordinal variables where the proportion in each category of a variable is represented as a segment of a circle.
pooled variance t test Chapter 11
A t test for two samples that uses the variance estimated from all the cases in both samples. A pooled variance t test is appropriate when both samples are drawn from the same population, or from populations with equal variances.
population Chapter 11
The set of all those to whom the research hypothesis is assumed to apply (a synonym for universe).
pre-coded Chapter 1
A question in a survey questionnaire is pre-coded when a number has been assigned to every possible response in advance of the survey. See coding.
prediction Chapter 1
On discovering an association or relationship between two or more variables in a sample, a prediction may be made about how other cases in the population, that were not measured, may behave under the same conditions.
primary analysis Chapter 1
Analysis of data collected by the researcher. See secondary analysis.
proportional reduction in error Chapter 9
The increased accuracy (i.e. reduction in error) in predicting the characteristics of a sample on one variable that one obtains if the values of a second variable are known, compared with not knowing the second variable. For example, three-quarters of all those in work are in full-time work (see Exhibit 9.11). Knowing only this, one can predict that there is a chance of 3 in 4 that any person in the sample is a full-timer. The chances of error in the prediction are reduced if one also knows that a person is female, because we know that, for women, 55 per cent are in full-time work. The proportional reduction in error is related to the association between the two variables.
pseudo-random number Chapter 10
A number generated from a complex formula designed for the purpose, usually by computer, which has the properties of a random number.

Q
quartiles Chapter 5
The values of the categories of a ranked distribution which divide it into four equal parts, each containing 25% of the cases. See lower and upper quartiles.

R
R square Chapter 8
A measure that describes how well a regression line fits the scatter of data points. It describes the proportion of variance of the dependent variable that is 'explained' by the independent variable. In simple regression, it is the square of the correlation coefficient, r. Also known as the coefficient of determination.
random number seed Chapter 10
The starting value for a generator of pseudo-random numbers. The stream of pseudo-random numbers from the generator is always the same for the same starting seed.
random sample Chapter 10
A sample in which the cases are selected from the population at random and in which every case has a chance of being selected.
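A Python sketch of drawing a simple random sample with a pseudo-random number generator started from a seed. The population of 100 case IDs and the seed 2024 are arbitrary; the point is that the same seed reproduces the same sample, as the random number seed entry describes.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 case IDs

def draw_sample(seed, size=10):
    """Simple random sample without replacement from a seeded generator."""
    rng = random.Random(seed)  # pseudo-random generator started from the seed
    return rng.sample(population, size)

first = draw_sample(seed=2024)
second = draw_sample(seed=2024)
print(first == second)  # same seed, same stream, same sample
```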
range Chapter 5
The difference between the lowest and highest values of a distribution.
ratio variable Chapter 1
A ratio variable is one that can be measured on an interval scale and also has a meaningful zero. For instance, income is a ratio variable because it is possible to have a zero income. See level of measurement.
real class intervals Chapter 3
When grouping data into more convenient categories, it is usual to declare real class intervals that are one level of precision greater than the original data. This ensures that every possible value can be accounted for in the new coding scheme. For example, if age data have been collected to the nearest whole year, the real class intervals would be reported to one decimal place. Thus the stated interval of 10 to 19 years would be expressed as a real class interval of 9.5 to 19.5.
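The conversion from stated limits to real class limits can be sketched in Python. `real_limits` is a hypothetical helper, and `unit` is the precision to which the data were recorded (one year in the age example), so each limit moves outward by half a unit.

```python
def real_limits(stated_lower, stated_upper, unit=1):
    """Widen a stated class interval by half the unit of measurement at each end."""
    half = unit / 2
    return stated_lower - half, stated_upper + half

print(real_limits(10, 19))  # ages recorded to the nearest whole year
```

This prints (9.5, 19.5), matching the example above.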
real class limits Chapter 3
Real class limits are the upper and lower values that bound a real class interval.
recoding Chapter 2
Recoding in SPSS is the procedure used to change or regroup the numeric codes of a variable. For example, if age is coded using respondents' ages in years and you wish to group individuals into those above and below the age of 40, you could recode age so that codes from 0 through 40 are changed to code 1 and all other codes are changed to code 2.
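The same recode can be sketched as a Python function; this is an illustration only, since in SPSS it would be done with the recoding procedure described above. Note that 'all other codes' would also catch a missing-value code such as 999, so missing values should be declared before recoding.

```python
def recode_age(age):
    """Recode age in years: codes 0 through 40 become 1, all other codes become 2."""
    return 1 if 0 <= age <= 40 else 2

ages = [18, 40, 41, 67]  # hypothetical respondents
print([recode_age(a) for a in ages])
```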
regression equation Chapter 8
An equation that describes the relationship between a dependent variable (Y) and one or more independent variables (X). In its simplest form, written: Y = a + bX, where a and b are the regression coefficients.
regression line Chapter 8
The regression line is a graphical representation of the regression equation. It summarises the relationship between a continuous dependent variable and one independent variable using the OLS criterion to obtain the best fitting line. The regression line may be obtained by substituting two values of X into the regression equation, thus obtaining two pairs of X,Y coordinates through which the line can be drawn.
regression coefficient Chapter 8
Part of the regression equation which describes the extent of the relationship between the dependent and an independent variable. In the regression equation Y = a + bX, a, the constant, is the intercept on the Y axis and b is a measure of the slope. The standardised b coefficient is called the beta coefficient.
regression plane Chapter 12
The analogue of a regression line when there are two independent variables. The plane plots in three dimensional space the expected values of the dependent variable for all values of the two independent variables.
reliability Chapter 1
Reliability is concerned with whether the indicator we use to measure a concept gives the same answer each time it is used. See validity.
representative sample Chapter 10
A sample in which cases are included in proportion to the number in the population that resemble them.
research hypothesis Chapter 11
A proposition to be investigated that can be assessed as either true or false, expressed in terms of theoretical concepts.
residual Chapter 8
In regression, residuals are obtained by subtracting the fitted values from the data values. The residuals are also called deviations from the fitted values.
resistant measure Chapter 5
A statistic that is less likely to be affected by a few extreme high or low values (outliers) in a distribution. The median is a resistant measure which is not affected by extreme values. The mean is not a resistant measure.
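A Python sketch of resistance using hypothetical values: replacing the largest value with an extreme one moves the mean a long way but leaves the median unchanged.

```python
import statistics

values = [18, 21, 22, 25, 27]         # hypothetical values
with_extreme = [18, 21, 22, 25, 250]  # largest value replaced by an extreme one

for data in (values, with_extreme):
    print(statistics.mean(data), statistics.median(data))
```

The mean jumps from 22.6 to 67.2, while the median stays at 22.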
rounding Chapter 3
A method of simplifying numbers to fewer significant figures than in the original number. There are several ways to round numbers, but the most common is to round down when the first digit to be dropped is 0 through 4 and to round up when it is 5 through 9. For example, 41.4 rounds down to 41 and 41.5 rounds up to 42.
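A Python sketch of the 'round half up' rule described above. It uses the standard `decimal` module because Python's built-in `round()` follows a different convention (rounding halves to the nearest even digit), which is a common source of surprise. `round_half_up` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=0):
    """Round so that a dropped digit of 5 through 9 rounds up, 0 through 4 rounds down."""
    step = Decimal(1).scaleb(-places)  # 1, 0.1, 0.01, ... depending on places
    return Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)

print(round_half_up(41.4), round_half_up(41.5), round_half_up(42.5))
print(round(42.5))  # built-in round() rounds halves to even
```

This prints 41, 42 and 43, while `round(42.5)` gives 42.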
row percentages Chapter 9
A percentage calculated by dividing the count in a cell by the total count for its row (the row marginal count) and multiplying by 100. For example, if there are 530 unemployed women in a sample including 648 unemployed people in all, the row percentage of unemployed women is (530 / 648) × 100, or 82%.

S
sample Chapter 1
A set of people selected from a population, usually with a random method that ensures that everyone has an equal chance of selection.
sample distribution Chapter 10
The frequency distribution of an empirical sample. Not to be confused with sampling distributions.
sampling distribution Chapter 10
A theoretical frequency distribution of any statistic calculated for a sample. For example, the sampling distribution of the mean could be generated by collecting an infinite number of samples of the same size and calculating the mean of each. These means would then form a distribution which could be plotted and would approximate a normal curve. Sampling distributions are the basis of inferential statistics.
sampling error Chapter 10
The difference between an estimate derived from a sample and the true value measured in the population.
saturated Chapter 12
A model that includes all possible effects.
secondary analysis Chapter 1
Reanalysis of data collected by another researcher or organisation that may originally have been collected for other purposes. See primary analysis.
separate variance t test Chapter 11
A t test for two samples that uses the variance estimated from the cases in each sample separately. A separate variance t test is appropriate when the samples are drawn from different populations whose variances may differ.
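A Python sketch contrasting the pooled and separate variance t statistics on hypothetical samples of unequal size. The separate variance form shown is the one usually attributed to Welch; degrees of freedom and p-values are left out to keep the example short.

```python
import math
import statistics

def pooled_t(a, b):
    """t statistic using a single variance estimate pooled from both samples."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference
    return (statistics.fmean(a) - statistics.fmean(b)) / se_diff

def separate_t(a, b):
    """t statistic using each sample's own variance estimate."""
    se_diff = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se_diff

a = [1, 2, 3, 4, 5]  # hypothetical samples
b = [2, 4, 6]
print(round(pooled_t(a, b), 3), round(separate_t(a, b), 3))
```

Here the pooled statistic is about -0.791 and the separate variance statistic about -0.739; with equal sample sizes the two standard errors would coincide.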
simple regression analysis Chapter 8
Statistical methods concerned with explaining or predicting the variability of a continuous dependent variable using information from one continuous independent variable.
skewed distribution Chapter 5
A distribution which has either predominately low values and a few extreme high values (positively skewed) or one where there are predominately high values and a few extreme low values (negatively skewed). When plotted, such a distribution produces a non-symmetric curve where the mean, mode and median do not coincide.
specification Chapter 9
If the strength of the association between two variables differs according to the level of a third variable, the third variable specifies the relationship. Also called interaction.
SPSS Chapter 2
A computer program used for the management and analysis of social science data.
spurious Chapter 9
A relationship between two variables is spurious if it disappears when a third variable is controlled.
standard deviation Chapter 5
The square root of the variance.
standard deviation or SD line Chapter 8
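The line through the point defined by the means of two continuous variables whose slope is the ratio of their standard deviations, so that it rises (or falls) one standard deviation of Y for each standard deviation of X.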
standard error of the mean Chapter 10
The standard deviation of the distribution of sample means.
standard error of the difference Chapter 11
The standard error of a test statistic obtained from comparing two samples.
Standard Normal Curve Chapter 7
A normal curve after standardisation to z scores.
standardisation Chapter 7
The procedure to create standard scores (z scores) in order to compare variables measured in different units.
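A Python sketch of standardisation on hypothetical scores, assuming the sample standard deviation (n - 1). Whatever the original units, the resulting z scores have a mean of 0 and a standard deviation of 1.

```python
import statistics

def z_scores(values):
    """Standardise: subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumes the sample (n - 1) standard deviation
    return [(v - mean) / sd for v in values]

scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical raw scores
print([round(z, 2) for z in scores])
```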
stem and leaf diagram Chapter 6
An exploratory data analytic method to show the distribution of a continuous variable.
stepwise selection Chapter 12
A procedure for exploratory analysis that alternates using forward selection and backward elimination in order to find the most satisfactory set of variables to use in a regression equation.
stub Chapter 9
The left-hand column of a table, in which the labels describing each row are placed.
subsample Chapter 2
A subsample is a selection of cases from a sample. For example, if you remove all the males from a sample of both males and females, you are left with a subsample of females.

T
test statistic Chapter 11
The statistic used to test a null hypothesis. The test statistic must be one whose distribution is known; common test statistics are the z score, the t statistic and the chi-square statistic.
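As one illustration, a one-sample t statistic compares a sample mean with the value stated in the null hypothesis: t = (x̄ − μ₀) / (s / √n). A minimal sketch, with a hypothetical helper `one_sample_t`, hypothetical scores and a hypothetical null value μ₀ = 50:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error of the mean
    return (statistics.mean(sample) - mu0) / se

sample = [52, 55, 48, 53, 57, 51, 54, 50]   # hypothetical scores
t = one_sample_t(sample, mu0=50)
```

The value of t would then be compared with the t distribution on n − 1 degrees of freedom.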
transformation Chapter 7
A mathematical procedure employed to change the scale of a variable from one set of values into another set of values, without changing their relative order. This procedure is often employed in order to produce a more linear relationship between two continuous variables. Examples include transforming proportions into percentages by multiplying each value by 100 and, in regression analysis, transforming raw scores into logarithmic scores.
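A short sketch of the two examples in the entry, using hypothetical data: multiplying proportions by 100, and taking logarithms (base 10 here) of raw scores. A logarithmic transformation requires all values to be positive. Both transformations change the scale but leave the rank order of the cases unchanged.

```python
import math

proportions = [0.05, 0.20, 0.35, 0.90]          # hypothetical proportions
percentages = [p * 100 for p in proportions]   # proportions -> percentages

incomes = [12_000, 18_500, 25_000, 61_000, 240_000]   # hypothetical raw scores
log_incomes = [math.log10(x) for x in incomes]        # raw scores -> logarithmic scores

def rank_order(values):
    """Indices of the cases sorted from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Both transformations change the scale but not the relative order of the cases.
assert rank_order(percentages) == rank_order(proportions)
assert rank_order(log_incomes) == rank_order(incomes)
```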
two-sample test Chapter 11
A test comparing two statistics derived from the characteristics of different samples
two-tailed test Chapter 11
An inferential test in which the null hypothesis will be rejected if the test statistic falls into either the critical region in the upper tail of its sampling distribution or the critical region in the lower tail (see Exhibit 11.6).
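A minimal sketch of a two-tailed z test, assuming a significance level of 0.05 split equally between the two tails and a hypothetical observed z statistic of −2.3. The helper `two_tailed_p` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05                      # assumed significance level
std_normal = NormalDist()         # standard normal distribution

# Two-tailed: half of alpha in the upper tail, half in the lower tail.
upper_critical = std_normal.inv_cdf(1 - alpha / 2)   # about +1.96
lower_critical = -upper_critical                     # about -1.96

def two_tailed_p(z_obs):
    """Probability of a z at least as extreme as z_obs in either direction."""
    return 2 * (1 - std_normal.cdf(abs(z_obs)))

observed = -2.3                   # hypothetical observed z statistic
reject = observed <= lower_critical or observed >= upper_critical
```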
type I error Chapter 11
The error of rejecting a null hypothesis that is actually true. The probability of making a type I error is the significance level of the test.
type II error Chapter 11
The error of accepting (failing to reject) a null hypothesis that is in fact false. The probability of making a type II error is often denoted by β.
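Both kinds of error can be seen in a simulation. The sketch below assumes a hypothetical two-tailed z test at the 0.05 level with a known population SD of 1 and samples of 30. When the null hypothesis (mean 0) is true, the rejection rate is the type I error rate; when the true mean is 0.5, the non-rejection rate is the type II error rate. All parameter values and the helper `rejects` are hypothetical.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
Z_CRIT = NormalDist().inv_cdf(0.975)   # two-tailed test at alpha = 0.05
N, SIGMA, MU0 = 30, 1.0, 0.0           # hypothetical sample size, known SD, null mean
TRIALS = 10_000

def rejects(true_mu):
    """Draw one sample from N(true_mu, SIGMA) and apply a two-tailed z test of H0: mu = MU0."""
    sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
    z = (sum(sample) / N - MU0) / (SIGMA / N ** 0.5)
    return abs(z) >= Z_CRIT

# Null hypothesis true: every rejection is a type I error.
type_i_rate = sum(rejects(MU0) for _ in range(TRIALS)) / TRIALS         # close to 0.05

# Null hypothesis false (true mean 0.5): every non-rejection is a type II error.
type_ii_rate = sum(not rejects(0.5) for _ in range(TRIALS)) / TRIALS    # close to 0.22
```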

U
unexplained variance Chapter 8
In regression, unexplained variance is the variance after the variance 'explained' by the regression is subtracted from the total variance of the dependent variable. Often called the variance of the residuals.
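A minimal sketch, assuming a simple least-squares regression of a hypothetical Y on a hypothetical X, showing that the total variance of the dependent variable splits into an explained part and an unexplained part (the variance of the residuals). Variances here divide by n; the helper `variance_decomposition` is hypothetical.

```python
import statistics

def variance_decomposition(xs, ys):
    """Fit y = a + b*x by least squares and split the variance of y into explained and unexplained parts."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    predicted = [a + b * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    n = len(ys)
    total = sum((y - my) ** 2 for y in ys) / n          # total variance of the dependent variable
    explained = sum((p - my) ** 2 for p in predicted) / n
    unexplained = sum(r ** 2 for r in residuals) / n    # variance of the residuals (their mean is zero)
    return total, explained, unexplained

xs = [1, 2, 3, 4, 5, 6]                 # hypothetical independent variable
ys = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]    # hypothetical dependent variable
total, explained, unexplained = variance_decomposition(xs, ys)
```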
univariate statistics Chapter 3
Univariate statistics are statistics that describe the characteristics of a single variable.
universe Chapter 11
The set of all those to whom the research hypothesis is assumed to apply (a synonym for population)
upper quartile Chapter 5
The value of the category which defines the lower boundary of the top 25 per cent of cases when they are arranged in rank order. Denoted by Q3.
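In Python, `statistics.quantiles(data, n=4)` returns the three quartile cut points, the last of which is Q3. Different programs use different interpolation rules, so Q3 can differ slightly between packages: for the hypothetical values below, the default "exclusive" method gives 7.5 while the "inclusive" method gives 7.0.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical values, already in rank order

# statistics.quantiles with n=4 returns the three cut points [Q1, median, Q3].
q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
q3_inclusive = statistics.quantiles(data, n=4, method="inclusive")[2]
print(q3, q3_inclusive)
```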

V
validity Chapter 1
Validity concerns how well an indicator measures the concept it is designed to measure. For instance, if we wanted to measure height and all we had was a set of bathroom scales, we would not have a valid indicator, since the scales measure weight. The scales are nevertheless a very reliable measure, since we get the same weight each time. Conversely, if we used an elastic tape measure to measure height, we might get a different height each time we used it: it is very unreliable, but it is a valid indicator, since it does measure height. See reliability.
value label Chapter 2
Value labels in SPSS are an optional way of explaining the numeric codes ascribed to each response of a variable. For example, code 1, indicating those respondents who are married, may be given the value label 'Married'.
variable name Chapter 2
Required by SPSS to identify each variable. Variable names may be up to 8 characters long and must be unique in any one data file. For example, marital status may be given the variable name marstat.
variable label Chapter 2
A label that may be created in SPSS to describe each variable. Variable labels, with a limit of 255 characters, allow you to expand on the 8-character variable name. The variable label for the variable marstat could be "Respondent's marital status".
variable-centred Chapter 1
A type of data analysis concerned with exploring relationships between variables, rather than exploring relationships between cases.
variance Chapter 5
A measure of dispersion calculated as the average of the squared differences between each value and the mean. A distribution with a low variance has most values clustered around the mean; a distribution with a high variance has a wide spread of values around the mean. Denoted by s² (for a sample) or σ² (for a population).
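A minimal sketch of the calculation, with a hypothetical helper `variance`. When estimating from a sample, the sum of squared deviations is usually divided by n − 1 rather than n; the `sample` flag switches between the two. The two hypothetical distributions share a mean of 10 but differ in spread.

```python
def variance(values, sample=True):
    """Average squared deviation from the mean.

    With sample=True the sum of squares is divided by n - 1 (the estimate s²);
    with sample=False it is divided by n (the population variance σ²).
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return sum_sq / (n - 1 if sample else n)

clustered = [9, 10, 10, 10, 11]   # hypothetical: values close to the mean
spread = [2, 6, 10, 14, 18]       # hypothetical: same mean, wider spread
```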

W
working hypothesis Chapter 11
A reformulation of a research hypothesis in terms of indicators rather than concepts

X
X variable Chapter 8
An independent variable in a regression analysis.

Y
Y variable Chapter 8
The dependent variable in a regression analysis

Z
z score Chapter 7
Standardisation of a value by subtracting the variable's mean from the value and dividing by the variable's standard deviation. A distribution of z scores always has a mean of zero and a standard deviation of unity.
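A minimal sketch of standardisation to z scores, using hypothetical heights and a hypothetical helper `z_scores`. It divides by the population standard deviation (`statistics.pstdev`), so the resulting z scores have a mean of zero and a population standard deviation of one, up to floating-point rounding.

```python
import statistics

def z_scores(values):
    """Standardise each value: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)   # population SD; statistics.stdev would give the n - 1 version
    return [(v - mean) / sd for v in values]

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data
z = z_scores(heights_cm)
```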