The definitions given here are designed to help readers understand the more technical parts of statistical regression analysis. I often find that this type of work can help to evidence some relationships and some inferred causality. Regressions are ultimately equations, so if relativism is not relevant or necessary for a particular research question, then a more layered and appropriate approach using regressions can help. Equally, simple reductive reasoning from statistical findings can obscure the ‘real’ logic and intuitive depth of research. In short, exploring economics and the economy is both an art and a science. Italics indicate cross-references to other entries. A warning, though: statistics can bring out a sense of certainty rather than doubt, so curb your self-righteousness.

Autocorrelation

The association between a variable and an earlier or a subsequent measure of the same variable.
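As a rough illustration, the association between a series and its own earlier values can be computed directly. This is a minimal sketch using only the standard library; the function name `lag_autocorrelation` and the two example series are hypothetical, and the estimator shown (lagged products divided by the total sum of squares) is one common convention among several.

```python
from statistics import mean

def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    if not 0 < lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    centre = mean(series)
    deviations = [value - centre for value in series]
    # Products of each deviation with the deviation `lag` steps earlier.
    lagged = sum(a * b for a, b in zip(deviations[lag:], deviations[:-lag]))
    total = sum(d * d for d in deviations)
    return lagged / total

trend = [1, 2, 3, 4, 5, 6, 7, 8]              # neighbours are similar
alternating = [1, -1, 1, -1, 1, -1, 1, -1]    # neighbours are opposite

print(lag_autocorrelation(trend))        # positive
print(lag_autocorrelation(alternating))  # negative
```

A steadily rising series gives a positive value, while a series that flips sign at every step gives a negative one.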

Binary variable/outcome

A variable with just two values, for example, ‘yes’ and ‘no’.

Bivariate regression model

A regression model analysing two variables to establish the strength of the relationship between them.

Categorical variable

A variable taking more than two ordered or unordered values, for example social class or ethnic group.

Conditional model

A regression model.

Confounding variable

A variable not included in a regression model that might lead to biased causal conclusions.

Control variable

A variable included in a regression model to reduce the effects of self-selection.

Difference in differences

A regression model in which the difference between two (or more) occasions in the outcome variable is related to the difference in the explanatory variable of causal interest.
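A minimal sketch of the idea, assuming just two occasions and a single explanatory variable: take the change in each variable between occasions, then regress change on change. The function name and the four-case data set are hypothetical.

```python
from statistics import mean

def change_on_change_slope(x_before, x_after, y_before, y_after):
    """Slope from regressing the change in y on the change in x."""
    dx = [after - before for before, after in zip(x_before, x_after)]
    dy = [after - before for before, after in zip(y_before, y_after)]
    dx_bar, dy_bar = mean(dx), mean(dy)
    numerator = sum((a - dx_bar) * (b - dy_bar) for a, b in zip(dx, dy))
    denominator = sum((a - dx_bar) ** 2 for a in dx)
    return numerator / denominator

# Hypothetical data: the last two cases receive an intervention (x goes 0 -> 1).
x_before = [0, 0, 0, 0]
x_after = [0, 0, 1, 1]
y_before = [10, 12, 11, 13]
y_after = [11, 13, 15, 17]

slope = change_on_change_slope(x_before, x_after, y_before, y_after)
print(slope)
```

With a binary explanatory variable like this one, the slope equals the mean change among the cases that received the intervention minus the mean change among those that did not.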

Dummy variable

One or more binary variables generated from a categorical variable in order to represent that variable as an explanatory variable in a regression model.
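The construction can be sketched as follows, assuming one category is held back as the reference group so that the remaining binary variables are not redundant. The helper `make_dummies` and the category labels are hypothetical.

```python
def make_dummies(values, reference):
    """Return {category: [0 or 1, ...]} for every category except `reference`."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not found in values")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }

social_class = ["manual", "professional", "intermediate", "manual"]
dummies = make_dummies(social_class, reference="manual")
print(dummies)
```

A categorical variable with three values therefore yields two binary variables; a case in the reference category scores zero on both.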

Effect size

The effect of an explanatory variable on an outcome variable, expressed in standardised units.

Endogenous variable

A variable that is part of the causal process generating an outcome variable.

Exogenous variable

A variable that is unrelated to the causal process generating an outcome variable.

Explanatory variable

The variable on the right-hand side of a regression model which, in some circumstances, is a cause of the outcome variable. Also known as a predictor variable or, misleadingly, as an independent variable.

Expected value

The mean.

Fixed effect

An effect associated with an explanatory variable that has a limited number of values. See also random effect.

Imputation model

A model that generates plausible values of a variable that has one or more missing values.

Instrumental variable/Instrument

A variable correlated with an explanatory variable of interest but uncorrelated with the residual in a regression model and used in place of the explanatory variable in order to reduce bias.

Interaction

The product of two or more explanatory variables, used to represent the situation when the effect of one explanatory variable varies according to the value of another explanatory variable.
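To see why a product term lets one effect depend on another variable, consider this small sketch. The coefficients are invented for illustration and are not estimates from any data.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b12=1.5):
    """Prediction from a model with an interaction (product) term x1 * x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * (x1 * x2)

# Effect of a one-unit increase in x1, evaluated at two values of x2.
effect_when_x2_is_0 = predict(1, 0) - predict(0, 0)
effect_when_x2_is_1 = predict(1, 1) - predict(0, 1)

print(effect_when_x2_is_0)  # 2.0
print(effect_when_x2_is_1)  # 3.5
```

Without the product term (`b12 = 0`) the two effects would be identical.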

Interval scale

A scale, for example income, in which the same numerical difference between two scale values has the same meaning throughout the scale’s range.

Linear effect

An effect of an explanatory variable that does not vary according to its own value or with the value of other explanatory variables.

Logistic regression model

A regression model in which the outcome variable is a binary variable.
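In the usual formulation, the model passes a linear combination of the explanatory variables through the logistic function to give a probability between zero and one. A minimal sketch with a single explanatory variable; the intercept and slope are hypothetical values, not fitted ones.

```python
import math

def logistic(linear_predictor):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-linear_predictor))

def predicted_probability(x, intercept, slope):
    """Probability that the binary outcome equals 1 for a given x."""
    return logistic(intercept + slope * x)

# Hypothetical coefficients chosen for illustration only.
for x in (0, 2, 4):
    print(x, predicted_probability(x, intercept=-1.0, slope=0.5))
```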

Maximum likelihood

A method of estimation used with, for example, a logistic regression model.
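The principle can be shown on the simplest possible case, a single proportion rather than a full logistic regression: choose the parameter value under which the observed data are most probable. The sketch below searches a grid of candidate values; real software uses iterative numerical methods, and the data are invented.

```python
import math

def bernoulli_log_likelihood(p, outcomes):
    """Log-likelihood of probability p given a list of 0/1 outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)

outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical: 7 'yes' out of 10

# Candidate probabilities 0.01, 0.02, ..., 0.99.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: bernoulli_log_likelihood(p, outcomes))

print(best_p)  # 0.7, the sample proportion
```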

Measurement error

When the true and observed values of a variable differ.

Monotonic

Increasing or decreasing but not both.

Multilevel model

A regression model in which the outcome variable can vary at two or more levels, for example between pupils within schools and between schools. These levels are represented as random effects.

Multivariate regression

A regression model with two or more outcome variables.

Non-linear effect

An effect of an explanatory variable that varies according to its own value or with the value of other explanatory variables.

Normal distribution

The symmetric bell-shaped statistical distribution.

Ordinary least squares

A method of estimation often used with, for example, a regression model.
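For a bivariate regression model, the least squares estimates have a simple closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of the explanatory variable. A minimal sketch; the function name `ols_fit` and the data are hypothetical.

```python
from statistics import mean

def ols_fit(x, y):
    """Intercept and slope that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = ols_fit(x, y)
print(round(intercept, 2), round(slope, 2))  # 0.09 1.97
```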

Outcome variable

The variable on the left-hand side of a regression model, also known as a dependent variable or response which, in some circumstances, is an effect of an explanatory variable.

p-value

See statistical significance.

Point estimate

The actual estimate from the sample.

Power transformation

A transformation, for example square root or log, often used to bring an outcome variable closer to a Normal distribution.

Predicted value

The expected value of an outcome variable from a regression model for fixed values of the explanatory variables.

Prediction distribution

The distribution, often assumed to be Normal, of a predicted value.

Prediction model

A regression model.

Pseudo R squared

One measure of the extent to which the explanatory variables account for the variability of a binary outcome in a logistic regression model.

Quadratic term

A squared term, for example x².

Quartiles

The points that divide any statistical distribution into four equal sections.
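Python's standard library can compute the three cut points directly. Note that several interpolation conventions exist for quartiles; this sketch uses the default of `statistics.quantiles`, and the data are invented.

```python
from statistics import quantiles

data = list(range(1, 12))  # the values 1 to 11

# n=4 asks for the three points that split the data into four equal parts.
quartiles = quantiles(data, n=4)
print(quartiles)  # [3.0, 6.0, 9.0]
```

The middle value is the median.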

R squared (R²)

The proportion of the variance of an outcome variable accounted for by the explanatory variables in a regression model when the outcome is measured on an interval scale.
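One standard way to compute it is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes the predicted values come from a least squares fit with an intercept; the numbers here are made up to keep the arithmetic visible.

```python
from statistics import mean

def r_squared(observed, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(observed)
    residual_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total_ss = sum((y - y_bar) ** 2 for y in observed)
    return 1 - residual_ss / total_ss

observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]  # hypothetical predictions

print(r_squared(observed, predicted))  # 0.95
```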

Random effect

An effect associated with explanatory variables (for example, schools) chosen at random from a population having a large or infinite number of possible values.

Regression model

A statistical model that relates one (or more) outcome variables to a set of one or more explanatory variables.

Residual

A variable in a regression model that represents the variance unexplained by the explanatory variables.

A statistical distribution used for income.

Standard deviation

The square root of the variance.

Standard error

A representation of the way in which an estimate, for example, from a regression model, varies from sample to sample.
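The most familiar case is the standard error of a sample mean, which is the sample standard deviation divided by the square root of the sample size. This sketch covers only that case, not regression coefficients, and the sample is invented.

```python
import math
from statistics import stdev

def standard_error_of_mean(sample):
    """Estimated sample-to-sample variability of the mean."""
    # stdev uses the n - 1 divisor appropriate for a sample.
    return stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
print(round(se, 3))  # 0.756
```

Larger samples give smaller standard errors, because the denominator grows with the sample size.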

Statistical distribution

A representation of the way in which a variable varies from case to case in a sample.

Statistical significance

A representation, using the p-value, of how likely it is that an effect as far or further from a chosen value (often zero) as the one estimated would be found in other samples if the true effect equalled that chosen value.

Uniform distribution

A statistical distribution in which all values are equally likely.

Unobserved heterogeneity

The variance in an outcome variable attributed to confounding variables.

Variance

A measure of the spread in a statistical distribution.

Wald test

A test of statistical significance often applied to a set of estimates from a regression model.
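For a single estimate, the test reduces to dividing the distance between the estimate and the chosen value by the standard error and referring the result to a Normal distribution. The sketch below covers only this single-estimate case (the joint test for a set of estimates needs matrix algebra); the estimate and standard error are hypothetical.

```python
from statistics import NormalDist

def wald_test(estimate, standard_error, null_value=0.0):
    """Return the Wald z statistic and its two-sided p-value."""
    z = (estimate - null_value) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = wald_test(estimate=1.0, standard_error=0.5)
print(z, round(p_value, 4))  # 2.0 0.0455
```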

z transformation

A transformation of any statistical distribution so that the mean is zero and the variance is one.
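In practice this means subtracting the mean from each value and dividing by the standard deviation. A minimal sketch, assuming the population (divide by n) standard deviation; the scores are invented.

```python
from statistics import mean, pstdev

def z_transform(values):
    """Rescale values to have mean zero and variance one."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation (divides by n)
    return [(value - centre) / spread for value in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
z_scores = z_transform(scores)
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```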