The definitions given here are designed to help readers understand the more technical parts of statistical regression analysis. I often find this type of work can help to evidence some relationships and some inferred causality. Regressions are ultimately equations, so if relativism is not relevant or necessary for a particular research question, a more layered and appropriate approach using regressions can help. Equally, simple reductive reasoning from statistical findings can obscure the ‘real’ logic and intuitive depth of research. In short, exploring economics and the economy is both an art and a science. Italics indicate cross-references to other entries. A warning, though: statistics can bring out a sense of certainty rather than doubt, so curb your self-righteousness.

**Autocorrelation**

The association between a variable and an earlier or a subsequent measure of the same variable.
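
As a minimal sketch of the idea, the snippet below computes a lag-1 autocorrelation for two made-up series. The function name `autocorrelation` and the data are illustrative assumptions, and the formula is one common sample estimator rather than the only one.

```python
from statistics import mean

def autocorrelation(series, lag=1):
    """Correlation between a series and the same series `lag` steps later."""
    m = mean(series)
    # Cross-products of deviations for the overlapping part of the series
    num = sum((series[t] - m) * (series[t + lag] - m)
              for t in range(len(series) - lag))
    # Total sum of squared deviations
    den = sum((x - m) ** 2 for x in series)
    return num / den

trending = [1, 2, 3, 4, 5, 6, 7, 8]            # hypothetical data
alternating = [1, -1, 1, -1, 1, -1, 1, -1]     # hypothetical data

print(autocorrelation(trending))      # 0.625 (positive: neighbours are alike)
print(autocorrelation(alternating))   # -0.875 (negative: neighbours differ)
```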

**Binary variable/outcome**

A variable with just two values, for example, ‘yes’ and ‘no’.

**Bivariate regression model**

A *regression model* analysing two variables to establish the strength of the relationship between them.

**Categorical variable**

A variable taking more than two ordered or unordered values, for example social class or ethnic group.

**Conditional model**

A *regression model*.

**Confounding variable**

A variable not included in a *regression model* that might lead to biased causal conclusions.

**Control variable**

A variable included in a *regression model* to reduce the effects of self-selection.

**Difference in differences**

A *regression model* in which the difference between two (or more) occasions in the *outcome variable* is related to the difference in the *explanatory variable* of causal interest.
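
A minimal sketch of the simplest case, assuming two occasions and an explanatory variable that switches from 0 to 1 for a treated group only. In that special case, regressing the change in the outcome on the change in the explanatory variable gives the treated group's mean change minus the control group's mean change. All data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical outcome values for six units on two occasions
before = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0]
after = [13.0, 16.0, 14.0, 10.0, 11.0, 13.0]
treated = [1, 1, 1, 0, 0, 0]   # 1 = explanatory variable changed between occasions

# First difference: change in the outcome for each unit
change = [a - b for a, b in zip(after, before)]

treated_change = mean(c for c, t in zip(change, treated) if t == 1)
control_change = mean(c for c, t in zip(change, treated) if t == 0)

# Second difference: treated change minus control change
did_estimate = treated_change - control_change
print(round(did_estimate, 2))   # 2.0
```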

**Dummy variable**

One or more binary variables generated from a *categorical variable* in order to represent that variable as an *explanatory variable* in a *regression model*.
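
To illustrate, here is a small sketch that turns a three-category variable into two 0/1 columns, leaving one category out as the reference. The helper `make_dummies` and the category labels are assumptions made for this example.

```python
def make_dummies(values, reference):
    """Return one 0/1 list per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

social_class = ["working", "middle", "upper", "middle", "working"]  # hypothetical
dummies = make_dummies(social_class, reference="working")

print(dummies["middle"])   # [0, 1, 0, 1, 0]
print(dummies["upper"])    # [0, 0, 1, 0, 0]
```

The reference category is dropped because a case with zeros on every dummy is already identified as belonging to it.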

**Effect size**

The effect of an *explanatory variable* on an *outcome variable*, expressed in standardised units.

**Endogenous variable**

A variable that is part of the causal process generating an *outcome variable*.

**Exogenous variable**

A variable that is unrelated to the causal process generating an *outcome variable*.

**Explanatory variable**

A variable on the right-hand side of a *regression model*, also known as a predictor variable or, misleadingly, as an independent variable. In some circumstances it is a cause of the *outcome variable*.

**Expected value**

The mean.

**Fixed effect**

An effect associated with an explanatory variable that has a limited number of values. See also *random effect*.

**Imputation model**

A model that generates plausible values of a variable that has one or more missing values.

**Instrumental variable/Instrument**

A variable correlated with an *explanatory variable* of interest but uncorrelated with the residual in a *regression model* and used in place of the explanatory variable in order to reduce bias.
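
The sketch below uses a deliberately tiny, constructed data set to show the logic. It assumes a single instrument `z` and uses the simple ratio estimator cov(z, y) / cov(z, x), which is one standard way of putting an instrument to work. The confounder `u`, the true effect of 2 and all names are assumptions made for illustration.

```python
from statistics import mean

def covariance(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

z = [0, 0, 1, 1]   # instrument
u = [0, 1, 0, 1]   # unobserved confounder, uncorrelated with z by construction
x = [zi + ui for zi, ui in zip(z, u)]           # explanatory variable
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]   # outcome; true effect of x is 2

# Ordinary slope of y on x is biased because u drives both x and y
naive_slope = covariance(x, y) / covariance(x, x)

# Instrumental variable estimate uses only the variation in x that z induces
iv_slope = covariance(z, y) / covariance(z, x)

print(naive_slope)   # 3.5
print(iv_slope)      # 2.0
```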

**Interaction**

The product of two or more *explanatory variables*, used to represent the situation when the effect of one explanatory variable varies according to the value of another explanatory variable.
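
As a rough illustration, the function below is a hypothetical fitted equation with a product term. The coefficient values are invented; the point is only that the effect of `x1` changes with `x2`.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b12=1.5):
    """Hypothetical regression equation with an interaction term x1 * x2."""
    return b0 + b1 * x1 + b2 * x2 + b12 * (x1 * x2)

# Effect of a one-unit increase in x1, evaluated at two values of x2
effect_when_x2_is_0 = predict(1, 0) - predict(0, 0)
effect_when_x2_is_1 = predict(1, 1) - predict(0, 1)

print(effect_when_x2_is_0)   # 2.0
print(effect_when_x2_is_1)   # 3.5
```

The two effects differ by exactly `b12`, the coefficient on the product term.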

**Interval scale**

A scale, for example income, in which the same numerical difference between two scale values has the same meaning throughout the scale’s range.

**Linear effect**

An effect of an *explanatory variable* that does not vary according to its own value or with the value of other explanatory variables.

**Logistic regression model**

A regression model in which the *outcome variable* is a *binary variable*.
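
A minimal sketch of how such a model turns a linear combination of explanatory variables into a probability between 0 and 1. The intercept and slope are invented values, not estimates from any data.

```python
import math

def logistic(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_probability(x, intercept=-2.0, slope=0.5):
    """Probability that the binary outcome equals 1, for hypothetical coefficients."""
    return logistic(intercept + slope * x)

print(predicted_probability(4))         # 0.5, because the linear part equals zero
print(predicted_probability(0) < 0.5)   # True
print(predicted_probability(10) > 0.5)  # True
```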

**Maximum likelihood**

A method of estimation used with, for example, a *logistic regression model*.
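
To give a feel for the method, this sketch estimates a single probability from made-up binary outcomes by searching a grid for the value that makes the observed data most likely. Real software uses faster numerical routines; the grid search is a simplification assumed here for clarity.

```python
import math

outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # hypothetical data: 7 ones out of 10

def log_likelihood(p, data):
    """Log of the probability of observing `data` if P(outcome = 1) is p."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in data)

# Candidate values 0.01, 0.02, ..., 0.99
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, outcomes))

print(p_hat)   # 0.7, the sample proportion of ones
```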

**Measurement error**

When the true and observed values of a variable differ.

**Monotonic**

Increasing or decreasing but not both.

**Multilevel model**

A *regression model* in which the *outcome variable* can vary at two or more levels, for example between pupils within schools and between schools. These levels are represented as *random effects*.

**Multivariate regression**

A *regression model* with two or more *outcome variables*.

**Non-linear effect**

An effect of an *explanatory variable* that varies according to its own value or with the value of other explanatory variables.

**Normal distribution**

The symmetric bell-shaped *statistical distribution*.

**Ordinary least squares**

A method of estimation often used with, for example, a *regression model*.
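
For the two-variable case the estimates have a closed form, sketched below. The function name `ols_fit` and the data are assumptions for illustration.

```python
from statistics import mean

def ols_fit(x, y):
    """Intercept and slope minimising the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4, 5]                  # hypothetical explanatory variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical outcome variable

intercept, slope = ols_fit(x, y)
print(round(intercept, 2), round(slope, 2))
```

With these numbers the intercept is roughly 0.05 and the slope roughly 1.99.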

**Outcome variable**

The variable on the left-hand side of a *regression model*, also known as a dependent variable or response. In some circumstances it is an effect of an *explanatory variable*.

**p-value**

See *statistical significance*.

**Point estimate**

The actual estimate from the sample.

**Power transformation**

A transformation, for example square root or log, often used to bring an *outcome variable* closer to a *Normal distribution*.

**Predicted value**

The *expected value* of an *outcome variable* from a *regression model* for fixed values of the *explanatory variables*.

**Prediction distribution**

The distribution, often assumed to be Normal, of a *predicted value*.

**Prediction model**

A *regression model*.

**Pseudo R squared**

One measure of the extent to which the *explanatory variables* account for the variability of a *binary outcome* in a *logistic regression model*.

**Quadratic term**

A squared term, for example x².

**Quartiles**

The points that divide any *statistical distribution* into four equal sections.
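
A short sketch using Python's standard `statistics.quantiles` function with its default settings. The data are hypothetical, and other software may use slightly different interpolation rules and so give slightly different cut points.

```python
from statistics import quantiles

incomes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   # hypothetical values

# n=4 asks for the three cut points that split the data into four equal parts
q1, q2, q3 = quantiles(incomes, n=4)

print(q1, q2, q3)   # 3.0 6.0 9.0
```

The middle cut point, `q2`, is the median.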

**R squared (R²)**

The proportion of the *variance* of an *outcome variable* accounted for by the *explanatory variables* in a *regression model* when the outcome is measured on an *interval scale*.
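
As a sketch, the snippet below fits a straight line by least squares to made-up data and then computes R squared as one minus the ratio of unexplained to total variation. The data and variable names are assumptions for illustration.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]   # hypothetical explanatory variable
y = [2, 4, 5, 4, 5]   # hypothetical outcome variable

# Least squares line for one explanatory variable
mx, my = mean(x), mean(y)
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx
predicted = [intercept + slope * a for a in x]

# Unexplained variation (residual sum of squares) and total variation
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
ss_total = sum((b - my) ** 2 for b in y)

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))   # 0.6
```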

**Random effect**

An effect associated with *explanatory variables* (for example, schools) chosen at random from a population having a large or infinite number of possible values.

**Regression model**

A statistical model that relates one (or more) *outcome variables* to a set of one or more *explanatory variables*.

**Residual**

A variable in a *regression model* that represents the *variance* unexplained by the *explanatory variables*.

**Singh-Maddala distribution**

A *statistical distribution* used to model income.

**Standard deviation**

The square root of the *variance*.

**Standard error**

A representation of the way in which an estimate, for example, from a regression model, varies from sample to sample.
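
For the simplest estimate, a sample mean, the standard error is usually approximated by the sample standard deviation divided by the square root of the sample size. The sketch below applies that formula to hypothetical data; standard errors for regression coefficients follow the same logic but use more involved formulas.

```python
import math
from statistics import mean, stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

estimate = mean(sample)
# stdev() uses the n - 1 denominator appropriate for a sample
standard_error = stdev(sample) / math.sqrt(len(sample))

print(estimate, round(standard_error, 3))   # 5 0.756
```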

**Statistical distribution**

A representation of the way in which a variable varies from case to case in a sample.

**Statistical significance**

A representation, using the *p-value*, of how likely it is that an effect at least as far from a chosen value (often zero) as the one observed would arise in other samples if the true effect were equal to that value.

**Uniform distribution**

A *statistical distribution* in which all values are equally likely.

**Unobserved heterogeneity**

The variance in an *outcome variable* attributed to *confounding variables*.

**Variance**

A measure of the spread in a *statistical distribution*.

**Wald test**

A test of *statistical significance* often applied to a set of estimates from a *regression model*.

**z transformation**

A transformation of any *statistical distribution* so that the mean is zero and the *variance* is one.
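
A minimal sketch: subtract the mean from each value and divide by the standard deviation. The function name `z_transform` and the scores are hypothetical, and the population standard deviation is assumed here so that the transformed variance is exactly one.

```python
from statistics import mean, pstdev, pvariance

def z_transform(values):
    """Rescale values to have mean zero and variance one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, standard deviation 2
z_scores = z_transform(scores)

print(z_scores)              # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z_scores))        # 0.0
print(pvariance(z_scores))   # 1.0
```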