**Probit Analysis**

By: Kim Vincent

**Quick Overview**

- Probit analysis is a type of regression used to analyze binomial response variables.
- It transforms the sigmoid dose-response curve to a straight line that can then be analyzed by regression either through least squares or maximum likelihood.
- Probit analysis can be conducted by one of three techniques:
  - Using tables to estimate the probits and fitting the relationship by eye,
  - Hand calculating the probits, regression coefficient, and confidence intervals, or
  - Having a statistical package such as SPSS do it all for you.

**Background**

The idea of probit analysis was
originally published in *Science* by
Chester Ittner Bliss in 1934. He worked as an
entomologist for the Connecticut agricultural experiment station and was primarily
concerned with finding an effective pesticide to control insects that fed on
grape leaves (Greenberg 1980). By plotting the response of the insects to
various concentrations of pesticides, he could visually see that each pesticide
affected the insects at different concentrations, i.e. one was more effective
than the other. However, he didn’t have
a statistically sound method to compare this difference. The most logical approach would be to fit a
regression of the response versus the concentration, or dose and compare
between the different pesticides. Yet,
the relationship of response to dose was sigmoid in nature and at the time
regression was only used on linear data.
Therefore, Bliss developed the idea of transforming the sigmoid dose-response
curve to a straight line. In 1952, a professor of statistics at the University of
Edinburgh by the name of David Finney took Bliss’ idea and wrote a book called *Probit Analysis* (Finney 1952). Today, probit
analysis is still the preferred statistical method in understanding
dose-response relationships.

**The Basics**

Probit Analysis is a specialized regression model of binomial response variables.

Remember that regression is a method of fitting a line to
your data to compare the relationship of the response variable or dependent
variable (Y) to the independent variable (X).

**Y = a + bX + e**

Where

- a = y-intercept
- b = the slope of the line
- e = error term

Also remember that a binomial response variable refers to a
response variable with only two outcomes.

For example:

- Flipping a coin: Heads or tails
- Testing beauty products: Rash/no rash
- The effectiveness or toxicity of pesticides: Death/no death

**Applications**

Probit analysis is used to analyze many kinds of dose-response or binomial response experiments in a variety of fields. However, because my background knowledge of probit analysis stems only from toxicology, the examples from this webpage will only be of toxicology.

Probit Analysis is commonly used in toxicology to determine the relative toxicity of chemicals to living organisms. This is done by testing the response of an organism under various concentrations of each of the chemicals in question and then comparing the concentrations at which one encounters a response. As discussed above, the response is always binomial (e.g. death/no death) and the relationship between the response and the various concentrations is always sigmoid. Probit analysis acts as a transformation from sigmoid to linear and then runs a regression on the relationship.

Once a regression is run, the researcher can use the output of the probit analysis to compare the amount of each chemical required to produce the same response. There are many endpoints used to compare the differing toxicities of chemicals, but the LC50 (liquids) or LD50 (solids) are the most widely used outcomes of modern dose-response experiments. The LC50/LD50 represents the concentration (LC50) or dose (LD50) at which 50% of the population responds.

For example, consider comparing the toxicity of two different pesticides to aphids, pesticide A and pesticide B. If the LC50 of pesticide A is 50ug/L and the LC50 of pesticide B is 10ug/L, pesticide B is more toxic than A because it only takes 10ug/L to kill 50% of the aphids, versus 50ug/L of pesticide A.

**How does probit analysis work? How to get from dose-response curve to an LC50?**

Below you will find a step by step guide to using probit analysis with various methods. The easiest by far is to use a statistical
package such as SPSS, SAS, R, or S, but it is good to see the history of the
methodology to get a thorough understanding of the material.

**Step 1: Convert % mortality to probits (short for probability unit)**

__Method A__: Determine probits by looking up those corresponding to the %
responded in Finney’s table (Finney 1952):

For example, for a 17% response, the corresponding probit would be 4.05. Additionally, for a 50% response (LC50), the corresponding probit would be 5.00.
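The table lookup can also be reproduced numerically. The sketch below assumes the usual definition of a probit as the standard normal deviate for the proportion responding, plus 5; the function name `percent_to_probit` is made up for this illustration.

```python
from statistics import NormalDist


def percent_to_probit(percent_responded):
    """Return the probit for a percent response (normal deviate + 5).

    Only defined for responses strictly between 0% and 100%.
    """
    proportion = percent_responded / 100.0
    return 5.0 + NormalDist().inv_cdf(proportion)


# The two values quoted from the table: 17% and 50% response.
for pct in (17, 50):
    print(f"{pct}% response -> probit {percent_to_probit(pct):.2f}")
```

Rounded to two decimals, this gives the same 4.05 and 5.00 as the table values quoted above.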

__Method B__: Hand calculations (Finney and Stevens
1948):

The probit Y, of the proportion P is defined by:

The standard method of analysis makes use of the maximum and minimum working probits:

And the range 1/Z where
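For reference, these quantities are usually written as follows. This is a sketch of the standard forms in Finney's notation, assumed here rather than copied from the cited table: Y is the probit, P the proportion, Q = 1 − P, and Z the ordinate of the standard normal curve at Y − 5.

```latex
P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Y-5} e^{-u^{2}/2}\,du, \qquad Q = 1 - P

Y_{\max} = Y + \frac{Q}{Z}, \qquad Y_{\min} = Y - \frac{P}{Z}

Z = \frac{1}{\sqrt{2\pi}}\, e^{-(Y-5)^{2}/2}
```

With these definitions the difference between the maximum and minimum working probits is (P + Q)/Z = 1/Z, which is the range mentioned above.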

__Method C__: Computer software such as SPSS, SAS, R, or S converts the percent responded to probits automatically.

**Step 2: Take the log of the concentrations.**

This can be done by hand if doing hand calculations, or by specifying this transformation in the computer program of choice.
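As a minimal sketch of the transformation outside of a statistics package, assuming base-10 logs and using made-up concentrations:

```python
import math

# Hypothetical concentrations, for illustration only (units are arbitrary).
concentrations = [1.0, 10.0, 100.0]

# Base-10 log of each concentration; this becomes the x variable of the regression.
log_concentrations = [math.log10(c) for c in concentrations]
print(log_concentrations)
```

Note that a control with a concentration of zero has no logarithm, so it has to be handled separately from the treated groups.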

For example, after clicking Analyze, Regression, Probit, choose the log of your choice to transform:

**Step 3: Graph the probits versus the log of the concentrations and fit a line of regression.**

Note: Both least squares and maximum likelihood are acceptable techniques for fitting the regression, but maximum likelihood is preferred because it gives more precise estimates of the parameters needed to evaluate the results correctly (Finney 1952).

__Method A:__ Fit the line by eye so that it minimizes the space between the line and the data (i.e. approximating least squares). Although this method can be surprisingly accurate, calculating a regression by hand or using a computer program is more precise. In addition, hand calculations and computer programs can provide confidence intervals.

__Method B__: Hand calculate the
linear regression by using the following method (Finney and Stevens
1948):

First set the proportion responding to be equal to p = r/n and the complement equal to q = 1-p.

The probits of a set of values of p should be approximately linearly related to x, the measure of the stimulus, and a line fitted by eye may be used to give a corresponding set of *expected probits*, Y.

The working probit corresponding to each proportion is next calculated from either of the following equations:

A new set of expected probits is then derived from the weighted linear regression equation of working probits on x, each y being assigned a weight, nw, where the weighting coefficient, w, is defined as:
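In symbols, the working probit and the weighting coefficient are usually written as below. This is a sketch of the standard forms, assumed here: P and Q = 1 − P are the expected proportions implied by the expected probit Y, Z is the ordinate of the standard normal curve at Y − 5, and p and q are the observed proportions.

```latex
y = \left(Y - \frac{P}{Z}\right) + \frac{p}{Z}
\qquad\text{or}\qquad
y = \left(Y + \frac{Q}{Z}\right) - \frac{q}{Z}

w = \frac{Z^{2}}{PQ}
```

Both forms of the working probit reduce to y = Y + (p − P)/Z, so either may be used depending on whether p or q is more convenient to work with.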

The process is repeated with the new set of Y values. The iteration converges to give you a linear regression.
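The iteration described above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not Finney and Stevens' tabular procedure: the starting line comes from a regression of the empirical probits rather than a line fitted by eye, a fixed number of iterations replaces a convergence check, and the function names and data are made up for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def weighted_line(x, y, weights):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def fit_probit_line(x, r, n, iterations=25):
    """Iterate the working-probit regression; returns intercept a and slope b."""
    p = [ri / ni for ri, ni in zip(r, n)]
    # Starting line: regression of empirical probits (stands in for the line fitted by eye).
    # Proportions are clipped so that 0% and 100% responses still have a finite probit.
    start = [5.0 + STD_NORMAL.inv_cdf(min(max(pi, 0.01), 0.99)) for pi in p]
    a, b = weighted_line(x, start, n)
    for _ in range(iterations):
        expected = [a + b * xi for xi in x]               # expected probits Y
        P = [STD_NORMAL.cdf(Y - 5.0) for Y in expected]   # expected proportions
        Z = [STD_NORMAL.pdf(Y - 5.0) for Y in expected]   # normal ordinates
        working = [Y + (pi - Pi) / Zi for Y, pi, Pi, Zi in zip(expected, p, P, Z)]
        weights = [ni * Zi ** 2 / (Pi * (1.0 - Pi)) for ni, Zi, Pi in zip(n, Z, P)]
        a, b = weighted_line(x, working, weights)
    return a, b


# Made-up data for illustration only: 50 insects per container at five concentrations.
concentrations = [1.0, 2.0, 4.0, 8.0, 16.0]
responded = [4, 12, 25, 38, 46]
totals = [50, 50, 50, 50, 50]

log_conc = [math.log10(c) for c in concentrations]
a, b = fit_probit_line(log_conc, responded, totals)
print(f"probit = {a:.3f} + {b:.3f} * log10(concentration)")
```

The sketch does not guard against expected proportions of exactly 0 or 1, which would make the weights undefined; real data with very extreme doses would need that check.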

__Method C__: Use a computer
program. SPSS uses maximum likelihood to
estimate the linear regression.

To run the probit analysis in SPSS, follow these steps:

Input a minimum of three columns into the Data Editor:

- Number of individuals per container that responded

- Total of individuals per container

- Concentrations

For example in the following
screen, a_mort is the number of individuals that
responded per container, a_total is the total number
of individuals per container, and a_conc are the
concentrations. Row

Screen 1:

After your columns are set, go to Analyze, Regression, Probit:

Screen 2:

Then set your number responded column as the “Response Frequency”, the total number per container as the “Total Observed”, and the concentrations as the “Covariates”. Don’t forget to select the log base 10 to transform your concentrations.

Screen 3:

If you run the above example, you will see that SPSS determines an optimal solution after 18 iterations.

**Step 4: Find the LC50**

__Method A:__ Using your hand-drawn graph, whether fitted by eye or by calculating the regression by hand, find the probit of 5 on the y-axis, move across to the fitted line, and then move down to the x-axis to find the log of the concentration associated with it. Then take the inverse of the log and voila! You have the LC50.

__Method B:__ The LC50 is
determined by searching the probit list for a probit of 5.00 and then taking the inverse log of the
concentration it is associated with.
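If the fitted line is available as an intercept and slope, the same reading can be done algebraically: set the probit to 5, solve for the log concentration, and take the inverse log. The sketch below assumes base-10 logs and a line of the form probit = a + b x log10(concentration); the function name and the example coefficients are hypothetical.

```python
def lc50_from_line(a, b):
    """Concentration at which the fitted line reaches a probit of 5 (50% response)."""
    log_lc50 = (5.0 - a) / b   # solve 5 = a + b * log10(LC50)
    return 10 ** log_lc50      # inverse of the base-10 log


# Hypothetical fitted line, for illustration: probit = 4.0 + 1.0 * log10(concentration)
print(lc50_from_line(4.0, 1.0))
```

For this hypothetical line the log of the LC50 is (5 − 4)/1 = 1, so the LC50 is 10 in whatever units the concentrations were recorded in.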

**Step 5: Determine the 95% confidence intervals:**

__Method A:__ Hand
calculate using the following equation:

The standard error of the log LC50 is approximately: ± 1 / (b √(Σnw))

- b = estimate of the slope of the line

- Σnw = summation of nw

- w = weighting coefficient from Table III of Finney (1952), w = Z²/PQ
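A minimal sketch of turning this standard error into an interval is shown below. It assumes the interval is formed on the log scale as log LC50 ± 1.96 standard errors (a normal approximation that is not spelled out above) and that base-10 logs were used; the function name and the input values are hypothetical.

```python
import math


def lc50_confidence_interval(log_lc50, b, nw, z=1.96):
    """Approximate 95% interval for the LC50 using SE = 1 / (b * sqrt(sum(nw)))."""
    se = 1.0 / (b * math.sqrt(sum(nw)))   # standard error on the log scale
    lower = 10 ** (log_lc50 - z * se)     # back-transform the lower limit
    upper = 10 ** (log_lc50 + z * se)     # back-transform the upper limit
    return lower, upper


# Hypothetical inputs for illustration: log10(LC50) = 1.0, slope 2.0, five nw values.
low, high = lc50_confidence_interval(1.0, 2.0, [10.0, 25.0, 30.0, 25.0, 10.0])
print(f"LC50 interval: {low:.2f} to {high:.2f}")
```

Because the limits are computed on the log scale and then back-transformed, the resulting interval is not symmetric around the LC50 itself.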

__Method B__: SPSS and other computer programs calculate
this automatically

**Notes of Interest for Probit Analysis**

- Probit analysis assumes that the relationship between number responding (not percent response) and concentration is normally distributed.

If data are not normally distributed, logit (more on this below) is preferred.

- Data must be corrected if there is more than 10% mortality in the control.

One method is to use Schneider-Orelli’s (1947) formula:

Corrected % = (% Responded – % Responded in Control) / (100 – % Responded in Control) x 100

For example:

Let’s say you have 20% mortality in the control and you are correcting the mortality rate for the concentration where 60% mortality occurred. Plug the mortality rates into the equation above and you come up with a mortality of 50% instead of the original 60%.

(60% – 20%) / (100% – 20%) = 40/80 = 50%
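The correction is easy to apply to a whole data set in code. A minimal sketch, with a function name chosen for this illustration:

```python
def corrected_mortality(percent_responded, percent_control):
    """Schneider-Orelli correction: percent mortality adjusted for control mortality."""
    return (percent_responded - percent_control) / (100.0 - percent_control) * 100.0


# The worked example above: 60% mortality in the treatment, 20% in the control.
print(corrected_mortality(60, 20))  # 50.0
```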

**Logit vs. Probit:**

Logit is another way of transforming binomial data to linearity and is very similar to probit. Logit works by taking the log of the odds: logit(P) = log(P / (1 − P)). The two transformations are almost indistinguishable: logit ≈ (π/√3) x (probit − 5), where the 5 is the offset built into the probit scale used above. In general, if response vs. dose data are not normally distributed, Finney suggests using the logit over the probit transformation (Finney, 1952). Although the multivariate usage of probit analysis is beyond the content of this webpage, it is worth noting that the similarity between probit and logit doesn’t hold in a multivariate realm (Hahn and Soyer date unknown). Hahn and Soyer suggest that logit provides a better fit in the presence of extreme independent variable levels and, conversely, that probit provides a better fit for random effects models with moderate data sets (Hahn and Soyer date unknown).
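How close the two transformations are can be checked numerically. The sketch below compares the logit with π/√3 times the standard normal deviate (that is, the probit with its offset of 5 removed); it assumes the natural log for the logit, and the function names are made up for this illustration.

```python
import math
from statistics import NormalDist


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def scaled_normal_deviate(p):
    """(pi / sqrt(3)) times the standard normal deviate for proportion p."""
    return (math.pi / math.sqrt(3.0)) * NormalDist().inv_cdf(p)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P={p:.1f}  logit={logit(p):+.3f}  scaled probit={scaled_normal_deviate(p):+.3f}")
```

The two columns agree closely for moderate proportions and drift apart as P approaches 0 or 1, where the logistic curve has heavier tails than the normal curve.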

**Summary**

- Probit analysis is a type of regression used with binomial response variables. It is very similar to logit, but is preferred when data are normally distributed.

- The most common outcome of a dose-response experiment in which probit analysis is used is the LC50/LD50.

- Probit analysis can be done by eye, through hand calculations, or by using a statistical program.

**Citations:**

Finney, D. J., Ed. (1952). __Probit Analysis__. Cambridge, England, Cambridge University Press.

Finney, D. J. and W. L. Stevens (1948).
"A table for the calculation of working probits and weights in probit
analysis." __Biometrika__ **35**(1-2): 191-201.

Greenberg, B. G. (1980). "Chester I. Bliss, 1899-1979." __International Statistical Review / Revue Internationale de Statistique__ **48**(1): 135-136.

Hahn, E. D. and R. Soyer. (date unknown). "Probit and Logit Models: Differences in a Multivariate Realm." Retrieved May 28, 2008, from http://home.gwu.edu/~soyer/mv1h.pdf.