17.2.1 Creating Histograms


A histogram plots the frequency distribution of a single-variable dataset. Histograms allow a quick assessment of the following:

  • Dataset center.
  • Dataset range.
  • Dataset skewness.
  • The existence of outliers.
  • The existence of multiple modes in the data.

The histogram is produced by splitting the data range into bins of equal size (automatically or by user specification). Each bin is a class range used for frequency counts (lower boundary ≤ value < upper boundary). The number of points from the dataset that fall into each bin is then counted. The result is a plot of frequency (i.e. the count for each bin) on the vertical axis vs. the response variable on the horizontal axis.
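The counting rule described above can be illustrated with a minimal Python sketch. This is not Origin's implementation; the function name `bin_counts` and the sample data are hypothetical, and values outside the begin/end range are simply ignored here as an assumption.

```python
import math

def bin_counts(values, begin, end, bin_size):
    """Count values into equal-width bins using lower <= x < upper."""
    n_bins = math.ceil((end - begin) / bin_size)
    counts = [0] * n_bins
    for x in values:
        if begin <= x < end:  # assumption: out-of-range values are skipped
            index = int((x - begin) // bin_size)
            counts[min(index, n_bins - 1)] += 1  # guard against float rounding
    return counts

# Hypothetical sample data
data = [1.0, 1.5, 2.0, 2.5, 2.5, 3.9, 4.0]
counts = bin_counts(data, begin=1.0, end=5.0, bin_size=1.0)
print(counts)
```

Note that the value 2.0 is counted in the bin starting at 2.0, not in the bin ending at 2.0, because the upper boundary of each bin is exclusive.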

Creating histograms

To create a histogram:

  1. Highlight one or more Y worksheet columns (or a range from one or more Y columns).
  2. Select Plot: 2D: Histogram: Histogram or click the Histogram button on the 2D Graphs toolbar.

Origin automatically calculates the bin size and creates a new graph from the HISTGM.OTP template. The binned data is saved in a Binn worksheet (see Accessing the binned data, below). This worksheet contains the bin X values, Counts, (cumulative) Sum, and Percentages.
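To make the worksheet columns concrete, here is a rough Python sketch that derives similar columns from a list of bin counts. It rests on two assumptions not stated in the text: the bin X value is taken to be the bin center, and the percentage is taken to be the cumulative sum as a share of the total. The function name `bin_table` is hypothetical.

```python
from itertools import accumulate

def bin_table(begin, bin_size, counts):
    """Return rows of (bin X, count, cumulative sum, percent)."""
    total = sum(counts)
    # Assumption: the bin X value is the center of each bin
    centers = [begin + (i + 0.5) * bin_size for i in range(len(counts))]
    cumulative = list(accumulate(counts))
    # Assumption: percent is the cumulative sum relative to the total
    percent = [100.0 * c / total if total else 0.0 for c in cumulative]
    return list(zip(centers, counts, cumulative, percent))

rows = bin_table(begin=1.0, bin_size=1.0, counts=[2, 3, 1, 1])
for row in rows:
    print(row)
```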

Note: The Histogram menu command plots all selected datasets in the same layer. The Stacked Histograms menu command (Plot: Statistics: Stacked Histograms) plots each selected dataset in its own layer, using the same bin limits for every layer.

Accessing the binned data

To access the Binn worksheet, do one of the following:

  • Right-click on the histogram and select Go to Bin Worksheet from the shortcut menu.
  • Double-click on the histogram to open the Plot Details dialog, go to the Data tab, and click the Go to Bin Worksheet button.

Customizing the histogram

To customize the histogram:

  1. Double-click on the histogram, or right-click and select Plot Details.

Both actions open the Plot Details dialog box with the histogram data plot icon active on the left side of the dialog box. The histogram controls are available on tabs on the right side of the dialog box. Binning behavior is determined by controls on the Data tab. Note that some Data tab controls are relevant only to the Box Chart (which shares the Data tab with the histogram).

Controlling the binning

To edit the bin limits:

  1. Clear the Automatic Binning check box.
  2. Edit the Bin Size and the Begin and End values (see Programming Notes, below).
    Or
    Select the By Intervals radio button, then edit the Number of Bins and the Begin and End values.
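The two ways of specifying bins are related through the Begin and End values. The Python sketch below shows that relationship under the assumption that bins tile the Begin-to-End range exactly and that a partial last bin is rounded up to a full one; how Origin handles a range that is not a whole multiple of the bin size is not stated here. Both function names are hypothetical.

```python
import math

def bins_from_size(begin, end, bin_size):
    """Number of bins needed to cover [begin, end) at a given bin size."""
    # Assumption: a partial final bin counts as one more bin
    return math.ceil((end - begin) / bin_size)

def size_from_intervals(begin, end, n_bins):
    """Bin size that splits [begin, end) into n_bins equal intervals."""
    return (end - begin) / n_bins

n_bins = bins_from_size(0.0, 10.0, 2.5)
bin_size = size_from_intervals(0.0, 10.0, 4)
print(n_bins, bin_size)
```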

Overlaying a curve on the histogram

To overlay a distribution curve on the binned data (the histogram):

  1. Select Normal, Lognormal, Weibull, Exponential, Gamma, Laplace, Lorentz, Kernel Smooth, Poisson or Binomial from the Type drop-down list. The Preview window displays your selection.

For details, see the description of the Curve group on the Data tab of the Plot Details dialog.

Programming Notes:

Bin size and number are controllable via these system variables:

To set the bin size: @HBS = value. The value is -1 if not set.

To set the number of bins: @HBN = value. The value is rounded; it is 0 if not set.

To force the number of bins: @HBM = value. The value may be a non-integer (rounding is not used).

Priority sequence: @HBS > @HBN > @HBM

If neither @HBS nor @HBN is specified, @HBF = value determines the number of bins automatically according to the following expression, where nint rounds to the nearest integer and npts is the number of data points:

number of bins = 1 + nint(value * log10(npts))
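As a worked example of the expression above, the following Python sketch evaluates it for a couple of dataset sizes. The factor 3.322 is only an illustrative value (it makes the expression resemble Sturges' rule, since log2(n) is roughly 3.322 * log10(n)); it is not claimed to be Origin's default for @HBF. Rounding halves upward in `nint` is also an assumption.

```python
import math

def nint(x):
    """Round to the nearest integer (assumption: halves round up)."""
    return math.floor(x + 0.5)

def auto_bin_count(factor, npts):
    """Evaluate: number of bins = 1 + nint(factor * log10(npts))."""
    return 1 + nint(factor * math.log10(npts))

# Illustrative factor only; not necessarily the @HBF default
bins_100 = auto_bin_count(3.322, 100)    # 1 + nint(6.644)
bins_1000 = auto_bin_count(3.322, 1000)  # 1 + nint(9.966)
print(bins_100, bins_1000)
```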

Create a Histogram + Probabilities

To create a Histogram + Probabilities:

Highlight a single worksheet column (or a range from a worksheet column) and select Plot: Statistics: Histogram + Probabilities or click the Histogram + Probabilities button on the 2D Graphs toolbar.

This menu command/toolbar button plots a cumulative sum of observations in a second graph layer (layer 2).

Additionally, the Histogram + Probabilities menu command writes the statistical results (the mean, the standard deviation, the maximum and minimum values, and the total number of values) to the Results Log.
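For reference, the same five summary values can be computed outside Origin with a short Python sketch like the one below. It assumes the sample standard deviation (n - 1 in the denominator); the text does not say which form Origin reports. The function name `summarize` and the sample data are hypothetical.

```python
import statistics

def summarize(values):
    """Return mean, standard deviation, max, min and count of values."""
    return {
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 denominator)
        "sd": statistics.stdev(values),
        "max": max(values),
        "min": min(values),
        "n": len(values),
    }

# Hypothetical sample data
summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```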

Hint: The two layers in the Histogram + Probabilities graph template are linked. Therefore, before you change the X axis scale or the dimensions of the layer, make sure that the parent layer (layer 1) is the active layer. This ensures that the axis scales and dimensions of the child layer (in this case, layer 2) are adjusted accordingly.