17.2.7 2D Kernel Density
The 2D Kernel Density plot is a smoothed color density representation of a scatter plot, based on kernel density estimation, a nonparametric technique for estimating probability density functions. The goal of density estimation is to take a finite sample of data and infer the underlying probability density function everywhere, including where no data points are present. In kernel density estimation, the contribution of each data point is smoothed out from a single point into its surrounding region. The smoothed density plot shows the average trend of the scatter plot.
Creating 2D Kernel Density Plot
To create a 2D Kernel Density plot:
- Highlight one Y column.
- Open the 2D Kernel Density plot dialog by clicking Plot > Contour: 2D Kernel Density.
- In the plot_kde2 dialog box, specify the Method, Number of Grid Points in X/Y, Number of Points to Display, and Plot Type.
- Click OK to create a 2D Kernel Density plot.
The Dialog of plot_kde2
Specify the input data.
- Bandwidth Method
- Specify the bandwidth calculation method of the 2D Kernel Density plot.
- Bivariate Kernel Density Estimator
- Rules of Thumb
- Density Method
- Specify a method to calculate the kernel density for defined XY grids.
- Exact Estimation
- Choose this option to calculate density values according to the Ks2density equation. For a large dataset, the exact computation may require extensive calculation.
- Binned Approximate Estimation
- Choose this option to calculate an approximation of the density values. This option is recommended for large samples.
- Interpolate Density Points
- Specify the calculation method used to decide which points to superimpose on the density image (see the Algorithm section below for details). If the source data set is large (i.e., more than 50,000 points), selecting this option is strongly recommended to improve speed.
- Number of Grid Points in X/Y
- Specify the number of equally spaced grid points for the density estimation.
- Number of Points to Display
- Specify the first N lowest-density points to be superimposed on the density image when the All checkbox is cleared. When the All checkbox is selected (the default), all points are displayed.
- Grid Range
- As an interim step, a matrix of gridded values is generated from the X/Y data and the kernel density plot is created from the matrix values. By default, the Grid Range registers the minimum and maximum X and Y values in that matrix. Clear the Auto box to enter a value manually.
- X Minimum
- X Maximum
- Y Minimum
- Y Maximum
- Plot Type
- Specify the plot type.
- Use the density matrix to plot contour
- Use the density matrix to make an image plot
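The Number of Grid Points in X/Y and Grid Range options above can be pictured with a short Python sketch. It is only an illustration, not the dialog's implementation: the helper name `make_grid_axis` is hypothetical, and it assumes that the grid includes both endpoints of the range and that the automatic range is simply the data minimum and maximum.

```python
def make_grid_axis(values, n_points, lo=None, hi=None):
    """Return n_points equally spaced grid coordinates along one axis.

    If lo/hi are not given, the range falls back to the data minimum and
    maximum (an assumption standing in for the Auto setting).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    step = (hi - lo) / (n_points - 1)
    return [lo + k * step for k in range(n_points)]


# Automatic range taken from the data.
x_axis = make_grid_axis([2.0, 0.0, 4.0], 5)
# Manually entered range, as when the Auto box is cleared.
y_axis = make_grid_axis([2.0, 0.0, 4.0], 3, lo=-1.0, hi=1.0)
```

The density matrix is then evaluated at every (x, y) combination of the two axes.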
Density Estimation data
This determines where the calculated data for the graph is stored.
This determines where the data of the displayed scatter plot is stored. Only available when Number of Points to Display is not 0.
Kernel density estimation is a nonparametric technique for estimating the density of scatter points. The goal of density estimation is to estimate the underlying probability density function everywhere, including where no data are observed, from the existing scatter points. A kernel function is created with the datum at its center; this ensures that the kernel is symmetric about the datum. Kernel density estimation smooths the contribution of each data point to give an overall picture of the density of the data.
Density Calculation Method
Specify a method to calculate the kernel density for defined xy grids.
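Density values are calculated based on the equation below:
$$\mathrm{Density}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2\pi w_x w_y} \exp\left(-\frac{(x - vX_i)^2}{2 w_x^2} - \frac{(y - vY_i)^2}{2 w_y^2}\right)$$
where n is the number of elements in vector vX or vY, $vX_i$ is the ith element in vector vX, $vY_i$ is the ith element in vector vY, and $w_x$ and $w_y$ are the optimal bandwidth values.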
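As a minimal Python sketch of exact estimation, the function below evaluates a kernel density at a single grid location. It assumes a Gaussian product kernel with bandwidths wx and wy, which may not match the actual kernel used by the Ks2density equation. The function name `exact_density` is hypothetical, and vx and vy stand for the vectors vX and vY.

```python
import math


def exact_density(x, y, vx, vy, wx, wy):
    """Kernel density at (x, y) from samples (vx[i], vy[i]).

    Assumes a Gaussian product kernel with bandwidths wx and wy.
    """
    n = len(vx)
    norm = 1.0 / (2.0 * math.pi * wx * wy)
    total = 0.0
    for xi, yi in zip(vx, vy):
        total += math.exp(-((x - xi) ** 2) / (2.0 * wx ** 2)
                          - ((y - yi) ** 2) / (2.0 * wy ** 2))
    return norm * total / n


# Every grid location loops over all n samples, which is why exact
# estimation becomes expensive for large datasets.
vx, vy = [0.0, 1.0, 1.5], [0.0, 0.5, 2.0]
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
grid = [[exact_density(x, y, vx, vy, 0.5, 0.5) for x in xs] for y in ys]
```

The cost is proportional to the number of samples times the number of grid points.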
Binned Approximate Estimation
Speeds up the density calculation by approximating the exact estimation of the 2D kernel density.
First, 2D binning is performed on the (x, y) points to obtain a matrix of bin counts. Then a 2D Fast Fourier Transform is used to perform the discrete convolutions that yield the density value at each grid point.
The 4th root of the density values is calculated to map the density scale to the color scale.
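The binning, FFT convolution, and 4th-root steps can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the routine used by the dialog: it assumes a Gaussian kernel evaluated at bin-center offsets, uses zero padding to obtain a linear (non-circular) convolution, and the names `binned_density` and `to_color_scale` are hypothetical.

```python
import numpy as np


def binned_density(vx, vy, wx, wy, x_range, y_range, nx, ny):
    """Approximate 2D kernel density from bin counts via FFT convolution."""
    vx = np.asarray(vx, dtype=float)
    vy = np.asarray(vy, dtype=float)

    # Step 1: 2D binning; counts has shape (nx, ny) with x on the first axis.
    counts, _, _ = np.histogram2d(vx, vy, bins=[nx, ny],
                                  range=[x_range, y_range])
    dx = (x_range[1] - x_range[0]) / nx
    dy = (y_range[1] - y_range[0]) / ny

    # Step 2: Gaussian kernel (an assumption) sampled at every possible
    # offset between two bin centers.
    kx = np.arange(-(nx - 1), nx) * dx
    ky = np.arange(-(ny - 1), ny) * dy
    kernel = np.exp(-kx[:, None] ** 2 / (2.0 * wx ** 2)
                    - ky[None, :] ** 2 / (2.0 * wy ** 2))
    kernel /= 2.0 * np.pi * wx * wy

    # Step 3: zero-padded FFTs give the linear convolution of counts
    # with the kernel.
    shape = (counts.shape[0] + kernel.shape[0] - 1,
             counts.shape[1] + kernel.shape[1] - 1)
    full = np.fft.irfft2(np.fft.rfft2(counts, s=shape)
                         * np.fft.rfft2(kernel, s=shape), s=shape)

    # Keep the part of the result aligned with the bin centers.
    return full[nx - 1:2 * nx - 1, ny - 1:2 * ny - 1] / len(vx)


def to_color_scale(density):
    """4th root of the density; tiny negative FFT round-off is clipped to 0."""
    return np.clip(density, 0.0, None) ** 0.25


density = binned_density([0.5], [0.5], 0.2, 0.2, (0.0, 1.0), (0.0, 1.0), 5, 5)
colors = to_color_scale(density)
```

After binning, the cost depends on the grid size rather than on the number of source points, which is the reason the approximation suits large samples.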
Bivariate Kernel Density Estimator
Calculates the bandwidth based on a linear diffusion process.
Rule of Thumb
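The bandwidths wx and wy can be estimated simply by:
$$w_x = \sigma_x \left(\frac{1}{n}\right)^{1/6}, \qquad w_y = \sigma_y \left(\frac{1}{n}\right)^{1/6}$$
where n is the size of vector vX or vY, $\sigma_x$ is the sample standard deviation of dataset vX, and $\sigma_y$ is the sample standard deviation of dataset vY.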
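A rule-of-thumb bandwidth of this kind can be sketched in a few lines of Python. The sketch assumes the common bivariate form in which each bandwidth is the sample standard deviation scaled by n^(-1/6); whether the dialog uses exactly this factor is an assumption, and the function name `rule_of_thumb_bandwidths` is hypothetical.

```python
import statistics


def rule_of_thumb_bandwidths(vx, vy):
    """Return (wx, wy) as sample standard deviation times n**(-1/6)."""
    n = len(vx)
    factor = n ** (-1.0 / 6.0)
    # statistics.stdev uses the n - 1 (sample) denominator.
    return statistics.stdev(vx) * factor, statistics.stdev(vy) * factor


vx = [float(k) for k in range(64)]
vy = [2.0 * k for k in range(64)]
wx, wy = rule_of_thumb_bandwidths(vx, vy)
```

With 64 points the scaling factor is 0.5, so each bandwidth is half the sample standard deviation of its column.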
Interpolate Density Points
Specify the calculation method to decide which points to superimpose on the density image.
If the option is selected, the kernel density of each point is calculated by interpolation on the density matrix for the defined XY grid. If the source data set is very large, selecting this option can greatly improve speed.
If the option is not selected, the density values will be calculated by the Exact Estimation method.
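The interpolation option can be illustrated with the following Python sketch, which looks up a point's density from an already computed grid and then keeps the N lowest-density points for display. Bilinear interpolation is an assumption here, since the interpolation scheme is not stated, and both function names are hypothetical.

```python
def interpolate_density(x, y, xs, ys, grid):
    """Bilinear interpolation of grid[j][i], the density at (xs[i], ys[j]).

    xs and ys are equally spaced; points outside the grid are clamped
    to its edge.
    """
    def locate(v, axis):
        v = min(max(v, axis[0]), axis[-1])
        step = axis[1] - axis[0]
        k = min(int((v - axis[0]) / step), len(axis) - 2)
        return k, (v - axis[k]) / step

    i, fx = locate(x, xs)
    j, fy = locate(y, ys)
    low = grid[j][i] * (1 - fx) + grid[j][i + 1] * fx
    high = grid[j + 1][i] * (1 - fx) + grid[j + 1][i + 1] * fx
    return low * (1 - fy) + high * fy


def lowest_density_points(points, xs, ys, grid, n_display):
    """Return the n_display points with the lowest interpolated density."""
    densities = [interpolate_density(px, py, xs, ys, grid)
                 for px, py in points]
    order = sorted(range(len(points)), key=lambda k: densities[k])
    return [points[k] for k in order[:n_display]]


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0]
grid = [[0.0, 1.0, 2.0],
        [2.0, 3.0, 4.0]]
shown = lowest_density_points([(2.0, 1.0), (0.0, 0.0), (0.5, 0.5)],
                              xs, ys, grid, 2)
```

Each lookup touches only four grid cells, so the cost per point does not depend on the size of the source data.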