5.6.2 Cluster Analysis


We will perform cluster analysis on the mean temperatures of US cities over a 3-year period.

The starting point is a hierarchical cluster analysis with randomly selected data in order to find the best method for clustering. K-means analysis, a quick cluster method, is then performed on the entire original dataset.

Minimum Origin Version Required: OriginPro 8.6 SR0

Hierarchical Cluster Analysis

  1. Start with a new project or a new workbook. Import the data file \Samples\Graphing\US Mean Temperature.dat.
  2. Highlight Column D through Column O.
  3. Select Statistics: Multivariate Analysis: Hierarchical Cluster Analysis.
  4. On the Input tab, click the triangle button next to Variables, and then click Select Columns... in the context menu.
    Cluster ex2 hcluster dialog1.png
  5. In the lower panel of the Column Browser dialog, click the ... button. Set the data range from 1 to 100. Click OK.
    Cluster ex2 col browser.png
  6. In the dialog, go to the Settings tab, and make sure Cluster is set to Observations and Number of Clusters is 1. Select Furthest Neighbor for Cluster Method and then click OK.
    Hcluster ex2 dialog1.png
  7. Go to the Cluster 1 sheet. Based on the resulting dendrogram, we choose to cluster data into 5 groups.
  8. Click the lock icon in the dendrogram or the result tree, and then click Change Parameters in the context menu.
  9. Set Number of Clusters to 5 in the Settings tab and then select the Cluster Center check box in the Quantities tab. Click OK.
    Cluster ex2 hcluster dialog.png
    Cluster ex2 hcluster dialog01.png
  10. In the resulting dendrogram, we can clearly see how observations are clustered. (Note: you can double-click the dendrogram to open and customize it.)
    Hcluster ex2 dendrogram.png
  11. Due to the large number of observations, tick labels overlap in this dendrogram. Use the Scale In tool to select an area to magnify.
    Dendrogram zoom1.PNG
Note that beginning with Origin 2019b, the Plot tab includes a radio button for displaying Similarity on the Y axis of the dendrogram (Distance is still the default).
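
For readers who want to see what the Furthest Neighbor method does outside the dialog, the following is a minimal Python sketch of furthest-neighbor (complete-linkage) clustering followed by a cluster-center calculation. It is not Origin's implementation: the function names `complete_linkage` and `cluster_centers` are hypothetical, Euclidean distance is an assumption, and the toy observations are made up rather than taken from US Mean Temperature.dat.

```python
import math


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with furthest-neighbor (complete) linkage.

    Returns a list of clusters, each a sorted list of row indices.
    """
    # Start with every observation in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Furthest neighbor: distance between the two most distant members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest furthest-neighbor distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        # Merge the closest pair and drop the absorbed cluster.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


def cluster_centers(points, clusters):
    """Mean of each variable over the members of each cluster."""
    centers = []
    for members in clusters:
        columns = zip(*(points[i] for i in members))
        centers.append([sum(col) / len(members) for col in columns])
    return centers


# Made-up toy observations (two variables each), not the tutorial's data.
toy = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (20.0, 1.0)]
groups = complete_linkage(toy, 3)
centers = cluster_centers(toy, groups)
print(groups)
print(centers)
```

The sketch stops merging once the requested number of clusters remains, which mirrors choosing Number of Clusters after inspecting the dendrogram. The centers it returns play the same role as the Cluster Center table requested in the Quantities tab.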

Analyzing Original Data with K-Means Cluster

  1. Right-click on Cluster Center and select Create Copy as New Sheet in the context menu. We are going to use the newly created Cluster Center as the Initial Cluster Centers in our k-means cluster analysis.
    Cluster ex2 cluster center.png
  2. Go back to the worksheet with the source data (US Mean Temperature), and highlight col(D) through col(O). Select Statistics: Multivariate Analysis: K-Means Cluster Analysis.
  3. Select the Specify Initial Cluster Centers check box in the Options tab. Click the interactive button next to Initial Cluster Centers. The dialog will "roll up".
  4. Go to Cluster Center and highlight Col(D) through Col(O). Click the button on the rolled-up dialog to restore the dialog.
  5. In the Plot tab, select Group Graph. Click the interactive button next to X Range. The dialog will "roll up". Go back to the source worksheet US Mean Temperature, and highlight Col(B):Longitude. Click the button on the rolled-up dialog to restore the dialog.
  6. Click the triangle button next to Y Range, and then select C(Y), Latitude. Click OK.
    Kmeans ex2 dialog.png
  7. Activate the worksheet K-Means Plot Data1. Observe that data has been clustered into 5 groups corresponding to the latitudes of the cities.
    Group graph.png

You can also choose the output destination of the Cluster Membership column, for example next to the input data, for further operations if needed.

Cluster Membership.png
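
The idea of seeding k-means with the cluster centers from the hierarchical analysis can be sketched in Python as follows. This is an illustration under stated assumptions, not Origin's algorithm: it uses the standard Lloyd iteration with Euclidean distance, the function name `kmeans` and its `max_iter` parameter are hypothetical, and the toy points and initial centers are made up.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means iteration starting from the given centers.

    Returns (membership, centers), where membership[i] is the 0-based
    index of the cluster assigned to points[i].
    """
    centers = [list(c) for c in initial_centers]
    membership = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest center.
        new_membership = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_membership == membership:
            break  # assignments are stable, so the iteration has converged
        membership = new_membership
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, m in zip(points, membership) if m == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return membership, centers


# Made-up toy observations and starting centers, not the tutorial's data.
toy = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
start = [(1.0, 1.5), (8.5, 8.0)]
membership, centers = kmeans(toy, start)
print(membership)
print(centers)
```

The returned `membership` list corresponds to the Cluster Membership column: one cluster index per observation, which can be stored next to the input data for further analysis. Starting from sensible centers typically reduces the number of iterations compared with random initialization.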