5.6.2 Cluster Analysis
We will perform cluster analysis on the mean temperatures of US cities over a 3-year period.
The starting point is a hierarchical cluster analysis with randomly selected data in order to find the best method for clustering. K-means analysis, a quick cluster method, is then performed on the entire original dataset.
Minimum Origin Version Required: Updated Origin 2020
Hierarchical Cluster Analysis
- Start with a new project or a new workbook. Import the data file \Samples\Graphing\US Mean Temperature.dat.
- Highlight Column D through Column O.
- Select Statistics: Multivariate Analysis: Hierarchical Cluster Analysis to open the dialog.
- On the Input tab, click the triangle button next to Variables, and then click Select Columns... in the context menu.
- In the lower panel of the Column Browser dialog, click the ... button. Set the data range from 1 to 100. Click OK.
- Click the Settings tab, set Cluster to Observations and Number of Clusters to 1. For Cluster Method, select Furthest Neighbor, and then click OK.
- Go to the Cluster 1 sheet. After examining the resulting dendrogram, we choose to cluster data into 5 groups.
- Click the lock icon in the dendrogram or the result tree, and then click Change Parameters in the context menu.
- Set Number of Clusters to 5 in the Settings tab and then select the Cluster Center check box in the Quantities tab. Click OK.
- The resulting dendrogram shows clearly how the observations are clustered. Note that you can double-click the embedded dendrogram in the report sheet to open it in its own window. There you can customize the dendrogram (for instance, by adding text labels or arrows), then click the Close button in the upper-right corner of the graph window to copy the changes back to the embedded graph in the report sheet.
- To focus on a particular subtree, click a node to select it, then right-click and choose Duplicate Branch to New Window. This opens the selected subtree in a new graph window.
- Beginning with Origin 2019b, the Plot tab of the hcluster dialog has a radio button for displaying Similarity on the Y axis of the dendrogram (Distance is still the default).
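To make the Furthest Neighbor setting concrete, here is a minimal pure-Python sketch of agglomerative clustering with complete (furthest-neighbor) linkage. It assumes Euclidean distance between observations; the distance measure and the helper names `complete_linkage` and `cluster_centers` are illustrative assumptions, not Origin's implementation. Stopping the merges once `n_clusters` groups remain gives the same partition as cutting the full dendrogram at that number of clusters, and the centers are simply the per-variable means of each cluster's members, which is what the Cluster Center output reports.

```python
import math

def euclidean(a, b):
    # Assumed distance measure between two observations.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(points, n_clusters):
    """Agglomerative clustering with furthest-neighbor (complete) linkage.
    Returns a sorted list of clusters, each a sorted list of row indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Furthest neighbor: the distance between two clusters is the
        # largest pairwise distance between their members.
        return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters into one.
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)

def cluster_centers(points, clusters):
    """Mean of each variable over the members of each cluster."""
    dim = len(points[0])
    return [[sum(points[i][d] for i in c) / len(c) for d in range(dim)]
            for c in clusters]

# Toy data standing in for rows of the temperature columns (hypothetical).
points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
clusters = complete_linkage(points, 3)
centers = cluster_centers(points, clusters)
print(clusters)  # [[0, 1], [2, 3], [4]]
print(centers)   # [[0.0, 0.5], [10.0, 10.5], [20.0, 0.0]]
```

The brute-force search is O(n³) or worse, which is fine for a 100-row subset but is one reason the tutorial switches to k-means for the full dataset.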
Analyzing Original Data with K-Means Cluster
- Right-click on Cluster Center and select Create Copy as New Sheet in the context menu. We are going to use the newly created Cluster Center as the Initial Cluster Centers in our k-means cluster analysis.
- Go back to the worksheet with the source data (US Mean Temperature), and highlight col(D) through col(O). Select Statistics: Multivariate Analysis: K-Means Cluster Analysis.
- Select the Specify Initial Cluster Centers check box in the Options tab. Click the interactive button next to Initial Cluster Centers. The dialog will "roll up".
- Go to Cluster Center and highlight Col(D) through Col(O). Click the button on the rolled-up dialog to restore the dialog.
- In the Plot tab, select Group Graph. Click the interactive button next to X Range. The dialog will "roll up". Go back to the source worksheet US Mean Temperature and highlight Col(B): Longitude. Click the button on the rolled-up dialog to restore it.
- Click the triangle button next to Y Range, and then select C(Y), Latitude. Click OK.
- Activate the worksheet K-Means Plot Data1. Observe that the data have been clustered into 5 groups corresponding to the latitudes of the cities.
You can also choose the output destination of the Cluster Membership column (for example, next to the input data) if you need it for further operations.
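The k-means step can be sketched in a few lines of Python to show what specifying initial cluster centers does. This is a minimal sketch of Lloyd's algorithm under stated assumptions (Euclidean distance, a hypothetical `kmeans` helper, a fixed iteration cap); Origin's own K-Means Cluster Analysis may use a different update scheme, so results on real data need not match it exactly. The `initial_centers` argument plays the role of the copied Cluster Center sheet, and the returned 1-based `membership` list corresponds to the Cluster Membership column.

```python
import math

def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means starting from user-supplied initial centers.
    Returns (membership, centers); membership[i] is a 1-based cluster number."""
    centers = [list(c) for c in initial_centers]
    membership = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest center.
        new_membership = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_membership == membership:
            break  # assignments stopped changing: converged
        membership = new_membership
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, m in zip(points, membership) if m == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [m + 1 for m in membership], centers

# Hypothetical data and starting centers (e.g. from a hierarchical run).
points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
start = [[0, 0.5], [10, 10.5], [20, 0]]
membership, centers = kmeans(points, start)
print(membership)  # [1, 1, 2, 2, 3, 3]
print(centers)     # [[0.0, 0.5], [10.0, 10.5], [20.5, 0.0]]
```

Seeding k-means with centers from the hierarchical run, as the tutorial does, avoids the sensitivity of k-means to arbitrary starting points while keeping the fast iterative method for the full dataset.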