2.75 FAQ-820 How to work with large datasets in Origin?

Last Update: 7/13/2018


Data Size Limits

Data in Origin can be contained in workbooks and matrices. Each workbook can contain up to 1024 worksheets. Within a given worksheet, the number of columns can be up to 65500, and the number of rows can be up to 90 million (64-bit OS). In practical terms, the maximum limits may be lower depending on your available system resources.

A matrix window can contain up to 1024 matrix sheets. Each sheet can contain up to 90 million matrix columns (1 row) or 90 million rows (1 column). Again, the maximum limits may be lower depending on your available system resources.
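To see why available system resources, rather than the nominal limits, are usually the real constraint, a rough estimate of raw storage can help. The sketch below is plain Python, not an Origin API. It assumes 8 bytes per cell (one double-precision value) and ignores all overhead; the function name and the cell size are assumptions made for illustration.

```python
BYTES_PER_CELL = 8  # assumption: one double-precision value per cell


def estimate_bytes(rows, cols=1, bytes_per_cell=BYTES_PER_CELL):
    """Raw storage for a rows x cols block of numeric cells, ignoring overhead."""
    return rows * cols * bytes_per_cell


# One column filled to the 90 million row limit quoted above
one_full_column = estimate_bytes(90_000_000)
print(f"{one_full_column / 1e6:.0f} MB per full-length column")  # 720 MB

# Ten such columns already need several gigabytes of raw storage
ten_full_columns = estimate_bytes(90_000_000, cols=10)
print(f"{ten_full_columns / 1e9:.1f} GB for ten full-length columns")  # 7.2 GB
```

Under these assumptions, only a handful of full-length columns fit in the memory of a typical workstation.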

Sampling Interval Support

Origin worksheet columns support a Sampling Interval property. If the X values associated with a Y dataset are evenly spaced, that information can be stored as the Sampling Interval of the Y column. Because the X values no longer need an explicit column in the worksheet, the data stored for a single X/Y pair can shrink by up to 50%. Plotting and analysis of large datasets also become faster, because the X information does not have to be read point by point from a worksheet column.

[Figure: Sampling Interval in the Large Datasets]
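The idea behind a sampling interval can be sketched in a few lines of plain Python. This is an illustration of the concept, not Origin's implementation: evenly spaced X values are fully described by a start value and an increment, so they can be computed on demand instead of stored. The names `x0`, `dx`, and `x_from_sampling` are hypothetical.

```python
def x_from_sampling(index, x0, dx):
    """Implied X value of the point at a 0-based index."""
    return x0 + index * dx


# Hypothetical signal: 5 samples starting at x = 0.0 with spacing 0.25
y = [1.2, 1.9, 2.4, 2.2, 1.7]
x0, dx = 0.0, 0.25

# X values are rebuilt only when needed
x = [x_from_sampling(i, x0, dx) for i in range(len(y))]
print(x)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Stored numbers: explicit X column versus start value plus increment
explicit_values = 2 * len(y)  # every X and every Y
implicit_values = len(y) + 2  # every Y, plus x0 and dx
print(explicit_values, implicit_values)  # 10 7
```

For long columns the two extra numbers are negligible, which is where the reduction of roughly one half for an X/Y pair comes from.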

Graphing Large Datasets

By default, Origin skips points when plotting large datasets. This is referred to as Speed Mode. For plotting matrix data in Speed Mode, fixed increments in both the X and Y dimensions are used to skip points. For worksheet data, a more sophisticated Speed Mode mechanism examines the nature of the data and selects a subset of points that represents the overall data shape. In the Speed Mode dialog (opened from the menu Graph: Speed Mode), you can select a Speed Mode option of Low, Medium, High, or Custom. The setting can also be saved as part of the graph template, or as a theme for use with other graphs.

[Figure: Large Datasets, Speed Mode, line plot]
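The two kinds of point skipping described above can be illustrated in plain Python. The first function is a direct sketch of fixed-increment skipping in both dimensions. The second is a stand-in only: Origin's worksheet algorithm is not documented here, so a simple min/max-per-bucket decimation is used to show what "a subset that represents the overall shape" can mean. All function names are hypothetical.

```python
def skip_matrix(matrix, step_x, step_y):
    """Keep every step_y-th row and every step_x-th column."""
    return [row[::step_x] for row in matrix[::step_y]]


def minmax_decimate(y, bucket_size):
    """Return indices of the min and max point in each bucket, in order.

    Assumption: a generic shape-preserving reduction, not Origin's method.
    """
    kept = []
    for start in range(0, len(y), bucket_size):
        bucket = range(start, min(start + bucket_size, len(y)))
        lo = min(bucket, key=lambda i: y[i])
        hi = max(bucket, key=lambda i: y[i])
        kept.extend(sorted({lo, hi}))
    return kept


# Matrix data: 4 rows x 6 columns, keep every 2nd row and column
matrix = [[r * 10 + c for c in range(6)] for r in range(4)]
print(skip_matrix(matrix, step_x=2, step_y=2))  # [[0, 2, 4], [20, 22, 24]]

# Worksheet-like data: the spike (9) and the dip (-7) both survive
y = [0, 1, 0, 9, 0, 1, 0, -7, 0, 1]
idx = minmax_decimate(y, bucket_size=5)
print(idx, [y[i] for i in idx])  # [0, 3, 5, 7] [0, 9, 1, -7]
```

Plain fixed-increment skipping with a step of 5 would have kept indices 0 and 5 and missed both extremes, which is why a shape-aware selection is preferable for curve data.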

Importing Large Datasets

Many of Origin's import routines support partial importing, allowing you, for example, to import 5 rows, skip the next 20, import the next 5, and so on. A partial import of a large dataset lets you quickly examine the nature of the data and try various plotting and analysis routines on the data subset rather than on the entire dataset. Once your graphs and analyses have been optimized, you can use Origin's Re-Import feature to re-import the file in its entirety. Analysis results set to automatically recalculate, as well as any graphs you have created, will automatically update using the full dataset.

[Figures: impASC dialog box with Skip; Reimport and Analysis Template]
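The "import 5 rows, skip the next 20" pattern can be sketched outside Origin in plain Python. This is not Origin's import routine; `read_partial` and its parameters are hypothetical, and the in-memory text stands in for a large ASCII file without header lines.

```python
import io


def read_partial(lines, keep=5, skip=20):
    """Yield `keep` lines, skip the next `skip` lines, and repeat."""
    period = keep + skip
    for i, line in enumerate(lines):
        if i % period < keep:
            yield line.rstrip("\n")


# Stand-in for a large ASCII file: 60 rows containing "0" .. "59"
fake_file = io.StringIO("".join(f"{n}\n" for n in range(60)))

rows = list(read_partial(fake_file, keep=5, skip=20))
print(len(rows))  # 15
print(rows)       # rows 0-4, 25-29 and 50-54
```

Because the function consumes the file lazily, only the kept rows are held in memory, which is the point of a partial import.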

Analyzing Large Datasets

In addition to the ability to perform a partial import, Origin has flexible tools for graphically selecting a range of your data. These Region of Interest (ROI) tools make it possible to perform calculations, data processing, data analysis, etc. on a subset of your already imported data. There are several tools for selecting data:

Data Selector, data markers

Graphically define a range for analysis using a beginning and ending data marker (Data Selector button on the Tools toolbar).

Regional Data Selector

Graphically define one or more ranges on one or more curves for analysis. The Regional Data Selector tools on the Tools toolbar let you select either only the active curve (Selection on Active Plot button) or all curves inside the region (Selection on All Plots button). When making your selection, you can use a rectangular window or a freeform shape. Analysis markers define the selected ranges.

[Figures: Data Selector in Graph; Fitting of partial range selected]
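Conceptually, both selection tools reduce to picking a subset of points before running an analysis. The plain-Python sketch below mirrors that idea and is not Origin's API: one function takes a begin and end index (like a pair of data markers), the other keeps the points inside a rectangle. Function names and sample data are hypothetical.

```python
def select_between_markers(x, y, begin, end):
    """Points from index `begin` to `end`, inclusive, like two data markers."""
    return x[begin:end + 1], y[begin:end + 1]


def select_in_rectangle(x, y, x_min, x_max, y_min, y_max):
    """Indices of the points that fall inside an axis-aligned rectangle."""
    return [
        i for i, (xi, yi) in enumerate(zip(x, y))
        if x_min <= xi <= x_max and y_min <= yi <= y_max
    ]


x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [5, 1, 4, 8, 2, 9, 3, 7]

# Range between two markers placed at indices 2 and 5
sub_x, sub_y = select_between_markers(x, y, 2, 5)
print(sub_x, sub_y)  # [2, 3, 4, 5] [4, 8, 2, 9]

# Rectangular region: 1 <= x <= 6 and 2 <= y <= 8
inside = select_in_rectangle(x, y, 1, 6, 2, 8)
print(inside)  # [2, 3, 4, 6]
```

Note that a rectangular region can yield non-contiguous points (index 5 is excluded here), whereas a marker pair always yields one contiguous range.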

Reduce Rows or Columns

You can reduce data before performing analysis or graphing. Please refer to this blog post for details.
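As a rough illustration of what row reduction can look like, the plain-Python sketch below shows two generic strategies: keeping every N-th row, and replacing each group of N rows with its mean. These are assumptions chosen for illustration, not a description of the specific methods Origin offers; the function names are hypothetical.

```python
def reduce_by_skipping(values, n):
    """Keep every n-th value, starting with the first."""
    return values[::n]


def reduce_by_average(values, n):
    """Replace each run of n consecutive values with its mean.

    The last group may be shorter than n.
    """
    reduced = []
    for start in range(0, len(values), n):
        group = values[start:start + n]
        reduced.append(sum(group) / len(group))
    return reduced


data = [2, 4, 6, 8, 10, 12, 14]

print(reduce_by_skipping(data, 3))  # [2, 8, 14]
print(reduce_by_average(data, 3))   # [4.0, 10.0, 14.0]
```

Skipping is cheaper but discards information; averaging keeps a contribution from every row at the cost of smoothing out sharp features.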

Batch Process Multiple Data Files

Origin provides batch processing tools for analyzing or plotting multiple files. Please refer to this blog post and these tutorials for details.
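The general shape of a batch job is the same analysis applied to each file in turn, with the results collected in one place. The plain-Python sketch below shows that pattern only; it does not use Origin's batch processing tools. The file names, the single-column CSV layout, and the function names are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def summarize_file(path):
    """Return (row count, mean of the first column) for one CSV file."""
    with open(path, newline="") as f:
        values = [float(row[0]) for row in csv.reader(f) if row]
    return len(values), sum(values) / len(values)


def batch_summarize(folder, pattern="*.csv"):
    """Apply the same analysis to every matching file in a folder."""
    return {p.name: summarize_file(p) for p in sorted(Path(folder).glob(pattern))}


# Two small stand-in data files, written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "run1.csv").write_text("1\n2\n3\n")
    Path(tmp, "run2.csv").write_text("10\n20\n")
    results = batch_summarize(tmp)

print(results)  # {'run1.csv': (3, 2.0), 'run2.csv': (2, 15.0)}
```

Keeping the per-file analysis in its own function makes it easy to test on one file first, then run unchanged over the whole set.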



Keywords: large dataset, limitation, sampling interval, speed mode, skip rows, reduce, batch processing, regional selector, region