GMDH Shell Documentation

Concepts and Features

GMDH Shell is a Windows application. Its GUI consists of a host application and plug-ins that draw their panels inside the host window. Plug-in panels can be placed in two areas: the tabs and the sidebar.

GMDH Shell plug-ins are linked in a chain that can be executed by clicking the Start button or from the command line.

Project folders and Templates

When you click the Start button, GMDH Shell saves all modified settings (several files) to the folder where the dataset is located. At program start-up, plug-ins try to read their settings from this folder; if some configuration files are not found, they fall back to the default settings located in the program installation directory.

A project folder is therefore a folder that contains data sources and their settings. Settings inside a project folder apply only to datasets stored in that folder.
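The lookup order described above (project folder first, installation directory as fallback) can be illustrated with a minimal Python sketch. This is not GMDH Shell code: the function name, the file names and the directory arguments are all hypothetical, and the actual settings files used by the program are not listed here.

```python
from pathlib import Path
from typing import Optional


def resolve_settings_file(name: str, project_dir: Path, install_dir: Path) -> Optional[Path]:
    """Return the project copy of a settings file, else the installed default.

    Hypothetical helper: it only mirrors the documented lookup order.
    """
    for folder in (project_dir, install_dir):  # project folder has priority
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None  # neither folder contains the file
```

A call such as `resolve_settings_file("example.cfg", Path("data"), Path("install"))` would return the copy in `data` when it exists and the installed default otherwise.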

Task-specific project settings, called Templates, can be loaded using Menu > File > Load template.

Dataset

GMDH Shell can read a dataset from CSV (text) and XLS files composed of columns and rows. In the program GUI, columns and rows are usually referred to as variables and observations. For classification problems they are called variables and instances.

GMDH Shell uses some of the dataset variables as model inputs and others (one or more variables) as prediction targets. Multivariate datasets consist of two or more variables, while univariate datasets consist of only one variable.
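To make the terminology concrete, the following Python sketch splits a small CSV dataset into input variables and target variables. It is only an illustration: it assumes a header row with variable names and purely numeric cells, and the function name `split_dataset` is hypothetical rather than part of GMDH Shell.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple

Columns = Dict[str, List[float]]


def split_dataset(csv_text: str, targets: Sequence[str]) -> Tuple[Columns, Columns]:
    """Return (inputs, targets), each mapping a variable name to its values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns: Columns = {name: [] for name in reader.fieldnames or []}
    for row in reader:                   # each row is one observation
        for name, value in row.items():  # each column is one variable
            columns[name].append(float(value))
    missing = [t for t in targets if t not in columns]
    if missing:
        raise KeyError(f"unknown target variables: {missing}")
    inputs = {n: v for n, v in columns.items() if n not in targets}
    outputs = {n: columns[n] for n in targets}
    return inputs, outputs
```

With three columns and one target, the dataset is multivariate: two variables serve as inputs and one as the prediction target.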

Problem types

To solve real-world predictive analytics problems, we should formulate them in terms of standard problem types. GMDH Shell can produce categorical and continuous value predictions, which allow it to solve classification and regression problems respectively. GMDH Shell also provides sophisticated tools for Time series forecasting, which is a special type of continuous value prediction. Other popular tasks that GMDH Shell can help with are Feature ranking, Function finding and Curve fitting.

Time series forecasting

Time series are time-ordered datasets (univariate or multivariate). The most recent part of a time series is usually more important for model training than older historical observations. GMDH Shell has a special Time series preprocessor for proper management of ordered observations. The Time series preprocessor allows the core algorithms to learn from a window of the latest data. Another useful feature of the Time series preprocessor is the ability to launch iterative step-back simulations to evaluate the accuracy of the method.
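The idea of learning from a window of the latest data and stepping back through the series to measure accuracy can be sketched in Python as follows. This is an assumption-laden illustration, not the preprocessor's implementation: the placeholder model simply predicts the mean of the window, and the names `step_back_errors` and `window_mean_forecast` are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence


def window_mean_forecast(history: Sequence[float]) -> float:
    """Placeholder model: predict the mean of the training window."""
    return mean(history)


def step_back_errors(
    series: Sequence[float],
    window: int,
    steps_back: int,
    fit_predict: Callable[[Sequence[float]], float] = window_mean_forecast,
) -> List[float]:
    """Absolute one-step-ahead errors for the last `steps_back` observations."""
    if window < 1 or steps_back < 1 or window + steps_back > len(series):
        raise ValueError("not enough observations for this window and step count")
    errors = []
    for back in range(steps_back, 0, -1):
        cut = len(series) - back            # index of the hidden observation
        train = series[cut - window:cut]    # window of the latest known data
        errors.append(abs(fit_predict(train) - series[cut]))
    return errors
```

Each iteration hides one more recent observation, trains only on the window that precedes it, and compares the forecast with the hidden value, so every error is measured out of sample.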

Classification

Classification is the prediction of the category of an unknown instance. GMDH Shell has a special Classification & Regression preprocessor that supports two-class and multi-class classification. GMDH Shell requires all text data to be encoded as numbers. Target variables with more than two categories can either be encoded and decomposed into binary variables or simply encoded as numbers.
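The two encoding options mentioned above can be illustrated with a short Python sketch. The way GMDH Shell assigns codes is not specified here, so the ordering used below (alphabetical, starting at 0) is an assumption, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def encode_labels(labels: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer code (alphabetical order, from 0)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes


def binary_decompose(labels: Sequence[str]) -> Dict[str, List[int]]:
    """One 0/1 column per category: 1 where the row belongs to that category."""
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in sorted(set(labels))
    }
```

Plain numeric encoding keeps a single target variable but implies an order between categories, while binary decomposition produces one two-class target per category.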

Regression

Regression is the prediction of continuous values. Unknown points of the target variable can reside at the end of the dataset or appear later, when the model is applied to new data.

Results

As a result of processing, GMDH Shell returns a set of predictive models and their predictions. By default, the visualization panels show the best model for the first target variable. Other models for the same target can be viewed in the Model browser panel. GMDH Shell also calculates the importance of each variable and the model performance on the known part of the modeled dataset.

Feature list

Solving modeling problems:

Multivariate time series forecasting

Regression (continuous value prediction)

Classification (prediction of a category)

Ranking and selection of variables

Polynomial curve fitting

Modeling simulation outputs the following results:

A set of models that can be exported to Excel

Predictions

Importance of input variables

Analysis of out-of-sample model accuracy

Predictive modeling work-flow:

Create a model

Save the model

Export the model's formula to Excel (deploy a model)

Load a model from a save-file

Apply the model to unknown instances within the analyzed file

Apply the model to a new data-file (scoring)

Learning algorithms:

GMDH-type neural networks

Combinatorial GMDH
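This page only names the learning algorithms. As a rough orientation, the general idea usually associated with combinatorial GMDH is an exhaustive search over candidate model structures, where coefficients are fitted on one part of the data and the structure is selected by its error on a separate part. The Python sketch below illustrates that idea for linear models only; it is a simplified assumption and not GMDH Shell's implementation, and all function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(ata, aty)


def combinatorial_search(
    x_train: Sequence[Sequence[float]], y_train: Sequence[float],
    x_valid: Sequence[Sequence[float]], y_valid: Sequence[float],
) -> Optional[Tuple[float, Tuple[int, ...], List[float]]]:
    """Try every non-empty subset of inputs; keep the lowest validation MSE.

    Returns (mse, input_indices, weights) where weights[0] is the intercept.
    """
    n_inputs = len(x_train[0])
    best = None
    for size in range(1, n_inputs + 1):
        for subset in combinations(range(n_inputs), size):
            design = [[1.0] + [row[i] for i in subset] for row in x_train]
            try:
                weights = fit_least_squares(design, y_train)
            except ValueError:
                continue  # skip subsets with collinear columns
            mse = sum(
                (weights[0] + sum(w * row[i] for w, i in zip(weights[1:], subset)) - t) ** 2
                for row, t in zip(x_valid, y_valid)
            ) / len(y_valid)
            if best is None or mse < best[0]:
                best = (mse, subset, weights)
    return best
```

Selecting by validation error rather than training error is what keeps the search from always preferring the largest subset of inputs. The number of subsets grows exponentially with the number of inputs, so a sketch like this is only practical for a handful of variables.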

Embedded data exploration:

File preview

Descriptive statistics

Line charts

Bar charts

Scatter plot

Histogram

Autocorrelation chart

Pair-wise correlations with ranking

Contour plot

Heat map

3D surface

Data-file formats:

CSV (and any other text files with delimiters)

XLSX

XLS

File sets with the same extension

Data pre-processing:

Visual handling of input and output (target) variables and data transformations

Handling of missing values

Converting categorical (text) data into numeric values (encoding and binary decomposition)

Weighting of dataset rows (handling of imbalanced classification problems)

Time series preprocessing (lags, differences, moving average, incremental weighting of dataset rows)

Elementary functions (logarithmic transformation, normalization, etc.)
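Three of the time series transformations listed above (lags, differences and moving average) can be sketched in Python as follows. The definitions are the common textbook ones and are assumptions about what the corresponding GMDH Shell options compute; the function names are hypothetical, and positions without enough history are filled with `None`.

```python
from typing import List, Optional, Sequence

Column = List[Optional[float]]


def lag(series: Sequence[float], k: int) -> Column:
    """Value observed k steps earlier; the first k rows have no value."""
    if not 0 <= k <= len(series):
        raise ValueError("k must be between 0 and the series length")
    return [None] * k + list(series[:len(series) - k])


def difference(series: Sequence[float], k: int = 1) -> Column:
    """Change relative to the value k steps earlier."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and the series length")
    return [None] * k + [series[i] - series[i - k] for i in range(k, len(series))]


def moving_average(series: Sequence[float], window: int) -> Column:
    """Mean of the current value and the window - 1 values before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: Column = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out
```

Each function returns a new column of the same length as the input, so the results can be added to the dataset as extra input variables.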

Dynamic post-processing:

Average of top-ranked models

Quantization of predictions
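Both post-processing steps can be illustrated with a short Python sketch. How GMDH Shell ranks models and which levels it quantizes to are not described here, so the sketch assumes that models are ranked by a per-model error value (lower is better) and that predictions are snapped to the nearest value from a given list. The function names are hypothetical.

```python
from typing import List, Sequence


def average_top_models(
    predictions_by_model: Sequence[Sequence[float]],
    errors: Sequence[float],
    top_n: int,
) -> List[float]:
    """Average, row by row, the predictions of the top_n lowest-error models."""
    if not 1 <= top_n <= len(errors):
        raise ValueError("top_n must be between 1 and the number of models")
    ranked = sorted(range(len(errors)), key=lambda i: errors[i])[:top_n]
    n_rows = len(predictions_by_model[0])
    return [
        sum(predictions_by_model[m][r] for m in ranked) / top_n
        for r in range(n_rows)
    ]


def quantize(predictions: Sequence[float], levels: Sequence[float]) -> List[float]:
    """Snap each prediction to the nearest allowed level."""
    return [min(levels, key=lambda level: abs(level - p)) for p in predictions]
```

Quantization is useful when the target can only take certain values, for example class codes 0 and 1, while the model itself outputs continuous numbers.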

Miscellaneous:

Background execution mode via the command line

Dataset examples and project templates

One-click result recalculation for dynamically updated data files

Support for multi-core processors

Support for clustered Linux systems (Enterprise edition)
