Concepts and Features

GMDH Shell is a Windows application. Its GUI consists of a host application and plug-ins that draw their panels inside the host window. Plug-in panels can be placed in two areas: the tabs and the sidebar.

GMDH Shell plug-ins are linked in a chain that can be executed by clicking the Start button or from the command line.

Concept

Project folders and Templates

When you click the Start button, GMDH Shell saves all modified settings (several files) to the folder where the dataset is located. At program start-up, plug-ins try to read settings from this folder; if some configuration files are not found, they use the default settings located in the program installation directory.
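The fallback described above can be illustrated with a minimal Python sketch. It is not GMDH Shell code: the function name, the folder layout, and the file names are hypothetical, and the only behaviour taken from the text is "use the project copy if it exists, otherwise the default from the installation directory".

```python
from pathlib import Path
from typing import Optional


def resolve_settings_file(name: str, project_dir: Path, install_dir: Path) -> Optional[Path]:
    """Pick the settings file to load (hypothetical helper, for illustration only).

    The copy in the project folder wins; if it is missing, fall back to the
    default copy in the installation directory. Return None if neither exists.
    """
    project_copy = project_dir / name
    if project_copy.is_file():
        return project_copy
    default_copy = install_dir / name
    return default_copy if default_copy.is_file() else None
```

Under this scheme, copying a dataset to a new folder without its settings files would leave the plug-ins running with defaults, which matches the statement that settings belong to the folder they are stored in.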

A project folder is therefore a folder that contains data sources and settings. Settings inside a project folder apply only to datasets stored in that folder.

Task-specific project settings, called Templates, can be loaded using Menu > File > Load template.

Dataset

GMDH Shell can read a dataset from CSV (text) and XLS files composed of columns and rows. In the program GUI, columns and rows are usually referred to as variables and observations. For classification problems, they are called variables and instances.

GMDH Shell uses some of the dataset variables as model inputs and others (one or more variables) as prediction targets. Multivariate datasets consist of two or more variables, while univariate datasets consist of only one variable.
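To make the variables/observations vocabulary concrete, here is a small Python sketch that reads a CSV table and separates input variables from target variables. It only illustrates the idea; the column names, the sample values, and the function are invented for this example and do not reflect how GMDH Shell stores data internally.

```python
import csv
import io


def split_inputs_targets(csv_text, target_names):
    """Split each observation (row) into input variables and target variables."""
    reader = csv.DictReader(io.StringIO(csv_text))
    inputs, targets = [], []
    for row in reader:
        # Columns listed in target_names are prediction targets.
        targets.append({name: float(row[name]) for name in target_names})
        # Every other column is treated as a model input.
        inputs.append({name: float(value) for name, value in row.items()
                       if name not in target_names})
    return inputs, targets


# Hypothetical multivariate dataset: three variables, two observations.
sample = "temperature,humidity,sales\n21.5,0.40,130\n18.0,0.55,95\n"
inputs, targets = split_inputs_targets(sample, ["sales"])
```

Here `temperature` and `humidity` are inputs and `sales` is the single target; each CSV row is one observation.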

Problem types

To solve real-world predictive analytics problems, we should formulate them in terms of standard problem types. GMDH Shell can produce categorical and continuous value predictions, which allows it to solve classification and regression problems respectively. It also provides dedicated tools for Time series forecasting, which is a special type of continuous value prediction. Other popular tasks that GMDH Shell can help with are Feature ranking, Function finding and Curve fitting.

Time series forecasting

Time series are time-ordered datasets (univariate or multivariate). The most recent part of a time series is usually more important for model training than older observations. GMDH Shell has a special Time series preprocessor for proper management of ordered observations. The Time series preprocessor allows core algorithms to learn from a window of the latest data. Another useful feature of the preprocessor is the ability to launch iterative step-back simulations to evaluate the accuracy of a method.
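The two ideas in this paragraph, learning from a window of the latest data and stepping back through history to measure accuracy, can be sketched in Python as follows. This is an assumption-laden illustration: the forecaster is a plain mean of the window, used as a stand-in for a GMDH model, and the function names and sample series are hypothetical.

```python
def window_mean_forecast(history, window):
    """Stand-in forecaster: predict the next value as the mean of the latest window."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def step_back_errors(series, window, steps):
    """Step back `steps` points from the end of the series, one point at a time.

    At each step the point at `cutoff` and everything after it are hidden,
    the hidden point is forecast from the preceding data, and the absolute
    error against the known value is recorded.
    """
    if not 0 < steps < len(series):
        raise ValueError("steps must be between 1 and len(series) - 1")
    errors = []
    for back in range(steps, 0, -1):
        cutoff = len(series) - back
        forecast = window_mean_forecast(series[:cutoff], window)
        errors.append(abs(series[cutoff] - forecast))
    return errors


series = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]
errors = step_back_errors(series, window=2, steps=3)
print(errors)  # [1.5, 2.0, 1.5]
```

Averaging the recorded errors gives a simple out-of-sample accuracy estimate, because every forecast is made without access to the value it is compared against.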

Classification

Classification is the prediction of the category of an unknown instance. GMDH Shell has a special Classification & Regression preprocessor that supports two-class and multi-class classification. GMDH Shell requires all text data to be encoded with numbers. Target variables with more than two categories can either be encoded and decomposed into binary variables or simply encoded with numbers.
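The two encoding options mentioned above can be illustrated with a short Python sketch. The helper names and the alphabetical ordering of category codes are assumptions made for this example; the text does not say how GMDH Shell assigns codes.

```python
def encode_labels(values):
    """Encode text categories as integers (codes assigned in alphabetical order)."""
    categories = sorted(set(values))
    codes = {category: index for index, category in enumerate(categories)}
    return [codes[value] for value in values], codes


def binary_decompose(values):
    """Decompose a categorical variable into one 0/1 variable per category."""
    categories = sorted(set(values))
    return {category: [1 if value == category else 0 for value in values]
            for category in categories}


labels = ["red", "green", "blue", "green"]
encoded, codes = encode_labels(labels)
columns = binary_decompose(labels)
print(encoded)           # [2, 1, 0, 1]
print(columns["green"])  # [0, 1, 0, 1]
```

Plain numeric encoding keeps a single target but imposes an arbitrary order on the categories, whereas binary decomposition turns a multi-class target into several two-class targets.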

Regression

Regression is the prediction of continuous values. Unknown points of the target variable can reside at the end of the dataset or appear later, when a model is applied.

Results

As a result of processing, GMDH Shell returns a set of predictive models and their predictions. By default, the best model for the first target variable is shown in the visualization panels. Other models of the same target can be viewed using the Model browser panel. GMDH Shell also calculates the importance of each variable and the model performance for the known part of the modeled dataset.

Feature list

Solving modeling problems:
  • Multivariate time series forecasting
  • Regression (continuous value prediction)
  • Classification (prediction of a category)
  • Ranking and selection of variables
  • Polynomial curve fitting
Modeling simulation outputs the following results:
  • A set of models that can be exported to Excel
  • Predictions
  • Importance of input variables
  • Analysis of out-of-sample model accuracy
Predictive modeling work-flow:
  • Create a model
  • Save the model
  • Export the model's formula to Excel (deploy a model)
  • Load a model from a save-file
  • Apply the model to unknown instances within the analyzed file
  • Apply the model to a new data-file (scoring)
Learning algorithms:
  • GMDH-type neural networks
  • Combinatorial GMDH
Embedded data exploration:
  • File preview
  • Descriptive statistics
  • Line charts
  • Bar charts
  • Scatter plot
  • Histogram
  • Autocorrelation chart
  • Pair-wise correlations with ranking
  • Contour plot
  • Heat map
  • 3D surface
Data-file formats:
  • CSV (and any other text files with delimiters)
  • XLSX
  • XLS
  • File sets with the same extension
Data pre-processing:
  • Visual handling of input and output (target) variables and data transformations
  • Handling of missing values
  • Converting categorical (text) data into numeric values (encoding and binary decomposition)
  • Weighting of dataset rows (handling of imbalanced classification problems)
  • Time series preprocessing (lags, differences, moving average, incremental weighting of dataset rows)
  • Elementary functions (logarithmic transformation, normalization, etc.)
Dynamic post-processing:
  • Average of top-ranked models
  • Quantization of predictions
Miscellaneous:
  • Background execution mode via the command line
  • Dataset examples and project templates
  • One-click result recalculation for dynamically updated data files
  • Support for multi-core processors
  • Support for clustered Linux systems (Enterprise edition)
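The time series preprocessing items in the list above (lags, differences, moving average) are standard transformations. The following Python sketch shows one common definition of each; the function names and the use of `None` for undefined leading values are choices made for this example, not a description of GMDH Shell's output format.

```python
def lag(series, k):
    """Shift the series by k steps; the first k values are undefined (None)."""
    if not 0 <= k <= len(series):
        raise ValueError("k must be between 0 and len(series)")
    return [None] * k + series[:len(series) - k]


def difference(series):
    """First difference: the change from the previous observation."""
    return [None] + [current - previous
                     for previous, current in zip(series, series[1:])]


def moving_average(series, window):
    """Trailing moving average over the last `window` observations."""
    result = []
    for i in range(len(series)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            result.append(sum(series[i + 1 - window:i + 1]) / window)
    return result


series = [1.0, 3.0, 6.0, 10.0]
print(lag(series, 1))             # [None, 1.0, 3.0, 6.0]
print(difference(series))         # [None, 2.0, 3.0, 4.0]
print(moving_average(series, 2))  # [None, 2.0, 4.5, 8.0]
```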
CC Attribution-Noncommercial 3.0 Unported