One week after the initial release of GMDH Shell 2.0, we are pleased to introduce the updated version 2.1, which draws a clear line between the 2.x and 1.x branches.

Here is a comparison of processing times measured for GMDH Shell (GS) 1.x and 2.x. We used datasets of 100, 1'000, 3'000 and 6'500 rows and measured the single-CPU-core processing time of both GS versions using the same projects and settings.

The bar chart shows that the 2.x branch is faster by a factor of 6.3 on 6.5k data rows and at least not slower on a dataset with 100 rows. The performance gap grows rapidly for larger datasets. For example, 2.x learns from 200'000 rows in merely 37 minutes, while 1.x cannot finish the same task even within one day.
The old GS 1.x.x spends most of its processing time on validating a hypothesis about the model structure, while estimating the model coefficients takes only a small portion of the time. In GS 2.x we have implemented a recurrent procedure for calculating testing errors that makes the model validation stage very cheap in terms of processing time, even for large datasets. So the latest version is significantly faster, while its validation results exactly match those of the old validation procedure.
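The post does not describe the recurrent procedure itself. To illustrate how testing errors can be obtained cheaply and still exactly, here is a sketch built on a different, well-known technique: for a least-squares model, leave-one-out residuals follow from a single fit through the hat-matrix identity e_(-i) = e_i / (1 - h_ii), so the model never has to be refitted n times. The leave-one-out scheme, the PRESS score and all function names are assumptions for illustration, not GMDH Shell's implementation.

```python
import numpy as np

def loo_errors_naive(X, y):
    """Leave-one-out residuals obtained by refitting the model n times (slow reference)."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errs[i] = y[i] - X[i] @ coef
    return errs

def loo_errors_fast(X, y):
    """The same residuals from one fit, via the hat-matrix diagonal h_ii."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    G = np.linalg.solve(X.T @ X, X.T)   # (X^T X)^{-1} X^T, shape (p, n)
    h = np.einsum("ij,ji->i", X, G)     # diagonal of X (X^T X)^{-1} X^T
    return resid / (1.0 - h)

def press(errors):
    """PRESS statistic: sum of squared leave-one-out residuals."""
    return float(np.sum(errors ** 2))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + 0.1 * rng.normal(size=50)

slow, fast = loo_errors_naive(X, y), loo_errors_fast(X, y)
print(np.allclose(slow, fast))   # True: same numbers, one fit instead of 50
print(press(fast))
```

The fast version costs roughly one regression instead of n, which is the kind of saving that makes exhaustive validation affordable on large datasets.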
Along with the processing speed improvements in GMDH Shell 2.1, we continue to improve the embedded data exploration tools and the user interface, and to fix reported issues. You can read more about the latest changes in the program changelog.

Like SVMs, GMDH is a wide class of algorithms, and the GMDH Shell software implements only some of them. Here is a brief overview of the general case of GMDH:
GMDH is a machine learning method that gradually complicates mathematical models in order to detect the optimal complexity. It uses the components of a certain nonlinear multiparametric equation, linear in its parameters, as building blocks. It can employ multilayered structures similar to neural networks, or other ways of complicating models such as genetic algorithms or full combinatorial search. GMDH estimates the parameters of every generated model and validates the model on a separate part of the data that was not involved in parameter estimation. As a result of this validation, the GMDH algorithm outputs only those models that show better predictive ability. Details such as the type of validation and the class of building blocks are important, but they depend on user preferences and on the particular problem.
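The general scheme above can be illustrated with its full combinatorial search variant: every subset of candidate building blocks is tried in order of increasing size, coefficients are estimated by least squares on one part of the data, and each model is scored on a separate part. This is a minimal sketch; the candidate terms, the 120/80 split and the function name are hypothetical choices, not GMDH Shell internals.

```python
import itertools
import numpy as np

def combinatorial_gmdh(X, y, n_train):
    """Try every subset of candidate terms (columns of X); fit on the first n_train rows,
    score on the remaining rows, and return (validation MSE, columns, coefficients)."""
    Xa, ya = X[:n_train], y[:n_train]   # estimation part
    Xb, yb = X[n_train:], y[n_train:]   # validation part, never used for fitting
    best = None
    n_terms = X.shape[1]
    for k in range(1, n_terms + 1):     # gradually increase model complexity
        for subset in itertools.combinations(range(n_terms), k):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(Xa[:, cols], ya, rcond=None)
            err = float(np.mean((yb - Xb[:, cols] @ coef) ** 2))
            if best is None or err < best[0]:
                best = (err, cols, coef)
    return best

rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=(3, 200))
# Hypothetical building blocks: bias, three linear terms and one product term
X = np.column_stack([np.ones(200), x1, x2, x3, x1 * x2])
y = 1.5 + 2.0 * x1 - 1.0 * x1 * x2 + 0.1 * rng.normal(size=200)

err, cols, coef = combinatorial_gmdh(X, y, n_train=120)
print(cols, round(err, 4))
```

Full search costs 2^p - 1 fits for p candidate terms, which is why larger problems switch to multilayered or genetic ways of generating candidates.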
To summarize scientific studies on GMDH, I'd say it is a method that produces predictive models or multilayered networks of linear, polynomial, logistic, Gaussian, harmonic and other nonlinear functions, selecting only those models that show accurate predictions during the validation stage.
That is the state of the art, so to say, but if you look into older books you'll find a somewhat different description. At the early stages of GMDH development (started in 1968 by A.G. Ivakhnenko), it was a procedure that built polynomial models from pairs of components, selected a number of them according to how well they fit the validation data, and passed the selected ones to the next layer, where new input pairs were considered, and so on. This process is finite because validation data do not favor complex models: overcomplicated models are unstable and lose their predictive ability, so the process stops when a new layer cannot show a better validation result.
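The classic layered procedure described above can be sketched as follows. Each pair of inputs gets a two-variable quadratic partial model fitted on the training rows; the best few by validation error become the next layer's inputs, and growth stops as soon as a new layer fails to improve the best validation error. This is a simplified illustration; `keep`, `max_layers`, the data split and the function names are assumptions, not Ivakhnenko's exact formulation or GMDH Shell's code.

```python
import itertools
import numpy as np

def partial_model(u, v):
    """Two-input quadratic: a0 + a1 u + a2 v + a3 uv + a4 u^2 + a5 v^2."""
    return np.column_stack([np.ones_like(u), u, v, u * v, u ** 2, v ** 2])

def multilayer_gmdh(X, y, n_train, keep=4, max_layers=10):
    """Layer-by-layer GMDH sketch; returns (best validation MSE, number of layers kept)."""
    tr, va = slice(0, n_train), slice(n_train, None)
    inputs, best_err, layers = X, np.inf, 0
    for _ in range(max_layers):
        if inputs.shape[1] < 2:            # need at least one pair to continue
            break
        candidates = []
        for i, j in itertools.combinations(range(inputs.shape[1]), 2):
            F = partial_model(inputs[:, i], inputs[:, j])
            coef, *_ = np.linalg.lstsq(F[tr], y[tr], rcond=None)  # fit on training rows only
            out = F @ coef                                        # outputs for all rows
            candidates.append((float(np.mean((y[va] - out[va]) ** 2)), out))
        candidates.sort(key=lambda c: c[0])
        if candidates[0][0] >= best_err:   # new layer cannot beat the previous one: stop
            break
        best_err, layers = candidates[0][0], layers + 1
        # the best partial models become the inputs of the next layer
        inputs = np.column_stack([out for _, out in candidates[:keep]])
    return best_err, layers

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2 + 0.05 * rng.normal(size=300)

err, layers = multilayer_gmdh(X, y, n_train=200)
print(f"layers: {layers}, validation MSE: {err:.4f}")
```

Because every partial model is fitted on the training rows only, the validation rows stay unseen at every layer, which is what makes the stopping rule meaningful.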

This blog is intended to support users of GMDH Shell and to explore the background of modeling methods implemented in GMDH Shell.
There are so many modeling techniques in the world that one may think nothing new can be suggested in this field except new combinations of well-known techniques. Indeed, I agree with this point. But who knows which combination of those methods will solve my task most efficiently?
I think this question has no answer. In practice, people prefer an optimal solution to the most complex one, even though the latter tends to be the very best. In particular, I have only one obligatory requirement for a modeling technique: the greatest simplicity that still delivers the desired accuracy. In this regard, the Group Method of Data Handling implemented in GMDH Shell is a real gem. Stay tuned for further posts in this blog ...