Solver

Solver is a plug-in module that solves a modeling problem and returns a set of predictive models. You can accept the solver settings loaded from a template or configure the solver manually. In the solver configuration panel you can do the following:

  • Optionally, expand input variables with automatically generated artificial variables - Additional variables.
  • Optionally, shuffle data rows to make the statistical characteristics of training and testing data more uniform - Dataset reordering.
  • Select one of several validation and cross-validation strategies - Validation strategy.
  • Choose a model ranking criterion - Validation criterion.
  • Optionally, reduce the number of input variables (feature selection) - Variables ranking.
  • Choose one of the model generation algorithms - Core algorithm.

Solver panel

Reorder rows

Reordering of rows sorts instances by target column values and/or shuffles instances within the learning frame. Shuffling usually makes the parts of the dataset equally informative. It works well for all problem types, including Time series.

Options | Meaning
No | Turned off.
Odd/even | Places all even instances after odd instances. Example: 1,2,3,4,5,6 → 1,3,5,2,4,6
Variance + Odd/even | Sorting of instances by variance of target values prior to Odd/even reordering.
Asc. + Odd/even | Ascending sorting of instances (by target values) prior to Odd/even reordering.
Desc. + Odd/even | Descending sorting of instances (by target values) prior to Odd/even reordering.
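
The Odd/even option and its sorted variants can be sketched in a few lines of Python. This is only an illustration of the reordering described in the table, not the GMDH Shell implementation. The function names `odd_even` and `sorted_odd_even` are hypothetical, and the Variance variant is left out because the table does not say how the variance of a single instance is computed.

```python
def odd_even(rows):
    """Place instances at odd positions (1st, 3rd, ...) before those at even positions."""
    return rows[0::2] + rows[1::2]


def sorted_odd_even(rows, target, descending=False):
    """Sort rows by their target value, then apply Odd/even reordering."""
    ordered = sorted(rows, key=target, reverse=descending)
    return odd_even(ordered)


print(odd_even([1, 2, 3, 4, 5, 6]))  # [1, 3, 5, 2, 4, 6]

# Rows are (label, target) pairs; sort ascending by target before reordering.
rows = [("a", 30), ("b", 10), ("c", 20), ("d", 40)]
print(sorted_odd_even(rows, target=lambda r: r[1]))
```

After sorting, neighbouring target values end up in different halves of the dataset, which is one way to read the statement that reordering makes dataset parts equally informative.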

Validation strategy

Sets the strategy used to validate models and to select the best ones.

Options | Meaning
Training/testing | Splits the dataset into two parts, uses the training part to find model coefficients, and uses the testing part to compare all generated models.
Whole data testing | Splits the dataset and trains the model on the training part, but uses both parts for testing.
k-fold validation | Splits the dataset into k parts and trains a model k times using k-1 parts, each time measuring model performance on the remaining part. Finally, the residuals obtained from all testing parts are combined in order to compare the model with other competing models.
Leave-one-out CV | A k-fold cross-validation with the number of folds equal to the number of instances in the dataset.
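
The k-fold procedure from the table can be sketched as follows. This is a minimal illustration under stated assumptions: the folds are contiguous, the model is a one-variable least-squares line standing in for a real GMDH model, and the pooled residuals are combined into a single RMSE. All function names are hypothetical.

```python
import math


def k_fold_indices(n_instances, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_instances, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def k_fold_rmse(xs, ys, k):
    """Train k times on k-1 folds and pool the residuals of every held-out fold."""
    residuals = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        residuals += [ys[i] - (a + b * xs[i]) for i in test_idx]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1 + 2 * x for x in xs]           # noise-free linear data
print(k_fold_rmse(xs, ys, k=4))         # close to 0 for this data
print(k_fold_rmse(xs, ys, k=len(xs)))   # k = number of instances: leave-one-out
```

Setting k to the number of instances gives the Leave-one-out case without any extra code.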

Train/test ratio

Sets the number of instances to be used for training and testing (percentages or exact quantities).

Because the split is entered as a proportion, the same box accepts either percentages or absolute quantities. For example, a split given by the ratio 4:1 can be expressed as the percentages 80:20 or, for a dataset of 200 instances, as the exact numbers of points 160:40.
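
A short sketch shows why the three notations are equivalent. It treats every entry as a proportion of the dataset; how GMDH Shell handles absolute quantities that do not add up to the dataset size is not stated here, so that case is not covered. The function name `split_sizes` is hypothetical.

```python
def split_sizes(ratio, n_instances):
    """Turn a 'train:test' ratio string into training and testing counts."""
    train_part, test_part = (float(p) for p in ratio.split(":"))
    n_train = round(n_instances * train_part / (train_part + test_part))
    return n_train, n_instances - n_train


for ratio in ("4:1", "80:20", "160:40"):
    print(ratio, split_sizes(ratio, 200))  # each line ends with (160, 40)
```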

Validation criterion

Defines the model selection criterion used by both the core algorithm and variables ranking.

Options | Meaning
RMSE | Selects models with the lowest RMSE on the testing sample.
MAE | Selects models with the lowest MAE on the testing sample.
Hit% + RMSE | Calculates the percentage of correctly classified instances in a two-class classification task. Models that make an equal number of hits are ranked by their RMSE.
RMSE⋅√c | RMSE criterion multiplied by the square root of model complexity c, i.e. the number of terms.
MAE⋅√c | MAE criterion multiplied by the square root of model complexity c, i.e. the number of terms.
Hit% + RMSE⋅√c | Calculates the percentage of correctly classified instances in a two-class classification task. Models that make an equal number of hits are ranked by their RMSE penalized with the square root of model complexity.
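
The criteria in the table can be written down directly. The sketch below is an illustration, not the GMDH Shell code: the function names are hypothetical, and the 0.5 cut-off used to turn a numeric prediction into one of two classes is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def penalized(error, n_terms):
    """Error multiplied by the square root of model complexity (number of terms)."""
    return error * math.sqrt(n_terms)


def hit_percent(actual, predicted, threshold=0.5):
    """Percentage of two-class instances classified correctly (assumed 0.5 cut-off)."""
    hits = sum((p >= threshold) == (a >= threshold) for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


actual = [0, 1, 1, 0]
predicted = [0.1, 0.8, 0.4, 0.3]
print(hit_percent(actual, predicted))                  # 75.0
print(penalized(rmse(actual, predicted), n_terms=4))   # RMSE doubled for a 4-term model
```

With the penalized criteria, a model with four terms has to be at least twice as accurate as a one-term model to be ranked above it.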

Variables ranking

Turns on preliminary ranking and reduction of variables.

Options | Meaning
No | Turned off.
by error (independent) | Ranking of variables according to their individual ability to predict testing data.
by usage (combinatorial) | Ranking of variables according to their importance for the Combinatorial Core algorithm with limited complexity (equal to 2). Importance is calculated as the number of times a variable appears in the set of best models.

Drop variables after rank n

Reduces the number of variables to n, i.e. keeps the n most important variables according to the selected ranking algorithm. Preliminary reduction of variables may reduce the quality of models, but it speeds up the processing of high-dimensional datasets.
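
The "by error (independent)" ranking followed by a cut at rank n might look like the sketch below. It assumes that the individual ability of a variable is measured by the testing RMSE of a one-variable linear model, which this page does not specify. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def rank_by_error(train_cols, train_y, test_cols, test_y):
    """Rank variable names by the testing RMSE of a one-variable model, best first."""
    scores = {}
    for name in train_cols:
        a, b = fit_line(train_cols[name], train_y)
        errors = [(y - (a + b * x)) ** 2 for x, y in zip(test_cols[name], test_y)]
        scores[name] = math.sqrt(sum(errors) / len(errors))
    return sorted(scores, key=scores.get)


def drop_after_rank(ranked_names, n):
    """Keep only the n best-ranked variables."""
    return ranked_names[:n]


train_cols = {"x1": [1, 2, 3, 4], "x2": [3, 1, 4, 1], "x3": [1, 2, 3, 5]}
test_cols = {"x1": [5, 6], "x2": [5, 9], "x3": [6, 6]}
ranked = rank_by_error(train_cols, [2, 4, 6, 8], test_cols, [10, 12])
print(ranked, drop_after_rank(ranked, 2))  # x1 comes first: the target is exactly 2*x1
```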

Core algorithm

You can select one of the available statistical learning algorithms. The algorithms implemented in GMDH Shell are described in Learning algorithms.

Options | Meaning
Combinatorial | Combinatorially optimized models.
Neural-type | Polynomial neural networks of GMDH-type.

Combinatorial

Limit Complexity to n

A model may consist of no more than n terms.

Additional variables

Expands the dataset with new artificial features. A higher-dimensional space frequently helps to improve Classification and Regression models. Be careful when expanding more than 20 initial variables, because the number of all possible pairs grows quickly.

Options | Meaning
No | No additional variables except the constant term.
xi·xj | Adds all possible multiplied pairs.
xi·xj, xi² | Adds all possible multiplied pairs and squares.
xi·xj, xi/xj | Adds all possible multiplied and divided pairs. Skips pairs that cause division by zero.
Custom | Uses the terms of a custom polynomial function as new variables, see Custom polynomial.

Be aware that memory and time consumption grow quickly with the number of variables.

Resulting number of variables for each option:

Number of initial variables | No | xi·xj | xi·xj, xi² | xi·xj, xi/xj
2 | 3 | 4 | 6 | 5
3 | 4 | 7 | 10 | 10
5 | 6 | 16 | 21 | 26
10 | 11 | 56 | 66 | 101
20 | 21 | 211 | 231 | 401
50 | 51 | 1276 | 1326 | 2501
100 | 101 | 5051 | 5151 | 10001
200 | 201 | 20101 | 20301 | 40001
500 | 501 | 125251 | 125751 | 250001
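
The figures in the table follow simple counting formulas, which the sketch below reproduces. The formulas are inferred from the table values rather than taken from the GMDH Shell documentation. In particular, the xi/xj column matches one quotient per unordered pair, and every count includes the constant term. The function name is hypothetical.

```python
from math import comb


def expanded_variable_count(n, option):
    """Number of variables after expansion of n inputs, including the constant term."""
    pairs = comb(n, 2)  # unordered pairs xi, xj with i < j
    counts = {
        "No": n + 1,
        "xi·xj": n + pairs + 1,
        "xi·xj, xi²": n + pairs + n + 1,
        "xi·xj, xi/xj": n + pairs + pairs + 1,  # one quotient per pair (inferred)
    }
    return counts[option]


options = ("No", "xi·xj", "xi·xj, xi²", "xi·xj, xi/xj")
for n in (2, 3, 5, 10, 20, 50, 100, 200, 500):
    print(n, [expanded_variable_count(n, o) for o in options])
```

The pair count grows as n(n-1)/2, which is why expanding more than about 20 initial variables becomes expensive.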

Neural-type

Neuron inputs

The number of input variables allowed for a neuron. Two inputs per neuron are usually efficient; with more inputs the computational task may become too complex.

Neuron function

Sets the type of the internal function for neurons. The neurons are active, i.e. each neuron can drop some of the function terms in order to increase overall predictive power of the model.

Options | Meaning
a0 + a1·xi + a2·xj | Linear
a0 + a1·xi + a2·xj + a3·xi·xj | Polynomial
a0 + a1·xi + a2·xj + a3·xi·xj + a4·xi² + a5·xj² | Quadratic polynomial
Custom | Uses custom polynomial function defined by user, see Custom polynomial
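
As an illustration, the three built-in neuron functions can be evaluated with one small helper. This is a sketch with hypothetical names; a dropped term of an active neuron is modelled here simply as a zero coefficient, which is an assumption about how the result looks, not about how the optimization is carried out.

```python
def neuron_output(xi, xj, coeffs, kind="quadratic"):
    """Evaluate a two-input neuron; a coefficient of 0 stands for a dropped term."""
    terms = {
        "linear": [1, xi, xj],
        "polynomial": [1, xi, xj, xi * xj],
        "quadratic": [1, xi, xj, xi * xj, xi ** 2, xj ** 2],
    }[kind]
    if len(coeffs) != len(terms):
        raise ValueError("expected %d coefficients" % len(terms))
    return sum(a * t for a, t in zip(coeffs, terms))


print(neuron_output(2, 3, [1, 1, 1], kind="linear"))   # 1 + 2 + 3 = 6
print(neuron_output(2, 3, [1, 0, 0, 1, 0, 0.5]))       # 1 + 2*3 + 0.5*3² = 11.5
```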

Limit neuron complexity to

Sets complexity limitation for the Neuron function and thus reduces computational resources needed for optimization of neurons. This option is useful for neurons with more than two inputs or neurons with high order custom polynomial functions.

Max. number of layers

Sets the upper limit for the number of network layers created by the algorithm.

Population of best models

For the Neural-type Core algorithm, this parameter defines how many neurons are selected at each layer. It also expands the set of returned models at every layer, so when the simulation is completed you can use the Model browser to browse the model structures obtained at the hidden layers of the final network.

For the Combinatorial algorithm, this parameter defines how many of the best models are selected for every target variable. You can then average these models during the Postprocess stage or just browse them to better understand the obtained solution.

This parameter increases memory usage, which can be critical for large runs with more than a thousand simulations.

Set parallel threads manually

When turned on, this option allows manual control of the number of parallel processing threads. When turned off, the number of threads is equal to the number of logical processors, i.e. processor cores or hyper-threading cores in your PC.
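
To see which default applies on a given machine, the number of logical processors can be queried with Python's standard `os.cpu_count()`. The helper `resolve_thread_count` is a hypothetical sketch of the rule described above, not part of GMDH Shell.

```python
import os


def resolve_thread_count(manual, requested=None):
    """Return the manual thread count if enabled, else the number of logical processors."""
    logical = os.cpu_count() or 1  # os.cpu_count() may return None
    if manual and requested:
        return max(1, int(requested))
    return logical


print(resolve_thread_count(manual=False))               # logical processors on this PC
print(resolve_thread_count(manual=True, requested=4))   # 4
```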

Custom polynomial

You can configure a Custom polynomial function to be used for generation of Additional variables or as a Neuron function. When the 'Custom' option is selected in the corresponding list of options, a dialog window called Custom polynomial is shown.

Max. power of a variable

Sets the upper limit for power of any variable in a polynomial term.

Min. power of a variable

Sets the lower limit for power of any variable in a polynomial term.

For example, if Max. power is 3 and Min. power is -2, then the following terms are included in the custom polynomial: x1³, x1², x1, 1 (constant term), 1/x1, 1/x1². If Min. power is 1 or higher, the resulting polynomial will not include a constant term.

Max. total power in a term

Sets a limit for the sum of absolute powers of all variables in a polynomial term.

For example, if Max. total power is 3 then the following terms can be included: x1*x2², x1*x2⁻², x1*x2*x3, …

Max. number of variables in a term

Sets the maximum number of variables in any polynomial term. For example, if Max. number of variables is 3, then the following terms can be included: x1*x2*x3, (x1²)*(x2⁴)*(x3⁻¹) …

Here are some configuration examples for two input variables:

Max. power = 2, Min. power = 0, Max. total power = 2, Max. number of variables in a term = 2 results in:

y(x1,x2) = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1² + a5*x2²

Max. power = 4, Min. power = 0, Max. total power = 4, Max. number of variables in a term = 1 results in:

y(x1,x2) = a0 + a1*x1 + a2*x1² + a3*x1³ + a4*x1⁴ + a5*x2 + a6*x2² + a7*x2³ + a8*x2⁴
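
The interaction of the four limits can be checked with a small enumeration. The sketch below lists the exponent tuples that satisfy all limits and reproduces the term counts of the two configurations above. It rests on an assumption not stated on this page: a variable is either absent from a term (exponent 0) or has an exponent between Min. power and Max. power. The function name is hypothetical.

```python
from itertools import product


def custom_polynomial_terms(n_vars, max_power, min_power, max_total_power, max_vars_in_term):
    """List exponent tuples (one exponent per variable) allowed by the four limits."""
    # Assumption: exponent 0 means the variable is absent from the term.
    choices = sorted({0, *range(min_power, max_power + 1)})
    terms = []
    for exponents in product(choices, repeat=n_vars):
        used = sum(1 for e in exponents if e != 0)
        if used == 0 and not (min_power <= 0 <= max_power):
            continue  # no constant term when 0 is outside the power range
        if used > max_vars_in_term:
            continue
        if sum(abs(e) for e in exponents) > max_total_power:
            continue
        terms.append(exponents)
    return terms


# First configuration: 6 terms, e.g. (1, 1) stands for x1*x2 and (2, 0) for x1².
print(custom_polynomial_terms(2, max_power=2, min_power=0, max_total_power=2, max_vars_in_term=2))
# Second configuration: 9 terms, each containing a single variable.
print(len(custom_polynomial_terms(2, max_power=4, min_power=0, max_total_power=4, max_vars_in_term=1)))
```

Each tuple corresponds to one coefficient of the resulting polynomial, so the length of the list is the number of coefficients to be estimated.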

HPC Solver

HPC Solver panel

This solver is available only in GMDH Shell Enterprise Edition.

HPC Solver can send computational tasks to a remote cluster that runs Linux and has MPI installed. Aside from the support for remote computing, the HPC Solver is similar to the base Solver.

CPUs | The number of CPUs to be requested from the remote system. The remote system may put the request in a queue.
Info button | Informs about the availability of remote CPU resources and gives control over already submitted tasks.
Auto-retrieval | Allows the HPC client to check the remote system for task completion every 30 seconds.
Reserve cluster | Keeps your CPU resources reserved after task completion and ready to start the next task immediately.