Solver is a plug-in module that solves a modeling problem and returns a set of predictive models. You can accept solver settings loaded from a template or configure them manually. The solver configuration panel provides the following settings.
Reordering of rows sorts instances by target column values and/or shuffles instances within the learning frame. Shuffling usually makes the dataset parts equally informative. This works well for all problem types and is also appropriate for time series.
| Options | Meaning |
|---|---|
| No | Turned-off |
| Odd/even | Places all even instances after odd instances. Example: 1,2,3,4,5,6 → 1,3,5,2,4,6 |
| Variance + Odd/even | Sorting of instances by variance of target values prior to Odd/even reordering. |
| Asc.+ Odd/even | Ascending sorting of instances (by target values) prior to Odd/even reordering. |
| Desc.+ Odd/even | Descending sorting of instances (by target values) prior to Odd/even reordering. |
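The Odd/even and sorted Odd/even options can be illustrated with a short Python sketch. It follows only the rules stated in the table and is not GMDH Shell's own code. The helper names `odd_even` and `sorted_odd_even` are hypothetical. The Variance option is left out because the table does not say how variance is computed per instance.

```python
def odd_even(rows):
    """Place the 1st, 3rd, 5th, ... instances before the 2nd, 4th, 6th, ... ones."""
    return rows[0::2] + rows[1::2]


def sorted_odd_even(rows, target, descending=False):
    """Sort instances by their target value, then apply Odd/even reordering."""
    return odd_even(sorted(rows, key=target, reverse=descending))


# Hypothetical (x, y) instances; y is the target column.
data = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.4, 2.0), (0.5, 3.0), (0.6, 6.0)]

print(odd_even([1, 2, 3, 4, 5, 6]))  # [1, 3, 5, 2, 4, 6]
print([y for _, y in sorted_odd_even(data, target=lambda r: r[1])])
# [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
```

After Asc.+ Odd/even reordering, the first and the second half of the instances each contain low, medium and high target values. This is why the resulting dataset parts tend to be comparably informative.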
Sets the strategy used to validate models and select the best ones.
| Options | Meaning |
|---|---|
| Training/testing | Splits the dataset into two parts, uses the training part to find model coefficients and the testing part to compare all generated models. |
| Whole data testing | Splits the dataset and trains models on the training part, but uses both parts for testing. |
| k-fold validation | Splits the dataset into k parts and trains a model k times on k-1 parts, each time measuring model performance on the remaining held-out part. Finally, the residuals obtained from all testing parts are combined in order to compare the model with other competing models. |
| Leave-one-out CV | k-fold cross-validation with the number of folds equal to the number of instances in the dataset. |
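k-fold validation and Leave-one-out CV can be sketched in Python as follows. This sketch rests on several assumptions that are not taken from GMDH Shell. It takes contiguous folds in the current row order (i.e. after any reordering of rows). It combines the held-out residuals as a sum of squares. The function names are hypothetical. The `fit`/`predict` pair stands in for any model.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_sse(x, y, k, fit, predict):
    """Train k times on k-1 folds and sum the squared residuals of each held-out fold."""
    sse = 0.0
    for test in k_fold_indices(len(y), k):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sse += sum((y[i] - predict(model, x[i])) ** 2 for i in test)
    return sse


# Toy model for illustration: always predicts the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, xi):
    return model


y = [1.0, 2.0, 3.0, 4.0]
x = [[v] for v in y]  # inputs are ignored by the mean model
print(k_fold_indices(10, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Leave-one-out CV is the special case k = number of instances.
print(cross_validated_sse(x, y, len(y), fit_mean, predict_mean))
```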
Sets the number of instances used for training and testing, either as percentages or as exact quantities.
The split is entered as a proportion, so percentages and absolute quantities can be typed into the same box. For example, a 4:1 split can be entered as the percentages 80:20 or, for a dataset of 200 instances, as the exact numbers of points 160:40.
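Because only the proportion matters, the conversion from a ratio to instance counts is a single division. The sketch below shows it. The function name `split_counts` is hypothetical. Rounding the training count to the nearest integer is an assumption about how a non-integer count would be resolved.

```python
def split_counts(n_instances, train_part, test_part):
    """Turn a train:test proportion into (training, testing) instance counts.

    80:20, 4:1 and 160:40 describe the same proportion. Rounding to the
    nearest integer is an assumption of this sketch.
    """
    n_train = round(n_instances * train_part / (train_part + test_part))
    return n_train, n_instances - n_train


for ratio in [(80, 20), (4, 1), (160, 40)]:
    print(ratio, split_counts(200, *ratio))  # each gives (160, 40)
```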
Defines the model selection criterion used by both the core algorithm and the ranking of variables.
| Options | Meaning |
|---|---|
| RMSE | Selects models with the lowest RMSE on the testing sample. |
| MAE | Selects models with the lowest MAE on the testing sample. |
| Hit% +RMSE | Calculates the percentage of correctly classified instances in a two-class classification task. Models with an equal number of hits are ranked by their RMSE. |
| RMSE⋅√c | RMSE criterion multiplied by the square root of model complexity c, i.e. the number of terms. |
| MAE⋅√c | MAE criterion multiplied by the square root of model complexity c, i.e. the number of terms. |
| Hit%+RMSE⋅√c | Calculates the percentage of correctly classified instances in a two-class classification task. Models with an equal number of hits are ranked by their RMSE penalized by the square root of model complexity. |
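The criteria in the table can be written down directly. The following Python sketch shows RMSE, MAE, the √c penalty and the Hit% + RMSE ordering. The table does not say how predictions are turned into classes. The sketch therefore assumes classes coded 0/1 and a threshold of 0.5. The function names and the three example models are hypothetical.

```python
import math


def rmse(y, p):
    """Root-mean-square error between targets y and predictions p."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y, p):
    """Mean absolute error between targets y and predictions p."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def penalized(error, n_terms):
    """RMSE*sqrt(c) or MAE*sqrt(c): the error multiplied by sqrt(number of terms)."""
    return error * math.sqrt(n_terms)


def hit_percent(y, p, threshold=0.5):
    """Percentage of correctly classified instances (assumes 0/1 classes split at 0.5)."""
    hits = sum((a >= threshold) == (b >= threshold) for a, b in zip(y, p))
    return 100.0 * hits / len(y)


def hit_rmse_key(y, p):
    """Sort key for Hit% + RMSE: more hits first, ties broken by lower RMSE."""
    return (-hit_percent(y, p), rmse(y, p))


# Hypothetical testing-sample targets and predictions of three models.
y_test = [0, 1, 1, 0]
models = {
    "A": [0.2, 0.9, 0.6, 0.4],  # 4 hits, RMSE ~0.30
    "B": [0.1, 0.8, 0.7, 0.3],  # 4 hits, RMSE ~0.24
    "C": [0.0, 1.0, 1.0, 0.6],  # 3 hits, RMSE 0.30
}
ranking = sorted(models, key=lambda name: hit_rmse_key(y_test, models[name]))
print(ranking)  # ['B', 'A', 'C']
```

Model C has a lower RMSE than model A. It still ranks last because the hit count is compared first.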
Turns on preliminary ranking and reduction of variables.
| Options | Meaning |
|---|---|
| No | Turned-off |
| by error (independent) | Ranking of variables according to their individual ability to predict testing data. |
| by usage (combinatorial) | Ranking of variables according to their importance for the Combinatorial Core algorithm with complexity limited to 2. Importance is the number of times a variable appears in the set of best models. |
Reduces the number of variables to n, i.e. keeps the n most important variables according to the selected ranking algorithm. Preliminary reduction of variables may reduce the quality of models, but it considerably speeds up processing of high-dimensional datasets.
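The "by error (independent)" ranking followed by keeping n variables can be sketched as below. The sketch assumes that a variable's "individual ability to predict testing data" is measured by fitting a one-variable linear model y = a + b·x on the training part and computing its RMSE on the testing part. That choice and all names are illustrative assumptions, not GMDH Shell internals.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x using a single input variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (t - my) for v, t in zip(x, y))
    b = sxy / sxx if sxx else 0.0  # a constant column gets slope 0
    return my - b * mx, b


def rank_by_error(train, test, target):
    """Rank input variables by the testing RMSE of their one-variable model."""
    scores = {}
    for name in train:
        if name == target:
            continue
        a, b = fit_line(train[name], train[target])
        residuals = [t - (a + b * v) for v, t in zip(test[name], test[target])]
        scores[name] = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return sorted(scores, key=scores.get)  # lowest error first


def keep_n(ranking, n):
    """Keep the n most important variables."""
    return ranking[:n]


# Hypothetical columns: y = 2*x1 exactly; x2 and x3 are weaker predictors.
train = {"x1": [1, 2, 3, 4], "x2": [4, 1, 3, 2], "x3": [1, 1, 2, 2], "y": [2, 4, 6, 8]}
test = {"x1": [5, 6], "x2": [1, 4], "x3": [1, 2], "y": [10, 12]}
ranking = rank_by_error(train, test, "y")
print(ranking, keep_n(ranking, 2))  # ['x1', 'x3', 'x2'] ['x1', 'x3']
```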
You can select one of the available statistical learning algorithms. Descriptions of the algorithms implemented in GMDH Shell can be found in Learning algorithms.
| Options | Meaning |
|---|---|
| Combinatorial | Combinatorially optimized models. |
| Neural-type | Polynomial neural networks of GMDH-type. |
No model may contain more than n terms.
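The Combinatorial algorithm of GMDH is usually described as an exhaustive search over combinations of model terms. The limit of n terms is what keeps that search tractable. The Python sketch below is built on several assumptions. It uses plain linear terms. It always includes a constant, which does not count toward the limit. It fits each model by ordinary least squares on the training part and compares models by RMSE on the testing part. These choices and all names are illustrative and do not describe GMDH Shell's implementation.

```python
import itertools
import math


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X^T X) a = X^T y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def combinatorial(train_x, train_y, test_x, test_y, max_terms):
    """Try every subset of at most max_terms inputs (plus a constant), fit it on
    the training part and keep the model with the lowest testing RMSE."""
    best = None
    for size in range(1, max_terms + 1):
        for subset in itertools.combinations(range(len(train_x[0])), size):
            design = [[1.0] + [row[j] for j in subset] for row in train_x]
            try:
                coef = fit_ols(design, train_y)
            except ValueError:
                continue  # skip subsets whose columns are linearly dependent
            pred = [coef[0] + sum(c * row[j] for c, j in zip(coef[1:], subset))
                    for row in test_x]
            err = math.sqrt(sum((t - p) ** 2 for t, p in zip(test_y, pred)) / len(test_y))
            if best is None or err < best[0]:
                best = (err, subset, coef)
    return best


# Hypothetical data generated by y = 1 + 2*x0 - x2 (x1 is irrelevant).
train_x = [[1, 5, 2], [2, 3, 1], [3, 8, 4], [4, 1, 3], [5, 7, 0], [6, 2, 5]]
train_y = [1 + 2 * r[0] - r[2] for r in train_x]
test_x = [[7, 4, 2], [8, 6, 6]]
test_y = [1 + 2 * r[0] - r[2] for r in test_x]

err, subset, coef = combinatorial(train_x, train_y, test_x, test_y, max_terms=2)
print(subset, [round(c, 6) for c in coef])  # (0, 2) [1.0, 2.0, -1.0]
```

With m candidate inputs, the search fits C(m,1) + … + C(m,n) models. This is why a small n matters for wide datasets.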
Expands the dataset with new artificial features. A higher-dimensional feature space often helps to improve Classification and Regression models. Be careful when expanding more than 20 initial variables, because the number of possible pairs grows quadratically.
| Options | Meaning |
|---|---|
| No | No additional variables except the constant term. |
| xi·xj | Adds the products of all possible pairs. |
| xi·xj, xi² | Adds the products of all possible pairs and the squares. |
| xi·xj, xi/xj | Adds the products and ratios of all possible pairs. Skips pairs that would cause division by zero. |
| Custom | Uses the terms of a custom polynomial function as new variables, see Custom polynomial. |
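The pairwise options can be illustrated with a small Python sketch that builds the new columns. The sketch assumes the following, none of which comes from GMDH Shell. Products use each unordered pair once (i < j). Ratios are formed as xi/xj for the same pairs only, which matches the counts in the table of resulting variables in this section. A ratio is skipped when its denominator column contains a zero. The function name `expand` is hypothetical.

```python
def expand(columns, squares=False, ratios=False):
    """Build additional variables from the input columns (dict: name -> values).

    Assumptions: products x_i*x_j and ratios x_i/x_j are formed for pairs i < j only,
    and a ratio is skipped when its denominator column contains a zero.
    """
    names = list(columns)
    new = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            xi, xj = columns[names[a]], columns[names[b]]
            new[f"{names[a]}*{names[b]}"] = [p * q for p, q in zip(xi, xj)]
            if ratios and all(q != 0 for q in xj):
                new[f"{names[a]}/{names[b]}"] = [p / q for p, q in zip(xi, xj)]
    if squares:
        for name in names:
            new[f"{name}^2"] = [v * v for v in columns[name]]
    return new


cols = {"x1": [1.0, 2.0, 3.0], "x2": [2.0, 0.0, 1.0], "x3": [4.0, 5.0, 6.0]}
print(list(expand(cols, ratios=True)))
# ['x1*x2', 'x1*x3', 'x1/x3', 'x2*x3', 'x2/x3']  (x1/x2 skipped: x2 contains 0)
```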
Be aware that memory and time consumption grow quickly.
Resulting number of variables (constant term included):
| Number of initial variables | No | xi·xj | xi·xj, xi² | xi·xj, xi/xj |
|---|---|---|---|---|
| 2 | 3 | 4 | 6 | 5 |
| 3 | 4 | 7 | 10 | 10 |
| 5 | 6 | 16 | 21 | 26 |
| 10 | 11 | 56 | 66 | 101 |
| 20 | 21 | 211 | 231 | 401 |
| 50 | 51 | 1276 | 1326 | 2501 |
| 100 | 101 | 5051 | 5151 | 10001 |
| 200 | 201 | 20101 | 20301 | 40001 |
| 500 | 501 | 125251 | 125751 | 250001 |
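Every row of the table follows simple closed-form counts. Each option starts from n inputs plus the constant. The xi·xj option adds n(n-1)/2 pair products. The xi² option adds n more squares. The xi/xj option adds another n(n-1)/2 ratios. The sketch below computes these counts. The function name is hypothetical, and the ratio count assumes that no pair is skipped because of division by zero.

```python
def resulting_variables(n):
    """Variable counts, constant term included, for n initial variables.

    Returns (No, xi·xj, xi·xj + xi², xi·xj + xi/xj); assumes no ratio is skipped.
    """
    pairs = n * (n - 1) // 2  # unordered pairs i < j
    return n + 1, n + 1 + pairs, n + 1 + pairs + n, n + 1 + 2 * pairs


for n in (2, 20, 500):
    print(n, resulting_variables(n))
# 2 (3, 4, 6, 5)
# 20 (21, 211, 231, 401)
# 500 (501, 125251, 125751, 250001)
```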
Sets the number of input variables allowed for each neuron. Two inputs per neuron are usually efficient; with more inputs, the computational task may become too complex.
Sets the type of internal function used by neurons. The neurons are active, i.e. each neuron can drop some of its function terms in order to increase the overall predictive power of the model.
| Options | Meaning |
|---|---|
| a0 + a1·xi + a2·xj | Linear |
| a0 + a1·xi + a2·xj + a3·xi·xj | Polynomial |
| a0 + a1·xi + a2·xj + a3·xi·xj + a4·xi² + a5·xj² | Quadratic polynomial |
| Custom | Uses a custom polynomial function defined by the user, see Custom polynomial. |
Sets a complexity limit for the Neuron function and thus reduces the computational resources needed to optimize neurons. This option is useful for neurons with more than two inputs or for neurons with high-order custom polynomial functions.
Sets the upper limit for the number of network layers created by the algorithm.
For the Neural-type Core algorithm, this parameter defines how many neurons are selected at each layer. It also expands the set of models returned at every layer, so once the simulation is completed you can use the Model browser to browse the model structures obtained at hidden layers of the final network.
For the Combinatorial algorithm, the parameter defines how many best models are selected for every target variable. You can then average these models during the Postprocess stage or simply browse them to better understand the obtained solution.
Larger values of this parameter increase memory usage, which can become critical for large runs with more than a thousand simulations.
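The Neural-type settings above fit together roughly as in the following Python sketch of a GMDH-type polynomial network. These settings are the number of neuron inputs, the layer limit and the number of neurons selected at each layer. The sketch makes several simplifications, all of which are assumptions rather than GMDH Shell's implementation. It uses two-input linear neurons a0 + a1·xi + a2·xj without active term dropping. It selects neurons by testing RMSE. It stops when a layer no longer improves the best testing RMSE. All names are hypothetical.

```python
import itertools
import math


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_neuron(xi, xj, y):
    """Least-squares fit of the linear neuron y = a0 + a1*xi + a2*xj."""
    rows = [(1.0, p, q) for p, q in zip(xi, xj)]
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def gmdh_layer(train_cols, test_cols, train_y, test_y, n_best):
    """Fit one neuron per pair of input columns and keep the n_best by testing RMSE."""
    neurons = []
    for i, j in itertools.combinations(range(len(train_cols)), 2):
        try:
            a = fit_neuron(train_cols[i], train_cols[j], train_y)
        except ValueError:
            continue  # the two inputs are (nearly) collinear
        out_train = [a[0] + a[1] * p + a[2] * q for p, q in zip(train_cols[i], train_cols[j])]
        out_test = [a[0] + a[1] * p + a[2] * q for p, q in zip(test_cols[i], test_cols[j])]
        neurons.append((rmse(test_y, out_test), out_train, out_test))
    neurons.sort(key=lambda item: item[0])
    return neurons[:n_best]


def gmdh_network(train_cols, test_cols, train_y, test_y, n_best, max_layers):
    """Stack layers: outputs of the selected neurons become the next layer's inputs."""
    best_err, best_layer = math.inf, 0
    for layer in range(1, max_layers + 1):
        selected = gmdh_layer(train_cols, test_cols, train_y, test_y, n_best)
        if not selected or selected[0][0] >= best_err:
            break  # assumed stopping rule: no improvement on the testing part
        best_err, best_layer = selected[0][0], layer
        train_cols = [s[1] for s in selected]
        test_cols = [s[2] for s in selected]
    return best_err, best_layer


# Hypothetical columns; the target is y = 1 + 2*x0 + 3*x1.
train_cols = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [5, 3, 8, 1, 7, 2]]
test_cols = [[7, 8], [8, 7], [4, 6]]
train_y = [1 + 2 * a + 3 * b for a, b in zip(train_cols[0], train_cols[1])]
test_y = [1 + 2 * a + 3 * b for a, b in zip(test_cols[0], test_cols[1])]
err, layer = gmdh_network(train_cols, test_cols, train_y, test_y, n_best=2, max_layers=3)
print(f"best testing RMSE {err:.2e} at layer {layer}")
```

Keeping more neurons per layer widens the search and, as noted above, returns more intermediate models. The cost is memory and time.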
When turned on, this option allows manual control of the number of parallel processing threads. When turned off, the number of threads equals the number of logical processors, i.e. processor cores or hyper-threading cores in your PC.
You can configure a Custom polynomial function to be used for generation of Additional variables or as a Neuron function. When the 'Custom' option is selected in the corresponding list of options, a dialog window called Custom polynomial is shown.
Sets the upper limit for power of any variable in a polynomial term.
Sets the lower limit for power of any variable in a polynomial term.
For example, if Max. power is 3 and Min. power is -2, then the following terms are included in the custom polynomial: x1³, x1², x1, 1 (constant term), 1/x1, 1/x1². If Min. power is 1 or higher, the resulting polynomial does not include a constant term.
Sets a limit for the sum of absolute powers of all variables in a polynomial term.
For example, if Max total power is 3, then the following terms can be included: x1*x2², x1*x2⁻², x1*x2*x3, …
Sets the maximum number of variables in any polynomial term. For example, if Max. number of variables is 3, then the following terms can be included: x1*x2*x3, (x1²)*(x2⁴)*(x3⁻¹) …
Here are some configuration examples for two input variables:
Max. power = 2, Min. power = 0, Max total power = 2, Max number in a term = 2 results in:
y(x1,x2) = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1² + a5*x2²
Max. power = 4, Min. power = 0, Max total power = 4, Max number in a term = 1 results in:
y(x1,x2) = a0 + a1*x1 + a2*x1² + a3*x1³ + a4*x1⁴ + a5*x2 + a6*x2² + a7*x2³ + a8*x2⁴
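The four limits can be read as filters on exponent tuples, with one exponent per variable. The Python sketch below enumerates the terms they allow and reproduces both configuration examples and the Min. power example. It interprets the limits as follows. A variable that appears in a term takes a non-zero power between Min. power and Max. power. Exponent 0 means the variable is absent. The constant term exists only when Min. power ≤ 0 ≤ Max. power. This interpretation and the function names are assumptions.

```python
import itertools


def custom_terms(n_vars, max_power, min_power, max_total, max_vars):
    """Enumerate the terms of a custom polynomial as exponent tuples.

    Exponent 0 means the variable is absent from the term. Assumptions: a variable
    present in a term has a non-zero power within [min_power, max_power], and the
    constant term exists only when min_power <= 0 <= max_power.
    """
    powers = [0] + [p for p in range(min_power, max_power + 1) if p != 0]
    terms = []
    for exps in itertools.product(powers, repeat=n_vars):
        used = sum(1 for e in exps if e != 0)
        if used == 0 and not (min_power <= 0 <= max_power):
            continue  # no constant term when Min. power >= 1
        if used > max_vars or sum(abs(e) for e in exps) > max_total:
            continue  # too many variables or too high a total power
        terms.append(exps)
    return terms


def term_to_str(exps):
    """Render an exponent tuple such as (1, -2) as 'x1*x2^-2'."""
    parts = [f"x{i}" if e == 1 else f"x{i}^{e}"
             for i, e in enumerate(exps, start=1) if e != 0]
    return "*".join(parts) or "1"


# First configuration from the examples above: 6 terms.
print([term_to_str(t) for t in custom_terms(2, 2, 0, 2, 2)])
# ['1', 'x2', 'x2^2', 'x1', 'x1*x2', 'x1^2']
# Second configuration: 9 terms.
print(len(custom_terms(2, 4, 0, 4, 1)))  # 9
```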
The HPC Solver is available only in GMDH Shell Enterprise Edition.
HPC Solver lets you send computational tasks to a remote computer cluster running Linux with MPI installed. Apart from the support for remote computing, the HPC Solver is similar to the base Solver.
| Options | Meaning |
|---|---|
| CPUs | The number of CPUs to request from the remote system. The remote system may put the request in a queue. |
| Info button | Shows the availability of remote CPU resources and gives control over already submitted tasks. |
| Auto-retrieval | Allows the HPC client to check the remote system for task completion every 30 seconds. |
| Reserve cluster | Keeps your CPU resources reserved after the task is completed, ready to start the next task immediately. |