Differences

This shows you the differences between two versions of the page.

--- import [2012/06/10 23:10]
Oleksiy [CSV/XLS/XLSX Import]
+++ import [2014/02/11 09:48]
Oleksiy [Import]
@@ Line 1: / Line 1: @@
 ====== Import ======
-The import module reads data from data sources.
+Import module is used to setup connections to files or databases. GMDH Shell performs import of connected datasets in the beginning of each modeling simulation.
-  * CSV/XLS/XLSX Import
-  * ODBC/OLEDB Import
-==== CSV/XLS/XLSX Import ====
-{{:img:dialog_import_csv-xls-xlsx.png?width=513|Import dialog}}
-The module has support for
-  * Tables that store variables as columns (by default).
-  * Tables that store variables as rows.
-  * CSV, XLS and XLSX file formats.
-  * Reading table headings from the the first row.
-  * Reading row labels from a column that contains unique Timestamps or IDs.
-  * Reading row labels composed from two or three separate columns: Year + Month, Year + Week, Year + Quarter, Year + Month + Day.
-  * Detection of text (categorical) variables. (blue columns)
-  * Detection of Date/time variables in the dataset.(green columns)
-  * Reading records in backward order.
-  * Missing values detection.
-:!: Limitations: \\
-  * The module reads data only from the first sheet of .xls or .xlsx file.\\
-  * The module can't read from password-protected files.
-Data files may consist of numeral, categorical (text) or date/time columns. There is a color-based indication of column types:
-  * Categorical (text) - blue.
+The following connections are available:
-  * Date/time - green.
+  * **CSV/XLS/XLSX connection** is used to import one or more spreadsheets or text files with delimiters.
-  * Missing values - light grey. (if the cell in not empty)
+  * **ODBC/OLEDB connection** is used to import the data from most databases and third party programs via their ODBC interface.
+  * **Order list connection** is used to extract historical demand from the raw list of customer orders. The module is only available in the 'Business forecasting' version of GMDH Shell.
+  * **Sage Accounting connection** is used to extract retail sales data from 'Sage 50' and other versions of Sage accounting software. The module is only available in the 'Business forecasting' version of GMDH Shell.
-Column names are allowed but not required.
+===== CSV/XLS/XLSX connection =====
-GMDH Shell does not provide tools for file editing but allows a user to keep data file opened for editing. It is not required to close the editor (Excel, Notepad, etc) before clicking at the Import or the Start button. You can modify data file using your editor, save changes and immediately start recalculation of results in GMDH Shell.
-=== Import dialog ===
+**File > New project > CSV/XLS/XLSX connection** is used to connect data files, this can be one or several Excel files or ASCI files with delimiters, for example, ''.csv'', ''.txt'', etc. At all stages you can keep your files opened for editing, for example in Excel or Notepad. The following limitations apply:
+  * The module works with columns. If series of observations are rows (not columns), **Transpose data** option must be turned-on.
+  * The module cannot read password-protected spreadsheets.
-  - Click on the Import button {{:img:button-24_import.png?height=16|Import button}} located in the toolbar.
+In the right part of the connection dialog there is a preview window with preliminary import results. It uses the following color-based indication of data types:
-  - Select one of your data files, press OK. Then the Import configuration dialog opens.
+  * Numeric cells - black.
-  - In the Import dialog set the importing parameters, press OK.
+  * Text cells (categorical data) - blue.
-  - If your project folder doesn't contain project settings yet, the Template selection dialog appears next to the Import configuration dialog.
+  * Date/time cells - green.
-  - In the Template selection dialog choose a relevant [[Templates|Template]] and press OK.
+  * Missing values - light gray. (if the cell is not empty)
+  * Discarded columns - gray cells.
-If your project folder contains several data files, the Import module makes all of them available in the Data manager. Selection of just one file points importer module to the whole directory.
+{{ :img:dialog_data_connection.png?width=543 |Import dialog}}
-The file selected during the import procedure receives a special status ''Current'' in the Data manager. Only variables from the Current file can be used as model inputs or targets without filename prefixes, for example, ''var1'' instead of ''filename.var1'' required for other files.
+== Column labels ==
-Current file path will appear in the title bar of GMDH Shell window:
+**Column labels** are used as variable names, therefore you should either instruct GMDH Shell to **Use 1st row** for this purpose or to generate labels automatically: ''**x1, x2, x3...**''. Both variants can be used simultaneously with **Custom labels** allowing you to replace any labels with your own. For example, **Custom labels** ''**,,Date,**'' will be interpreted as labels ''**x1, x2, Date, x4**''
-{{:img:_window_title_bar.png|}}
+== Row labels (ID, timestamp) ==
-=== Import configuration ===
+**Row labels** are used to refer to observations, **Row labels** must be unique like date and time or like identifiers.
-== Read column labels from the 1st row ==
+If your dataset store timestamps elements (year, month, day, week and time) in separate columns, you can compose the timestamp from several columns. For this purpose use the option **Compose ID from several columns**. All timestamp elements in the dataset must be neighboring columns and the option **Read row lables from column N** must point to the first of them. Supported combinations are: **Year + Month**, **Year + Week**, **Year + Quarter**, **Year + Month + Day**, **Date + Time**.
-Reads column names from the first row of data file(s). The number of elements in the first row is used for  detection of data table width.
-== Read row labels (ID, timestamp) from column N ==
-If you have unique data row identifiers, for example, calendar dates then you can tell the Importer in which column they are located and use them for visualization instead of default ID marks. For example, row labels serve as timestamps for time series charts.
-In case of multiple data files, the row labels will be taken from the ''Current'' data file.
-Quite often datasets have date marks such as year and month or week located in separate columns. Then you can compose timestamps from several columns using the option **Compose ID from several columns**.
+==Missing values==
-In this case all aggregated columns in the dataset must be neighboring and the option **Read row lables from column N** must point to the first of them.
-==CSV delimiter==
+Import  module is responsible for detection of missing values. It replaces various types of missing values with regular NULL values and thus allows us to handle them properly at the [[Preprocess]] stage.
-Sets a delimiter type. Applicable to CSV files only.
+**Missing value mark** is used to type in the missing mark or to select one of the standard marks.
-==Missing value mark==
+**Consider text cells as missing** is used to replace all non-numeric cells with regular NULL values.
-Import  module is responsible for detection of missing values. It replaces various cells that fall into missing value conditions with regular NULL values and thus allows [[Preprocess|Preprocessor]] module to handle missing values appropriately.
+== Other settings ==
-==Consider text cells as missing==
+**Delimiter** is used to set the delimiter type such as coma, space, tab, or any other. This option is applicable only to ASCI files with delimiters (''.csv'', ''.txt'', etc.).
-Replaces any non-numeric values with regular NULL values.
+**Import all files with the same extension** is used to connect all files with the same extension within the current directory.
+**Import all sheets of a workbook** is used to connect all sheets of one or many ''.xls'' or ''.xlsx'' workbooks.
-==== Data file examples ====
+**Transpose tables, i.e. read columns from rows** is used to support data series formatted as rows (not columns).
-\\
+**Reverse order of rows** is used to support data series where the most resent observations are in the top of the table and the oldest observations are in the bottom.
-{{:img:window_openoffice_dataset.png|}}
+**Import rows starting from** is used to skip a number of rows in the top of the table, in particular this option is used to skip the header information.
-\\
+==== ODBC/OLEDB connection ====
-{{:img:window_notepad_dataset.png|}}
+**File > New project > ODBC/OLEDB connection** is used to connect various databases. Most database vendors provide at least a minimal ODBC driver with their database. This import module requires knowledge of SQL quires.
+==== Sage Accounting connection ====
-==== ODBC/OLEDB Import ====
+**File > New project > Sage Accounting connection** is used to easily extract retail sales data from 'Sage 50' and other versions of Sage accounting software (www.sage.co.uk). The module is only available in the 'Business forecasting' version of GMDH Shell.
-This preprocessor will be available soon.

GMDH Shell Documentation

User Tools

Site Tools

Differences

Page Tools