Differences

This shows you the differences between two versions of the page.

--- import [2013/11/08 22:43]
Oleksiy
+++ import [2014/03/25 21:42]
Oleksiy [Import]
@@ Line 1: / Line 1: @@
-====== Import ======
-Import module is used to setup connection to various data sources such as files or databases. Each time the forecasting process is started, GMDH Shell performs import of connected data and converts the data on the fly to a format compliant with GMDH Shell.
+====== Data connection ======
-The following connections are available:
+**Data connection** module is used to set up connections with files or databases. GMDH Shell accesses data files or databases in the beginning of each forecasting launch.
-  * **CSV/XLS/XLSX connection** is used to import one or more spreadsheets or text files with delimiters.
-  * **ODBC/OLEDB connection** is used to import the data from most databases and third party programs via their ODBC interface.
-  * **Sage Accounting connection** is used to extract retail sales data from 'Sage 50' and other versions of Sage accounting software.
-==== CSV/XLS/XLSX connection ====
+There are a number of data connection options:
+  * **CSV/XLS/XLSX connection** is used to connect one or more Excel spreadsheets or text files with delimiters.
+  * **ODBC/OLEDB connection** is used to connect databases or third party programs via the ODBC interface.
+  * **Order list connection** available only in the Business Forecasting package is used to extract historical demand from the raw list of customer orders.
+  * **Sage Accounting connection** available only in the Business Forecasting package is used to extract retail sales data from 'Sage 50' and other versions of Sage accounting software.
+===== CSV/XLS/XLSX connection =====
-**File > New project > CSV/XLS/XLSX connection** is used to connect data files, this can be one or several Excel files or ASCI files with delimiters, for example, //.csv//, //.txt//, etc.
+**File > New project > CSV/XLS/XLSX connection** is used to connect data files, this can be one or several Excel files or ASCI files with delimiters, for example, ''.csv'', ''.txt'', etc. At all stages you can keep your files opened for editing, for example in Excel or Notepad. The following limitations apply:
+  * The module works with columns. If series of observations are rows (not columns), **Transpose data** option must be turned-on.
+  * The module cannot read password-protected spreadsheets.
+In the right part of the connection dialog there is a preview window with preliminary import results. It uses the following color-based indication of data types:
+  * Numeric cells - black.
+  * Text cells (categorical data) - blue.
+  * Date/time cells - green.
+  * Missing values - light gray. (if the cell is not empty)
+  * Discarded columns - gray cells.
-The module has support for
+{{ :img:dialog_data_connection.png?width=543 |Import dialog}}
-  * Tables that store variables as columns (by default).
-  * Tables that store variables as rows.
-  * CSV, XLS and XLSX file formats.
-  * Reading table headings from the the first row.
-  * Reading row labels from a column that contains unique Timestamps or IDs.
-  * Reading row labels composed from two or three separate columns: Year + Month, Year + Week, Year + Quarter, Year + Month + Day.
-  * Detection of text (categorical) variables. (blue columns)
-  * Detection of Date/time variables in the dataset.(green columns)
-  * Reading records in backward order.
-  * Missing values detection.
-:!: Limitations: \\
+== Column labels ==
-  * The module can't read from password-protected files.
-Data files may consist of numeral, categorical (text) or date/time columns. There is a color-based indication of column types:
+**Column labels** are used as variable names, therefore you should either instruct GMDH Shell to **Use 1st row** for this purpose or to generate labels automatically: ''**x1, x2, x3...**''. Both variants can be used simultaneously with **Custom labels** allowing you to replace any labels with your own. For example, **Custom labels** ''**,,Date,**'' will be interpreted as labels ''**x1, x2, Date, x4**''
-  * Categorical (text) - blue.
+== Row labels (ID, timestamp) ==
-  * Date/time - green.
-  * Missing values - light grey. (if the cell in not empty)
-Column names are allowed but not required.
+**Row labels** are used to refer to observations, **Row labels** must be unique like date and time or like identifiers.
-GMDH Shell does not provide tools for file editing but allows a user to keep data file opened for editing. It is not required to close the editor (Excel, Notepad, etc) before clicking at the Import or the Start button. You can modify data file using your editor, save changes and immediately start recalculation of results in GMDH Shell.
-=== Import dialog ===
+If your dataset store timestamps elements (year, month, day, week and time) in separate columns, you can compose the timestamp from several columns. For this purpose use the option **Compose ID from several columns**. All timestamp elements in the dataset must be neighboring columns and the option **Read row lables from column N** must point to the first of them. Supported combinations are: **Year + Month**, **Year + Week**, **Year + Quarter**, **Year + Month + Day**, **Date + Time**.
-  - Click on the Import button {{:img:button-24_import.png?height=16|Import button}} located in the toolbar.
-  - Select one of your data files, press OK. Then the Import configuration dialog opens.
-  - In the Import dialog set the importing parameters, press OK.
-  - If your project folder doesn't contain project settings yet, the Template selection dialog appears next to the Import configuration dialog.
-  - In the Template selection dialog choose a relevant [[Templates|Template]] and press OK.
-If your project folder contains several data files, the Import module makes all of them available in the Data manager. Selection of just one file points importer module to the whole directory.
+==Missing values==
-The file selected during the import procedure receives a special status ''Current'' in the Data manager. Only variables from the Current file can be used as model inputs or targets without filename prefixes, for example, ''var1'' instead of ''filename.var1'' required for other files.
+Import  module is responsible for detection of missing values. It replaces various types of missing values with regular NULL values and thus allows us to handle them properly at the [[Preprocess]] stage.
-Current file path will appear in the title bar of GMDH Shell window:
+**Missing value mark** is used to type in the missing mark or to select one of the standard marks.
-{{:img:_window_title_bar.png|}}
+**Consider text cells as missing** is used to replace all non-numeric cells with regular NULL values.
-==== Import configuration ====
+== Other settings ==
-{{:img:dialog_import_csv-xls-xlsx.png?width=513|Import dialog}}
+**Delimiter** is used to set the delimiter type such as coma, space, tab, or any other. This option is applicable only to ASCI files with delimiters (''.csv'', ''.txt'', etc.).
-== Read column labels from the 1st row ==
+**Import all files with the same extension** is used to connect all files with the same extension within the current directory.
-Reads column names from the first row of data file(s). The number of elements in the first row is used for  detection of data table width.
-== Read row labels (ID, timestamp) from column N ==
+**Import all sheets of a workbook** is used to connect all sheets of one or many ''.xls'' or ''.xlsx'' workbooks.
-If you have unique data row identifiers, for example, calendar dates then you can tell the Importer in which column they are located and use them for visualization instead of default ID marks. For example, row labels serve as timestamps for time series charts.
-In case of multiple data files, the row labels will be taken from the ''Current'' data file.
-Quite often datasets have date marks such as year and month or week located in separate columns. Then you can compose timestamps from several columns using the option **Compose ID from several columns**.
+**Transpose tables, i.e. read columns from rows** is used to support data series formatted as rows (not columns).
-In this case all aggregated columns in the dataset must be neighboring and the option **Read row lables from column N** must point to the first of them.
-==CSV delimiter==
+**Reverse order of rows** is used to support data series where the most resent observations are in the top of the table and the oldest observations are in the bottom.
-Sets a delimiter type. Applicable to CSV files only.
-==Missing value mark==
-Import  module is responsible for detection of missing values. It replaces various cells that fall into missing value conditions with regular NULL values and thus allows [[Preprocess|Preprocessor]] module to handle missing values appropriately.
-==Consider text cells as missing==
-Replaces any non-numeric values with regular NULL values.
+**Import rows starting from** is used to skip a number of rows in the top of the table, in particular this option is used to skip the header information.
 ==== ODBC/OLEDB connection ====
@@ Line 90: / Line 66: @@
 **File > New project > Sage Accounting connection** is used to easily extract retail sales data from 'Sage 50' and other versions of Sage accounting software (www.sage.co.uk). The module is only available in the 'Business forecasting' version of GMDH Shell.
-The import module can be selected in the toolbar

GMDH Shell Documentation

User Tools

Site Tools

Differences

Page Tools