This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
import [2012/10/17 14:34] Oleksiy [ODBC/OLEDB Import] |
import [2014/02/11 09:48] Oleksiy [Import] |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
====== Import ====== | ====== Import ====== | ||
- | The import module reads data from data sources. | + | Import module is used to setup connections |
- | * CSV/ | + | |
- | * ODBC/OLEDB Import | + | |
- | + | ||
- | + | ||
- | ==== CSV/ | + | |
- | + | ||
- | The module | + | |
- | * Tables that store variables as columns (by default). | + | |
- | * Tables that store variables as rows. | + | |
- | * CSV, XLS and XLSX file formats. | + | |
- | * Reading table headings from the the first row. | + | |
- | * Reading row labels from a column that contains unique Timestamps or IDs. | + | |
- | * Reading row labels composed from two or three separate columns: Year + Month, Year + Week, Year + Quarter, Year + Month + Day. | + | |
- | * Detection of text (categorical) variables. (blue columns) | + | |
- | * Detection of Date/time variables in the dataset.(green columns) | + | |
- | * Reading records in backward order. | + | |
- | * Missing values detection. | + | |
- | + | ||
- | :!: Limitations: | + | |
- | * The module reads data only from the first sheet of .xls or .xlsx file.\\ | + | |
- | * The module can't read from password-protected files. | + | |
- | + | ||
- | Data files may consist of numeral, categorical (text) or date/time columns. There is a color-based indication of column types: | + | |
- | + | ||
- | * Categorical (text) - blue. | + | |
- | * Date/time - green. | + | |
- | * Missing values - light grey. (if the cell in not empty) | + | |
- | + | ||
- | Column names are allowed but not required. | + | |
- | GMDH Shell does not provide tools for file editing but allows a user to keep data file opened for editing. It is not required | + | |
- | + | ||
- | === Import dialog === | + | |
- | + | ||
- | - Click on the Import button {{: | + | |
- | - Select one of your data files, press OK. Then the Import configuration dialog opens. | + | |
- | - In the Import dialog set the importing parameters, press OK. | + | |
- | - If your project folder doesn' | + | |
- | - In the Template selection dialog choose a relevant [[Templates|Template]] and press OK. | + | |
- | + | ||
- | If your project folder contains several data files, the Import module makes all of them available | + | |
- | + | ||
- | The file selected during the import procedure receives a special status '' | + | |
- | + | ||
- | Current file path will appear in the title bar of GMDH Shell window: | + | |
- | {{:img: | + | The following connections are available: |
+ | * **CSV/ | ||
+ | * **ODBC/ | ||
+ | * **Order list connection** is used to extract historical demand from the raw list of customer orders. The module is only available in the ' | ||
+ | * **Sage Accounting connection** is used to extract retail sales data from 'Sage 50' and other versions of Sage accounting software. The module is only available in the ' | ||
- | ==== Import configuration | + | ===== CSV/ |
- | {{:img: | + | **File > New project > CSV/ |
+ | * The module works with columns. If series of observations are rows (not columns), **Transpose data** option must be turned-on. | ||
+ | * The module cannot read password-protected spreadsheets. | ||
- | == Read column labels from the 1st row == | + | In the right part of the connection dialog there is a preview window with preliminary import results. It uses the following color-based indication |
- | Reads column names from the first row of data file(s). The number of elements in the first row is used for detection of data table width. | + | * Numeric cells - black. |
+ | * Text cells (categorical data) - blue. | ||
+ | * Date/time cells - green. | ||
+ | * Missing values - light gray. (if the cell is not empty) | ||
+ | * Discarded columns - gray cells. | ||
- | == Read row labels (ID, timestamp) from column N == | + | {{ : |
- | If you have unique data row identifiers, | + | |
- | In case of multiple data files, the row labels will be taken from the '' | + | |
- | Quite often datasets have date marks such as year and month or week located in separate columns. Then you can compose timestamps from several columns using the option **Compose ID from several columns**. | + | == Column labels == |
- | In this case all aggregated columns in the dataset must be neighboring and the option **Read row lables from column N** must point to the first of them. | + | |
- | ==CSV delimiter== | + | **Column labels** are used as variable names, therefore you should either instruct GMDH Shell to **Use 1st row** for this purpose or to generate labels automatically: |
- | Sets a delimiter type. Applicable to CSV files only. | + | == Row labels (ID, timestamp) == |
- | ==Missing value mark== | + | **Row labels** are used to refer to observations, |
- | Import | + | If your dataset store timestamps elements (year, month, day, week and time) in separate columns, you can compose the timestamp from several columns. For this purpose use the option **Compose ID from several columns**. All timestamp elements in the dataset must be neighboring columns |
- | ==Consider text cells as missing== | ||
- | Replaces any non-numeric | + | ==Missing |
+ | Import | ||
- | ==== Data file examples ==== | + | **Missing value mark** is used to type in the missing mark or to select one of the standard marks. |
- | \\ | + | **Consider text cells as missing** is used to replace all non-numeric cells with regular NULL values. |
- | {{: | + | == Other settings == |
- | \\ | + | **Delimiter** is used to set the delimiter type such as coma, space, tab, or any other. This option is applicable only to ASCI files with delimiters ('' |
- | {{: | + | **Import all files with the same extension** is used to connect all files with the same extension within the current directory. |
+ | **Import all sheets of a workbook** is used to connect all sheets of one or many '' | ||
+ | **Transpose tables, i.e. read columns from rows** is used to support data series formatted as rows (not columns). | ||
- | ==== ODBC/OLEDB Import ==== | + | **Reverse order of rows** is used to support data series where the most resent observations are in the top of the table and the oldest observations are in the bottom. |
- | ODBC/ | + | **Import |
- | 1. First of all select the import module: | + | ==== ODBC/ |
- | Menu > File > ODBC/ | + | |
- | 2. In the appeared dialog select directory where all settings and quires | + | **File > New project > ODBC/OLEDB connection** is used to connect various databases. Most database vendors provide at least a minimal ODBC driver with their database. This import module requires knowledge of SQL quires. |
+ | ==== Sage Accounting connection ==== | ||
+ | **File > New project > Sage Accounting connection** is used to easily extract retail sales data from 'Sage 50' and other versions of Sage accounting software (www.sage.co.uk). The module is only available in the ' | ||
- | The import module can be selected in the toolbar | ||