This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
import [2012/06/12 13:34] Oleksiy [CSV/XLS/XLSX Import] |
import [2014/02/11 09:48] Oleksiy [Import] |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
====== Import ====== | ====== Import ====== | ||
- | The import module reads data from data sources. | + | Import module |
- | * CSV/ | + | |
- | * ODBC/OLEDB Import | + | |
- | + | ||
- | + | ||
- | ==== CSV/ | + | |
- | + | ||
- | The module | + | |
- | * Tables that store variables as columns (by default). | + | |
- | * Tables that store variables as rows. | + | |
- | * CSV, XLS and XLSX file formats. | + | |
- | * Reading table headings from the the first row. | + | |
- | * Reading row labels from a column that contains unique Timestamps | + | |
- | * Reading row labels composed from two or three separate columns: Year + Month, Year + Week, Year + Quarter, Year + Month + Day. | + | |
- | * Detection | + | |
- | * Detection of Date/time variables | + | |
- | * Reading records in backward order. | + | |
- | * Missing values detection. | + | |
- | + | ||
- | :!: Limitations: | + | |
- | * The module reads data only from the first sheet of .xls or .xlsx file.\\ | + | |
- | * The module can't read from password-protected files. | + | |
- | + | ||
- | Data files may consist of numeral, categorical (text) or date/time columns. There is a color-based indication of column types: | + | |
- | + | ||
- | * Categorical (text) - blue. | + | |
- | * Date/time - green. | + | |
- | * Missing values - light grey. (if the cell in not empty) | + | |
- | Column names are allowed but not required. | + | The following connections |
- | GMDH Shell does not provide tools for file editing but allows a user to keep data file opened for editing. It is not required | + | * **CSV/ |
+ | * **ODBC/ | ||
+ | * **Order list connection** | ||
+ | * **Sage Accounting connection** is used to extract retail sales data from 'Sage 50' | ||
- | === Import dialog | + | ===== CSV/ |
- | - Click on the Import button {{: | + | **File > New project > CSV/ |
- | - Select one of your data files, | + | |
- | - In the Import dialog set the importing parameters, press OK. | + | |
- | | + | |
- | - In the Template selection dialog choose a relevant [[Templates|Template]] and press OK. | + | |
- | If your project folder contains several data files, | + | In the right part of the connection dialog there is a preview window with preliminary import results. It uses the following color-based indication |
+ | * Numeric cells - black. | ||
+ | * Text cells (categorical data) - blue. | ||
+ | * Date/time cells - green. | ||
+ | * Missing values - light gray. (if the cell is not empty) | ||
+ | * Discarded columns - gray cells. | ||
- | The file selected during the import procedure receives a special status '' | + | {{ : |
- | Current file path will appear in the title bar of GMDH Shell window: | + | == Column labels == |
- | {{:img: | + | **Column labels** are used as variable names, therefore you should either instruct GMDH Shell to **Use 1st row** for this purpose or to generate labels automatically: '' |
- | ==== Import configuration ==== | + | == Row labels (ID, timestamp) |
- | {{: | + | **Row labels** are used to refer to observations, |
- | == Read column labels from the 1st row == | + | If your dataset store timestamps elements (year, month, day, week and time) in separate columns, you can compose |
- | Reads column names from the first row of data file(s). The number of elements in the first row is used for detection | + | |
- | == Read row labels (ID, timestamp) from column N == | ||
- | If you have unique data row identifiers, | ||
- | In case of multiple data files, the row labels will be taken from the '' | ||
- | Quite often datasets have date marks such as year and month or week located in separate columns. Then you can compose timestamps from several columns using the option **Compose ID from several columns**. | + | ==Missing values== |
- | In this case all aggregated columns in the dataset must be neighboring and the option **Read row lables from column N** must point to the first of them. | + | |
- | ==CSV delimiter== | + | Import |
- | Sets a delimiter | + | **Missing value mark** is used to type in the missing mark or to select one of the standard marks. |
- | ==Missing value mark== | + | **Consider text cells as missing** is used to replace all non-numeric cells with regular NULL values. |
- | Import | + | == Other settings == |
- | ==Consider text cells as missing== | + | **Delimiter** is used to set the delimiter type such as coma, space, tab, or any other. This option is applicable only to ASCI files with delimiters ('' |
- | Replaces any non-numeric values | + | **Import all files with the same extension** is used to connect all files with the same extension within the current directory. |
+ | **Import all sheets of a workbook** is used to connect all sheets of one or many '' | ||
- | ==== Data file examples ==== | + | **Transpose tables, i.e. read columns from rows** is used to support data series formatted as rows (not columns). |
- | \\ | + | **Reverse order of rows** is used to support data series where the most resent observations are in the top of the table and the oldest observations are in the bottom. |
- | {{: | + | **Import rows starting from** is used to skip a number of rows in the top of the table, in particular this option is used to skip the header information. |
- | \\ | + | ==== ODBC/OLEDB connection ==== |
- | {{: | + | **File > New project > ODBC/OLEDB connection** is used to connect various databases. Most database vendors provide at least a minimal ODBC driver with their database. This import module requires knowledge of SQL quires. |
+ | ==== Sage Accounting connection ==== | ||
- | ==== ODBC/OLEDB Import ==== | + | **File > New project > Sage Accounting connection** is used to easily extract retail sales data from 'Sage 50' and other versions of Sage accounting software (www.sage.co.uk). The module is only available in the ' |
- | This preprocessor will be available soon. | ||