ALAMO Version 18.7.13 User Guide
Automatic Learning of Algebraic Models
Copyright © 2018 The Optimization Firm. All rights reserved.
This guide provides the essential information you need in order to get the most from the ALAMO software. For more information about ALAMO, contact Nick Sahinidis at niksah@minlp.com.
1. Get Started
2. ALAMO at a Glance
3. ALAMO Examples
4. ALAMO Data and Options Specification Statements
Constrained regression
5. Keywords, handling, and regulatory information
ALAMO screen output
Error messages
Termination conditions
Compatibility with previous versions of ALAMO
GET STARTED
ALAMO is a powerful machine learning software tool that generates algebraic models from data and first principles. It was designed for students in fields ranging from engineering and computer science to sociology and psychology, and for anyone else who uses Excel for data analysis and data fitting. Working alongside the BARON optimization software, ALAMO interrogates models and guides experimental design toward the parts of the input space that need more attention, in order to give you the simplest possible model. While fitting data to models, ALAMO helps you determine where to run simulations or experiments, which models to fit, and how to assess your model's accuracy and simplicity. ALAMO also generates graphical representations of data and results to facilitate system analysis, optimization, and decision making.
To install ALAMO, you must be running Windows 64-bit (installer), Linux 64-bit, or Mac OS X 64-bit.
Important: Before you begin, make sure that you have a valid ALAMO license. The license file is sent via email, with installation instructions. If you have an ALAMO license file, save it on your computer before installing the software.
Install ALAMO for the first time. Download the software from the ALAMO Downloads page, then follow the instructions in the ALAMO Installation Wizard, clicking Next as appropriate. When the dialog box asks you to choose a location for the ALAMO system directory, install ALAMO and your ALAMO license file in the directory of your choice and add that directory to your system PATH. To complete the installation, click Install or Finish when prompted.
The installation will be saved to your computer in one large zip archive. You can now locate and open the ALAMO package (it will usually be in your Downloads folder).
GAMS: ALAMO makes calls to the third-party software GAMS. GAMS requires a separate installation and license, along with at least one of the mixed-integer quadratic programming solvers available under GAMS, preferably GAMS/BARON. Without a GAMS license, ALAMO will attempt to use enumerative approaches that may be time-consuming or impractical for large problems. Constrained regression specifically requires GAMS/BARON. For adaptive sampling, ALAMO makes calls to MATLAB (a separate installation and license are required). Adaptive sampling also requires the user to install the free MATLAB codes SNOBFIT and MINQ. If adaptive sampling is not used, ALAMO does not require MATLAB, SNOBFIT, or MINQ.
Installation of GAMS: The same installation steps apply to GAMS. Installing GAMS is optional but recommended. Install MATLAB only if you need ALAMO's adaptive sampling capabilities. Octave cannot be used in place of MATLAB.
ALAMO AT A GLANCE
You can run ALAMO on three platforms: Windows 64-bit (installer), Linux 64-bit, and Mac OS X 64-bit. All features are available on all platforms.
ALAMO has many user-friendly features, including:
- Spreadsheet-based GUI: Copy data from Excel into the GUI, then plot it to visualize your results. Input and store data, navigate and select cells, insert and delete entries, extend data, and much more.
- Simple command-line interface: Type commands to ALAMO to perform specific tasks easily and intuitively. Automate execution in the background, or call ALAMO through scripts on a large number of problems.
- Powerful embedded optimization technology: Use optimization technology alongside machine learning and statistical techniques to interrogate models, identify weaknesses, and guide experimental design toward the parts of the space requiring more attention. ALAMO is embedded with BARON, the world leader in optimization technology.
- Insightful graphs of data and results: Generate algebraic surrogate models of black-box systems for which a simulator or experimental setup is available.
With a few clicks, you can:
- Build and learn an algebraic model of a simulation or experimental black-box system. Import and plot your data, then choose from a number of different graphical formats. ALAMO will eliminate basis functions that are not numerically robust. It will go through a number of iterations and give you the best model, along with measures of the model's quality.
- Use previously collected data for model building. Open up the GUI. Go to File and then Import to import your data. Make sure to save your data file in Excel format.
- Call a user-specified (simulation) function to collect measurements.
- Enforce physical constraints such as response variable bounds, physical limits, and boundary conditions.
- Use a preexisting data set for model validation.
- Output models in simple algebraic form.
Build accurate algebraic models from data and first principles. Consider a system for which the outputs z are an unknown function f of the system inputs x, that is, z = f(x). ALAMO identifies a function f (i.e., a relationship between the inputs and outputs of the system) that best matches the data (pairs of x and corresponding z values), which are collected via simulation or experimentation.
To identify low-complexity surrogate models, ALAMO uses a minimal amount of data for a system that is described by a simulator or experiment. Unlike common techniques that investigate model sensitivities with respect to one basis function at a time (e.g., forward or backward regression), ALAMO's best subset selection techniques account for the synergistic effects between different basis functions in its model-building steps.
ALAMO’s three steps to construct surrogate models are:
- Generate an initial design of experiments and query the simulation.
- Build an algebraic model using the initial training set. The model is built with integer optimization techniques that select the best subset from a collection of candidate basis functions.
- Use an adaptive sampling methodology based on derivative-free optimization techniques to identify points where the model is inaccurate. Once these points are added to the training set, execution returns to the second step of the algorithm. The process continues until the third step confirms the accuracy of a previously built model.
Nonlinear integer programming techniques: Before ALAMO was developed, best subset selection techniques were too time-consuming for application to realistic data sets. While developing ALAMO, engineers devised nonlinear integer programming techniques that rely on the BARON software to solve these models in realistic computing times for many industrially relevant systems.
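To make best subset selection concrete, here is a minimal sketch that enumerates every subset of a small set of candidate basis functions, fits each subset by least squares, and keeps the subset with the lowest BIC-style score. This is the kind of enumerative approach that ALAMO falls back on without GAMS, not ALAMO's integer-programming formulation. The model form z = sum of beta_j * X_j(x), the candidate set, the absence of a constant term, the score, and the SSE floor are all illustrative assumptions.

```python
import itertools
import math

# Hypothetical candidate basis functions, loosely mirroring the e1.alm options
# (linear, monomial powers 2 and 3, sine, cosine). No constant term is included.
BASIS = {
    "x": lambda x: x,
    "x^2": lambda x: x ** 2,
    "x^3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "cos(x)": math.cos,
}


def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit(names, xs, zs):
    """Least-squares fit of z = sum_j beta_j * X_j(x) via the normal equations; returns (beta, SSE)."""
    cols = [[BASIS[name](x) for x in xs] for name in names]
    ata = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    atz = [sum(u * z for u, z in zip(ci, zs)) for ci in cols]
    beta = solve(ata, atz)
    sse = sum((z - sum(b * c[i] for b, c in zip(beta, cols))) ** 2 for i, z in enumerate(zs))
    return beta, sse


def bic(sse, n, k, floor=1e-12):
    """BIC-style score n*ln(SSE/n) + k*ln(n); SSE is floored so an exact fit stays finite."""
    return n * math.log(max(sse, floor) / n) + k * math.log(n)


def best_subset(xs, zs, max_terms=3):
    """Enumerate all subsets of up to max_terms basis functions and keep the lowest score."""
    best = None
    for k in range(1, max_terms + 1):
        for names in itertools.combinations(BASIS, k):
            beta, sse = fit(names, xs, zs)
            score = bic(sse, len(xs), k)
            if best is None or score < best[0]:
                best = (score, names, beta)
    return best[1], best[2]


if __name__ == "__main__":
    xs = list(range(-5, 6))      # the 11 inputs of Example 1
    zs = [x ** 2 for x in xs]    # outputs from z = x^2
    print(best_subset(xs, zs))
```

Because every subset is fitted, the work grows combinatorially with the number of candidate functions, which is why enumeration becomes impractical on realistic problems and ALAMO relies on BARON instead.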
Systematic approach to interrogate models: ALAMO utilizes derivative-free optimization techniques in its adaptive sampling step. These techniques provide a systematic approach to interrogate models, identify weaknesses, and guide experimental design toward parts of the space requiring more attention.
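The three-step loop described above (initial design, model building, adaptive sampling) can be sketched in a few lines. To stay short and self-contained, this sketch swaps in two simplifications that ALAMO does not use: the surrogate is a piecewise-linear interpolant of the training points rather than an algebraic model chosen by best subset selection, and the search for poorly modeled points is a brute-force scan of a fixed candidate grid rather than a derivative-free optimizer such as SNOBFIT. The simulator function and all parameter names are hypothetical.

```python
import bisect


def simulator(x):
    """Hypothetical black-box simulator standing in for a real simulation or experiment."""
    return x ** 2


def surrogate(xs, zs, x):
    """Piecewise-linear interpolant of the sorted training points (xs, zs), evaluated at x."""
    i = bisect.bisect_right(xs, x)
    if i == 0:
        return zs[0]
    if i == len(xs):
        return zs[-1]
    x0, x1 = xs[i - 1], xs[i]
    t = (x - x0) / (x1 - x0)
    return zs[i - 1] + t * (zs[i] - zs[i - 1])


def adaptive_sampling(xmin, xmax, n_init=3, tol=0.1, max_points=50, n_candidates=201):
    """Add the worst-predicted candidate point until the surrogate error is at most tol."""
    # Step 1: initial design of experiments (evenly spaced points, endpoints included).
    xs = [xmin + i * (xmax - xmin) / (n_init - 1) for i in range(n_init)]
    zs = [simulator(x) for x in xs]
    candidates = [xmin + i * (xmax - xmin) / (n_candidates - 1) for i in range(n_candidates)]
    while True:
        # Step 2 is implicit: the surrogate is rebuilt from the current (xs, zs) on each call.
        # Step 3: scan the candidates for the largest surrogate error.
        worst_x, worst_err = max(
            ((x, abs(simulator(x) - surrogate(xs, zs, x))) for x in candidates),
            key=lambda pair: pair[1],
        )
        if worst_err <= tol or len(xs) >= max_points:
            return xs, zs, worst_err
        # The error is zero at existing training points, so worst_x is always a new point.
        i = bisect.bisect_left(xs, worst_x)
        xs.insert(i, worst_x)
        zs.insert(i, simulator(worst_x))


if __name__ == "__main__":
    xs, zs, err = adaptive_sampling(-5.0, 5.0)
    print(len(xs), "training points, max candidate error", err)
```

The `max_points` budget guarantees termination even when the model never reaches the tolerance, mirroring the fact that real simulations are expensive to query.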
Constrained regression: ALAMO also features constrained regression, which is capable of enforcing theory-driven requirements on response variables, including response variable bounds, thermodynamic limitations, and boundary conditions. To enforce these requirements over the entire domain of input variables, ALAMO relies on BARON to solve semi-infinite nonconvex optimization problems.
Advanced methodology: The types of problems addressed by the ALAMO software have long been studied in the fields of statistics, engineering, and computer science. Though similar software can fit data to models, ALAMO’s capabilities extend even further. The bibliography at the end of this document offers more details of the methodology implemented in ALAMO and demonstrates the advantages of this methodology in comparison to currently utilized approaches, including classical regression and the lasso.
To run ALAMO on Example 1, which comes with the ALAMO software package, start by opening the GUI. The GUI opens a window that looks like a spreadsheet.
Import your data. Go to File then Import. Import Example 1. You will see one input (which is variable X), one output (which is variable Y), and 11 data points.
Plot the data. Click Plot Data on the upper left side of the screen. The data will automatically be plotted as z versus x1, with independent x-axes and a common y-axis.
Plot a histogram. To create a histogram, plot z instead of x1. In the x-axis dialog box, select the x1 variable. In the y-axis dialog box, select z as the variable. Then select Histogram from the variable options for the x-axis. You can now see how your measurements are distributed.
Accept the data. If you are satisfied with the data, select the Run ALAMO tab. You will be presented with options telling you what functions you have allowed in the model.
Select the information criterion. Under Basis functions options, you can select the information criterion.
Select the output format. Under Miscellaneous options, you can choose from a number of different output formats in the FUNFORM dialog box, such as EXCEL format. Hovering over each format option shows a description of that option as well as alternatives.
Run ALAMO. Click the Run ALAMO start button. ALAMO will run quickly, and you will see an ALAMO terminated successfully dialog box. It presents various information (e.g., the software version and the platform used).
Discard basis functions. While running, ALAMO will eliminate basis functions that are out of bounds, keeping only the ones that are numerically robust.
View results and measure the quality of the model. In the View Results tab, you will see quality metrics for the output variable (e.g., the sum of squared errors and the Bayesian information criterion).
Check the most important measure. Under Solution statistics, the most important measure is the worst-case error, which for this example should be zero: check that Max abs error is 0.0%.
Select your plot type. Under the Model fitting box, select the plot type from the drop-down options (e.g., parity plot or scatter plot). In the parity plot, all measured and calculated values lie along the diagonal, because the model predictions match the measurements exactly.
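The statistics in the View Results tab can be reproduced by hand as a sanity check. The sketch below computes common textbook definitions of the sum of squared errors (SSE), root-mean-square error, maximum absolute error, and R²; the function name is hypothetical, and ALAMO's exact definitions (for example, how Max abs error is scaled into a percentage) are not given here, so treat these as assumptions.

```python
import math


def fit_metrics(measured, predicted):
    """Return SSE, RMSE, maximum absolute error and R^2 for paired measurements/predictions."""
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    sse = sum(r * r for r in residuals)
    mean = sum(measured) / n
    sst = sum((m - mean) ** 2 for m in measured)  # total sum of squares about the mean
    return {
        "sse": sse,
        "rmse": math.sqrt(sse / n),
        "max_abs_error": max(abs(r) for r in residuals),
        "r2": 1.0 - sse / sst if sst > 0 else float("nan"),
    }


if __name__ == "__main__":
    xs = range(-5, 6)
    measured = [x ** 2 for x in xs]           # Example 1 data
    predicted = [1.0 * x ** 2 for x in xs]    # the model z = x^2
    print(fit_metrics(measured, predicted))
```

For Example 1 with the model z = x², every residual is zero, so SSE and the maximum absolute error are 0 and R² is 1, consistent with the diagonal parity plot described above.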
ALAMO EXAMPLES
ALAMO reads model data and algorithmic options from a text file in a relatively simple format. Although not required, it is strongly recommended that all ALAMO input files have the extension ‘.alm’. If the input file is named ‘test.alm’ and the ALAMO executable is named ‘alamo’, ALAMO will parse test.alm and solve the problem when you issue the command
alamo test
or
alamo test.alm
In addition to screen displays, ALAMO also provides results in the listing file ‘test.lst’ generated during the run. The .lst file is always stored in the execute directory, even when the .alm file is in a different path. During execution, ALAMO creates and utilizes a directory for storing various work files. When calling ALAMO, the user may include a second optional command line argument in order to specify ALAMO’s working directory,
alamo test.alm myscratchdir
where ‘myscratchdir’ denotes the name of ALAMO’s scratch directory. If this argument is not specified, ALAMO will create and use a directory named ‘almscr’ in the execute directory. If the scratch directory already exists, ALAMO erases it at the beginning of the run.
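When ALAMO is called from scripts on many problems, a thin wrapper keeps the command lines consistent. The Python sketch below only assembles the documented forms (`alamo <input>` with an optional scratch-directory argument). The helper names are hypothetical, and it assumes that the listing file takes the input file's base name with a `.lst` extension in the execute directory, as in the `test.alm` and `test.lst` example, and that ALAMO reports failure through a non-zero exit status.

```python
import os
import subprocess


def alamo_command(input_file, scratch_dir=None, executable="alamo"):
    """Build the argument list for 'alamo <input> [scratchdir]'."""
    cmd = [executable, input_file]
    if scratch_dir is not None:
        cmd.append(scratch_dir)
    return cmd


def listing_file(input_file, execute_dir="."):
    """Assumed listing-file path: <execute_dir>/<input base name>.lst."""
    stem = os.path.splitext(os.path.basename(input_file))[0]
    return os.path.join(execute_dir, stem + ".lst")


def run_alamo(input_file, scratch_dir=None, executable="alamo"):
    """Run ALAMO in the current directory and return the expected listing-file path."""
    # Assumption: a failed run returns a non-zero exit status, which raises CalledProcessError.
    subprocess.run(alamo_command(input_file, scratch_dir, executable), check=True)
    return listing_file(input_file)


if __name__ == "__main__":
    # Hypothetical batch: one scratch directory per problem so runs do not erase each other.
    for problem in ["e1.alm"]:
        stem = os.path.splitext(os.path.basename(problem))[0]
        print(run_alamo(problem, scratch_dir="scr_" + stem))
```

Giving each run its own scratch directory matters for batch use, because ALAMO erases an existing scratch directory at the start of a run.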
The following file is referred to as ‘e1.alm’ and pertains to learning the simple function z = x². The model contains one input and one output. The input is restricted between −5 and 5. An initial sampling data set is specified and contains 11 preexisting data points. The user options do not call for adaptive sampling, effectively requesting the best possible model that can be derived from the preexisting data set. Finally, the following functions are permitted in the model: linear, logarithmic, exponential, sine, cosine, and monomials with powers 2 and 3.
! Example 1 with data from z = x^2
ninputs 1
noutputs 1
xmin -5
xmax 5
ndata 11
linfcns 1
logfcns 1
expfcns 1
sinfcns 1
cosfcns 1
monomialpower 2 3
BEGIN_DATA
-5 25
-4 16
-3 9
-2 4
-1 1
0 0
1 1
2 4
3 9
4 16
5 25
END_DATA
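When the data come from another program, it can be convenient to generate the input file rather than type it. The sketch below produces text in the layout of e1.alm: NINPUTS and NOUTPUTS first, then XMIN, XMAX, and NDATA, one option per line, and the data between BEGIN_DATA and END_DATA. The helper name is hypothetical, and it assumes that each data row lists the input values followed by the output values, as in Example 1.

```python
def alm_text(xmin, xmax, noutputs, rows, options, comment=None):
    """Return the text of a minimal ALAMO-style input file (hypothetical helper)."""
    ninputs = len(xmin)
    if len(xmax) != ninputs:
        raise ValueError("xmin and xmax must each have NINPUTS entries")
    for row in rows:
        if len(row) != ninputs + noutputs:
            raise ValueError(f"data row {row} must have {ninputs + noutputs} entries")
    lines = []
    if comment:
        lines.append(f"! {comment}")  # a full-line comment, skipped by ALAMO
    lines += [
        f"ninputs {ninputs}",
        f"noutputs {noutputs}",
        "xmin " + " ".join(str(v) for v in xmin),
        "xmax " + " ".join(str(v) for v in xmax),
        f"ndata {len(rows)}",
    ]
    lines += [f"{key} {value}" for key, value in options]  # one option per line
    lines.append("BEGIN_DATA")
    lines += [" ".join(str(v) for v in row) for row in rows]
    lines.append("END_DATA")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [(x, x ** 2) for x in range(-5, 6)]
    options = [("linfcns", 1), ("logfcns", 1), ("expfcns", 1),
               ("sinfcns", 1), ("cosfcns", 1), ("monomialpower", "2 3")]
    with open("e1.alm", "w") as f:
        f.write(alm_text([-5], [5], 1, rows, options, "Example 1 with data from z = x^2"))
```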
Several additional examples of ALAMO input files accompany the distributed code.
The following rules should be followed when preparing an ALAMO input file:
- The name of the input file should include its exact path location if the file is not present in the execute directory.
- The name of the input file should not exceed 1000 characters in length.
- The input is not case sensitive.
- Most options are entered one per line, in the form of ‘keyword’ followed by ‘value’. Certain vector options are entered in multiple lines, starting with ‘BEGIN <keyword>’, followed by the vector input, followed by ‘END <keyword>’.
- Certain options must appear first in the input file. This requirement is discussed explicitly in option descriptions provided below.
- With the exception of arguments involving paths, character-valued options should not contain spaces.
- Blank lines, white space, and lines beginning with *, #, %, or ! will be skipped. Inline comments that are preceded by #, %, or ! are permitted in any line containing alphanumeric options.
- Blocks of comment lines are allowed using ‘BEGIN COMMENT’, followed by the block of comment lines, followed by ‘END COMMENT’; these comment blocks are entirely ignored by ALAMO.
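The comment and case rules above can be summarized as a small preprocessing pass. The sketch below is a hypothetical illustration of those rules, not ALAMO's own parser. It lower-cases only the keyword, so that path arguments keep their case, and it cuts each line at the first #, %, or ! character, which would also cut a path containing one of those characters.

```python
COMMENT_CHARS = "#%!"


def preprocess(text):
    """Return option lines as token lists, with comments removed and keywords lower-cased."""
    options = []
    in_comment_block = False
    for raw in text.splitlines():
        line = raw.strip()
        normalized = " ".join(line.upper().split())
        if normalized == "BEGIN COMMENT":
            in_comment_block = True
            continue
        if normalized == "END COMMENT":
            in_comment_block = False
            continue
        # Skip comment blocks, blank lines, and lines starting with *, #, % or !.
        if in_comment_block or not line or line[0] in "*" + COMMENT_CHARS:
            continue
        # Strip an inline comment introduced by #, % or !.
        cut = min((line.find(c) for c in COMMENT_CHARS if c in line), default=len(line))
        tokens = line[:cut].split()
        if tokens:
            tokens[0] = tokens[0].lower()
            options.append(tokens)
    return options


if __name__ == "__main__":
    sample = "! header\nNINPUTS 1   # one input\nBEGIN COMMENT\nxmin 99\nEND COMMENT\nxmax 5\n"
    print(preprocess(sample))
```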
ALAMO DATA AND OPTIONS SPECIFICATION STATEMENTS
The following parameters must be specified in the input file in the order listed below.
| Parameter | Description |
| NINPUTS | Number of model input variables. NINPUTS must be a positive integer. It defines the dimension of the vector x. |
| NOUTPUTS | Number of model output variables. NOUTPUTS must be a positive integer. It defines the dimension of the vector z. |
The following parameters must be specified in the input file in the order listed below and only after the scalar required parameters have been specified.
| Parameter | Description |
| XMIN | Row vector specifying minimum values for each of the input variables. This should contain exactly NINPUTS entries that are space delimited. |
| XMAX | Row vector specifying maximum values for each of the input variables. This should contain exactly NINPUTS entries that are space delimited. |
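The ordering and length requirements in these tables can be checked before calling ALAMO. The sketch below takes option lines already split into tokens (keyword first, lower-cased) and reports problems with NINPUTS, NOUTPUTS, XMIN, and XMAX. The function name and the extra check that each XMIN entry does not exceed the matching XMAX entry are assumptions, not documented ALAMO behavior.

```python
def check_required(options):
    """Check required parameters in a list of [keyword, *values] token lists; return error messages."""
    errors = []
    keys = [opt[0] for opt in options]
    # NINPUTS and NOUTPUTS must come first, in that order.
    if keys[:2] != ["ninputs", "noutputs"]:
        return ["NINPUTS and NOUTPUTS must be the first two options, in that order"]
    counts = {}
    for opt in options[:2]:
        if len(opt) != 2 or not opt[1].isdigit() or int(opt[1]) <= 0:
            errors.append(f"{opt[0].upper()} must be a positive integer")
        else:
            counts[opt[0]] = int(opt[1])
    if errors:
        return errors
    ninputs = counts["ninputs"]
    bounds = {}
    for key in ("xmin", "xmax"):
        if key not in keys:
            errors.append(f"{key.upper()} is missing")
            continue
        values = options[keys.index(key)][1:]
        if len(values) != ninputs:
            errors.append(f"{key.upper()} must have exactly {ninputs} entries, got {len(values)}")
        else:
            bounds[key] = [float(v) for v in values]  # raises ValueError on non-numeric entries
    if "xmin" in keys and "xmax" in keys and keys.index("xmin") > keys.index("xmax"):
        errors.append("XMIN must appear before XMAX")
    # Assumed sanity check: lower bounds should not exceed upper bounds.
    if len(bounds) == 2 and any(lo > hi for lo, hi in zip(bounds["xmin"], bounds["xmax"])):
        errors.append("each XMIN entry must not exceed the matching XMAX entry")
    return errors


if __name__ == "__main__":
    print(check_required([["ninputs", "1"], ["noutputs", "1"], ["xmin", "-5"], ["xmax", "5"]]))
```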