ML Basis Files
By Martin Elsman
An ML Basis File (MLB-file for short) is a file that lists the SML source files that make up a project or a library. An MLB-file can also reference other MLB-files, so projects can be organised hierarchically. References between MLB-files must not be cyclic, and MLB-files work well with smlpkg package management.
Syntax and Semantics of MLB-files
MLB-files have the file extension .mlb. The content of an MLB-file is
a basis declaration, for which the grammar is given below. We assume a
denumerably infinite set of basis identifiers Bid, ranged over by bid.
We use longbid to range over long basis identifiers, that is,
non-empty lists of basis identifiers separated by a period (.). Basis
identifiers can be used to name a group of compilation units, which
makes it possible to express source dependencies exactly, as a
directed acyclic graph, within one MLB-file.
    bdec   ::= bdec bdec                 sequential basis declaration
             | (empty)                   empty basis declaration
             | local bdec in bdec end    local declaration
             | basis bid = bexp          basis identifier binding
             | open longbid*             opening of bases
             | atbdec
             | path.mlb                  include

    atbdec ::= path.sml                  source file
             | path.sig                  source file

    bexp   ::= bas bdec end              basis declaration grouping
             | let bdec in bexp end      let expression
             | longbid
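As an illustration of the grammar, the following sketch shows an MLB-file that uses basis identifier bindings and open. The file names util.sml and main.sml and the basis identifiers stdlib and util are hypothetical and chosen only for this example; the comments describe the scoping one would expect under the usual MLB rules.

```mlb
(* Bind the Basis Library to the (hypothetical) basis identifier stdlib.
   Binding alone does not bring its declarations into scope. *)
basis stdlib = bas $(SML_LIB)/basis/basis.mlb end

(* util.sml (hypothetical) is compiled with the Basis Library in scope. *)
basis util = let open stdlib in bas util.sml end end

(* main.sml (hypothetical) may use both the Basis Library and util.sml. *)
open stdlib util
main.sml
```

Naming each group with a basis identifier makes the dependency of main.sml on util.sml, and of both on the Basis Library, explicit within a single MLB-file.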
In an MLB-file, one can reference source files and other MLB-files
using absolute or relative paths. Relative paths are relative to the
location of the MLB-file. Paths can reference so-called MLB path
variables using the $(VAR) notation, where VAR is the name of an MLB
path variable. In particular, MLB-files can reference the Basis
Library, using the MLB path variable $(SML_LIB), by including the path
$(SML_LIB)/basis/basis.mlb. An MLB path variable V is resolved
according to the following rules, tried in order:
- First, look for an environment variable with name V.
- Then, look for a definition of the variable V in one of the files provided with the option --mlb_path_maps (see mlkit -help for details).
- Then, look for a definition of V in the user’s local file $(HOME)/.mlkit/mlb-path-map.
- Finally, look for a definition of V in the global file /usr/local/etc/mlkit/mlb-path-map.
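As a small illustration of path variables, the sketch below references a library through a path variable named MY_LIBS. Both the variable MY_LIBS and the library path json/json.mlb are hypothetical and are not defined by the MLKit; only $(SML_LIB) is taken from the text above.

```mlb
(* The Basis Library, located through the SML_LIB path variable. *)
$(SML_LIB)/basis/basis.mlb

(* A hypothetical third-party library, located through the
   hypothetical path variable MY_LIBS. *)
$(MY_LIBS)/json/json.mlb

(* A hypothetical source file, given by a path relative to this MLB-file. *)
main.sml
```

By the first rule above, defining an environment variable named MY_LIBS before invoking the MLKit would be one way to give the variable a value; the path-map files are consulted only if no such environment variable exists.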
MLB-files may contain Standard ML style comments. The identifiers
declared by an MLB-file are the union of the identifiers declared by
the source files in the MLB-file, excluding source files that appear
between local and in. As an example of the use of basis identifiers
and local to limit which identifiers are declared by an MLB-file,
consult the MLB-file basis/basis.mlb.
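The effect of local on the declared identifiers can be sketched as follows. The file names helper.sml, api.sig and api.sml are hypothetical and stand for an internal helper module and a public interface.

```mlb
local
  (* Visible to the files after "in", but not declared by this MLB-file. *)
  $(SML_LIB)/basis/basis.mlb
  helper.sml   (* hypothetical: internal structures, signatures, functors *)
in
  (* Only the identifiers declared by these files are declared by the MLB-file. *)
  api.sig      (* hypothetical: signature of the public interface *)
  api.sml      (* hypothetical: structure implementing that signature *)
end
```

An MLB-file that includes this one would see the identifiers declared in api.sig and api.sml, but not those declared in helper.sml, and it would have to include the Basis Library itself if it needs it.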
Every source file must contain a Standard ML top-level declaration; the scope of the declaration is all the subsequent source files mentioned in the MLB-file and all other MLB-files that reference this MLB-file. Thus, a source file may depend on source files mentioned earlier in the MLB-file and on other referenced MLB-files. The meaning of an entire MLB-file is the meaning of the top-level declaration that would arise by expanding all referenced MLB-files and then concatenating all the source files listed in the MLB-file (with appropriate renaming of declared identifiers of source files that are hidden using local), in the order they are listed, except that each MLB-file is executed only the first time it is imported. Strictly speaking, this description is an approximation: MLB-files can also be used to hide signature and functor declarations, which cannot be accommodated using Standard ML top-level declarations alone.
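The rule that each MLB-file is executed only the first time it is imported can be illustrated with three hypothetical MLB-files, shown together in one listing with a comment marking where each file begins. All file names other than the Basis Library path are assumptions made for this example.

```mlb
(* File a.mlb *)
$(SML_LIB)/basis/basis.mlb
a.sml

(* File b.mlb *)
$(SML_LIB)/basis/basis.mlb
a.mlb
b.sml

(* File main.mlb *)
a.mlb
b.mlb
main.sml
```

Under the semantics described above, compiling main.mlb should correspond to the sequence: the sources of the Basis Library, a.sml, b.sml, main.sml. The second references to basis.mlb and a.mlb (from b.mlb) contribute no further source files, because those MLB-files have already been executed.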
Managing Compilation and Recompilation with MLB-files
The MLKit has a system for managing compilation and recompilation of MLB-files. The system guarantees that the result of first modifying one or more source files and then using the separate compilation system to rebuild the executable is the same as if all source files were recompiled.
Thus, the separate compilation system is a way of avoiding recompiling
parts of a (possibly) long sequence of declarations, while ensuring
that the result is always the same as if one had compiled the entire
program from scratch. As an example, consider the MLB-file
(kitdemo/scan.mlb) for a text scanning example. It contains the
following three lines:
$(SML_LIB)/basis/basis.mlb
lib.sml
scan.sml
The source files for the project are lib.sml and scan.sml, which are
both located in the directory where scan.mlb is located. Whereas each
of the source files lib.sml and scan.sml depends on the Basis Library,
the source file scan.sml also depends on lib.sml.
Compiling an MLB-file is easy; simply give it as an argument to the MLKit executable. Once an MLB-file has been compiled, the MLKit automatically detects, on subsequent compilations, which source files have been modified (by checking file modification dates). After a project has been successfully compiled and linked, it can be executed by running the command run in the working directory.
The MLKit compiles the source files of an MLB-file one at a time, in the order they are mentioned in the MLB-file. A source file is compiled under a given set of assumptions, which provide, for instance, region-annotated type schemes with places for the free variables of the source file. Compilation of a source file also gives rise to exported information about declared identifiers. Exported information may occur in the assumptions for source files mentioned later in the MLB-file.
There are two rules that govern when a source file is recompiled. A source file is recompiled if either (1) the user has modified the source file or (2) the assumptions under which the source file was previously compiled have changed. To avoid unnecessary recompilation, the assumptions for a source file depend only on its free identifiers. Moreover, if a source file has been compiled earlier, the MLKit seeks to match the new exported information to the old exported information by renaming generated names to the names generated when the source file was first compiled. Matching allows the compiler to use fresh names (stamps) for implementing generative data types, for instance, and still avoid recompiling a source file merely because source files on which it depends have been modified.
Let us assume that we modify the source file lib.sml of the text
scanning example, after having compiled the MLB-file kitdemo/scan.mlb
once. When compiling the MLB-file again, the MLKit checks whether the
assumptions under which the source file scan.sml was compiled have
changed, and if so, recompiles scan.sml. Modifying only comments or
string constants inside lib.sml or extending its set of declared
identifiers does not trigger recompilation of scan.sml.
Some of the information a source file depends on is the ML type schemes of its free variables. It also depends on, for example, the region-annotated type schemes with places of its free variables. Thus it can happen that a source file is recompiled even though the ML type assumptions for free variables are unchanged. For instance, the region-annotated type scheme with place for a free variable may have changed, even though the underlying ML type scheme has not.
As an example, consider what happens if we modify the function
readWord in the source file lib.sml so that it puts its result in a
global region. This modification will trigger recompilation of the
source file scan.sml, because the assumptions under which it was
previously compiled have changed. Besides changes in region-annotated
type schemes with places, changes in multiplicities and in physical
sizes of formal region variables of functions may also trigger
recompilation.
For details about the implementation of the recompilation scheme used for ML Basis Files in the MLKit, see