-
Notifications
You must be signed in to change notification settings - Fork 89
Macros extended course. Part 4
I have already touched on the matter of typing inside Nemerle macros in the second part of this article. However, this problem is so important that I decided to devote a separate article to it. Besides, in the time passed since having written the second part, I managed to study this problem more deeply and can now relate it more fully.
WARNING |
---|
I want to start by warning that this course, and especially this (fourth) part of, is meant for anyone, but beginners. I fear that an unprepared reader, having read about the problems I encountered, might wince and say something like: "oh, no, macros are too hard…" The means of solving simple problems must also be simple. However, the hard problems require adequately powerful means to solve. Powerful means can be very difficult to make simple. Macros are a powerful tool for solving the hardest problems that cannot be solved any other way. But, at the same time, macros can be very simple to use for solving the simple problems, such as creation of syntactic sugar, similar to the operator if/else, described in the previous parts. If you begin reading this part, and get the impression that everything is overly complicated, just stop reading it and start with simpler things. |
Many macro writers, having encountered the necessity of extracting and processing type information inside macros, find it difficult. There is an objective reason behind it: Nemerle comes with a sophisticated type inference system that makes extensive use of so-called "delayed typing". This means that a macro (expanded at the method typing stage) might have incomplete type information. However, applying some knowledge and experience, one could create macros of almost any complexity.
I think that in order to make effective use of the typing subsystem, one should have at least an approximate idea of how it works. Hence, I will try to give a short summary of the typing process:
One could take the following simplified view of the compilation process. The compiler:
- Reads type descriptions imported from external assemblies.
- Pre-parses files, creating a token tree, folded by brackets. At this time, a so-called "environment" is formed. The environment contains information about open namespaces and, therefore, the available macros and operators.
- Parses, building up a top-level AST (TopDeclaration) and untyped member body ASTs (PExpr).
- Builds up the type tree. The tree is built based on information about the types imported from other type trees and the TopDeclaration of parsed source files.
- Extends inheritance (sets relationships between types).
- Types type members, i.e. links textual parameter and return value descriptions with the type tree types. The compiler does not allow type inference for type members, therefore the types of all members become known at the end of this stage.
- Types member bodies. It is this stage that macro expansion and type inference are done at. So, it can be safely said that this is the most complicated compilation stage. The result of this stage is typed AST (TExpr).
- Performs various transformation of the typed AST, optimization etc.
- Generates IL.
NOTE |
---|
We should make a separate mention of type binding for properties, fields, function arguments and return types, since this information might be required by macros. |
Type-level and member-level macros working at the MacroPhaseWithTypedMembers stage can use the types bound by the compiler, since they run after stage 6, listed in the previous section. The work at this stage is done with the appropriate Builders (TypeBuilder, MethodBuilder, FieldBuilder, PropertyBuilder, and EventBuilder), which make type descriptions available. For instance, in order to query method parameter types, one could call the MethodBuilder.GetParameters() method, which returns a list of typed parameter descriptions, making it possible to get at types of specific parameters.
You can read more about this list in the first part of this article and about matching macro type parameters to the phases in the third part.
However, there is a problem. The problem is that it at the WithTypedMembers stage, it is no longer possible to change the type hierarchy and member definitions. Or, to be precise, definitions can be changed, but possibly leading to dire consequences, so the default assumption should be that only method bodies can be modified (including field initializers and property accessors).
So, what do you need to use type information from the code being compiled to generate some new top-level code? This should only be done at early compilation stages (BeforeInheritance and BeforeTypedMembers), but these early stages lack type information!
The solution to this problem comes in the form of manual type binding of types represented by untyped AST (PExpr). This could be done using the methods Typer.BindFixedType (used to be called MonoBindType) and Typer.BindType:
public BindFixedType(@type : PT.PExpr) : FixedType
public BindType(@type : PT.PExpr) : TypeVar
public BindType(@type : PExpr, allow_tyvars : bool, check_parms : bool) : TypeVar
The difference between these lies in that the first method only lets you bind a fully defined type (without the placeholder symbol "_") and returns a fixed type (FixedType), while the other two methods let you bind types that are not fully defined (using placeholder symbols "_"). The second version of the BindType method lets you explicitly state, whether type variables are acceptable for binding (using the allow_tyvars parameter) and whether it is necessary to check constraints that could be set by a generic method, i.e. the "where T: .." construct (using the check_parms parameter). In essence, when you set the last two parameters to true, the method will behave analogously to BindFixedType.
If you allow the use of variable types (allow_tyvars = true), then you will be able to bind partially defined types, such as "System.Collections.Generic.List[_]". Note the underscore in the type name in place of the generic argument. This leads to a free type variable being substituted in place of the given argument. In this way, you could get a so-called "open type". You cannot create an instance of an open type, but you can query its properties or unify it with another type (more about this later in this article).
In order to demonstrate the use of the Typer.BindType method, I created a macro that verifies whether given types are interfaces. If the types are not interfaces, then an error message is output, otherwise nothing happens. The given macro is not very practical, but it could be a part of a more sophisticated one. For instance, this capability was required by one of the developers of a Mock framework for implementation of the GenerateMockObjects macro. This macro needed to take a list of types as input and (supposedly) generate Mock objects implementing these interfaces. Of course, the author had to check whether the given types were interfaces. So, here is the code for the AssertListedTypeIsInterface macro, implementing this check at compile type:
using Nemerle.Compiler;
using PExpr = Nemerle.Compiler.Parsetree.PExpr;
[Nemerle.MacroUsage (Nemerle.MacroPhase.BeforeTypedMembers,
Nemerle.MacroTargets.Assembly)]
macro AssertListedTypeIsInterface(params types : list[PExpr])
{
def typer = Nemerle.Macros.ImplicitCTX();
foreach(tyAst in types)
{
def bindTy = typer.BindType(tyAst, true, false);
match (bindTy.Hint)
{
| Some(ty) =>
when (!ty.IsInterface && ty.ToString() != "object")
Message.Error(tyAst.Location, $"Type $ty is not a interface type");
| None => Message.Error(tyAst.Location, "Type inference not allowed here");
}
}
}
And here is the application of this macro:
using System;
using System.Console;
using Nemerle.Utility;
[assembly: AssertListedTypeIsInterface(Collections.Generic.List[_], ints,
Collections.Generic.IList[_])]
module Program
{
Main() : void { }
}
Compilation of this creates the following messages in the Output window:
Main.n(5,40):Error: Type System.Collections.Generic.List[?] is not a interface type Main.n(5,69):Error: unbound type name 'ints'
Similarly, you can use BindType and BindFixedType to bind method arguments and return types when compiling at BeforeInheritance and BeforeTypedMembers stages.
By "typing" we mean the method body typing stage.
Why only methods, but not properties or field initializers? Well, simply because field initializers, as well as property getters and setters, are first transformed into methods. Still, there are some nuances. For instance, field initializers are actually copied into type constructors. But, these are minor details. The important thing is that the typer only types method bodies (including constructors); argument and return value types are known before typing.
Simplified, method body typing consists of
- Reading untyped AST (PExpr) expressions in the order they are listed in program source code.
- Macro expansion. This results in untyped AST with no macro calls.
- Typing the AST constructed in the preceding step. This results in a typed AST (TExpr).
- Delayed typing object calculation, which could be left over from the preceding stage.
So far so good, but when you scratch a little deeper, the picture becomes more complicated…
Typing is done using the Typer object (from the Nemerle.Compiler namespace). It reads every expression (valid combination of AST branches) of the untyped AST – PExpr, checking whether the expression matches a pattern described in one of the expression-level macros visible in the current scope. If a match is found, then the expression is transferred to the macro for processing. Before transferring the expression, the typer decomposes it according to the template described by the macro. At this point, the expression can be divided into subexpressions corresponding to the macro's parameters.
For instance, if the following expression is encountered in code:
if (a > b) a - b else b - a
and the scope contains the macro "if else", then the compiler picks out the expressions:
a > b a - b b - a
and transfers them into the corresponding parameters of the standard macro:
macro @if (cond, e1, e2)
syntax ("if", "(", cond, ")", e1, Optional (";"), "else", e2)
{
<[
match ($cond)
{
| true => $e1
| _ => $e2
}
]>
}
The goal of the macro is to transform the expressions given to it as parameters into some output expression, which the typer will use for subsequent typing. Put more formally, the expression-level macro is a function X -> PExpr, where X is a zero or more PExpr, or an array of PExpr (to emulate variadic arguments). If the code returned by the macro contains other macros, then they are also expanded.
Once no more macros remain, the typer types the resulting untyped AST, transforming it into a typed one (TExpr).
It seems that everything is so easy; it is just wonderful. But the devil is, as it happens, in the details. In order to demonstrate this, lets take a look at a simple example.
mutable x = 1;
WriteLine(x);
x = 2;
The example might seem contrived, but it serves well to demonstrate the problem, while more realistic examples are too cumbersome.
So, how does the typer proceed?
It processes the code from left to right, from top to bottom, trying to type everything properly.
Encountering the line:
mutable x = 1;
the compiler separates the expression into a variable declaration:
mutable x
and the initializing value.
As you can see, the type of the variable is not set explicitly. The compiler gives it type "new type variable" (variable with no type constraints). This means that at this moment the type is unknown, but the typer counts on it being calculated and substituted as the variable type later on. Actually, it is one of the typer's functions to convert "free type variables" (not fixed, see below) into "concrete types", also known as "fixed types".
Inside the compiler, type variables are represented by the class TypeVar from the Nemerle.Compiler namespace. Fixed types are represented by the FixedType variant type (from the same namespace). FixedType inherits TypeVar and, consequently, can be used anywhere TypeVar is required. This is why all the type links in the compiler (and, therefore, in macros) are given in the form of TypeVar.
In the next section we give a detailed description of the types TypeVar and FixedType. For now, lets continue looking at the example.
So, the typer gives a "new" type (new TypeVar instance) to the variable "x" and moves on with typing the expression. At this stage, the expression is required to have the same type as the variable. This requirement is materialized by passing the variable's type as the parameter "expected" into the expression-typing function. Since the expression consists of a single literal, the typer quickly infers its type and, subsequently, the type of the whole expression – int.
Next, it is verified that the type of the expression matches the type of the variable. The type of the variable was not set explicitly so, in essence, any type will do. This leads to the requirement being satisfied; hence the type of the variable now references the type of the expression, i.e. int, in our case. In fact, the type variable for the variable "x" becomes an alias for the type variable of the expression (or, more precisely, the type variables are unified, but more about this later). This way, the typer infers the type of the variable to be int.
When the typer parses the next line, it discovers a call to the static method WriteLine of class Console, which receives the expression "x" (a reference or, in other words, a name).
The typer tries to type the expression passed as the parameter, discovers that it is a reference to a local variable and returns a link to it.
Since the method WriteLine is overloaded (this functions has tons of overloads), the compiler had to create a list of these overloads and find out which should be called in this case.
In order to do this, the compiler types the expression "WriteLine" and gets the following list of possible overloads:
WriteLine(format : string, params arg : array[object]) : void
WriteLine(format : string, arg0 : object, arg1 : object, arg2 : object) : void
WriteLine(format : string, arg0 : object, arg1 : object) : void
WriteLine(format : string, arg0 : object) : void
WriteLine(value : string) : void
WriteLine(value : object) : void
WriteLine(value : ulong) : void
WriteLine(value : long) : void
WriteLine(value : uint) : void
WriteLine(value : int) : void
WriteLine(value : float) : void
WriteLine(value : double) : void
WriteLine(value : decimal) : void
WriteLine(buffer : array[char], index : int, count : int) : void
WriteLine(buffer : array[char]) : void
WriteLine(value : char) : void
WriteLine(value : bool) : void
WriteLine() : void
Next, in order to resolve the overload, the typer attempts to match the number of parameters and their types with the number and types of the arguments passed.
Since it is known that the type of the variable "x" is int, the compiler can resolve the overload. In the process of resolution it finds that the best match is:
WriteLine(value : int) : void
In this way, the code is considered typed, and no problems are encountered.
Now, lets take a look at a slightly modified example:
mutable x;
WriteLine(x);
x = 2;
At first glance, very little has changed (only the initializing expression is gone), but this adjustment changes everything.
The typer cannot infer the type of the variable "x" until it types the expression "x = 2". This means that WriteLine overloading cannot be resolved on the first pass. The typer creates a delayed typing object – DelayedTyping, which encapsulates information about the available overloads. A reference to this object is placed in the typed AST. This way, instead of an AST for calling a concrete method, a placeholder AST is created to hold information about calling a certain, but not yet exactly known, method from a given list.
This way, when the process of method body typing ends, the typed AST contains incompletely typed fragments (unresolved overloading, in our case). All incompletely typed AST fragments are placed into a special queue. At the end of the typing process, the typer goes through this queue and tries to type each of the delayed typing objects in turn.
The reasoning is that type information acquired later may affect the preceding code fragments, so that they could be typed.
In our example, the fragment "x = 2" affects the type of the variable "x", letting the delayed typing object containing the list of WriteLine overloads to be typed successfully. This way, delayed typing makes it possible to infer types not only from initialization (as done in C# 3.0), but also from usage. Moreover, usage allows inference of not only primary types, but also type parameters. The following fragment shows parameter (TKey and TValue) inference for the Dictionary type:
def dic = Dictionary(); // no need to specify parameters!
dic["test"] = 123;
In this example, the typer infers that the first type parameter (TKey) for the dictionary type should be string, and the second (TValue) – int.
The example from the previous section was needed to explain the following problem.
As I said above, macros are expanded during the typing process. In essense, a macro operates with yet untyped AST. However, the presence of even that greatly expands macro capabilities. In essense, a macro that operates with only untyped AST is very limited; it can only perform the simplest code transformation (as, for instance, the aforementioned "if else" macro"), but for such complicated macros as custom DSL implementations (including Linq support implementation), this is obviously insufficient.
It is very easy to get type information acquired by the time of macro expansion. This can be done in one of two ways:
- Via direct PExpr typing with the Typer.TypeExpr() method. This method accepts an untyped expression (PExpr) and (optionally) the expected expression type.
- By accessing the PExpr.TypedObject property. It is set by the compiler during the typing process. This property has type object, since it is used for storing different types of objects (for instance, a typed parameter representation or a typed expression), but it usually contains a reference to a corresponding TExpr (the corresponding typed AST). Note that the value of the TypedObject property is only available after PExpr (directly, or as part of another PExpr) is typed by the Typer.TypeExpr() method.
If our example is expanded in the following way:
mutable x;
WriteLine(MyMacro(x));
x = 2;
then it is apparent that the type of the variable "x" will not be known when the macro is called. Of course, we can get the typed AST for the variable "x", but then we will only see that it is definitely a variable reference. We will not know its final type, since it will only be available later, when the compiler types the expression "x = 2".
Evidently, if the proper functioning of the macro requires knowing the final type of the expression, we cannot do what we need directly within the code of the macro. However, we can use the delayed typing subsystem and do what we need, when the required type has been inferred by the typer and accessible to us.
But, here is the dilemma… the macro must return something! What ever do we do?
This kind of predicament can be resolved by using a PExpr placeholder. Here is its definition:
public variant PExpr : Located
{
...
| Typed { body : Typedtree.TExpr; }
This placeholder stores a branch of typed AST (TExpr), and the TExpr can be used to store that very object of delayed typing:
public variant TExpr : Located
{
...
| Delayed { susp : Typer.DelayedTyping; }
The given object should be placed in the delayed typing queue, which will make the typer call it at the delayed typing stage, which will eventually give us the required expression's type and allow us to do whatever we have to. The easiest way to do this is to use the Typer.DelayMacro() method. Below, we give the overall pattern for this approach:
macro SomeMacro(param1, param2, ..., paramN)
{
def typer : Typer = Macros.ImplicitCTX();
// some actions for analysis and parameter transformation
...
// Analyses the typed expression it gets as a parameter
// and uses the results of this analysis to create a
// resulting untyped expression.
def makeResult(arg : TExpr) : PExpr
{
...
}
def tExpr = typer.TypeExpr(param2);
if (tExpr.Type.Hint.IsSome) // if the type is known...
makeResult(tExpr) // ... perform the transformation immediately!
else
{ // otherwise, put off the transformation until better times.
typer.DelayMacro(lastTry =>
if (tExpr.Type.Hint.IsSome) // if the type is known...
Some(makeResult(tExpr)) // ... perform the transformation.
else
{
when (lastTry)
Message.Error(
$"The type of ‘$param2’ could not be inferred.");
None()
})
}
}
In principle, all of this could be written slightly simpler by removing one of the checks, since the DelayMacro() method immediately attempts to perform delayed typing, calling the lambda it receives. However, our version may be more efficient, if the type is available right away, since we avoid creating additional objects (a lambda with a closure, option[T].Some(), PExpr.Typed, TExpr.Delayed, DelayedTyping and, possibly, a few other temporary objects. If your macro is to be used widely, it is better to not be lazy and make the additional check. If it is not used more than several dozen times, however, you can use the simplified scheme:
macro SomeMacro(param1, param2, ..., paramN)
{
def typer : Typer = Macros.ImplicitCTX();
// some actions for analysis and parameter transformation
...
// Analyses the typed expression it gets as a parameter
// and uses the results of this analysis to create a
// resulting untyped expression.
def makeResult(arg : TExpr) : PExpr
{
...
}
def tExpr = typer.TypeExpr(param2);
typer.DelayMacro(lastTry =>
if (tExpr.Type.Hint.IsSome) // if the type is known...
Some(makeResult(tExpr)) // ... perform the transformation.
else
{
when (lastTry)
Message.Error(
$"The type of ‘$param2’ could not be inferred.");
None()
})
}
The lambda passed to the DelayMacro method can conduct more complicated analysis. For instance, not only could you check whether the type has been inferred or not, but you can take a look inside. The macro "lock" from the standard Nemerle library (see https://github.com/rsdn/nemerle/tree/master/macros/core.n) is an excellent and informative example of such a macro. Here is its code, commented:
macro @lock (lockOnExpr, body)
syntax ("lock", "(", lockOnExpr, ")", body)
{
def typer = Macros.ImplicitCTX();
// Type the expression, use the result to block…
def lockOnTExpr = typer.TypeExpr(lockOnExpr);
// Create a delayed typing object, taking a lambda as a parameter.
typer.DelayMacro(lastTry =>
// the value of the parameter lastTry – true, means that the last
// attempt to perform delayed typing is performed, and it is
// necessary to output a diagnostic message about possible
// reasons for failure.
// get the type inferred by this point and pick it apart:
match (lockOnTExpr.Type.Hint)
{
// The type is known. It is a FixedType.Class. Check, whether it is
// a value type…
| Some(Class(typeInfo, _)) when typeInfo.IsValueType =>
// ...if it is, and this is the last attempt at delayed typing,
// then produce an error message.
when (lastTry)
Message.Error (lockOnExpr.Location,
$"`$typeInfo' is not a reference type as required "
"by the lock expression");
None() // delayed typing failed
| None => // ... the type is still unknown
// ...if it is the last attempt at delayed typing, then produce
// an error message (type could not be inferred).
when (lastTry)
Message.Error (lockOnExpr.Location,
"compiler was unable to analyze type of locked object, but it "
"must verify that it is reference type");
None()
| _ => // in some cases, the type is unknown, but it is not a value type.
// form the code for blocking and automatic unblocking…
def result =
<[
def toLock = $(lockOnTExpr : typed);
System.Threading.Monitor.Enter(toLock);
try { $body }
finally { System.Threading.Monitor.Exit(toLock); }
]>;
// inform of success by returning the result, wrapped in Some.
}
);
}
I think this code, together with comments, is clear without additional explanations. The only thing that I would like to clarify is the quotation "$(lockOnExpr : typed)". This type of quotation makes it possible to place a TExpr type value inside a PExpr. That is, the code is analogous to:
$(PExpr.Typed(lockOnTExpr))
The most important thing to look for in this example is the type analysis. The pattern (with a check):
| Some(Class(typeInfo, _)) when typeInfo.IsValueType
recognizes that the type of the expression is a regular type (i.e. FixedType.Class, not an array, a function, etc) with any number of type parameters (as set using the placeholder "_" in the second parameter of the constructor). This pattern binds type information to the variable "typeInfo". The check "when" verifies that the type is a type variable.
One of the following sections will explain how to work directly with types in FixedType and TypeVar formats.
Delayed typing can be used not only to wait until the type of some expression becomes known, but also to wait until the compiler computes other objects of delayed typing.
For instance, if you are taking apart returned values, then you need to guarantee that the result does not contain partially typed objects, and you can, as in the previous example, use the method DelayMacro(), but rather than check the type, check the expression itself:
...
typer.DelayMacro(lastTry =>
match (somePExpr.TypedObject)
{
| TExpr.Delayed(susp) when (susp.IsResolved) => // susp : Typer.DelayedTyping
Some(doSomething(susp.ResolutionResult));
| TExpr.Delayed(susp) =>
None();
...
We are now done with the relatively simple case of having a macro depend on a small number of expressions and their types. However, it happens that a macro must analyze a lot of code. Creating a delayed typing object for each uninferred type would be not only slow, but also laborous.
When we need to analyze some fragment of code (expression), and we need all of the expressions in this fragment to be fully typed, we can use the method Typer.TransformWhenAllTypesWouldBeInferred(). Here is its signature:
public TransformWhenAllTypesWouldBeInfered(
transform : PExpr * TExpr -> PExpr
tExpr : TExpr,
expr : PExpr = null,
) : PExpr
As you can see, it takes a transformation function, a typed AST (TExpr) branch and an untyped AST (PExpr) branch.
The last parameter is optional, but it must be noted that, if it is not set, then the first parameter should be null.
The essence of this method is very simple. It calls the function "transform" only after all of the subexpressions in "TExpr" are successfully typed, that is, when they contain no delayed typing objects and/or incompletely inferred types.
The implementation of this method is fairly simple, so lets show it here (with additional comments, of course):
public TransformWhenAllTypesWouldBeInfered(
transform : PExpr * TExpr -> PExpr
tExpr : TExpr,
expr : PExpr = null,
)
: PExpr
{
// ExprWalker walks the AST (either typed TExpr or untyped (PExpr)
// and performs some action on its branches.
def walker = ExprWalker();
// The method IsWellTyped() checks, whether an expression contains
// incompletely typed subexpressions (ie delayed typing objects,
// branches of uninferred type, or typing errors).
match (walker.IsWellTyped(tExpr))
{
// The tree contains branches for delayed optimization or uninferred
// types, but not typing errors.
| WellTyped.NotYet =>
this.DelayMacro(lastTime =>
// Once more, check for untyped subexpressions.
// if everything is typed, then…
if (walker.IsWellTyped(tExpr) == WellTyped.Yes)
// perform the transformation and return the result.
Some(transform(expr, tExpr))
else if (lastTime)
// We don't need to output the error, since the nested object
// or the compiler itself will do it.
None()
else
None()
);
// All subexpressions have been typed successfully.
| WellTyped.Yes => transform(expr, tExpr)
// One or more typing errors have been found among the subexpressions.
| WellTyped.Error => PExpr.Error(tExpr.Location)
}
}
As you can see, everything is simple and clear. Most of the complexity is contained in the ExprWalker object.
You can see an example of using Typer.TransformWhenAlllTypesWouldBeInfered in the source code of the "ToExpression" macro (Linq support in Nemerle) implementation:
https://github.com/rsdn/nemerle/tree/master/Linq/Macro/ToExpressionImpl.n
There, the expression that must be converted into an expression tree (i.e. into System.Linq.Expressions.Expression.Expression), must first be typed without transformation (i.e., as if it is not transformed), and then transformed into the expression tree. Also, the transformation code is called using the method TransformWhenAllTypesWouldBeInfered(). This leads to the function that takes care of the actual transformation always working with a fully typed AST, which can have no typing errors, incompletely inferred types, or delayed typing objects. This greatly simplifies the transformation function.
TransformWhenAllTypesWouldBeInfered is convenient anywhere, where complex analysis of typed AST is necessary, and where the analyzed code must be compiled.
Until now, we have only spoken about type analysis and AST transformation inside macros. However, a macro can not only analyze and transform AST, but also control the type inference process. Moreover, it might be necessary to compare one type with another for analysis.
Functions participating in type inference can also be used to check (and create) relationships between type variables.
Before going further, I have to tell you a little bit about the Nemerle type inference subsystem.
You might be surprised, but the type inference subsystem in Nemerle is based on a small interpreter of the Prolog language. To be more precise, it is not Prolog, but something very similar – a constraint solver. Constraint programming is a whole, relatively new, programming paradigm of declarative programming, which you can find out more about from Wikipedia:
http://en.wikipedia.org/wiki/Constraint_solver
http://en.wikipedia.org/wiki/Constraint_logic_programming
Nemerle uses a special Solver (constraint solver) object to infer types:
https://github.com/rsdn/nemerle/tree/master/ncc/typing/Solver.n
Here is a description from this file:
There are two kinds of type variables:
- free type variable with upper and lower bounds on types that can be substituted for it
NOTE The upper bound sets the maximum type for a variable, while the lower, the minimum type for a variable (i.e. the type that cannot be more abstract that the variable's type) – author's note. - fixed type variable, already bound to a concrete type
The constraint solver maintains a graph of type variables (vertices) and subtyping relations (edges). The graph follows several invariants:
- There are only free type variables in it.
- There are no cycles in it. If a cycle emerges, all type variables involved in it are merged into one type variable. (The graph is therefore a DAG).
- The graph is transitively closed, that is if A :> B and B :> C, then A :> C, where X :> Y stands for an edge in the graph from X to Y.
- The upper and lower bounds are also transitively closed, that is if t :> A, A :> B, B :> t' then t :> t', where :> stands for a subtyping relation.
- If t :> A and A :> t', then t :> t' (that is upper bound has to be bigger than lower bound). If t = t', then the type t is substituted for the variable A (that is A gets fixed), since it is the only type to fulfill both upper and lower limits. To maintain 1., it is then removed from the graph.
The last sentence means that Solver supports versioning. This is required for speculative typing. The thing is that inference changes the graph, and if the result of typing is not needed, then rolling them back requires returning the graph to a previous state. Moreover, Solver supports versioning for delayed typing objects. In reality, their state is stored in Solver, while the objects themselves are nothing more than a convenient abstract interface.
WARNING |
---|
Versioning creates a certain limitation. If you call (explicitly or implicitly) PushState, then you can no longer use typing results (i.e. inferred types and delayed typing objects, as well as the typed AST they produce) after calling PopState. In the case of speculative typing, this makes it necessary to repeat typing even after it is done successfully (in order to get the result). |
In order to understand how Solver infers types, we have to describe the types FixedType and TypeVar. Lets begin by taking a look at TypeVar:
public class TypeVar : System.IComparable[TypeVar]
{
[System.Flags]
public enum Flags
{
| None = 0x0000
| IsFromNull = 0x0001
| IsMonoType = 0x0002
| IsAliased = 0x0004
| IsFresh = 0x0008
}
/// Makes @type and this the same type.
public Unify(@type : TypeVar) : bool;
/// Same as Unify, but in case of failure, does not change the type
/// variable's value and does not affect compiler state.
public TryUnify(t : TypeVar) : bool;
/// Require that this be at least @type (or its descendant). Call this
/// method when you need a lower bound for a type.
/// Returns true, when this is possible.
/// If not possible, then the value of the variable is corrupted
/// and the compiler might throw an error. Therefore, if you only need to
/// check whether it is possible, then use the method TryRequire.
public virtual Require(@type : TypeVar) : bool;
/// Same as Require, but does not corrupt state in case of failure and does
/// not affect compiler state.
public TryRequire (@type : TypeVar) : bool;
/// Set @type as the upper bound for this.
public virtual Provide(t : TypeVar) : bool;
/// Same as Provide, but does not corrupt state in case of failure and does
/// not affect compiler state.
public TryProvide(@type : TypeVar) : bool;
/// Lower bound for the type.
public LowerBound : option [FixedType];
/// Upper bound for the type.
public UpperBound : option [FixedType]
/// If the lower and/or upper bounds for the type are defined, then the Hint
/// property returns a mono-type (fixed type), packed into the variant
/// option.Some(). This type is analogous to that, which would be returned
/// by fixing the type variable (using the Fix() method).
/// However, the type might be made more concrete in future (in the process
/// of type inference). If the variable is not bound by any constraints,
/// this property returns option.None().
public Hint : option [FixedType];
/// Same as Hint. The difference is that in case when the type has an
/// upper bound (UpperBound), set as type object, AnyHint will return it,
/// while Hint will return None(). This occurs as a result of having an
/// upper bound set to the type object being practically meaningless,
/// since object is the base class for all others.
public AnyHint : option [FixedType];
/// Fixes the value of the type variable. After this, the current variable
/// can no longer be modified by the type inference subsystem.
/// DO NOT USE THIS METHOD UNLESS ABSOLUTELY NECESSARY!!!
public Fixate() : void;
/// Returns a fixed value. If the type variable has not been fixed by now,
/// an exption will be generated.
/// If you require a fixed value, it might be better to use Hint or AnyHint.
public virtual FixedValue : FixedType;
/// Fixes the value of the type variable and returns a fixed type (FixedType).
/// Analogous to callingn Fixate() and FixedValue.
/// DO NOT USE THIS METHOD UNLESS ABSOLUTELY NECESSARY!!!
public virtual Fix () : FixedType;
[Nemerle.OverrideObjectEquals]
public Equals (t : TypeVar) : bool;
/// Not supported (not implemented).
public virtual IsAccessibleFrom (_ : TypeInfo) : bool;
/// The type variable is fixed (cannot be modified further).
public IsFixed : bool { get; }
/// The type variable is not fixed (analogous to !IsFixed).
public IsFree : bool { get; }
/// The variable is free (not fixed): it has neither a lower nor an
/// upper bound.
public IsFresh : bool { get; }
/// The type variable is created for a variable or a parameter that
/// has null value. The null constant has no type information, but adds
/// the constraint "the type must support null". This constraint allows
/// unification only with reference and Nullable types.
public IsFromNull : bool { get; set; }
/// The type can support null. CanBeNull becomes equal to true, when the
/// type is a reference type, Nullable, or the variable is free (IsFree).
public virtual CanBeNull : bool;
public CurrentSolver : Solver { get; }
/// Starts deep unification, i.e. calls Fix() for the current type variable,
/// as well as all of its type arguments. Avoid using this method.
public DeepFix() : FixedType;
public override ToString() : string;
public CompareTo(other : TypeVar) : int;
public override GetHashCode() : int;
}
WARNING |
---|
Since Nemerle is a language designed for .NET, your macros can get references to System.Type. Moreover, TypeVar has the method GetSystemType(), which returns a reference to a System.Type. This method can be used for IL generation. However, as macro developers, you should not use this type (even if you generate code that must manipulate System.Type). Within macros, you should only use TypeVar or FixedType. Otherwise, you are sure to encounter serious problems. First off, calling GetSystemType() fixes the type, preveneting its further inference (for instance, you might get type object where a specific type could have been inferred). Second, System.Type is not available for types defined within project code, which may lead the macro to function incorrectly in some situation. |
The main properties of this class:
- LowerBound – the lower bound for the type. The current type cannot be "below" this.
- UpperBound – the upper bound for the type. The current type cannot be "above" this.
- Hint – the current value of the type. That is, the value of the type, if it were fixed (using the Fix() method) at this moment in time. Here is how to interpret its value: if the upper bound for the type is not defined, then the type would be equal to the lower bound, when fixed; if the upper bound is defined, then it will be the value of the variable. If neither the lower or the upper bound is defined, then None() is returned, meaning that no type could be inferred (the variable is fresh, IsFresh() == true). If the variable is fixed while Hint is None(), then it gets the value "object".
NOTE |
---|
By "above" and "below" we mean the subtype relation. For instance, if we take the type System.Collections.Generic.IList[T], List[T] would be above IList[T]. IList[T], in turn, implements the interface System.Collections.Generic.ICollection[T], which is below IList[T]. Of course, ICollection is also below List[T]. The terms "above" and "below" are only applicable to types that have inheritance (or interface implementation) relationships. We exclude the types "object" and "System.ValueType" from consideration. For instance, the types int and long cannot be compared. The ability to use int in place of long is made possible by implicit type conversion. |
In practice, we almost never use the properties LowerBound and UpperBound, but it may be useful to take a look at them in order to understand which types would be inferred in the end and how it is done.
Type variable string representation (also used by the debugger) is in fact the value of the Hint property, processed in a special way. Fresh variables are visually displayed as question marks – "?". If a variable has only an upper bound, then it is upended a minus sign – "-". If a variable has only a lower bound, then the name is upended a plus sign – "+". If both the upper and lower bound are set, then the string representation will look as "(L TILL H)", where L is the type of the lower bound and H the upper.
What sets the upper bound and what the lower?
Code of the form:
mutable x = System.Collections.Generic.List.[T]();
sets the upper bound for the type variable associated with the variable "x".
Code of the form:
module X
{
Foo(_ : System.Collections.Generic.IEnumerable[T]) : void { }
Main() : void
{
mutable x;
Foo(x); // тип задается вот здесь
}
}
sets the lower bound for the type variable associated with the variable "x".
Code of the form:
mutable x = null : System.Collections.Generic.IEnumerable[T];
sets both a lower and an upper bound.
Of course, bounds can be set in turn.
It is in order to set constraints (i.e. for construction of the type graph) that TypeVar class methods are used.
Here are the main methods of this class:
Unify – allows unification of the lower and the upper bounds of two type variables. Unification looks like copying the value, but unlike copying, it acts on both variables, as well as all the variables connected with the ones that participate in the unification process. If the unified types have type arguments (the same number of them), then they are also unified. The unification process is recursive.
This way, unification can affect a set of connected variables at once. You can find out more about the unification process in any introduction to the Prolog language or in Wikipedia (http://en.wikipedia.org/wiki/Unification#Unification_in_logic_programming_and_type_theory). In fact, when we perform the operation type1.Unify(type2), we tell Solver that the two variables must be equal, letting it do the job of reorganizing all of the relationshipos such that all invariants still hold. The Solver is quite a tricky "tool"…
If t2 is unified with t3, and t1 is unified with t2, then t1 and t3 are also unified.
Unification with FixedType makes the lower and the upper bound equal to the fixed type. This, in essence, makes the unified variable an alias for the FixedType.
In case unification is not possible, the method returns true.
TIP |
---|
Unification can fail. If it happens, then the type variables are corrupted (unable to participate in future type inference and have useless values), and so all of the Solver's state is corrupted. In order to avoid corrupting Solver's state, use functions with the prefix Try (such as TryUnify) or call PushState and PopState explicitely to save and restore the state. The method TryUnify could be used to find out whether a variable can be a certain type. |
WARNING |
---|
One should be very careful, since a fresh variable could be unified with any other variable or fixed type, but it would subsequently get constraints preventing it from being unified in future. Now, if you use the Unify method, then future unification will take the constraints into account. This would certainly affect type inference. |
The compiler conducts unification in many cases, such as when it types the operation of assignment of the initial value to a varaible.
Require – establishes a relationship between two type variables, setting the lower bound for the current type (the one the method is called for). This sets the constraint on the current type to be "at least the type passed as a parameter to the Require method".
In case the constraint cannot be set, the method returns false. This corrupts graph state, so if you only need to know whether the constraint can be encorced, you should use the method TryRequire.
Provide – is practically the function complementary to Require. A call to x.Provide(y) is equivalent to y.Require(x). Although, I might have failed to grasp a subtle difference between these methods. In any case, the method Provide is there to set the upper bound for the current type variable.
In order to demonstrate the effects of methods Unify, Require, and Provide on LowerBound and UppderBound for type variables, I created a simple example.
Before I demonstrate the code, I want to describe the method Typer.JustTry[T]. It makes it possible to perform some typing action within a sandbox, so to speak. JustTry gets a reference to a function of type void -> T, saves the typing context, executes the function, then restores the typing context. This way, the function passed to JustTry could perform any typing actions and observe their results, then have all of the changes get rolled back once it finishes. This function is required by our example in order to demonstrate the actions of all three methods without creating three separate tests.
Below is the macro code that gets two expressions as parameters, types them, prints the descriptions of their types, then sets each of the constraint Unify, Require, and Provide in turn, while reporting the effects on the type variables:
using Nemerle.Compiler;
using Nemerle.Compiler.Parsetree;
using Nemerle.Imperative;
namespace TypingTestMacroLibrary
{
public macro Macro1(arg1, arg2)
{
Impl.Macro1(Nemerle.Macros.ImplicitCTX(), arg1, arg2)
}
module Impl
{
public Macro1(typer : Typer, arg1 : PExpr, arg2 : PExpr) : PExpr
{
// Type both expressions
def tExpr1 = typer.TypeExpr(arg1);
def tExpr2 = typer.TypeExpr(arg2);
// The function printType prints an expression and its current type.
def printType(expr : PExpr, ty : TypeVar) : void
{
// Only outputs messages during the main compiler pass, so that
// the messages are not duplicated when the code contains errors.
when (typer.IsMainPass)
// Printing is done using a compiler hint, since it can be seen
// in the "Output" window in Visual Studio during compilation.
Message.Hint($" $expr is '$ty'");
}
// The function printTypeAndBounds prints the values of the lower
// and upper bounds for the type
def printTypeAndBounds(expr : PExpr, ty : TypeVar) : void
{
when (typer.IsMainPass)
{
printType(expr, ty);
Message.Hint($" LowerBound: '$(ty.LowerBound)'");
Message.Hint($" UpperBound: '$(ty.UpperBound)'");
}
}
when (typer.IsMainPass)
Message.Hint("Initial state:");
// Outputs type descriptions before any additional constraints are set
printTypeAndBounds(arg1, tExpr1.Type);
printTypeAndBounds(arg2, tExpr2.Type);
// This function sets constraints, given as the func parameter and
// outputs information about the effect on the type variable.
def test(methodName, func)
{
// The method Typer.JustTry makes it possible to perform speculative
// typing. The results of such typing are only accessible within
// the lambda passed to JustTry. After the lambda is done, Solver's
// state is restored.
_ = typer.JustTry(fun()
{
_ = func(tExpr1.Type, tExpr2.Type);
when (typer.IsMainPass)
{
Message.Hint("");
Message.Hint($"After: $arg1.$methodName($arg2):");
}
// Output type information after setting the constraint.
printTypeAndBounds(arg1, tExpr1.Type);
printTypeAndBounds(arg2, tExpr2.Type);
// Fix type variables, making the Solver calculate the final type.
tExpr1.Type.Fixate();
tExpr2.Type.Fixate();
when (typer.IsMainPass)
Message.Hint("After Fixate():");
// Output textual representations of the fixed types.
printType(arg1, tExpr1.Type);
printType(arg2, tExpr2.Type);
42 // we have to return something
})
}
// Perform each test in turn, setting different constraints and
// outputing the result.
test("Unify", (x, y) => x.Unify(y));
test("Require", (x, y) => x.Require(y));
test("Provide", (x, y) => x.Provide(y));
test("Empty", _ => true);
<[ () ]> // Expression-level macros must return something.
}
}
}
If this macro is used like this:
using TypingTestMacroLibrary;
module Program
{
Foo(_ : System.Collections.Generic.IEnumerable[int]) : void { }
Main() : void
{
mutable a = System.Collections.Generic.List.[int]();
mutable b;
Foo(b);
Macro1(a, b);
ignore(a); ignore(b);
}
}
then the compiler's Output window contains the following:
After: a.Unify(b): a is '(System.Collections.Generic.IEnumerable[int] TILL System.Collections.Generic.List[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.List[int])' b is '(System.Collections.Generic.IEnumerable[int] TILL System.Collections.Generic.List[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.List[int])' After Fixate(): a is 'System.Collections.Generic.List[int]' b is 'System.Collections.Generic.List[int]' After: a.Require(b): a is '(System.Collections.Generic.IEnumerable[int] TILL System.Collections.Generic.List[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.List[int])' b is '(System.Collections.Generic.IEnumerable[int] TILL System.Collections.Generic.List[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.List[int])' After Fixate(): a is 'System.Collections.Generic.List[int]' b is 'System.Collections.Generic.List[int]' After: a.Provide(b): a is 'System.Collections.Generic.List[int]-' LowerBound: 'None' UpperBound: 'Some (System.Collections.Generic.List[int])' b is 'System.Collections.Generic.IEnumerable[int]+' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'None' After Fixate(): a is 'System.Collections.Generic.List[int]' b is 'System.Collections.Generic.List[int]' After: a.Empty(b): a is 'System.Collections.Generic.List[int]-' LowerBound: 'None' UpperBound: 'Some (System.Collections.Generic.List[int])' b is 'System.Collections.Generic.IEnumerable[int]+' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'None' After Fixate(): a is 'System.Collections.Generic.List[int]' b is 'System.Collections.Generic.IEnumerable[int]'
If we change the tests slightly:
using TypingTestMacroLibrary;
module Program
{
Foo(_ : System.Collections.Generic.IEnumerable[int]) : void { }
Main() : void
{
mutable a = System.Collections.Generic.List.[int]();
mutable b = [1, 2];
Foo(b);
Macro1(a, b);
ignore(a); ignore(b);
}
}
then the following will be output:
After: a.Unify(b): a is 'System.Collections.Generic.IEnumerable[int]' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.IEnumerable[int])' b is 'System.Collections.Generic.IEnumerable[int]' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.IEnumerable[int])' after Fixate(): a is 'System.Collections.Generic.IEnumerable[int]' b is 'System.Collections.Generic.IEnumerable[int]' After: a.Require(b): a is '(System.Collections.Generic.IEnumerable[int] TILL System.Collections.Generic.List[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.List[int])' b is 'System.Collections.Generic.IEnumerable[int]' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (System.Collections.Generic.IEnumerable[int])' After Fixate(): a is 'System.Collections.Generic.List[int]' b is 'System.Collections.Generic.IEnumerable[int]' After: a.Provide(b): a is 'System.Collections.Generic.IEnumerable[int]-' LowerBound: 'None' UpperBound: 'Some (System.Collections.Generic.IEnumerable[int])' b is '(System.Collections.Generic.IEnumerable[int] TILL list[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (list[int])' After Fixate(): a is 'System.Collections.Generic.IEnumerable[int]' b is 'list[int]' After: a.Empty(b): a is 'System.Collections.Generic.List[int]-' LowerBound: 'None' UpperBound: 'Some (System.Collections.Generic.List[int])' b is '(System.Collections.Generic.IEnumerable[int] TILL list[int])' LowerBound: 'Some (System.Collections.Generic.IEnumerable[int])' UpperBound: 'Some (list[int])' After Fixate(): a is 'System.Collections.Generic.List[int]' b is 'list[int]'
I think that careful examination of these tests' results can improve understanding of the principles behind the methods Unify, Require, and Provide.
In practice, there are few cases, in which Unify, Require, and Provide can be used. Much more often, we use the Hint property. However, understanding their use can help in solving complicated problems.
The macro foreach is an example of using Require:
https://github.com/rsdn/nemerle/tree/master/macros/core.n (look for @foreach in this file).
This method uses Require and its twin TryRequire to check, whether the object (formed as the result of the expression passed into the foreach macro) implements interfaces IEnumerable[T] and IDisposable. The following fragment checks, whether the expression implements the interface IDisposable:
def is_disposable = typer.JustTry (fun()
{
def expr = typer.TypeExpr (init_body);
expr.Type.Require (<[ ttype: System.IDisposable ]>)
});
And here is the fragment that checks whether the type implements IEnumerable[T]:
// If some type has been inferred...
| Some(ty) =>
// The following two lines are required for initialization of the current
// context. This is nothing more than a regrettable shortcoming of the
// compiler, and these lines should really not be necessary. However, at
// this time these lines are required for quotation to work.
def Manager = typer.Manager;
Macros.DefineCTX(typer);
// Get a type variable corresponding to the fixed type IEnumerable[T].
def iEnumerableType =
<[ ttype: System.Collections.Generic.IEnumerable[_] ]>;
// Check whetheher the type variable "ty" is compatible with Ienumerable[T].
if (ty.TryRequire(iEnumerableType))
{
// ...if so, make it a "hard" requirement
ty.ForceRequire(iEnumerableType);
// ...and continue with other work. At this point, the type is certain to
// implement Ienumerable[T].
make(iEnumerableType.Hint)
}
else error(t)
There are two things in this code that need explanation: ForceRequire is simply a version of the method Require that outputs an internal compiler error in case of failure. You could use Require instead, but the given method makes it easier to find the error. Quotation type "ttype" makes it possible to get a type variable corresponding to the type in the quotation, rather than an AST (PExpr).
You can also get a reference to a type by its description using the method Typer.BindType:
def ty = typer.BindType(<[ System.Collections.Generic.IEnumerable[_] ]>);
Here you should be most careful, since, if the type is not completely specified (missing a namespace, for instance), then it will be computed based on the current environment (currently stored in the typer). Sometimes this may even be desirable.
There is a simpler way to set a constraint. If your macro requires the compiler to allow only certain types, then you can set them directly in quotation:
macro SomeMacro(expr)
{
<[ $expr : IEnumerable[_] ]>
}
It may seem that this is not as flexible, but in practice it works wonderfully. Here you avoid a lot of boilerplate code. The compiler will verify the types and produce an error message, if it has to.
The while macro is an example of this approach:
macro @while (cond, body)
syntax ("while", "(", cond, ")", body)
{
def loop = Nemerle.Macros.Symbol(Util.tmpname("while_"));
<[
($("_N_break" : global) :
{
def $(loop : name)() : void
{
when ($cond)
{
($("_N_continue" : global) :
{
$body
}) : void; // this is what I'm talking about!
$(loop : name)()
}
}
$(loop : name)();
}) : void
]>
}
Notice the fragment ": void". If the programmer using this macro makes the mistake of using a function that returns a value, the compiler will give him a meaningful error message. The most pleasant thing in this approach is that you barely have to do any work. The compiler does everything for you.
It looks like that this is all I can tell you about Nemerle types, their inference and working with them. I hope this article will be useful to those, who want a deeper knowledge of the Nemerle macro system, as well as those who want to make sense of the compiler (and possibly join in the project's development).
- http://nemerle.org/Macros – Nemerle macros.
- http://nemerle.org/~malekith/msc.pdf – Michal Moskal. Type Inference with Deferral. Paper describing the Nemerle inference algorithm.
- https://github.com/rsdn/nemerle/tree/master/macros – Source code for the macros included in the standard library.
- https://github.com/rsdn/nemerle/tree/master/Linq – LINQ implementation based on Nemerle macros.