Nemerle for OOP Programmers Week 4
This week we will learn about the remaining patterns, get some insights into the performance of functional programs, and finally learn how to define polymorphic (generic) types.
This is the last lesson about FP during this course.
This section discusses a few matching constructs that remain to be covered.
The type check pattern checks if the given value has the given type. It is similar to is in C# and Java; even the syntax looks alike:
using System.Console;
def check (o : object) {
match (o) {
| i is int => WriteLine ($ "an int, $(i * 2) / 2!");
| s is string => WriteLine ($ "a string: $s!");
| _ => WriteLine ("something else");
}
}
check (21);
check ("foo");
check (3.0);
/* Output:
an int, 42 / 2!
a string: foo!
something else
*/
In this matching pattern, you can see how i is used as an int, and s as a string. The is pattern checks whether the matched value (o, an object) has one of the given types, and if so, binds the value to the specified identifier. The identifier is given a static type in each branch of the match; the compiler implicitly casts the value to the type of the matching identifier. So the common C# pattern:
if (x is Foo) {
Foo y = (Foo)x;
... use y ...
} else { ... }
or:
Foo y = x as Foo;
if (y != null) {
... use y ...
} else { ... }
becomes:
match (x) {
| y is Foo => ... use y ...
| _ => ...
}
This is useful when you need to cast an object as one of its inherited base classes, or as an interface implemented by the object.
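For instance, here is a minimal sketch of matching against an interface rather than a concrete class (System.ICloneable is used here purely as a convenient standard interface):
def clone_if_possible (o : object) {
  match (o) {
    | c is System.ICloneable => c.Clone ()
    | _ => o
  }
}
def copy = clone_if_possible (array [1, 2, 3]); // arrays implement ICloneable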
Note that the C# as construct doesn't have anything to do with the Nemerle as pattern, presented below.
We have already seen the record pattern as a subpattern of the constructor pattern used for variants. The idea is to take an object and match on its fields:
class MyClass {
public foo : int;
public bar : string;
}
def c = MyClass ();
match (c) {
| (foo = 0, bar = null) =>
System.Console.WriteLine ("fine");
| _ =>
System.Console.WriteLine ("oops");
}
Field matching is quite flexible: you can skip some fields and reorder them.
You can also explicitly state the type of the class you're matching on. This is useful when the compiler complains about unknown types (there are some limitations of type inference in the current implementation that may prevent it from working here). For instance, you could write a variation of the match above as:
def check_class (c) {
match (c) {
| MyClass where (foo = 0) => true
| _ => false
}
}
def result = check_class (MyClass ())
The record pattern can also match on individual readable properties:
class MyClass {
[Accessor]
public foo : int;
public bar : string;
}
def c = MyClass ();
match (c) {
| (Foo = 0) =>
System.Console.WriteLine ("fine");
| _ =>
System.Console.WriteLine ("oops");
}
Properties do not take part in the automatic tuple-pattern to record-pattern conversion, in which a tuple pattern is automatically translated into a record pattern (we talked about this in Week 3).
The as pattern binds the value matched by a subpattern to an identifier. This is useful when you want to check whether a value has a given structure, but still want to handle it as a whole, for example:
match (construct_list ()) {
| [1, _] as l =>
handle_one_something (l)
| _ => {}
}
Another example, perhaps more interesting, is:
variant Foo {
| A { x : int; mutable y : string; }
| B
}
match (some_foo ()) {
| A (3, _) as a =>
a.y = "three";
| _ => {}
}
Here the compiler knows that the static type of a is Foo.A, so you can assign values to its y field.
This pattern matches against an explicitly declared type. It is used to hint the compiler about the static type of a given matched value or subvalue. If the compiler gets the wrong idea about a type you have left for it to infer, it will scream. If it doesn't know the type, it will appreciate the hint. Example:
def foo (l) {
| (s : string) :: _ => s [0]
| [] => '?'
}
assert (foo (["foo", "bar"]) == 'f');
Here we hint the compiler that the type of s (the first element of the list) will be string, so we can use the indexer to get its first character (at the time of this writing, there is a bug in our type inference engine that prevents deducing the type of s early enough for this snippet to work unhinted).
The assert macro checks if the given condition is true, and if it's not, it throws an exception.
The example above can be rewritten as:
def foo (l : list [string]) {
| s :: _ => s [0]
| [] => '?'
}
assert (foo (["foo", "bar"]) == 'f');
with the same effect, although it is slightly longer (because we have to specify the list part of the type).
You can combine several matching clauses in a single branch. For example:
match (my_list) {
| [1]
| [1, 2] => WriteLine ("one or one, two");
| [x]
| [7, x] => WriteLine ($ "$x or seven, $x");
| _ => WriteLine ("something else");
}
Here, the first two branches match against multiple lists. The second branch even matches against some value x, which appears in both of its clauses. If you use a value in a matching clause, it can't change type between clauses, and it has to appear in each clause of the branch. In other words, you couldn't do this:
| [7] // x needs to appear here,
| [7, x] => WriteLine ($ "$x or seven, $x"); // or not here
There is a way to work around this limitation, by using a with clause.
When using alternative clauses it is sometimes useful to match a smaller pattern with a value x, and a bigger pattern with x and another value y. Now, when we construct a single matching branch for the two patterns, we cannot use y in the smaller one, as it doesn't occur there. The rub is that we have to include y in all patterns of the branch, because of the rule on matching values explained above. The with clause solves this problem by providing a way of specifying a default value for y in such cases. For example:
def foo (_) {
| [x] with y = 3
| [x, y] => x * y
| _ => 42
}
assert (foo ([3]) == 9);
assert (foo ([3, 4]) == 12);
assert (foo ([3, 4, 5]) == 42);
The ability to match lists of irregular length by providing with-clause defaults comes in handy when creating more complex patterns.
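For instance, here is a hedged sketch (reusing the with-clause syntax shown above) that describes a list by its first two elements, supplying a default for the second one when the list is a singleton:
def describe (_) {
  | [x] with y = 0
  | x :: y :: _ => $ "starts with $x, then $y"
  | [] => "empty"
}
assert (describe ([5]) == "starts with 5, then 0");
assert (describe ([5, 7, 9]) == "starts with 5, then 7");
assert (describe ([]) == "empty");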
We now switch gears to the topic of FP performance. Tail calls, the FP way of expressing loops, are covered in detail. We will also look at how the compiler turns tail calls into jumps and inlines loop functions, which minimizes stack use and function call overhead. Accumulators, an FP idiom built on tail calls, are also introduced here.
When you have a while loop, like this:
while (true)
// do something
it is possible to rewrite it like this:
def my_loop () {
// do something
my_loop ();
}
my_loop ();
This code calls my_loop, which first executes the body of the loop and at the end calls itself (and executes the body of the loop, and calls itself, and...). As you can see, the result is the same as with the while loop.
If the while loop has a condition, it is also possible to rewrite:
while (some_condition)
// do something
into:
def my_loop () {
when (some_condition) {
// do something
my_loop ();
}
}
my_loop ();
The performance of the two pieces of code is exactly the same. The Nemerle compiler can see that there is nothing left to do after the call to my_loop inside my_loop, and replaces this call with an internal jump. Moreover, it notes that my_loop is called just once outside the body of the function, and therefore inlines it, replacing the call with the function body itself. These optimizations together eliminate the stack overhead in this example.
In fact, the compiler internally transforms all kinds of loops into local functions.
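For example, here is a sketch of the idea (the code the compiler actually generates may look somewhat different):
// this loop ...
for (mutable i = 0; i < 3; i++)
  System.Console.WriteLine (i);

// ... behaves like this local function:
def loop (i) {
  when (i < 3) {
    System.Console.WriteLine (i);
    loop (i + 1)
  }
}
loop (0);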
Calling some function f within f, in such a place that there is nothing more left to do in the body of f, is called tail recursion. It can always be replaced with a jump (internally, by the compiler). To make functional programs run fast, tail recursion is a good idea for loops that are going to execute, say, one million times or more. Of course, there is no point rewriting things that are more readable as while loops to use tail recursion, except when you want to obey some strict FP rules, like "never use mutable variables". However, there are problems where a solution with tail recursion is actually shorter and more readable (well, readability depends on expertise with FP) than one with a loop.
An accumulator is a parameter of a function that is used to hold the result constructed so far. When the function executes for the last time, it returns the accumulator, possibly transformed using some other function.
def rev (l, acc = []) {
match (l) {
| x :: xs => rev (xs, x :: acc)
| [] => acc
}
}
System.Console.WriteLine (rev ([1, 2, 3]));
// Output: [3, 2, 1]
This function adds the first element of the input list l to the accumulator acc, and calls itself with the rest of the list. It accumulates head elements until the list is exhausted, at which point it returns acc.
Because the result list is built from the end, while the input list is traversed from the beginning, the function returns a reversed list.
This is a common property of programming with list accumulators -- the result lists are reversed.
Note how rev is called inside itself in a tail position. This means the call is transformed into a jump, so it is very fast and doesn't consume stack space.
It is also possible to use types other than list for accumulators:
def length (l, acc = 0) {
match (l) {
| _ :: xs => length (xs, acc + 1)
| [] => acc
}
}
System.Console.WriteLine (length ([4, 44, 22]));
// Output: 3
Here we use an integer as an accumulator, to compute the length of the list.
There is one more important concept related to programming with accumulators: the fold function. Fold is a built-in method for most Nemerle collections, so it is a handy shortcut to writing certain accumulator functions yourself. We will talk about fold for lists here. The signature is:
List.FoldLeft ['a, 'b] (l : list ['a], ini : 'b, f : 'a * 'b -> 'b) : 'b
It takes a list with elements of type 'a, an initial value of type 'b, and a function combining an 'a and a 'b into a new 'b; it returns a value of type 'b. The expression List.FoldLeft ([x1, x2, ..., xN], ini, f) is equivalent to: f (xN, f (... f (x2, f (x1, ini)) ...)).
So, folding is a way to efficiently nest function calls on a list of arguments. When folding, the function is applied to the first element of the list, then successively to the result of that call and the next element, until the whole list is evaluated.
This definition in code may be easier to understand:
def fold (l, acc, f) {
match (l) {
| x :: xs =>
fold (xs, f (x, acc), f)
| [] => acc
}
}
This has a very similar form to both functions shown above. The rev function replaces f (x, acc) with x :: acc, and length replaces it with 1 + acc. Now, because the function f is a parameter to fold, you can implement both rev and length with the built-in FoldLeft method:
def rev (l) { List.FoldLeft (l, [], fun (x, xs) { x :: xs }) }
def length (l) { List.FoldLeft (l, 0, fun (_, acc) { 1 + acc }) }
One can also use the FoldLeft member of the list variant:
def rev (l) { l.FoldLeft ([], fun (x, xs) { x :: xs }) }
def length (l) { l.FoldLeft (0, fun (_, acc) { 1 + acc }) }
which makes the implementations even shorter.
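As one more illustration (a hedged sketch; maximum is just an example name, not a library function), here is the largest element of an integer list:
def maximum (l) {
  | x :: xs => xs.FoldLeft (x, fun (y, acc) { if (y > acc) y else acc })
  | [] => 0 // arbitrary result for an empty list, just to keep the sketch short
}
assert (maximum ([3, 42, 7]) == 42);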
Gaining proficiency with the fold function may be a little hard at first. It's a good idea to take a function using direct tail recursion and try to rewrite it using fold, to learn the cases where fold is useful (this is just a learning exercise; of course there is no point rewriting already working code just to use fold).
As we wrap up our exploration of functional performance, let's take a moment to look at the topic of symbol redefinition.
From Week 2, recall this list filter example:
def filter (l, f) {
match (l) {
| x :: xs =>
def xs' = filter (xs, f);
if (f (x)) x :: xs'
else xs'
| [] => []
}
}
For clarity, we used xs' as the name for the filtered result of list xs. But we could have simply reused xs. It would look like:
def xs = filter (xs, f);
While this reminds us of assignment in imperative programming (from this point on in the program the new xs is in scope, so it looks as if the original was changed), it is quite different from the theoretical point of view.
Note that you can rebind an identifier to a different type. For example, the fragment:
def x = [1, 2];
def x = "foo";
is perfectly valid. You could even do that with mutable variables, though a warning would be generated. However, the more common use is to rebind an identifier to the result of some computation on its previous meaning. For example:
def result = 2 * 2;
def result = $ "the result is: $result";
Both bindings of result have a similar semantic meaning, though they are quite different from the compiler's point of view. The second result is an entirely new entity, not related to the first in any physical way.
The advantage of this approach is that you won't accidentally use xs instead of xs' in your code. The disadvantage is that it is less clear what is going on, as xs changes its meaning.
The Nemerle compiler will warn about redefinitions of mutable symbols (and redefinitions with mutable symbols), because redefinition is not a very mainstream programming concept and is relatively likely to be confused with assignment.
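For example, a minimal sketch of the case that triggers such a warning (the exact message depends on the compiler version):
mutable x = [1, 2];
def x = "foo"; // accepted, but the compiler warns that a mutable symbol is being redefined
System.Console.WriteLine (x);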
Generic types allow the definition of strongly typed containers that can hold values of arbitrary types. An example of such a container is the array -- the type of elements stored must match the type of the array. Similarly, lists contain only elements that match the list's element type. In both cases, the element type is fixed at compile time.
Restricting a list's type improves static code safety. Instead of using a hold-anything list [object], you can restrict your list to a single element type, say list [int]. This prevents adding a string to this list, and guarantees that when you read an element from it, it will be an integer. This saves you a runtime cast, which is costly and can crash the application if you don't provide extra error handling for miscasts.
Of course there are cases when you would want ints and strings in a single list; you can use list [object] in such cases. Then you can add anything to the list, but you need to cast the values taken from it at runtime, with the drawbacks previously mentioned.
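To illustrate that drawback, here is a minimal sketch (as_object is just a local helper forcing the element type to object, and :> is Nemerle's checked runtime cast):
def as_object (x : object) { x }
def l = [as_object (42), as_object ("foo")]; // l : list [object]
match (l) {
  | x :: _ =>
    System.Console.WriteLine ((x :> int) + 1) // needs a runtime cast; a miscast throws InvalidCastException
  | [] => {}
}
// Output: 43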
Generics offer a middle road: they allow you to build classes that enforce type consistency internally, but don't fix the element type in advance, allowing the user to instantiate the class with an arbitrary type. With generics, you can build type flexibility into your classes while avoiding the problems associated with runtime casting.
Let's start with a naive Set implementation:
class Set ['a]
{
mutable storage : list ['a] = [];
public Add (e : 'a) : void
{
when (! Contains (e))
storage ::= e;
}
public Contains (e : 'a) : bool
{
storage.Contains (e)
}
}
def s1 = Set ();
s1.Add (3);
s1.Add (42);
assert (s1.Contains (3));
// s1.Add ("foo"); // error here!
def s2 = Set ();
s2.Add ("foo");
assert (s2.Contains ("foo"));
Instances of the same Set class are used at runtime to handle sets of integers and strings. Because each instance is tied to a single element type, however, the same set can't store elements of different types. Here, the generic parameter 'a is used to define the type of the storage list and of the method parameters. We talked about generic parameters in Week 2; refer to it for more details.
We can add elements to the set and check if a given element is already there. This is not a very functional approach as the set is updated in-place. So another implementation (and interface!) would be:
class Set ['a]
{
storage : list ['a] = [];
public this ()
{
this ([])
}
this (s : list ['a])
{
storage = s;
}
public Add (e : 'a) : Set ['a]
{
if (Contains (e)) this
else Set (e :: storage)
}
public Contains (e : 'a) : bool
{
storage.Contains (e)
}
}
def s1 = Set ();
def s1 = s1.Add (3).Add (42);
assert (s1.Contains (3));
// s1.Add ("foo"); // error here!
def s2 = Set ().Add ("foo");
assert (s2.Contains ("foo"));
A few details of this class bear some explanation:
- The public constructor, this (), is used when creating instances in outside code.
- A private constructor, this (s : list ['a]), is used in the Add method when returning a new instance of Set containing the new element.
- The internal object reference, this, is also used in Add to return itself if the element already exists in the list.
The line def s1 = s1.Add (3).Add (42); may also raise questions. Because . binds to the left, elements are added from left to right, with the last element added being the first in the list. It is equivalent to:
def s1 = s1.Add (3);
def s1 = s1.Add (42);
The key idea here is that we create a new set instance for each element added. When we don't explicitly save older instances, they are garbage collected, as is the case here. However, when we do save an instance of a set by maintaining a reference to it, it will not be affected by adding new elements later. There are cases when this is useful.
For example, a map holding name -> value bindings in the compiler needs to be restored when the scope ends. If we use an immutable map for this, all we have to do is save the state before we enter the scope, and restore the saved reference when we leave.
Additionally, it is easy to wrap immutable structures in a mutable object. This is exactly what we did in the first example -- we wrapped an immutable list in a mutable set. The Nemerle standard library contains a handful of immutable collections.
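For instance, here is a minimal sketch of such a wrapper around the functional Set above (MutableSet is a name invented for this example):
class MutableSet ['a]
{
  mutable storage : Set ['a] = Set ();
  public Add (e : 'a) : void { storage = storage.Add (e); }
  public Contains (e : 'a) : bool { storage.Contains (e) }
}
def m = MutableSet ();
m.Add (3);
assert (m.Contains (3));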
We've already seen how to define generic methods. You just need to specify the type parameters:
public Length['a] (l : list ['a], acc = 0) : int
{
match (l) {
| _ :: xs => Length (xs, acc + 1)
| [] => acc
}
}
You are required to specify the generic parameters you're going to use. So, the following declaration is not valid:
class C {
public foo (x : list [T]) : void { }
}
but this one is valid:
class C {
public foo[T] (x : list [T]) : void { }
}
There are three kinds of generic specifiers: on object creation, on method calls, and on access to static members of generic types. They are all used to specify generic parameters explicitly in the code. All three are optional; the compiler will always try to infer the generic parameters.
The first, on object creation, is very obvious. If you have a Set ['a] like the one before, and you want to ensure a set of ints is being created, use:
def s = Set.[int] ();
s.Add ("foo"); // error!
def s2 = Set.[int] ();
s2.Add (3); // OK
The second, on method calls, is also quite simple: when a method has generic parameters (like the Length method above) you can supply them explicitly:
def l1 = Length.[string] (["foo","bar"]); // OK
def l2 = Length.[string] ([1, 2, 3]); // error
The third, on access to static members, is more complicated. The type parameters are also in scope in static members. This means the following code is valid:
class Foo['a] {
static public bar () : void
{
System.Console.WriteLine (typeof ('a));
}
}
When you try to call Foo.bar (), the compiler cannot know which 'a you had in mind (in this case it will assume System.Object). However, you cannot write Foo.bar.[int] (), because the bar method itself is not generic. Therefore you can write Foo[int].bar (). You can also write Foo.[int].bar (), so the dot is optional (but only in this case; we know this sucks, we're working on it).
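To summarize the call forms just described (a short sketch based on the rules above):
Foo.bar ();          // compiles; 'a is assumed to be System.Object
Foo[int].bar ();     // prints System.Int32
Foo.[int].bar ();    // the same, with the optional dot
// Foo.bar.[int] (); // error: bar itself is not a generic method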
For certain projects, like model-view-controller frameworks, it is sometimes necessary to require that the types substituted for type parameters also conform to some specific interface. We will address this issue in more detail, as it is probably new to most readers.
For example, elements stored in an ordered binary tree must provide a comparison method so they can be properly sorted. To do this, we define an appropriate interface, IComparable, and then build a variant structure, Tree, which defines a node that can store the generic type 'a. Further, Tree requires that any 'a to be stored must implement the IComparable interface:
interface IComparable ['a] {
CompareTo (elem : 'a) : int;
}
variant Tree['a]
where 'a : IComparable['a]
{
| Node {
left : Tree['a];
elem : 'a;
right : Tree['a];
}
| Tip
}
In our variant declaration, we have added a where clause to further constrain the type 'a that the variant will accept. Such bounds on type variables, where a type variable can occur in the type that bounds it, are called F-bounded polymorphism in type theory.
In fact, the IComparable interface is already defined in the standard library, but that is beside the point.
Once we have ensured that every element of type 'a stored in the tree implements IComparable ['a], we can use the CompareTo method to insert an element into a tree:
module TreeOperations {
public Insert['a] (t : Tree['a], e : 'a) : Tree['a]
where 'a : IComparable['a]
{
match (t) {
| Node (l, c, r) =>
if (e.CompareTo (c) < 0)
Node (Insert (l, e), c, r)
else if (e.CompareTo (c) > 0)
Node (l, c, Insert (r, e))
else
Node (l, e, r)
| Tip =>
Node (Tip (), e, Tip ())
}
}
}
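To see the constraint in action, here is a minimal sketch of an element type that satisfies it, together with a couple of insertions (the Name class is invented for this example):
class Name : IComparable [Name]
{
  public name : string;
  public this (n : string) { name = n; }
  // delegate to System.String's CompareTo
  public CompareTo (other : Name) : int { name.CompareTo (other.name) }
}
def t = TreeOperations.Insert (Tree.Tip (), Name ("bob"));
def t = TreeOperations.Insert (t, Name ("alice"));
// TreeOperations.Insert (t, 42) would not compile: int does not implement the IComparable ['a] defined above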
Now, people familiar with C# or Java might ask, why not simply use something like:
interface IComparable {
CompareTo (elem : IComparable) : int;
}
variant Tree
{
| Node {
left : Tree;
elem : IComparable;
right : Tree;
}
| Tip
}
But this is only half good. The most common use for a tree is to store
elements of some specific type, for example strings. We don't want integers
and strings to be stored in the same tree, for the very simple reason that
we cannot compare integers with strings in a reasonable way. Well, even if we could, we plainly cannot predict what other types besides integers and strings might implement IComparable, and thus have their values passed to a string's CompareTo.
The above design makes it impossible to ensure statically that we're using the tree with consistent types. In this scheme, when inserting nodes into the tree, we upcast all elements to IComparable, regardless of type. But we will get a runtime exception when one type's CompareTo method is passed an argument of another, incompatible type. The second drawback is that when we extract elements from the tree, we need to downcast them to a specific type. This is another possibility for runtime errors in trees of mixed type.
So, to guarantee runtime safety in such a design, we would have to implement CompareTo for all permutations of types we wish to support, plus build a downcast that would somehow coerce all types into the target type. Not only is this a lot of work, it is not likely to work very well. F-bounded polymorphism neatly avoids this mess by ensuring conformance not only to an interface, but to an underlying type as well. This two-layered checking makes type safety possible in systems that are bounded by interface and type.
To better understand this issue, look at the following example:
interface IFrobincatable {
Frobnicate (x : int) : void;
}
class C1 : IFrobincatable
{
public this () {}
public Frobnicate (_ : int) : void {}
}
class C2 : IFrobincatable
{
public this () {}
public Frobnicate (_ : int) : void {}
}
module M {
f1['a] (o : 'a) : 'a
where 'a : IFrobincatable
{
o.Frobnicate (3);
o
}
f2 (o : IFrobincatable) : IFrobincatable
{
o.Frobnicate (3);
C1 ()
}
Main () : void
{
def x1 = f1 (C1 ()); // x1 : C1
def x2 = f1 (C2 ()); // x2 : C2
def x3 = f2 (C1 ()); // x3 : IFrobincatable
def x4 = f2 (C2 ()); // x4 : IFrobincatable
}
}
In the Main function, calls to f1 always return a value of the type passed to it. However, f2 doesn't give this guarantee: even though the line def x4 = f2 (C2 ()) passed in a value of type C2, what it got back was a C1. Even though f2 compiles, users of f2 may get runtime errors or misbehavior because their type assumptions aren't correctly supported.
Write a version of the functional Set class above using (unbalanced) trees, like the ones we've seen last week. Use the System.IComparable [T] interface.
Write the following tail recursive functions:
- Sum (l : list [int]) : int, returning the sum of the elements of the list
- RevMap ['a, 'b] (l : list ['a], f : 'a -> 'b) : list ['b], doing exactly what Map does, but returning a reversed list
- RevFilter ['a] (l : list ['a], f : 'a -> bool) : list ['a] (similar)
- Map ['a, 'b] (l : list ['a], f : 'a -> 'b) : list ['b], using RevMap and Rev (remember to ensure the f's are called on the list elements in the same order as with regular Map)
Rewrite the functions above (except for Map) to use FoldLeft.