This project explores adding an interoperability layer for Java 8 streams to Scala.
Java 8 has added streams, a type of high-level parallelizable iterator that supports closure-based processing. The stream API, residing in `java.util.stream`, presents a set of interfaces for generic (object-based) streams plus streams manually specialized for primitive doubles, longs, or ints. A stream is created using a single generator method. Zero or more intermediate operations may then be requested by calling methods on the stream. Finally, a terminal operation runs, possibly traversing the entire stream, and computes a result or produces a side effect. Side effects are discouraged, as parallelized streams make few guarantees about order of processing.
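The generator / intermediate / terminal structure described above can be sketched as follows. This is only an illustration; the class and method names (`PipelineSketch`, `firstTwoEvenSquares`) are made up for this example, and `limit` is used so that the infinite generator terminates.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; not part of any library.
class PipelineSketch {
    static List<Integer> firstTwoEvenSquares() {
        return Stream.iterate(1, n -> n + 1)   // generator: infinite 1, 2, 3, ...
            .filter(n -> n % 2 == 0)           // intermediate operation (lazy)
            .map(n -> n * n)                   // intermediate operation (lazy)
            .limit(2)                          // intermediate operation (short-circuits)
            .collect(Collectors.toList());     // terminal operation: runs the pipeline
    }

    public static void main(String[] args) {
        System.out.println(firstTwoEvenSquares()); // prints [4, 16]
    }
}
```

Nothing is computed until `collect` is called, which is why an infinite generator is usable here.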
Streams are implemented using `Spliterator`s, a type of iterator that can partition itself into multiple pieces for parallel processing. Although in principle this can be done with any `Iterator`, it is generally more efficient to implement the partitioning at a lower level. By using or not using the partitioning capability of `Spliterator`s, streams can effectively work either in parallel or serially.
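As a rough sketch of that partitioning, the example below splits a list's `Spliterator` once by hand, and then lets `StreamSupport.stream` decide whether to use the splitting capability. The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

// Hypothetical example class; not part of any library.
class SplitSketch {
    // Splits the source once and sums the two pieces separately.
    static int sumInTwoParts(List<Integer> xs) {
        Spliterator<Integer> rest = xs.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source declines to split
        int[] total = {0};
        if (prefix != null) prefix.forEachRemaining(x -> total[0] += x);
        rest.forEachRemaining(x -> total[0] += x);
        return total[0];
    }

    // The same spliterator can back a serial or a parallel stream.
    static int sum(List<Integer> xs, boolean parallel) {
        return StreamSupport.stream(xs.spliterator(), parallel)
            .mapToInt(Integer::intValue)
            .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumInTwoParts(xs)); // prints 15
        System.out.println(sum(xs, true));     // prints 15
    }
}
```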
Streams have many fewer built-in operations than do even the most limited of the Scala collections.
Generating operations for `Stream` are static methods of the `Stream` interface and include

- `concat` (equivalent to Scala `++` but as a static method)
- `empty` (same as Scala `empty` on companion object)
- `generate` (same as Scala `Iterator.continually`)
- `iterate` (same as Scala `Iterator.iterate`)
- `of` (same as Scala `apply` on companion object; both single and varargs forms)

plus there is a `builder` method to create a `Stream.Builder` (which works much like Scala builders).
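For orientation, here is a small sketch that exercises each of the generating operations listed above. The class name and the values are arbitrary.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; not part of any library.
class GeneratorSketch {
    static List<String> all() {
        Stream<String> fromValues = Stream.of("a", "b");                      // varargs form of `of`
        Stream<String> built = Stream.<String>builder().add("c").build();     // Stream.Builder
        Stream<String> repeated = Stream.generate(() -> "z").limit(1);        // infinite, so truncated
        Stream<String> iterated = Stream.iterate("x", s -> s + "x").limit(2); // "x", "xx"
        Stream<String> none = Stream.empty();

        // `concat` is static and takes exactly two streams, so calls are nested.
        Stream<String> joined = Stream.concat(
            Stream.concat(fromValues, built),
            Stream.concat(repeated, iterated));
        return Stream.concat(joined, none).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(all()); // prints [a, b, c, z, x, xx]
    }
}
```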
Intermediate operations, which return a new stream and perform their operations lazily, include

- `distinct` (same as Scala `distinct`)
- `filter` (same as Scala `filter`)
- `flatMap` (same as Scala `flatMap`)
- `flatMapToDouble` (manually specialized `flatMap`)
- `flatMapToInt` (manually specialized `flatMap`)
- `flatMapToLong` (manually specialized `flatMap`)
- `map` (same as Scala `map`)
- `mapToDouble` (manually specialized `map`)
- `mapToInt` (manually specialized `map`)
- `mapToLong` (manually specialized `map`)
- `peek` (no direct equivalent; runs a side effect as the stream is consumed)
- `skip` (same as Scala `drop`)
- `sorted` (two variants, equivalent to Scala `sorted` and `sortWith`)
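
A short sketch chaining several of these intermediate operations, including a switch from an object stream to a manually specialized `IntStream`. The class name, method name, and input are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of any library.
class IntermediateSketch {
    static List<Integer> sortedWordLengths(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" "))) // one stream of words
            .distinct()                                      // drop repeated words
            .skip(1)                                         // like Scala's drop(1)
            .mapToInt(String::length)                        // Stream<String> -> IntStream
            .sorted()                                        // natural-order variant
            .boxed()                                         // back to Stream<Integer>
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // words: a, dddd, bb, dddd, ccc -> distinct: a, dddd, bb, ccc -> skip: dddd, bb, ccc
        System.out.println(sortedWordLengths(Arrays.asList("a dddd", "bb dddd ccc"))); // prints [2, 3, 4]
    }
}
```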
Terminal operations, which evaluate eagerly (but perhaps incompletely), include

- `allMatch` (same as Scala `forall`)
- `anyMatch` (same as Scala `exists`)
- `collect` (two forms; similar in principle to Scala's `aggregate`)
- `count` (same as Scala `size`, but returns a `long`)
- `findAny` (similar to Scala `headOption`, but may be any element)
- `findFirst` (same as Scala `headOption`)
- `forEach` (similar to Scala `foreach`, but explicitly has "anything goes" order)
- `forEachOrdered` (similar to Scala `foreach`, but guarantees processing in encounter order)
- `max` (would be called `maxWith` in Scala)
- `min` (would be called `minWith` in Scala)
- `noneMatch` (equivalent to Scala `!exists`)
- `reduce` (three forms, one the same as Scala `reduceOption`, and two like `fold`)
- `toArray` (two forms, essentially the same as Scala `toArray`)
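
A few of the terminal operations above, sketched side by side. The class and method names are made up; the point is mainly to show which operations return an `Optional` (the `reduceOption`-like cases) and which need an explicit `Comparator`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical example class; not part of any library.
class TerminalSketch {
    static boolean allPositive(List<Integer> xs) {
        return xs.stream().allMatch(x -> x > 0);            // like Scala forall
    }

    static long howMany(List<Integer> xs) {
        return xs.stream().count();                         // a long, not an int
    }

    static Optional<Integer> largest(List<Integer> xs) {
        return xs.stream().max(Comparator.naturalOrder());  // comparator is mandatory
    }

    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);         // fold-like form
    }

    static Optional<Integer> sumOption(List<Integer> xs) {
        return xs.stream().reduce(Integer::sum);            // reduceOption-like form
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(3, 1, 2);
        System.out.println(allPositive(xs) + " " + howMany(xs) + " " + sum(xs)); // prints true 3 6
        System.out.println(largest(xs).get());                                   // prints 3
    }
}
```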
Streams also have methods to switch between parallel and sequential processing, and between ordered and unordered representations.
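These switches are ordinary methods that return a stream, so they can be chained like any intermediate operation. The sketch below uses invented names and only shows the mode-switching calls themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of any library.
class ModeSketch {
    // The last switch before the terminal operation decides the mode.
    static boolean parallelAfterToggling(List<Integer> xs) {
        return xs.stream().parallel().sequential().isParallel();
    }

    // forEachOrdered keeps encounter order even on a parallel stream.
    static List<Integer> copyInOrder(List<Integer> xs) {
        List<Integer> out = new ArrayList<>();
        xs.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // Dropping the ordering constraint is harmless for an order-insensitive result.
    static int unorderedSum(List<Integer> xs) {
        return xs.stream().unordered().parallel().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(parallelAfterToggling(xs)); // prints false
        System.out.println(copyInOrder(xs));           // prints [1, 2, 3, 4]
        System.out.println(unorderedSum(xs));          // prints 10
    }
}
```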
Scala interoperability with Java 8 Streams should accomplish six goals.

- Seamlessly use Java 8 Streams as in Java 8, but with the syntactic advantages of Scala.
- Easily use Java 8 Streams as a Scala collection (perhaps behind an `asScala` guard).
- Provide the full set of Scala collections methods transparently and with minimal runtime penalty on top of Java 8 Streams. This may or may not be the same as the previous goal.
- Generate `Spliterator`s for Scala collections that are compatible with Java 8 Streams and can be used in Java or Scala. In particular, this will enable all of Scala's collections to run operations in parallel.
- Reduce the specialization burden for `Object` vs. `double`, `int`, or `long`.
- Provide ancillary interoperability for Java 8 `Optional` and `PrimitiveIterator`, which are used by streams.

Special care must be taken to avoid superfluous boxing of `Array`-based streams. Note that `java.util.Arrays` contains a profusion of manually specialized methods to accomplish this in Java.
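On the Java side, the unboxed path looks roughly like the sketch below: `Arrays.stream` has separate overloads for `int[]`, `long[]`, and `double[]` that yield the corresponding primitive stream. The class and method names are invented.

```java
import java.util.Arrays;

// Hypothetical example class; not part of any library.
class ArraySketch {
    // Arrays.stream(int[]) yields an IntStream, so no Integer objects are created.
    static int sumOfSquares(int[] xs) {
        return Arrays.stream(xs).map(x -> x * x).sum();
    }

    // Arrays.stream(double[]) yields a DoubleStream.
    static double total(double[] xs) {
        return Arrays.stream(xs).sum();
    }

    // Arrays.stream(T[]) yields a Stream<Integer>; every element is a boxed object.
    static int boxedSum(Integer[] xs) {
        return Arrays.stream(xs).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new int[] {1, 2, 3})); // prints 14
        System.out.println(boxedSum(new Integer[] {1, 2, 3})); // prints 6
    }
}
```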
Seamless use of Java 8 Streams from Scala should not require any extra tooling; basic Scala-Java compatibility should suffice. However, comprehensive unit tests should be written to make sure that it does suffice.
An implicit value class can be used to add an `asScala` method to each `Stream` class. This method can instantiate a wrapper class that implements the Scala methods in terms of the Java ones (most likely extending `TraversableOnce`).
To be determined: do we wrap only `Stream`, requiring calls to `boxed` to work with `IntStream` etc., or do we wrap the primitive streams directly? What do we do about the lack of specialization in Scala collections?
Implementing Scala methods in terms of Java 8 Stream methods should not require any state. Thus, a value class should be able to implement the Scala methods, possibly with type classes to provide specialized functionality for the `Double` etc. variants.
Note that to keep the JVM from having to do too much work, the value classes should mark as many methods as practical with `@inline`, and the library should (as usual) be compiled with `-optimize`. Otherwise the inlining burden can quickly grow beyond what the JVM can handle.
Generating `Spliterator`s for Scala collections is one of the most challenging goals to achieve, since in many cases acceptable performance requires an implementation that has access to private methods.
Adding the functionality via implicit conversion would reduce coupling with the rest of the library, though it remains to be determined how much pattern matching would be needed to figure out the underlying type of the collection. By making key methods `private[collection]` instead of `private`, adequate safety should be maintained while still allowing the implicit conversion strategy to work.
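To make concrete why access to a collection's internals matters, here is a minimal sketch of a hand-written `Spliterator` over an array slice, written in Java for brevity. The class `SliceSpliterator` is hypothetical and stands in for whatever a Scala collection would supply; it splits efficiently only because it can see the backing array and its index range directly.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical spliterator over data[from until); not part of any library.
class SliceSpliterator implements Spliterator<String> {
    private final String[] data;
    private int from;         // next index to hand out
    private final int until;  // exclusive upper bound

    SliceSpliterator(String[] data, int from, int until) {
        this.data = data;
        this.from = from;
        this.until = until;
    }

    @Override public boolean tryAdvance(Consumer<? super String> action) {
        if (from >= until) return false;
        action.accept(data[from++]);
        return true;
    }

    // Hands the first half to a new spliterator and keeps the second half.
    @Override public Spliterator<String> trySplit() {
        int mid = from + (until - from) / 2;
        if (mid == from) return null;   // zero or one element left: do not split
        Spliterator<String> prefix = new SliceSpliterator(data, from, mid);
        from = mid;
        return prefix;
    }

    @Override public long estimateSize() { return until - from; }

    @Override public int characteristics() { return ORDERED | SIZED | SUBSIZED; }

    // Usage: the same spliterator works for serial and parallel streams.
    static long totalLength(String[] data, boolean parallel) {
        return StreamSupport.stream(new SliceSpliterator(data, 0, data.length), parallel)
            .mapToLong(String::length)
            .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalLength(new String[] {"a", "bb", "ccc"}, true)); // prints 6
    }
}
```

A wrapper that only had access to an `Iterator` could not implement `trySplit` this cheaply, which is the performance concern raised above.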
We should investigate replacing the profusion of manually defined methods in Java with Scala specialized variants that defer to type-class-selected implementations. Note that Java does not allow you to abstract over the type of stream, despite all four interfaces having nearly identical method names (e.g. `map` on `IntStream` maps from `Int` to `Int`, and there is a `mapToObj` instead of the `mapToInt` on object `Stream`).
This may be too awkward to succeed, but inspiration can be taken from Spire.
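The asymmetry between the stream interfaces can be seen in a few lines of Java. The sketch below uses made-up names and just exercises `map`, `mapToObj`, and `mapToInt` to show which stream type each one returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class; not part of any library.
class SpecializationSketch {
    // IntStream.map takes an IntUnaryOperator and stays an IntStream.
    static int[] increment(int[] xs) {
        return IntStream.of(xs).map(x -> x + 1).toArray();
    }

    // Leaving the primitive world requires the differently named mapToObj.
    static List<String> labels(int[] xs) {
        return IntStream.of(xs).mapToObj(x -> "#" + x).collect(Collectors.toList());
    }

    // Entering it from an object stream requires mapToInt.
    static int[] lengths(List<String> words) {
        return words.stream().mapToInt(String::length).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(increment(new int[] {1, 2})));      // prints [2, 3]
        System.out.println(labels(new int[] {1, 2}));                          // prints [#1, #2]
        System.out.println(Arrays.toString(lengths(Arrays.asList("a", "bb")))); // prints [1, 2]
    }
}
```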
Neither `Optional` nor `PrimitiveIterator` has a particularly engaging interface.
`PrimitiveIterator` adds one method, `forEachRemaining` with a `T_CONS` type, and the specialized variants `.OfInt` etc. fix that type to a manually specialized counterpart of `Consumer` (e.g. `IntConsumer`) to perform the operation, plus they provide e.g. `nextInt` to get the next `int` (for `.OfInt`). Unless we can provide specialized Scala iterators, there is little point customizing anything here save to provide a mapping from collections typed with `[Int]` so that we can use them as a source for `IntStream`.
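For reference, this is roughly what using `PrimitiveIterator.OfInt` looks like from Java. The class and method names are invented. The explicit `IntConsumer` cast is there because `forEachRemaining` is overloaded for both `IntConsumer` and `Consumer<? super Integer>`, and an untyped lambda would be ambiguous between them.

```java
import java.util.PrimitiveIterator;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

// Hypothetical example class; not part of any library.
class PrimitiveIteratorSketch {
    static int sum(int[] xs) {
        PrimitiveIterator.OfInt it = IntStream.of(xs).iterator();
        int[] total = {0};
        if (it.hasNext()) {
            total[0] += it.nextInt();   // unboxed access to the next element
        }
        // Consume the rest with the int-specialized consumer.
        it.forEachRemaining((IntConsumer) x -> total[0] += x);
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // prints 6
    }
}
```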
`Optional` follows the same pattern of specialization as `Stream` itself. Providing conversions to and from Scala `Option` is probably adequate.
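The specialization pattern means that `Optional<Integer>` and `OptionalInt` are unrelated types with differently named accessors, so any conversion has to be written per variant. A Java-side sketch, with invented class and method names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;

// Hypothetical example class; not part of any library.
class OptionalSketch {
    // Object streams produce Optional<T>.
    static Optional<String> firstLongWord(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).findFirst();
    }

    // Primitive streams produce the manually specialized OptionalInt.
    static OptionalInt maxLength(List<String> words) {
        return words.stream().mapToInt(String::length).max();
    }

    // There is no built-in bridge between the two in Java 8; one is written by hand.
    static Optional<Integer> box(OptionalInt o) {
        if (o.isPresent()) {
            return Optional.of(o.getAsInt());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "stream", "or");
        System.out.println(firstLongWord(words).get()); // prints stream
        System.out.println(box(maxLength(words)).get()); // prints 6
    }
}
```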
However, if the pattern for converting manually specialized implementations into Scala `@specialized` ones works particularly smoothly, we may wish to consider using the same strategy to specialize parts of the Scala library, and then provide more robust interoperability. (This is probably best deferred until 2.13.)