Spliterator: Concurrency in Java Parallel Stream

Vishal Kumar
3 min readMar 21, 2021

--

Parallel Lines
Image By Gurdain Bharj via Unsplash

Java 1.8 came with streams API which supports parallel execution on Collection. It is simple, elegant but when and how to use them is not well looked at by the developer community. In this post, I will explore Spliterator, which helps to use parallel stream effectively.

Q. How would you create a new list of uppercased words given a list of words.

A. Using Stream this can be done in a single line.

return Arrays.asList("hello", "world")
.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());

But shall we use parallelStream() in place of stream() ? The answer is, it depends. If the list size is small, it is not suggested to use ParallelStream. The next question could be what is “not small” for ParallelStream? As a rule of thumb, for non-trivial CPU bound operation, if the size of the Collection is more than 1000, it can be considered “not small” and Parallel Streams can become useful. Doug Lea has guidance here http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html . Going by guidance, it is not recommended to use parallelStream() in the above code snippet.

Q. What if operations on Stream are blocking I/O?

For blocking I/O bound traditionally, there would a thread pool which gets one job at a time or all jobs together and the application program hopes it does the right thing. With ParallelStream work is much simple, just adding parallelStream() in stream execution pipeline makes execution concurrent. Caution: As all streams by default share common java run time environment Fork Join Pool, it is strongly recommended to have a custom pool for blocking I/O bound execution. Many executors, along with Fork Join Pool works fine for this, though testing for different scenario is recommended.

In the above example, all words, Hello, World, With, Parallel, Stream would be printed after 5 seconds. This may or may not be desirable. Say if these words are links of action that came as part of the request, and sleep is actually blocking rest calls on the links. Few requests with a high number of links can make the application unresponsive to almost all other requests. This is the classical starvation problem.

Q. How to control the number of concurrent operations in ParallelStream.

A. For this we have to use a new Iterator which is specifically introduced in 1.8 for ParallelStream called Spliterator. Spliterator is a portmanteau of split and iterator. At a high level, this is used to split the stream into chunks of streams which can be executed concurrently, with each chunk iterated sequentially. trySplit() takes care of the first part, i.e. split. It either returns a smaller chunk if the stream is large enough or reruns null signifying stream should not be divided further for parallel execution. tryAdvance() takes care of the second part, i.e. iteration. Iteration for a given chunk of the stream should be continued till tryAdvance() returns true. Let’s see how can we override these two methods and control the number of splits.

Above is an implementation of Spliterator for List . It divides the input List into N splits passed in as the second argument of the static factory method. Disclaimer: Code has no validation and is not production-ready.

Let’s re-write our steam with I/O using the above implementation.

It is almost the same as the last try. The difference is the usage of custom Spliterator . With this change in the above solution for a given request, we will have a maximum of 2 splits. Thus, it ensures all work submitted via CappedListSpliterator would get a maximum of 2 threads, eliminating starvation.

In the above example, elements would be divided into two splits, Hello, World,With, Parallel, Stream. We should see Hello, With followed by World, Parallel and at end Stream printed with a 5-second interval. Below is my try to represent splits using pictures. The same color represents the same passage of time from the original submission to ForkJoinPool. Note this assumes the system had enough thread and no other requests are executing on the pool.

Original List in two splits, with each down arrow representing thread during the 5-sec interval

Hopefully, this will encourage you to look at Spliterator and get to know them more. Happy Learning !!!

--

--

Vishal Kumar

I am a software engineer who enjoys collaborative environment and building value !!!