TODO: Consider switching the interfaces of most functions to use std::vector instead of old-style C arrays/pointers. Returning vectors should be fine and gives users one less thing to think about, since they no longer manage deallocation.
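A minimal sketch of what the vector-based style could look like (the function name is hypothetical, not an existing interface):

    #include <algorithm>
    #include <vector>

    // Old style: the function allocates with new[] and the caller must remember to delete[].
    // Vector-based sketch: the returned vector owns its memory and frees it automatically.
    std::vector<int> sorted_copy(const std::vector<int>& src)
    {
        std::vector<int> result(src);                  // single allocation, freed automatically
        std::sort(result.begin(), result.end());
        return result;                                 // moved out in C++11 and later, no extra copy
    }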
TODO: Create functions for integer division with ceiling (round up) and integer division with rounding to nearest.
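A minimal sketch, assuming non-negative integers; the names are placeholders, and the numerator additions can overflow for very large inputs:

    #include <cstddef>

    inline size_t divide_with_ceiling(size_t numerator, size_t denominator)
    {
        return (numerator + denominator - 1) / denominator;     // rounds up
    }

    inline size_t divide_with_rounding(size_t numerator, size_t denominator)
    {
        return (numerator + denominator / 2) / denominator;     // rounds to nearest
    }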
TODO: Explore parallelism at small workloads, such as small arrays. For example, the array sum I was exploring in the book did not improve in performance when I tried to reduce the overhead of small tasks in the body
of the recursive tree. The mistake I made was using a large tree. Instead, this reduction should pay off big for small trees, possibly with small parallel_threshold (cutoff) points. It may enable a smaller parallel cutoff,
which would be a win for parallelism on smaller arrays.
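For reference, a sketch of the kind of cutoff-based recursive sum being discussed, using std::async purely for illustration (the real implementation may use a different tasking framework; parallel_threshold is a placeholder name for the cutoff):

    #include <cstddef>
    #include <future>
    #include <numeric>

    inline long long parallel_sum(const int* src, size_t l, size_t r, size_t parallel_threshold)
    {
        if (r - l <= parallel_threshold)                 // small sub-array: run serially, no task overhead
            return std::accumulate(src + l, src + r, 0LL);
        size_t m = l + (r - l) / 2;
        auto left = std::async(std::launch::async,
                               [=] { return parallel_sum(src, l, m, parallel_threshold); });
        long long right = parallel_sum(src, m, r, parallel_threshold);
        return left.get() + right;
    }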
Idea: The whole "parallelism for the small" theme could be our value add. We could develop techniques and technologies that increase usefulness at the small-array scale and enable parallelism and higher performance for more use cases,
expanding the market for parallelism. It would benefit not only C++ but also C#, both of which would enjoy parallel performance gains in more cases - i.e. not just the very large ones. This would blow the doors
off the parallel performance market.
One tool is to reduce the overhead of the "recursive body nodes" by eliminating them, as I did in the book, by replacing full recursion with a single
level of recursion. This should work well for small arrays, since it removes the inefficiency of those intermediate nodes. Allocation of the single
array for the single recursion level can also be done on the stack for small arrays, reducing allocation overhead. This needs to be benchmarked
on small arrays to show how much benefit can be harvested. Hopefully, it's enough to extend the usefulness of parallelism to the smallest
of arrays, and therefore to arrays of all sizes.
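A sketch of the single-recursion-level idea for sum: one split into a fixed number of chunks, partial results held in a stack-allocated array, no recursive tree (chunk count and names are illustrative only):

    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <future>
    #include <numeric>

    inline long long sum_single_level(const int* src, size_t size, size_t num_chunks = 8)
    {
        std::array<std::future<long long>, 64> partial;          // slots live on the stack, no heap for the tree
        if (num_chunks == 0) num_chunks = 1;
        num_chunks = std::min(num_chunks, partial.size());
        size_t chunk = (size + num_chunks - 1) / num_chunks;     // ceiling division
        size_t launched = 0;
        for (size_t i = 0; i < num_chunks && i * chunk < size; ++i, ++launched)
        {
            const int* first = src + i * chunk;
            const int* last  = src + std::min(size, (i + 1) * chunk);
            partial[i] = std::async(std::launch::async,
                                    [first, last] { return std::accumulate(first, last, 0LL); });
        }
        long long total = 0;
        for (size_t i = 0; i < launched; ++i)
            total += partial[i].get();
        return total;
    }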
TODO: C++ added support for new that does not throw an exception, which can improve performance. This needs to be used everywhere, especially in the Adaptive algorithms:
_Type* sorted = new(std::nothrow) _Type[src_size];
Benchmarking the two forms against each other would also be great, to see how much time exception throwing takes.
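A sketch of the pattern with the null check that the nothrow form requires (the helper name and the fallback choice are hypothetical):

    #include <algorithm>
    #include <cstddef>
    #include <new>

    template <class _Type>
    bool sort_copy_nothrow(const _Type* src, size_t src_size, _Type* dst)
    {
        _Type* sorted = new(std::nothrow) _Type[src_size];
        if (!sorted)                          // nullptr instead of a thrown std::bad_alloc
            return false;                     // caller can fall back to an in-place, non-buffered variant
        std::copy(src, src + src_size, sorted);
        std::sort(sorted, sorted + src_size);
        std::copy(sorted, sorted + src_size, dst);
        delete[] sorted;
        return true;
    }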
TODO: Change all interfaces to use either start_iterator/end_iterator or start_pointer/size. Otherwise, size_t left and size_t right have the problem of not being able to
support zero elements at a zero starting location, forcing the right/end index to be exclusive while left/start is inclusive, which seems awkward since the two are not treated the same.
One possibility is to adopt the C++ convention of iterators for both bounds; C++ users are accustomed to the end iterator being exclusive.
Another possibility is to adopt the C# convention of a start/length pair for bounds specification, which I like better than the C++ inclusive/exclusive start/end iterator pair.
Right now we are getting away with it because sorting 0 elements and sorting 1 element are equivalent to doing nothing, and it's impossible to tell the difference: sorting
only starts permuting at 2 or more array elements; otherwise nothing is done.
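For comparison, sketches of the candidate signatures (names are placeholders):

    #include <cstddef>

    // C++-style: iterator (or pointer) pair, end exclusive; an empty range is first == last.
    template <class _Iter> void parallel_sort(_Iter first, _Iter last);

    // C#-style: start pointer plus length; an empty range is length == 0.
    template <class _Type> void parallel_sort(_Type* start, size_t length);

    // Current style: inclusive left and right indices; zero elements at index 0 cannot be expressed with size_t.
    template <class _Type> void parallel_sort(_Type* a, size_t left, size_t right);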
TODO: For interfaces that use size_t left and right boundaries, zero-element arrays need to be supported in some way.
This can be done in two ways: a start/length pair as C# prefers, or left-inclusive and right-exclusive indices as C++ prefers.
For C++ it seems this should be handled in a C++ fashion so that the algorithms have a familiar interface for C++ developers.
It's possible to add a wrapper that takes inclusive-left and exclusive-right, checks for zero length, and returns early. This handles the size_t left and right
boundary condition, and may be the solution for the moment.
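A sketch of that wrapper approach, assuming start <= end_exclusive (the inclusive-right entry point below is a stand-in for the existing implementation):

    #include <algorithm>
    #include <cstddef>

    // Stand-in for the existing inclusive-left/inclusive-right implementation.
    template <class _Type>
    void sort_inclusive(_Type* a, size_t left, size_t right)        // sorts a[left..right], both inclusive
    {
        std::sort(a + left, a + right + 1);
    }

    // C++-style wrapper: inclusive left, exclusive right; zero and one element handled up front.
    template <class _Type>
    void sort_range(_Type* a, size_t start, size_t end_exclusive)   // sorts [start, end_exclusive)
    {
        if (end_exclusive - start < 2)                              // nothing to permute
            return;
        sort_inclusive(a, start, end_exclusive - 1);
    }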
TODO: Add output and possibly input buffering to merge, where C++ copy is used to move inputs and outputs in and out using SSE/SIMD instructions. (Using Microsoft's version of copy to buffer did not help.
Need to use Intel's unseq version.)
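If "Intel's unseq version" refers to the unsequenced execution policy from the parallel algorithms library (std::execution::unseq, standardized in C++20 and available earlier through Intel's Parallel STL), a buffered copy could look roughly like this:

    #include <algorithm>
    #include <cstddef>
    #include <execution>

    // Sketch: stage merge input/output through a buffer with an unsequenced (vectorization-friendly) copy.
    template <class _Type>
    void copy_to_buffer(const _Type* src, size_t size, _Type* buffer)
    {
        std::copy(std::execution::unseq, src, src + size, buffer);
    }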
TODO: Implement adaptive algorithms that provide a way to save and restore settings for tuning of algorithms - i.e. provide persistence.
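A minimal sketch of such persistence, storing tuning values as key/value text (the setting names and file format are placeholders):

    #include <cstddef>
    #include <fstream>
    #include <map>
    #include <string>

    struct TuningSettings
    {
        std::map<std::string, size_t> values;            // e.g. {"parallel_threshold", 16384}

        bool save(const std::string& path) const
        {
            std::ofstream out(path);
            for (const auto& kv : values)
                out << kv.first << ' ' << kv.second << '\n';
            return static_cast<bool>(out);
        }

        bool load(const std::string& path)
        {
            std::ifstream in(path);
            std::string key;
            size_t value;
            while (in >> key >> value)
                values[key] = value;
            return in.eof();                             // true only if the whole file was parsed
        }
    };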