Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add OrderedSet<T> #110882

Open
ptasev opened this issue Dec 21, 2024 · 9 comments
Open

[API Proposal]: Add OrderedSet<T> #110882

ptasev opened this issue Dec 21, 2024 · 9 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@ptasev
Copy link

ptasev commented Dec 21, 2024

Background and motivation

Bringing this issue back since it was closed without being resolved. (#24828 and dotnet/corefxlab#2457):

"Sometimes I've come across places when needing a HashSet where the insertion order of the elements is important to me. Unfortunately, .NET does not have an OrderedSet class even though it has a SortedSet which to me has less value but perhaps not to others. This has led to users rolling their own solution, typically by using a combination of a LinkedList and Dictionary field resulting in the worst of both worlds in terms of performance and resulting in larger memory usage, and even worse sometimes users instead rely on implementation details of HashSet for ordering which is quite dangerous."

API Proposal

namespace System.Collections.Generic;

public class OrderedSet<T> : ISet<T>, IReadOnlySet<T>, IList<T>, IReadOnlyList<T>
{
    public struct Enumerator : IDisposable, IEnumerator, IEnumerator<T> {
        public T Current { get; }
        public void Dispose();
        public bool MoveNext();
    }
    public OrderedSet();
    public OrderedSet(int capacity);
    public OrderedSet(IEqualityComparer<T> comparer);
    public OrderedSet(int capacity, IEqualityComparer<T> comparer);
    public OrderedSet(IEnumerable<T> collection);
    public OrderedSet(IEnumerable<T> collection, IEqualityComparer<T> comparer);
    public IEqualityComparer<T> Comparer { get; }
    public int Count { get; }
    public T this[int index] { get; set; }
    public bool Add(T item);
    public void Clear();
    public bool Contains(T item);
    public void CopyTo(T[] array);
    public void CopyTo(T[] array, int arrayIndex);
    public void CopyTo(T[] array, int arrayIndex, int count);
    public void ExceptWith(IEnumerable<T> other);
    public OrderedSet<T>.Enumerator GetEnumerator();
    public int IndexOf(T item);
    public bool Insert(int index, T item);
    public void IntersectWith(IEnumerable<T> other);
    public bool IsProperSubsetOf(IEnumerable<T> other);
    public bool IsProperSupersetOf(IEnumerable<T> other);
    public bool IsSubsetOf(IEnumerable<T> other);
    public bool IsSupersetOf(IEnumerable<T> other);
    public bool Overlaps(IEnumerable<T> other);
    public bool Remove(T item);
    public void RemoveAt(int index);
    public bool SetEquals(IEnumerable<T> other);
    public void SymmetricExceptWith(IEnumerable<T> other);
    public int TrimExcess();
    public bool TryGetValue(T equalValue, out T actualValue);
    public void UnionWith(IEnumerable<T> other);
}

API Usage

var triangles = new List<(int A, int B, int C)>();
var vertices = new OrderedSet<Vector3>();

var vert0 = new Vector3(1f);
var vert1 = new Vector3(2f);
var vert2 = new Vector3(3f);

var a = vertices.IndexOf(vert0);
if (a == -1)
{
    a = vertices.Count;
    vertices.Add(vert0);
}

var b = vertices.IndexOf(vert1);
if (b == -1)
{
    b = vertices.Count;
    vertices.Add(vert1);
}

var c = vertices.IndexOf(vert2);
if (c == -1)
{
    c = vertices.Count;
    vertices.Add(vert2);
}

// Add triangle to list now that we have unique vertex indices
triangles.Add((a, b, c));

Alternative Designs

No response

Risks

No response

@ptasev ptasev added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 21, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 21, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Dec 21, 2024
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Dec 21, 2024
@Wraith2
Copy link
Contributor

Wraith2 commented Dec 24, 2024

Why does this need to be in the BCL and not as a third party nuget package?

@PranavSenthilnathan
Copy link
Member

This can mostly be implemented with an OrderedDictionary<T, byte> but has some limitations. Depending on T, this could add space overhead per element compared to having a first class OrderedSet<T>. For example, a reference type T will cause 7 bytes extra space per entry for storing the byte with padding. The other limitation of course is that the set operations (intersect, union, etc.) would need to be implemented. It's possible to still get the optimal time complexities for the operations using OrderedDictionary but realistically it'll probably be slower since you'll need to workaround the fact that there's no way to remove a range of items at once in OrderedDictionary.

@vpenades
Copy link

vpenades commented Jan 2, 2025

I am one of those that had to write my own implementation, based on an old version of Dictionary<K,V>

OrderedSet is essentially a Hash when writing, but a List when reading, and its a kind of collection that is very relevant in computer graphics.

The common usage of such a collection is when you need a hash where you want to ensure every added item is unique,, but you also need to keep track of the insertion order and reference the added items by index.

A naive implementation that explains the use case would look like this:

Dictionary<TValue,int> _HashedIndices;
List<TValue> _Values;

int GetOrAdd(TValue val)
{
    if (_HashedIndices.TryGetValue(val, out var index) return index;
    index = _Values.Count;
     _HashedIndices[val] = index;
   _Values.Add(val);
   return index; 
}

Notice this is a naive implementation that has the problem of consuming twice the memory an OrderedSet would consume, specially if TValue is a large struct (which is fairly common when using this pattern)

Btw, if OrderedSet naming is not liked, I usually go for IndexableSet.

@quixoticaxis
Copy link

quixoticaxis commented Jan 2, 2025

This is a fairly common pattern, which has the problem of consuming twice the memory an OrderedSet would consume, specially if TValue is a large struct (which is fairly common when using these patterns)

Btw, if OrderedSet naming is not liked, I usually go for IndexableSet.

Could you elaborate on the implementation?
You would still need two collections internally: one to index your structures by value in order to gurantee unique constraint, and one to index them by, well, their index.

@vpenades
Copy link

vpenades commented Jan 3, 2025

Could you elaborate on the implementation? You would still need two collections internally: one to index your structures by value in order to gurantee unique constraint, and one to index them by, well, their index.

My implementation is a modification of Dictionary<int,TValue> where the Key is implicitly set to int, and since the dictionary stores new items in an ordered way, I simply exposed a this[int index] property getter.

Actually, my implementation goes a bit further and removes the "key" of the dictionary altogether, so it saves a lot of memory.

That way you get the behaviour I exposed in the naive example, but with a single collection and less memory footprint. This is important because in computer graphics it is not uncommon to deal with millions of items.

@quixoticaxis
Copy link

quixoticaxis commented Jan 3, 2025

My implementation is a modification of Dictionary<int,TValue> where the Key is implicitly set to int, and since the dictionary stores new items in an ordered way, I simply exposed a this[int index] property getter.

Actually, my implementation goes a bit further and removes the "key" of the dictionary altogether, so it saves a lot of memory.

But how do you guarantee that the elements are unique, if they are values in the dictionary? And how you get the index of an existing item?
The implementation you posted above provides O(1) uniqueness check (given a fine hash function) and O(1) offset access. How do you guarantee those with one collection?

@vpenades
Copy link

vpenades commented Jan 3, 2025

But how do you guarantee that the elements are unique, if they are values in the dictionary? And how you get the index of an existing item?

Because under the hood, Dictionary<TKey,TValue> is a list with extra sugar to handle access by key.... but if you remove the extra sugar for the key thing, you can simplify the whole code so the "key" is an index.

Here's my implementation (don't mind the naming):

https://github.com/vpenades/SharpGLTF/blob/master/src/SharpGLTF.Toolkit/Collections/ValueListSet.cs

@quixoticaxis
Copy link

quixoticaxis commented Jan 3, 2025

But how do you guarantee that the elements are unique, if they are values in the dictionary? And how you get the index of an existing item?

Because under the hood, Dictionary<TKey,TValue> is a list with extra sugar to handle access by key.... but if you remove the extra sugar for the key thing, you can simplify the whole code so the "key" is an index.

Here's my implementation (don't mind the naming):

https://github.com/vpenades/SharpGLTF/blob/master/src/SharpGLTF.Toolkit/Collections/ValueListSet.cs

Sorry, if I miss something, I have not had the chance to run the code yet, but it looks to me that it heavily relies on the fact that there is no removals and thus there could not be a scenario when an element at index 9999 is in a collection with 1 element (scenario: 9999 elements added, each one except the last removed after the addition) , which seems possible to achive with your naive implementation above, and is probably not what issue's creator proposes: the proposed API includes Remove and RemoveAt methods.

The same as with the original author's API, why do you suppose it should be included in BCL and not provided by a NuGet for specialized tasks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests

6 participants