Thread: Mistakes in face clustering, Shit clusters, Faces not showing up #754
Replies: 26 comments 59 replies
-
That would indicate problems with the clustering algorithm. It sounds like it doesn't run at all, though, because it should create at least some clusters, IMO.
-
Like I said, it did create one cluster, with 12 faces. Those were all the same person, too, so what work it did, it did correctly. But just to clarify, the clustering algorithm should be able to create clusters with even a single face? There's no minimum number of faces that need to match before a cluster is created? Edit: is there any way to run just the clustering job from the command line? With verbosity turned up?
-
There is a minimum. It's currently at 6 detections.
Not at the moment. You can manually create an entry in oc_jobs in your database, though, for \OCA\Recognize\BackgroundJobs\ClusterFacesJob with argument
-
I see. That might explain why faces from my photos aren't showing up. Can I ask why this minimum? And is there any way to change it?
-
@IndrekHaav The clustering algorithm needs a few hyperparameters to work well. Testing showed that setting a minimum cluster size improves clustering because it prevents accidental face matches from agglomerating into larger clusters that don't represent a single person (something I like to call shit clusters). I recommend trying out v3.5.0 first to see if that improves the situation for you, since we're shipping a new clustering algorithm with that release. If that doesn't help and you're adventurous, you can change the min cluster size constant here: (We've reduced the value from 6 to 5 in v3.5.0 now.) In v3.5.0 there are now convenience occ commands for resetting clustering and running clustering manually:
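To illustrate the effect of a minimum cluster size, here is a minimal Python sketch. It is not Recognize's actual PHP code, and HDBSCAN applies the minimum while building its cluster hierarchy rather than as a post-filter; this simplification only shows why small groups of faces stay unassigned. The function name `filter_small_clusters` and the sample labels are hypothetical; the threshold of 5 mirrors the v3.5.0 value mentioned above.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 5  # value mentioned for v3.5.0 (it was 6 before)


def filter_small_clusters(labels, min_cluster_size=MIN_CLUSTER_SIZE):
    """Return labels where clusters below the minimum size become None (unassigned)."""
    members = defaultdict(list)
    for face_index, label in enumerate(labels):
        if label is not None:
            members[label].append(face_index)
    # Keep only clusters that reach the minimum number of face detections.
    kept = {label for label, faces in members.items() if len(faces) >= min_cluster_size}
    return [label if label in kept else None for label in labels]


# Hypothetical candidate clusters: 12, 3 and 5 face detections.
labels = ["a"] * 12 + ["b"] * 3 + ["c"] * 5
filtered = filter_small_clusters(labels)
unassigned = filtered.count(None)
```

With this data the three detections of the hypothetical cluster "b" end up unassigned, which matches the behaviour described in the original report: detections exist in the database but never appear as a person.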
-
Thanks for the response! I tried the new algorithm in 3.5.0. Just to be safe, I wiped all detected faces and clusters from the DB and triggered a full re-crawl. This time, it created a few more people, but the vast majority of faces were put into a single cluster, seemingly almost randomly. I retried this a few times and also reran the In other words, a shit cluster. Having incorrect faces in the cluster wouldn't be so bad as they can be removed in the UI, but the problem is that there's no way (at least that I could see, in the Photos or Memories apps) to move photos from one person to a new person. One can only move them to an existing person, or remove them from the cluster completely. I tried the latter, but subsequent I tried changing While I'm messing with the code, is there another constant or parameter that determines how similar the faces have to be to get clustered together?
-
How many images do you have?
That would be a bug.
Yeah, that makes sense.
There is no constant value that governs this. HDBSCAN is an adaptive algorithm that learns from the density patterns of the data. The more files you have, the better the outcome. You could also try playing with MIN_SAMPLE_SIZE.
-
I did try setting
-
Like @marcelklehr already commented, HDBSCAN, as it is implemented, will try to find the most stable clusters from the data regardless of their size. If the data contains a large number of identities (especially multiple identities with fewer face detections than MIN_CLUSTER_SIZE) it'll still try to find the most stable clusters in the data. This can lead to combining multiple identities of similar looking persons. The easiest way to alleviate this issue is to scan in a larger dataset.

Mind you, "similar looking" to the face recognition model may not always be similar looking to you or me. This is especially true in the case of children/infants. The dlib face recognition model hasn't been trained with images of children so they will cause trouble with clustering regardless of the clustering algorithm. (IIRC, Photoprism, for example, had a community effort to retrain their recognition model with datasets containing images of children.)

MIN_SAMPLE_SIZE is basically a probability density (i.e. "face detection density") smoothing factor used by HDBSCAN. Reducing this value too much can lead to statistical noise causing issues with larger datasets. Still, it might be that we'll have to reduce this value going forward; the optimal value in my test dataset may not be optimal for all users. Also, the optimal value will depend, to some degree, on how incremental clustering is implemented as that affects the amount of noise in the data that is being clustered.

@marcelklehr: Besides fine tuning MIN_SAMPLE_SIZE, another way to improve the clustering in this case might be to implement a limit on the maximum size of a cluster. The obvious, but likely(?) not optimal, solution would be to limit the radius of a face cluster. However, limiting the maximum edge length within a cluster (this was implemented in a previous version of the MstClusterer-class but I stripped it since it was not used) may be a better solution since this will specifically limit forming clusters in sparse areas of the face embedding space (since the mutual reachability distance will also be large in these areas). If the latter is implemented, it may help us get away with a larger MIN_SAMPLE_SIZE which is better for users with larger datasets. It may also be that a combination of both of these limits would provide the best user experience.
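The edge-length idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MstClusterer code: the core distance is taken as the distance to the `min_sample_size`-th nearest neighbour (excluding the point itself), the mutual reachability distance is the usual HDBSCAN definition (the maximum of both core distances and the direct distance), and `max_edge_length` is a hypothetical parameter name. The 2-D points stand in for face embeddings.

```python
import math


def core_distances(points, min_sample_size):
    """Distance from each point to its min_sample_size-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[min_sample_size - 1])
    return result


def mutual_reachability(points, min_sample_size):
    """Dense matrix of max(core_i, core_j, d(i, j)); large in sparse regions."""
    core = core_distances(points, min_sample_size)
    n = len(points)
    return [
        [0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (i, j, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def components_after_cut(n, edges, max_edge_length):
    """Union-find over the MST edges that are not longer than max_edge_length."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j, weight in edges:
        if weight <= max_edge_length:
            root[find(i)] = find(j)
    return [find(i) for i in range(n)]


# Two dense groups plus one isolated point lying between them.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
          (2.5, 2.5)]
edges = minimum_spanning_tree(mutual_reachability(points, min_sample_size=2))
labels = components_after_cut(len(points), edges, max_edge_length=1.0)
```

Without the cut, the spanning tree links both dense groups through the isolated point; with the cut at 1.0 the long edges through the sparse region are dropped and the two groups stay separate.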
-
@MB-Finski Thanks for the info, that's an interesting read! However, coming back to the original issue: irrespective of the way the clustering algorithm works, I think there should be a way for the user to review recognised but unclustered faces and, for each one, choose between "not a person" (don't suggest again), "merge with ______" (pick existing cluster) or "new person" (create new cluster). Or is that something that should be handled by another app like Photos or Memories?
-
I agree with @IndrekHaav: the ability to create a new person is missing when trying to filter out false positives.
-
I cannot reproduce this. For me, removed face detections are not re-added to the same cluster anymore.
-
@marcelklehr For me, every time the clustering job ran, the same faces kept getting added to the same person. I ended up deleting the detected faces from the DB, because they were faces I wasn't interested in anyway (random background people, and such). How would this work anyway? Does the app keep track of clusters that a face has been removed from in some way?
-
@MB-Finski If you're up for implementing that, I'm happy to merge a pull request (let me know if you need help with git).
-
When removing a face from a cluster, we store the distance from the cluster centroid along with the face, and in the future only add it to a cluster if the distance to the cluster centroid is smaller.
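The rule described above can be sketched roughly as follows. This is a minimal Python illustration, not the app's PHP implementation: the names `may_rejoin` and `stored_threshold` are hypothetical, Euclidean distance is an assumption, and the 2-D vectors stand in for face embeddings.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def may_rejoin(face_vector, cluster_vectors, stored_threshold):
    """A removed face may rejoin only if it is now closer to the centroid
    than it was when the user removed it."""
    if stored_threshold is None:  # the face was never removed from a cluster
        return True
    return math.dist(face_vector, centroid(cluster_vectors)) < stored_threshold


# The user removes a face from a cluster; remember its distance at that moment.
cluster = [[0.0, 0.0], [2.0, 0.0]]
face = [1.0, 3.0]
threshold = math.dist(face, centroid(cluster))

same_cluster_again = may_rejoin(face, cluster, threshold)
closer_cluster = may_rejoin(face, [[0.0, 3.0], [2.0, 3.0]], threshold)
```

In this sketch the face is rejected by the unchanged cluster on the next run (the distance is not smaller than the stored one) but would be accepted by a cluster whose centroid lies closer to it.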
-
@MB-Finski I wonder if it would make sense to fall back to DBSCAN clustering for photo collections smaller than x photos, as HDBSCAN results are pretty wild on smaller collections.
-
Coming over to this discussion thread since #721 was closed. I'm still experiencing this with v3.7.0. Should we be waiting for the pull request in #711 to be released before doing more testing?
-
I have just installed Recognize in Nextcloud and uploaded 100 pure portrait photos. While Recognize tags all of them as portraits and people, as it should, it fails to find a single face in them. Face score too low. continuing with next face.
-
With the recent Nextcloud docker container (26), face recognition/clustering must be classified as not working at all. While clusters are created, people are added completely randomly. I wonder whether this feature has ever been tested on a real-life set of pictures? I have about 30,000 pictures, of which at least about 15% have people in them (I tend to be out in nature a lot). However, the result is so discouraging that I have not even thought of manually reordering persons and pictures. I remember having set up the dedicated facerecognition app at some point, with Nextcloud 24 I think, and that worked way better, but it required a special docker image.
-
Sorry about this taking a while; I've been too busy at work. Anyhow... So, just a quick recap of what I think is going on here (just in case someone else has time to work on this):
I've made a fork that has (for my personal library) vastly improved performance (mostly improving on points 1 and 2 from above). I'm still running into issues when getting over 30k face detections/user due to the batch processing not working correctly in NC 26, but the clustering results are much better for all my users. I intentionally set the hyperparameter such that splitting identities is more likely than fusing them, since I think that is the preferable type of error. If you try this fork, please report back with your results. Only one file needs updating: https://github.com/MB-Finski/recognize/blob/main/lib/Service/FaceClusterAnalyzer.php
-
On this topic, I was wondering how Recognize deals with pictures with more than one person in them. I might be wrong, but it seems to me that once a picture has been attributed to a person, it is not evaluated for further persons in the image. Could that be? I have a lot of pictures with several people in them, and once one face is recognized and the picture shows up in that person's "set", the other persons don't seem to show up anywhere, even when there are plenty of pictures of them and their face is clearly visible (maybe even better than the one originally picked). This makes me lose quite a bunch of people.
-
I have some news after running a benchmark on the (now improved) clustering algorithm using a part of the IMDb-Face dataset. Here are the numbers:

Detected faces out of all faces: 86% (possibly overestimated)
Clustered faces out of all detected faces: 28%
Clustered faces out of detected faces per identity: 59%
Identities with clusters out of all identities: 87%
Shit clusters out of all clusters: 2%
Correctly assigned faces out of all assigned faces per cluster: 92%

Note that the code these numbers are based on has not been released yet.
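For anyone who wants to compute comparable figures on their own labelled data, here is a rough Python sketch. It is not the benchmark script used above, and the exact definitions are assumptions: a cluster counts as bad when its dominant identity covers less than half of its faces (`bad_purity`, a hypothetical threshold), and an identity "has a cluster" when it is the dominant identity of at least one cluster. It also assumes at least one cluster exists.

```python
from collections import Counter, defaultdict


def cluster_metrics(true_ids, cluster_ids, bad_purity=0.5):
    """Summarise clustering quality; cluster id None means 'unassigned'."""
    clusters = defaultdict(list)
    for identity, cluster in zip(true_ids, cluster_ids):
        if cluster is not None:
            clusters[cluster].append(identity)

    purities = {}
    dominant_identities = set()
    for cluster, members in clusters.items():
        identity, count = Counter(members).most_common(1)[0]
        purities[cluster] = count / len(members)  # share of the dominant identity
        dominant_identities.add(identity)

    clustered = sum(len(members) for members in clusters.values())
    return {
        "clustered_fraction": clustered / len(true_ids),
        "identities_with_cluster": len(dominant_identities) / len(set(true_ids)),
        "bad_cluster_fraction": sum(p < bad_purity for p in purities.values()) / len(purities),
        "mean_purity": sum(purities.values()) / len(purities),
    }


# Hypothetical toy data: 10 detected faces of 3 identities, 2 clusters.
true_ids = ["ann"] * 4 + ["bob"] * 4 + ["cat"] * 2
cluster_ids = [0, 0, 0, None, 1, 1, 1, 0, None, None]
metrics = cluster_metrics(true_ids, cluster_ids)
```

On the toy data, 7 of 10 faces are clustered, two of the three identities have a cluster, and the mean purity is 0.875 because one "bob" face landed in the "ann" cluster.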
-
I ran into something strange today. I see a different number of faces in Photos/People and Memories/People. Aside from the fact that some faces are blank in Photos and don't open any photos when clicked, I see one fewer person in Memories. The difference is a person with only 1 photo, and that photo is in a hidden folder outside of /Photos. Is there a setting to limit the scope of Recognize so that it does not go outside the Photos folder (or whatever folder is set to hold pictures) AND does not go into hidden folders?
-
I have a question. I installed Memories and Recognize. Memories reports that 85,352 media files have been indexed. I then ran a full classification manually
-
I have a question, and everything I've read leads nowhere or here :) I've been using Recognize since about March 2023. My collection of digital photos from about 2007-2024 keeps growing; I probably have a few hundred thousand photos. As a result I have several big clusters, like me and close family, at about 1200-1300 photos each. But I also have 11927 "unassigned" faces. All known clusters together still total well under 10k, so unassigned faces outnumber all the "assigned" ones. On top of that, the clustered number includes a big shit cluster of around 1.1k photos, which I named Trash and throw innocent bystanders into (for lack of better management).

What amazed me is that my own face and those of a few family members simply stopped being clustered. If I open the unassigned group, we're all there, probably 1-2k photos (or faces) of us just from this summer. But nothing gets clustered. And yes, cron jobs are running, and after jumping from NC26 to NC29 I kept manually running scans, classifications, clustering and so on. Yet the unassigned group keeps growing, and new additions to any actual person cluster stay very low. It is not zero, but fewer than 100 random photos in total have been added to all named clusters in the past ~18 months. For my own face it is zero, and the same goes for two other clusters with 1k+ tagged photos.

So here I am wondering: can I tweak something? Maybe something hardcoded in PHP? Like a threshold for when a face gets recognized as part of a cluster? I'd rather have a few more false positives than not get anything clustered anymore.

Oh, and by the way, I did try manually moving unassigned faces to the correct persons, but neither Photos nor Memories allows that. So that doubles my woes: I can delete or remove wrong tags (e.g. a stranger being tagged as me), and I'm willing to do that cleanup manually from time to time, but currently I can't in any way force hundreds of photos of myself to be moved to my personal cluster. (Using myself as an example; the same goes for other family members.)

I'd be happy to test any ideas. Thanks in advance!
-
I have about 200 photos uploaded to my Nextcloud instance, most of those containing faces. Recognize has processed them all, but only a single person shows up in the People section (of both Photos and Memories apps), with 12 photos.
I don't think this is the same as #588. I have checked the database, and the oc_recognize_face_detections table has 237 records, while oc_recognize_face_clusters only has one. Furthermore, if I manually insert a record into the clusters table and then link a record in the detections table to it, it shows up as a new person. I don't understand why the majority of the detected faces are not added to a cluster.

The same photos, when imported into PhotoPrism (which also uses TensorFlow), resulted in every single detected face showing up.
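A database check like the one described above could look roughly like this. The sketch uses an in-memory SQLite database as a stand-in for the real Nextcloud database, and the `cluster_id` link column is an assumption about the schema; only the two table names come from the post. The sample rows reproduce the reported figures (237 detections, 12 of them in the single cluster).

```python
import sqlite3

# Stand-in database; a real instance would be MySQL/MariaDB, PostgreSQL or SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oc_recognize_face_clusters (id INTEGER PRIMARY KEY);
CREATE TABLE oc_recognize_face_detections (
    id INTEGER PRIMARY KEY,
    cluster_id INTEGER  -- assumed link column; NULL means not clustered
);
""")
conn.execute("INSERT INTO oc_recognize_face_clusters (id) VALUES (1)")
rows = [(1,)] * 12 + [(None,)] * 225
conn.executemany(
    "INSERT INTO oc_recognize_face_detections (cluster_id) VALUES (?)", rows
)


def count_detections(connection):
    """Return (total detections, detections without a cluster)."""
    return connection.execute("""
        SELECT COUNT(*),
               SUM(CASE WHEN cluster_id IS NULL THEN 1 ELSE 0 END)
        FROM oc_recognize_face_detections
    """).fetchone()


total, unclustered = count_detections(conn)
```

A large `unclustered` count next to a small number of clusters is the pattern reported here: detection works, but most faces never reach the minimum needed to form a cluster.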