Currently, the list of available data repository integrations is hard-coded into the code base. It would be a rather small change to provide this as a public API. It would allow power users (most likely: library developers) to add their own data repository implementations into pooch. I see this as an interesting option in the light of supporting domain-specific data repositories, which one may not want to support directly from the domain-agnostic pooch library.
Are you willing to help implement and maintain this feature?
Yes, with no specific timeline for the implementation.
Hey @leouieda, one of my main priorities here is that the interface be accessible to the end user (as opposed to only a library maintainer). Otherwise, library maintainers would limit the set of data repositories their users can use. As I understand it, this rules out the approach of customizing the downloader class.
I was thinking more of a registration mechanism. This can be done either explicitly, like
# within the pooch core implementation:
chain_of_responsibility = DataRepository.__subclasses__()
or
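# PRO: Allows control of where to insert in chain
# within the pooch core implementation:
class DataRepository:
    _chain = []

    def __init_subclass__(cls, prepend=False, **kwargs):
        super().__init_subclass__(**kwargs)
        if prepend:
            # Put the new repository at the front of the chain
            DataRepository._chain.insert(0, cls)
        else:
            # Default: add the new repository at the end of the chain
            DataRepository._chain.append(cls)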
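To make the subclass hook more concrete, here is a minimal sketch of how user code might interact with it. It is not pooch's actual API: `GenericRepository` and `PangaeaRepository` are hypothetical names, and the sketch assumes that `prepend=True` means "try this repository first".

```python
class DataRepository:
    """Base class; every subclass is recorded in a shared chain."""

    _chain = []

    def __init_subclass__(cls, prepend=False, **kwargs):
        super().__init_subclass__(**kwargs)
        if prepend:
            # Assumption: prepend=True puts the class at the front.
            DataRepository._chain.insert(0, cls)
        else:
            DataRepository._chain.append(cls)


# Hypothetical repositories defined in user code.
class GenericRepository(DataRepository):
    pass


# The keyword after the base class is forwarded to __init_subclass__.
class PangaeaRepository(DataRepository, prepend=True):
    pass


chain_names = [cls.__name__ for cls in DataRepository._chain]
```

Merely defining a subclass registers it, which is convenient but also means that importing a module has a side effect on the chain.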
I think my personal preference is the first and very simple register function.
To give you a bit of background on my motivation: I want to advertise pooch in the future for reproducing computations from DOIs. To do that, it should support as many data repositories as possible (both generic and domain-specific ones). I understand that the pooch core can only support a limited number of data repositories. Therefore, I think users should be able to contribute data repository implementations to a separate project with weaker stability guarantees. I have implemented such a thing as a proof of concept: https://github.com/dokempf/pooch_repositories. It adds a "meta-repository" that accesses https://re3data.org to dispatch faster to e.g. DataVerse repositories, and it adds partial support for https://pangaea.de/