feat(core): add initial datasources design #103

Closed
wants to merge 1 commit into from

Conversation

JoviDeCroock
Contributor

Resolve #89

Summary

The idea here is to let folks create their own implementation of a Datasource: all it needs to do is inherit from Datasource, and we'll be able to use it in node. A datasource has to implement getOne and can optionally supply getMany if its endpoint supports that.

A node is still allowed to supply the load function itself. The concept of datasources can be used to facilitate a wider ecosystem where folks can export packages, e.g. fuse-shopify, containing a number of datasources and types that can be used in node/... Folks can then choose to remap those properties to their own names or just use them as-is.
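
For illustration, a minimal sketch of the contract described in the summary: a base class with a required getOne and an optional getMany. The PR's actual Datasource class is not shown in this thread, so the load helper, the Product type, and the in-memory implementation below are all assumptions.

```typescript
// Hypothetical base class; not the implementation from this PR.
abstract class Datasource<T> {
  // Required: fetch a single resource by id.
  abstract getOne(id: string): Promise<T>;

  // Optional: fetch many resources in one call, if the endpoint supports it.
  getMany?(ids: string[]): Promise<T[]>;

  // What a node's load function could delegate to: prefer getMany,
  // otherwise fall back to one getOne call per id.
  async load(ids: string[]): Promise<T[]> {
    if (this.getMany) return this.getMany(ids);
    return Promise.all(ids.map((id) => this.getOne(id)));
  }
}

type Product = { id: string; title: string };

// Illustrative in-memory implementation standing in for a real API client.
class InMemoryProducts extends Datasource<Product> {
  calls = 0;
  private rows: Record<string, Product> = {
    "1": { id: "1", title: "Hat" },
    "2": { id: "2", title: "Scarf" },
  };

  async getOne(id: string): Promise<Product> {
    this.calls++;
    const row = this.rows[id];
    if (!row) throw new Error(`Product ${id} not found`);
    return row;
  }
}
```

Because InMemoryProducts does not define getMany, load falls back to one getOne call per id and returns results in the order the ids were given.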

@JoviDeCroock JoviDeCroock requested a review from mxstbr December 18, 2023 06:48

vercel bot commented Dec 18, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
fusejs-org ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 18, 2023 6:49am
spacex-fuse ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 18, 2023 6:49am

@JoviDeCroock JoviDeCroock force-pushed the reintroduce-datasources branch from 145cd61 to 6c17907 Compare December 18, 2023 06:48
@JoviDeCroock JoviDeCroock changed the title add initial datasources design feat(core): add initial datasources design Dec 18, 2023
@mxstbr
Contributor

mxstbr commented Dec 18, 2023

I think figuring this out will be 🔑 I love that you have started tackling this.


To me, sharing the load function (essentially what this PR is doing) is the part that doesn't really require another abstraction—it could just be import shopifyLoad and then load: shopifyLoad.
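
A small sketch of that alternative, where a package shares a plain load function and no extra abstraction is involved. The catalog data and the NodeConfig type are local stand-ins invented for this example; NodeConfig is not Fuse's real type.

```typescript
type Product = { id: string; title: string };

// What a hypothetical "fuse-shopify" package could export: a plain function.
// The in-memory catalog stands in for real API calls.
const catalog: Record<string, Product> = {
  a: { id: "a", title: "Hat" },
  b: { id: "b", title: "Scarf" },
};

async function shopifyLoad(ids: string[]): Promise<Product[]> {
  return ids.map((id) => {
    const product = catalog[id];
    if (!product) throw new Error(`Unknown product ${id}`);
    return product;
  });
}

// Consumer side. `NodeConfig` is a local stand-in for the object passed to
// node(), reduced to the two properties relevant here.
type NodeConfig<T> = { name: string; load: (ids: string[]) => Promise<T[]> };

const productNode: NodeConfig<Product> = { name: "Product", load: shopifyLoad };
```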


What's more interesting to me is where this could go: how can we make it easy to tie underlying microservices, third-party APIs, & (potentially) data stores into your data layer across nodes, queries, and mutations (!) with minimal overhead? (particularly important: gRPC, REST, and GraphQL) (I'm not convinced on easy, direct data store embedding for data layers, but it's a likely outcome of having any kind of data source help)

Looking at some prior art, Gatsby's plugin system was one of its immense strengths but also one of its vast weaknesses: when a source plugin did what you needed, it was amazing and unbelievably quick to get up and running. But the second you needed something even slightly different, you were stuck, because you had little control over the underlying translation from data source to the generated graph. (I wonder what Netlify Connect's Connector implementation looks like, whether they have taken those learnings and iterated on the plugin system 🤔 )

What would it look like to have an ecosystem of generic data sources (specifically most important to me: REST, gRPC, and GraphQL) that users can take to easily connect underlying services across many different parts of their graph?

Potentially related discussion: stitching a whole API vs. stitching a specific type into a node.

@mxstbr
Contributor

mxstbr commented Dec 19, 2023

More musings:

  • Thinking about the difference between the underlying data sources and the API frontend teams need, the underlying data sources almost always think in terms of CRUD.
  • (Part of?) The value of an abstraction over the underlying data sources is that one underlying microservice might be used in multiple nodes, the mapping isn't 1:1

What if our Datasource concept modeled these things? (I think the initial implementation of datasources had something like that) Something (pseudo-code) akin to

// types/UserService.ts
type User = {};

const UserService = new RESTDatasource<User>({
  read: (ids) => {},
  create: () => {},
  update: () => {},
  delete: () => {},
})

// types/User.ts
const UserNode = node({
  name: 'User',
  load: UserService.read,
  fields: (t) => ({})
})

addMutationFields(t => ({
  signup: t.field({
    type: UserNode,
    args: {},
    resolve: (_, args) => {
      return UserService.create(args.user);
    }
  })
}))

// types/Payments.ts

addMutationFields(t => ({
  subscribe: t.field({
    type: SubscriptionNode,
    args: {},
    // async so the await below is valid
    resolve: async (_, args) => {
      // …create subscription…
      await UserService.update({ isSubscribed: true })
    }
  })
}))

…but, I'm not sure that's providing a ton of value since it's doing pretty much nothing automatically, it's just another arbitrary abstraction 😬

Hm. What's the right balance between speeding up the user by doing things automatically vs. giving them control?

@mxstbr
Contributor

mxstbr commented Dec 19, 2023

I wonder how

  • Wundergraph
  • Netlify Connect
  • tyk.io
  • Gravitee
  • etc.

handle this.

@kramhuber

kramhuber commented Dec 19, 2023

Stepzen may also have inspiration https://stepzen.com/docs/quick-start/with-rest-import.

@kramhuber

kramhuber commented Dec 19, 2023

I'm not sure that's providing a ton of value since it's doing pretty much nothing automatically

Assuming it adheres to OpenAPI or similar, we could likely do that lifting.

@mxstbr
Contributor

mxstbr commented Dec 20, 2023

Assuming it adheres to OpenAPI or similar, we could likely do that lifting.

Hmm I guess one framing of this idea of making it easy to integrate data sources could be "For any OpenAPI/gRPC/GraphQL endpoint, we generate a fully type-safe client for you to tie that data source into your data layer easily." Kind of "Prisma for any typed API."
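
To make the "Prisma for any typed API" framing concrete, here is a rough sketch of a client whose return types are derived from a map of paths to response types. The Api map is hand-written here in place of what a generator might emit from an OpenAPI spec, and createClient, Transport, and the example paths are all assumptions rather than an existing API.

```typescript
// Hand-written stand-in for types a generator would emit from a spec.
type User = { id: string; name?: string };
type Project = { id: string; title: string };

type Api = {
  "/users/{id}": User;
  "/projects/{id}": Project;
};

// The transport is injected so the sketch runs without a network;
// a real client would wrap fetch() here.
type Transport = (url: string) => Promise<unknown>;

function createClient(baseUrl: string, transport: Transport) {
  return {
    // The return type follows from the path, so callers never annotate it.
    async get<P extends keyof Api>(path: P, params: { id: string }): Promise<Api[P]> {
      const url = baseUrl + path.replace("{id}", encodeURIComponent(params.id));
      // The cast trusts the spec; a stricter client would validate at runtime.
      return (await transport(url)) as Api[P];
    },
  };
}
```

With this shape, `client.get("/users/{id}", { id })` is typed as a User and `client.get("/projects/{id}", { id })` as a Project, and an unknown path is a compile-time error.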

@mxstbr
Contributor

mxstbr commented Dec 20, 2023

I put together a quick example for a friend that queries one of their REST endpoints (for "projects") and turns it into a GraphQL API. By far the most tedious bit was typing the underlying data source; looking at the code, that is the majority of it (~280 lines of code!):

[Screenshot: CleanShot 2023-12-20 at 10 05 55@2x]

The other part that would quickly become tedious here is exposing fields; with ~280 lines of types, you would need an equal (if not greater) amount of lines to define all the sub-objects as object types and add all the expose* definitions for all the fields.

Maybe the way to look at simplifying this would be: if your underlying data source has types already, we help you collect them, and then we flip the model on its head and expose everything by default and you can opt-out of exposing certain fields as-is instead of opting-in. (this is akin to how StepZen's REST connector works conceptually)
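
A minimal sketch of that flipped model, assuming the field types have already been collected from the underlying source. The FieldTypes shape and both helper names are invented for this example; they are not Fuse or StepZen APIs.

```typescript
// Field types as they might be collected from an already-typed data source.
type FieldTypes = Record<string, "ID" | "String" | "Boolean" | "Int">;

// Opt-out model: every field is exposed unless it is listed in `hide`.
function exposeAllExcept(fields: FieldTypes, hide: string[] = []): FieldTypes {
  const exposed: FieldTypes = {};
  for (const [name, type] of Object.entries(fields)) {
    if (hide.indexOf(name) === -1) exposed[name] = type;
  }
  return exposed;
}

// Render the exposed fields as a GraphQL object type definition.
function toGraphQLType(name: string, fields: FieldTypes): string {
  const lines = Object.entries(fields).map(([field, type]) => `  ${field}: ${type}`);
  return `type ${name} {\n${lines.join("\n")}\n}`;
}
```

The trade-off is the usual one for opt-out designs: new fields on the underlying source become part of the public graph unless someone remembers to hide them.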

@mxstbr
Contributor

mxstbr commented Dec 20, 2023

Apollo also has a take on this in Apollo Server: https://www.apollographql.com/docs/apollo-server/data/fetching-rest/#creating-subclasses

@mxstbr
Contributor

mxstbr commented Dec 21, 2023

This looks like exactly what we need, albeit in a UI: https://tyk.io/docs/universal-data-graph/concepts/datasources/

More context: https://tyk.io/docs/universal-data-graph/udg-concepts/

@mxstbr
Contributor

mxstbr commented Dec 22, 2023

"I don't want to have to learn how a new technology (gRPC, REST, Kafka) works just to get some data from a backing service"

@mxstbr
Contributor

mxstbr commented Dec 22, 2023

Let me approach this from the angle of the goal for data sources to be "Make it easy for frontend teams to tie microservices & third-party APIs into their data layer across nodes, queries and mutations."

Importantly, as we learned from conversations, I think they need to come with two properties:

  • Require the least amount of knowledge of the underlying technologies as possible (“I don't want to have to learn how a new technology (gRPC, REST, Kafka) works just to get some data from a backing service”)
  • Allow reshaping the data (“It's not enough to just take a data source and turn it into GraphQL”)

(feel free to disagree/discuss about this goal and the properties! open to other ideas that reframe the conversation)


Assuming those are the goals & properties, here is a draft of something that I think would achieve this goal with these properties based on the example of REST:

// For a typed REST API (e.g. OpenAPI), we need to document how to easily generate this to avoid this being a tedious task
type UserSource = { id: string, name?: string, isCustomer?: boolean,  }

const userService = new RESTDatasource<UserSource>({
  url: "corp.com/api/users",
  // Pass-through all headers by default; can be overridden with the headers config
  headers?: (ctx: UserContext) => Headers,
  // Because there is no convention for "load many of the same resource," we have to
  // read one resource at a time by default.
  // However, users can specify custom load(ids) fns to opt-into data loading if their
  // underlying API supports it
  load?: (ids) => UserSource[]
})

userService.create({}: UserSource): UserSource // -> POST ${url}
userService.update(id, {}: UserSource): UserSource // -> PUT ${url}/${id}
userService.delete(id): UserSource // -> DELETE ${url}/${id}
// This will not data load, but instead call the "Read one" endpoint once per ID by default
// unless the load(ids) fn is defined in the userService
// QUESTION: Should this be read(ids)? That name would match more closely with CRUD
// but is different than node.load(). 👎
userService.load(ids): UserSource[] // -> GET ${url}/${ids[0]}, GET ${url}/${ids[1]}, …GET ${url}/${ids[n]}

// Usage

// Expose a datasource as-is with slight reshaping of the resource
const User = node({
  ...userService, // load(), create(), update(), & delete()
  fields: (t) => ({
    name: t.exposeString('name'),
  })
})

// EXAMPLE: Use the userService.load/create/update/delete methods manually if needed
addMutationFields((t) => ({
  upgradeToPaidPlan: t.field({
    type: User,
    // The id argument is needed to match the update(id, {…}) signature above
    args: { id: t.arg.id() },
    resolve: (_, args) => {
      return userService.update(args.id, { isCustomer: true });
    }
  })
}))

# Generated schema
type User {
  id: ID!
  name: String
}

type Query {
  user(id: ID!): User
}

# Create mutations if node.create, node.update, and node.delete respectively are defined
input UserInput {
  name: String
}

type Mutation {
  createUser(id: ID!, input: UserInput!): User
  updateUser(id: ID!, input: UserInput!): User
  deleteUser(id: ID!, input: UserInput!): User
}
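
As a rough, runnable sketch of the verb mapping in the draft above (POST to create, PUT to update, DELETE to delete, one GET per id unless a custom load is given): the Http function type, the RestOptions shape, and the class body are assumptions made for this example, and the transport is injected so it runs without a network.

```typescript
// Minimal HTTP abstraction; a real version would wrap fetch() and headers.
type Http = (method: string, url: string, body?: unknown) => Promise<unknown>;

type RestOptions<T> = {
  url: string;
  http: Http;
  // Optional batch loader for APIs that can return many resources at once.
  load?: (ids: string[]) => Promise<T[]>;
};

class RESTDatasource<T extends { id: string }> {
  private opts: RestOptions<T>;

  constructor(opts: RestOptions<T>) {
    this.opts = opts;
  }

  // POST ${url}
  create(input: Omit<T, "id">): Promise<T> {
    return this.opts.http("POST", this.opts.url, input) as Promise<T>;
  }

  // PUT ${url}/${id}
  update(id: string, input: Partial<T>): Promise<T> {
    return this.opts.http("PUT", `${this.opts.url}/${id}`, input) as Promise<T>;
  }

  // DELETE ${url}/${id}
  delete(id: string): Promise<T> {
    return this.opts.http("DELETE", `${this.opts.url}/${id}`) as Promise<T>;
  }

  // Default: one GET per id. A custom `load` replaces this with a single call.
  load(ids: string[]): Promise<T[]> {
    if (this.opts.load) return this.opts.load(ids);
    return Promise.all(
      ids.map((id) => this.opts.http("GET", `${this.opts.url}/${id}`) as Promise<T>),
    );
  }
}
```

One detail worth noting if the datasource were a class like this: object spread only copies own enumerable properties, so `...userService` would not pick up prototype methods. For the spread in the draft to work, load/create/update/delete would need to be own properties of the returned object.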

One important consideration with auto-generating mutations (as well as our auto-generated query!) is that we have to give users the ability to override them to extend their functionality. I think the most intuitive way for that to happen would be this:

const User = node({
  ...userService,
  fields: (t) => ({
    name: t.exposeString('name'),
  })
})

addQueryFields(t => ({
  // REMOVE auto-generated user root query field
  user: null,
}))

addMutationFields((t) => ({
  // REPLACE auto-generated createUser root mutation field
  createUser: t.field({
    type: User,
    args: {
      id: t.arg.id(),
      // NOTE: Auto-generated UserInput input type is accessible from User node object
      input: User.UserInput,
    },
    resolve: (_, args) => {
      // …custom logic here…
      return userService.create(args.input);
    }
  })
}))

Open questions for future iterations:

  • What if the underlying REST API isn't quite a standard REST API? Future improvement could be allowing users to override what exactly create/update/delete do, similar to how load allows them to override what read does.

How does this look to you @JoviDeCroock?

@JoviDeCroock
Contributor Author

JoviDeCroock commented Dec 22, 2023

I feel that's a bit against the premise of what we want to achieve. One of our design decisions was to guide folks towards good GraphQL, i.e. have a limited set of entry-points and have your entry-points specialised to the needs of your UI.

When we introduced an automatic query field for every node invocation, we already went against that grain, as we now have an entry-point per node. We practically introduced the R from CRUD automatically, while here we would go further and also introduce the create, update and delete.

While create and delete might be useful, I'm not sure whether folks want to be able to create any and every node in their graph, as a lot of these could be triggered by side-effects or be implicit. Think about a user making a purchase: the product goes out of stock, so it's deleted behind the scenes. Similarly, a product might get created in the back-office.

I see that we want to provide escape hatches as an opt-out, but personally I lean towards opt-in over opt-out. GraphQL mutations are often modelled on the interactions you perform in the UI: you won't updateCart, you will addProductToCart and changeCartItemQuantity. This 1:1 modelling of REST brings us back to the problem statement, where we wanted to enable folks to expose good, non-generic GraphQL.
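
To illustrate the difference, a small sketch in which interaction-shaped operations wrap a generic CRUD-style update. The in-memory cart store and the function bodies are invented for this example; only the operation names come from the comment above.

```typescript
type CartItem = { productId: string; quantity: number };
type Cart = { id: string; items: CartItem[] };

// In-memory stand-in for a cart service that only offers generic CRUD.
const carts: Record<string, Cart> = { c1: { id: "c1", items: [] } };

function getCart(id: string): Cart {
  const cart = carts[id];
  if (!cart) throw new Error(`Cart ${id} not found`);
  return cart;
}

// The generic call: replaces the whole item list.
function updateCart(id: string, items: CartItem[]): Cart {
  const cart = getCart(id);
  cart.items = items;
  return cart;
}

// Interaction-shaped operations the graph would expose instead.
function addProductToCart(cartId: string, productId: string): Cart {
  const items = getCart(cartId).items;
  const exists = items.some((item) => item.productId === productId);
  const next = exists
    ? items.map((item) =>
        item.productId === productId ? { ...item, quantity: item.quantity + 1 } : item,
      )
    : [...items, { productId, quantity: 1 }];
  return updateCart(cartId, next);
}

function changeCartItemQuantity(cartId: string, productId: string, quantity: number): Cart {
  if (!Number.isInteger(quantity) || quantity < 0) {
    throw new Error("quantity must be a non-negative integer");
  }
  const items = getCart(cartId).items;
  // A quantity of zero removes the item instead of storing an empty line.
  const next =
    quantity === 0
      ? items.filter((item) => item.productId !== productId)
      : items.map((item) => (item.productId === productId ? { ...item, quantity } : item));
  return updateCart(cartId, next);
}
```

Each interaction carries its own validation and intent, which a single generic updateCart mutation would push onto every client.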

Additionally, this is very REST-oriented, while with gRPC and others it isn't quite as straightforward, as those APIs will often be modelled after an interaction rather than a single action.

I wholeheartedly support the idea of simplifying the transport protocol, e.g. by having a default fetch, gRPC, ... implementation; this leaking into our schema, however, feels harder to me. The above could be extended with a scaffolding CLI command that takes a spec-compliant REST/... endpoint and generates the output interface type and the endpoints involving that entity on the datasource. That makes the whole Fuse <--> Datasource part generated, while the user can still create their own graph and invoke/alter the generated CRUD datasource.
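
A very small sketch of the type-generation half of such a scaffolding command, assuming a tiny JSON-Schema-like input. Real OpenAPI schemas are far richer (refs, arrays, nesting), which this ignores; ObjectSchema and generateInterface are hypothetical names, not part of Fuse.

```typescript
// Tiny subset of a JSON-Schema-like object description.
type Primitive = "string" | "number" | "integer" | "boolean";

type ObjectSchema = {
  properties: Record<string, { type: Primitive }>;
  required?: string[];
};

// Map schema primitives to TypeScript type names.
const tsType: Record<Primitive, string> = {
  string: "string",
  number: "number",
  integer: "number",
  boolean: "boolean",
};

// Emit the source text of a TypeScript interface for one schema object.
function generateInterface(name: string, schema: ObjectSchema): string {
  const required = schema.required ?? [];
  const lines = Object.entries(schema.properties).map(([key, prop]) => {
    const optional = required.indexOf(key) === -1 ? "?" : "";
    return `  ${key}${optional}: ${tsType[prop.type]};`;
  });
  return `export interface ${name} {\n${lines.join("\n")}\n}`;
}
```

For a schema with a required string id plus optional name and isCustomer properties, this produces an interface matching the shape of the UserSource type from the earlier draft.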

Disclaimer: I could be being a purist here, and the majority of people may just want to re-expose the CRUD of their datasources without inter-linking/..., which is fair enough. My point is mainly about guiding folks towards good GraphQL. Re-exposing is probably great for CMS/back-office clients, which are centred around these types of interactions.

@mxstbr
Copy link
Contributor

mxstbr commented Jan 9, 2024

I think you're right that auto-generating mutations isn't the way to go.

Further, that then leads me to think that us building data sources would essentially amount to building typesafe client generators for various types of data sources—but other people have already done that! (e.g. connect-es for gRPC is awesome and great and I don't want to reinvent it)

So, that makes me think the best way to go is:

  • Don't introduce a concept of data sources or build any custom typesafe clients for these data sources
  • Instead, add a ton of documentation on the best typesafe clients for various data sources to integrate those data sources into your data layer.

E.g., "How to use Fuse with…"

gRPC: connect-es
REST/OpenAPI: ???
SQL Databases: Prisma, Drizzle
MongoDB: Mongoose???
etc.

@JoviDeCroock JoviDeCroock deleted the reintroduce-datasources branch January 9, 2024 19:41
Development

Successfully merging this pull request may close these issues.

RFC: Data source connectors
3 participants