Request for comment on API shapes #9
I still don't understand this. Are you thinking about different parts of the application accessing the same beacon simultaneously? If so, that sounds like an application problem rather than an API problem. PendingBeacon is a relatively precious resource, and if there are many more requests (say, 1M) than beacons, the application needs to deal with the problem by queueing, filtering, or merging requests, for example.
Since the answer is long and detailed and might generate more long and detailed answers, I've replied to @yutakahirano in a new issue (#10). I'd like to leave this issue for user feedback on the API.
I've given this a fair amount of thought, and generally speaking I think there are two primary use cases for this API, and to answer the question posed in this issue, it's worth evaluating how well the options address each of these use cases:
(There may be other use cases I'm not thinking about; please respond if so.)

Saving user state

For the "saving user state" case, the goal would be for an app to be able to restore the user's last state the next time they visit. This is often done via client-side storage, but there could be cases where an app wants to preserve this state across devices or browsers for a given user. For this use case, the "Replace data" high-level API works well, as you only ever want to send the most up-to-date user data; it never matters what the previous state was. Also, in the rare case where the state data fails to be sent, it's only a minor inconvenience to the user, so it's well suited for this API. That being said, the ability to overwrite data without ever having to check `isPending` is a nice convenience for this use case.

Sending analytics data

Most analytics providers operate using an event model, where consumers of their product can report events based on user interactions (or other triggers), and the analytics code manages sending that data to their backend servers. Because analytics providers tend to have lots of customers, and those customers often have lots of users, the amount of data sent to their backends can be large, so anything that limits both the number of events sent and the size of those events can have a big impact. For this use case, I do not think the "Replace data" high-level API works well, because there's a tradeoff where you gain API ergonomics at the expense of sometimes over-sending data or sending data more frequently than you would otherwise need to. Let me outline the scenario where that would happen:
This is not ideal, as it means that the backend now has to add logic to dedupe these events, and if you're an analytics provider servicing millions or billions of user visits, that can be a ton of processing cost.

The "Append data" high-level API would work well for some analytics use cases (e.g. if the event data never changes), but for use cases like RUM analytics, where a performance metric value does change as the user interacts with the page, the "Append data" API is also not ideal because (again) you'd have to write logic on your backend to dedupe that data, which can be expensive.

The "Low-level" sync API handles both of these cases nicely because the page can determine for itself whether to add new data or replace existing data, and if it needs to replace data it can use the `isPending` check to know whether the existing payload has already gone out.
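To illustrate the low-level pattern (a minimal sketch, assuming the `setData`/`isPending` names from the explainer; the constructor shape and endpoint are placeholders):

```js
// Hypothetical: replace unsent data in place so the backend never has to
// dedupe two copies of the same metric.
const payload = { cls: 0 };
let beacon = new PendingBeacon('/analytics', { method: 'POST' });
beacon.setData(JSON.stringify(payload));

function onClsUpdate(value) {
  payload.cls = value;
  if (!beacon.isPending) {
    // The old value was already sent; only the update needs a new beacon.
    beacon = new PendingBeacon('/analytics', { method: 'POST' });
  }
  beacon.setData(JSON.stringify(payload));
}
```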
I think this argument is valid for an individual user (e.g. the difference will have little impact on them, their experience, or their data usage), but I don't think it's valid for an analytics provider who has to pay the network and processing cost of every beacon it receives.
Appreciate you soliciting feedback on this @fergald, and I agree with @philipwalton's assessment. Thinking about this question from the perspective of a RUM analytics provider, the low-level sync API feels the most flexible and natural.

In our RUM processing pipeline, the browser (e.g. via our boomerang.js RUM library) will send 1-n RUM beacons to the backend, generally aligned with major "events" that occur from the user. We always want to capture the Page Load's data, but will also send beacons for subsequent in-page interactions, SPA soft navigations, significant errors on the page, etc.

We aim to keep each of those beacons as small as possible and the beacon payload "localized" to the most recent event, meaning its payload has data for the current event going back only as far as the most recent beacon. In other words, after a beacon goes out, we start with "fresh data" for the next beacon. This allows our back-end processing pipeline to analyze individual beacons without needing to restore or save context from any other beacons (of which there may be none). The data that is duplicated on each beacon is general dimensional data (e.g. what the browser is, location, etc.). Timers, Metrics, and Log-style data are generally limited to the most-recent-thing being measured.

As you and @philipwalton mention above, RUM can measure specific data points (that may still change over time), as well as events/logs (that can grow in number of entries over time). Some practical examples of both:
For both of these, being able to replace the current pending beacon's data is critical for us. The way I'm envisioning boomerang.js using this API is along the lines of the proposed low-level sync API. We'd like to continue sending beacons at our existing schedule of major events (Page Load, SPA Soft Nav, etc.), with the ability to still "queue" data (for the most recent event) in case the page unloads itself. For example, on a classic MPA (non-SPA) site:
If you add a SPA site into the mix:
Given our use case, the low-level sync API seems the most natural. For the high-level APIs, I have a couple of questions to make sure I understand them, though:
@nicjansma Thanks for the detailed feedback. For your questions:
The thing that would make the replace/append API insufficient is if there are cases where you would replace only some of the payload with an update, e.g. you'd have a single beacon carrying Page Load Timers, LCP, and CLS all in one, because then you would need to know whether something had already been sent, e.g. you don't want to send the PLT a second time if it's already been sent once. To use replace/append for that, you'd probably want to put them all on different beacons: PLT only being set once, but LCP and CLS would just keep getting updated, and now you have 3 beacons instead of one.
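A sketch of that workaround (all names hypothetical), showing how it multiplies the number of beacons:

```js
// Hypothetical replace-style beacons: one per metric instead of one combined.
const pltBeacon = new PendingBeacon('/metrics', { method: 'POST' });
const lcpBeacon = new PendingBeacon('/metrics', { method: 'POST' });
const clsBeacon = new PendingBeacon('/metrics', { method: 'POST' });

// Page Load Time is set exactly once and never touched again.
pltBeacon.setData(JSON.stringify({ plt: performance.now() }));

// LCP and CLS keep overwriting their own beacons as the values change.
function onLcpUpdate(value) {
  lcpBeacon.setData(JSON.stringify({ lcp: value }));
}
function onClsUpdate(value) {
  clsBeacon.setData(JSON.stringify({ cls: value }));
}
```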
It sounds like the main reason you don't see a use for `isPending` is that you always start with fresh data after each beacon goes out, rather than updating a previously-sent beacon's data, is that right?
@philipwalton apologies for the late reply, but that sounds correct!
The API shape has since evolved into the `fetchLater()` API.
We are considering 3 different APIs and would appreciate feedback. I will include some pros and cons, but in order for us to weigh these correctly, please comment even if something has already been called out as a pro/con and it's important to you.
This issue focuses on how to set new data on the beacon and deal with a beacon that has already sent its data.
Low-level APIs
These are 2 versions of the API in the explainer. Sync vs async is about the API shape: a sync API does not mean that operations will block, waiting for external events; rather it means that the API does not use Promises and that state cannot spontaneously change mid-task.
Sync
We have `setData` and `isPending`, and if `isPending` returns `true` then the beacon has not been sent yet and `setData` will succeed. We could also remove `isPending` and have `setData` throw an exception, but that's not fundamentally different. A minimal sketch of this shape follows the pros and cons below.

Pros
Cons
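To make the shapes concrete, here is a minimal sketch of how the sync variant might be used. The `PendingBeacon` constructor and its options are assumptions based on the explainer, not a settled design:

```js
// Hypothetical usage of the sync low-level API.
let beacon = new PendingBeacon('/collect', { method: 'POST' });
beacon.setData(JSON.stringify({ event: 'page_view' }));

// Later, when updated data is available:
if (beacon.isPending) {
  // Nothing has been sent yet, so the payload can safely be overwritten.
  beacon.setData(JSON.stringify({ event: 'page_view', lcp: 1234 }));
} else {
  // The old payload already went out; new data needs a fresh beacon.
  beacon = new PendingBeacon('/collect', { method: 'POST' });
  beacon.setData(JSON.stringify({ lcp: 1234 }));
}
```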
Async
We have only `setData` and no `isPending`. `setData` returns a Promise that will resolve if the beacon has not been sent yet and the data was successfully set. It will reject if the data could not be set. The reason we drop `isPending` is that the result could be invalid by the time we try to act on it. A sketch of this shape follows the pros and cons below.

Pros
Cons
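A comparable sketch of the async variant, again with assumed names; the only structural change is that `setData` returns a Promise:

```js
// Hypothetical usage of the async low-level API.
let beacon = new PendingBeacon('/collect', { method: 'POST' });

async function update(payload) {
  try {
    // Resolves only if the beacon had not been sent and the data was set.
    await beacon.setData(JSON.stringify(payload));
  } catch {
    // Rejected: the data could not be set (e.g. the beacon already went
    // out), so queue the new payload on a fresh beacon instead.
    beacon = new PendingBeacon('/collect', { method: 'POST' });
    await beacon.setData(JSON.stringify(payload));
  }
}
```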
High-level API
There are two straightforward use cases for the beacon that suggest higher-level APIs. Both of these could be implemented using the lower-level API above. The real question is whether these 2 high-level APIs are enough, or whether we also need to expose the low-level API.
In both of these, there is no `isPending` or even a way to tell if data has been sent already.

Appending data
The beacon accumulates data and batches it up for sending. Policies like timeouts etc. control how batching occurs (some data may be sent before the page is discarded). It guarantees (to the extent possible) that all data appended will eventually be sent.
The page never needs to check if the beacon has already sent some intermediate batch; it just keeps appending data.
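As a sketch, usage might look like the following; `appendData` is a hypothetical method name, since no name is specified here:

```js
// Hypothetical append-style high-level API.
const beacon = new PendingBeacon('/events', { method: 'POST' });

// Each interaction is appended; the browser batches and sends according to
// whatever policies (timeouts etc.) are configured.
document.addEventListener('click', (event) => {
  beacon.appendData(JSON.stringify({ type: 'click', target: event.target.tagName }));
});
// No isPending check is ever needed: everything appended is eventually sent.
```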
Replacing data
The beacon's data is replaced by calls to `setData`. It doesn't matter whether the beacon has already sent data; it can always be replaced. Again, policies like timeouts etc. control when sending occurs, with a guarantee that the last set value will definitely be sent.

Example use case: reporting LCP values. The page just keeps setting the latest observed LCP, perhaps with a policy that says "don't leave data sitting around unsent for more than 5 minutes".
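A sketch of the LCP example; the `timeout` option name is an assumption standing in for whatever policy mechanism the final API provides:

```js
// Hypothetical replace-style high-level API reporting the latest LCP value.
const beacon = new PendingBeacon('/metrics', {
  method: 'POST',
  timeout: 5 * 60 * 1000, // "don't leave data sitting around unsent for more than 5 minutes"
});

new PerformanceObserver((list) => {
  const entries = list.getEntries();
  const latest = entries[entries.length - 1];
  // Always overwrite: only the most recent LCP candidate matters.
  beacon.setData(JSON.stringify({ lcp: latest.startTime }));
}).observe({ type: 'largest-contentful-paint', buffered: true });
```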
Discussion
An example of where these APIs might not work well is where the page would like to merge 2 metrics into 1 beacon if possible. With the low-level API, it would check if the beacon has been sent already and, if not, replace the data with the combined data. This could reduce network traffic (although arguably connection reuse and header compression make that a small benefit). It could also reduce processing cost by delivering related data already joined.
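For illustration, the merge case might look like this with the low-level sync shape (all names assumed):

```js
// Hypothetical: merge metrics into a single beacon while it is still pending.
let beacon = new PendingBeacon('/metrics', { method: 'POST' });
let combined = {};

function report(name, value) {
  if (!beacon.isPending) {
    // The previous payload already went out; start a fresh beacon carrying
    // only data that has not been sent yet.
    combined = {};
    beacon = new PendingBeacon('/metrics', { method: 'POST' });
  }
  combined[name] = value;
  // Related data is delivered to the backend already joined.
  beacon.setData(JSON.stringify(combined));
}

report('lcp', 1234);
report('cls', 0.05);
```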
It may be that these APIs are capable of doing everything that's needed but impose costs on the backend.
It may also be that there are use-cases that simply cannot be met with these APIs.
Please let us know.