This repository has been archived by the owner on Dec 30, 2020. It is now read-only.

Feature Request: Time Series #21

Open
jjshoe opened this issue Mar 25, 2020 · 8 comments

Comments

@jjshoe

jjshoe commented Mar 25, 2020

Johns Hopkins has completely fudged their time series. Any chance you could host one via the API?

I know I could write something to query day by day, and do the work from there, but it would be great if there was a CSV all set to go.

@kevee
Member

kevee commented Mar 26, 2020

Hi, @jjshoe, is this purely an API question? If so, can I move it over to the API repo?

@jjshoe
Author

jjshoe commented Mar 26, 2020

@kevee it probably belongs best on the repo where data is assembled.

I'm basically asking that, when the state and county data sets get compiled each day, a single CSV and whatever drives the API share one data source that gets each day's data appended.

For example, in CSV, this would just mean a new data column, or several columns really, to contain that day's data:

State    1/1/2020  1/2/2020
Alabama  1         2
Alaska   3         4

Or perhaps:

State    1/1/2020 Tested  1/1/2020 Confirmed
Alabama  10               2
Alaska   4                4
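
As a rough illustration of the first layout, here is a minimal Python sketch that pivots per-day records into one row per state and one column per date. The record shape (`state`, `date`, `confirmed` keys), the function name `pivot_to_wide`, and the ISO date strings are all assumptions made for this example, not the project's actual schema; ISO dates are used only because they sort correctly as text.

```python
import csv
import io
from collections import defaultdict


def pivot_to_wide(records, value_field):
    """Pivot long-form daily records into one row per state, one column per date."""
    # ISO-formatted date strings sort chronologically as plain text.
    dates = sorted({r["date"] for r in records})
    by_state = defaultdict(dict)
    for r in records:
        by_state[r["state"]][r["date"]] = r[value_field]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["State"] + dates)
    for state in sorted(by_state):
        # Leave a cell empty when a state has no record for that date.
        writer.writerow([state] + [by_state[state].get(d, "") for d in dates])
    return out.getvalue()


# Hypothetical sample records mirroring the first table above.
records = [
    {"state": "Alabama", "date": "2020-01-01", "confirmed": 1},
    {"state": "Alabama", "date": "2020-01-02", "confirmed": 2},
    {"state": "Alaska", "date": "2020-01-01", "confirmed": 3},
    {"state": "Alaska", "date": "2020-01-02", "confirmed": 4},
]
wide_csv = pivot_to_wide(records, "confirmed")
print(wide_csv)
```

Appending a new day then only means adding that day's records and regenerating the file; the second layout would follow the same pattern with one column per (date, field) pair.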

@jjshoe
Author

jjshoe commented Mar 26, 2020

For anyone interested, the following Perl code writes confirmed cases (excluding deaths) to a time series CSV.

https://gist.github.com/jjshoe/a5c62a7ff12d85f3badaa398fbf0cbff

@schwartzadev schwartzadev transferred this issue from COVID19Tracking/website Mar 26, 2020
@marnen

marnen commented Apr 3, 2020

Indeed. Right now I have to make separate queries for date=20200316, date=20200317, and so on. It would be really nice to have startDate=20200315&endDate=20200320.
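
Until something like that exists, the day-by-day workaround can be scripted. The sketch below only builds the `date=YYYYMMDD` query strings for every day in a range, both ends included; the helper name `daily_query_strings` is made up for this example, and no endpoint URL is assumed.

```python
from datetime import date, timedelta


def daily_query_strings(start, end):
    """Return one 'date=YYYYMMDD' query string per day from start through end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        f"date={(start + timedelta(days=i)).strftime('%Y%m%d')}"
        for i in range(days)
    ]


queries = daily_query_strings(date(2020, 3, 15), date(2020, 3, 20))
print(queries)
```

Each string would still have to be sent as its own request, which is exactly the overhead a range parameter would remove.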

@Nosferican

I would recommend using a timestamp range with an inclusive start and an exclusive end (a half-open interval):

[2020-04-04T14:26:13.978+00:00 .. 2020-04-05T14:26:13.978+00:00)
An ISO-8601 encoded date string. Assumed to be UTC unless a different timezone is passed.

Maybe something like checked:<=2020-04-05T14:26:00Z

@marnen

marnen commented Apr 5, 2020

@Nosferican Why do you think that would be useful in this context? The API doesn’t use that date format to begin with, and at least for myself, I don’t see why a half-open interval would be the desired semantics (unlike in, say, the C++ STL). I do like the idea of date=20200315..20200320 instead of two separate parameters, but I think a closed interval would be easiest to deal with.
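
To make the closed-interval idea concrete, here is a small Python sketch of how a `date=20200315..20200320` value could be parsed. The `..` syntax is only the proposal from this comment, not something the API supports, and `parse_date_range` and `in_closed_range` are hypothetical names.

```python
from datetime import date, datetime


def parse_date_range(value):
    """Parse 'YYYYMMDD..YYYYMMDD' into a (start, end) pair of dates."""
    start_text, sep, end_text = value.partition("..")
    if not sep:
        raise ValueError("expected a value of the form START..END")
    start = datetime.strptime(start_text, "%Y%m%d").date()
    end = datetime.strptime(end_text, "%Y%m%d").date()
    if end < start:
        raise ValueError("end must not be before start")
    return start, end


def in_closed_range(day, start, end):
    """Closed interval: both endpoints are part of the range."""
    return start <= day <= end


start, end = parse_date_range("20200315..20200320")
print(start, end, in_closed_range(date(2020, 3, 20), start, end))
```

With closed semantics the last day named in the query is included in the result, which matches how a single `date=` query already behaves.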

@Nosferican

Nosferican commented Apr 5, 2020

The date associated with the data is the day of publication, derived from the timestamp. It usually covers the tests reported for the day before. In some cases, a case that was reported to the jurisdiction will show up in the API two days later, depending on when it was reported. The actual data collection uses the timestamp of when the source of the website was downloaded and parsed, which is a better measure of the point up to which the data is "comprehensive". The metadata documents the LastUpdated field, but that one differs by jurisdiction and the heuristics make it a bit harder to use. The date field is generated from the timestamp; internally, the timestamp is the raw field that is actually used. Since the data and the API are updated more frequently (e.g., the job starts at 16:00 ET and should be done by 17:00 ET), it makes more sense to use the timestamp. The CSV API backups are always delayed, since the cron job runs about every 6 hours.
I think the intervals could be closed, but when consuming the data you effectively have [start .. end), since anything after the time the data was queried is still unknown.
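
A minimal Python sketch of the half-open reading, using the example timestamps from earlier in the thread. The helper names `parse_utc` and `in_half_open` are made up for illustration, and treating offset-less timestamps as UTC follows the "assumed to be UTC" note above; this is not how the API itself filters data.

```python
from datetime import datetime, timezone


def parse_utc(text):
    """Parse an ISO-8601 timestamp, assuming UTC when no offset is given."""
    # Older Python versions do not accept a trailing 'Z', so normalize it first.
    ts = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def in_half_open(ts, start, end):
    """Half-open interval [start, end): start is included, end is not."""
    return start <= ts < end


start = parse_utc("2020-04-04T14:26:13.978+00:00")
end = parse_utc("2020-04-05T14:26:13.978+00:00")

checked = [
    parse_utc("2020-04-04T14:26:13.978+00:00"),  # equal to start
    parse_utc("2020-04-05T14:26:00Z"),           # just before end
    parse_utc("2020-04-05T14:26:13.978+00:00"),  # equal to end
]
kept = [ts for ts in checked if in_half_open(ts, start, end)]
print(len(kept))
```

One practical benefit of the half-open form is that consecutive windows can share a boundary timestamp without any record being counted twice.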

@marnen

marnen commented Apr 5, 2020

@Nosferican That’s useful information, but I’m not sure I understand how it interacts with time series. We already can query the API by date, so shouldn’t the semantics of a date series be the same? AFAIK the API doesn’t offer any finer time granularity (right?).


4 participants