feat(anomaly detection): Maintain historical timeseries #1670
Conversation
"dynamic_alert_time_series_history", | ||
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False), | ||
sa.Column("alert_id", sa.BigInteger(), nullable=False), | ||
sa.Column("timestamp", sa.DateTime(), nullable=False), |
Since we are cleaning up history based on timestamp, do we need an index on this field?
The id column is there to avoid duplicate rows, since all of the other columns can be identical, including the timestamp ones, if entries are added at the same time.
I was asking about the timestamp column. Since the WHERE clause of the SQL will filter on timestamp, we need an index on it (not a primary key index, as id will still be the primary key).
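For reference, a minimal sketch of how such an index could be added in this Alembic migration; the index name is an assumption, and op.create_index could equally be replaced by index=True on the column definition:

    # Assumed index name; adjust to the project's naming convention.
    op.create_index(
        "ix_dynamic_alert_time_series_history_timestamp",
        "dynamic_alert_time_series_history",
        ["timestamp"],
    )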
# ### commands auto generated by Alembic - please adjust! ###
op.create_table(
    "dynamic_alert_time_series_history",
    sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
Curious how come alert_id is BigInteger but this id is Integer? We will have more rows in this table than in the alert table, right?
Good catch -- fixed
src/seer/anomaly_detection/tasks.py (Outdated)
with Session() as session:
    deleted_count = (
        session.query(DbDynamicAlertTimeSeriesHistory)
        .filter(DbDynamicAlertTimeSeriesHistory.saved_at < date_threshold)
This condition is applied across all alerts, so it can be a costly operation. Also, should we be filtering on timestamp instead of saved_at?
The reason I am filtering on saved_at instead of timestamp is that I thought we want the timeseries points to be available for 90 days after they are deleted, not after they are created. But if that is not the case, I can change it to timestamp instead.
Regarding the costly operation, is there another way to do this, given that this table is just a storage table with no relationships to other tables?
I was thinking of having 90 days of time series data regardless of when it was saved to Seer. As to the deletion operation, can you check if SQLAlchemy has any batch deletion feature that we can leverage?
Filtering by timestamp and using the Delete object to remove values more efficiently:
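A minimal sketch of what that could look like, reusing the Session, date_threshold, and DbDynamicAlertTimeSeriesHistory names from the snippet above; the exact statement is an assumption, not necessarily the code merged in this PR:

    from sqlalchemy import delete

    # Bulk delete via SQLAlchemy's Delete construct: rows are removed in a
    # single statement instead of being loaded as ORM objects first, and the
    # filter is on `timestamp` per the review discussion above.
    stmt = delete(DbDynamicAlertTimeSeriesHistory).where(
        DbDynamicAlertTimeSeriesHistory.timestamp < date_threshold
    )
    with Session() as session:
        result = session.execute(stmt)
        session.commit()
        deleted_count = result.rowcount  # number of rows removed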
DbDynamicAlertTimeSeriesHistory, which stores the required information for us to recreate and test any issues:
- alert_id: int
- timestamp: int
- value: float
- anomaly_type: str
- saved_at: datetime
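For orientation, a rough sketch of how such a model could be declared, using the column types from the migration above; the Base class, the SQLAlchemy 2.0-style typed mappings, and the String type for anomaly_type are assumptions rather than the actual Seer code:

    from datetime import datetime

    from sqlalchemy import BigInteger, DateTime, Float, String
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


    class Base(DeclarativeBase):
        pass


    class DbDynamicAlertTimeSeriesHistory(Base):
        __tablename__ = "dynamic_alert_time_series_history"

        # BigInteger per the review comment about this table outgrowing the alert table.
        id: Mapped[int] = mapped_column(BigInteger, primary_key=True, autoincrement=True)
        alert_id: Mapped[int] = mapped_column(BigInteger, nullable=False)
        # Indexed because the cleanup task filters on this column.
        timestamp: Mapped[datetime] = mapped_column(DateTime, nullable=False, index=True)
        value: Mapped[float] = mapped_column(Float, nullable=False)
        anomaly_type: Mapped[str] = mapped_column(String, nullable=False)
        saved_at: Mapped[datetime] = mapped_column(DateTime, nullable=False)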