Bug 207277

Summary: results.webkit.org: Abstract archive storage solutions
Product: WebKit Reporter: Jonathan Bedard <jbedard>
Component: Tools / TestsAssignee: Jonathan Bedard <jbedard>
Status: RESOLVED FIXED    
Severity: Normal CC: aakash_jain, commit-queue, lingho, slewis, webkit-bug-importer, zhifei_fang
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=207295
https://bugs.webkit.org/show_bug.cgi?id=216662
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch for landing none

Description Jonathan Bedard 2020-02-05 11:08:09 PST
This is a step towards storing archives on S3 instead of in Cassandra. Apart from the drive-by bug fix of adding ttl to Cassandra storage, this change will be functionally identical to the existing code.
Comment 1 Jonathan Bedard 2020-02-05 11:17:09 PST
Created attachment 389831 [details]
Patch
Comment 2 Jonathan Bedard 2020-02-05 14:58:50 PST
Created attachment 389876 [details]
Patch
Comment 3 Jonathan Bedard 2020-02-06 16:18:51 PST
Created attachment 390022 [details]
Patch
Comment 4 Zhifei Fang 2020-02-07 10:41:13 PST
Comment on attachment 390022 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=390022&action=review

> Tools/resultsdbpy/resultsdbpy/model/archiver.py:27
> +class Archiver(object):

I guess a delete method is also needed. Besides, I think a timestamp for the archive will be also very useful. We then can have something auto clean the archives.

> Tools/resultsdbpy/resultsdbpy/model/cassandra_archiver.py:31
> +    MAX_ARCHIVE = 500 * 1024 * 1024  # Archives should be smaller than 500 MB

I think I saw some debug archives have 600~ mb
Comment 5 Jonathan Bedard 2020-02-07 10:49:45 PST
(In reply to Zhifei Fang from comment #4)
> Comment on attachment 390022 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=390022&action=review
> 
> > Tools/resultsdbpy/resultsdbpy/model/archiver.py:27
> > +class Archiver(object):
> 
> I guess a delete method is also needed. Besides, I think a timestamp for the
> archive will be also very useful. We then can have something auto clean the
> archives.

The retain_for parameter is intended to provide automatic clean up.

The timestamp is actually less useful because we're sorting them by a primary key hash, meaning you'd need to actually request the specific archive you want deleted. (you could do that through the management table).

Generally, I've avoided putting any specific delete functions into the results database because while we occasionally need to do it, it's not something that the API generally supports. It should be a task that requires management access to the database. Even if we did want a delete function, I don't think it should be part of this patch since this is intended as a no-op and the existing code didn't support deletion either.

> 
> > Tools/resultsdbpy/resultsdbpy/model/cassandra_archiver.py:31
> > +    MAX_ARCHIVE = 500 * 1024 * 1024  # Archives should be smaller than 500 MB
> 
> I think I saw some debug archives have 600~ mb

It's not for builds, only for test result archives, which should all be in the ~50 MB range.

We could revisit this technique for builds later.
Comment 6 Zhifei Fang 2020-02-07 15:56:07 PST
unofficially r=me
Comment 7 Aakash Jain 2020-02-07 16:55:12 PST
rs=me
Comment 8 Jonathan Bedard 2020-02-10 07:25:04 PST
Created attachment 390251 [details]
Patch for landing
Comment 9 WebKit Commit Bot 2020-02-10 08:10:26 PST
Comment on attachment 390251 [details]
Patch for landing

Clearing flags on attachment: 390251

Committed r256172: <https://trac.webkit.org/changeset/256172>
Comment 10 WebKit Commit Bot 2020-02-10 08:10:28 PST
All reviewed patches have been landed.  Closing bug.
Comment 11 Radar WebKit Bug Importer 2020-02-10 08:11:22 PST
<rdar://problem/59311107>