Created attachment 453843
Number of reassigned users depending on how long they have not visited the website.

Hi WebKit team!

I have to report a bug that I am afraid has no simple way to reproduce but has an immediate impact on lots of pages. The issue seems to be that Safari in all observed versions (from 14.1 to 15.3) sometimes deletes server-set, same-origin cookies after 7 days of inactivity when a Service Worker is used on the page.

We suspect that this is linked to ITP, which makes it virtually impossible to create an easily reproducible setup for you to try. Instead, let me explain how we came to this conclusion and what our test setup looks like:

* We were running an AB test on a domain example.com in production. The group was chosen randomly on the server side and sent to the browser as a same-origin, secure but not HTTP-only cookie. The group stays the same for a user as long as they send the cookie back on every navigation.
* In the client, we installed a Service Worker if the user was assigned to group A and did nothing if the user was assigned to group B.
* The Service Worker itself is rather simple. It caches static assets and stores some metadata in IndexedDB, but it does not change the navigate request carrying the cookie at all.

We used real user monitoring to compare the performance of both groups. In this data we can see the issue: while it sometimes happens that the cookie gets lost and a user is reassigned in the AB test, this should happen about equally often in both groups. What we saw instead is that many more users were reassigned to a new group when they were in group A (with the Service Worker) than in group B.

Looking at those reassignments in more detail, this is what we found:

[1] The attached graph shows the number of reassigned users depending on how long they had not visited the website.
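For anyone who wants to redo this kind of analysis on their own RUM data, here is a minimal TypeScript sketch of how a per-group reassignment rate by days of absence could be computed. It assumes each returning session is available as a record with the user's group, the previous and current session timestamps, and a flag for whether the user came back without the AB cookie; the `Visit` shape and the function names are hypothetical, not our actual RUM pipeline. Days of absence are rounded down, as described later in this thread.

```typescript
// Hypothetical record shape for one returning session in RUM data.
type Group = "A" | "B";

interface Visit {
  group: Group;            // group the user was in before returning
  previousVisitMs: number; // start of the previous session (epoch ms)
  visitMs: number;         // start of the returning session (epoch ms)
  reassigned: boolean;     // true if the user came back without the AB cookie
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Time between sessions, rounded down to whole days of absence.
function daysAbsent(previousVisitMs: number, visitMs: number): number {
  return Math.floor((visitMs - previousVisitMs) / MS_PER_DAY);
}

// Share of reassigned users per "group:daysAbsent" bucket, e.g. "A:7".
function reassignmentRates(visits: Visit[]): Map<string, number> {
  const counts = new Map<string, { total: number; reassigned: number }>();
  for (const v of visits) {
    const key = `${v.group}:${daysAbsent(v.previousVisitMs, v.visitMs)}`;
    const c = counts.get(key) ?? { total: 0, reassigned: 0 };
    c.total += 1;
    if (v.reassigned) c.reassigned += 1;
    counts.set(key, c);
  }
  const rates = new Map<string, number>();
  counts.forEach((c, key) => rates.set(key, c.reassigned / c.total));
  return rates;
}
```

Plotting the `A:n` and `B:n` rates side by side for each `n` gives the kind of comparison shown in the attached graph.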
What you can clearly see is that when a user returns to the website within 6 days, the probability of being reassigned is the same in both groups. If the user returns after 7 or more days, the probability of being reassigned is much higher in group A. After a one-month AB test with more than 200k users per group, this effect left group A with 3.5% fewer users (and even fewer returning users), which immensely skews the conversion rate comparison we wanted to make in that AB test.

There are four more insights we got from the data:

* This discrepancy between the groups only happened for browser versions that had the Service Worker active (we had not installed it in Safari 15.1, and those users were fine).
* The discrepancy between the groups also did not happen for users from certain ASNs (like Akamai or Cloudflare). It turned out these were users who had iCloud Private Relay enabled.
* For a different AB test on a different domain, we know that the same issue is not observed if the cookie used for the split test is marked as HTTP-only.
* The issue seems to happen on mobile only.

Probably needless to say, but we have seen this issue only in Safari, in no other browser. To us it looks like ITP deletes too much data from the client if a Service Worker and IndexedDB are used in the client. If you have any other idea that explains the data we are seeing, or if you have a way to reproduce or debug an issue like this, it would be awesome to get your feedback. We are very open to helping with the investigation here!
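The test setup from the bullet list above can be sketched roughly as follows. This is a minimal TypeScript illustration, not our production code: the cookie name `ab_group`, the Service Worker path `/sw.js`, and the `Path`/`SameSite` attributes are assumptions. Only "random server-side group, secure but not HTTP-only, kept while the cookie comes back, Service Worker for group A only" is taken from the setup described above.

```typescript
// Hypothetical cookie name; the real name is not part of this report.
const AB_COOKIE = "ab_group";

type Group = "A" | "B";

// Parse the AB group out of a Cookie header or document.cookie string.
function readGroup(cookieString: string | undefined): Group | undefined {
  if (!cookieString) return undefined;
  for (const part of cookieString.split(";")) {
    const [name, ...rest] = part.trim().split("=");
    const value = rest.join("=");
    if (name === AB_COOKIE && (value === "A" || value === "B")) return value;
  }
  return undefined;
}

// Server side: keep the group the browser sends back, otherwise draw a new one.
// A lost cookie is indistinguishable from a new user here, which is why a lost
// cookie shows up as a "reassignment" in the monitoring data.
function assignGroup(
  cookieHeader: string | undefined,
  rand: () => number = Math.random,
): { group: Group; isNew: boolean } {
  const existing = readGroup(cookieHeader);
  if (existing) return { group: existing, isNew: false };
  return { group: rand() < 0.5 ? "A" : "B", isNew: true };
}

// Sent on every navigation response. Secure and not HttpOnly as in the
// original setup; Path=/ and SameSite=Lax are assumptions.
function setCookieHeader(group: Group): string {
  return `${AB_COOKIE}=${group}; Path=/; Secure; SameSite=Lax`;
}

// Minimal stand-in for navigator.serviceWorker so the logic runs anywhere.
interface ServiceWorkerRegistrar {
  register(scriptURL: string): Promise<unknown>;
}

// Client side: only group A gets the Service Worker ("/sw.js" is a placeholder).
async function maybeInstallServiceWorker(
  cookieString: string,
  sw: ServiceWorkerRegistrar | undefined,
): Promise<boolean> {
  if (readGroup(cookieString) !== "A" || !sw) return false;
  await sw.register("/sw.js");
  return true;
}
```

In the browser, the client part would be called as `maybeInstallServiceWorker(document.cookie, navigator.serviceWorker)`, which only works because the cookie in this setup is not HTTP-only.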
<rdar://problem/89816682>
Hi! Thanks for filing! Interesting investigation.

I don't immediately see how the combo of SW and Private Relay could result in server-set cookies being deleted. Three things to note, though:

· Cookies set in JavaScript expire after 7 days *calendar time*. However, that is not the case for other script-written storage. Those are deleted after 7 days of browser usage without user interaction as first party. It would be good to be explicit about calendar time vs. days of use here.
· Cookies set through third-party CNAME cloaking expire after 7 days calendar time. See "CNAME Cloaking Defense" in our documentation: https://webkit.org/tracking-prevention/#intelligent-tracking-prevention-itp. It would be good to know if third-party CNAME cloaking is at play here.
· HttpOnly being a factor sounds to me like JavaScript may be a factor too. Could it be that these cookies start out server-set but then get rewritten in script? Our guidance is to always set login cookies as HttpOnly, for security reasons. Is there a reason why they can't be HttpOnly?

That's not to say there's no bug here, but it's important to know.
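To make the calendar-time vs. days-of-use distinction concrete, here is a deliberately simplified TypeScript model of the two 7-day rules described above. It is an illustration under assumptions, not WebKit's implementation: the inputs (a write timestamp, a last-interaction day index, and a list of days on which the browser was used) and the `>=` boundary are hypothetical. Server-set cookies are intentionally not modeled, since per the reply above they are only capped in the third-party CNAME cloaking case.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const CAP_DAYS = 7;

// Rule 1 (simplified): cookies written from JavaScript are capped at
// 7 days of calendar time since they were written.
function scriptCookieExpired(writtenAtMs: number, nowMs: number): boolean {
  return nowMs - writtenAtMs >= CAP_DAYS * DAY_MS;
}

// Rule 2 (simplified): other script-written storage (e.g. IndexedDB) is
// removed after 7 days of browser use without first-party user interaction.
// `browserUseDays` holds hypothetical day indices on which the browser was
// used at all; days at or before the last interaction do not count.
function scriptStorageRemoved(lastInteractionDay: number, browserUseDays: number[]): boolean {
  const daysOfUse = new Set(browserUseDays.filter((day) => day > lastInteractionDay)).size;
  return daysOfUse >= CAP_DAYS;
}
```

Under this model, a user who was away for 10 calendar days but opened Safari on only 3 of them would lose a script-written cookie but keep IndexedDB data, which is why the two clocks matter when reading a "days since last visit" graph.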
Hey John! Thanks for the quick reply, really appreciate it! Thanks also for the clarifications on ITP. A few points:

The Private Relay point got mixed up: with Private Relay the issue does NOT occur, only without it.

Calendar time vs. days of use is interesting, but I think it is the same in this scenario. We only take the time between user sessions on the page and round it down to days of absence.

At the moment the cookie is read from JS, but there is certainly a solution where the cookie is set to HTTP-only. Unfortunately, that is not under our direct control. I am certain it is NOT written from JS, though. Regarding security, the cookie only lists the AB test groups assigned to the user, no authentication data or tokens.

The point about CNAME Cloaking Defense sounds pretty interesting when I think about it (I had glanced over it before when reading up on ITP). Here is why I think it is interesting: as always with these kinds of issues, I left out a few details in the description above to simplify it to what I thought were the main points. The Service Worker does one thing I did not mention that might be relevant here:

When the navigate (HTML) request reaches the Service Worker, it is forwarded to the network unchanged (mostly unchanged, see [1]). The response that comes back contains the AB test cookie on every navigation, so the browser should store it every time (even though the value does not change).

On some navigations, though, the Service Worker issues another request in parallel that fetches a cached version of the HTML response from a different endpoint (on a different domain than the page). This cached response contains a static version of the HTML and has no cookies associated with it at all. This response is sent back to the browser if it is faster than the original navigate request (which is very likely, as it is cached in a CDN node).
If the browser receives the HTML from the different domain, the original navigate response is held in the Service Worker until a special XHR request is received from the page. That XHR request gets the original navigate response (because the page then inserts the personalized bit from that HTML into the DOM; that should not really matter for the issue at hand).

You probably understand why I left that bit out, as it complicates things, but since it might be important I wanted to update you on it.

Now, I don't really see CNAME cloaking in effect here, since the original navigate request is fetched unchanged (mostly unchanged, see [1]) and cookies should be applied in the browser once the Service Worker receives the response (it is not necessary for the Service Worker to hand it back to the browser as the response to the navigation for that to happen), right?

[1]: We used to send the request completely unmodified to the network, but since identifying another bug in Safari, we need to copy the request object before sending it to ensure SameSite=Lax cookies are always sent: https://bugs.webkit.org/show_bug.cgi?id=232440
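The navigation race described in the last two comments can be sketched as below. This is a hedged, framework-free TypeScript illustration of the idea, not our actual Service Worker: responses are generic promises so the logic can run outside a browser, and `raceNavigation`, `HeldResponses`, its string ids, and the fallback behavior when one side fails are assumptions. In a real worker, `network` would be a `fetch()` of the (copied) navigate request, `cached` a `fetch()` of the CDN endpoint, and the winner would be passed to `event.respondWith()`.

```typescript
type Winner = "network" | "cache";

interface RaceResult<R> {
  winner: Winner;
  response: R;
  // When the cached copy wins, the still-pending network response is kept
  // so the page can pick it up later (the "special XHR" in the report).
  heldNetwork?: Promise<R>;
}

// Race the original navigate request against the cached HTML.
// If one side fails, fall back to the other instead of failing the navigation.
async function raceNavigation<R>(network: Promise<R>, cached: Promise<R>): Promise<RaceResult<R>> {
  const fromNetwork = (response: R) => ({ winner: "network" as const, response });
  const fromCache = (response: R) => ({ winner: "cache" as const, response });
  const tagged = await Promise.race([
    network.then(fromNetwork, () => cached.then(fromCache)),
    cached.then(fromCache, () => network.then(fromNetwork)),
  ]);
  // The network request is still in flight (and still receives its
  // Set-Cookie header) even when the cached copy is what the page sees first.
  return tagged.winner === "cache" ? { ...tagged, heldNetwork: network } : tagged;
}

// Hypothetical store for held network responses, keyed by an id the page
// would send along with its special XHR request.
class HeldResponses<R> {
  private held = new Map<string, Promise<R>>();

  hold(id: string, response: Promise<R>): void {
    this.held.set(id, response);
  }

  // Hand out a held response once, then forget it.
  take(id: string): Promise<R> | undefined {
    const response = this.held.get(id);
    this.held.delete(id);
    return response;
  }
}
```

The detail that matters for this bug is that the network response, including its cookie, is still fetched by the Service Worker when the cached copy wins; it just reaches the page later via the XHR instead of as the navigation response.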
Hey!

Since our last comment, we are now using an HTTP-only cookie (to decide the group in the AB test). Still, we are seeing users with a service worker lose this group information after 7 days, whereas users without a service worker don't.

It seems the issue also exists for HTTP-only cookies. Do you have any idea about this?
(In reply to erik.witt from comment #4)
> Hey!
>
> Since our last comment we are now using an HTTP-only cookie (to decide the
> group in the AB test). Still, we are seeing users with a service worker
> losing this group information after 7 days whereas users without serivce
> worker don't.
>
> Seems the issue is also there for HTTP-only cookies. Do you have any idea on
> this?

Have you confirmed that the HttpOnly cookie is deleted?