For anyone who didn't click through to the WebKit bug report the author submitted, a WebKit dev asked him to clarify why the BBC finds it beneficial to be able to detect that the event was sent from a keyboard. This is the author's response:
> Ironically, I want interoperability on this to help with use cases relating to accessibility.
> I work at the BBC and, on our UK website, our navigation bar menu button behaves slightly differently depending on if it is opened with a pointer or keyboard. The click event will always open the menu, but:
> - when opening with a pointer, the focus moves to the menu container.
> - when opening with a keyboard, there is no animation to open the menu and the focus moves to the first link in the menu.
> Often when opening a menu, we don't want a slightly different behaviour around focus and animations depending on if the user 'clicks' with a pointer or keyboard.
> The 'click' event is great when creating user experiences for keyboard users because it is device independent. On keyboards, it is only invoked by Space or Enter key presses. If we were to use the keydown event, we would have to check whether only the the Space or Enter keys were pressed.
This is fascinating, because the naive English interpretation of the code and the comment on that WebKit bug don't match the actual structure of the code. Here's the relevant code:
Ignoring the actual conditions entirely, this code seems to be trying to categorize the event into one of two categories: mouse or keyboard. But what it actually does is to categorize into one of four categories: (mouse and not keyboard), (keyboard and not mouse), (keyboard and mouse), and (neither keyboard nor mouse). And, as the original bug shows, (neither keyboard nor mouse) is handled inappropriately. One might wonder whether (keyboard and mouse) works well.
Either the code should be deliberate about the fact that (is it a keyboard) and (is it a mouse) are separate booleans, or the code should be structured so that the actual categories are mutually exclusive. For example:
const isInvokedByMouse = ...
and use !isInvokedByMouse to check for keyboardiness, or:
const eventSource = ... (returns "keyboard" or "mouse")
or, perhaps even better:
const eventSource = ... (returns "keyboard", "mouse", or "not sure")
This is a great comment ^ whenever you use two booleans like this, you're opening yourself up to "unrepresentable state" logic errors. Finding and noticing this in code can be tricky, but it's a great example of something that should be caught during code review.
Wait—that last state doesn't make sense. With a real door, you can technically turn the key while it's open, but does that meaningfully change its state? Yet our model allows this impossible combination.
Funnily enough, I have a physical door & lock that OFTEN gets in to this state - it's exactly as irritating as it sounds, and it has very meaningful impact on it's state (it then can't be closed without first unlocking the lock!)
I’ve come to the conclusion that most programming problems like this can be solved my having programmers work as building superintendents for 3 months. Practical experience will open their eyes to how the world really works.
This leads to another wonderful gap, where programmers are expected to understand the world[0], but the world isn't expected to understand programming[1]
[0] Business, etc
[1] System design, security, database management, cost vs speed trade-offs, SCM, etc etc etc
This is a good account of a software modelling process. But, this is not specific to booleans. Database constraints involve the same issue. One motivation for encapsulation in software was to preserve constraints in the data (or in other words, disallow improper states). Encapsulation allows only internal functions to manipulate the data, and the developer just needs to checks that these functions are doing it correctly.
Or, as pointed out in the post where multiple booleans are merged into a single enum, encode the constraints into the data itself ie. use the constraints of the host programming language.
But this wont be possible in general - for instance if your language doesn't have sets/dictionaries, how would you encode uniqueness of values directly using arrays and lists? It would have to be done using interface functions.
Yet another article prematurely optimizing. It’s like these people have nothing better to do. I’ll wait for my code to get into stupid edge cases first and then fix it. Even if you spend your time avoiding booleans you will still find yourself with some new contradictory state and have to fix it differently anyways.
Coming up with proper representation for your state is almost always worth it. If anything it's the opposite of premature optimisation - normalise first, only denormalise after you've proven that it's needed.
No I read wrong and missed it. If all four states are actually possible an enum can still be a win if you have a compiler that can check for exhaustive switch case statements. Or especially if updating a system where it used to be only three were valid and now something changed so that all four are.
And just to add an extra corner case, Mobile Safari changes the click behavior if an onclick handler is registered - even if the click handler is an empty function that does nothing. The onclick handler itself acts as another Boolean that affects the browser's behavior. I don't remember the exact details because it was a corner case (I think to do with scrolling or popovers or touchcancel - I know it was surprisingly nasty). This page mentions something else http://www.quirksmode.org/blog/archives/2010/09/click_event_... "Fortunately it’s pretty easy to solve: you have to make the element clickable by giving it an onclick event handler of its very own. That handler can be empty; as long as it’s present it will make any element clickable.".
Well said. I usually go through several phases when dealing with this kind of thing, I start with "flags for different conditions" type logic, then when things get too complex I refine this into "set of explicitly defined states", and then as the state code evolves from 'ideal happy path' to 'production-tested code' I gradually realise that most of the original combinations represented by the original flags actually CAN happen in weird edge-cases.
By making it easy to pattern match, it reduces the possiblity of producing an invalid state, because at the time of definition, you have to figure out how to get that type (and checked by compiler).
I frequently use algebraic thinking to verify my sanity with types. Bool times bool is 4. We have two states we want to represent so bool times bool so the wrong thing.
The first mistake the developer made, was that he wanted to create a different user experience between keyboard and mouse. Stick to what you get by default and design your components so they work for both usecases. Don't try to be smart when it comes to accessibility.
What he ended up doing is what I would have considered a hack. A solution that inevitably breaks or has side effects.
The reason there rarely are good handles to do things differently in accessibility context, is because it's not something that's meant to be handled differently.
See I work in accessibility. Like I provide and create solutions direct to end users with complex needs. Not regular web accessibility. I get the view of this. It’s the same idea of universal access. But actually I don’t fully agree. Yes. If you can stick to this principle - and do try / but I promise you edge cases - which in itself is what accessibility users are all about - cause headaches. At some level you have to do custom stuff. It’s the best way. Take for example switch users. Yes. If your ui is tab able - great. But what if you need your items scannable in frequency order. Your tab index needs to change to meet the end users needs. Or eye gaze users. The accuracy level changes. Add in cognitive issues. You can’t just make a one size fits all interface. At some stage you need to significantly customize it. You can’t rely on a user just learning a complex system level interaction technique- if they can’t do that you have to customise on an individual level.
Of course there are edge cases, I work with accessibility too, for an app in the public sector where WCAG rules are no joke, so I know this as well but even so, we don't build custom accessibility UI for our users. We (try to) build the UI with accessibility in mind so it's scalable, can be used and navigate properly by voice over and keyboard.
On mobile it's not perfect either but in general you do have features to change stuff like. focus, grouping of elements, how the keyboard navigate the view stack, how to access a button through custom actions and like you mention, change the tab index programmatically.
Even so, not everything can be fixed or handled through standard accessibility means and as such hacks will inevitably make it into the products.
I get what you're saying but I still think that making things accessible and designing with common accessibility in mind should be default and as such it has to be thought about when designing and developing from the get go. Having to create custom interfaces to fulfill a specific need might be a good fit for some things but not when developing apps and websites unless you're targeting that user-group specifically.
Well said! It certainly applies to web development as well. Sadly, sometimes more complex solutions are needed - especially when based on user research.
> The first mistake the developer made, was that he wanted to create a different user experience between keyboard and mouse. Stick to what you get by default and design your components so they work for both usecases.
We have. The behaviour is mostly the same whether you're using the keyboard or a pointer (mouse/touch/pen). The only difference is that, for keyboard users, we want to turn off the animation and move the focus to the first link in the menu instead of focussing on the menu's parent <ul>.
The problem was that, as various devs have iterated on the menu over the years, it's broken the fallback behaviour. For my colleague on the funny multi-monitor set up, it should have fallen back to the keyboard no-animation behaviour with no real major difference to the UX, but instead it fell back to the no-JS experience.
So yes, generally don't try to be smart with accessibility, avoid ARIA attributes except where necessary, etc, but click events are the universal input event and work on any kind of input device and have perfect browser support. It's far better for accessibility using them instead of a mix of keydown and mousedown or pointerdown, and potentially missing other kinds of input events.
As I stated in another comment, if it was a scenario where there needs to be a major difference in behaviour between keyboard and pointers, then I would rather use separate keydown and pointerdown events.
The _mostly_ same behavior is what caused the problem though :P I'm curious, did these solutions come to pass because you had to make adjustments based on actual user feedback or was it just a developer trying to think ahead? I'm questioning whether forcing the user to tab to get to the menu item is a hindrance at all or whether the animation was a problem.
Maybe the former could have been solved using ARIA tags or maybe it would require bigger changes to the component itself. Accessibility is a roller-coaster for all these reasons alone.
> What is the benefit of the animation to the user?
Animations enhance experience by drawing attention to state changes and providing intuitive feedback to user actions.
If you don't find them engaging or useful, that's fine - and you can set prefers-reduced-motion to true on your client - , but many people do.
> What is the benefit of focusing on the menu’s parent to the user?
The first item was not interacted with nor navigated to, therefore it shouldn't be focused under normal circumstances. It would be unexpected behavior.
Focusing the first item in keyboard interactions is an accessibility hack recommended by W3C:
> Animations enhance experience by drawing attention to state changes and providing intuitive feedback to user actions.
> If you don't find them engaging or useful, that's fine - and you can set prefers-reduced-motion to true on your client - , but many people do.
The question here is not "does an animation have worth", but how is that worth tied to whether an onclick event originated from the mouse or the keyboard? Your reasoning applies equally to both, and thus leaves us still confused: why are we varying the animation by input device?
I don't actually agree, I think you can keep the animation and still make the content available immediately for screen readers. (And of course, keyboard navigation is not just for screen reader users!) Maybe someone else knows of some issue I don't.
> The first mistake the developer made, was that he wanted to create a different user experience between keyboard and mouse.
No, they wanted to make them the same. It's just to give a blind person the same experience as a seeing person requires different things because they operate differently for obvious reasons. For example, a blind person can't see when an animation has finished. They expect that menu to be available once they've triggered it. However, seeing people see the dropdown appearing and then go to use it once it's ready.
> Don't try to be smart when it comes to accessibility.
In all seriousness, considering the state of accessibility as is, I think going outside the box isn't trying to be smart. It's actually being smart. The BBC frontend team is probably at the forefront of making high-traffic websites extremely usable.
> a blind person can't see when an animation has finished. They expect that menu to be available once they've triggered it. However, seeing people see the dropdown appearing and then go to use it once it's ready.
A blind person can and should get cues from their assistive technologies that an item is is being loaded and is shown, either using announcements or aria tags that provide this information to the user.
While its fine to expect that something is available immediately, that's rarely a realistic expectation, whether you're blind or not.
> For example, a blind person can't see when an animation has finished. They expect that menu to be available once they've triggered it. However, seeing people see the dropdown appearing and then go to use it once it's ready.
For my two-cents, the BBC was simply trying too much to be "cutesy". Don't animate anything, because the silly animation on mouse click simply makes the website feel slower overall. Just open the menu as fast as the user's browser will open it.
That wouldn't change anything. They want the first element of the menu to be focused when "clicked" from a keyboard but not from a mouse. The animation doesn't affect that.
Animation helps to correlate screen elements. Without animation it actually takes longer to establish the mental relationship between the action and the result.
I prefer the line: “make it as simple as possible, but no simpler”
Sometimes complexity is simply the right tool for the job. Complexity is essential and valuable in all sorts of places - like fuzzers, LLMs, compilers, rendering engines, kernel schedulers and so on. But projects only have so much complexity budget to spend. I think I've spent my whole career trying to figure out how to spend complexity wisely. And I could spend decades more on it.
The BBC site has a "search box" that's actually a button that brings up the real search box. Always feels confusing. At least it's consistent across News / Sounds / iPlayer.
I think there is no browser bug here, though using negative screen coordinates is probably going to be surprising to a lot of folks.
However, the BBC's intent seems quite sound to me from an a11y point of view, and their commitment to a11y is commendable. Though it's likely that for some browsers their attempts at defining their own a11y experience will result in a bad UX anyways.
My understanding from this is that BBC want slightly different behaviour depending on whether it's a mouse or keyboard "click" (keyboard shouldn't show the animation and should focus the first link in the menu).
However, they also want the ease of binding to a single event and while binding to "click" can do this, they have no way to tell whether it was a mouse click or keyboard press which triggered the event.
To solve this they're using an unreliable heuristic after realising in Chrome if the mouse position is screenX=0, screenY=0 it means the event was either triggered by a mouse click at screenX=0, screenY=0 or a keyboard.
As someone whose worked on accessibility projects in the past, this is a really stupid idea imo, and had I reviewed a PR with something like this I would have asked it to be reworked. While I agree browsers should ideally do the same thing, the real issue here seems to me that screenX and screenY make little sense on "click" triggered by a keyboard.
The solution ideally would be a new event ("trigger" or something) which doesn't emit a "MouseEvent", but something more generic which could apply to both a keyboard and mouse event and provide information about the "trigger" source. Imo keyboard "clicks" are weird to begin with and would ideally be fixed with a more appropriate event.
That said, I understand this doesn't currently exist in the spec and a solution is needed now. Therefore I don't see why they couldn't also bind to a "keydown" event then if the click is triggered alongside the "keydown" on the same element, assume it was a keyboard press. That would be far more reliable and far less hacky than what they're doing, and would allow them to trigger from the single event with a bit of extra code to detect if it was a keyboard or mouse.
Hello I am the author and, yes, I totally agree some generic 'trigger' event would be far better.
To use the keydown event means we have to assume that the 'Enter' and 'Space' are the only keys we need to check for. Using 'click' is far safer from an accessibility point of view because it will always respect what your device considers to be some kind of input trigger.
As stated in the UI Events spec:
> For maximum accessibility, content authors are encouraged to use the click event type when defining activation behavior for custom controls, rather than other pointing-device event types such as mousedown or mouseup, which are more device-specific. Though the click event type has its origins in pointer devices (e.g., a mouse), subsequent implementation enhancements have extended it beyond that association, and it can be considered a device-independent event type for element activation.
And to be clear, I would not want to do it this way if it was for some really critical difference in behaviour between pointer or keyboard interactions. I'm OK with this strange mechanism here because the fallback behaviour is not that different. If you're on Safari, for example, which can't check for `screenX === 0`, then all that happens is that there will be an animation when you open the menu.
However, sadly, because of the ways various developers have added to this code over the years, it's broken that fallback behaviour and stopped it working entirely. So I've just finished a refactor to sort that out and it will hopefully be going live soon.
I suspect you are checking for the coordinates because you can't fully trust the event type.
I currently have an open semi-related bug, also in a menu dropdown component (where we also want to focus the first item when triggered via keyboard). My issue is that when Windows Narrator is used, the space/enter triggers a mocked click event instead of the keydown. We could check for the position like you do.
Unfortunately, accessibility is often hacky both on the content side, but also on on the browser/screen reader side.
While I can understand the author's need for screenX and screenY, the question remains. Why would screenX return the real screenX position instead of the position within the renderer (I don't think that exists?) or the rendered page (layerX and layerY)? The author's need would be met the same with the renderer position and window positions wouldn't be leaked to all visited websites.
> Often when opening a menu, we don't want a slightly different behaviour around focus and animations depending on if the user 'clicks' with a pointer or keyboard.
Is the word “don’t” a mistake which gives the sentence the opposite of the intended meaning?
> All we had to do was change the isInvokedByMouse to check that screenX and screenY don't equal 0, rather than checking if they are greater than 0.
It's obviously extremely unlikely but what if the mouse is actually at 0,0 when the user clicks? I'm not very familiar with JS, is checking for != 0 really the best/only way to do this?
EDIT: actually upon going back, I realized I didn't fully process this sentence originally but it seems to address this:
> We should probably do further refactoring of the event handler function, since it's complicated by the fact that it also handles keydown events. For now, though, this fix will do just fine.
It seems that querying the screen position is just a heuristic they came up with to determine the nature of the event. Instinctively I would use instance of MouseEvent for this, but even this feels risky/hacky to me.
My question is why they're relying on those heuristics. My guess is that toggleMenu is being used by multiple event handlers. Or maybe there's something else going on that is specific to their codebase.
It's hard to judge without knowing the full picture.
Maybe browsers will report `click` events that aren't actually created by a pointer device (maybe a screen reader or something?). But that still raises the question of why you would care. It seems to me like if the platform wants to report it as a `click`, your app should treat it as one and not try to get "clever" about it.
For compatibility with the Web content, the `click` event has become a device-independent activation event. Sites can't be expected to listen for events from every kind of device, so instead all devices send `click`s.
They care, because focus for keyboard-controlled screen readers sending "click" should behave differently: an element inside the menu should receive focus, even though it's not the element that has been clicked. Otherwise if focus stayed on top-level menu bar, screen reader users would be lost, and had to navigate to menu's content themselves.
No its not. You can do media select on if the primary input device is a pointer device (and, further, if it has high accuracy) and then filter on that.
I used it to select which layout to show in the past.
If you want to listen to input on touch only then you can do that and call preventDefault on the event so that the browser does not then cause a click event. Or you can just save yourself the trouble and write a click handler.
Kudos to BBC for investing in accessibility, and unfortunately discovering a nasty bug.
As an industry, why haven't we figured out how to make drop downs that consistently open for all users? Is accessibility just that hard? Are there web frameworks/web components BBC should be using that already handle this?
I've been wary (as a backend-focused full-stack developer) about tweaking the browsers components. There's so much nuance to how they work and the implementations are battle tested. The idea of creating a custom text box (for example) without doing extensive research of text box behavior across platforms seems ripe for failure. I notice broken copy/paste and dropped characters often enough (on major corporate sites too). Why are text boxes broken in 2024? React feels arrogant to me now.
Personally, I've tried to handle this with server-side templates, CSS frameworks like Bulma, minimal JS. It's not viable for sites demanding slick custom branding (vanity?) but my text boxes work and my site doesn't cost a fortune to develop. Is it accessible to BBC standards? I'm not sure.
> As an industry, why haven't we figured out how to make drop downs that consistently open for all users? Is accessibility just that hard? Are there web frameworks/web components BBC should be using that already handle this?
I don't know the answers to all the questions, but "is accessibility just that hard" is a firm, concrete, YES.
Here's some real world examples, modals. If you are not a vision impaired user, you can see what's going on when you're presented with a white box containing ui components swimming in a sea of "don't touch this bit" grey.
If you're using a screen reader there's no guarantee that you'll receive any of that information. When your screen reader controls tab through the UI elements and you land back at the top of the box, will your particular screen reader report that to you? Will it list the available interactable elements? Will it list them in the same order as the other screen readers? How about on phone? How about on Mac? Will your screen reader and browser report the inputs right, or will it silently allow the user to fall out of the modal and back into the rest of the site?
When it comes to accessibility you can't trust that the OS, browser and the screen reader are cooperative, or even that they'll do something sane in the right situation.
In 2019 I had to log a bug with VoiceOver + Safari because a negative CSS margin could cause screen readers to read RTL text blocks out of order. Users with unimpaired vision would see "9/10/2019" and on the screen reader you would hear "ten slash nine slash two-thousand-and-nineteen", as a stopgap measure we had to set the text aria-hidden and put in an invisible p tag there with the correctly ordered text so screen readers wouldn't choke on it.
All this to say, sometimes when you see some jank code relating to accessibility there really isn't a better way to do it. Even if you dumped everything, turned the codebase upside down and focused on accessibility first you'd see stuff inexplicably break the moment JAWS or VoiceOver updates.
Agreed. But ultimately many issues arise when user agents customize these elements in very dubious ways. It's ok for the most part, but there's a reason behind reset.css files, and I wager a more nuclear approach was used here to circumvent these issues completely.
This seems like a self-inflicted bug resulted from incorrect heuristics (assumption that positive screenX/Y values represent mouse event), and the investigation was complicated by inadequate tracing/logging.
Instead of checking the more appropriate property that other commenters have suggested (pointerType), I'm a bit surprised that the solution given by the author is to patch up the shaky heuristics even more:
> We could deduce from our final two clues the solution: we need to check for negative numbers as well as positive numbers when checking the screenX and screenY coordinates.
Actually that is what we're going to do. I'm hoping to merging in code, soon, that will change it to use pointerId === -1 and then fall back to screenX === 0.
At the time this code was originally written four years ago or whenever it was, not all browsers used PointerEvent for click.
I've searched for reasons and couldn't find much. The fact that a website can know where a browser window is located (window.screenX/window.screenY) and that clicks position can be reported in that coordinate system sounds insane for a desktop. TOR Browser seems to spoof screenX and screenY to avoid fingerprinting.
Has anyone seen good use-cases for that feature? I'm thinking about dual window applications that interacts with each other (I think I saw a demo of something like this a while ago on HN but I wasn't able to find it again), or sites where behavior depends on their location on the virtual screen.
Back in html 4 days we did this shenanigans all the time. I worked on very over the top sites that played with multiple windows talking to each other and moving in synchrony. I’ve tried looking for examples on archive.org (eg I know we did this a ton on flash heavy sites like design museum in London ) but alas the ones I was looking for a broken in that archive.
If you're reacting to click events, you might want to know the coordinates of where you're clicking. I mostly use this for click and drag stuff since you can get the delta between events and update position of the thing being dragged.
As for why they're checking for coordinates instead of checking for event.type is beyond me. Still I appreciate the write up, it is a good puzzle and relatable to come across code you didn't write and ask, why is it important that the click coordinate is nonzero? Why can't we just check that event.target is the button we want to activate? Why are we using JavaScript at all when a details/summary tag would do the same job?
Relative coordinates sure, but why would you need the absolute position?
I'm with you on the second point - as unlikely as it is for the click to occur at the origin, it's still a legitimate value being abused as an indicator of something that might not actually be true - quite frankly the code was bad to begin with, and it was still bad after the fix.
Uh. So they can keep track of what the user is doing?
Why would you just send a document when you can generate a heat map of where the user is on your website. And then complain about the performance and wonder why it costs so much to run a modern website.
The issue isn't so much the coordinate of the mouse within a page, but that the coordinates are relative to the virtual screen layout. It describes where your window is located on the screen(s) and the click is expressed in screen coordinates. Mapping those coordinates to your website's renderer coordinates requires additional calculation.
layerX[1] while non-standard is supported and returns a position relative to the top of the page or the top of the parent element. This makes coordinates positive only and 50,50 is the same for all users. For screenX, 3000,1567 is the same coordinate as 15,37 depending on where the window is located.
Why are you filtering for screen coordinates in the first place? What if the user is using e.g. an alternative input device that doesn't have a screen? The `click` event should be enough indication that the user has tried to activate the menu. Why reinvent the wheel?
> Why are you filtering for screen coordinates in the first place?
FTA:
> The isInvokedByMouse was checking whether the click event was invoked by a mouse or touch pointer – rather than a keyboard – by checking if the screenX or screenY coordinates were a positive number.
They were trying to detect whether it was keyboard or mouse activation, and whoever wrote it assumed that screen coordinates of mouse events would always be positive.
> They were trying to detect whether it was keyboard or mouse activation
But the code shown doesn't do different stuff for Keyboard vs Mouse, it just checks if it is either one of them. Why would you do that? Which other click event types are there that you want to filter?
I omitted that code to keep the article simpler. We want to turn off the animation on keyboard, and move the focus to different things depending on if it's a keyboard or pointer user.
They are breaking Curly’s Law here (https://blog.codinghorror.com/curlys-law-do-one-thing/), the screenX/Y location mean one thing: the location of the mouse event on screen. Using it for anything else is a huge red flag.
Right, but the article doesn't explain why they cared whether it was keyboard or mouse activation. The linked WebKit bug goes into more detail, but it's still lacking an explanation of why alternative, more common and widely deployed strategies (like having a capturing keyup event that triggers earlier in the render loop) wouldn't be a better idea instead.
Also, if you really want to determine whether a MouseEvent is "real" or "synthetic", and you don't want to worry about when mouse events are triggered relative to keyboard events in the event loop (although it doesn't seem very hard to keep track of), it seems like you can use the current click count (i.e., event.detail). This works on both Chrome and Safari—it's 1 for mouse clicks, and 0 for keyboard "clicks", but the spec text is also a little contradictory and under-specified: the "click" event handler says that "the attribute value MUST be 1 when the user begins this action and increments by 1 for each click" (https://w3c.github.io/uievents/#event-type-click) but it also says "This MUST be a non-negative integer indicating the number of consecutive clicks of a pointing device button within a specific time" (https://w3c.github.io/uievents/#current-click-count), and the definition of "pointing device button" seems to exclude synthetic keyboard events (since those are handled separately)
I've just posted another blog post to provide context and answer some questions people had. Like why are we checking screenX === 0 in the first place? Why do we want different behaviour depending on keyboard or mouse inputs? And I've explain how I've refactored it to prevent more mishaps. I hope it's helpful.
What is the correct way to check if you have a mouse click rather than a keyboard click? I would be tempted to set a module level flag based on what most recently happened - if there was a "mousedown" event more recently we'll set isKeyboard to false and isMouse to true and vice-versa if "keydown" is pressed.
Then we wouldn't need the isInvokedByMouse and isInvokedByKeyboard functions.
Is there a better way? Relying on screen coordinates for this is highly dubious and I would argue a hack.
The whole point of the learning from this article is don’t use hacks and make assumptions about what is happening. This is not for that, this is about how many clicks happened. You might get away with it but it’s not good code.
Very interesting but why would a browser report different coordinates depending on the monitor? I'd assume browser would treat the webpage as if it was the entire screen irrespective of which display it was on. Is there a reason for web APIs to have this information at all? Seems like a potential security/information/tracking risk.
Skill issue? Should've used viewport, not screen coordinates (read with .clientX and .clientY). I don't see why negative values in screen space is a bug.
Checking for position != 0 still does not fix it. On Windows and Linux, it is possible for a window to span multiple displays. Someone could set things up so that a valid click target for the button ends up in the (0, 0) position.
Reminds me of a bug on our website where a bright green image turned to yellow on an external monitor. Turns out it is a color space issue on wide gamut screens [1].
Why do you even have to check if screenX and screenY are non-zero (as opposed to just checking typeof event.screenX == "number")? Wouldn't that mean (and this is a wild edge-case) that if someone positioned their browser window so that the menu was in the top left corner (at position 0,0) the event handler would break again? Is this to block synthetic click events like (<div />).click()? Keyboard events don't have a screenX or screenY from what I remember as well.
> Keyboard events don't have a screenX or screenY from what I remember as well.
Remember that this is on 'click' events. The 'click' event type is a bit of a misnomer: it’s arguably more “activate” than “click”, because (depending a little on platform conventions) it also triggers on Space/Enter if the element is focused. But importantly it’s still a 'click' event: so it’s still a PointerEvent, not a KeyboardEvent. Since various of the fields aren’t appropriate, they get zeroed. So, screenX == 0 && screenY == 0 means either that the pointer is at the top left of the screen, or that the event was not generated by a pointer (that is, by a keyboard).
Try it out yourself, if you like, by loading a URL like this and activating by keyboard and by mouse and comparing the events.
In reality, if you used such a check more generally, you’d find it wasn’t such a rare edge case: if the page is fullscreen, corner clicking is actually pretty common, and if you have buttons that are supposed to be in the corner, they should certainly activate on exact-corner clicks. (See also Fitt’s law <https://en.wikipedia.org/wiki/Fitts's_law#Implications_for_U...>.)
This is just a heuristic to determine if the event is keydown or click.
In the article the author says that the issue is that the same function is handling both events, and they will work on refactoring it to something better.
The normal approach is just have different functions answering to different events. Or using more precise information about the event [1], instead of a heuristic.
Yeah, comparing to zero is still the wrong thing to do, was dissatisfied after reading the article because the problem still isn't really solved.. just swapping one spooky edge case with another.
Who knows, they probably broke the menu for keyboard navigation, voice navigation, eye tracking or something like that. This is one of those cases where you could really "make it make sense" by just using something CSS based.
I would not write a blog post advertizing how I filed a bug with WebKit for my own coding 101 error: unsafe assumptions and relying on side effects and heuristics.
Actually not just 101, it's basically with all of us at all levels and for life. So they're in good company having made a mistake everyone makes all the time, but it was a mistake on their part not a bug in WebKit, nore even a "interoperability issue" in WebKit or any browser.
They say they weren't aware that negative values were possible and that different browsers produce different values.
Ok, but neither of those matters.
If the function is even allowed to contain or express a negative value (IE right at the lowest basic level, is the returned data type actually a uint, or is it anything else? a regular int? a string?) then negetive values were always a possibility even if you personally never saw one before. Saying "I didn't expect a number below 0" is barely any different from saying "I didn't expect a number above 10000".
The discrepency between browsers doesn't matter and isn't the browsers fault that it tripped you up. You just made a standard boring unsafe assumption like every other programmer ever.
The entire problem is that you cared about something you don't actually care about.
You assumed that there was meaning in the absolute position of the window or the mouse pointer, when there never was, and you don't actually care about those anyway. The absolute position is like the actual internal-only row number in a db. Every row has a unique one, but it's none of your business what it is. There is only meaning in it's position relative to something else like a button, or relative to it's previous position to track movement.
Similarly checking for 0,0 and assuming that means keyboard is just another false heuristic that doesn't actually prove any such thing. The specs may or may not promise that the value will be 0,0 in the event of a keyboard initiated click, but no way it says that it can't be 0,0 any other way.
Don't de ashamed of this error because it's common, but don't be proud of calling these WebKit or browser interoperability bugs.
Do write up and publish the experience though as a warning and lesson to new developers about assumptions and heuristics and relying on side effects that just happen to work on the developers laptop when they tried it once and shipped it.
Also "it's for accessibility" doesn't change anything. Trying to be too smart just makes it worse. Actually that's true just generally for everything.
I appreciate and fully agree with your perspective on this. A thing was treated as if it were some other thing, and acceptably failed to be that other thing.
That does seem quite obviously better. Even when insisting on checking coordinates why use the coordinate system carried by the event that you have the least control over. Why not .pageX/.pageY, which can't trigger the failure case as long as you make sure the element isn't in the far top left.
Based on the other replies here it seems like it's to differentiate taps vs mouse clicks, keyboard events in js don't have a screenX or screenY property (you can run this in your browser console on this HN post to confirm):
(() => {
const logEvent = event => console.log({
coords: [event.screenX, event.screenY],
type: event.type
});
const input = document.querySelector("textarea");
// use "keydown" instead of "keypress" to detect all keyboard input instead of just character producing input
input.addEventListener("keydown", logEvent);
input.addEventListener("click", logEvent);
})();
Type in or click on the reply text input and you'll see that the coords array is undefined for all keyboard events. I haven't tried this equivalent on a touch device however, so not sure how it's handled there.
Some people are pretty negative about the code presented here (which the author criticised in a second post[0]), which makes me wonder, as someone who thankfully hasn't programmed anything non-trivial in Javascript in years: Is checking for (0,0) actually bad code, or is it simply making the most of a bad API?
Bear in mind the following:
> Also, thank you to Patrick H. Lauke from TetraLogical (editor of the Pointer Events Level 2 spec) for his comment on Mastodon that suggested improving the offending code by checking for pointerType in the PointerEvent interface instead of screenX and screenY.
The code provided in the second post[0] uses pointerType if available, else falls back to checking for (0,0). Also:
> At the time this code was originally written four years ago, not all browsers treated click events as PointerEvents. They used the MouseEvent interface, so pointerId or pointerType wasn't an option.
Am I missing something? Why are things highlighted in blue? And why is " on Chrome and Firefox, the screenX and screenY properties were negative numbers" repeated so often? I had to go back to reread to make sure I didn't somehow skip back a few lines without noticing it.
Most people applaud the tech skills needed to reproduce this bug (I do too). Some people shiver at the Mount Everest of clumsy JavaScript code needed to provide a usable web experience.
Can you believe that every app has a team of people who just maintain the app's code?
I would say many people who browse Hacker News given the fact that, at the time of writing, this is the 4th ranked article. Moreover, who cares which company the person writing this post happened to be working for when discovering this interesting browser edge case?
For anyone who didn't click through to the WebKit bug report the author submitted, a WebKit dev asked him to clarify why the BBC finds it beneficial to be able to detect that the event was sent from a keyboard. This is the author's response:
> Ironically, I want interoperability on this to help with use cases relating to accessibility.
> I work at the BBC and, on our UK website, our navigation bar menu button behaves slightly differently depending on if it is opened with a pointer or keyboard. The click event will always open the menu, but:
> - when opening with a pointer, the focus moves to the menu container.
> - when opening with a keyboard, there is no animation to open the menu and the focus moves to the first link in the menu.
> Often when opening a menu, we don't want a slightly different behaviour around focus and animations depending on if the user 'clicks' with a pointer or keyboard.
> The 'click' event is great when creating user experiences for keyboard users because it is device independent. On keyboards, it is only invoked by Space or Enter key presses. If we were to use the keydown event, we would have to check whether only the the Space or Enter keys were pressed.
Source: https://bugs.webkit.org/show_bug.cgi?id=281430
This is fascinating, because the naive English interpretation of the code and the comment on that WebKit bug don't match the actual structure of the code. Here's the relevant code:
Ignoring the actual conditions entirely, this code seems to be trying to categorize the event into one of two categories: mouse or keyboard. But what it actually does is to categorize into one of four categories: (mouse and not keyboard), (keyboard and not mouse), (keyboard and mouse), and (neither keyboard nor mouse). And, as the original bug shows, (neither keyboard nor mouse) is handled inappropriately. One might wonder whether (keyboard and mouse) works well.Either the code should be deliberate about the fact that (is it a keyboard) and (is it a mouse) are separate booleans, or the code should be structured so that the actual categories are mutually exclusive. For example:
and use !isInvokedByMouse to check for keyboardiness, or: or, perhaps even better:This is a great comment ^ whenever you use two booleans like this, you're opening yourself up to "unrepresentable state" logic errors. Finding and noticing this in code can be tricky, but it's a great example of something that should be caught during code review.
Not sure if exactly the same thing but reminds me of "Booleans are a trap"
https://katafrakt.me/2024/11/09/booleans-are-a-trap/
Wait—that last state doesn't make sense. With a real door, you can technically turn the key while it's open, but does that meaningfully change its state? Yet our model allows this impossible combination.
Funnily enough, I have a physical door & lock that OFTEN gets in to this state - it's exactly as irritating as it sounds, and it has very meaningful impact on it's state (it then can't be closed without first unlocking the lock!)
I’ve come to the conclusion that most programming problems like this can be solved my having programmers work as building superintendents for 3 months. Practical experience will open their eyes to how the world really works.
This leads to another wonderful gap, where programmers are expected to understand the world[0], but the world isn't expected to understand programming[1]
[0] Business, etc
[1] System design, security, database management, cost vs speed trade-offs, SCM, etc etc etc
This is a good account of a software modelling process. But, this is not specific to booleans. Database constraints involve the same issue. One motivation for encapsulation in software was to preserve constraints in the data (or in other words, disallow improper states). Encapsulation allows only internal functions to manipulate the data, and the developer just needs to checks that these functions are doing it correctly.
Or, as pointed out in the post where multiple booleans are merged into a single enum, encode the constraints into the data itself ie. use the constraints of the host programming language.
But this wont be possible in general - for instance if your language doesn't have sets/dictionaries, how would you encode uniqueness of values directly using arrays and lists? It would have to be done using interface functions.
Yet another article prematurely optimizing. It’s like these people have nothing better to do. I’ll wait for my code to get into stupid edge cases first and then fix it. Even if you spend your time avoiding booleans you will still find yourself with some new contradictory state and have to fix it differently anyways.
Coming up with proper representation for your state is almost always worth it. If anything it's the opposite of premature optimisation - normalise first, only denormalise after you've proven that it's needed.
Huh? This isn’t premature optimization unless you consider trying to write correct code “optimizing”.
The issue isn’t booleans the issue is that the code doesn’t handle all the states described by the two booleans, when all are possible.
Make the impossible states unrepresentable using an enum of only the possible boolean combinations.
Isn't that exactly what the article prescribes?
Did you read the context?
No I read wrong and missed it. If all four states are actually possible an enum can still be a win if you have a compiler that can check for exhaustive switch case statements. Or especially if updating a system where it used to be only three were valid and now something changed so that all four are.
And just to add an extra corner case, Mobile Safari changes the click behavior if an onclick handler is registered - even if the click handler is an empty function that does nothing. The onclick handler itself acts as another Boolean that affects the browser's behavior. I don't remember the exact details because it was a corner case (I think to do with scrolling or popovers or touchcancel - I know it was surprisingly nasty). This page mentions something else http://www.quirksmode.org/blog/archives/2010/09/click_event_... "Fortunately it’s pretty easy to solve: you have to make the element clickable by giving it an onclick event handler of its very own. That handler can be empty; as long as it’s present it will make any element clickable.".
Well said. I usually go through several phases when dealing with this kind of thing, I start with "flags for different conditions" type logic, then when things get too complex I refine this into "set of explicitly defined states", and then as the state code evolves from 'ideal happy path' to 'production-tested code' I gradually realise that most of the original combinations represented by the original flags actually CAN happen in weird edge-cases.
> "set of explicitly defined states"
this is called algebraic data type (https://en.wikipedia.org/wiki/Algebraic_data_type), and it is the best way, imho, to reduce bugs in code.
By making it easy to pattern match, it reduces the possiblity of producing an invalid state, because at the time of definition, you have to figure out how to get that type (and checked by compiler).
I frequently use algebraic thinking to verify my sanity with types. Bool times bool is 4. We have two states we want to represent so bool times bool so the wrong thing.
Seems like a non bug to me.
The first mistake the developer made, was that he wanted to create a different user experience between keyboard and mouse. Stick to what you get by default and design your components so they work for both usecases. Don't try to be smart when it comes to accessibility.
What he ended up doing is what I would have considered a hack. A solution that inevitably breaks or has side effects.
The reason there rarely are good handles to do things differently in accessibility context, is because it's not something that's meant to be handled differently.
See I work in accessibility. Like I provide and create solutions direct to end users with complex needs. Not regular web accessibility. I get the view of this. It’s the same idea of universal access. But actually I don’t fully agree. Yes. If you can stick to this principle - and do try / but I promise you edge cases - which in itself is what accessibility users are all about - cause headaches. At some level you have to do custom stuff. It’s the best way. Take for example switch users. Yes. If your ui is tab able - great. But what if you need your items scannable in frequency order. Your tab index needs to change to meet the end users needs. Or eye gaze users. The accuracy level changes. Add in cognitive issues. You can’t just make a one size fits all interface. At some stage you need to significantly customize it. You can’t rely on a user just learning a complex system level interaction technique- if they can’t do that you have to customise on an individual level.
Of course there are edge cases, I work with accessibility too, for an app in the public sector where WCAG rules are no joke, so I know this as well but even so, we don't build custom accessibility UI for our users. We (try to) build the UI with accessibility in mind so it's scalable, can be used and navigate properly by voice over and keyboard.
On mobile it's not perfect either but in general you do have features to change stuff like. focus, grouping of elements, how the keyboard navigate the view stack, how to access a button through custom actions and like you mention, change the tab index programmatically.
Even so, not everything can be fixed or handled through standard accessibility means and as such hacks will inevitably make it into the products.
I get what you're saying but I still think that making things accessible and designing with common accessibility in mind should be default and as such it has to be thought about when designing and developing from the get go. Having to create custom interfaces to fulfill a specific need might be a good fit for some things but not when developing apps and websites unless you're targeting that user-group specifically.
Well said! It certainly applies to web development as well. Sadly, sometimes more complex solutions are needed - especially when based on user research.
Also note, it’s been about 10-15 years since the rules changed and if you want to work on a government contract, accessibility is mandatory.
I am the author.
> The first mistake the developer made, was that he wanted to create a different user experience between keyboard and mouse. Stick to what you get by default and design your components so they work for both usecases.
We have. The behaviour is mostly the same whether you're using the keyboard or a pointer (mouse/touch/pen). The only difference is that, for keyboard users, we want to turn off the animation and move the focus to the first link in the menu instead of focussing on the menu's parent <ul>.
The problem was that, as various devs have iterated on the menu over the years, it's broken the fallback behaviour. For my colleague on the funny multi-monitor set up, it should have fallen back to the keyboard no-animation behaviour with no real major difference to the UX, but instead it fell back to the no-JS experience.
So yes, generally don't try to be smart with accessibility, avoid ARIA attributes except where necessary, etc, but click events are the universal input event and work on any kind of input device and have perfect browser support. It's far better for accessibility using them instead of a mix of keydown and mousedown or pointerdown, and potentially missing other kinds of input events.
As I stated in another comment, if it was a scenario where there needs to be a major difference in behaviour between keyboard and pointers, then I would rather use separate keydown and pointerdown events.
The _mostly_ same behavior is what caused the problem though :P I'm curious, did these solutions come to pass because you had to make adjustments based on actual user feedback or was it just a developer trying to think ahead? I'm questioning whether forcing the user to tab to get to the menu item is a hindrance at all or whether the animation was a problem.
Maybe the former could have been solved using ARIA tags or maybe it would require bigger changes to the component itself. Accessibility is a roller-coaster for all these reasons alone.
> we want to turn off the animation and move the focus to the first link in the menu instead of focussing on the menu's parent
Why not just always turn off the animations? Why not just always move the focus to the link?
What is the benefit of the animation to the user? What is the benefit of focusing on the menu’s parent to the user?
One rule of thumb with accessibility is that accessible products are usually better for everyone.
> What is the benefit of the animation to the user?
Animations enhance experience by drawing attention to state changes and providing intuitive feedback to user actions.
If you don't find them engaging or useful, that's fine - and you can set prefers-reduced-motion to true on your client - , but many people do.
> What is the benefit of focusing on the menu’s parent to the user?
The first item was not interacted with nor navigated to, therefore it shouldn't be focused under normal circumstances. It would be unexpected behavior.
Focusing the first item in keyboard interactions is an accessibility hack recommended by W3C:
https://www.w3.org/WAI/ARIA/apg/patterns/menubar/
> Animations enhance experience by drawing attention to state changes and providing intuitive feedback to user actions.
> If you don't find them engaging or useful, that's fine - and you can set prefers-reduced-motion to true on your client - , but many people do.
The question here is not "does an animation have worth", but how is that worth tied to whether an onclick event originated from the mouse or the keyboard? Your reasoning applies equally to both, and thus leaves us still confused: why are we varying the animation by input device?
The question was "Why not just always turn off the animations?"
---
> why are we varying the animation by input device?
Another user explains it here: https://news.ycombinator.com/item?id=42176540
I don't actually agree, I think you can keep the animation and still make the content available immediately for screen readers. (And of course, keyboard navigation is not just for screen reader users!) Maybe someone else knows of some issue I don't.
> The first mistake the developer made, was that he wanted to create a different user experience between keyboard and mouse.
No, they wanted to make them the same. It's just to give a blind person the same experience as a seeing person requires different things because they operate differently for obvious reasons. For example, a blind person can't see when an animation has finished. They expect that menu to be available once they've triggered it. However, seeing people see the dropdown appearing and then go to use it once it's ready.
> Don't try to be smart when it comes to accessibility.
In all seriousness, considering the state of accessibility as is, I think going outside the box isn't trying to be smart. It's actually being smart. The BBC frontend team is probably at the forefront of making high-traffic websites extremely usable.
> a blind person can't see when an animation has finished. They expect that menu to be available once they've triggered it. However, seeing people see the dropdown appearing and then go to use it once it's ready.
A blind person can and should get cues from their assistive technologies that an item is is being loaded and is shown, either using announcements or aria tags that provide this information to the user.
While its fine to expect that something is available immediately, that's rarely a realistic expectation, whether you're blind or not.
If you remove the can, I would agree.
> For example, a blind person can't see when an animation has finished. They expect that menu to be available once they've triggered it. However, seeing people see the dropdown appearing and then go to use it once it's ready.
For my two-cents, the BBC was simply trying too much to be "cutesy". Don't animate anything, because the silly animation on mouse click simply makes the website feel slower overall. Just open the menu as fast as the user's browser will open it.
That wouldn't change anything. They want the first element of the menu to be focused when "clicked" from a keyboard but not from a mouse. The animation doesn't affect that.
> Don't animate anything
Animation helps to correlate screen elements. Without animation it actually takes longer to establish the mental relationship between the action and the result.
We as developers think we like zero animation. Probably not least because animation is harder for us to implement than just view(state).
But it's very easy to create cases where the UX sucks because things happen instantly especially as inherent complexity of the app increases.
> Don't try to be smart when it comes to accessibility.
“Don't try to be smart” alone is good advice in general and everywhere. Also in UI “don’t try to be original”
I prefer the line: “make it as simple as possible, but no simpler”
Sometimes complexity is simply the right tool for the job. Complexity is essential and valuable in all sorts of places - like fuzzers, LLMs, compilers, rendering engines, kernel schedulers and so on. But projects only have so much complexity budget to spend. I think I've spent my whole career trying to figure out how to spend complexity wisely. And I could spend decades more on it.
The BBC site has a "search box" that's actually a button that brings up the real search box. Always feels confusing. At least it's consistent across News / Sounds / iPlayer.
This is becoming more and more common. Can’t say I’m a fan either.
I think there is no browser bug here, though using negative screen coordinates is probably going to be surprising to a lot of folks.
However, the BBC's intent seems quite sound to me from an a11y point of view, and their commitment to a11y is commendable. Though it's likely that for some browsers their attempts at defining their own a11y experience will result in a bad UX anyways.
Does anyone else find this write up confusing?
My understanding from this is that BBC want slightly different behaviour depending on whether it's a mouse or keyboard "click" (keyboard shouldn't show the animation and should focus the first link in the menu).
However, they also want the ease of binding to a single event and while binding to "click" can do this, they have no way to tell whether it was a mouse click or keyboard press which triggered the event.
To solve this they're using an unreliable heuristic after realising in Chrome if the mouse position is screenX=0, screenY=0 it means the event was either triggered by a mouse click at screenX=0, screenY=0 or a keyboard.
As someone whose worked on accessibility projects in the past, this is a really stupid idea imo, and had I reviewed a PR with something like this I would have asked it to be reworked. While I agree browsers should ideally do the same thing, the real issue here seems to me that screenX and screenY make little sense on "click" triggered by a keyboard.
The solution ideally would be a new event ("trigger" or something) which doesn't emit a "MouseEvent", but something more generic which could apply to both a keyboard and mouse event and provide information about the "trigger" source. Imo keyboard "clicks" are weird to begin with and would ideally be fixed with a more appropriate event.
That said, I understand this doesn't currently exist in the spec and a solution is needed now. Therefore I don't see why they couldn't also bind to a "keydown" event then if the click is triggered alongside the "keydown" on the same element, assume it was a keyboard press. That would be far more reliable and far less hacky than what they're doing, and would allow them to trigger from the single event with a bit of extra code to detect if it was a keyboard or mouse.
Hello I am the author and, yes, I totally agree some generic 'trigger' event would be far better.
To use the keydown event means we have to assume that the 'Enter' and 'Space' are the only keys we need to check for. Using 'click' is far safer from an accessibility point of view because it will always respect what your device considers to be some kind of input trigger.
As stated in the UI Events spec:
> For maximum accessibility, content authors are encouraged to use the click event type when defining activation behavior for custom controls, rather than other pointing-device event types such as mousedown or mouseup, which are more device-specific. Though the click event type has its origins in pointer devices (e.g., a mouse), subsequent implementation enhancements have extended it beyond that association, and it can be considered a device-independent event type for element activation.
And to be clear, I would not want to do it this way if it was for some really critical difference in behaviour between pointer or keyboard interactions. I'm OK with this strange mechanism here because the fallback behaviour is not that different. If you're on Safari, for example, which can't check for `screenX === 0`, then all that happens is that there will be an animation when you open the menu.
However, sadly, because of the ways various developers have added to this code over the years, it's broken that fallback behaviour and stopped it working entirely. So I've just finished a refactor to sort that out and it will hopefully be going live soon.
I suspect you are checking for the coordinates because you can't fully trust the event type.
I currently have an open semi-related bug, also in a menu dropdown component (where we also want to focus the first item when triggered via keyboard). My issue is that when Windows Narrator is used, the space/enter triggers a mocked click event instead of the keydown. We could check for the position like you do.
Unfortunately, accessibility is often hacky both on the content side, but also on on the browser/screen reader side.
While I can understand the author's need for screenX and screenY, the question remains. Why would screenX return the real screenX position instead of the position within the renderer (I don't think that exists?) or the rendered page (layerX and layerY)? The author's need would be met the same with the renderer position and window positions wouldn't be leaked to all visited websites.
> Often when opening a menu, we don't want a slightly different behaviour around focus and animations depending on if the user 'clicks' with a pointer or keyboard.
Is the word “don’t” a mistake which gives the sentence the opposite of the intended meaning?
Hello I am the author and that was indeed a mistake. Whoops!
> All we had to do was change the isInvokedByMouse to check that screenX and screenY don't equal 0, rather than checking if they are greater than 0.
It's obviously extremely unlikely but what if the mouse is actually at 0,0 when the user clicks? I'm not very familiar with JS, is checking for != 0 really the best/only way to do this?
EDIT: actually upon going back, I realized I didn't fully process this sentence originally but it seems to address this:
> We should probably do further refactoring of the event handler function, since it's complicated by the fact that it also handles keydown events. For now, though, this fix will do just fine.
It seems that querying the screen position is just a heuristic they came up with to determine the nature of the event. Instinctively I would use instance of MouseEvent for this, but even this feels risky/hacky to me.
My question is why they're relying on those heuristics. My guess is that toggleMenu is being used by multiple event handlers. Or maybe there's something else going on that is specific to their codebase.
It's hard to judge without knowing the full picture.
EDIT: Aha, there's an answer here: https://news.ycombinator.com/item?id=42174177
You can do a search on GitHub and there is quite a bit of code that is using event.screenX > 0. Maybe someone should file some bug reports.
Also, stack overflow suggests that exact code to "differentiate between mouse and keyboard triggering onclick": https://stackoverflow.com/questions/7465006/differentiate-be...
But they're already checking for event.name == 'click' in the revised code. So why would you want to filter out some legitimate click events?
Maybe browsers will report `click` events that aren't actually created by a pointer device (maybe a screen reader or something?). But that still raises the question of why you would care. It seems to me like if the platform wants to report it as a `click`, your app should treat it as one and not try to get "clever" about it.
For compatibility with the Web content, the `click` event has become a device-independent activation event. Sites can't be expected to listen for events from every kind of device, so instead all devices send `click`s.
They care, because focus for keyboard-controlled screen readers sending "click" should behave differently: an element inside the menu should receive focus, even though it's not the element that has been clicked. Otherwise if focus stayed on top-level menu bar, screen reader users would be lost, and had to navigate to menu's content themselves.
Interesting. Seems like something that should be exposed more explicitly.
Apply Chesterton's Fence principle and assume there are (hopefully) comments in the real code around why this has been put in place
No its not. You can do media select on if the primary input device is a pointer device (and, further, if it has high accuracy) and then filter on that.
I used it to select which layout to show in the past.
If you want to listen to input on touch only then you can do that and call preventDefault on the event so that the browser does not then cause a click event. Or you can just save yourself the trouble and write a click handler.
Kudos to BBC for investing in accessibility, and unfortunately discovering a nasty bug.
As an industry, why haven't we figured out how to make drop downs that consistently open for all users? Is accessibility just that hard? Are there web frameworks/web components BBC should be using that already handle this?
I've been wary (as a backend-focused full-stack developer) about tweaking the browsers components. There's so much nuance to how they work and the implementations are battle tested. The idea of creating a custom text box (for example) without doing extensive research of text box behavior across platforms seems ripe for failure. I notice broken copy/paste and dropped characters often enough (on major corporate sites too). Why are text boxes broken in 2024? React feels arrogant to me now.
Personally, I've tried to handle this with server-side templates, CSS frameworks like Bulma, minimal JS. It's not viable for sites demanding slick custom branding (vanity?) but my text boxes work and my site doesn't cost a fortune to develop. Is it accessible to BBC standards? I'm not sure.
> As an industry, why haven't we figured out how to make drop downs that consistently open for all users? Is accessibility just that hard? Are there web frameworks/web components BBC should be using that already handle this?
I don't know the answers to all the questions, but "is accessibility just that hard" is a firm, concrete, YES.
Here's some real world examples, modals. If you are not a vision impaired user, you can see what's going on when you're presented with a white box containing ui components swimming in a sea of "don't touch this bit" grey.
If you're using a screen reader there's no guarantee that you'll receive any of that information. When your screen reader controls tab through the UI elements and you land back at the top of the box, will your particular screen reader report that to you? Will it list the available interactable elements? Will it list them in the same order as the other screen readers? How about on phone? How about on Mac? Will your screen reader and browser report the inputs right, or will it silently allow the user to fall out of the modal and back into the rest of the site?
When it comes to accessibility you can't trust that the OS, browser and the screen reader are cooperative, or even that they'll do something sane in the right situation.
In 2019 I had to log a bug with VoiceOver + Safari because a negative CSS margin could cause screen readers to read RTL text blocks out of order. Users with unimpaired vision would see "9/10/2019" and on the screen reader you would hear "ten slash nine slash two-thousand-and-nineteen", as a stopgap measure we had to set the text aria-hidden and put in an invisible p tag there with the correctly ordered text so screen readers wouldn't choke on it.
All this to say, sometimes when you see some jank code relating to accessibility there really isn't a better way to do it. Even if you dumped everything, turned the codebase upside down and focused on accessibility first you'd see stuff inexplicably break the moment JAWS or VoiceOver updates.
Agreed. But ultimately many issues arise when user agents customize these elements in very dubious ways. It's ok for the most part, but there's a reason behind reset.css files, and I wager a more nuclear approach was used here to circumvent these issues completely.
I'm just trying to reason on their decision here.
This seems like a self-inflicted bug resulted from incorrect heuristics (assumption that positive screenX/Y values represent mouse event), and the investigation was complicated by inadequate tracing/logging.
Instead of checking the more appropriate property that other commenters have suggested (pointerType), I'm a bit surprised that the solution given by the author is to patch up the shaky heuristics even more:
> We could deduce from our final two clues the solution: we need to check for negative numbers as well as positive numbers when checking the screenX and screenY coordinates.
Actually that is what we're going to do. I'm hoping to merging in code, soon, that will change it to use pointerId === -1 and then fall back to screenX === 0.
At the time this code was originally written four years ago or whenever it was, not all browsers used PointerEvent for click.
Why are websites getting mouse position in screen coordinates in the first place?
I've searched for reasons and couldn't find much. The fact that a website can know where a browser window is located (window.screenX/window.screenY) and that clicks position can be reported in that coordinate system sounds insane for a desktop. TOR Browser seems to spoof screenX and screenY to avoid fingerprinting.
Has anyone seen good use-cases for that feature? I'm thinking about dual window applications that interacts with each other (I think I saw a demo of something like this a while ago on HN but I wasn't able to find it again), or sites where behavior depends on their location on the virtual screen.
Back in html 4 days we did this shenanigans all the time. I worked on very over the top sites that played with multiple windows talking to each other and moving in synchrony. I’ve tried looking for examples on archive.org (eg I know we did this a ton on flash heavy sites like design museum in London ) but alas the ones I was looking for a broken in that archive.
Maybe this[1] is the demo you were thinking of? It's what came to mind when reading this chain at least.
[1] https://x.com/wesbos/status/1727730566143803522
Not the one I was thinking one but definitely the vibe, thanks for sharing.
The webkit report talks about it (https://bugs.webkit.org/show_bug.cgi?id=281430), while the article doesn't seem to, for some reason. Another HN comment with summary: https://news.ycombinator.com/item?id=42174177
It’s useful for creating games where the graphics are composed of many small browser windows which interact with one another.
For example:
https://youtu.be/3al8prbfK5o?si=loNtyqIfMFkppm5V
That's nowhere near useful enough to justify such a significant security flaw.
I agree. Though that is cool.
Because this was easy to do during the 10 days allocated to develop JavaScript in 1995, and then backwards compatibility kicked in :(
If you're reacting to click events, you might want to know the coordinates of where you're clicking. I mostly use this for click and drag stuff since you can get the delta between events and update position of the thing being dragged.
As for why they're checking for coordinates instead of checking for event.type is beyond me. Still I appreciate the write up, it is a good puzzle and relatable to come across code you didn't write and ask, why is it important that the click coordinate is nonzero? Why can't we just check that event.target is the button we want to activate? Why are we using JavaScript at all when a details/summary tag would do the same job?
Relative coordinates sure, but why would you need the absolute position?
I'm with you on the second point - as unlikely as it is for the click to occur at the origin, it's still a legitimate value being abused as an indicator of something that might not actually be true - quite frankly the code was bad to begin with, and it was still bad after the fix.
I use it for a JavaScript-free CAPTCHA, works well, but it only sends the x and y of mouse click upon clicking on it.
Well, I used it for bounding box and reading order annotations, but that’s a pretty specialized use case.
Uh. So they can keep track of what the user is doing?
Why would you just send a document when you can generate a heat map of where the user is on your website. And then complain about the performance and wonder why it costs so much to run a modern website.
The issue isn't so much the coordinate of the mouse within a page, but that the coordinates are relative to the virtual screen layout. It describes where your window is located on the screen(s) and the click is expressed in screen coordinates. Mapping those coordinates to your website's renderer coordinates requires additional calculation.
layerX[1] while non-standard is supported and returns a position relative to the top of the page or the top of the parent element. This makes coordinates positive only and 50,50 is the same for all users. For screenX, 3000,1567 is the same coordinate as 15,37 depending on where the window is located.
[1] https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/...
Haha. Welcome to the world of analytics. Lots of sites are recording exactly what you are doing on their site including mouse movement at all times.
Why are you filtering for screen coordinates in the first place? What if the user is using e.g. an alternative input device that doesn't have a screen? The `click` event should be enough indication that the user has tried to activate the menu. Why reinvent the wheel?
> Why are you filtering for screen coordinates in the first place?
FTA:
> The isInvokedByMouse was checking whether the click event was invoked by a mouse or touch pointer – rather than a keyboard – by checking if the screenX or screenY coordinates were a positive number.
They were trying to detect whether it was keyboard or mouse activation, and whoever wrote it assumed that screen coordinates of mouse events would always be positive.
> They were trying to detect whether it was keyboard or mouse activation
But the code shown doesn't do different stuff for Keyboard vs Mouse, it just checks if it is either one of them. Why would you do that? Which other click event types are there that you want to filter?
I omitted that code to keep the article simpler. We want to turn off the animation on keyboard, and move the focus to different things depending on if it's a keyboard or pointer user.
They are breaking Curly’s Law here (https://blog.codinghorror.com/curlys-law-do-one-thing/), the screenX/Y location mean one thing: the location of the mouse event on screen. Using it for anything else is a huge red flag.
Right, but the article doesn't explain why they cared whether it was keyboard or mouse activation. The linked WebKit bug goes into more detail, but it's still lacking an explanation of why alternative, more common and widely deployed strategies (like having a capturing keyup event that triggers earlier in the render loop) wouldn't be a better idea instead.
Also, if you really want to determine whether a MouseEvent is "real" or "synthetic", and you don't want to worry about when mouse events are triggered relative to keyboard events in the event loop (although it doesn't seem very hard to keep track of), it seems like you can use the current click count (i.e., event.detail). This works on both Chrome and Safari—it's 1 for mouse clicks, and 0 for keyboard "clicks", but the spec text is also a little contradictory and under-specified: the "click" event handler says that "the attribute value MUST be 1 when the user begins this action and increments by 1 for each click" (https://w3c.github.io/uievents/#event-type-click) but it also says "This MUST be a non-negative integer indicating the number of consecutive clicks of a pointing device button within a specific time" (https://w3c.github.io/uievents/#current-click-count), and the definition of "pointing device button" seems to exclude synthetic keyboard events (since those are handled separately)
I've just posted another blog post to provide context and answer some questions people had. Like why are we checking screenX === 0 in the first place? Why do we want different behaviour depending on keyboard or mouse inputs? And I've explain how I've refactored it to prevent more mishaps. I hope it's helpful.
https://www.joshtumath.uk/posts/2024-11-18-how-i-refactored-...
What is the correct way to check if you have a mouse click rather than a keyboard click? I would be tempted to set a module level flag based on what most recently happened - if there was a "mousedown" event more recently we'll set isKeyboard to false and isMouse to true and vice-versa if "keydown" is pressed.
Then we wouldn't need the isInvokedByMouse and isInvokedByKeyboard functions.
Is there a better way? Relying on screen coordinates for this is highly dubious and I would argue a hack.
`event.detail` [1] is 0 for keyboard “clicks” and 1 for pointer clicks.
1: https://developer.mozilla.org/en-US/docs/Web/API/UIEvent/det...
The whole point of the learning from this article is don’t use hacks and make assumptions about what is happening. This is not for that, this is about how many clicks happened. You might get away with it but it’s not good code.
Very interesting but why would a browser report different coordinates depending on the monitor? I'd assume browser would treat the webpage as if it was the entire screen irrespective of which display it was on. Is there a reason for web APIs to have this information at all? Seems like a potential security/information/tracking risk.
Skill issue? Should've used viewport, not screen coordinates (read with .clientX and .clientY). I don't see why negative values in screen space is a bug.
https://developer.mozilla.org/en-US/docs/Web/CSS/CSSOM_view/...
I’m impressed he wrote a blog without having to get authorisation from a dozen layers of bbc management and lawyers.
Checking for position != 0 still does not fix it. On Windows and Linux, it is possible for a window to span multiple displays. Someone could set things up so that a valid click target for the button ends up in the (0, 0) position.
Reminds me of a bug on our website where a bright green image turned to yellow on an external monitor. Turns out it is a color space issue on wide gamut screens [1].
[1] https://issues.chromium.org/issues/41483538
> I checked the (DOM UI Events) spec to see if that was correct, but there didn't seem to be any specific information about it
Story of my life is finding out the details that apparently matter when I am debugging stuff has not been actually written in the spec (any)
RE this line of code at the bottom of the article:
Why do you even have to check if screenX and screenY are non-zero (as opposed to just checking typeof event.screenX == "number")? Wouldn't that mean (and this is a wild edge-case) that if someone positioned their browser window so that the menu was in the top left corner (at position 0,0) the event handler would break again? Is this to block synthetic click events like (<div />).click()? Keyboard events don't have a screenX or screenY from what I remember as well.> Keyboard events don't have a screenX or screenY from what I remember as well.
Remember that this is on 'click' events. The 'click' event type is a bit of a misnomer: it’s arguably more “activate” than “click”, because (depending a little on platform conventions) it also triggers on Space/Enter if the element is focused. But importantly it’s still a 'click' event: so it’s still a PointerEvent, not a KeyboardEvent. Since various of the fields aren’t appropriate, they get zeroed. So, screenX == 0 && screenY == 0 means either that the pointer is at the top left of the screen, or that the event was not generated by a pointer (that is, by a keyboard).
Try it out yourself, if you like, by loading a URL like this and activating by keyboard and by mouse and comparing the events.
In reality, if you used such a check more generally, you’d find it wasn’t such a rare edge case: if the page is fullscreen, corner clicking is actually pretty common, and if you have buttons that are supposed to be in the corner, they should certainly activate on exact-corner clicks. (See also Fitt’s law <https://en.wikipedia.org/wiki/Fitts's_law#Implications_for_U...>.)Fortunately, there’s a proper fix: `event.pointerId === -1` indicates non-pointer (viz. keyboard) activation.
Very informative, thank you!
This is just a heuristic to determine if the event is keydown or click.
In the article the author says that the issue is that the same function is handling both events, and they will work on refactoring it to something better.
The normal approach is just have different functions answering to different events. Or using more precise information about the event [1], instead of a heuristic.
[1] A suggestion was made by this poster: https://news.ycombinator.com/item?id=42174436
Yeah, comparing to zero is still the wrong thing to do, was dissatisfied after reading the article because the problem still isn't really solved.. just swapping one spooky edge case with another.
Who knows, they probably broke the menu for keyboard navigation, voice navigation, eye tracking or something like that. This is one of those cases where you could really "make it make sense" by just using something CSS based.
I would not write a blog post advertizing how I filed a bug with WebKit for my own coding 101 error: unsafe assumptions and relying on side effects and heuristics.
Actually not just 101, it's basically with all of us at all levels and for life. So they're in good company having made a mistake everyone makes all the time, but it was a mistake on their part not a bug in WebKit, nore even a "interoperability issue" in WebKit or any browser.
They say they weren't aware that negative values were possible and that different browsers produce different values.
Ok, but neither of those matters.
If the function is even allowed to contain or express a negative value (IE right at the lowest basic level, is the returned data type actually a uint, or is it anything else? a regular int? a string?) then negetive values were always a possibility even if you personally never saw one before. Saying "I didn't expect a number below 0" is barely any different from saying "I didn't expect a number above 10000".
The discrepency between browsers doesn't matter and isn't the browsers fault that it tripped you up. You just made a standard boring unsafe assumption like every other programmer ever.
The entire problem is that you cared about something you don't actually care about.
You assumed that there was meaning in the absolute position of the window or the mouse pointer, when there never was, and you don't actually care about those anyway. The absolute position is like the actual internal-only row number in a db. Every row has a unique one, but it's none of your business what it is. There is only meaning in it's position relative to something else like a button, or relative to it's previous position to track movement.
Similarly checking for 0,0 and assuming that means keyboard is just another false heuristic that doesn't actually prove any such thing. The specs may or may not promise that the value will be 0,0 in the event of a keyboard initiated click, but no way it says that it can't be 0,0 any other way.
Don't de ashamed of this error because it's common, but don't be proud of calling these WebKit or browser interoperability bugs.
Do write up and publish the experience though as a warning and lesson to new developers about assumptions and heuristics and relying on side effects that just happen to work on the developers laptop when they tried it once and shipped it.
Also "it's for accessibility" doesn't change anything. Trying to be too smart just makes it worse. Actually that's true just generally for everything.
I appreciate and fully agree with your perspective on this. A thing was treated as if it were some other thing, and acceptably failed to be that other thing.
Why are screenX and screenY relevant here as far as their code goes?
Where they using those values in their code?
Very interesting article but I'm missing the step where it would impact their code ...
...because when they are 0 one can infer that the event came from a keypress rather than a mouse. They want to know this.
Then it would make a lot more sense to check event.pointerType == 'mouse' vs event.pointerType == '' (for keyboard)
That does seem quite obviously better. Even when insisting on checking coordinates why use the coordinate system carried by the event that you have the least control over. Why not .pageX/.pageY, which can't trigger the failure case as long as you make sure the element isn't in the far top left.
Based on the other replies here it seems like it's to differentiate taps vs mouse clicks, keyboard events in js don't have a screenX or screenY property (you can run this in your browser console on this HN post to confirm):
Type in or click on the reply text input and you'll see that the coords array is undefined for all keyboard events. I haven't tried this equivalent on a touch device however, so not sure how it's handled there.This does not work when the mouse is actually at 0,0
An imperfect solution for a situation that will practically never happen and have no noticeable downside.
It'll screw up UI tests
Clearly not, otherwise this bug would have been found much sooner.
Some people are pretty negative about the code presented here (which the author criticised in a second post[0]), which makes me wonder, as someone who thankfully hasn't programmed anything non-trivial in Javascript in years: Is checking for (0,0) actually bad code, or is it simply making the most of a bad API?
Bear in mind the following:
> Also, thank you to Patrick H. Lauke from TetraLogical (editor of the Pointer Events Level 2 spec) for his comment on Mastodon that suggested improving the offending code by checking for pointerType in the PointerEvent interface instead of screenX and screenY.
The code provided in the second post[0] uses pointerType if available, else falls back to checking for (0,0). Also:
> At the time this code was originally written four years ago, not all browsers treated click events as PointerEvents. They used the MouseEvent interface, so pointerId or pointerType wasn't an option.
[0]: https://www.joshtumath.uk/posts/2024-11-18-how-i-refactored-...
Am I missing something? Why are things highlighted in blue? And why is " on Chrome and Firefox, the screenX and screenY properties were negative numbers" repeated so often? I had to go back to reread to make sure I didn't somehow skip back a few lines without noticing it.
Don't check screenX/screenY. If `event.type == 'click'` then it's a mouse click.
Most people applaud the tech skills needed to reproduce this bug (I do too). Some people shiver at the Mount Everest of clumsy JavaScript code needed to provide a usable web experience.
Can you believe that every app has a team of people who just maintain the app's code?
> Some people shiver at the Mount Everest of clumsy JavaScript code needed to provide a usable web experience.
You don't need any javascript to provide a usable web experience. In fact, you are more likely to break usability with it.
In theory yes, in practice no. Which one matters?
That flickering Ace Attorney GIF was extremely annoying.
[dead]
[flagged]
I would say many people who browse Hacker News given the fact that, at the time of writing, this is the 4th ranked article. Moreover, who cares which company the person writing this post happened to be working for when discovering this interesting browser edge case?