Exploring the Fragmentation of Wayland, an xdotool adventure


In 2007, I was spending a my summer experimeting with UI automation. In June 207, I started xdotool by splitting it from another project, called navmacro at the time. The goal was modest - write some scripts that execute common keyboard, mouse, and window management tasks.

The first commit had only a few basic commands - basic mouse and keyboard actions, plus a few window management actions like movement, focus, and searching. Xdotool sprouted new features as time rolled on. Today, the project is 18 years old, and still going!

Time’s forward progress also brought external changes: Wayland came along hoping to replace X11, and later Ubuntu tried to take a bite by launching Mir. Noise about Wayland, both good and bad, floated around for years before major distros began shipping it. It wasn’t until 2016 when Fedora became the first distribution to ship it at all and even then, it was only for GNOME. It would be another five years before Fedora shipped KDE support for Wayland. Ubuntu defaulted to Wayland it in 2017, almost a decade after Wayland began, but switched back to X11 on the next release because screen sharing and remote desktop weren’t available.

Screen sharing and remote desktop. Features that have existed for decades on other systems, that we all knew would be needed? They weren’t available and distros were shipping a default Wayland experience without them. It was a long time before you could join a Zoom call and share your screen. Awkward.

All this to say, Wayland has been a long, bumpy road.

Back to xdotool: xdotool relies on a few X11 features that have existed since before I even started using Linux:

All of the above is old, stable, and well supported.

Wayland comes along and eliminates everything xdotool can do. Some of that elimination is given excuses that it is “for security” with little found to acknowledge what is being elided and why. It’s fine, I guess… but we ended up with linux distros shipping without significant features that, have, over time, been somewhat addressed. For example, I can share my screen on video calls, now.

So what happened to all of the features elided in the name of security?

Fragmentation is what happened. Buckle up. Almost 10 years into Fedora’s first Wayland release and those elided features are still missing or have multiple implementation proposals with very few maintainers agreeing on what to support. I miss EWMH.

Do you want to send keystrokes and mouse actions?

Outside of Wayland, Linux has uinput which allows a program to create virtual keyboard and mouse devices and send actions using those, but it requires root. Further, a keyboard device sends key codes, not symbols, which makes for another layer of difficulty. In order to send the key symbol ‘a’ we need to know what keycode (or key sequence) sends that symbol. In order to do that, you need the keyboard mapping. There are several ways to do this, and it’s not clear which one (Wayland’s wl_keyboard, X11’s XkbGetMap via Xwayland, or XDG RemoteDesktop’s ConnectToEIS which allows you to fetch the keyboard mapping with libei but will cause the confusing Remote Desktop access prompt).

Window management is also quite weird. Wayland appears to have no built-in protocol for one program (like xdotool) asking a window to do anything - be moved, resized, maximized, or closed.

I’m not alone in finding it very slow on the path to Wayland. Synergy only delivered experimental support for Wayland a year ago, 8 years after Fedora first defaulted to Wayland.

The fragmentation is upsetting for me, as the xdotool maintainer, because I want to make this work but am at a loss for how to proceed with all the fragmentation. I don’t mind what the protocol is, but I sure would love to have any protocol that does what I need. Is it worth it to continue?