[Proof of Concept]: Vosh - a third-party screen-reader for the Macintosh

By João Santos, 13 November, 2023

While I'm no longer interested in actively working on this, the Element library target in the current code is quite robust and may be of use to others at least as an example of how to safely interact with the accessibility infrastructure on macOS. Some features made available by the accessibility infrastructure are not wrapped by my code yet, and one of the things that I definitely want to do is reverse-engineer the custom rotors since there's no public API to consume those. If and once I manage to reverse-engineer the custom rotors, I will update this post again with a video of the latest code in action.

The source code of this project can be found on my GitHub repository, and at the time of this edit, is fully conformant with macOS 26 and Swift 6.2 with its complete structured concurrency model.

If I ever manage to find enough time, one of the things that I'd like to do just for fun at some point in the future, is to try training a custom machine learning model to deal with all accessibility quirks and even provide proper content recognition in totally inaccessible applications, though this is not a promise since I have a lot on my plate right now.

Below is the main paragraph from my original post to this thread before this edit today, 2026-01-12:

After getting fed up with the general neglect of macOS accessibility from Apple, and having wanted to work on something meaningful for quite some time, I decided to attempt something that for some reason nobody seems to have tried to do before: write a completely new screen-reader for that platform. This isn't an easy task, not only due to the amount of work required to even get close to matching a mature screen-reader in terms of functionality, but also because Apple's documentation for more obscure system services is nigh on non-existent. Despite that, and since I've already overcome a lot of hurdles that I thought to be show stoppers, after a single week of work I already have something to show in a very embryonic stage of development. The idea is to gauge the interest of the community in a project similar to NVDA for the Mac to be worked on in the coming years and to which other people can contribute.

Comments

Re: Highlighting

If you mean when you expand a selection using the text caret, yes, it can be read.

If you mean when you move the screen-reader cursor to the element, there are several pieces of optional information that can be present and can be read too.

If you mean reading the element under the mouse cursor, yes, there's a way to do that too.

The project is currently stalled though, because I've been trying to reverse engineer VoiceOver to extract undocumented and undeclared notification identifiers and figure out how to make Vosh notice changes to the frontmost window without polling the system-wide element. Also the video linked to on the original post no longer reflects the state of the project.

Use by person with only one hand

My friend, who can only use his right hand, recently purchased a Mac but was unaware that Sticky Keys on the Mac will not work with VoiceOver, because it only allows one key to be held down and, of course, VoiceOver requires several keys most of the time. Will your screen reader be easier to use for my friend? Will it work with Sticky Keys? Will it make use of any shortcuts for voice input? Will it also embellish or encourage you to use trackpad gestures? Just wondering if this is a viable alternative. Any notions about timing? Thank you much and very glad this work is happening.

Abandoned

I've abandoned this project.

Originally I wanted to build something that could showcase my ability to develop for Apple technologies, with the intent of landing a job as a native iOS developer. However since nobody replies to my applications, I have switched to a much more interesting yet niche area and language: embedded / bare metal development in Rust. Unfortunately while I have a lot of free time on my hands, I really do need a job, so currently that's the highest priority in my life.

The project's repository will remain public just in case someone wishes to fork or learn something from it, but I'm unlikely to pick it up again in the near future.

I believe it's possible to use VoiceOver with just one hand, thanks to QuickNav, Single Key QuickNav, and VO locking. To enable QuickNav, press the Left and Right Arrows simultaneously by default; to enable Single Key QuickNav, press VO+Q by default; to enable VO locking, press VO+Semicolon. Unfortunately if VO locking is designed for single handed users, the position of the keys at least on US laptop keyboards makes it very inaccessible since there's no Control key anywhere near the Semicolon. It can also be activated from the VoiceOver Help Menu (VO+H) -> General -> Toggle the VO Modifier Lock On or Off, and might be callable from Keyboard Commander possibly using AppleScript so you might be able to map it to a more accessible key combination (I didn't test this).

I wish you

best of luck in your future dev projects. This was quite an ambitious thing and I am even happier that you had the humility at the right time to abandon it in a smoother way for everyone.

Work in Progress: Vosh - A New Screen-Reader for The Macintosh

Hi, I am Ayub and I am visually impaired, or legally blind. I would like to participate to make this project better. Are there any conferences that I can attend so that I can advocate for what needs to be done in this project? Thank you and keep up the hard work! I know this is going to be a long project, but when the first beta version comes in, can you leave a comment on this post?

That's fine. Everyone's next…

That's fine. Everyone's next computer should be an ARM-based Windows machine anyway.

Concur

Preach it, Brother Ollie! 😇

Re: Use by person with only one hand

There are a couple of things your friend could do to make VoiceOver easier with one hand. Firstly, the NumPad commander is great - I rarely have to use VO modifiers as most things can be done there with one hand. You can use the numpad 0 as a modifier which means you get a lot of keys you can use.
Obviously it requires a numpad to use. You can buy Bluetooth numpads, I believe. I would be doing that if I was using the laptop keyboard.

The other thing which I don't use is single key navigation. (Pre-Sonoma this was part of quick nav but I believe it isn't now).

Interesting

Hopefully someone will pick up the production of this screen reader, regardless of the status of it. I like the concept; if people were motivated enough to finish it, it could be comparable to VoiceOver.

I'm working on VOSH, and the results are somewhat promising

Hi all,
Recent web-breaking VoiceOver bugs, and yet another generic cookie-cutter response from Apple accessibility, finally pushed me over the edge. I've been spending a lot of time in Xcode recently, building a few apps in Swift. I've decided, finally, to take a look at VOSH and see if I can pick up where @João Santos left off.

This is not an easy undertaking. Apple's documentation is worse than appalling. Most of the system services we need to make this work are undocumented. João made a great start, and after a few days of development work I've made some real progress. I'm a long way off NVDA, but I've got a much better virtual buffer on the web which works similarly to NVDA's 'browse mode', and actually makes browsing in Safari quite a nice experience. It also works tremendously well in Terminal, a lot better than VoiceOver, with NVDA-style review modes that work well.

I don't make promises I can't keep, so I am not making any promises in terms of what I can deliver or how far this effort will go. But I'm tired of dealing with a less than stellar experience on macOS, and I really don't want to go back to Windows. Apple clearly isn't interested in delivering the kind of experience on the Mac we should expect, nor do they show any signs of making VoiceOver open source. So if I can find enough workarounds and reverse engineer enough of Apple's undocumented system services to make this happen, I will.

I don't claim to be an expert in Swift. If there are any Swift programmers out there who want to collaborate on this, please get in touch. If we could get a few of us working on this, maybe we could crack it. But if not, I'll keep plugging away whenever I have time and we'll see where we end up. Initial results are, at least, somewhat promising.

Wishing you all the best

I truly hope other developers join in on this effort. I would absolutely love to see a 3rd-party screen reader become a possibility for macOS. In fact, that would actually entice me to pick up a new Mac computer. Right now, I would not dare touch another Mac computer with the way VoiceOver is currently. Again, I'm wishing you all the best in this endeavor.

Very cool

It's very good to hear that. I am a Mac user and I do not like that Apple is not putting any more work into VoiceOver. I feel that VO has been a big mess during these years.

I'm not familiar with Swift, though. But I took computer science a couple of years and know a little bit about Python, but not Swift. It would be cool if we could learn Swift because I would be a good developer for this project.

Video Demonstration

Hey there,

Could you please make a video demo that way our users know the status of this project?

Would be very appreciated.

Thank you and keep it up.

Is it possible to contribute?

Is it possible to contribute to this project?

If so how?

Ayub

Something Similar on iOS

Is it possible to expand this project to iOS as well?
Would be very cool to do that.

We don't need it on iOS and it's impossible

And it's literally impossible to get it on iOS even if third-party app stores become available worldwide, as far as I understand it.
I get that iOS has bugs, but compared to macOS bugs these are child bugs IMO.
And thank you for the insane efforts @everyone.

Some remarks

For starters, I've learned a great deal about Apple's accessibility infrastructure since I abandoned Vosh almost two years ago, and I am aware of memory safety issues resulting from Apple's bad API design which enable race conditions in concurrent code, as that code was abused to the limit in my previous job and exposed some problems that I ended up working around, since I can't really fix the Apple implementations that I depend on on everyone else's computers. I can provide the same kind of fix to Vosh if needed; I just need to know exactly where I should make the related pull requests.

There's also one feature that I did not support in my safe abstraction, which is custom rotors, since there's no public API for that, so I'll definitely have to do some reverse-engineering to figure out how that feature works at the lowest levels and how to interact with it so that I can provide yet another safe abstraction. The work I did at my former job aimed towards leveraging the accessibility and audio infrastructures to deliver a highly integrated AI agent on macOS, so only a subset of features was really required, and custom rotors were definitely not in that feature subset.

As for iOS support, that is unfortunately not possible right now even here in the European Union with the Digital Markets Act cracking that platform open. I could probably file a complaint about that to get Apple to provide third-party developers with the ability to interact with the accessibility infrastructure from the consumer-side, but won't do that since I don't think there's a need right now, I don't plan on working on that any time soon if ever, and I don't engage in or trigger frivolous legal battles.

As for the code, my version of it, which hasn't been changed for nearly 2 years and which I have since archived, is still available on my GitHub profile. I can probably make a video demo of it, but since it has been forked and improved by another developer, I don't think it would be very useful, so I'll probably wait to learn about where the main repository is now and then maybe make a demo with the improvements already in place.

Finally I honestly no longer approve of my own accessibility API design, not because it's flawed or anything, but I have since figured out much better ways to abstract away Apple's lower level APIs taking advantage of modern Swift features like result builders, property wrappers, and observers, and providing a new Swift package with a better designed API is a personal project with a low priority in my pipeline. I also want to build a proper machine learning solution based on my own analytic geometry ideas that can then be trained to provide a very robust and highly optimized optical character recognizer running on the GPU for macOS and iOS / iPadOS, which is a project with a very high priority for me that I will tackle immediately after finishing the cloud software infrastructure for my upcoming business which I will also take advantage of for personal stuff, and could definitely be integrated in any third-party screen-reader to augment the experience and attempt to provide accessibility even in totally inaccessible applications.

I'm interested

I've worked on Swift for a bit. I don't claim to be an expert, but I've been programming for about 4 years in general, not just Swift, and I could learn most things given time. I actually started a topic a while ago because I wanted to also make a screenreader. For god's sake, I just want something that works on the web and doesn't decide to jump focus or cut off some of the text. I also want something that works with Arabic typing and doesn't make it impossible thanks to the cursor. But I felt like most people weren't interested and I got kinda discouraged, plus VO was a bit more usable back then.
However, to be completely transparent, I have absolutely 0 idea how this works, but I'm willing to put in the time to understand. I'm sick of wanting to switch to Windows for basic things like typing or web browsing.
I think we all agree macOS 26.1 just ruined everything now. VO feels much slower, not just the links lag bug, but also some things feel less responsive on macOS 26.2 beta, though it's a beta so who knows.

Not realistic

Developing a screen reader for macOS is rather not realistic because the whole accessibility infrastructure seems to be messed up, unreliable, and in some cases, almost unusable for 3rd party developers.
Many problems that occur with VoiceOver come from bad design decisions in the underlying Accessibility API and bad implementations in Cocoa (NSTextView) and other places.
Also, the web is even more complex as the rendering engine creates its own accessibility tree that you'd have to "translate" into native UI elements or something else that implements the Accessibility protocol.
If you look into WebKit, for example, and their accessibility object wrapper, it's a pure mess to make functionality like text navigation, reading by line, character, or word, possible at all. You might overcome this with massive effort of trial and error and a lot of digging into the depth of Mach traffic that is exchanged to get details that the documentation is not telling you (and this is a lot!), but as the accessibility API is that poor at the moment, I'd say that this would be a waste of time.

Bit of confusion

Many problems that occur with VoiceOver come from bad design decisions in the underlying Accessibility API and bad implementations in Cocoa (NSTextView) and other places.

I think that most of Cocoa is fine, it's just NSTextView that is kind of a show stopper because it's simply not providing proper accessibility information. The infrastructure is also fine for the most part, it's just the consumer-side Carbon API that is extremely poorly designed and implemented, especially when it comes to concurrency, but this can be worked around or even reimplemented, and there's a huge lack of documentation and even identifier declarations in the public APIs, which in some cases can easily be learned by simply dumping accessibility trees whereas in other cases hardcore reverse-engineering is required to figure out how things are actually implemented at the lowest levels. There are also some system services that are bad accessibility citizens by announcing that they do implement accessibility but never really responding to accessibility queries, requiring keeping track of their bundle identifiers in order to avoid ever interacting with them.
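The bundle-identifier bookkeeping mentioned above could be sketched roughly like this. It is only an illustration, not code from the Vosh repository: the `UnresponsiveServiceFilter` type, its method names, and the `com.example.*` identifiers are all hypothetical, and how a caller decides that a service is unresponsive (for example after repeated timeouts) is left as an assumption.

```swift
// Hypothetical sketch: remember which processes advertise accessibility
// support but never answer queries, so they can be skipped afterwards.
struct UnresponsiveServiceFilter {
    // Bundle identifiers that should never be queried again.
    private(set) var denied: Set<String> = []

    // Record a bundle identifier after the caller has decided, by whatever
    // heuristic it uses, that the process does not respond.
    mutating func markUnresponsive(_ bundleIdentifier: String) {
        denied.insert(bundleIdentifier)
    }

    // Processes without a bundle identifier are allowed through, since
    // there is nothing to match them against.
    func shouldQuery(_ bundleIdentifier: String?) -> Bool {
        guard let identifier = bundleIdentifier else { return true }
        return !denied.contains(identifier)
    }
}
```

A caller would check `shouldQuery` before touching any element owned by a process, and call `markUnresponsive` once a query to that process has failed to complete.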

Also, the web is even more complex as the rendering engine creates its own accessibility tree that you'd have to "translate" into native UI elements or something else that implements the Accessibility protocol.

If you look into WebKit, for example, and their accessibility object wrapper, it's a pure mess to make functionality like text navigation, reading by line, character, or word, possible at all. You might overcome this with massive effort of trial and error and a lot of digging into the depth of Mach traffic that is exchanged to get details that the documentation is not telling you (and this is a lot!), but as the accessibility API is that poor at the moment, I'd say that this would be a waste of time.

Those are provider-side problems. From the consumer-side, which is where screen-readers and other accessibility services operate, both WebKit and Blink provide properly abstracted accessibility trees not too different from the ones provided by Cocoa. The fundamental problem when it comes to web navigation is that VoiceOver itself is quite broken, and fixing it wouldn't even be that hard. The problems with caret browsing are real and may stem from a deeper issue related to how attributed strings are communicated to the accessibility infrastructure that I haven't investigated yet, but my empirical observation, which I can elaborate further on if necessary, tells me that caret browsing is implemented in VoiceOver itself, not WebKit, meaning that it can be reimplemented properly without messing with WebKit.

My take on this is that alone it is indeed a very ambitious project, but with a community it can be tackled just like in NVDA, so if providing people with the safe abstractions is all that's needed for other people to engage, I can dedicate some time to that effort. However back in 2023 only one person engaged with the project to any capacity, and my life has changed a lot since those days, when I had too much time on my hands and very little to do, as these days it's the exact opposite, so I can no longer dedicate all my time to this kind of niche project.

@João Santos

Thank you for your input, and for your efforts in providing VOSH in the first place. I wouldn't have attempted this at all if it weren't for you.

You're a better programmer than I'll ever be. To be honest, I'm just seeing if I can make it work. The abstraction is the bit I'm struggling with. If you think you can improve VOSH, or improve low-level access to the APIs we need, then please do. I haven't published my repo anywhere yet as frankly my code is currently a shameful, buggy, hacked together mess, just to see what is possible.

But most of what I've done is just built on top of your existing code, so any changes you make in your existing VOSH repo can easily be implemented. I also love the idea of your ML OCR tool, and that could be massive for live recognition features. How much work would it take for you to turn what you've learned since VOSH from knowledge to Swift, and develop the Swift package you mention into something usable? Is that realistically something you could prioritise or dedicate time to at the moment? Believe me, I know launching a business is never easy!

Once I've cleaned the code up a bit (so I'm less ashamed of a real programmer seeing it lol) I will publish a private repo for you to see what I've done.

Still think this is a great idea

I’m all for this. We need more third-party screen readers. We need alternatives. Doesn’t matter if it takes a long time. It matters that it gets done. Keep trying keep working through this. I’m also willing to donate financially if it can be useful.

@Ayub

Are you a developer looking to contribute, or are you asking about contributions in other ways (testing etc)?

Thanks all

Thanks all for the additional contributions thus far. Again, no promises, but we'll see where this goes. To address a few comments:

1) There will be no iOS version. VO is generally great on iOS, and a third-party iOS screenreader is pretty impossible due to the ecosystem. Unless Apple allows it, which they won't.

2) No video demos at the moment, this is early, buggy, concept stuff. Maybe when it's a bit more refined, well behaved, and the core features work as they're supposed to.

3) It may well be a waste of time, but we've got to try something. Sitting back and hoping Apple might start to care about the Mac hasn't worked for us for years. We still have bugs in VO that are 15+ years old. For a company with Apple's resources, which claims to care so deeply about accessibility, it is inexcusable. The way I see it, as a community, we have 3 choices. First, we all hold Apple accountable, but that requires mass effort, and actions rather than words. Second, we vote with our money and leave the Mac platform. That's an option, and a very tempting one I'll admit. But putting accessibility aside, there's so much to love about the Mac.

Or, we build something that works, and we do it with or without Apple's help. We've got incredible developers in this community, like João, who not only have the knowledge but have also already put a lot of time and effort in to put things in place to make this possible. We can do it, but no one single person can do it alone.

4) I'm grateful to those in the comments and in a couple of private emails offering financial contributions. I'm lucky in that I'm having a quieter month, so am able to put some real time into this at the moment. My goal is to make this a free, open-source project like NVDA, but there will be a huge amount of time involved if we, as a community, can make this work. I will certainly need João's expertise, and if this starts to really go anywhere there may be things, or people, who need funding. Again, this is early concept stuff at the moment, just exploring what's possible. But thank you.

@João Santos

And other devs. The repo is at [link removed]. Having been accused (wrongly) of vibecoding "most if not all" of this code, I have temporarily pulled the public repo.

Review of my own code

After reviewing my own code, my conclusion is that it's not as bad as I thought. I mean the lower level abstractions are just as bad as I remember them since that's the module that I based my professional work on, but I did implement some abstractions on top of that that even two years later I still consider properly designed.

The Swift package has 5 modules, with the Vosh module being the executable target that serves as the front-end to everything else, the Access module being a library target with the actual screen-reader which orchestrates everything, the Input module being a library target handling input devices (currently only keyboards and keypads), the Output module being a library target handling semantic audio output (currently only speech), and the Element module being a library target providing a safe abstraction directly on top of Apple's Carbon accessibility API, which is what I was considering redesigning. In the Access module, there's a type called AccessReader which provides semantic reading functionality. This type delegates its implementation to AccessGenericReader and its subclasses that implement the reading strategies for all kinds of element types. These strategies are far from complete but are designed to be extensible, so all that's needed to add more strategies is to create new subclasses and customize the reading strategy for a specific element role by overriding the methods corresponding to the semantic aspects that need customization.
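To make the subclass-per-role idea concrete, here is a minimal sketch of how such a reading strategy hierarchy might look. It is not the code from the repository: `SketchElement`, `GenericReader`, `CheckBoxReader`, `makeReader`, and every method name are hypothetical stand-ins that only mirror AccessReader and AccessGenericReader in spirit, and the element is reduced to three plain values (with the checkbox state simplified to a string) instead of a live accessibility object.

```swift
// Hypothetical stand-in for an accessibility element; not Vosh's Element type.
struct SketchElement {
    var role: String
    var label: String?
    var value: String?
}

// Base strategy: one overridable method per semantic aspect.
class GenericReader {
    let element: SketchElement

    init(for element: SketchElement) {
        self.element = element
    }

    func readLabel() -> [String] { element.label.map { [$0] } ?? [] }
    func readValue() -> [String] { element.value.map { [$0] } ?? [] }
    func readRole() -> [String] { [element.role] }

    // Combines the aspects in a fixed order; subclasses customize the
    // individual aspects rather than this method.
    final func read() -> [String] {
        readLabel() + readValue() + readRole()
    }
}

// Role-specific strategy: only the aspects that differ are overridden.
final class CheckBoxReader: GenericReader {
    override func readValue() -> [String] {
        switch element.value {
        case "1": return ["checked"]
        case "0": return ["unchecked"]
        default: return []
        }
    }

    override func readRole() -> [String] { ["checkbox"] }
}

// Front-end that picks a strategy by role and falls back to the generic one.
func makeReader(for element: SketchElement) -> GenericReader {
    switch element.role {
    case "AXCheckBox": return CheckBoxReader(for: element)
    default: return GenericReader(for: element)
    }
}
```

Under these assumptions, supporting another role means adding one subclass and one case in `makeReader`, while every role without a dedicated subclass still gets the generic label, value, and role reading.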

I will implement the workarounds for the known bugs in Apple's Carbon accessibility API, and fix a bug in my own accessibility actor that is likely making its dedicated thread exit unexpectedly when no accessibility observers are alive, which will be two separate contributions in their respective commits that I will then suggest as a pull request to the repository above. I still intend to figure out how to consume custom rotors at some point, and will look into that as well, but unless I end up finding out that the task is not very daunting, I will postpone that to a later stage depending on community engagement. Lastly I will review and test the modified code on the other repository before deciding on whether I should synchronize my main branch, and maybe publish some audio or video with a demo, where understanding what I'm saying should be a lot easier than in the original video since working for Americans forced me to speak English a lot more frequently so my diction and pronunciation have improved significantly since 2023.

I Can Contribute to Testing It

Hello Ashley,

I can contribute to testing this project. If we have a link, of course.

@João Santos

This is great. My code is currently broken and won't start at all, so focus on yours for now. It was working, but I think I may have tried to implement too much, too soon. If I can't figure out why mine won't start, I may wait for you to make the updates to your code, revert the repo and rebuild bit-by-bit what I've already done.

I can't code but

If you need sounds for the screen reader, let me know. I can at least help with that.

@João Santos

I have reverted my code back to a fresh clone of your repo, and started re-implementing features. I was a bit too enthusiastic in my approach and broke things so badly that it would take longer to fix than to start from scratch, and now that I really know how things work I can make a better go of it. The code in my repo is now working and I have started to re-implement things a bit at a time.

Willing to contribute some financial help

I am definitely willing to help financially. No guarantees right now. But give me a path and I’m more than happy to see if I can throw in some cash to help this along. Maybe not the amount it actually needs. But something.

Strong words against vibe coding

I started updating code yesterday, first I checked Ashley's code, which has over 11000 lines of difference from mine, and then decided to go back to my own version of the code, and update the entire codebase to macOS 26 and Swift 6.2 with full strict concurrency checking along with addressing the problems that I was already aware of, which is why this is taking a while to finish.

Beyond the sheer amount of changes made by Ashley, I do have issues with most if not all of the changes themselves, some of which introduce unsafe AI slop regressions, as even the README file was not spared from a drunken robot rewrite complete with the removal of my safety warnings. AI slop is already bad in itself because current models have a tendency to not really understand and definitely not produce properly designed abstractions, on top of introducing very subtle bugs that require a lot of reviewer skill to spot even in relatively confined updates, but is especially reckless in this case if we consider the security implications resulting from the privileges required to run a screen-reader that takes full control over the system. Therefore if his changes were suggested in a pull request to my repository, I would have no doubt about completely discarding them as utter trash.

While I have no issue with inexperienced software engineers and have been actively sharing knowledge with other people for well over 2 decades, which is almost as long as I've been coding myself, I strongly oppose any kind of vibe coding, because current AI models are not designed to learn from their mistakes, and because even if they did, that would still not translate to other humans actually learning anything. Therefore to me the resources spent producing this kind of code, and the time spent reviewing it, are totally worthless and thus vibe coding is a concept that I will never accept even in prototyping regardless of how productive generating code that way may appear. I got people fired from the engineering team at my former job because of their tendency to vibe code, as a demonstration of how strongly I oppose vibe coding and how seriously I take my stance in this regard.

Having noticed the above, I have grown especially concerned about the safety of established open-source screen-readers like NVDA and Orca, because while technical skill was a somewhat effective barrier to entry before ChatGPT entered the playing field in 2022, the fact that anyone can pretend to write code these days means that there's likely already a huge amount of bad quality AI slop running in critical production services with potentially huge security implications down the line, so I cannot stress my repudiation of vibe coding hard enough.

I don't have your knowledge but

Vibe coding in a screen reader sounds very, very dangerous to me, or any other critical app where a mistake can have ripple effects.
I have tried to patch the Fenrir CLI screen reader to run on Mac (some drivers / tty interception issues) and it was a horrible disaster: 3 hours wasted and nothing new learned, except that AI can do some things very well but not others, and certainly not everything.

@João Santos

I did use AI to tidy up a few things and lay some groundwork in the original version of the repo, which I realise was a mistake, and I have reverted the repo. Rest assured I am not a vibe coder generally. I do use Antigravity for expedience, as Gemini 3 Pro has fixed a few of the issues that you mention. I had used it to lay a lot of unnecessary feature groundwork in the original version of my repo (the one you would have reviewed) but as you rightly say it was the wrong decision as it introduced a lot of mess, hence reverting. AI can be useful for some things, especially when you are making small changes, and I will admit to using it for documentation quite often. However, this project is not something that can be vibecoded, nor should it be. I am no Swift expert, so I do use AI for help along the way, but I'm not going to attempt to vibecode a screenreader; that would be ridiculous and impractical.

@TheBlindGuy07

AI doesn't do porting / patching well at all, primarily (I think) because even models like Gemini don't have enough of a context window to understand a full codebase in its entirety, and you need a relatively low level understanding to make things work. Not to mention the need to understand any libraries being used, and APIs provided by that specific OS.

Testing

If this will work with Intel-based Macs and gets off the ground, I would love to contribute to testing it.

Finally pushed an update

I finally got around to finishing and pushing some of the changes that I wanted to make to this code, namely bringing it up to par with macOS 26 and Swift 6.2, as well as fixing some known bugs in the custom executor that I use to always interact with Apple's broken accessibility API from a dedicated thread in a highly ergonomic and safe Swift style, which are now available in my repository. However I did notice that some of the heuristics that I implemented originally are no longer working properly so that needs fixing, which I may or may not tackle using a custom machine learning model trained to recognize user interfaces together with whatever semantic information applications already make available, and I still want to try reverse-engineering the custom rotors since Apple does not provide any API that I can use to access them from the consumer side of the accessibility infrastructure.

As for Intel Macs, while I don't think that there's anything preventing this code from running on them since I'm not using anything that would definitely lock my code to a specific architecture, I sold my last Intel Mac in 2021, and since macOS 26 is the last version supporting those Macs, I don't think that I will even consider testing any future code on legacy hardware.

Beta Testing

Hello all,

Thank you for your continued work on this project. I really appreciate it and can't wait to see this roll out.
In the meantime, is there a beta testing opportunity? If so, how can we contribute? Would appreciate any responses.

Thank you so much for your continuation on this project.

Thank you,
Ayub

Code is in the repository

I posted a link to my repository in my last comment as well as in my edit to the original post. It's quite hard to miss, and I am including it here again just in case. As I also said in my last edit to the original post, as well as on the repository README file that is presented whenever someone actually follows the link to the project on GitHub, I am no longer interested in working on this, and will only make sporadic contributions every once in a while, which is why I also changed the thread title from Work in Progress to Proof of Concept.

The code works, the lower-level wrappers are very safe and stable, and people can even make screen-readers, AI agents, and all kinds of automators on top of that code, but unlike two years ago I have more important things to work on now, so I can dedicate almost no time to this. I do want to try building and training a custom machine learning model to weed out all the accessibility quirks and smooth out the user experience as a personal research project on top of that code, and am currently attempting to reverse-engineer the custom rotors just for the sake of creating and documenting an API to access that functionality, but am no longer committed to or even interested in taking this project seriously.

Pulled an all-nighter on this

I pulled an all-nighter scripting the debugger and analyzing how VoiceOver is actually implemented and found a few interesting things.

The first interesting thing that I found is that there are a few possibly useful functions not declared in any of the C headers that ship with the macOS SDK. One of these functions is called AXUIElementGetData. It seems to take 3 arguments: the first is, predictably and as I confirmed, a pointer to an AXUIElement instance; the second seems to be a double pointer that is probably intended to return the requested data and, in my single observation, seemed to return another pointer to an AXUIElement instance; and the third seems to be a pointer to an NSIndexPath instance, which is an Objective-C Foundation type. I did not try to trace this call beyond its return site, but I did discover that it's being called from a seemingly private and secret framework called CoreAccessibility whose functions don't even ship with named symbols.
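Anyone who wants to check whether an undeclared function is at least exported by a library can ask the dynamic loader directly. The following is a minimal sketch using Python's ctypes module; the framework path in APPLICATION_SERVICES is an assumption about where the public accessibility API is reachable, resolve_symbol is a hypothetical helper, and nothing here says what the lookup will actually report on a given macOS version. It only resolves an address and deliberately does not call the function, since its signature is unconfirmed.

```python
import ctypes

# Assumed location of the public accessibility API on macOS; adjust if needed.
APPLICATION_SERVICES = (
    "/System/Library/Frameworks/ApplicationServices.framework/ApplicationServices"
)


def resolve_symbol(library_path, name):
    """Return the address of `name` in the library, or None if unavailable.

    Passing None as library_path searches the symbols already loaded into
    the current process.
    """
    try:
        library = ctypes.CDLL(library_path)
        function = getattr(library, name)
    except (OSError, AttributeError):
        # OSError: the library could not be loaded.
        # AttributeError: the library does not export the symbol.
        return None
    return ctypes.cast(function, ctypes.c_void_p).value


if __name__ == "__main__":
    # Compare a documented function with the undeclared one from the post.
    for symbol in ("AXUIElementCopyAttributeValue", "AXUIElementGetData"):
        address = resolve_symbol(APPLICATION_SERVICES, symbol)
        print(symbol, hex(address) if address else "not found")
```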

I also discovered some functions with interesting prefixes named after the existing public API, like AXMIGCopyAttributeValue, that may or may not indicate that Apple is migrating or has already migrated to a different yet private accessibility API, possibly the one in the mysterious CoreAccessibility framework. Tracing the function calls made by VoiceOver while I performed some actions, like requesting the rotor menu in Safari, revealed very little usage of the public API, even for basic functionality like reading accessibility information. This theory of a switch to a new API is consistent with my observations two years ago when I started working on Vosh. Back then I tried a different approach, sniffing the inter-process communications that VoiceOver was performing over Mach ports, as I had yet to find a strategy to debug the screen-reader while depending on it to actually use the computer, and I couldn't make sense of most of the data that was being exchanged.

Honestly, I'm feeling quite curious now, because I'm highly motivated by challenging technical problems and this is turning out to be quite a deep rabbit hole, but unfortunately I need to restrain myself since I have other responsibilities to attend to. However, I do want to make a whole library providing high-level, safe, and ergonomic Swift abstractions for audio, video, input, and accessibility on macOS that people can rely on to make AI agents deeply integrated with the system, so I'll definitely resume investigating this in the future.

I'm also willing to share knowledge with anyone who demonstrates interest in this kind of research. It does require a lot of experience, not only in the engineering component of the Apple ecosystem, but also in low-level development on ARM, advanced debugging skills, and some experience with Python to script the debugger, but it can be quite gratifying to anyone with a true passion for technology in general and software engineering in particular. Therefore, if anyone is interested, feel free to message me and I'll do my best to provide hands-off guidance and assistance.

Machine Vibes

And other devs. The repo is at [link removed]. Having been accused (wrongly) of vibecoding "most if not all" of this code, I have temporarily pulled the public repo.

I just noticed this edit now, and since I don't like to be accused of lying, and fortunately the deleted branch had yet to be garbage collected by git on my Mac, I recovered it exactly the way it was when I last pulled it to analyze locally, and pushed it to my Vosh repository under an appropriately named branch so that anyone interested can read it and judge for themselves.

jps@alpha Vosh % git status
On branch machine-vibes
nothing to commit, working tree clean
jps@alpha Vosh % git log --oneline -1
00c7c26 Improve focus, input, and output handling logic
jps@alpha Vosh % git merge-base --fork-point main | pbcopy
jps@alpha Vosh % git diff --shortstat `pbpaste`
 65 files changed, 10995 insertions(+), 899 deletions(-)
jps@alpha Vosh % git push --set-upstream origin @
Enumerating objects: 310, done.
Counting objects: 100% (310/310), done.
Delta compression using up to 16 threads
Compressing objects: 100% (107/107), done.
Writing objects: 100% (275/275), 181.12 KiB | 1.37 MiB/s, done.
Total 275 (delta 166), reused 275 (delta 166), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (166/166), completed with 10 local objects.
remote: 
remote: Create a pull request for 'machine-vibes' on GitHub by visiting:
remote:      https://github.com/FriduxTech/Vosh/pull/new/machine-vibes
remote: 
To github.com:FriduxTech/Vosh.git
 * [new branch]      HEAD -> machine-vibes
branch 'machine-vibes' set up to track 'origin/machine-vibes'.