Hey,
As people keep hijacking my carefully crafted guide for installing Parallels, I thought I'd set up this thread to discuss the merits and drawbacks of each of the VM solutions for Mac that can run Windows. I really only have experience with Parallels, having assumed it would give the best performance and integration with my Mac, but, as with many things, I may be wrong.
Please, anyone who has experience with any of these options, have your say; even better if you've used all of them. Talk about the advantages and drawbacks, and also how well each one works with VoiceOver. I know a lot of people are against Parallels for its, quite frankly, stupid installation process for us, which is entirely valid. I just assumed the juice was worth the squeeze... Maybe it's not?
Comments
Openness and accessibility
I haven't yet installed Windows on any of my M-series Macs, but of those three options only one appeals to me, and that is UTM, not because it works flawlessly, but because it's open source, so I can make it work flawlessly.
Earlier today I decided to research the most effective way to make installing the ARM version of Windows 11 on macOS as straightforward and accessible as possible. I started by reading up on how to inject VirtIO drivers into the Windows installer to address the missing audio during installation. Then I went to the GitHub repository of the Crystal ISO Fetch project to work out how best to adapt the Windows instructions from the first link to a tool that creates Windows installation media on macOS, and I read the script that is actually responsible for building the ISO file. Finally, I realized that none of this is actually necessary, because UTM already integrates with AppleEvents, so the installation can, in theory, already be automated with AppleScript, Automator, and Shortcuts.
Although, as I mentioned, this scriptability in theory completely eliminates the need to inject drivers into the Windows installation media, timing scripts to work around the lack of output processing is still far too clunky for my taste. So I kept digging and came across Apple's VisionKit framework, which can in theory be used to OCR what the virtual machine displays in real time and get the bounding boxes of all the text shown by the guest system. Combined with UTM's scriptability, this makes it both possible and feasible to implement an event-based system that can be scripted to locate, expect, and react to the virtual machine's visual output, as well as to generate native accessibility elements that make the guest perfectly accessible with VoiceOver from the host.
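To make the locate, expect, and react idea a bit more concrete, here's a rough sketch of just the matching and waiting logic, written in Python for readability. It assumes some OCR step already hands back recognized strings with normalized bounding boxes; the names TextBox, find_text, and expect_text are hypothetical, and nothing here calls VisionKit or UTM. Capturing the guest screen, running the OCR, and actually sending a click to the VM are all left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TextBox:
    """One recognized text run. Coordinates are normalized to 0..1 with a
    top-left origin (an assumption; a real adapter would convert from
    whatever convention the OCR framework uses)."""
    text: str
    x: float
    y: float
    width: float
    height: float

    def center_px(self, screen_w: int, screen_h: int) -> Tuple[int, int]:
        """Center of the box in guest-screen pixels, e.g. as a click target."""
        return (round((self.x + self.width / 2) * screen_w),
                round((self.y + self.height / 2) * screen_h))


def find_text(boxes: List[TextBox], target: str) -> Optional[TextBox]:
    """Locate: return the first box whose text contains `target`, ignoring case."""
    needle = target.casefold()
    for box in boxes:
        if needle in box.text.casefold():
            return box
    return None


def expect_text(ocr_frame: Callable[[], List[TextBox]], target: str,
                timeout: float = 60.0, interval: float = 0.5,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> TextBox:
    """Expect: poll fresh OCR frames until `target` appears or time runs out.

    This replaces a fixed "wait N seconds" delay with waiting for visible
    evidence that the guest is actually ready.
    """
    deadline = clock() + timeout
    while True:
        match = find_text(ocr_frame(), target)
        if match is not None:
            return match
        if clock() >= deadline:
            raise TimeoutError(f"{target!r} not seen within {timeout} s")
        sleep(interval)


# Usage with a fake OCR source standing in for a real capture + OCR pipeline
# (hypothetical): the "Next" button only shows up on the third frame.
frames = iter([
    [],
    [TextBox("Loading files", 0.40, 0.50, 0.20, 0.04)],
    [TextBox("Windows Setup", 0.10, 0.10, 0.30, 0.05),
     TextBox("Next", 0.80, 0.90, 0.08, 0.04)],
])
fake_time = [0.0]


def fake_sleep(seconds: float) -> None:
    """Advance a simulated clock instead of really sleeping."""
    fake_time[0] += seconds


button = expect_text(lambda: next(frames), "next",
                     clock=lambda: fake_time[0], sleep=fake_sleep)
click_at = button.center_px(1920, 1080)  # React: pass this to a click action
```

The point of the design is that the script waits on what's actually on screen rather than on a guessed delay; the injectable clock and sleep only exist so the sketch can run without a real VM.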
If I succeed in implementing my ideas, UTM will be taken to the next level and will be in a league of its own in terms of automation and accessibility, and virtualization could even become our best option for tackling inaccessible operating systems. The only thing I need is time to work on personal projects again so I can commit to this goal. This isn't even the only accessibility project I have in mind involving on-device computer vision, but my success with this one will determine whether I consider tackling the rest.