I don't intend to write a comprehensive guide to this just now; I'm knackered, and anyway there's probably strength in having someone else on here replicate my results and document them properly. I just want you to know it's possible and I've done it: I now have fast and responsive remote screen sharing with VoiceOver feedback.
What you need:
A VPN. You have to reach the TCP port used by Screen Sharing and the UDP port used by SonoBus, so you must arrange for the controlling computer to reach the controlled computer over an encrypted channel. Point-to-point UDP is highly recommended, e.g. ZeroTier, or OpenVPN over UDP, bridged or routed to the target network. Of course, if you're doing this all on your own LAN and you fully trust that network, there's probably no harm in skipping the VPN. If you ignore this advice and try to communicate directly over the 'net then you had better hope nobody's listening and there are no vulns in Screen Sharing or SonoBus. Not worth it, IMO.
SonoBus. Fast, low-latency audio streaming. Mono 16 kbps Opus is absolutely fine for TTS and system sound effects. Get that from here.
BlackHole audio driver (the two-channel version) from Existential Audio. Get it from here. Yes, they want your name and email address. It's not my fault; I don't like it either!
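Before going any further it can save some head-scratching to confirm that the client can actually reach the server over the VPN. Here's a minimal sketch in Python that tries a TCP connection. It assumes Screen Sharing is on its usual TCP port, 5900 (the standard VNC port), and the function name and the address in the usage note are made up for illustration. It can't tell you anything about the SonoBus UDP port, since UDP has no connection handshake to probe.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, tcp_port_open("10.0.0.2", 5900) would check a server at the hypothetical VPN address 10.0.0.2. If it returns False, sort out the VPN or the Sharing settings before blaming anything else.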
Use Screen Sharing (open it from Spotlight) on the client to connect your computers for remote access. The server side is in System Preferences, Sharing. You don't need to allow anyone to request to control your screen, nor do you need VNC support. Simply grant access to the user that will log in, or to all users. In the Screen Sharing app, connect to the IP of the server. To use VoiceOver keys on the server remotely you must turn off VoiceOver on the client: switch to observe mode (Command+Option+X), turn off VoiceOver, and switch back to control mode (same keystroke). From then on, VO keys go to the server and you're using VoiceOver there. For now, of course, the audio still comes out of the server's audio hardware. We'll soon fix that. Helpfully, if you quit Screen Sharing with the window still open, it will reconnect when you relaunch it.
On the server, install BlackHole. There's no app, just an installer package. Run it. If you have to jump through hoops to approve the driver, do it. I don't use SIP so I didn't feel a thing, but you might. Reboot, for good measure. If speech goes silent during installation, keep calm, wait a few seconds, and restart VoiceOver if it hasn't come back; the installer script restarts CoreAudio, which might disrupt audio temporarily, but really it's fine.
Use Audio MIDI Setup to create a "multi-output" device that comprises both the system output device and the BlackHole 2ch virtual device. Set the master device to your system audio hardware so that it is also the clock source. Turn on drift correction for BlackHole, but not for your system audio hardware. Basically you're creating a virtual device that plays its output as normal and also provides a synchronised copy of it for other apps to use as input. When you're quite ready, set the multi-output device as your output device. I do hope you got everything right.
Assuming you can still hear your system audio, give your new setup a test. Open QuickTime Player and make a recording from the BlackHole 2ch input, and while it's recording, make your system produce some output--effects, VoiceOver speech, and so on. Listen back to your recording. Do you hear it? Splendiferous.
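If you'd rather not trust your ears, or you're doing this check over a connection where you can't hear the server yet, you could measure the recording instead. This is only a sketch, and it assumes you've first converted the QuickTime recording to a 16-bit PCM WAV file (QuickTime won't give you one directly; macOS's afconvert tool is one way to convert). The function name is my own invention.

```python
import array
import sys
import wave


def peak_level(path):
    """Return the peak amplitude of a 16-bit PCM WAV file, from 0.0 (silence) to 1.0 (full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0
```

A result of 0.0 means BlackHole recorded pure silence, so the multi-output device probably isn't the active output. Anything comfortably above zero means audio is getting through.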
Install SonoBus on both systems. The server begins listening the moment it launches. That means that if you fix the UDP port number in the preferences and make SonoBus launch at login, you can reliably reach it by entering the same IP address and port in the client's connect to direct raw address dialog, or by using the same group configuration, which can be made to start automatically on the client side as well. Configure it as you fancy, but crucially, the server takes BlackHole as its input and mutes its output or sends it nowhere, while your client sends its output to its system output. You can mute the server's audio system-wide when you're away from it if you don't want people to hear you while you remote-control it. On the client side, just connect, and mute your microphone or disable sending to save bandwidth. Use the send and receive options to set up your preferred codec, bitrate, sample rate, and latency sensitivity. In my opinion, always using the lowest settings and tolerating the odd drop-out is eminently preferable to automatic latency negotiation. For TTS you simply don't need high quality at all: even at 16 kbps, Daniel Enhanced comes through the compression well enough to make Daniel Compact sound frugal by comparison. Honestly you'll be fine.
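Adding SonoBus to your Login Items is the easy way to make it launch at login. If you prefer something you can script, a per-user LaunchAgent is one alternative, and the sketch below generates the property list for one in Python. Treat it as an illustration rather than the way I did it: the label is made up, and it assumes the app is installed under the name SonoBus so that "open -a SonoBus" can find it.

```python
import plistlib


def sonobus_launch_agent(label="local.example.sonobus"):
    """Return the bytes of a LaunchAgent plist that opens SonoBus at login."""
    agent = {
        "Label": label,  # hypothetical identifier, choose your own
        # Ask macOS to open the app by name, as Spotlight or Finder would.
        "ProgramArguments": ["/usr/bin/open", "-a", "SonoBus"],
        "RunAtLoad": True,  # run when the agent is loaded, i.e. at login
    }
    return plistlib.dumps(agent)
```

Write the result to a .plist file in ~/Library/LaunchAgents on the server, named after the label, and it should be picked up at the next login. The fixed UDP port still has to be set in SonoBus's own preferences; this only takes care of launching it.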
Last but not least, get your VPN configured such that the network your server is on is reachable by the client. I'm sorry, but how you do that is up to you. Maybe some other time I can document my preference, ZeroTier, but even the OpenVPN support built into your router, if it has any, will do the job.
And that's it. Simply make sure SonoBus is always running on the server, and connecting to your machine at any time goes like this:
Connect to the VPN, if needed.
Launch SonoBus, either automatically or manually connecting. I always prefer to specify the IP manually, but if automatic negotiation is working for you, that's one less step. Do check you're connected over your VPN, though. The address is in receive options.
And then start Screen Sharing. Switch to observe, turn off VO, switch back to control, and begin working. You won't have access to your client again until you repeat the steps to get VO back on and stop controlling your server, either temporarily or permanently by quitting Screen Sharing.
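The launching part of those steps can be scripted on the client if you fancy it. What follows is a rough sketch, not something I've polished: it assumes the VPN is already up, that SonoBus is installed under that name, and that a vnc:// URL opens in Screen Sharing, which is the default on macOS. The function names are hypothetical. The VoiceOver dance (observe, VO off, control) stays manual, as does connecting inside SonoBus unless you've set it up to connect automatically.

```python
import subprocess
from typing import List


def connect_commands(server_ip: str) -> List[List[str]]:
    """Build the commands that launch SonoBus and then Screen Sharing for server_ip."""
    return [
        ["/usr/bin/open", "-a", "SonoBus"],        # audio first, so speech is there when the screen is
        ["/usr/bin/open", f"vnc://{server_ip}"],  # handled by Screen Sharing by default
    ]


def connect(server_ip: str) -> None:
    """Run each command in order, stopping with an error if one fails."""
    for command in connect_commands(server_ip):
        subprocess.run(command, check=True)
```

Calling connect("10.0.0.2") would do both launches for a server at that made-up VPN address. Keeping the command list separate from the function that runs it means you can print it and check it before letting it loose.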
I hope you now have enough information to try this for yourself. It's been fun getting this going and I couldn't be more thrilled, really. I hope it helps you too. It changes the equation entirely for running a credible Mac server, which is the need I have, because for better or worse remote control is really necessary for that use. It's obviously not going to help with every case, particularly typical tech support situations, without a bit of up front arrangement, but it's definitely a start if you can help someone get it going. Have fun!