Hi guys,
As many of you know, I am the developer of PiccyBot, which has been received well by the community as a flexible tool to describe videos and images.
As a side project, I started to work on Voistant, an AI voice assistant which should in principle give you complete verbal control over your PC (Windows only for now).
It is still very much in beta, and I need feedback on it. Is this really something that would be useful? There are considerable costs involved in the AI agent calls, the screen captures, etc., so even though it is free to use for now, it will have to be a paid service later on. I need your thoughts on whether it would be worth it, or what improvements you would need to make it worth a subscription.
The website https://voistant.com has a link to download the current version. Note that you may have to approve the download as it is not an officially signed application as of yet.
Looking forward to your feedback! Thanks.
Martijn van der Spek
Comments
my thoughts
Hi Martijn,
I just installed your Voistant app. I managed to register, but it didn't seem to hear any of my commands at all. I wasn't given any instructions about how to configure my microphone, so I assume my microphone was working by default? I don't know if it was or wasn't. Usually my microphone just works when I use it for Google Meet or Microsoft Teams, but even if it doesn't work for whatever reason, I can go and configure it. I opened a webpage, and told Voistant to solve a Captcha, but it didn't hear anything I said. When I tried speaking after the beep, it kept saying, 'something went wrong'. I much prefer being able to type to things instead of speaking anyway, but that's a personal preference thing maybe. Also, when it reads the commands or instructions, I'd love it to use my screen reader voice, and not Sapi 5. But anyway, it doesn't pick up my microphone at all. I'd love to try it out but can't. I should point out that after registering, it gave me a brief description of my screen. It told me there were desktop icons, but not what they were, but I didn't question it further because I wanted to see if it could solve a Captcha. I'm on Windows 11, the latest version. I much prefer what Guide AI Assistant for Windows is doing, yes it's only text input for the moment, but I like being able to type much more than speaking to things. I don't know if you've seen that, but if your assistant could provide a way to type and speak, and a way to configure the microphone, that could be a promising competitor. Check out Guide.
https://www.guideinteraction.com/
That's the sort of thing I'd use and do use.
A solution like this has to do much more
It can't be just click here and there. I would only be interested in a solution like this if it had the capability to perform complex actions.
For example: navigate to Amazon, find me the highest-rated lavender perfume under $50, and purchase it using my default payment method and address.
Guide AI can do some of this
Hi,
Yes, Guide AI can do some of this, it bought something from Amazon when I was signed in. It clicked on the correct product and the 'buy' button. But at the moment I can't test this Voistant app since it doesn't even pick up my microphone, or hear when I speak to it.
Tara: Not recognising speech
I am not sure where the issue is with your microphone, Tara. Voistant will default to keyboard input if no microphone is found, so it does find it but somehow can't use it. Maybe it is an app privacy setting. Will check it!
Thanks for trying it out!
SeasongKing, these types of commands should already work with Voistant. I tested it yesterday by booking a movie, for example. But it is definitely not without glitches, and it is slow. I need as much feedback as possible on actual use cases so I can improve it.
logging back in
Hi Martijn,
I tried to log back in again, but it said my email address wasn't recognised. Was I supposed to get a confirmation email? I didn't receive one, and I checked my junk folder too. But when I tried registering again, it just logged me in, with my settings I'd saved like the interrupt key as the CTRL key. The things I can type in the terminal are 1 or 2 for registering and logging in. There is no edit field visible, I can just type, either 1 or 2, but no edit field. After logging in or registering, the microphone finally works, and it's just now heard me. I just asked it to describe my screen, and it described this page. I tried asking it to describe a Captcha, to tell me the letters or numbers, but it said it couldn't solve Captchas or something. I'd rather have the option of switching between some sort of voice mode and text mode. For example, I want to be able to navigate through my commands and the assistant's answers without it reading everything out in the SAPI 5 voice. If I try to type a question in the terminal window like, 'describe my screen,' I don't know if anything is being typed or not, and when I press Enter nothing happens. NVDA doesn't announce letters like it usually does, and there just seems to be a blank window, not an edit field like NVDA would usually announce. If you haven't tested this with NVDA, I'd strongly recommend you do so, because then you'll know what I'm talking about.
Thanks.
Question
I have a question. Is this a matter of computing power, or is it specifically an issue with the mobile platform? I'm only wondering why you didn't start with mobile first, since you already have a big fan base here, and then later transition to computer. It seems like most people would be using these tools on their phone rather than on a computer, but I could be totally wrong about that. I'm really wondering why we don't have these products available yet for the iPhone.
these types of apps are useful
Hi Winter Roses,
I use Guide AI Assistant on Windows, primarily to solve Captchas, and I've used it in the past to get past inaccessible cookie banners, to describe screens, and to help me navigate more difficult websites. So yes, this sort of thing is definitely useful to have on a computer. In fact, I keep Guide around now in case I come up against something inaccessible like a website or app. But this Voistant app can't seem to solve Captchas at the moment. The idea of these types of apps is to use your mouse to click on things that aren't accessible with a screen reader, like buttons and so on. For a better idea of this sort of thing, see my thread about the Guide AI Assistant I created a few months back on here.
https://www.applevis.com/forum/windows/guide-ai-assistant-people-who-are-blind-or-low-vision
As for why this sort of thing isn't available for an iPhone, Apple is funny about letting apps control your phone, share screens, etc. I think Android is probably more open to this sort of thing.
And also for Windows, there's the Viewpoint assistant, an app that brings inaccessible buttons and links into focus so you can click on them with your keyboard.
https://viewpoint.nibblenerds.com/
Going to test
But like Tara mentioned, I'd prefer a way to input text rather than speak to it all the time, for example in, say, an office setting.
Could be useful
I think any user agent that can do things efficiently could be useful, as long as it has a way to type out instructions as well as speak them. I'd pay for a subscription if it worked well.
Updated with choice of input
There is a new update available. Voistant will now ask you whether you prefer microphone or keyboard input. If you don't have a microphone installed, it will use the keyboard by default.
As has been said, developing this for mobile is a different kettle of fish. Apple doesn't allow reading of screens and controlling other apps, and even with Google it will be hard. I am working on a Mac version of Voistant though, and that is looking positive.
Cannot log in again
Hi! I cannot log in again, and I receive an error!
Login
Mahmood, can you register again? I made some adjustments in the latest update.
re: login
After I enter the OTP, I get an error from lib connection, line 198.
When can we expect a Mac version to be available
Hi, when can we expect a Mac version to be available? I'm a Mac user, and I would like to test it once it becomes available.
Mac version now available
JC, the initial beta of the Mac version of Voistant is now available. You can download it from Voistant.com.
Note that you have to set quite a few permissions to get the app to work. A text file listing all the settings, as well as a spoken message, should guide you through this.
I need feedback on Voistant. It's still early days but I really feel this can be a very useful tool. Please let me know how you do!
My experience with Voistant so far
Hi,
Here's my experience with Voistant so far. I was able to do basic tasks such as sending a message to a recipient who had already been added to my contacts in the native Messages app, deleting the message, then going to the Recently Deleted section and deleting the message from there, all without touching or interacting with the keyboard. I especially like the beep that you hear before the command is performed. I really enjoy it so far. In future, you guys could add a pro option, similar to how Google Gemini and ChatGPT have both free and pro tiers. For example, in ChatGPT, the basic model allows you to ask basic questions and do basic stuff for free, while the pro version offers more advanced capabilities such as faster processing and faster response times with their updated models. So I guess you guys could do the same thing: have a free plan for users who want to do basic tasks like sending or deleting a message, sending or deleting an email, and opening applications on the computer, while the pro plan offers advanced features such as further model updates and other related advanced features. However, like many people, I'd prefer to have things free, but that doesn't always happen. As you have said, there are some costs associated with the backend AI model and processing, so it's going to take some time before users can access the features and capabilities without actually paying for a pro tier. Another option that I have thought of is to add a donation option in the Voistant menu, so that anyone who wants to can donate an extra tip to help with the resources and finances of the AI model that you guys are using to keep the server running. That would be awesome! For now, I like it so far.