[New Add-on] Vision Assistant Pro: Your Interactive AI Copilot for NVDA (Powered by Gemini)

By mahmood, 2 December, 2025

Forum

Windows

Hello AppleVis community,

I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.

It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.

🌟 Key Features:

👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes filler words (ums and ahs), adds punctuation, and types the polished text directly into your active window.
🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
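
For readers curious about the Auto-Swap idea in the Smart Translator, here is a minimal Python sketch of the decision it describes. It is only an illustration under stated assumptions, not the add-on's actual code: the function name `pick_translation_direction` and the `fallback` parameter (a second language to swap to) are hypothetical, and the sketch assumes the detected language arrives as a simple language code.

```python
def pick_translation_direction(detected, target, fallback):
    """Return (source, destination) language codes for one request.

    Hypothetical helper: 'fallback' is an assumed second language that is
    used when the text is already written in the target language.
    """
    def norm(code):
        # Reduce codes such as "EN" or "en-US" to a bare lowercase tag.
        return code.strip().lower().split("-")[0]

    source, dest = norm(detected), norm(target)
    if source == dest:
        # The text is already in the target language, so swap direction.
        dest = norm(fallback)
    return source, dest


# Example: target is English, fallback is Persian.
print(pick_translation_direction("fa", "en", "fa"))     # Persian text goes to English
print(pick_translation_direction("en-US", "en", "fa"))  # English text is swapped to Persian
```

The point of the sketch is that a single shortcut can serve both directions: the destination language is chosen only after the source language is known.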

🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.

📥 Download & Installation: You can download the add-on directly from GitHub:

Download Vision Assistant Pro v3.1.0 (Direct Link)

Just open the downloaded file and confirm the installation in NVDA.

🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro

I developed this to help our community become more independent. I would love to hear your feedback and suggestions.

Best regards, Mahmood

Comments

Getting it to work

I just installed the add-on from the Add-on Store and pasted in the API key, but nothing seems to happen. When I tried to perform OCR on a PDF, NVDA just said "not connected."
I tried creating a new project in Google AI Studio, with the same result.
Any idea how to get it working?

@Nut

"Not connected"? There is no such message in the plugin! Please check again.

@Stefan

Hello. Bulgarian has been added to the new version, 2.6. Enjoy.

Need help getting API key

I feel like I missed the first week of A.I. class and showed up on quiz day. When I go to Google Studio and hit the get API key button, it goes to an empty list of keys for imported projects. When I then hit the create API key, a modal pops up requiring a name and imported project--a combo box that's empty, with the "create" button disabled. Tried in Chrome and Firefox. Is this expected, and I have to create a project, or are browser extensions getting in the way, or what?
Thanks!

translate

It translates one sentence at a time, and I have to press the key combination every time, which is not convenient. Please make it work like the NVDA translator, which can be turned on and off, instead of requiring a key combination for each sentence.

Re: Translate

Have to agree; real-time translation of this quality is just what the doctor ordered. Question is, is it feasible?

@mantanini

Hello. That's not the case, unless you are translating in the browser. In that situation, place the text in the clipboard and press 'y' instead of 't'.

Congrats, good project!

I would recommend not implementing automatic updates or any update system directly in the add-on, as this is done automatically by the NVDA add-on store, and users can enable or disable automatic updates there.

API key missing

Thanks for this most indispensable add-on. I just used the OCR to describe my full screen and got a very nice description. I tried it again a minute later to get another full screen description, and it said "API key missing." What's up with that? Is there a quota limit on how much you can use the add-on, or am I missing something? By the way, I am using the very latest version of this add-on.

suggestions

Hi Mahmood! I would like to congratulate you on the excellent add-on. Keep up the good work! I have some suggestions that would make things much easier: the possibility of having a screen to place multiple prompts and then being able to switch with a shortcut key. Sometimes, I want to know what's in a photo, then something in a game, and it would be useful to already have these prompts predefined by the user and just keep switching. Another suggestion would be an option so that when it showed the result, it would just speak, without showing a dialogue or window. It's quite useful to know information quickly without having to switch windows. Thank you!

please read

Dear Users,

Please download and install the latest build of the add-on from the project's GitHub page, which is linked in the original post. It appears that a significant portion of users are still running outdated versions.

Due to time constraints, my ability to visit this thread frequently to respond to comments is limited. For any issues or suggestions, please raise them directly in the Issues section of the project's GitHub repository.

Best Regards,

Mahmood

Post updated

Hello everyone. The download link has been updated to version 3.1.0. Enjoy.

Error 400 when transcribing .mp3 audio

Hello, and congratulations on this fantastic add-on. I want to report that when I transcribe an .mp3 audio file, error 400 is reported, with a request to check my API key, which is of course correct. I tried changing the extension of the audio file, for example to .wav, and the transcription works, so I believe it depends on how the .mp3 extension is seen by the API key. I hope you can fix it. Thank you very much.

@Maurizio

Hello, yes, this issue exists and will be fixed soon in version 3.5.0.

Huge compliments and a request about video description

First of all, thank you for brilliantly fixing error 400 on .mp3 files. I also found the feature for describing videos from YouTube and Instagram URLs truly wonderful. It would be great if this could also be done for X and other platforms, and also for local videos played on one's own PC. I know there could be privacy problems; personally, if asked, I am willing to grant all the necessary permissions. It would be really wonderful. Endless and sincere compliments, and heartfelt thanks!

@Maurizio

Thank you so much for your kind words and feedback, Maurizio! I'm glad to hear the MP3 fix and video descriptions are working well for you.

Regarding your suggestions, I'll be adding support for X (Twitter) very soon. As for local video support, I am currently thinking about it and evaluating the best way to implement it.

Regarding the privacy concerns you mentioned, I want to reassure you that since this addon is open source, everything is transparent and there is no need to worry at all. Your trust and support mean a lot to me—thank you again!

Video description for TikTok

Hello again! I will never stop thanking you for this amazing add-on! I also hope you are well. If I am not mistaken, you are based in Iran, and I know what is happening there; I sincerely hope that you are doing as well as possible and that you are safe.
I would kindly ask whether video descriptions could also be implemented for TikTok. It would be truly incredible, given how well the feature works for YouTube, X, and Instagram, although on Instagram I ran into some download errors; in practice it seemed not to recognize some URLs. In any case, it is a wonderful feature. Thank you very much and all the best, with great respect and sincerity!