I'm about to show you something that breaks every rule about how vision AI is "supposed" to work.
And when I say breaks the rules, I mean completely flips the whole thing upside down.
Here's What's Wrong With Every Vision AI App You've Ever Used
You point your camera.
You wait.
The AI speaks: "It's a living room with a couch and a table."
Cool story. But where's the couch? What color? How close? What's on it? What about that corner over there? That thing on the wall?
Want to know? Point again. Wait again. Ask again.
The AI decides what you need to know. You're stuck listening to whatever it decides to tell you. You don't get to choose. You don't get to dig deeper. You don't get to explore.
You're just a passenger.
So I built something that does the exact opposite.
What If Photos Were Like Video Games Instead of Books?
Forget books. Think video games.
In a game, you don't wait for someone to describe the room. You walk around and look at stuff yourself. You check the corners. You examine objects. You go back to things that interest you. You control what you explore and when.
That's what I built. But for photos. And real-world spaces.
You're not listening to descriptions anymore.
You're exploring them.
Photo Explorer: Touch. Discover. Control.
Here's how it works:
Upload any photo. The AI instantly maps every single object in it.
Now drag your finger across your phone screen.
Wherever you touch? That's what the AI describes. Right there. Instantly.
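How does that work under the hood? Roughly: the AI returns a list of regions with descriptions, and the app just looks up whichever region is under your finger and speaks it. Here's a stripped-down sketch of the idea (illustrative TypeScript, not the app's actual code - the Region shape and helper names are just stand-ins):

```typescript
// Illustrative only: one entry per detected object, with a bounding box
// normalized to the photo (0-1) and a ready-to-speak description.
interface Region {
  box: { x: number; y: number; w: number; h: number };
  description: string;
}

let lastSpoken: Region | null = null;

// Speak a description with the browser's built-in speech synthesis.
function speak(text: string): void {
  speechSynthesis.cancel(); // cut off the previous description
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

// Wire touch exploration onto the photo element, using the regions
// returned by a single AI analysis pass.
function enableTouchExploration(photo: HTMLElement, regions: Region[]): void {
  photo.addEventListener("touchmove", (event) => {
    const rect = photo.getBoundingClientRect();
    const touch = event.touches[0];
    // Normalize the touch point into the same 0-1 space as the boxes.
    const x = (touch.clientX - rect.left) / rect.width;
    const y = (touch.clientY - rect.top) / rect.height;

    const hit = regions.find(
      (r) =>
        x >= r.box.x && x <= r.box.x + r.box.w &&
        y >= r.box.y && y <= r.box.y + r.box.h
    );
    // Only announce when the finger crosses into a different region.
    if (hit && hit !== lastSpoken) {
      lastSpoken = hit;
      speak(hit.description);
    }
  });
}
```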
Let's Get Real:
You upload a photo from your beach vacation.
Touch the top of the screen:
"Bright blue sky with wispy white clouds, crystal clear, no storms visible"
Drag down to the middle:
"Turquoise ocean water with small waves rolling in, foam visible at wave crests, extends to horizon"
Touch the left side:
"Sandy beach, light tan color with visible footprints, a few shells scattered about"
What's that on the right? Touch there:
"Red beach umbrella, slightly tilted, casting dark shadow on sand beneath it"
Wait, what's under the umbrella? Touch that spot:
"Blue and white striped beach chair, appears unoccupied, small cooler beside it"
Go back to those shells - drag your finger back to the beach:
"Sandy beach, light tan color with visible footprints, a few shells scattered..."
See what just happened?
The information didn't vanish. You went back. You explored what YOU wanted. You took your time. You discovered that cooler the AI might never have mentioned on its own.
You're not being told about the photo. You're exploring it.
And here's the kicker: users are spending minutes exploring single photos. Going back to corners. Discovering tiny details. Building complete mental maps.
That's not an accessibility feature. That's an exploration engine.
Live Camera Explorer: Now Touch the Actual World Around You
Okay, that's cool for photos.
But what if you could do that with the real world? Right now? As you're standing there?
Point your camera at any space. The AI analyzes everything in real-time and maps it to your screen.
Drag your finger - the AI tells you what's under your finger:
• Touch left: "Wooden door, 7 feet on your left, slightly open"
• Drag center: "Clear path ahead, hardwood floor, 12 feet visible"
• Touch right: "Bookshelf against wall, 5 feet right, packed with books"
• Bottom of screen: "Coffee table directly ahead, 3 feet, watch your shins"
The world is now touchable.
Real Scenario: Shopping Mall
You're at a busy mall. Noise everywhere. People walking past. You need to find the restroom and you're not sure which direction to go.
Old way? Ask someone, hope they give good directions, try to remember everything they said.
New way?
Point your camera down the hallway. Give it a few seconds.
Now drag your finger around:
• Touch left: "Store entrance on left, 15 feet, bright lights, appears to be clothing store"
• Drag center: "Wide corridor ahead, tiled floor, people walking, 30 feet visible"
• Touch right: "Information kiosk, 10 feet right, tall digital directory screen"
• Drag up: "Restroom sign, 25 feet ahead on right, blue symbol visible"
You just learned the entire hallway layout in 20 seconds.
Need to remember where that restroom was? Just touch that spot again. The map's still there.
Walk forward 20 feet, confused about where to go next? Point again. Get a new map. Drag your finger around.
But Wait - It Gets Wilder
Object Tracking:
Double-tap any object. The AI locks onto it and tracks it for you.
"Tracked: Restroom entrance. 25 feet straight ahead on right side."
Walk forward. The AI updates:
"Tracked restroom now 12 feet ahead on right."
Lost it? Double-tap again:
"Tracked restroom: About 8 steps ahead. Turn right in 4 steps. Group of people between you - stay left to avoid."
Zoom Into Anything:
Tracking that information kiosk? Swipe left.
BOOM. You're now exploring what's ON the kiosk.
• Touch top: "Mall directory map, large touchscreen, showing floor layout"
• Drag center: "Store listings, alphabetical order, bright white text on blue background"
• Touch bottom: "You are here marker, red dot with arrow, pointing to current location level 2 near food court"
Swipe right to zoom back out. You're back to the full hallway view.
Read Any Text
Swipe up - the AI switches to text mode and maps every readable thing.
Now drag your finger:
• Touch here: "Restrooms. Arrow pointing right."
• Drag down: "Food Court level 3. Arrow pointing up."
• Touch lower: "Store hours: Monday to Saturday 10 AM to 9 PM, Sunday 11 AM to 6 PM"
Every sign. Every label. Every directory. Touchable. Explorable.
Scene Summary On Demand
Lost? Overwhelmed? Three-finger tap anywhere.
"Shopping mall corridor. Stores on both sides, restroom 25 feet ahead right, information kiosk 10 feet right, people walking in both directions. 18 objects detected."
Instant orientation. Anytime you need it.
Watch Mode (This One's Wild)
Two-finger double-tap.
The AI switches to Watch Mode and starts narrating live actions in real-time:
"Person approaching from left" "Child running ahead toward fountain" "Security guard walking past on right" "Someone exiting store carrying shopping bags"
It's like having someone describe what's happening around you, continuously, as it happens.
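Conceptually, Watch Mode is just a loop: grab a camera frame, ask the model what's happening, and speak anything it hasn't already announced. A rough sketch of that loop (captureFrame and askModel here are stand-ins, not the app's real functions):

```typescript
// Stand-in helpers, assumed for this sketch: captureFrame() grabs the current
// camera frame; askModel() sends it to the vision model and returns short
// event strings ("Person approaching from left").
declare function captureFrame(): Promise<Blob>;
declare function askModel(frame: Blob): Promise<string[]>;

const announced = new Set<string>();

async function watchMode(): Promise<void> {
  while (true) {
    const events = await askModel(await captureFrame());
    for (const text of events) {
      // Narrate only events we haven't spoken yet.
      if (!announced.has(text)) {
        announced.add(text);
        speechSynthesis.speak(new SpeechSynthesisUtterance(text));
      }
    }
    // Short pause between analysis passes.
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}
```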
The Fundamental Difference
Every other app: AI decides → Describes → Done → Repeat
This app: You explore → Information stays → Go back anytime → You control everything
It's not an improvement.
It's a completely different paradigm.
You're Not a Listener Anymore. You're an Explorer.
Most apps make you passive.
This app makes you active.
• You decide what to explore
• You decide how long to spend there
• You discover what matters to you
• You can go back and check anything again
The AI isn't deciding what's important. You are.
The information doesn't disappear. It stays there.
You're not being helped. You're exploring.
That's what accessibility should actually mean.
Oh Right, There's More
Because sometimes you just need quick answers:
Voice Control: Just speak - "What am I holding?" "Read this." "What color is this shirt?"
Book Reader: Scan pages, explore line-by-line, premium AI voices, auto-saves your spot
Document Reader: Fill forms, read PDFs, accessible field navigation
Why a Web App? Because Speed Matters.
App stores = submit → wait 2 weeks → maybe approved → users update manually → some stuck on old version for months.
Web app = fix bugs in hours. Ship features instantly. Everyone updated immediately.
Plus it works on literally every smartphone:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Anything with a browser ✓
Install in 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Done. It's an app now.
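Under the hood, that "Add to Home Screen" trick is standard progressive web app plumbing: a web app manifest plus a service worker. The registration part is a one-liner; this is the generic pattern rather than the app's exact code (the file name is illustrative):

```typescript
// Standard PWA setup: register a service worker so the browser can offer
// installation and offline caching. "/service-worker.js" is an illustrative name.
if ("serviceWorker" in navigator) {
  navigator.serviceWorker
    .register("/service-worker.js")
    .then((registration) => console.log("Service worker ready:", registration.scope))
    .catch((error) => console.error("Service worker registration failed:", error));
}
```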
The Price (Let's Be Direct)
30-day free trial. Everything unlocked. No credit card.
After that: $9.99 CAD/month
Why? Because the AI costs me money every single time you use it. Plus I'm paying for servers. I'm one person building this.
I priced it to keep it affordable while keeping it running and improving.
Safety Warning (Important)
AI makes mistakes.
This is NOT a replacement for your cane, guide dog, or mobility training.
It's supplementary information. Not primary navigation.
Never make safety decisions based solely on what the AI says.
The Real Point of This Whole Thing
For years, every vision AI app has said:
"We'll tell you what you're looking at."
I'm saying something different:
"Explore what you're looking at yourself."
Not one description - touchable objects you can explore for as long as you want.
Not one explanation - a persistent map you can reference anytime.
Not being told - discovering for yourself.
Information that persists. Exploration you control. Discovery on your terms.
People are spending 10-15 minutes exploring single photos.
Going back to corners. Finding hidden details. Building complete mental pictures.
That's not accessibility.
That's exploration.
That's discovery.
That's control.
And I think that's what we should have been building all along.
You can try out the app here:
http://visionaiassistant.com
Comments
@ Brian
I’ll take a look at that in a second here. Thanks for letting me know. I’m just adding the ability right now for users to choose which voice they want and then I’ll take a quick look at that. You can also now send me a direct message through the app 😊.
Platform-specific messages
@ Brian
I just pushed an update… Your bug should be fixed. Hopefully I squashed that little bugger lol. Also, you should now be able to choose what voice you use. If you're using Alex on your device, it should automatically detect that and respond in that voice. Same thing for Android.
@Enes Deniz
@Enes Deniz With all respect, could you please focus on the app instead of nit-picking?
Yes, AppleVis is still an Apple-focused forum, but this is the so-called non-Apple forum, and the dev has also indicated that we're dealing with the first beta. What's the point of repeating the point about Apple log-ins over and over again?
my feedback
Hello guys!
First of all, I’d like to congratulate Stephen for the great app.
When I saw the post yesterday, I thought — probably because of all those glasses that got released but didn’t deliver what they promised — that it would be the same with this app.
It looked like it had lots of features, but I figured it would just be all talk and nothing practical.
I got excited when I saw the alpha release today! So, great job for shutting me up! Haha.
Now, on to the feedback.
I’m using a Galaxy S23 running Android 16.
Bugs:
The bug where you have to turn on the camera to use Live AI, as reported by another user here, also happens to me.
When I activate Live AI, the voice cuts off halfway and I can’t hear the description. The same happens in question mode.
It seems like the microphone keeps turning on and off constantly.
If I find more bugs, I’ll update you all.
Suggestions:
Instead of mentioning the names of screen readers like VoiceOver and TalkBack, why not use something more generic? That would work for Android, iOS, and even PCs if needed.
Something like: “Please turn off your screen reader before using the app.” I don’t remember the exact text, but it’s something along those lines.
It would also be nice to have buttons to increase and decrease speech speed. I know we have a slider, but at least on Android, having to double-tap, hold, and drag isn’t very precise.
Support for multiple languages — I’d love to have the descriptions in Portuguese.
Now, two questions:
I tried taking a photo of my room, but it seems like the objects are shown in the wrong places in the picture.
What’s the best camera position? For example, when I enter my room, my bed is on the left.
Another question:
I tried taking a picture of my dog, but double-tapping on him didn’t do anything.
Do I need to activate something to zoom in and analyze it better?
Thanks in advance, and keep up the good work!
Meta smart glasses synchronization
Why do we need a head mount to sync the Meta smart glasses with this web application?
TIA
@ Diego
Thank you so much for the kind words. As for your bug issues, I'm working on them as we speak. As for the photo layout, I'm going to change how it presents the information to you. Also working on that now. This is why I released it as an alpha. Right now, as you go from the top of the screen to the bottom, it goes from what's furthest away from you to what's closest to you. That will be changing here in a moment, and hopefully it will be easier to understand what you're feeling. As of now, the zoom-in features only work when you upload your photo, but in about five minutes they will also work when you take a photo of your room, for example. I'm hoping it will be easier… Personally, I liked how it was presenting before, but it's not just for me. It's for all of us, so I'm gonna try to make it as convenient as possible. I may even put two options in there where you can choose which layout you want.
Layout choices
Not to add to your workload, but I personally enjoy the layout where the closer an item is to the bottom of the screen, the closer it is to you in the picture.
@Stephen
I've suggested so many things even without being able to sign in, and you never reacted to any of them like that. I do know this is the first beta, but it's because I've only signed in to my Apple account on my iPhone that I can't sign in and test the app at all. Okay? Besides, I would say the main problem was your response, in which you tried to justify the lack of an Apple sign-in option rather than acknowledging it; but now I must say the main problem is your attitude, regardless of what I suggest. I already think you will begin to charge a subscription fee after some time, and I will probably not be able to use the app anyway, so that's all from me. I will only continue to use the app as long as it's free, and won't suggest anything else.
@ Brian
Don't sweat it :). I just changed the layout. Let me know what you think. If it doesn't work for you, I'll see what I can do about keeping both options so you can choose what type of layout you want. Also, there is a little surprise button on the Home Screen. I'm still dabbling with the feature, so it might be a little bit broken or buggy.
@Enes Deniz
@Enes Deniz First, please get your facts in order. Stephen didn't address your nit-picking - I did. So don't direct your anger at him. In fact, he remained silent and didn't find the nit-picking worthy of a reply. Second, yes, this service cannot remain free as he mentioned in his first message. What's wrong with that? Do you want all of that for free? Not doable. Sadly this attitude towards developers is something that may get under their skin eventually.
@Amir Soleimani
No, it's not unfair that the app will likely become paid at some point; it's just that I will probably be unable to use it. Likewise, it's not unacceptable or anything that Apple sign-in is currently unavailable; it's unacceptable that the dev evaded my remarks on that and attempted to justify not adding it instead of telling me something like "I'm working on it.", "It's on my to-do list." or even "Hey, seems like I just skipped that one. Thanks for pointing it out!". But now that I've got this response from him, I will only continue to use the app until it becomes paid, and only if the dev considers adding Apple sign-in. And it is because of this incident that I will stop using the app if it becomes paid, even if I could somehow pay the fee, not merely because I would be unable to pay a subscription fee anyway.
PS: Speaking of facts, the dev won't have to pay anything to Apple or Google to have his app published on the Apple App Store or Google Play.
@Enes Deniz
But, @Enes Deniz, the AI stuff behind the app isn't free. And how can it be? Is PiccyBot free with all of its features? Just see how JAWS 2026 differentiates between Home, Home Pro, and Pro users when it comes to new AI-oriented features. Yes, who doesn't like free apps? But it's a fact no matter how saddening it might be, and I haven't even considered the time and effort he's devoting to it. As for Apple sign-in, what can I add other than the fact that he's said he's working on it? You're dealing with an alpha or beta web app, and he could have excluded, say, Google sign-in if he had wanted to, depending on his priorities. I'm not in a position to provide advice, but this attitude will get us nowhere.
@ Enes Deniz
Right now, continuing with Apple sign-in isn't supported, but you can still sign up using your Apple email.
the fact this will work on anything
This is awesome. I use Android, and one of the frustrating things is hearing about these really cool apps only to find out most are iOS-only, and I have to wait a long time for an Android version if it comes at all.
got a link for this?
Don't know if I just missed it, but I didn't find a link to this.
can someone share it?
thanks
@ Joshua
Hello Joshua. My goal is to make this app universally accessible across both iOS and Android. You can visit the app here, and you can also save it to your home screen to use it like a native app.
http://visionaiassistant.com
thanks
thanks for the link
Stephen Re: Layout
So I gave my living room another go, using the newer layout. Honestly, I think I would be satisfied with whichever layout you decide to go with. They both give enough information, distance, and details of items in and around the camera's point of view.
I had a bit of enjoyment with this earlier. After scanning my room with the new layout, I zoomed in on my coffee table, then focused again on items on the coffee table, more specifically a water bottle that was roughly half full of water, and a television remote. Now, it could not tell me the label of the water bottle, I'm thinking because the label was likely facing away from me. However, it did a fairly good job describing the buttons that it could see from the initial picture of the room. This application has become quite impressive.
Kudos on creating such an intuitive and enjoyable interface.
@ Brian
Thanks so much. I'm super glad you're enjoying it. I'm looking into developing social-media-type features where you can share the photos you take or upload with your friends, so they can search through them as well. This way you can actually share the tactile photos.
Social media feature implemented
Now you guys can feel through each other's photos and share tactile photos with each other.
I have a suggestion
Thank you Stephen for this amazing app.
I tried the app and it's really really good at what it does and it's making looking through photos fun.
I have a suggestion. I know how expensive building a service that relies on AI can be because of the pricing of the models.
Can you allow users to put their own API keys for using the models?
People can either choose from a list of models or even connect a custom model that they're maybe locally hosting to process images.
A lot of tools that allow local model hosting use the OpenAI API protocol so hopefully there won't be issues connecting the app to them providing that the model supports images of course.
Letting users put in their own API keys for state-of-the-art models, or connect the app to their locally hosted LLMs, would steer some users toward that option, thus cutting down on the costs you have to pay to keep the app going.
Second, if they decide to use their local LLMs, they'll be assured that their data won't leave their devices, so it's better for their privacy.
I know that the quality of local LLMs isn't as good as well known state of the art models, but I guess if someone is hosting one, they know about that downside already.
Plus, it doesn't hurt to give the users the option.
If it can be made that users can put their own API keys without creating an account, that would be better too.
Personally speaking, I wouldn't mind paying for the feature if you decide to implement it.
I would love to use an app like this but with my locally hosted models.
@quanchi
Thank you so much both for the feedback and the suggestion. Right now, the web app architecture presents some significant technical challenges - mainly around secure key management, CORS restrictions for localhost connections, and ensuring the consistent reliability that blind users depend on across different model configurations. Since web browsers block remote websites from connecting to localhost services for security reasons, supporting local LLMs would require users to install additional proxy software, which adds complexity that could compromise the accessibility and ease of use this app is built around. I am, however, keeping this on my radar for future exploration. Thank you again for the excellent suggestion!
try this out
I should probably rename this button, but go outside and try the explore-your-room feature. Zoom in to houses, cars, and trees. Let me know if you were able to capture a photo of a bird and zoom into it through our social media spot on the app! Search for me at Stephen_Lovely. Would love to feel what you've captured!!!
Congratulations on the app!
Congratulations on the app!
The language is a problem.
When will you have support for other languages? Brazilian Portuguese?
Offering other languages will make the app more universal.
@ Geofilho Ferreira Moraes
It’s next on my to-do list. I’ll be working on it throughout the day and over the weekend. My goal is to have it done by the weekend, but this is gonna take a little bit of time to make sure it doesn’t conflict with the app. I’ll announce it here when it’s officially rolled out. I’ve been building it in the back end.
Please note
I’m trying right now to add multiple languages, but it is proving to be more challenging than originally anticipated. I will be focusing on the language translation before pushing another update. Please continue to let me know if something‘s broken and then I can add it to the next big update but this language thing is gonna take a bit.
Language support is here
Ok so the app should now support nine languages including Portuguese, which I’ve had a couple recommendations for. Let me know if it works for you guys 😊.
Turn-around
This has got to be the fastest turnaround on bug resolution and feature requests that I have ever experienced with an in-development application. Now, if only other developers worked like this... 😊
How about languages like Persian?
Thanks, Stephen, for adding multiple languages. But any chance of expanding on those?
In-app screen reader
Hey guys, so you no longer have to turn your own device's screen reader on and off to access explorer mode. The entire app now has its own built-in screen reader. I'm still working on trying to get the Alex voice for you guys, but it seems to be giving me a bit of trouble, which is why that's not there yet - for some reason, I can get all of the voices that don't really matter and that suck lol. But at least now it should be more user-friendly, and you can just keep your own device's screen reader off while in the app. The maps feature is a little bit broken, and the social media features are in the process of being fixed. Thank you so much, Brian. I definitely appreciate it. As for other languages, right now I can only do the languages that your device supports, but I will work on trying to get more. I just can't guarantee when or if that's possible. I also still have to implement dark mode. Don't worry, y'all, I have the whole weekend to work on this.
Next up
Working on it for desktop users as well. I have some cool things in mind for y’all.
Desktop integration complete.
You guys should be able to use the app on a computer with a mouse, or on a touchscreen computer as well.
Seems not to work on my iPhone
Hi Stephen,
great idea.
I registered successfully, only to find just one option: turn on the built-in screen reader.
After clicking on this the following message appeared:
404
Page Not Found
The page "gesturetutorial" could not be found in this application.
Go Home
What can I do to use your service?
Thanks and all the best
Jürgen
@ Jürgen
Sorry about that, I'm fixing it now.
@ Jürgen Bookmark
Should be working now for you. Sorry I broke things when tinkering with other things lol.
languages?
What are the supported languages? Are Spanish, French, and Bulgarian among them? If Bulgarian is not among them, could you add it if possible? I am learning Bulgarian and I think it would be useful.
I am thinking of visiting this country soon.
Thanks!
It's awesome.
I love all the features you described; it's awesome. I'll definitely be waiting for your web app.
@ mantanini
Hello mantanini. You can check out all the languages we support in the settings portion of your app at:
http://visionaiassistant.com
@ Suriyan
It’s out in alpha/beta mode so it’s free for now. Maybe I should make a second entry for folks now that it’s live? All you need to do is go here:
http://visionaiassistant.com
@Stephen
Oh, thanks for your reply. I was just reading through the other comments and realized you've already made it available for testing.
I registered successfully, only to find just one option: turn on the built-in screen reader.
After clicking on this the following message appeared:
404
Page Not Found
The page "gesturetutorial" could not be found in this application.
I just tried it a while ago.
@ Suriyan
It’s still doing that? I’m on it now.
@ Suriyan
“Should” be fixed now. Let me know if it is still giving you a hard time :).
The accessible web and bugs.
Hi, I tried it on my laptop and it didn't work very well.
I tried the subreddit r/shitamericanssay and youtube.com, and both gave me 404 errors.
For the sub, it told me what the sub was, but that's as far as it went; for YouTube, it just told me there was a 404 error and that it doesn't exist on this app.
@ Brad
Hey Brad, thanks for letting me know. I will look into this… I know a lot of sites don't allow iframes, like YouTube for example, so that could be one of the reasons. When you say it doesn't work on your computer very well, is that the only thing you mean? The more details I have the better :).
@Stephen
I'm now logged in. The system itself is impressive to me.
One additional request is that the web app be available in Thai. I think blind Thais would really appreciate it if it were available.
@ Suriyan
Yayyyy 👏🏼 big bug squashed! I can't make any promises on that one, but I'm already looking into it. The language seems to be giving me a bit of difficulty. You can always DM me through the app now that you're in, if you have issues :).
Sounds for explore modes
So you now have spatial audio paired with voice feedback when exploring photos and navigating a room. New options for this feature are available in your settings. Photo exploration inside social media apps is currently broken. This is next on the fix list along with the new web browser feature. The built-in screen reader is acting up, although it does not interfere with your own device’s screen reader. Photo exploration and zoom modes still function with system feedback. Turn off your on-device screen reader for now when exploring the grid so you can double tap and zoom into each element.
In your settings you will also find a control that changes room explorer distance descriptions from feet to meters. This should make navigation easier regardless of where you live. Dark mode has been added for low-vision users as well.
@Stephen
While using the web app's social media page, I noticed that under the post content, where the like and comment buttons are, there are two unlabeled buttons. Without trying to click them, it's impossible to tell what they do. When opened on a computer, the buttons appear as "Unlabeled 3 Button" and "Unlabeled 4 Button."
When opened on an iPhone, the text only appears as "Button."
While this isn't a major issue for me, it could impact the credibility of the web app, so I wanted to let you know.
That's all I can really give you.
All I know is Reddit pages and YouTube don't seem to work.