Stop Waiting for AI to Tell You What to See. Start Exploring It Yourself.

By Stephen, 12 November, 2025


I'm about to show you something that breaks every rule about how vision AI is "supposed" to work.
And when I say "breaks the rules," I mean it completely flips the whole thing upside down.

Here's What's Wrong With Every Vision AI App You've Ever Used
You point your camera.
You wait.
The AI speaks: "It's a living room with a couch and a table."
Cool story. But where's the couch? What color? How close? What's on it? What about that corner over there? That thing on the wall?
Want to know? Point again. Wait again. Ask again.
The AI decides what you need to know. You're stuck listening to whatever it decides to tell you. You don't get to choose. You don't get to dig deeper. You don't get to explore.
You're just a passenger.
So I built something that does the exact opposite.

What If Photos Were Like Video Games Instead of Books?
Forget books. Think video games.
In a game, you don't wait for someone to describe the room. You walk around and look at stuff yourself. You check the corners. You examine objects. You go back to things that interest you. You control what you explore and when.
That's what I built. But for photos. And real-world spaces.
You're not listening to descriptions anymore.
You're exploring them.

Photo Explorer: Touch. Discover. Control.
Here's how it works:
Upload any photo. The AI instantly maps every single object in it.
Now drag your finger across your phone screen.
Wherever you touch? That's what the AI describes. Right there. Instantly.
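Want a peek under the hood? Conceptually it's simple: the AI returns labeled regions, and the app hit-tests your finger against them, speaking whatever you land on. Here's a simplified TypeScript sketch of that idea - not the production code, just the shape of it (the DetectedObject type and the #photo element are stand-ins):

```typescript
// Simplified sketch of touch-to-description mapping (illustrative only).
// Assumes the vision model returns labeled regions as normalized bounding boxes.
interface DetectedObject {
  label: string;   // e.g. "red beach umbrella, slightly tilted"
  x: number;       // left edge, 0..1 of image width
  y: number;       // top edge, 0..1 of image height
  width: number;
  height: number;
}

let objects: DetectedObject[] = [];  // filled in once the photo is analyzed

// Return the description under a touch point (normalized 0..1 coordinates).
function describeAt(tx: number, ty: number): string {
  const hits = objects.filter(o =>
    tx >= o.x && tx <= o.x + o.width &&
    ty >= o.y && ty <= o.y + o.height);
  if (hits.length === 0) return "Nothing detected here";
  // Prefer the smallest region, so the cooler wins over the whole beach.
  hits.sort((a, b) => a.width * a.height - b.width * b.height);
  return hits[0].label;
}

// Speak whatever is under the finger as it moves across the photo.
const view = document.querySelector("#photo") as HTMLElement;
view.addEventListener("touchmove", e => {
  const t = e.touches[0];
  const r = view.getBoundingClientRect();
  const label = describeAt((t.clientX - r.left) / r.width,
                           (t.clientY - r.top) / r.height);
  speechSynthesis.cancel();  // interrupt the previous description
  speechSynthesis.speak(new SpeechSynthesisUtterance(label));
});
```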
Let's Get Real:
You upload a photo from your beach vacation.
Touch the top of the screen:
"Bright blue sky with wispy white clouds, crystal clear, no storms visible"
Drag down to the middle:
"Turquoise ocean water with small waves rolling in, foam visible at wave crests, extends to horizon"
Touch the left side:
"Sandy beach, light tan color with visible footprints, a few shells scattered about"
What's that on the right? Touch there:
"Red beach umbrella, slightly tilted, casting dark shadow on sand beneath it"
Wait, what's under the umbrella? Touch that spot:
"Blue and white striped beach chair, appears unoccupied, small cooler beside it"
Go back to those shells - drag your finger back to the beach:
"Sandy beach, light tan color with visible footprints, a few shells scattered..."
See what just happened?
The information didn't vanish. You went back. You explored what YOU wanted. You took your time. You discovered that cooler the AI might never have mentioned on its own.
You're not being told about the photo. You're exploring it.
And here's the kicker: users are spending minutes exploring single photos. Going back to corners. Discovering tiny details. Building complete mental maps.
That's not an accessibility feature. That's an exploration engine.

Live Camera Explorer: Now Touch the Actual World Around You
Okay, that's cool for photos.
But what if you could do that with the real world? Right now? As you're standing there?
Point your camera at any space. The AI analyzes everything in real-time and maps it to your screen.
Drag your finger - the AI tells you what's under your finger:
β€’ Touch left: "Wooden door, 7 feet on your left, slightly open"
β€’ Drag center: "Clear path ahead, hardwood floor, 12 feet visible"
β€’ Touch right: "Bookshelf against wall, 5 feet right, packed with books"
β€’ Bottom of screen: "Coffee table directly ahead, 3 feet, watch your shins"
The world is now touchable.
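The live version is the same idea plus a camera loop: grab a frame every few seconds, get a fresh object map back, keep the screen touchable. A simplified sketch, reusing the DetectedObject shape from the photo sketch above - the /api/analyze endpoint and the 3-second cadence are stand-ins, not the real internals:

```typescript
// Rough sketch of the live-camera loop (illustrative only).
async function startLiveExplorer(
  video: HTMLVideoElement,
  onMap: (fresh: DetectedObject[]) => void,
) {
  // Use the rear camera on phones.
  video.srcObject = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
  });
  await video.play();

  const frame = document.createElement("canvas");
  const ctx = frame.getContext("2d")!;

  setInterval(async () => {
    frame.width = video.videoWidth;
    frame.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // Ship the frame to the vision backend; endpoint and response shape assumed.
    const blob = await new Promise<Blob>(res => frame.toBlob(b => res(b!), "image/jpeg"));
    const reply = await fetch("/api/analyze", { method: "POST", body: blob });
    onMap(await reply.json());  // refresh the touchable map
  }, 3000);
}
```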
Real Scenario: Shopping Mall
You're at a busy mall. Noise everywhere. People walking past. You need to find the restroom and you're not sure which direction to go.
Old way? Ask someone, hope they give good directions, try to remember everything they said.
New way?
Point your camera down the hallway. Give it a few seconds.
Now drag your finger around:
β€’ Touch left: "Store entrance on left, 15 feet, bright lights, appears to be clothing store"
β€’ Drag center: "Wide corridor ahead, tiled floor, people walking, 30 feet visible"
β€’ Touch right: "Information kiosk, 10 feet right, tall digital directory screen"
β€’ Drag up: "Restroom sign, 25 feet ahead on right, blue symbol visible"
You just learned the entire hallway layout in 20 seconds.
Need to remember where that restroom was? Just touch that spot again. The map's still there.
Walk forward 20 feet, confused about where to go next? Point again. Get a new map. Drag your finger around.
But Wait - It Gets Wilder
Object Tracking:
Double-tap any object. The AI locks onto it and tracks it for you.
"Tracked: Restroom entrance. 25 feet straight ahead on right side."
Walk forward. The AI updates:
"Tracked restroom now 12 feet ahead on right."
Lost it? Double-tap again:
"Tracked restroom: About 8 steps ahead. Turn right in 4 steps. Group of people between you - stay left to avoid."
Zoom Into Anything:
Tracking that information kiosk? Swipe left.
BOOM. You're now exploring what's ON the kiosk.
β€’ Touch top: "Mall directory map, large touchscreen, showing floor layout"
β€’ Drag center: "Store listings, alphabetical order, bright white text on blue background"
β€’ Touch bottom: "You are here marker, red dot with arrow, pointing to current location level 2 near food court"
Swipe right to zoom back out. You're back to the full hallway view.
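You can think of zoom as a stack of regions: swipe left pushes the tracked object's bounding box and re-analyzes just that crop; swipe right pops back out. A simplified sketch, with analyzeRegion and analyzeFullFrame as hypothetical stand-ins for the re-analysis calls:

```typescript
// Sketch of zoom as a region stack (illustrative only).
declare function analyzeRegion(r: DetectedObject): Promise<DetectedObject[]>;
declare function analyzeFullFrame(): Promise<DetectedObject[]>;

const regionStack: DetectedObject[] = [];

// Swipe left: crop to the selected object and map what's inside it.
async function zoomIn(target: DetectedObject): Promise<DetectedObject[]> {
  regionStack.push(target);
  return analyzeRegion(target);
}

// Swipe right: pop back to the enclosing region, or the full scene.
async function zoomOut(): Promise<DetectedObject[]> {
  regionStack.pop();
  const outer = regionStack[regionStack.length - 1];
  return outer ? analyzeRegion(outer) : analyzeFullFrame();
}
```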
Read Any Text
Swipe up - the AI switches to text mode and maps every readable thing.
Now drag your finger:
β€’ Touch here: "Restrooms. Arrow pointing right."
β€’ Drag down: "Food Court level 3. Arrow pointing up."
β€’ Touch lower: "Store hours: Monday to Saturday 10 AM to 9 PM, Sunday 11 AM to 6 PM"
Every sign. Every label. Every directory. Touchable. Explorable.
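Text mode reuses the exact same touch map - the regions just come from OCR instead of object detection, with each line's text as its own label. A simplified sketch, where runOcr stands in for whatever OCR backend does the reading:

```typescript
// Sketch of text mode (illustrative only): OCR lines become touchable regions.
declare function runOcr(image: Blob): Promise<
  Array<{ text: string; x: number; y: number; width: number; height: number }>>;

async function buildTextMap(image: Blob): Promise<DetectedObject[]> {
  const lines = await runOcr(image);
  // The "label" of each region is simply the text it contains.
  return lines.map(l => ({ label: l.text, x: l.x, y: l.y, width: l.width, height: l.height }));
}
```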
Scene Summary On Demand
Lost? Overwhelmed? Three-finger tap anywhere.
"Shopping mall corridor. Stores on both sides, restroom 25 feet ahead right, information kiosk 10 feet right, people walking in both directions. 18 objects detected."
Instant orientation. Anytime you need it.
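Gesture-wise, this is easy on the web: a touchstart event tells you how many fingers are down. A simplified sketch of the trigger - the real summary comes from the AI; this one just strings together the current object map:

```typescript
// Sketch of the summary gesture (illustrative only).
document.addEventListener("touchstart", e => {
  if (e.touches.length !== 3) return;
  const summary =
    `${objects.length} objects detected. ` +
    objects.slice(0, 5).map(o => o.label).join(". ");
  speechSynthesis.cancel();
  speechSynthesis.speak(new SpeechSynthesisUtterance(summary));
});
```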
Watch Mode (This One's Wild)
Two-finger double-tap.
The AI switches to Watch Mode and starts narrating live actions in real-time:
"Person approaching from left" "Child running ahead toward fountain" "Security guard walking past on right" "Someone exiting store carrying shopping bags"
It's like having someone describe what's happening around you, continuously, as it happens.
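Under the hood, continuous narration can be as simple as diffing each frame's detections against the previous frame's and speaking only what's new, so the speech stays live instead of repetitive. A simplified sketch:

```typescript
// Sketch of Watch Mode (illustrative only): narrate only what changed.
let previousLabels = new Set<string>();

function narrateChanges(fresh: DetectedObject[]) {
  const current = new Set(fresh.map(o => o.label));
  for (const label of current) {
    if (!previousLabels.has(label)) {
      // e.g. "person approaching from left"
      speechSynthesis.speak(new SpeechSynthesisUtterance(label));
    }
  }
  previousLabels = current;
}
```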

The Fundamental Difference
Every other app: AI decides β†’ Describes β†’ Done β†’ Repeat
This app: You explore β†’ Information stays β†’ Go back anytime β†’ You control everything
It's not an improvement.
It's a completely different paradigm.

You're Not a Listener Anymore. You're an Explorer.
Most apps make you passive.
This app makes you active.
β€’ You decide what to explore
β€’ You decide how long to spend there
β€’ You discover what matters to you
β€’ You can go back and check anything again
The AI isn't deciding what's important. You are.
The information doesn't disappear. It stays there.
You're not being helped. You're exploring.
That's what accessibility should actually mean.

Oh Right, There's More
Because sometimes you just need quick answers:
Voice Control: Just speak - "What am I holding?" "Read this." "What color is this shirt?"
Book Reader: Scan pages, explore line-by-line, premium AI voices, auto-saves your spot
Document Reader: Fill forms, read PDFs, accessible field navigation

Why a Web App? Because Speed Matters.
App stores = submit β†’ wait 2 weeks β†’ maybe approved β†’ users update manually β†’ some stuck on old version for months.
Web app = fix bugs in hours. Ship features instantly. Everyone updated immediately.
Plus it works on literally every smartphone:
β€’ iPhone βœ“
β€’ Android βœ“
β€’ Samsung βœ“
β€’ Google Pixel βœ“
β€’ Anything with a browser βœ“
Install in 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Done. It's an app now.
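For the curious, "Add to Home Screen" works because of two standard PWA ingredients: a web app manifest and a service worker. A generic sketch - the file names, fields, and app name below are conventions and guesses, not pulled from the actual codebase:

```typescript
// Sketch of standard PWA installability (illustrative only).

// 1. A manifest, served as /manifest.json and linked from the page's <head>:
const manifest = {
  name: "Vision AI Assistant",  // assumed name
  start_url: "/",
  display: "standalone",        // hides browser chrome; launches like a native app
  icons: [{ src: "/icon-192.png", sizes: "192x192", type: "image/png" }],
};

// 2. A service worker, so the app shell loads fast and updates ship on the next visit:
if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register("/sw.js");
}
```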

The Price (Let's Be Direct)
30-day free trial. Everything unlocked. No credit card.
After that: $9.99 CAD/month
Why? Because the AI costs me money every single time you use it. Plus I'm paying for servers. I'm one person building this.
I priced it to keep it affordable while keeping it running and improving.

Safety Warning (Important)
AI makes mistakes.
This is NOT a replacement for your cane, guide dog, or mobility training.
It's supplementary information. Not primary navigation.
Never make safety decisions based solely on what the AI says.

The Real Point of This Whole Thing
For years, every vision AI app has said:
"We'll tell you what you're looking at."
I'm saying something different:
"Explore what you're looking at yourself."
Not one description - touchable objects you can explore for as long as you want.
Not one explanation - a persistent map you can reference anytime.
Not being told - discovering for yourself.
Information that persists. Exploration you control. Discovery on your terms.

People are spending 10-15 minutes exploring single photos.
Going back to corners. Finding hidden details. Building complete mental pictures.
That's not accessibility.
That's exploration.
That's discovery.
That's control.
And I think that's what we should have been building all along.
You can try out the app here:
http://visionaiassistant.com

Comments

By JC on Sunday, January 4, 2026 - 23:27

Hello,

That is awesome news! Congratulations.

By Ekaj on Wednesday, January 7, 2026 - 01:23

I just had a short go with this app on my Mac, and it seems pretty awesome. I couldn't get past the settings, but perhaps I just have to poke around a bit more. My trusty iPhone 14 is sitting right next to me, and I'm excited to try this out on there as well. Update for Wednesday: I just tried this on my phone, and it seems to work well. Apparently there's not enough lighting here in my apartment or something like that, but one feature which I'm especially looking forward to trying out is the voice guidance. My camera aim isn't that great, and I've therefore had mixed results with these scanning apps. I'll definitely play around more with your app, and depending on my financial situation I'm happy to pay a nominal subscription fee. For now though it's off to La-la Land.

By Stephen on Friday, January 9, 2026 - 22:22

I had two high-level meetings: one with the CEO and one with the product development manager from Benetech, the folks behind the Bookshare app. Let's all cross our fingers! This app is going to be something special, and it's all thanks to each and every one of you. Hopefully I will have some big announcements for you shortly. It was just a couple of meetings, so cross your fingers for me 😊.