Stop Waiting for AI to Tell You What to See. Start Exploring It Yourself.

By Stephen, 12 November, 2025


I'm about to show you something that breaks every rule about how vision AI is "supposed" to work.
And when I say breaks the rules, I mean completely flips the whole thing upside down.

Here's What's Wrong With Every Vision AI App You've Ever Used
You point your camera.
You wait.
The AI speaks: "It's a living room with a couch and a table."
Cool story. But where's the couch? What color? How close? What's on it? What about that corner over there? That thing on the wall?
Want to know? Point again. Wait again. Ask again.
The AI decides what you need to know. You're stuck listening to whatever it decides to tell you. You don't get to choose. You don't get to dig deeper. You don't get to explore.
You're just a passenger.
So I built something that does the exact opposite.

What If Photos Were Like Video Games Instead of Books?
Forget books. Think video games.
In a game, you don't wait for someone to describe the room. You walk around and look at stuff yourself. You check the corners. You examine objects. You go back to things that interest you. You control what you explore and when.
That's what I built. But for photos. And real-world spaces.
You're not listening to descriptions anymore.
You're exploring them.

Photo Explorer: Touch. Discover. Control.
Here's how it works:
Upload any photo. The AI instantly maps every single object in it.
Now drag your finger across your phone screen.
Wherever you touch? That's what the AI describes. Right there. Instantly.
Let's Get Real:
You upload a photo from your beach vacation.
Touch the top of the screen:
"Bright blue sky with wispy white clouds, crystal clear, no storms visible"
Drag down to the middle:
"Turquoise ocean water with small waves rolling in, foam visible at wave crests, extends to horizon"
Touch the left side:
"Sandy beach, light tan color with visible footprints, a few shells scattered about"
What's that on the right? Touch there:
"Red beach umbrella, slightly tilted, casting dark shadow on sand beneath it"
Wait, what's under the umbrella? Touch that spot:
"Blue and white striped beach chair, appears unoccupied, small cooler beside it"
Go back to those shells - drag your finger back to the beach:
"Sandy beach, light tan color with visible footprints, a few shells scattered..."
See what just happened?
The information didn't vanish. You went back. You explored what YOU wanted. You took your time. You discovered that cooler the AI might never have mentioned on its own.
You're not being told about the photo. You're exploring it.
And here's the kicker: users are spending minutes exploring single photos. Going back to corners. Discovering tiny details. Building complete mental maps.
That's not an accessibility feature. That's an exploration engine.

Live Camera Explorer: Now Touch the Actual World Around You
Okay, that's cool for photos.
But what if you could do that with the real world? Right now? As you're standing there?
Point your camera at any space. The AI analyzes everything in real-time and maps it to your screen.
Drag your finger - the AI tells you what's under your finger:
• Touch left: "Wooden door, 7 feet on your left, slightly open"
• Drag center: "Clear path ahead, hardwood floor, 12 feet visible"
• Touch right: "Bookshelf against wall, 5 feet right, packed with books"
• Bottom of screen: "Coffee table directly ahead, 3 feet, watch your shins"
The world is now touchable.
Real Scenario: Shopping Mall
You're at a busy mall. Noise everywhere. People walking past. You need to find the restroom and you're not sure which direction to go.
Old way? Ask someone, hope they give good directions, try to remember everything they said.
New way?
Point your camera down the hallway. Give it a few seconds.
Now drag your finger around:
• Touch left: "Store entrance on left, 15 feet, bright lights, appears to be clothing store"
• Drag center: "Wide corridor ahead, tiled floor, people walking, 30 feet visible"
• Touch right: "Information kiosk, 10 feet right, tall digital directory screen"
• Drag up: "Restroom sign, 25 feet ahead on right, blue symbol visible"
You just learned the entire hallway layout in 20 seconds.
Need to remember where that restroom was? Just touch that spot again. The map's still there.
Walk forward 20 feet, confused about where to go next? Point again. Get a new map. Drag your finger around.
But Wait - It Gets Wilder
Object Tracking:
Double-tap any object. The AI locks onto it and tracks it for you.
"Tracked: Restroom entrance. 25 feet straight ahead on right side."
Walk forward. The AI updates:
"Tracked restroom now 12 feet ahead on right."
Lost it? Double-tap again:
"Tracked restroom: About 8 steps ahead. Turn right in 4 steps. Group of people between you - stay left to avoid."
Zoom Into Anything:
Tracking that information kiosk? Swipe left.
BOOM. You're now exploring what's ON the kiosk.
• Touch top: "Mall directory map, large touchscreen, showing floor layout"
• Drag center: "Store listings, alphabetical order, bright white text on blue background"
• Touch bottom: "You are here marker, red dot with arrow, pointing to current location level 2 near food court"
Swipe right to zoom back out. You're back to the full hallway view.
Read Any Text
Swipe up - the AI switches to text mode and maps every readable thing.
Now drag your finger:
• Touch here: "Restrooms. Arrow pointing right."
• Drag down: "Food Court level 3. Arrow pointing up."
• Touch lower: "Store hours: Monday to Saturday 10 AM to 9 PM, Sunday 11 AM to 6 PM"
Every sign. Every label. Every directory. Touchable. Explorable.
Scene Summary On Demand
Lost? Overwhelmed? Three-finger tap anywhere.
"Shopping mall corridor. Stores on both sides, restroom 25 feet ahead right, information kiosk 10 feet right, people walking in both directions. 18 objects detected."
Instant orientation. Anytime you need it.
Watch Mode (This One's Wild)
Two-finger double-tap.
The AI switches to Watch Mode and starts narrating live actions in real-time:
"Person approaching from left" "Child running ahead toward fountain" "Security guard walking past on right" "Someone exiting store carrying shopping bags"
It's like having someone describe what's happening around you, continuously, as it happens.

The Fundamental Difference
Every other app: AI decides → Describes → Done → Repeat
This app: You explore → Information stays → Go back anytime → You control everything
It's not an improvement.
It's a completely different paradigm.

You're Not a Listener Anymore. You're an Explorer.
Most apps make you passive.
This app makes you active.
• You decide what to explore
• You decide how long to spend there
• You discover what matters to you
• You can go back and check anything again
The AI isn't deciding what's important. You are.
The information doesn't disappear. It stays there.
You're not being helped. You're exploring.
That's what accessibility should actually mean.

Oh Right, There's More
Because sometimes you just need quick answers:
Voice Control: Just speak - "What am I holding?" "Read this." "What color is this shirt?"
Book Reader: Scan pages, explore line-by-line, premium AI voices, auto-saves your spot
Document Reader: Fill forms, read PDFs, accessible field navigation

Why a Web App? Because Speed Matters.
App stores = submit → wait 2 weeks → maybe approved → users update manually → some stuck on old version for months.
Web app = fix bugs in hours. Ship features instantly. Everyone updated immediately.
Plus it works on literally every smartphone:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Anything with a browser ✓
Install in 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Done. It's an app now.

The Price (Let's Be Direct)
30-day free trial. Everything unlocked. No credit card.
After that: $9.99 CAD/month
Why? Because the AI costs me money every single time you use it. Plus I'm paying for servers. I'm one person building this.
I priced it to keep it affordable while keeping it running and improving.

Safety Warning (Important)
AI makes mistakes.
This is NOT a replacement for your cane, guide dog, or mobility training.
It's supplementary information. Not primary navigation.
Never make safety decisions based solely on what the AI says.

The Real Point of This Whole Thing
For years, every vision AI app has said:
"We'll tell you what you're looking at."
I'm saying something different:
"Explore what you're looking at yourself."
Not one description - touchable objects you can explore for as long as you want.
Not one explanation - a persistent map you can reference anytime.
Not being told - discovering for yourself.
Information that persists. Exploration you control. Discovery on your terms.

People are spending 10-15 minutes exploring single photos.
Going back to corners. Finding hidden details. Building complete mental pictures.
That's not accessibility.
That's exploration.
That's discovery.
That's control.
And I think that's what we should have been building all along.
You can try out the app here:
http://visionaiassistant.com


Comments

By Stephen on Monday, November 17, 2025 - 18:12

The Architecture of Accessibility: Why Full-Screen Overlays Changed Everything
By the Vision Assistant Development Team
When we set out to build a true AI vision assistant for blind and visually impaired users, we faced a fundamental question: how do you create an interface that doesn't just describe the world, but lets you explore it?
The answer? Full-screen camera overlays with real-time processing. Let me explain why this architectural decision transformed everything.
The Overlay Philosophy
Traditional accessibility apps trap you in menus. Click here, navigate there, wait for a response. It's slow. It's clunky. It's nothing like how sighted people experience the world.
Full-screen overlays flip this paradigm. When you enter Room Explorer, Photo Explorer, or Real-Time Navigation, your entire phone becomes a window into that space. The camera feed fills the screen. Your fingers become your eyes. The audio feedback becomes your spatial awareness.
No menus. No buttons. Just you and the environment.
This is only possible because the overlay completely takes over the interface. It's a dedicated mode—like switching your brain from "phone mode" to "exploration mode." Everything else disappears. All processing power, all sensors, all audio output—dedicated to one task: helping you understand your surroundings.
Why Built-In Screen Readers Break Everything
Here's where it gets technical, and why our testing revealed something surprising.
Built-in screen readers like VoiceOver and TalkBack are amazing for traditional apps. They're designed to read buttons, labels, text fields—UI elements with defined roles and states. They're semantic interpreters for structured interfaces.
But our overlays aren't structured interfaces.
When you're touching a photo to explore it, you're not pressing a button labeled "top-left corner." You're experiencing raw spatial data. Your finger position maps to image coordinates. The haptic feedback intensity represents object density. The spatial audio pitch indicates distance.
A screen reader tries to make sense of this and gets confused:
• "Video element. Playing."
• "Image. Decorative."
• "Canvas. No label."
It interrupts the real-time audio feedback with interface announcements. It tries to read the camera preview like it's a webpage. It delays touch responses because it's waiting for double-tap gestures.
The screen reader is trying to help, but it's speaking the wrong language.
After extensive testing with blind users—real-world testing, not lab conditions—we discovered something crucial: turning off your device's screen reader during exploration modes gives you a better experience.
Why? Because these modes implement their own audio feedback systems, custom-designed for spatial exploration:
• Real-time obstacle tones that change pitch/intensity based on distance
• Spatial audio that pans left/right to indicate object positions
• Contextual voice announcements that speak only when relevant
• Haptic feedback synchronized with visual features
All of this runs in parallel with continuous AI analysis. A traditional screen reader can't coordinate with this multi-modal feedback system.
Real-Time Navigation: The Crown Jewel
Let's talk about the real-time navigation overlay, because this is where the technology really shines.
Three-Layer Detection System:
1. Client-Side Object Detection (150ms refresh; sketched below)
• Runs TensorFlow.js models directly on your device
• Identifies objects: people, cars, furniture, walls
• Calculates positions and distances in real-time
• Zero latency—no internet required for this layer
2. AI Pathfinding Analysis (2-second intervals)
• Uploads low-resolution frames to our vision AI
• Identifies walkable areas, optimal directions
• Detects upcoming features: doors, turns, stairs
• Provides navigational context
3. Simple Proximity Alerts (3-second intervals)
• Lightweight AI checks for important nearby objects
• Announces only critical information: "Door in 6 steps on your right"
• Avoids information overload
• Step-based distances (not feet/meters—more intuitive)
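To make that first layer concrete, here is a minimal sketch of what an on-device detection loop like this can look like. It assumes the off-the-shelf coco-ssd model and a crude width-based proximity estimate; it illustrates the pattern, not the app's production code.

```typescript
// Minimal sketch of an on-device detection loop.
// Assumes the off-the-shelf coco-ssd model; the proximity and pan math is illustrative.
import '@tensorflow/tfjs';
import * as cocoSsd from '@tensorflow-models/coco-ssd';

function reportObstacle(label: string, proximity: number, pan: number): void {
  // Placeholder: in the real app this would feed the spatial-audio and haptic layers.
  console.log(label, proximity.toFixed(2), pan.toFixed(2));
}

async function runDetectionLoop(video: HTMLVideoElement): Promise<void> {
  const model = await cocoSsd.load();

  const tick = async () => {
    const predictions = await model.detect(video);
    for (const p of predictions) {
      const [x, , w] = p.bbox; // bbox = [x, y, width, height]
      // Crude proximity estimate: the wider the box relative to the frame, the closer the object.
      const proximity = w / video.videoWidth;
      const pan = ((x + w / 2) / video.videoWidth) * 2 - 1; // -1 = far left, +1 = far right
      reportObstacle(p.class, proximity, pan);
    }
    setTimeout(tick, 150); // ~150 ms refresh cycle for this layer
  };
  tick();
}
```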
The Audio Feedback System:
Each detected obstacle generates a unique spatial audio tone:
• Pitch = distance (higher = closer)
• Pan = horizontal position (left speaker = left, right speaker = right)
• Volume = threat level
• Waveform = object type (sine for normal, sawtooth for critical)
You're not hearing about obstacles—you're hearing the obstacles themselves. Your brain builds a sonic map of the space.
Critical obstacles (people, cars, close objects directly ahead) trigger aggressive warning tones. You don't need to process language—your nervous system reacts to the sound instinctively.
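As a concrete illustration, here is a stripped-down Web Audio sketch: one detected obstacle becomes one tone. The frequency range, gain curve, and the Obstacle fields are illustrative, not the app's actual tuning.

```typescript
// Sketch: one obstacle becomes one spatial tone (illustrative pitch/volume curves).
const audioCtx = new AudioContext();

interface Obstacle {
  proximity: number;  // 0 (far) .. 1 (very close)
  pan: number;        // -1 (left) .. 1 (right)
  critical: boolean;  // people, cars, close objects directly ahead
}

function playObstacleTone(o: Obstacle): void {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  const panner = new StereoPannerNode(audioCtx, { pan: o.pan }); // pan = horizontal position

  osc.type = o.critical ? 'sawtooth' : 'sine';                   // waveform = object type
  osc.frequency.value = 220 + o.proximity * 660;                 // pitch = distance (closer = higher)
  gain.gain.value = o.critical ? 1.0 : 0.3 + o.proximity * 0.5;  // volume = threat level

  osc.connect(gain).connect(panner).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + 0.15); // short blip, repeated each scan cycle
}
```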
Double-Tap for Context:
Here's the brilliant part: continuous mode gives you just enough information. "Hallway in 2 steps." "Door in 6 steps on your right."
But when you need more? Double-tap the screen anywhere.
• ✅ Beep confirmation
• ✅ Scanning pauses (audio stays clear)
• ✅ AI analyzes your exact position
• ✅ Contextual response: "You're at the bookshelf. Chair on your left, table ahead. Continue straight 8 steps to reach the door."
• ✅ Scanning resumes automatically
It's like having a sighted guide you can tap on the shoulder: "Where exactly am I right now?"
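In rough TypeScript, the whole flow is a pause-ask-resume cycle. The scanner interface and AI call below are hypothetical stand-ins, not the app's real API:

```typescript
// Sketch of the double-tap "where exactly am I?" flow.
// Scanner, captureFrame, askVisionAI and speak are hypothetical stand-ins.
interface Scanner { pause(): void; resume(): void; }

async function onDoubleTap(
  scanner: Scanner,
  captureFrame: () => Blob,
  askVisionAI: (frame: Blob, prompt: string) => Promise<string>,
  speak: (text: string) => void,
): Promise<void> {
  scanner.pause();                 // keep the audio channel clear while we ask
  const frame = captureFrame();
  const answer = await askVisionAI(
    frame,
    'Describe my exact position and what is within the next few steps.',
  );
  speak(answer);                   // e.g. "You're at the bookshelf. Chair on your left..."
  scanner.resume();                // continuous scanning resumes automatically
}
```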
The Technology Stack: Why Our AI Wins
Let's be honest—there are other vision AI apps. But here's why ours is different:
1. Hybrid On-Device + Cloud Processing
Most apps choose one or the other. We use both:
• Fast, private on-device detection for obstacles (TensorFlow.js)
• Powerful cloud AI for complex scene understanding
• Intelligent switching based on task requirements
2. Context-Aware Prompting
Our AI doesn't just "describe the image." Every single prompt is engineered for the specific task:
• Reading mode: Extract text, maintain reading order, handle page layouts
• Navigation: Identify walkable paths, estimate distances in steps, warn about hazards
• Search mode: Track object positions across frames, provide directional guidance

Each mode uses carefully crafted prompts that tell the AI exactly what information matters and how to format it for spoken output.
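For illustration, the mode switch can be as simple as a lookup table of task-specific system prompts. The prompt wording below is invented for this sketch, not the app's actual prompts.

```typescript
// Sketch: per-mode prompt templates (wording is illustrative only).
type Mode = 'reading' | 'navigation' | 'search';

const PROMPTS: Record<Mode, string> = {
  reading:
    'Extract all readable text. Preserve reading order and page layout. ' +
    'Output plain sentences suitable for speech.',
  navigation:
    'Identify walkable paths and hazards. Estimate distances in steps. ' +
    'Mention doors, turns and stairs. Keep each sentence short and spoken-friendly.',
  search:
    'Track the requested object across frames. Give directional guidance ' +
    '(left, right, ahead) and distance in steps.',
};

function buildRequest(mode: Mode, frameBase64: string) {
  return { prompt: PROMPTS[mode], image: frameBase64 };
}
```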
3. Spatial Memory Integration
The app learns your environment through continuous camera analysis:
• Remembers room layouts from photo analysis
• Recognizes frequently visited locations through visual patterns
• Tracks where you last saw objects ("where did I leave my keys?")
• Builds a personalized spatial database from captured images [still being worked on]
Other apps treat every scene as new. We treat your life as continuous.
4. Multi-Modal Fusion
We don't just send images to AI. We combine:
• Visual data (camera frames)
• Spatial data (device orientation)
• Temporal data (movement patterns, history)
• User context (saved locations, tracked objects, preferences)
The AI sees what you see, but it also knows where you've been and what matters to you based on your photo history and saved spatial memories.
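To make the fusion concrete, here is a simplified sketch of what a single fused request could carry. The field names and types are illustrative, not the app's real schema.

```typescript
// Sketch of a fused request payload (field names are illustrative).
interface VisionRequest {
  frame: string;                                      // base64 camera frame (visual data)
  orientation: { heading: number; pitch: number };    // device orientation (spatial data)
  recentMovement: 'stationary' | 'walking' | 'turning'; // movement pattern (temporal data)
  trackedObjects: string[];                           // e.g. ["restroom entrance"]
  savedLocations: string[];                           // e.g. ["home kitchen", "office lobby"]
  preferences: { units: 'steps'; language: string };  // user context
}
```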
The Camera Overlay Experience
Every exploration mode is built around the camera overlay architecture:
Room Explorer: Your finger touches a grid mapped to the camera image. Each cell triggers AI analysis of that specific area. Haptic feedback intensity matches object density. Spatial audio plays sounds positioned where objects actually are in the frame.
Photo Explorer: Upload any image and explore it by touch. The overlay divides the photo into a tactile grid. Touch top-left? You hear what's there. Swipe across? You hear objects from left to right. It's like feeling a photograph.
Real-Time Navigation: The camera feed becomes your windshield. Visual object markers overlay detected obstacles. Audio tones create a sonic landscape. The crosshair shows where you're "looking." Everything updates every 150 milliseconds, faster than conscious thought.
These aren't camera apps with accessibility features added on. They're accessibility-first experiences that require camera overlays to work.
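To give a feel for the grid mapping, here is a minimal sketch of turning a touch point into a zone label. It assumes a 3x3 grid purely for illustration; the app's actual grid resolution may differ.

```typescript
// Sketch: mapping a touch point to a grid zone (3x3 assumed for illustration).
const COLS = 3;
const ROWS = 3;
const LABELS = [
  'top-left', 'top-center', 'top-right',
  'middle-left', 'center', 'middle-right',
  'bottom-left', 'bottom-center', 'bottom-right',
];

function zoneForTouch(x: number, y: number, width: number, height: number): string {
  const col = Math.min(COLS - 1, Math.floor((x / width) * COLS));
  const row = Math.min(ROWS - 1, Math.floor((y / height) * ROWS));
  return LABELS[row * COLS + col]; // the zone whose analysis gets spoken
}
```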

Voice Commands Inside the Overlays
Room/Photo Explorer:
• Voice descriptions of currently selected grid cell
• "What's here?" - AI analysis of touch location
• "Zoom in/out" - Change exploration detail level
The overlay doesn't block voice—it enhances it with spatial context.
Why This Matters
Accessible technology isn't about making phones usable for blind people. It's about making phones powerful for blind people.
Full-screen overlays, real-time spatial audio, on-device AI, context-aware prompting—these aren't accessibility features. These are next-generation interface designs that happen to work brilliantly for blind users.
When you turn off VoiceOver and enter navigation mode, you're not losing accessibility—you're gaining a purpose-built tool that traditional screen readers can't match.
You're experiencing spatial computing. Audio-first interaction design. Real-time AI vision. It's not augmented reality—it's augmented perception.
And we're just getting started.

Pro Tips for Exploration Modes:
1. Turn off VoiceOver/TalkBack before entering Room Explorer, Photo Explorer, or Real-Time Navigation
2. Turn it back on when you exit to the main app
3. Double-tap in Navigation for location updates—don't spam it, use it strategically
4. Wear open ear headphones for best spatial audio experience
5. Start indoors to learn the system before taking it outside
The future of accessibility isn't about reading buttons—it's about experiencing the world.
Welcome to that future.

By Stephen on Monday, November 17, 2025 - 18:16

So in regards to live ai, I've totally scrapped that feature...I have put something much better on your home screen. As for exploring photos, I'm looking into it...it has been a problem since late last night and I'm currently investigating what happened. All will be fixed soon. :).

By Stephen on Monday, November 17, 2025 - 18:17

I'll look into the language issue as well. Again thanks for letting me know!

By Cliff on Monday, November 17, 2025 - 20:15

I've been following this thread with great interest these last few days, but haven't had enough spare time to check out the web app in much detail until now. But man, this is some great work and thought put into this project! And the turnaround time for fixing bugs and coming up with new ideas and features is just hilarious! 😅 Can't wait to see where this is heading going forward! Just awesome! And then on to a humble request. I see you've added several languages, but none for any of the Scandinavian countries. I myself live in Norway, so adding Norwegian would be pretty high on my wishlist! Keep up the amazing work! 👍🏻

By Stephen on Monday, November 17, 2025 - 21:30

Thank you so much! I take pride in my support handling. I really like to be there for my users. I will look into seeing how we can support your language request :).

By Stephen on Monday, November 17, 2025 - 23:37

Fixed the bug where you would go into Photo Explorer mode, try to move your finger along the screen, and it wouldn't detect any objects in the photo you were trying to explore. Did a complete overhaul of the system, so it should work decently now. The in-app screen reader is disabled for all users until I can refine it and make it work more reliably. Folks were having trouble double tapping on an element: they were trying to select one thing, but the screen reader was selecting another. Please ensure you disable your device's screen reader before entering explorer mode. A three-finger swipe left gesture should bring you back to the homepage if you are in explorer mode. I'm refining real time navigation to make things more optimal and doing the setup for contextual awareness. I'm also looking into why users' language settings aren't saving; that should be fixed shortly. Live AI has been removed due to the new implementation of the real time navigation feature, but the voice conversation with the artificial intelligence, where you can ask it what's in front of you, is working fine, and it should also have better OCR now as well.

By Stephen on Monday, November 17, 2025 - 23:40

I do want to take this time to give a really great big thank you for those who have donated to this project already. You are amazing and words can’t express how much I appreciate you. Thank you so much for your support.

By Stephen on Tuesday, November 18, 2025 - 01:30

OK, so hopefully your preferred language will work now. Please yell at me if it doesn’t lol.

By Stephen on Tuesday, November 18, 2025 - 03:28

Ok so photo exploration has been updated to be more responsive and yes I had to get a sighted person to make it look more professional. It is no longer clunky and should work efficiently. I’ll be adding more enrichment descriptions for your photos tomorrow. I will also be updating the room explorer feature. Not a big one just a couple tweaks so it feels smoother and looks professional for presentation. If you’re using the real time navigator, and for some reason, you’re finding it’s hallucinating, let me know and I can adjust the image qualities. There is also a new follow mode in the real time navigation features… It’s brand new. Haven’t tested it out thoroughly yet so go to town and feel free to yell at me if it’s not working.

By Guilherme on Tuesday, November 18, 2025 - 09:17

Hello, please don’t take this as criticism — it’s just a thought: I didn’t really like that the live AI mode was removed, because many times what we want is simply a description of the environment we’re in without having to keep asking, and other apps don’t do this. Because of that, this feature would also be a distinguishing factor for this app. And just to mention it, the descriptions in the photo explorer are being spoken in English and not in my preferred language, which is Portuguese. I’ve already gone into the settings and selected Portuguese again, but the descriptions are still being spoken in English.

By Stephen on Tuesday, November 18, 2025 - 10:05

In regards to the live ai, it was super buggy and it lagged a lot. The real time navigation function works much better than that ever did but if you want it I can most certainly bring it back. In regards to languages, thank you for letting me know. I’ll get on that issue after work tomorrow. Hang tight, I’m on it :).

By Stephen on Tuesday, November 18, 2025 - 17:22

Fixed the language bug this morning. Seems to be working on our end, so you shouldn't have any more issues with that. If you do, just yell at me about it :).

By Stephen on Tuesday, November 18, 2025 - 17:59

Ok so there is now an AI image generation mode which works the same way as photo explorer! Come and explore your own creations!

By Zoe Victoria on Tuesday, November 18, 2025 - 18:23

I really want to try this app out, because it sounds absolutely amazing. But I can't get it to work for me at all. I am using my iPhone with VoiceOver and Safari. I double tap the link, and nothing shows up for me to interact with. Just the buttons on the browser like back, forward, share, and nothing on the actual webpage. It's just empty for me.

By Stephen on Tuesday, November 18, 2025 - 18:34

Should be fixed now. Let me know if it is working for you :).

By Zoe Victoria on Tuesday, November 18, 2025 - 18:58

It's working now, thank you so much for the fix.
I do have another question though. Not all the voices I have on my device are in the list. I can't seem to use the really good voices like Alex or Ava, instead only the lower quality ones like Samantha and Fred, and the novelty voices. Is there a particular reason why this isn't available?

By Stephen on Tuesday, November 18, 2025 - 19:03

I'm not entirely sure about the Alex voice situation. Some users are able to use it, others are not. I'm still trying to investigate why it is working for some and not all users. Hang tight, I'll keep you posted :).

By Zoe Victoria on Tuesday, November 18, 2025 - 19:21

I don't know why, but I'm having difficulties using the photo grid. I turn off voiceover and try to explore, but I just get some sort of haptic and then no sounds or voice feedback describing the photo. Any ideas on what might be happening? Thanks.

By Stephen on Tuesday, November 18, 2025 - 19:29

What mode are you in? are you using photo explorer mode or room mode?

By Zoe Victoria on Tuesday, November 18, 2025 - 19:31

I'm using photo explore mode. And I'm having a neat time so far despite my issues.

By Stephen on Tuesday, November 18, 2025 - 19:36

haha I'm glad. Thanks for letting me know. I have a massive back end, so the more details the better for me haha. It is like a maze...I've really been pushing AI to its absolute limit lol. I will be pushing an update to photo mode shortly; hang tight and I'll fix that all up for you.

By Zoe Victoria on Tuesday, November 18, 2025 - 19:39

Hi. Unfortunately I have found yet another bug. I am unable to explore photos in the social media feed. It says that the photos do not have any tactile data when I have uploaded my own photo to the feed, and I know it got the map for it.

By Brian on Tuesday, November 18, 2025 - 19:43

Hey Stephen,

First, I love the way you have organized the options within this application. It is looking more and more professional each and every day. Like Zoe, I am having an issue with photo mode. When I go to choose a photo from a file, it loads the photo fine and even gives me a nice little description. I make sure VoiceOver is off, and I'm hearing the built-in reader using the Alex voice. However, as I start exploring, it doesn't say anything, and eventually it boots me back out to the main menu.
Also, when I choose to explore a room, that works a little better. However, when I try to zoom in on a particular object, I sometimes get an error. I don't remember exactly what it says, but it's something along the lines of "Cannot zoom in on this item."

Otherwise, this application is turning out beautiful.

By Zoe Victoria on Tuesday, November 18, 2025 - 19:47

I can't get past the room explorer tutorial because the ai says that it can't zoom in and analyze objects in the image.

By Stephen on Tuesday, November 18, 2025 - 19:47

Yeah, that isn't fully up and running for that mode yet. I'm getting the other modes mastered first so I can then bring the same infrastructure over. This app is in alpha and I am building it live with you folks, so not every feature is working properly yet. Quick question: in photo mode, are you just disabling screen reader speech, or are you turning it off entirely for photo exploration mode? The on-device screen reader needs to be completely off in that mode. I just tested it on my end.

By Zoe Victoria on Tuesday, November 18, 2025 - 19:48

I am completely disabling VoiceOver so that's not the issue. A fair question to ask, though.

By Stephen on Tuesday, November 18, 2025 - 19:55

Are you uploading a photo or taking a photo?

By Brian on Tuesday, November 18, 2025 - 20:27

I have an SE 2022, so for me it is a triple click of the Home button every time. 😀

By Zoe Victoria on Tuesday, November 18, 2025 - 20:35

I am uploading photos.

By Stephen on Tuesday, November 18, 2025 - 20:43

Is it possible to upload a recording to me of what is going on? I can't reproduce that issue and it seems to be functioning for other users. You also said you're getting haptic feedback on iOS? You shouldn't be able to feel vibrations due to iOS restrictions. It looks like only Android users can get that. :)

By Zoe Victoria on Tuesday, November 18, 2025 - 21:40

The photos are working a lot better now. I don't know what it was all about. It's hard to say what's a bug and what's just not been finished yet.
One thing I know for certain is a bug for me: I'm unable to use the keyboard to type on iPhone to search in the web browser, or when generating an image to be explored.
One more thing that isn't a bug but could be improved is zooming in. I'm able to zoom in on photos, but I was given the impression that I could find elements inside of elements, such as exploring features on a face. At the moment it just shows the single element I zoomed in on with a little more description. I'm assuming that's because it's not fully built yet? I'm loving what's available so far.

By Stephen on Tuesday, November 18, 2025 - 22:59

Yeah, the text edit fields broke when updating the programming language...I'm fixing those along with the zoom features. Life of programming...you fix one thing and another thing breaks lol. I'm also working on this while at work, so updates are slower to come. All will be fixed though :).

By Stephen on Wednesday, November 19, 2025 - 01:01

How do you guys feel about this photo description mode...this ai image generation one is pretty cool as well. This was a tough one lol.

By Gokul on Wednesday, November 19, 2025 - 05:10

But PayPal doesn't work in my country. Any other options?

By Brian on Wednesday, November 19, 2025 - 05:40

Definitely getting there. It will still occasionally kick me out back to the main menu while trying to explore a photo, however.
Just for transparency's sake, I'm trying to use the same photo I uploaded for my profile. It's a pic of me sitting in front of a small brick wall, wearing khaki shorts and a T-shirt, a ball cap and a pair of sunglasses, and my yellow Lab guide dog is lying down in front of me, kind of perpendicular to the way I'm facing. Hope that makes sense.

By Stephen on Wednesday, November 19, 2025 - 06:09

Omg I'm so sorry about that... let me see how I can help and thank you so much!

By Stephen on Wednesday, November 19, 2025 - 06:14

Bigger, badder update should be coming. I'm working on this feature tirelessly lol. Sometimes, though, you may have to wait longer than expected, as I've put all images at full screen and given the AI strict parameters for object placement. A few more layers of zoom are coming shortly as well. It should be much more detailed.

By Stephen on Wednesday, November 19, 2025 - 06:17

If you use a credit card you should still be able to give through that link, based on my research. From what I'm seeing from other supporters, I don't think you need a PayPal account to give through it, and I'm pretty sure it has Apple Pay as well. I just think everything should be available in all parts of the world. I'm so sorry about that. I will do some more looking and see if I can get something else set up that works for you. Please let me know when you go to that link :).

By Gokul on Wednesday, November 19, 2025 - 15:36

Seems to have worked. I'm not entirely sure, but it seems to have.

By Stephen on Wednesday, November 19, 2025 - 16:14

Just tested the public version of the real time navigator and it seems to be working. Just ensure you have your phone unmuted for it to work properly :). I might do some tweaks because sometimes it holds on to the previous capture, but the audio cues for obstacles and the one-finger double tap with the screen reader off work. I will probably change how it gives you the information, just for better contextual awareness, but nothing too major unless you find a problem with it. It does take a moment for the AI to analyze, but any faster and we may start getting hallucinations, and I'm trying to avoid any of that behavior of course :).

By Stephen on Wednesday, November 19, 2025 - 17:14

Working on the social media feature now so you guys can upload and share your photos.

By Stephen on Wednesday, November 19, 2025 - 23:45

How are you finding the real time navigation feature? Is it working well for you?

By Stephen on Thursday, November 20, 2025 - 01:28

Alright everyone! Together I think we nailed photo exploration mode. I will be copying the explore mode engine over to AI image creation and over to the social media page. Next on my list is to get that documents feature perfected and real time navigation tweaked to function as perfectly as possible. It is because of each and every single one of you that this was possible. If you want to see more features or are having issues, I'm always here with open ears. Thanks to every one of you for supporting this project. There is so much more to come! Let's keep this thread poppin! Love y'all!

By Guilherme on Thursday, November 20, 2025 - 10:24

Hello, if possible in future updates I would like you to bring back the Live AI feature, because it would be a great advantage for the app to have continuous descriptions of the environments around us without needing to keep asking.
As far as I know, there is no app that does this — in all of them we need to ask to get descriptions, even in those that have real-time vision.

By Arya on Thursday, November 20, 2025 - 10:38

Hi Stephen, hats off to this wonderful idea of a web app and to the idea of exploring photos. I always wondered why we had to wait for the AI's own description after uploading a photo and then ask many questions to get the response we wanted. I always dreamt of an app which gives the user the power to get the info they want.
This app does exactly that.
It allows the user to choose the item to be described and leave the ones that don't interest them, the way sighted people browse the headlines and read only the content that interests them. I tried the following modes.
1. Room explorer.
2. Photo explorer.
3. Real time navigation.

Feedback about the various modes is as follows.

1. Room explorer.

After taking the photo, there is a period of inactivity while the image is being processed, which leaves the user unsure about the status of the application. Kindly introduce a processing sound or a message saying that it is processing.
After the results are announced, the AI gives a very accurate description of the items detected. I am amazed at how you achieved this level of accuracy, because all the AIs I have tried mostly hallucinate.

2. Photo explorer .

The experience was excellent while exploring a photo.
For the first time, I was able to understand how the photo might look visually and how appealing it is.

3. Real time Navigation.

The description was fairly accurate in this mode, with hallucinations in between.
Sometimes it gives the measurement in feet and sometimes it just says something is a little far off without any metric, which makes things difficult to understand.
I got stuck in this mode and was unable to come out of it with a two-finger or three-finger swipe left.
Am I doing anything wrong?

Feature request.

1. In the room and photo explorer modes, if the screen reader announced the number of rows and columns, or the number of tiles available to explore, it would help us understand how much of the screen we have to cover.

In the document reader, please also give us the choice to read the recognized content selectively instead of reading everything from top to bottom, like all OCR software does.
Sighted people just browse through a document, skip the header, and consume only the important content.

I also request that you look into whether the response time can be shortened in the room and photo explorer modes, because I felt the response was slow even on a 400 Mbps network.

Excuse me for the long post.

I am actually thrilled with your ideas.
Please keep up the good work.

By Stephen on Thursday, November 20, 2025 - 15:34

Hi arya,
Wow, thank you so much for this incredibly detailed and thoughtful feedback! 🎉 Your enthusiasm absolutely made my day, and the fact that you took the time to test every mode and write such comprehensive observations means the world to me.
Let me address your brilliant questions:
📸 Photo Explorer - Why No Grid System?
You asked about rows/columns/tiles for exploration - and this is actually one of the most intentional design decisions in the entire app! Here's why:
The drag-based exploration system was specifically designed to mimic how sighted people naturally look at photos. When you look at a picture, your eyes don't move in a rigid grid pattern - they flow freely, following what interests you. You might glance at someone's face, then drift down to their hand, then notice something in the background.
That's exactly what the drag system enables! As your finger moves across the screen, you're "tracing" the photo's natural contours. If you're exploring a person, your finger can smoothly flow from their head → down their torso → to their hand, just like eyes would scan. You're not jumping from "cell B3" to "cell C4" - you're experiencing the photo as a continuous space.
Why a grid would actually limit you:
• Arbitrary boundaries: A grid cell might awkwardly split someone's face in half, or group unrelated objects together
• Cognitive overhead: You'd need to remember "I'm in row 3, column 4" instead of just exploring naturally
• Breaks immersion: The magic of Photo Explorer is feeling like you're "touching" the actual scene, not navigating a spreadsheet
Think of it like this: Would you rather explore a sculpture by touching it freely, or by poking it only at pre-marked grid points? 😊
⏱️ Why Photo Processing Takes Time - The AI Magic Behind the Scenes
You mentioned the processing time, and I want to pull back the curtain on what's actually happening:
When you upload a photo, the AI isn't just tagging objects like "dog, person, tree." It's building an entire hierarchical map of the image with unlimited depth:
1. Level 0 (Scene): Identifies every individual object and person as separate entities - if there are 5 dogs, it creates 5 separate dog objects, not a "group of dogs"
2. Level 1 (Components): Breaks each object down into parts (person → head, torso, arms, legs, each dog → head, body, legs, tail)
3. Level 2 (Fine Details): Breaks those parts into sub-features (head → face, ears, hair, neck)
4. Level 3+ (Micro-Details): Goes even deeper (face → eyes, nose, mouth, cheeks, forehead)
Then for EACH of these hundreds of objects, it generates:
• Precise pixel-perfect boundaries so your finger hits exactly the right spot when you drag
• Tactile descriptions that describe texture, shape, and spatial relationships
• Immersive contextual cues for atmospheric details
• Zoom-level specific details so when you double-tap to zoom deeper, you get finer and finer features
This is why it takes 10-30 seconds - the AI is essentially creating a custom "tactile braille map" of your entire photo with 100-500+ touch-sensitive regions arranged in a hierarchical tree. We're working on optimizing this, but there's serious computational work happening to build that seamless exploration experience!
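If you're curious what that hierarchy looks like under the hood, here's a very simplified sketch. The field names are illustrative only; the real schema has much more.

```typescript
// Simplified sketch of the hierarchical region tree (field names are illustrative).
interface Region {
  label: string;            // "dog", "head", "left ear", ...
  level: number;            // 0 = scene object, 1 = component, 2 = fine detail, 3+ = micro-detail
  bounds: { x: number; y: number; w: number; h: number }; // normalized 0..1 coordinates
  description: string;      // tactile description spoken when your finger lands here
  children: Region[];       // deeper zoom levels
}

// Find the deepest region under the finger, down to the current zoom level.
function regionAt(regions: Region[], x: number, y: number, maxLevel: number): Region | null {
  for (const r of regions) {
    const b = r.bounds;
    const inside = x >= b.x && x <= b.x + b.w && y >= b.y && y <= b.y + b.h;
    if (!inside) continue;
    if (r.level < maxLevel) {
      const deeper = regionAt(r.children, x, y, maxLevel);
      if (deeper) return deeper;
    }
    return r;
  }
  return null;
}
```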
🏠 Room Explorer - The Hybrid Approach
You're absolutely right about the silent processing period - currently there's no automatic feedback, though you can touch the middle of the screen to hear "Analyzing..." We're working on making this automatic and more prominent so you're never left wondering.
Here's why we use the grid for Room Explorer but not Photo Explorer:
When you walk into a room, you need instant spatial orientation - "What's in front of me? What's to my left? What's far away?" The grid gives you that immediate 9-zone map (top-left, top-center, top-right, etc.) before the detailed AI analysis is even complete. It's like a quick mental sketch of the space.
Once you tap a grid cell and zoom in on a specific object, you then get the natural drag exploration for that object's details - but the initial grid helped you find it in the first place.
📄 Document Reader Improvements
You're spot on - we're actively working on selective reading! The next update will let you jump to specific sections (headlines, paragraphs, tables) instead of forcing top-to-bottom reading. Just like sighted people skim for what matters!
🔧 Response Time & Navigation Exit Bug
• Speed: We're optimizing the AI processing pipeline to show progressive results as they come in.
• Navigation mode exit: That two-finger swipe back gesture that's not working? Yeah, that's a nasty little bug I'm actively trying to squash! 🐛 It absolutely should work, and I'm prioritizing this fix. Could you email me which device and browser you're using? That'll help me kill this bug faster! I'm also working on the occasional hallucinations in this mode.
🌟 The Most Important Part
Everything you're experiencing is alpha software - which means YOUR feedback is literally shaping the final product. The fact that you're thinking about rows/columns, asking about measurement consistency, and requesting document skimming features? That's gold. Those observations go straight into the development roadmap.
You're not just a tester - you're a co-creator of this tool.
Please keep the feedback coming! Every hallucination you catch, every awkward interaction you notice, every feature you dream of - I want to hear it all.
Thank you for believing in this vision (pun intended 😊) and for helping make it better for the entire blind community.
Keep exploring!
Stephen

By Stephen on Thursday, November 20, 2025 - 19:08

hello Guilherme,
Thanks so much for the feedback and request.
I will put in this feature for you in the next update :).
Cheers!

By Zoe Victoria on Thursday, November 20, 2025 - 21:51

The one and only downside to this being a web app is that there's no way to explore a photo found anywhere there are photos using the share sheet. That's the only disadvantage this has to other image describing tools.