I'm about to show you something that breaks every rule about how vision AI is "supposed" to work.
And when I say breaks the rules, I mean completely flips the whole thing upside down.
Here's What's Wrong With Every Vision AI App You've Ever Used
You point your camera.
You wait.
The AI speaks: "It's a living room with a couch and a table."
Cool story. But where's the couch? What color? How close? What's on it? What about that corner over there? That thing on the wall?
Want to know? Point again. Wait again. Ask again.
The AI decides what you need to know. You're stuck listening to whatever it decides to tell you. You don't get to choose. You don't get to dig deeper. You don't get to explore.
You're just a passenger.
So I built something that does the exact opposite.
What If Photos Were Like Video Games Instead of Books?
Forget books. Think video games.
In a game, you don't wait for someone to describe the room. You walk around and look at stuff yourself. You check the corners. You examine objects. You go back to things that interest you. You control what you explore and when.
That's what I built. But for photos. And real-world spaces.
You're not listening to descriptions anymore.
You're exploring them.
Photo Explorer: Touch. Discover. Control.
Here's how it works:
Upload any photo. The AI instantly maps every single object in it.
Now drag your finger across your phone screen.
Wherever you touch? That's what the AI describes. Right there. Instantly.
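For the technically curious, the core trick is a simple hit test. Here's a rough sketch of the idea (simplified, with hypothetical names - not the app's actual code): the model returns a list of objects with bounding boxes, and every touch is checked against that list.

```typescript
// Simplified sketch of touch-to-description hit testing (hypothetical types).
interface DetectedObject {
  label: string;   // e.g. "red beach umbrella, slightly tilted"
  x: number;       // left edge, 0..1 relative to image width
  y: number;       // top edge, 0..1 relative to image height
  width: number;   // 0..1
  height: number;  // 0..1
}

// Returns the description under the touch point, or null if nothing is there.
function describeAtPoint(
  objects: DetectedObject[],
  touchX: number,
  touchY: number,
  screenWidth: number,
  screenHeight: number
): string | null {
  const nx = touchX / screenWidth;   // normalize the touch to 0..1
  const ny = touchY / screenHeight;
  const hit = objects.find(
    (o) => nx >= o.x && nx <= o.x + o.width && ny >= o.y && ny <= o.y + o.height
  );
  return hit ? hit.label : null;
}
```

Because the object map is computed once per photo, dragging back to the same spot replays the same description - which is why nothing "disappears" as you explore.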
Let's Get Real:
You upload a photo from your beach vacation.
Touch the top of the screen:
"Bright blue sky with wispy white clouds, crystal clear, no storms visible"
Drag down to the middle:
"Turquoise ocean water with small waves rolling in, foam visible at wave crests, extends to horizon"
Touch the left side:
"Sandy beach, light tan color with visible footprints, a few shells scattered about"
What's that on the right? Touch there:
"Red beach umbrella, slightly tilted, casting dark shadow on sand beneath it"
Wait, what's under the umbrella? Touch that spot:
"Blue and white striped beach chair, appears unoccupied, small cooler beside it"
Go back to those shells - drag your finger back to the beach:
"Sandy beach, light tan color with visible footprints, a few shells scattered..."
See what just happened?
The information didn't vanish. You went back. You explored what YOU wanted. You took your time. You discovered that cooler the AI might never have mentioned on its own.
You're not being told about the photo. You're exploring it.
And here's the kicker: users are spending minutes exploring single photos. Going back to corners. Discovering tiny details. Building complete mental maps.
That's not an accessibility feature. That's an exploration engine.
Live Camera Explorer: Now Touch the Actual World Around You
Okay, that's cool for photos.
But what if you could do that with the real world? Right now? As you're standing there?
Point your camera at any space. The AI analyzes everything in real-time and maps it to your screen.
Drag your finger - the AI tells you what's under your finger:
• Touch left: "Wooden door, 7 feet on your left, slightly open"
• Drag center: "Clear path ahead, hardwood floor, 12 feet visible"
• Touch right: "Bookshelf against wall, 5 feet right, packed with books"
• Bottom of screen: "Coffee table directly ahead, 3 feet, watch your shins"
The world is now touchable.
Real Scenario: Shopping Mall
You're at a busy mall. Noise everywhere. People walking past. You need to find the restroom and you're not sure which direction to go.
Old way? Ask someone, hope they give good directions, try to remember everything they said.
New way?
Point your camera down the hallway. Give it a few seconds.
Now drag your finger around:
• Touch left: "Store entrance on left, 15 feet, bright lights, appears to be clothing store"
• Drag center: "Wide corridor ahead, tiled floor, people walking, 30 feet visible"
• Touch right: "Information kiosk, 10 feet right, tall digital directory screen"
• Drag up: "Restroom sign, 25 feet ahead on right, blue symbol visible"
You just learned the entire hallway layout in 20 seconds.
Need to remember where that restroom was? Just touch that spot again. The map's still there.
Walk forward 20 feet, confused about where to go next? Point again. Get a new map. Drag your finger around.
But Wait - It Gets Wilder
Object Tracking:
Double-tap any object. The AI locks onto it and tracks it for you.
"Tracked: Restroom entrance. 25 feet straight ahead on right side."
Walk forward. The AI updates:
"Tracked restroom now 12 feet ahead on right."
Lost it? Double-tap again:
"Tracked restroom: About 8 steps ahead. Turn right in 4 steps. Group of people between you - stay left to avoid."
Zoom Into Anything:
Tracking that information kiosk? Swipe left.
BOOM. You're now exploring what's ON the kiosk.
• Touch top: "Mall directory map, large touchscreen, showing floor layout"
• Drag center: "Store listings, alphabetical order, bright white text on blue background"
• Touch bottom: "You are here marker, red dot with arrow, pointing to current location level 2 near food court"
Swipe right to zoom back out. You're back to the full hallway view.
Read Any Text
Swipe up - the AI switches to text mode and maps every readable thing.
Now drag your finger:
• Touch here: "Restrooms. Arrow pointing right."
• Drag down: "Food Court level 3. Arrow pointing up."
• Touch lower: "Store hours: Monday to Saturday 10 AM to 9 PM, Sunday 11 AM to 6 PM"
Every sign. Every label. Every directory. Touchable. Explorable.
Scene Summary On Demand
Lost? Overwhelmed? Three-finger tap anywhere.
"Shopping mall corridor. Stores on both sides, restroom 25 feet ahead right, information kiosk 10 feet right, people walking in both directions. 18 objects detected."
Instant orientation. Anytime you need it.
Watch Mode (This One's Wild)
Two-finger double-tap.
The AI switches to Watch Mode and starts narrating live actions in real-time:
"Person approaching from left" "Child running ahead toward fountain" "Security guard walking past on right" "Someone exiting store carrying shopping bags"
It's like having someone describe what's happening around you, continuously, as it happens.
The Fundamental Difference
Every other app: AI decides → Describes → Done → Repeat
This app: You explore → Information stays → Go back anytime → You control everything
It's not an improvement.
It's a completely different paradigm.
You're Not a Listener Anymore. You're an Explorer.
Most apps make you passive.
This app makes you active.
• You decide what to explore
• You decide how long to spend there
• You discover what matters to you
• You can go back and check anything again
The AI isn't deciding what's important. You are.
The information doesn't disappear. It stays there.
You're not being helped. You're exploring.
That's what accessibility should actually mean.
Oh Right, There's More
Because sometimes you just need quick answers:
Voice Control: Just speak - "What am I holding?" "Read this." "What color is this shirt?"
Book Reader: Scan pages, explore line-by-line, premium AI voices, auto-saves your spot
Document Reader: Fill forms, read PDFs, accessible field navigation
Why a Web App? Because Speed Matters.
App stores = submit → wait 2 weeks → maybe approved → users update manually → some stuck on old version for months.
Web app = fix bugs in hours. Ship features instantly. Everyone updated immediately.
Plus it works on literally every smartphone:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Anything with a browser ✓
Install in 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Done. It's an app now.
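A bit of plumbing makes that work: "Add to Home Screen" is available for any site in Safari, and what makes the result behave like an app (its own icon, standalone display, faster loads) is typically a web app manifest plus a service worker. A minimal sketch of the registration step, assuming a hypothetical /sw.js file (the real setup may differ):

```typescript
// Minimal sketch of PWA plumbing (hypothetical file name /sw.js).
// The manifest supplies the name/icon/standalone display; the service worker
// enables caching for faster, offline-friendly loads.
if ('serviceWorker' in navigator) {
  window.addEventListener('load', () => {
    navigator.serviceWorker
      .register('/sw.js')
      .then((reg) => console.log('Service worker ready, scope:', reg.scope))
      .catch((err) => console.error('Service worker registration failed:', err));
  });
}
```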
The Price (Let's Be Direct)
30-day free trial. Everything unlocked. No credit card.
After that: $9.99 CAD/month
Why? Because the AI costs me money every single time you use it. Plus I'm paying for servers. I'm one person building this.
I priced it to keep it affordable while keeping it running and improving.
Safety Warning (Important)
AI makes mistakes.
This is NOT a replacement for your cane, guide dog, or mobility training.
It's supplementary information. Not primary navigation.
Never make safety decisions based solely on what the AI says.
The Real Point of This Whole Thing
For years, every vision AI app has said:
"We'll tell you what you're looking at."
I'm saying something different:
"Explore what you're looking at yourself."
Not one description - touchable objects you can explore for as long as you want.
Not one explanation - a persistent map you can reference anytime.
Not being told - discovering for yourself.
Information that persists. Exploration you control. Discovery on your terms.
People are spending 10-15 minutes exploring single photos.
Going back to corners. Finding hidden details. Building complete mental pictures.
That's not accessibility.
That's exploration.
That's discovery.
That's control.
And I think that's what we should have been building all along.
You can try out the app here:
http://visionaiassistant.com
Comments
My thoughts
I had a quick play and thought I would give my first impressions. I think the touch grid thing is a genius idea and possibly once I get the hang of using the app it might prove to be very useful.
I did get a little stuck in the terms and conditions for a bit. There are a number of checkboxes and it was a bit laborious to find them. I kept going to the Accept button and it was dimmed, then had to go off looking for the next ones. That's probably just me though.
I also had a small issue with the location services. I was told that Safari was blocking the request, so I went into settings and safari was already set to ask. So I ended up skipping the step. Again probably something I was doing wrong but I didn't notice an option to set it.
I went into this expecting an accessibility tool so was a bit confused when it started going on about social media things. Then when I got into the app my first impression was "am I using the wrong thing?". It initially made me a little less confident about using it because it felt like I was going to be sharing photos with the world which I definitely do not want to do. There were also a number of unlabelled buttons here too. I realise I am an anti-social old man and once I realised I could just ignore it, I was OK. I'm sure lots of people will like this though.
I must admit, I don't really like the built-in screen reader. For the most part it feels unnecessary. Maybe it doesn't help that I have an unnatural aversion to Daniel. It did occasionally use my Spoken Content voice, seemingly at random. I guess it does make sense with the grid because you want to be able to go up/down as well as left/right. But otherwise I didn't really appreciate it. I did find the option to change the voice, but it was quite a complicated page with lots of controls. Maybe it's not been finished yet. I could swipe through all the default voices including the silly ones like Bubbles. I thought I'd changed to Karen but it seemed to use Daniel still. Anyway I guess I will get used to it.
When I was on the grid it did work fine, except I wasn't then sure how to get out of the grid and go back. I guess there are probably other swipes to learn and maybe I didn't pay attention when it told me. But I just turned VoiceOver back on and tried to ignore the other voices talking over the top of it.
I tried uploading a photo. I'm guessing as a PWA there's not going to be a way to share photos, but I chose to open the photo library. I got the usual very slight VoiceOver image recognition thing and chose a photo I didn't recognise. I chose the default Bullseye option. It is quite cool being able to move around the photo and feel the different parts of it. But I did find double tapping a little clumsy. I tapped on a group of people as I was interested to know who they were and how they ended up on my phone. But I think in double tapping I must have reselected a different photo instead as it started describing a barista who was in another square. I was swiping around for a bit, then it all went quiet. I wasn't sure if it was busy and tapped and swiped a bit. When I gave up and turned VO back on, I had some popup menu then found I had somehow managed to go Home and had closed the app altogether. I guess it is just going to need a bit of practice. I will need to try the other option and see if that works better for me.
I haven't tried it on a Desktop yet but noticed it uses a mouse or trackpad. I think it would be better and more natural for screen reader users if you could use the keyboard. Maybe I could use arrow keys to move around the grid and enter to zoom in? Maybe Esc to go back? And in that case I think if that was doable then I would much prefer to do that without the built-in screen reader.
I think there is an awful lot going on in here and that it does risk becoming a bit overwhelming as a result. As far as I can tell there is, or are plans to be, a social media platform, a tool to help me find stuff, a tool to help me explore photos, a web browser, an AI chat (?), some kind of location thing and all this with a different screen reader that I need to figure out. I am drawn to the idea of the photo grid. I think some of the other features feel a bit unnecessary. There is something a bit peculiar about going to a web site, logging in, and then from there going to another web browser to help me navigate the web. Particularly when it told me I could navigate with the mouse or trackpad. I don't really get why I would want to do that.
Anyway apologies for the big long post and it is probably sounding a bit negative. It is quite likely that with a bit of effort on my part it will all start to make sense.
Photo grid
Stupid question, maybe, but when am I supposed to be using the built-in screen reader? It seems that most of the time I need to use VO but when I get to the photo grid, for example, I need to disable it and use the built-in one. When I went through setup, I was using VO and it was fine except for Daniel talking from time to time. Maybe if I'd turned off VO it would have made more sense. But when I turn it off in the app I find I can't really do much until I get to the grid. So I need to keep turning it on and off depending on what I am doing. Is that how it is supposed to work?
I had another play with the photo grid. I think I misunderstood before - I thought I was being asked to choose between Bullseye and Progressive mode, but only bullseye was usable. Maybe the other option is coming later?
Anyway I tried bullseye again. The app thought about it and then the photo appeared. Except as I was moving around, all I heard were the sounds and no descriptions. It was kinda fun to play with but not helpful. Then I found I'd accidentally selected a square. I got a description of the scene this time as I swiped around, but I'd find it would sometimes go quiet and I wasn't sure why. Maybe it's my phone (iPhone 13 Pro Max with iOS 26.1) but sometimes I struggled to get anything to happen for a bit. Sometimes I would hear something like "Reset".
I managed to drill-down to the correct square once, and got the wrong one the second time I tried. I guess it's just a matter of practice, but a split tap would be a lot easier to execute.
After a while it all went quiet, and without it coming back I was a little unsure about how far I could drill down. It went so far then stopped. The level of detail I got to was more or less just telling me there was a face with blonde hair or something. I tried to go further but think that was as far as it went.
I tried to go elsewhere but it kept going quiet on me and I wasn't sure what was going on. Eventually, nothing, and again I found I'd fallen out of what I was trying to do and this time was back a few screens asking me to choose a photo or something.
One small suggestion is that any time the app is doing something, I would appreciate a little noise to know that I should just wait and not try moving about. This might alleviate some of my issues - maybe I was just too impatient.
One other small thing that was a bit annoying is that every time I went to the start page, it would ask me to grant access to the camera even when I wasn't using it.
I was comparing the grid idea to something like PiccyBot, which is my app of choice for describing what's in a photo. And I was thinking that the main thing this app gives me is an outline of what is there. So with PiccyBot I am going to get an awful lot of detail up front and I can ask follow-up questions. Now I could probably tailor the prompt to give me a specific list of things in the photo, but it is still going to reel them out.
I think this app has the potential to be a quicker way of finding the thing I want, but unfortunately it doesn't really work out that way. Firstly, it takes me a long time to get into it, find the right place, select the photo, turn off VoiceOver, wait for it to have a good long think, and only then do I get the top-level list of things for me to select from.
Whereas I think for this sort of thing I could probably just use other apps and use a better starting question, then just say "tell me about such and such".
Again I think this app is maybe just doing too much, whereas it might be better if it just concentrated on its USP, which I think is the photo grid, and make it really intuitive and nice to use. But I don't want another screen reader, I don't want to use a mouse or a trackpad, I just want to get to the answer.
But let's face it I am not everyone. I am less patient than most, maybe a bit too lazy when confronted with something new, and definitely less sociable.
Anyway, sorry again if I am just being overly critical. I love that you are pouring so much into this app and I hope you continue so that I can eat my words. You have done an incredible amount in a ridiculously short amount of time, so who knows where this is going.
I can't see myself using it as it stands but I will keep following along with this thread and see where it goes. I genuinely wish you luck with it and please don't let me dissuade you.
@mr grieves
Thank you for taking the time to provide such detailed feedback. It's incredibly valuable for us as we continue to refine the experience. I can definitely clarify some of the points you raised.
First, regarding the screen reader usage: You've hit on a core aspect of how our app is designed to provide a unique, immersive experience, especially in features like the photo grid and room exploration.
The Built-in Screen Reader and Why VoiceOver Needs to Be Off for Tactile Exploration:
• Custom Gestures and Spatial Interaction: Our app's tactile exploration modes (like the photo grid and room exploration) rely heavily on custom multi-touch gestures and spatial audio cues. We've designed a system that allows you to "feel" the layout of a scene or room, pinpoint specific objects, and zoom into them using your fingers directly on the screen.
• VoiceOver Interference: When an external screen reader like VoiceOver is active, it intercepts most of these multi-touch gestures. This means the app doesn't receive the direct touch input it needs to interpret your "explorations" (drags, double-taps for zoom, triple-taps for navigation, or multi-finger swipes to exit). VoiceOver tries to describe what's on the screen rather than allowing you to directly interact with the spatial layout. (There's a rough sketch of what that raw touch handling looks like just after this list.)
• Split Tap Limitation: You suggested a "split tap" might be easier, but this would unfortunately remove the ability to perform the nuanced multi-finger gestures that allow for zooming and precise navigation. Our system is designed for a more direct, multi-dimensional tactile feedback, not just sequential item reading.
• Current Workflow: Yes, for now, the intended workflow is to temporarily disable VoiceOver when entering the tactile exploration modes (photo grid, room exploration) to fully engage with our app's unique interactive features. We understand this adds an extra step, and we're always looking for ways to streamline this, but it's a necessary compromise to deliver these specialized experiences. For all other parts of the app (menus, settings, etc.), VoiceOver can certainly remain on.
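To make that interference point concrete, here's a rough sketch of the kind of raw touch handling the exploration surface relies on (hypothetical helper names, not our production code). When VoiceOver is running, it claims these touches as its own navigation gestures, so handlers like these never see them:

```typescript
// Rough sketch of raw touch handling for the exploration surface
// (hypothetical helper names; not the production code).
declare function describeUnderFinger(x: number, y: number): void;   // hypothetical
declare function zoomIntoRegion(touch: Touch): void;                // hypothetical
declare function exitExploration(): void;                           // hypothetical

const surface = document.getElementById('explore-surface')!;
let lastTapTime = 0;

surface.addEventListener(
  'touchstart',
  (e: TouchEvent) => {
    e.preventDefault(); // stop the browser from scrolling/zooming while exploring

    if (e.touches.length >= 3) {
      exitExploration();            // e.g. a multi-finger gesture to leave the mode
      return;
    }

    const now = Date.now();
    if (now - lastTapTime < 300) {
      zoomIntoRegion(e.touches[0]); // double-tap: drill into the touched region
    }
    lastTapTime = now;
  },
  { passive: false }
);

surface.addEventListener('touchmove', (e: TouchEvent) => {
  // Continuous drag: keep sampling what's under the finger and speak/play cues.
  describeUnderFinger(e.touches[0].clientX, e.touches[0].clientY);
});
```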
Your Experience with the Photo Grid ("Bullseye" Mode):
• "Bullseye" Mode: You're correct; "Bullseye" is our primary focus for photo exploration, with other modes being future developments.
• Sounds but No Descriptions: The sounds you hear are indeed our spatial audio cues. They're designed to give you a sense of where objects are in the grid and their importance, even before you explicitly select them. This helps you build a mental map of the scene. The detailed descriptions are triggered once you tap or select a specific area.
• Accidental Selection and Descriptions: You noted that accidentally selecting a square gave you a description – that's precisely how it's designed to work! When you explicitly touch and pause, or "select" an area, the app then provides a verbal description of what's there.
• Quiet Periods and "Reset": Quiet periods could occur if you move your finger quickly outside a selected area, or if the app is momentarily processing. The "Reset" sound you heard is an audio cue indicating that the exploration focus has been cleared, perhaps because a gesture was completed or canceled, or a timeout occurred. This is indeed to let you know a state change has happened.
• Drill-down Limits: You accurately observed that there are limits to how far you can drill down. The system processes the image into progressive levels of detail, and once the finest available detail is reached, it will stop. This is by design, as there's a computational limit to how finely we can analyze an image.
• Exiting Exploration: It sounds like you might have accidentally triggered one of the multi-finger exit gestures. For example, a three-finger swipe left is intended to close the exploration mode. We'll work on making these gestures more intuitive and providing clearer audio feedback when they occur.
Your Suggestions:
• Loading Indicators: This is excellent feedback! You're absolutely right that the app should provide more audio cues when it's actively processing something. We are actively working on implementing these kinds of "wait" or "processing" sounds to prevent users from feeling like the app has gone quiet or is unresponsive.
• Camera Access on Start: The app requests camera access upon startup to ensure all core functionalities are immediately available. We understand how this can be perceived as annoying when you're not directly using the camera; however, this is an iOS restriction. We are currently researching ways around this.
• Comparison to PiccyBot: You've made a great comparison that helps highlight the different approaches. PiccyBot excels at providing comprehensive, upfront descriptions. Our app aims for a different experience: interactive spatial discovery. Instead of getting a detailed list immediately, our goal is to empower you to actively explore a scene, decide what you want to focus on, and then zoom into that specific object or area for detail. It's about letting you drive the exploration rather than receiving a pre-defined description.
You're right that there's a learning curve with these new interaction models, but we believe the ability to spatially "feel" and interact with visual information offers unique advantages that complement traditional screen reader functions. Your feedback helps us immensely in making this powerful tool more user-friendly.
mr grieves
In response to your first post, I'm glad to hear that the "touch grid" concept resonated with you as a potentially useful and genius idea!
Let me address your points, as there are indeed some critical design decisions and functionalities that might not be immediately obvious.
On the Built-in Screen Reader and VoiceOver Interaction:
You've pinpointed a key area: the interplay between our built-in screen reader and external accessibility tools like VoiceOver. Our app is designed to offer a unique, immersive "spatial computing" experience, particularly in the photo grid and room exploration modes.
• Why VoiceOver Needs to Be Off for Explore Modes: The reason we recommend turning off VoiceOver for these specific "explore" modes is fundamental to their design. VoiceOver, while powerful, is designed to linearize and announce elements one by one. Our exploration modes, however, are built around direct, multi-finger gestures and spatial interaction.
• For example, dragging your finger across the photo grid isn't just about moving between "buttons"; it's about continuously sampling a visual space, hearing sounds that change in pitch and pan as you move, and triggering feedback that varies based on the object's texture or proximity (there's a small sketch of that pitch/pan idea just after this list).
• If VoiceOver is active, it intercepts these gestures, interprets them as standard navigation commands, and prevents the app from receiving the raw touch input needed for the spatial feedback. It tries to describe the UI elements, whereas we want you to "feel" the image content itself.
• Split Tap vs. Spatial Gestures: You mentioned that a "split tap" might be easier. While split taps are great for standard UI elements, they wouldn't allow for the continuous, dynamic sampling of space that makes the grid unique. The multi-finger gestures (like double-tap to zoom, triple-tap to navigate, or three-finger swipe to exit) are designed to provide a rich vocabulary of interaction within that spatial context, allowing you to quickly delve deeper or navigate away using physical movements. It's a different paradigm than sequential navigation.
• "Daniel" and Voice Customization: I understand your "unnatural aversion to Daniel"! You're right that the voice settings page can be a bit overwhelming as it offers many advanced options. We are working on simplifying this. The app should generally respect your preferred voice settings. If it's reverting to Daniel or your Spoken Content voice randomly, that sounds like a bug we need to investigate, as it should consistently use your selected app voice.
Getting Out of the Grid and Unexpected Navigation:
• Exit Gestures: You're absolutely right, there are specific gestures to exit the exploration modes without needing to re-enable VoiceOver. For Room Exploration, a three-finger swipe left is designed to take you out. For Photo Exploration, the gestures are slightly different depending on the context, but the general principle is multi-finger swipes for broader actions. We recognize that these need clearer verbal instruction and practice. The "Reset" cue you heard might be related to gestures that clear the current focus within the grid.
• Accidental Home/App Closure: This sounds like a system-level gesture on your iPhone might have been triggered by accident while attempting app-specific gestures. We aim to keep our gestures distinct to avoid such conflicts, but even native apps can be closed with certain gestures. Just be sure to stay away from the very top and bottom of your screen, as swiping down or up from those edges is an on-device system command.
Terms & Conditions and Location Services:
• Checkbox Process: Thank you for this candid feedback. The "checkbox fatigue" and dimmed button scenario is a known accessibility challenge. We're actively looking into ways to make this process smoother and more intuitive, perhaps by providing better focus indication or summarization, but for legal reasons it is important that all of that information is read and understood.
• Location Services: The Safari blocking message and the "ask" setting suggest a browser-level permission issue. Sometimes, even if set to "ask," certain browser privacy settings can be very strict or require a manual override for PWAs. This is a common hurdle with web-based apps accessing device features, and we're working on clearer in-app guidance for these situations.
Social Media and "Overwhelming" Features:
• Social Media as Optional: You hit the nail on the head: the social media aspects are completely optional. We included them because a significant portion of our early user base expressed a strong desire for a truly accessible platform to share visual experiences and connect with others. We recognize that not everyone wants this, and our goal is to empower choice. You can use the exploration tools purely for personal use and completely ignore the social features if you wish. We apologize if its initial prominence made you feel compelled to use it. The unlabelled buttons are a concern, and we'll fix those promptly.
• Why So Many Features? Our vision is to create a comprehensive assistive companion. Different blind and low-vision individuals have diverse needs. Some want to explore their physical environment, some need help reading documents, others want to browse the web safely, and yes, some want to connect socially. Instead of building many separate apps, we're trying to create a unified platform where these tools are available. You don't have to use them all.
• "Browser within a Browser" and Mouse/Trackpad: Your confusion here is understandable. The built-in web browser isn't just a regular browser. It's an accessible web browser designed specifically to tackle the visual complexity of the internet. It uses AI to interpret and summarize page layouts, extracts key information, and allows for tactile exploration of web content in a way standard browsers (even with screen readers) often cannot.
• Mouse/Trackpad for Explore Modes: For desktop users, the mouse or trackpad can offer a highly intuitive way to engage with the spatial exploration modes (photo grid, room exploration, and accessible web browsing). Just as touch gestures simulate "feeling" a space on mobile, a mouse or trackpad allows precise, continuous movement across a digital representation of that space, triggering audio feedback as you "hover" over or click on objects. It offers a different, yet equally rich, interaction model for spatial discovery than keyboard navigation. While keyboard navigation (arrow keys, enter, esc) is a valid suggestion for desktop, the mouse/trackpad provides a more direct analog to the touch experience, particularly for exploring visual layouts, which is a core tenet of the app.
Your feedback is precisely what we need to make this app better. It highlights areas where our design intent isn't translating clearly into user experience, and we'll be making improvements based on your valuable input. Thank you again for being a part of this journey!
Profiles not saving
hi Stephen,
First, I absolutely love how you have organized the settings page. Kudos to you. I wanted to tell you about a small issue I am having. When I go to the profile setting, and try to adjust my name, add a photo, and a little bio, I can do all of this of course, but when I go to save, it errors out. Every time. I will hear the built-in screen reader say saving, or now saving, something like that, then half a second later it says, save failed or failed to save.
All of the other settings seem to work fine. I was even able to set up Alex as my voice, purely through this app, without going to my spoken content settings.
One more issue, slightly off topic, is your built-in screen reader. I see now, when we first load the app, there's a little message about the screen reader, and a button to enable it. However, the screen reader seems to have some lag, or perhaps certain pop-ups are preventing it from working properly. For example, like Mr. Grieves stated above, I constantly get the allow camera permissions pop-up. This happens every time, even after granting permission.
Just some things I wanted you to be aware of. Overall, I am still digging this app.😊
@ Brian
Thanks for letting me know. I'll have this fixed in hopefully about 10 minutes here.
@Brian
Your profile saving issue should be fixed now! I'm so glad you're now able to set up Alex as your voice - not gonna lie, that was a tough one to implement!
As for camera permissions, that's an iOS-specific restriction I'm looking into. I'm working on ways to make the experience less intrusive - like maybe only requesting camera access when you enter Live AI mode or Conversation mode, rather than upfront. That way it's a little less disruptive, at least for now.
The screen reader is a work in progress, but right now it should work well in explore mode. This one is proving to be trickier than adding that Alex voice, lol.
@brian
This is a prime example of why I decided to go with a web app. :). I can push updates to you right away. It could have taken me 3 weeks or longer if it was a native app.
Re: Profiles
Profile saved successfully. Thanks Stephen!
Voice commands.
Is there a way to know the voice commands? Also I don't think my microphone settings were on? I'm using Firefox and the mic settings are saved as on there.
@brad
Hey Brad, I was just thinking about you lol.
When I talk about voice commands, do you mean for the web browser feature? If so, here’s how it functions:
Our web browser is designed to be fully accessible. You can use your voice to search the web, ask questions about what's on a page, and even navigate through content. When you ask it to open a website, it first tries to load an accessible, AI-summarized version of the content. This means it reads out the main text, key headings, and helps you explore images with spatial audio, all optimized for accessibility.
We've made some big improvements recently. Now, if you want to go directly to a specific website, you absolutely can! However, you might notice that some popular sites like Reddit or YouTube will automatically open in a new tab in your device's default browser instead of appearing directly within our app.
This isn't an issue with our system, but rather a security measure used by those websites. They send out a signal, often called an "X-Frame-Options" header or a "Content Security Policy," which basically tells other applications, "Don't put me inside an iframe!" An iframe is like a window within our app that displays another website, and these sites block that for security and privacy reasons.
We want you to have the best experience, so when we detect these sites, we now intelligently offer to open them in a new tab so you can still access them without a hitch. However, photo explore mode does not work in these situations, as the site can't open in-app. I'm actively looking into more seamless solutions for this iframe situation, and I'll keep you updated if and when I find them!
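For anyone curious how that detection works in practice: frame-blocking is declared in response headers, and browsers won't expose another site's headers to page scripts, so a check like this has to happen server-side. A rough sketch, assuming a Node back end with global fetch (hypothetical names, and deliberately conservative):

```typescript
// Rough sketch of a server-side embeddability check (hypothetical names; a
// real check would also parse frame-ancestors allow-lists and handle servers
// that reject HEAD requests).
async function canEmbedInIframe(url: string): Promise<boolean> {
  const res = await fetch(url, { method: 'HEAD', redirect: 'follow' });

  const xfo = (res.headers.get('x-frame-options') ?? '').toLowerCase();
  if (xfo.includes('deny') || xfo.includes('sameorigin')) return false;

  const csp = res.headers.get('content-security-policy') ?? '';
  if (/frame-ancestors/i.test(csp)) return false; // conservative: treat any policy as blocking

  return true;
}

// Usage sketch: embed when allowed, otherwise hand the URL to a new tab.
// canEmbedInIframe('https://www.reddit.com')
//   .then((ok) => (ok ? showInAppBrowser(url) : window.open(url, '_blank')));
```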
@Brian!
Perfect! :). Not a problem at all.
Wooph!
It looks like I can officially pull the room explore mode out of beta!!!!! Maybe I should get some sleep? Apparently we humans need that. I feel like I haven’t slept in days lol.
omg
sorry guys fixing the room explore feature...lol.
I had an idea!
Hang tight guys, I may do an overhaul of room explore mode...just you wait! If this works it is going to be epic. If this works like I'm hoping, I'm most certainly adding this to the photo exploration feature. Can any of you guess what I'm thinking?
Epic!
That it already is! I'm loving what I'm seeing here and I feel very happy that I'm being part of this genius thing that is happening.
@Stephen I have no idea what is coming from your side, but I quite trust it's going to be intuitive and useful. I do have a request though. Similar to the maps feature, it will be very useful to have a feature within the app where someone can create a layout of a place (say a venue or restaurant) and share the link with a blind person, such that we can explore the place tactilely before we physically go there. Also, it'll be interesting to have a space where we can explore iconic stuff like, say, the Statue of Liberty, and also layouts of things like, say, a cricket field, or even things like the human body. I know I am going out of the picture exploration paradigm in the strict sense, but I am beginning to see this as the first baby step towards an AI-powered tactile revolution...
@Gokul
Great minds think alike! I'm already working on features like that on the backend!
Overhaul complete!
Just did a complete overhaul of the room exploration feature. Check it out. It might surprise you 😊. For the absolute best results use headphones. It does work without headphones as well, so please don’t think you have to use them. ❤️.
Also forgot to mention
You should be able to access most features efficiently without having your own device screen reader on. Yes there is still some tweaking to do in the back end but most things you should be able to access without your screen reader.
Very good app, but the live artificial intelligence mode doesn't work
Hello, congratulations on the app, I see great potential in it. Something is happening here and I’m not sure if I’m doing something wrong or if the app has a problem. Here is what happens: when I tap to start the AI assistance camera and try to say something to it, I don’t get any response. The app only starts making some noises and saying the word “Listening,” even to the point of freezing the phone.
@ Guilherme
Hey there, thanks for letting me know :). Right now it won’t function properly because I’m switching out how the code works on the back end. I’m going to revamp how that works. I was going to finish that tonight but I got busy working on the room explorer feature for users to have a better experience. Alongside that I was working on the screen reader. I’ll fix that in a bit, just going to get some rest :). I’ll post an update here.
Room explorer
Very interesting how the audio now pans within the room explorer. Again, intuitive. I remember trying the vOICe app earlier; this takes the same approach and implements it in a more accessible way. I was wondering if there's a way by which one could capture multiple pictures of a room from different angles and combine them to get an overall view of the whole room?
A few more thoughts (please ignore if you've had enough of me)
Thanks so much for taking the time both to read through my rather long feedback and to provide such a detailed response.
I think maybe I didn't quite explain a couple of the problems I experienced yesterday very well.
At one point, as I was swiping around I was only hearing the sounds and not the spoken descriptions. So I couldn't tell what anything was. Normally swiping around will tell me the descriptions of things, but for some reason it wasn't doing that. I've only had that once so maybe it was just a quirk.
I think I understand your reasoning for using your own screen reader. I personally think as it stands right now I would still enjoy the app more if I could turn off the built-in one and just use VoiceOver. Touch to explore will work fine with a grid, and gestures like scrub can take me back. Maybe down the road, this choice will feel vindicated but right now it just makes the app very unintuitive and I need to keep switching between two things. I also suspect you will never get away from the need to use VoiceOver sometimes - whether it be to enable the camera, or to select a local file to view or whatever. And it's always going to be jarring. Maybe a sound effect that tells me when to turn VoiceOver on might help a little, or maybe I just end up getting used to it. I am just dabbling after all, so maybe a serious user of the app won't be bothered by these things. But as a new user it feels like an obstacle I am being asked to overcome.
Bear in mind that VoiceOver generally works great for me, I like it and have no particular need for anything else. A tool that understands this and works in tandem with it rather than trying to reinvent the wheel will always be a better experience for me personally.
Being an old cynic, I can't help but feel that this approach was taken by someone who doesn't use a screen reader themselves.
Similarly on a computer, there is no reason for a blind person to own a mouse unless they share a computer with a sighted person. Maybe a trackpad is different. I have never used mine and it seems a bit of an alien artifact to me. It's often not even within reach. The keyboard is a screen reader user's weapon of choice.
I noticed you've added a new onboarding thing, which is good. However, at the end it got to a screen about pricing, and the built-in screen reader just kept repeating the heading over and over again. I couldn't read the text behind it with VoiceOver because the other voice was so loud. I eventually found the button to get past it but I have no idea what it was trying to tell me. Probably that the app is free right now but won't remain that way forever.
I was also pleased to note the absence of Daniel. The other voice I was hearing is now the permanent screen reader voice as far as I can tell. It's not my spoken content voice, though, it's some American female voice, possibly Samantha but not sure. Anyway it is less grating.
I did find Daniel in one place. I thought I was doing a live scan of my surroundings but I just got these uncomfortable, crackling high pitch beeps and Daniel repeatedly saying "Listening" over and over.
I tried the new room explore feature. I did find the initial sound effects a bit ear-splittingly high pitched and was glad when I completed the tutorial and they stopped. I found when I double tapped on something to get more info I would get an error - something like how it couldn't download an external object or something.
I suspect maybe I am looking at this the wrong way. I usually approach this sort of thing specifically as a utility - how can it help me do something more efficiently than before. I had a similar problem with the Envision Ally app. I wanted the utility but just got wise cracks and jokes as it spouted wrong information at me.
I think maybe my brain isn't quite wired up to feel that exploring an image by touch just for the fun of it is actually something I want to do. The detail of the app isn't really enough that I feel connected to the image in any more of a way than I would with other tools, and being AI it's always going to be a little loose with the truth. (For example, I don't have steps in front of me, nor a telephone, nor a sub-woofer).
I like the suggestion on here about using this to explore an area and try to get some sort of spatial awareness of an environment. Whether a limited number of squares would be enough, I don't know, but it definitely feels like a much stronger case for this tech. And particularly if this could be generated without needing AI so I could rely on its accuracy.
Anyway please remember that I am just one person and the fact that this app may not be for me is absolutely fine and does nothing to diminish what you have done. Please feel free to disregard my ramblings if they come across negatively.
AI 1 - Mr Grieves 0
Oh apparently I do have a sub-woofer in front of me. How long that's been there I have no idea.
OK, AI - you win this time!
@ mr grieves
Hey so I will respond more thoroughly when I’m more awake but I wanted to touch on something. I am completely blind and yes I do use a screen reader. I’ll respond more thoroughly in a bit. Let’s not make assumptions :).
@Gokul and @Stephen
The thing Gokul is talking about is called 'Panoramics', and if you can implement this properly, Stephen, it will be a true game changer. I mean that in the literal sense.
@Brian and @Gokul
Oh I'm totally with you both on that! I'm actually working toward it, but right now I'm still laying the foundation and building the house before I can think about decorating, you know? Don't want to put the cart before the horse 😄
The reality is, I'm also navigating some financial constraints here. Building this isn't cheap, and I'd honestly love nothing more than to quit my day job and work on this full-time so we could move faster. That's the dream! But I've got bills to pay and responsibilities - my two bunnies, my guide dog, and splitting finances with my spouse. So features like panoramics might be a little ways off for now.
But it's definitely on the roadmap! Just need to get the essentials rock-solid first, then we can start adding those really cool features.
Re: Assumptions
Apologies if I caused offence. You are entirely right, I made a stupid assumption and should not have done that. If someone had done the same to me I would have been a little insulted.
Considering that this seemed to come from nowhere and is being developed at a scary speed, you are obviously doing a great job and lots of people are enjoying it. I can tell you are putting your heart and soul into it, and I thoroughly commend you for that.
I think maybe it would have been better for everyone if I hadn't come on here and started spouting nonsense. It is a bit of a personality trait. Give me something that is almost perfect and I will probably complain about the almost and dismiss the perfect bit. Just be glad you don't have to live with me!
Anyway I will do my best to bite my tongue.
Also, as a blind developer myself I am in awe of what you are doing. Admittedly I am a bit of a dinosaur but since going blind, I can only dream of doing a fraction of what you are achieving with this. So please keep it up and don't let me dishearten you.
@mr grieves
Don't talk about yourself like that! I welcome anyone and everyone's opinions and feedback. Why? Because every opinion, every piece of feedback may spark an idea! You are more than welcome to go to town if you feel like it. That being said, this app may not be for everyone either and that is absolutely ok, but that doesn't mean there shouldn't be some dialogue. I will always respond to you as long as it's respectful and constructive.
A question for you guys.
Ok, so I’m torn. Should I keep building out the Look & Tell and Live AI features, or should I drop them? There are already a lot of apps offering similar tools, so I’m not sure if it makes sense to invest more into that lane.
If I let those go, I can put more time into the things that really set this project apart like photo exploration, 3D audio panoramic maps, the ability to share those maps, and everything else on that side of the experience.
So I’m asking honestly: would you actually use Look & Tell or the Live AI features? I can spend the time and money on them, but if most of you don’t see yourself using it, then it might not be worth building. Totally up to you.
it's interesting.
Usually I'd not care about the look around feature, but whatever picture thing you're using, it's really good. I do wish it would read text though, but like you say there's other apps. I think you're using a different engine?
@brad
That is interesting. If you like it I'll fix it and polish it up. I was going to scrap it, but if you like it and you feel you are gonna use it, I'll keep it. It is a little torn apart right now but I'll try to get that fixed later today then. :).
Photo exploration for the win!
Personally, I like the room exploration feature, but I absolutely love the photo exploration feature. If I had my way, I would ask that you invest your time and energy into that.
However, I realize I'm not the only user here, so go with what majority want, I guess. 😅
@ Brian
What would you like to see in the photo exploration feature?
My two cents
Well, if the Live AI feature works continuously, I mean without the need to keep asking it questions for more info, I'm all for it. Gemini and ChatGPT apps can't achieve that.
Only if you want to.
I’m not gonna give you guys a big spiel and ask for any donations. We get all that way too much. If you want to help the project out and it’s within your budget, I have set up just a generic donation page. Don’t feel like you have to, though. Eventually, once we start making this perfect, I will however be giving away some free legacy member plans, and if you want and you’re comfortable, I can put you in the credits of the app as a supporter when it’s fully completed. I don’t want anyone to feel any sort of obligation and I’m definitely not gonna treat you with corporate speech… I work in a corporate job and I hate it lol. I’ll still be putting work into this regardless. You all have been amazing. I appreciate each and every single one of you.
https://www.paypal.com/ncp/payment/8RUHTTVFBJDCQ
Discord server
So I have a discord server up and running so I can chat with you guys in real time instead of constantly having to go here and scroll through all of these posts lol. Here’s the discord server link.
https://discord.gg/B22qDN8C2
Re: photo exploration
Honestly, as much detail as can be provided. And I know this is going to be tricky, but actual details on facial features. I don't need the application to tell me whom the person is, necessarily, but details on their features would be amazing.
@Brian
I love tricky! Let me see how much detail I can get it to describe to you. Let me just finish with the start camera and live AI features, as I broke them yesterday, and then I'll start toying around with that.
hmm
Stephen Stephen Stephen! Nice job!
@Dominique
haha thanks so much :).
Number of users report!
I love you all so much! We are officially at 122 users!
Technical Update: What was wrong
So, the app was experiencing two major iOS-related issues:
1. Camera Viewer on Homepage: The original homepage displayed a live camera feed even when you weren't using AI features. iOS security requires apps to request camera permission every time the camera is accessed. This meant every time you opened the app, iOS would ask for camera permission - even if you just wanted to check settings or navigate to a different feature.
2. Voice AI Microphone Conflicts: When trying to use Voice AI to ask questions, iOS was getting confused because the app was already holding onto the camera permission. This created a conflict when trying to request microphone access simultaneously, causing the voice recognition to stutter, restart unexpectedly, or fail to activate properly.
What We Changed:
1. Removed Camera from Homepage: We completely redesigned the homepage so the camera feed is NOT active when you first open the app. Instead, you now see a clean, organized menu of features. The camera only activates when you tap "Start AI Vision" or enter a feature that actually needs it (Room Explorer, Photo Explorer, etc.).
2. Separated Camera and Microphone Access: We restructured how the app requests permissions. Now when you want to use Voice AI, you tap the "Ask a Question" button, and the app specifically requests microphone access at that moment - without the camera interfering.
Why iOS Permissions Work This Way:
iOS has strict privacy and security policies. Every time an app wants to access your camera or microphone, iOS requires explicit user permission. This is by design to protect your privacy - Apple doesn't allow apps to bypass this.
The key is timing: We can't prevent the permission prompts entirely, but we can control when they appear. By only activating the camera/microphone when genuinely needed, you'll see far fewer prompts.
What This Means for You:
• Homepage: Opens instantly without camera permission prompts. You can browse features, adjust settings, and navigate freely.
• Voice AI: Works reliably when you tap "Ask a Question" - you'll see a microphone permission prompt (only once, when navigating to that page), and voice recognition will function smoothly without conflicts.
• PWA Support: iOS users can now add the app to their home screen as a Progressive Web App without constant interruptions.
• Permission Prompts: You'll still see them when using camera/microphone features - but only when actually needed.
Bottom Line: We didn't "fix" iOS permissions (we can't change Apple's security model), but we optimized when and how the app requests them to create a smoother, less intrusive experience.
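For the technically inclined, here's roughly what "only when genuinely needed" looks like in code - a simplified sketch of the on-demand pattern, not the app's exact implementation:

```typescript
// Simplified sketch of on-demand permission requests (not the exact app code).
async function startCameraForFeature(video: HTMLVideoElement): Promise<MediaStream> {
  // iOS shows its camera permission prompt here, at the moment of this call,
  // not when the app first loads.
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
  });
  video.srcObject = stream;
  await video.play();
  return stream;
}

function stopCamera(stream: MediaStream): void {
  // Releasing the tracks when leaving a feature means the app isn't holding
  // the camera while you browse settings - which also avoids the conflict with
  // the microphone request described above.
  stream.getTracks().forEach((track) => track.stop());
}

async function startVoiceAI(): Promise<MediaStream> {
  // Requested separately, only when you tap "Ask a Question".
  return navigator.mediaDevices.getUserMedia({ audio: true });
}
```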
I think I already asked this but.
I think an NVDA addon would be nice, so if I'm on Reddit, I can just press a button and woosh, the text of a meme is read out to me, and I get to control how it's read. For example, do I want to know the picture, the text, how much info? Do I want to strip away "the post has 200 upvotes" and so on, or do I want all that?
In other words: I control the AI and how it responds to me. Ooh, voice control might be even nicer, just say something like, oy! Grab the pic from this page and tell me the text with the least amount of detail. Ok maybe not Oy! The major issue I see with that is that Reddit might have a page with multiple pictures. Like if I'm on r/shitamericanssay and find a post I like so press enter on it, and it grabs a picture, there might be multiple for the joke or just in general on the page, do you see?
Also, this is probably not what you're going for, but if you've ever used something like redditforblind, I'd love something like that but without signing in, so a way to browse Reddit accessibly but without an account?
I had accounts on and off but would just like to browse.
If this doesn't make sense, let me know, cause I did just write it down with not much structure.
@Brad
Oh, you did; that is my bad. With so much happening over here, I thought I had already replied to you. Forgive me lol :). As for the NVDA add-on, I am looking into how I can make that possible for you. I'll let you know what I find out and how, or if, I can get that all set up. As for accessing websites through, let's say for example, the accessible web browser thing I set up, the biggest issue is that whole iframe situation I was speaking about earlier. It is a complicated one and I'm trying to see how I can work around that legally. Hang tight buddy :).
sure.
If you can't, it's no issue; if there are more important things to work on, work on them. The web browser is an interesting idea but I honestly don't see many people using it, as we already have web browsers that work.
@Brad
The point originally was to grab, let's say, Google Images and explore them, but due to security reasons it won't let me. I kept it up because I've actually found it handy to get information much more quickly than using Google, for example. It skips all those ads and irrelevant results. I also get summaries of pages before I even go to them to make sure it is the one I want. So I left it up. I do see on the back end a handful of people using it, so it doesn't hurt.
fair enough.
Have fun making more bits!
Did I just put out something?
Check your apps! This should help in indoor spaces.
more problems with the app
Hello, I’m having some more problems with the app. The AI live mode is not returning the captions in Portuguese, which is my language — it only returns them in English. Another issue is that when I select to explore a photo, it only tells me one object that is in the picture, and when I try to zoom in on this object, it says it found the number of parts, but it stays silent when I slide my finger across the screen to explore those parts.