Introducing Envision AI, a new iOS app to help the blind identify text, objects, and what's around them
My name is Karthik and I have been working with visually impaired users in the Netherlands for the past year to understand ways of enabling independence. In that process, I did a deep dive into artificial intelligence and how it can be a helpful tool for processing and conveying visual information. I found that most of what was on the market was simple object-recognition apps that were not very practical to use.
Hence, we built an app called Envision AI that takes a context-based approach to this problem. Our app can currently be used to:
- Recognise and read text in its native language.
- Describe in detail the scenes the camera captures.
- Train and recognise the faces of your friends and family.
- Train and recognise your personal objects, like your wallet, keys or glasses.
- Do context-based recognition: for example, taking a picture of a clock will tell you the time, and taking a picture of a window will tell you the weather outside.
The app is still in beta, but it is available worldwide. We are constantly trying to understand which features are the most important and valuable, and how we can continually improve them. So we would really love it if the community here could give the app a try and provide us with active feedback on what you think of it and how it can be improved.
P.S. We understand that we will immediately be compared to Seeing AI and other apps out there. What we can assure you is that we are really committed to listening to the community here and working with you to build an app that really helps. We are working on this full-time and will be here every day to talk to you.
APP LINK: http://goo.gl/fptaYQ
Thanks and regards,
Thank you for that great feedback. After reading your comment, I tried training some banknotes here using the object recognition in our app to see if it could be used as a self-trained banknote recogniser. Unfortunately, it is not reliable enough. It was able to identify a note as a banknote every time, but it frequently confused the 10 and the 20 euro. Hence, I feel the right way to approach this would be to include a dedicated banknote trainer that allows you to specifically train banknotes. We will definitely work on this, as it seems a much better way of approaching a currency recognition feature than building a general classifier for all the currencies in the world. We will do some experiments on it later this week and update you.
On a side note, we had a user here who was using the OCR feature on our app to detect their banknote. Do try if that works for you as well for now.
Hi again, Karthik,
I was just wondering about the clock and window examples you gave for AI processing. I tested one of these, the clock idea, by taking a picture of my digital watch. I use a talking watch from the UK RNIB which has a digital watch face. Unfortunately, although the voice part of the watch is very accurate, which is fine for me, something has gone wrong with the digital watch face. If sighted people see it, they always say 'your watch says the wrong time', and my response is 'I know, but it doesn't matter because I cannot see it; I use the voice'! However, when I took the picture, the app gave me the correct time. Does this mean the Envision app doesn't actually use AI to work out what the time is, but works out that the picture is a watch and then goes to the iPhone time and reads that? In the same way, when you take a picture of a window, do you get the weather from a weather app?
If this is the case, there might be a problem, given how inaccurate the weather forecasts we get in the UK can be compared with what the weather really is.
For example, this morning I was going out, so I listened to the weather forecast on the BBC and checked the BBC weather app, which said it would be cloudy. However, when I went out it was raining! Alas, I didn't use Envision, but what would have happened? Would I have got rain or cloudy?
As well as being able to read things like food items through the barcode on the item, I think it would be a good idea to add a feature that could read the barcode on a prescription label on medicine to tell you what the item is and the instructions on how to use it.
Seems like the app only supports English text. It would be awesome if it either changes the language to wherever I am located or recognizes the different languages when it has detected text.
Also I am wondering where the data is processed as I'd not really like banking information, PINs etc to wind up on the Internet. Maybe the text detection part could remain on the device instead of being uploaded?
Yes, you are correct. Envision doesn't read the time by reading the dials but has instead been programmed to speak the local time whenever a picture of a clock or watch is taken. This allows it to be much faster. It's the same with the weather: whenever a picture of a window is taken, it fetches the local weather.
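As a minimal sketch of that context-based routing (this is illustrative Python, not Envision's actual code; the labels, function names, and parameters are hypothetical), the classifier's label decides whether to read the device clock, fetch the forecast, or just describe the scene:

```python
from datetime import datetime

def respond_to_label(label, now=None, weather_lookup=None):
    """Route a recognised scene label to a spoken response.

    `label` is whatever the image classifier returned for the photo;
    `now` and `weather_lookup` are injectable so the logic is testable.
    """
    now = now or datetime.now()
    if label in ("clock", "watch"):
        # Don't try to read the dials: speak the device's local time instead.
        return now.strftime("It is %H:%M")
    if label == "window":
        # Fetch the local forecast rather than analysing the sky in the image.
        forecast = weather_lookup() if weather_lookup else "unknown"
        return f"The weather outside is {forecast}"
    # Fall back to a plain scene description.
    return f"I see a {label}"
```

This also explains the talking-watch observation above: the spoken time comes from the phone's clock, not from the picture, so an inaccurate watch face still yields the correct time.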
We will definitely look to make it access all kinds of information associated with the barcode and present it in a useful way.
We are looking to build a feature that automatically detects the language of the text and speaks it in that language. We are already doing this for Dutch in the Netherlands. We will expand it to more languages in our next build, which should be out in a couple of weeks, so please update then and it should work as you describe. Can you tell me which languages you are trying to read, so that I can ensure they are included?
For now, the text detection happens in the cloud. Even though all data is encrypted and anonymised, we do understand concerns regarding personal information. Hence, in our next build, we are ensuring that real-time text recognition happens on the phone, and cloud text recognition is only used for long-form text or when the user explicitly requests it.
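Assuming a routing rule like the one just described (a hedged sketch; these names are illustrative and not Envision's API), the on-device/cloud decision could look like this:

```python
def choose_ocr_backend(mode, user_requested_cloud=False):
    """Decide where text recognition runs.

    Real-time (live) text stays on the device for privacy and speed;
    the cloud engine is reserved for long-form documents or an
    explicit user request.
    """
    if user_requested_cloud or mode == "document":
        return "cloud"
    # Default: keep recognition on the phone so sensitive text
    # (banking details, PINs) never leaves the device.
    return "on-device"
```

The design choice here is that the privacy-preserving path is the default, and the cloud is only reached by an explicit mode or request.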
Hi again Karthik,
I just wanted to ask when you plan to update the Envision AI app. During the life of this thread you promised that updates would occur on a regular basis, but I have seen nothing since the app was launched and this thread began some three to four weeks ago.
I understand the difficulties that designers have with the complexities of programming, testing and getting Apple to accept app updates but I am at least waiting in anticipation for updates to occur so that I can test the results and any new features that come along.
This app shows so much potential but if updates dry-up or even fail to happen at all then users will get fed-up with waiting, find an alternative or forget you exist at all.
Here's to an update soon,
Thanks for checking in! My team and I are working hard on pushing out the next update with the improvements we promised. Some aspects of the code took a little longer to crack than anticipated, but rest assured, you will see an updated version coming out in a week's time.
Sorry to be so hard on you - I didn't really mean it, but we have seen so many great apps begin a process of design and improvement only to fizzle out before their real potential has been reached. At least Envision AI is back at the top of the forum now and people might again begin adding comments. I look forward to the update with anticipation.
No problem, Steve. We understand that and are glad that you are holding us accountable to our promises. It only drives us to deliver better and faster.
As for trash cans and such, we should be able to find these with canes or dogs. I don't feel it necessary for this app to be able to do so.
Great idea about identification of, and detail about, an approaching vehicle, though. Identifying moving objects could be very useful.
Apologies again. I would like to inform you that we pushed a new update to the App Store last night and are now just waiting for Apple to approve it. We messed up in our planning this time, as we underestimated the amount of time it would take to have this update ready. But now we know better, so we will focus on smaller, more frequent updates instead of bigger ones like this.
That said, I'm really excited to show you the new version as soon as Apple approves it.
I've still to mess with this app, but it sounds a lot like Seeing AI, which is a great thing, btw! One great use for apps of this nature is actually to scan monitors and TV screens for updating content when playing video games. Tons of blind gamers do this, and it works really well. A great thing would be if you added a specific gaming channel that could be used to facilitate things like reading menus and determining which text is highlighted on screen. These apps usually just scan the entire screen at once, which is great, but when you're trying to figure out the status of a button or checkbox, or figure out which option has focus, it can be rather difficult. I'm not sure if something like this is even possible, but it'd be amazing if you tackled this problem. It would make us mainstream blind gamers very happy. Keep up the great work!
Hi Karthik! This sounds like an outstanding app! But I have a curiosity question.
How does it differ from Microsoft's Seeing AI app?
Our new update is finally out. We have made major improvements to our text reading experience based on the feedback we received here. We have now also incorporated a live text and a document text reading option. There have been several major UI improvements as well. Please do give it a try and share your feedback as always.
Hey Karthik. This latest update of Envision AI shows a whole lot of promise. I like how the live mode feature is laid out, with beeps that, as I understand it, guide you toward the text, which is then read instantly. Unfortunately, when I tried it on both my microwave oven and the stove, it had great difficulty distinguishing between numbers and other characters. This could, however, be user error rather than the application's fault. Have you had success reading LCD displays or the like using the live mode of the app?
Anyways keep up the great work!
Hi again Karthik,
Thank you for the notices concerning the new update. I've just downloaded the new version and I must say I'm a little confused.
So, here are my observations after a couple of minutes of using the app:
1. I am confused when I investigate the screen as the actions appear to be rather random over the bottom half of the screen.
2. I clicked the new button for live mode (it said 'off' before I double-tapped) and I got a ticking sound. My son says that the screen is showing the camera in live mode, but I do not get anything spoken aloud, although some text is at the bottom of the screen. However, it is gibberish.
3. I cannot seem to turn off the clicking sound in live mode so I've closed the app because it is annoying.
4. I cannot seem to find settings anymore. My son said that there is a settings button but the face learning button seems to be on top of it as I only get that spoken and not settings.
5. I have not tried the document reading action yet as I cannot stand the ticking noise.
So, could you post some instructions about what is in the new update and describe what we have to do?
As I understand it, and please correct me if I'm wrong here, Karthik, the ticking sound is a guide to where the text actually is, at least in live mode. I personally find this very helpful, since I am bad at taking photos in general, so this guide is very useful for me.
I tried reading the display of my Zoom H2 digital recorder, which I hadn't done anything with yet because of the lack of accessibility in the device. Now I can get at the menus and even change settings because of the app. Great work!
I also made that assumption but I cannot find how to stop live mode. I've tried double tapping on the live mode button again but that doesn't stop it and even if I go into another mode like the learning mode for faces, the app still clicks.
Also, what about the voice when you actually find something to read? As I said, the app did not speak at all, but I found some text (I don't know where it came from, maybe from a digital panel, as I was moving the camera around my living room), and the voice did not read it until I placed my finger in the text area and moved it down.
Hi there, so I tried to use the latest update and the live text mode isn't working. I have an iPhone 7 running, I believe, iOS 11.1. What happens is I point it toward the text and it doesn't do anything but beep at me. I do want to say thank you for developing an app like this; it's a really neat idea. Thank you.
Thank you for your feedback about the update. We're working on improving the Live Text mode to read LCD screens better. That said, it should be able to read iPad or computer screens without much effort.
Thank you for trying out the latest update.
1. With regards to your first point, could you please elaborate a bit more on what you find random? Is the layout not right? Can you please let me know what device you're trying this on, so I can rectify it?
2. This seems to be a bug, unfortunately. And I will have my team go over this immediately and push a fix. I know how annoying the beep can be. Rest assured, we're on it as I type this.
3. The settings button is still there in the app and is the first button from the left. With VoiceOver, it reads "Train Faces and Objects Button". Do let me know if this answers your question. If possible, please do send me a screenshot of the app at firstname.lastname@example.org and I'll take a look right away.
4. Please do give document reading a shot and let me know your thoughts.
Thank you so much for trying out the new update. With Live Text, positioning the text correctly can be tricky at first, and we will improve this in the next release. For the time being, you can switch to the document/long text mode in case you're not able to read the text initially with live text.
Please do let me know your thoughts on the update as you continue to use it.
We've released another update that fixes the annoying beep issue and also the ambiguity around whether live text mode is turned on or off. The update is in app review right now. Though we've requested an expedited review, since it's Thanksgiving weekend in the US the update might take a couple of days to reach you.
Please try out this release and let me know your thoughts. Also, we noticed that at the moment live text works best in Portrait rather than Landscape; this will be resolved in the next release next week.
I'll be around here to respond to any questions you might have.
Just a few things that I have observed in the recent update.
1. There is an element that is not labeled just past the "Read Document" button. Just swipe to the right and you should hear a click but nothing is spoken.
2. If I train Envision AI with an object, I can't seem to figure out how to remove it from the library. Go to "Train Faces and Objects" -> "Open Library", and I am presented with the items I've trained. How does one go about removing something from here?
I think I may know what that unlabeled element is: that's where the text that Envision catches shows up.
The second question is something I would also like to know about, only I can only add five images for training and then there's no button to continue with whatever I am supposed to do. I only get a restart button that doesn't seem to do anything, and a back button. I must be missing something obvious, but I don't know what.
Regarding the beeping sound in live text mode, isn't that supposed to be a guide to help us zoom in on where the text actually is? I don't understand where the bug is.
Secondly, on a positive note, I have been able to use the live text mode on the display of a Zoom H2N handheld digital recorder, and the results I got from using the app on that display are the best I've ever had. It blows Talking Goggles out of the water. But if the beeping sound isn't meant to be a guide to where the text on a display is, then I would at least like there to be an option for a guide that beeps when it thinks it has encountered text, and the nearer one comes to where it's readable, the louder or maybe faster the beep would become. I have trouble directing the camera on the phone to the right spot, so that feature would be a godsend for me. It could be a toggle in the preferences, so that those who don't want the guide could turn it off.
You're right, the unlabeled element is indeed the place where the text shows up. Maybe we should label it the first time someone swipes onto it. With regards to your second question: if you swipe right from the take photo button, you'll find the "Done" button. This button should work once you take 5 or more training images. Please let me know if this works for you or if it's still confusing.
The bug with 0.1.3 is that the beeping sound is heard even after the live text button is turned off. The suggestion of a beep sound as a way to confirm the presence of text is definitely interesting, and we'll look into it for our next release. When we started working on Live Text mode, we knew it would be extremely useful for our users, but the complexity of it is such that we'll need a couple of iterations to get it fully right.
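One way the suggested beep guide could work, as a hedged sketch under assumed names (nothing here comes from the app itself): map the fraction of the camera frame covered by detected text to the delay between beeps, so the beeps speed up as the camera homes in on the text:

```python
def beep_interval(text_coverage, min_interval=0.1, max_interval=1.0):
    """Map the fraction of the frame covered by detected text (0.0-1.0)
    to the delay in seconds between beeps.

    More text in view -> shorter delay -> faster beeps, so the user can
    steer the camera by ear. Inputs outside [0, 1] are clamped.
    """
    coverage = max(0.0, min(1.0, text_coverage))
    return max_interval - coverage * (max_interval - min_interval)
```

A rate-based cue like this tends to be easier to follow than a loudness-based one in noisy environments, though either could be offered behind the preference toggle the commenter suggests.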
We'll keep everyone posted when 0.1.4 is out, and in the meantime please let us know any other thoughts you might have on the current update.
I have been experimenting with this app for a couple of days now and I have to say I am impressed. I look forward to future developments. I would like to congratulate you on making this app available to all of us rather than to a few countries as some have done in the past. Keep up the good work.
Thank you for confirming the presence of a bug, and I look forward to the latest version coming out, perhaps today or tomorrow.
The only thing I thought about the layout is that it does not look like that of a standard iOS app, and thus it is a little unfamiliar. I am a stickler for standardisation, and I therefore expect controls on the top left and top right, with the actions such as the face teaching and document reading features along the bottom. This would then leave the whole screen for text, apart from the take-picture button, which could be moved from the middle to just above the row of actions at the bottom. This is only a personal preference, but I'd like to know what others' preferences are.
As others are aware by now, the Seeing AI app is available in the UK. However, I am not a fan of its selection area, where you slide your finger up and down the box to select actions. I much prefer the Envision method. However, I still like the standard layout of buttons along the bottom of the screen.
Thanks for giving us a shot. We're definitely working on making it a lot better in the coming weeks. We're also adding as many languages as we can to our text recognition feature. Our aim is to make the app available across as many languages as possible.
Please do keep using the app and letting us know your thoughts. We're always around to listen.
The new update just got approved by Apple, and I hope it solves the annoying beep bug. We're still working on live text to iron out some issues that might crop up when it is used in landscape, so for best results please use it in Portrait for now.
We'll have another release later this week that should iron out all remaining issues with live text so it works consistently well in any setting. In the meantime, please let us know about any issues or suggestions; we're all ears and will include them in our future releases.
I contacted the developer and have not received a reply yet. The app is crashing on my part and I need troubleshooting for the issue. The app has not been updated for a while!
This is Karthik from Envision. I am super sorry that you are experiencing issues. We would love to reach out immediately and help troubleshoot it. Could you send out an email to email@example.com and tell us the issue. We will ensure this is worked out on priority.
Overall I'm really enjoying this app. I love that it has a bit of everything, including magnification. I actually find it recognizes text on screens a bit better than Seeing AI, which is really helpful for me. I know you will continue to make the app better, and I appreciate the frequent updates. I look forward to seeing what you have coming next. One request I would like to make, if I may: will you please optimize for iPads as well? It still works pretty well on them, but you only really get a phone-sized image to work with, which is unfortunate.
Also, I'm wondering about some form of image stabilization. It seems like if you have even slightly shaky hands, scanning isn't as accurate. That's true no matter which OCR app a person works with, but I wonder if image stabilization might help alleviate that problem. Either way, great app.
Thanks, Remy, for the encouraging words! Yes, there is serious work being done in improving our iPad version. We had our resources spread a little thin with focus on our Android Beta, but now that we have that up and stable, we are back to being all hands on deck with feature improvements. You will see more frequent updates in the coming months.
Thanks also for the feedback on image stabilisation. It is something we will look into and explore whether it could be used to capture clearer images while performing OCR.
This is Karthik from Envision. Thank you for being a community that has been supportive of our efforts and a solid source of encouragement when we were starting out a little over a year ago. Hence, I am thrilled to bring two pieces of information to you today:
1. We just kickstarted our super summer sale today which means we are slashing prices across all our plans. The monthly and annual prices are now down by 50% and the Lifetime plan is down by 75% (actually 78.5% to be more precise). This is mainly done to reflect some of the new cloud credits we have been offered and also to validate certain aspects of our pricing strategy in the long term.
2. We know that many of you who were early adopters of the app have probably already exhausted your 14-day free trial. Hence, if you are curious to see how the app has evolved since and which features have been added or improved, we have provided an extended 14-day free trial to all our existing users. That way, you can see the latest version of the app before deciding whether you would like to dip into the summer sale.
I'll be checking in and replying to comments all day today, so please feel free to ask for any clarifications or provide any feedback that you may have.
Thanks and regards,
Karthik from Envision