In this episode of our podcast, Thomas Domville demonstrates the power of the Virtual Volunteer feature, set to come to the Be My Eyes app in late Q3 2023. Currently in beta testing, this feature, powered by OpenAI's GPT-4 model, has the potential to be a game changer for people with visual impairments. It offers a virtual sighted assistant that can generate context and understanding for images, allowing for a greater degree of independence in everyday tasks.
During the episode, Thomas showcases a variety of real-world use cases for the Virtual Volunteer, including identifying clothing; getting information from food packaging; describing greeting cards, photos from your photo library or places such as Facebook, and weather maps; reading restaurant menus; and more.
We thank the Be My Eyes team for allowing us to record and share this demonstration of the Virtual Volunteer.
I am so excited for this.
Great demonstration, and it's only increased my excitement. Even if this only provides what you demonstrated, it will be life-changing, but obviously it's going to be capable of so much more. Another reason I'm so excited for this is that it's on the cutting edge of technology, with vast amounts of money being spent on development that we will benefit from. It won't just be released and remain static. I can honestly see a time in the not-too-distant future where this app is video-linked to AI for live assistance, with the benefit of geographical and contextual understanding. It's a whole new world. I'd love to be on the beta program but don't have any contacts.
I think anything like this…
I think anything like this is very exciting. However, as with everything tech-based, we need to not rely on it. Batteries fail, apps crash and, let's face it, we sometimes misplace our phones. I'm all for this shiny new tech; however, we mustn't get lazy and become reliant on it. We need to be able to function without it.
This is kinda my standard warning for any new tech. Low tech, a cane, the courage to experiment, to accept the help of strangers, is still important.
I will be interested to see if this is in any way ported to the upcoming Apple headset. Waggling a phone about always seems so dumb to me.
I do not think I will use it. Here are my reasons.
First of all, all of you can go around saying, oh, artificial intelligence is so good, but think of the negatives. Hypothetically speaking, and socially speaking, this is a massive mess-up. Artificial intelligence is still new. Let me give you an example: say you're a new user to Windows, and your device gets a blue screen of death. You contact the artificial intelligence service, and it gives you some wrong information. You try its advice, but it just makes the issue worse. I will wait until 2025, when artificial intelligence is better at understanding commands.
Wait if you want, but you will be missing out
Wait if you want, but you will only be hurting yourself. Is it perfect? No, it will never be perfect, but it is a tool in the toolbox that you should use.
Great job, but you did not show us how you got a Facebook picture described! It's a nice tool to have. I will consider it, but it's more likely that I might use it just for pictures. I would never use any of them to locate things.
Signed up for the beta program a while back
I signed up for the beta program a while back, and it still says "Virtual Volunteer access pending." I realize that the rollout will take some time; however, I cannot wait to try this out.
This may have to wait until the beta period is over, as there may be restrictions on what can be reported publicly. However, I think someone should perform a demonstration in which they visually check all of the descriptions for accuracy and comment on the errors made, presenting the system with a variety of images.
My experience of dialogues with GPT-4 and Bard is that the errors can be subtle (i.e., the output seems plausible but actually contains inaccuracies), and the image recognition application of the technology may well be similar in this regard.
I'm impressed by the technology, but I wonder to what extent it generates apparently plausible but false statements in answering questions about images. Obviously, the intended users can't visually inspect the images and thus aren't well placed to detect such issues. Reducing the error rate will undoubtedly be a priority for developers of these models, and in some respects it's probably still an unsolved machine learning research problem.
Having listened to the podcast, I note that all of the descriptions were verified by a sighted person, so they are good examples of what can be done (under the best of circumstances).
Perhaps a "send this image and description for human verification" button would be a useful feature for use in circumstances in which one doesn't need a live conversation with a volunteer but still wants the information checked.
Re: I don't think I will use it
I'm guessing that you haven't tried any of these AI services yet. Your opinion might be different if you had.
First, one good reason to start using such services is that, while they are new, they are especially looking for feedback on what they got right, what they got wrong, and the user experience. Such feedback always helps systems be better, particularly if given while the systems are in development or beta.
Secondly, these AI services can actually be quite useful when used properly and with an understanding of their limitations. Will they sometimes get things wrong? Yes, of course, just as humans will, or just as any web site can give erroneous information. It is still up to us humans to vet those answers and make those decisions.
Third, these systems will only get better and more useful as time goes on. Like it or not, major companies are starting to integrate AI services into their workflows and are already seeing greater efficiencies. I've used several of these AI tools to write lyrics for songs, poems, first drafts of stories, podcast summaries, etc.
Anyway, my suggestion is, "Try it...You'll like it!" as they say. One thing I particularly like about this capability being integrated into Be My Eyes is that I don't have to bother a human to tell me what's around me. Somehow being more anonymous feels more private.
That was mentioned in this podcast. The current AI state is not perfect, and yes, it does at times give you inaccurate information. That is the nature of AI in its current state, which is why I'd imagine Be My Eyes wants to wait a bit and let the beta testers hopefully help make the Be My Eyes Virtual Volunteer a bit more accurate. Plus, it will give OpenAI some time to mature and get more accurate as time goes on.
I agree that this is just a tool to help you with your daily issues. Nothing will ever replace knowing how to do things without technology. This is just to assist you, but it is going to be a powerful tool to use alongside your own knowledge of working things out. One more thing to add: within each response you get from the Virtual Volunteer is the option to call an actual sighted volunteer in case you need a second opinion. The Be My Eyes team is doing a great job of making sure you have that option when you are unsure whether the information is reliable. There are also times when the Virtual Volunteer says "I think," or "I can't get that answer for you; please try using the call-a-volunteer option." Meaning, if you do get a screen of death, you just need to use common sense. As for the worry that it could get you into more trouble when you have a screen of death: I would simply use the Microsoft service within Be My Eyes to reach the technical desk for help, but at least you know from the conversation with the Virtual Volunteer that you do indeed have a failure on the screen. This isn't meant to step you through a process. At least not yet. ;)
I am shocked at how much technology has evolved in recent months. It is truly impressive that things like these already exist, and I was expecting it to take 5, 10, or 15 years before they became a reality.
This is definitely the career I intend to pursue in the future. In fact, I'm already in my final year of high school studying a professional course in computer science, and I plan to attend university. Here in Portugal, the field of artificial intelligence still has a lot of room to grow, but I have hope that someday I'll be able to work on a project similar to this one and benefit our community. That's my goal, and if I can't achieve it, at least I'll know that I tried.
I almost wet my pants
I wonder if there's a way to get on the beta program with the understanding that accuracy is obviously not going to be at a level they consider ready for release? I am so excited about this technology and would love to be a small part of bringing it into everyone's lives.
So excited for this!!
I can't wait to be able to beta test this!!!
I have used AI services before
I've used ChatGPT, ChatOn, and some other AI services.
This is a really interesting point about plausible errors. The way that ChatGPT works is all about the plausibility of the output. As far as I understand it (which is no doubt not fully), ChatGPT produces probable answers; hence, any answer it gives is the most likely one.
This is, in many ways, a shortcut to truth... as we've all seen over the last few years. The trouble is we don't have the autonomy to assess the probability ourselves and draw our own conclusions. Daft errors, such as Seeing AI calling carpet "grass", tell us that the output is a false positive. ChatGPT, on the other hand, will take the context into account and assume that grass and a TV are unlikely to exist in the same environment. Trouble is, there are going to be fringe cases where an AI gives us a quite reasonable answer that, in fact, could be just as wrong as Seeing AI calling carpet grass, the troubling difference being that it will be incredibly difficult for us to spot the lie.
This isn't specific to Be My Eyes; rather, it is an issue with such large language AI systems in general.
So, back to the original drum I was banging: it is certainly a tool we "can" use, adding it to our other skills, but I think it should always be treated with a pinch of caution... for now at least.
Don't get me wrong, I'm absolutely jonesing to have a play with it. As I get older, however, my concerns grow about reliance on technology. It's made our lives so much easier, true, but, coming from the UK, and for those of you in the US, we know what sorts of pickles plausible inaccuracies can get us all into.
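The "most likely answer" idea from earlier in this comment can be sketched in a few lines of Python. This is only a toy illustration with made-up probabilities, not how GPT-4 is actually implemented: real models score tens of thousands of tokens with a neural network, but the selection step at the end works in this general spirit.

```python
import random

# Toy next-token distribution: for a given context, the model assigns a
# probability to each candidate continuation. These numbers are invented
# purely for illustration.
next_token_probs = {
    "grass": 0.62,   # plausible in many photos...
    "carpet": 0.30,  # ...even when this is what is actually pictured
    "moss": 0.08,
}

def most_likely(probs):
    """Greedy decoding: always pick the single highest-probability token."""
    return max(probs, key=probs.get)

def sample(probs, rng=None):
    """Sampling: pick a token in proportion to its probability."""
    rng = rng or random.Random()
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(most_likely(next_token_probs))  # → grass
```

The point of the sketch: "grass" wins not because it is true, but because it is the most probable continuation, which is exactly why a plausible-but-wrong answer can be so hard to spot.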
I’m less concerned about accuracy.
While improving accuracy is obviously crucial, and 100% accuracy is for now impossible but definitely a worthwhile goal, the way I see it is this: when I point Seeing AI at my food and it tells me to cook it for 25-30 minutes at 2000 degrees, I don't get annoyed that it got it wrong and I don't own a kiln. I am grateful, because now I know to cook my food at 200C for 25-30 minutes. Accuracy on the current suite of apps is so bad, such an absolute total complete F, that I don't even consider it an issue. I'm just happy to get any information at all. Virtual Volunteer is going to give me far more information. Maybe it makes a mistake, but I'm not planning on using it to perform open-heart surgery. I honestly can't think of a situation where a mistake could cause danger. There's a saying about not letting perfection stand in the way of good that I think applies here.

ChatGPT doesn't lie; it's imperfect and makes mistakes. It doesn't know it made a mistake and doesn't have any desire to lie or tell the truth. It is just a vast library of words and pictures with some fancy maths on top. I've heard so many people anthropomorphise ChatGPT and say it's lying, which I guess OpenAI should take as a compliment. Nobody ever said Google lied when it returned junk search results and 12,000 things I should buy instead of the help article I wanted to access, but arguably there's more malice in Google's actions. They aren't even trying to give me the results I'd like, because their advertising team got between me and the engineers trying to give me what I searched for. So far, OpenAI have never done anything but their best to provide the result they think I might like. Puritanical concerns aside.
Well, you know the expression, "garbage in, garbage out". GPT uses data that humans added.
OMG I just realised something.
I just realised where this could go: live audio descriptions of TV shows and movies. I'd already imagined having a FaceTime- or Zoom-style call with it while I go about daily life, being able to ask questions and receive live answers about what the camera sees, like a regular conversation with a person, but for some reason I hadn't realised automatic audio description could be on that path. This could happen sooner rather than later, too, as understanding the world visually is already getting lots of attention, so building an accessibility UI on top of a mainstream resource that's got everyone excited and is receiving a bunch of money really could happen pretty soon. Months to years instead of years to decades, anyway.
We were so lonely that we…
We were so lonely that we built a mirror. Sad really.
I think as long as people manage expectations and don't take the output as gospel, it will be very interesting. It's just not the mind-blowing thing the press and seemingly the rest of the internet are making it out to be. It's an appearance of intelligence with smoke and mirrors rather than something that is actually intelligent.
Oliver's point about subtleties
Interesting point, this one, and I have a good example of it. I was keen to see how helpful ChatGPT would be to my students, so I asked it to tell me about the case of R (Miller) v. Prime Minister. It told me, accurately enough, that R (Miller) v. Prime Minister was a case concerning whether Article 50 of the Lisbon Treaty could be invoked by use of the royal prerogative or whether such a step needed parliamentary approval. It went on to give a decent summary, albeit without mentioning any of the specific judgments in that case and without acknowledging that there were several powerful dissents. Only one problem: that case is R (Miller) v. Secretary of State for Exiting the European Union. R (Miller) v. Prime Minister is about the lawfulness of the advice given by Boris Johnson, then Prime Minister, to Her Late Majesty to prorogue the addled 2017-2019 Parliament in September 2019. When I pointed this out to ChatGPT, it got into a right old tangle about the differences between Miller No. 1 (the Article 50 case) and Miller No. 2 (the prorogation case). It is way off dealing with subtleties like that just at the moment.
On a different note, I hear there was live audio description for the coronation today, for the first time ever. I didn't watch it myself, as I had my other half, who did a splendid job, though I would like to grab it as an MP3 to keep. But the reason I mention it here is: I wonder how the dear old Virtual Volunteer would have described the big moment? Well, the second-biggest moment; the biggest moment was shielded from public view, of course.
Great podcast, but do you have a wine collection? I really am very excited about the possibility of cataloguing my own wine, selecting wines for wine-tasting, pairing wines appropriately and so on. The scanning apps don't render wine bottles awfully well, I find.
Wine Sample for Bingo Little
Hi Bingo Little,
AI: This is a picture of a wine bottle. The label reads "Wild Horse Cabernet Sauvignon" and has an image of a white horse on it. The bottle is dark in color and has a purple cork.
What else is written on the label?
AI: Below the "Wild Horse Cabernet Sauvignon" text, there is smaller text that reads "Paso Robles". There is also a small gold emblem at the bottom of the label that says "Quality Assurance". On the side of the label, it says "Produced and bottled by Wild Horse Winery & Vineyards, Templeton, CA".
What can you tell me about the wine?
AI: Based on the label, the wine is a Cabernet Sauvignon from the Paso Robles region in California. It is produced and bottled by Wild Horse Winery & Vineyards in Templeton, CA. The wine likely has notes of dark fruit and a bold flavor, as is typical for Cabernet Sauvignon wines. It may also have a hint of oak, as the label mentions it is from the "Wild Horse" winery, which is known for their oak-aged wines.
Sounds like a decent Californian red, that one. It didn't tell you the year, but I guess you could ask. Either way, I'll gladly have a glass of that with you. To Their Majesties! May the King live forever! God save the King! And with that, to bed on this momentous day.
Question about pictures on Facebook and Web
Great podcast about the incredible power of this new virtual volunteer feature.
You illustrated several examples of pictures that you presumably had on the web. For example, photos in your Facebook feed, a shopping site where you were looking at a shirt, etc.
Question: How did you feed those pictures to the Virtual Volunteer? Did you take a screenshot and then share the photo, or is there some rotor action for sharing photos in the Facebook or Safari apps?
May I have one please!!!!
This addon is intelligent enough for us.
This addon is intelligent; otherwise it wouldn't be able to answer very specific questions.
If you view intelligence as "can this device think like a human every second of the day?", then of course it's not intelligent yet; it will probably get there one day, though.
The thing is, this is a tool and it doesn't pretend to be anything else.
Is it a super powered AI no, not yet, but I'd love to see this in a video feed and I do think we'll get there one day.
It will be a huge drain on battery but it will guide us like no other app has ever done.
@Dominic, have you checked out the podcast?
This will be able to tell you what your washing machine is set to, what programs are on a treadmill, what those two buttons do on my Roomba that I have no idea about, and more.
Also, I'll be able to view pictures of characters like Ellie from The Last of Us and, oh yeah, Reddit! I'll be able to view any pic I want and get a detailed description of it, unless Reddit really does make you pay for things, which I honestly might still do if that's a thing in the future.
I don't understand why you think this is bad? Or not usable in its current state?
It's up to you if you don't use it but I'd like to know your reasoning.
Don't swallow the hype either way
I've been a ChatGPT Plus user since the first day. The more you use it, the more you appreciate it. It isn't the Terminator, but it also isn't just auto-complete; it's something more.
How much more, who knows, but the future is coming much faster again after all this time wasted, so that's a good thing.
We don't need AGI to have our lives changed forever, just human level computer vision!
Just asked ChatGPT about the Miller cases again today. I'm afraid it's still frightfully confused and ended up telling me that Jeremy Corbyn is the current leader of the Labour Party. However, I then asked it a question about the law school where I work, and it was able to give me a very sensible summary of the courses we offer and accurate information on how to apply. I imagine with the dear old Virtual Volunteer (surely this entity should have a name?) it'll be a bit hit and miss in the same way, but I'm looking forward to using it.
Bingo: it will rain in Northumberland tomorrow.
Virtual volunteer: Good for the crops.
Community: who gets the literary reference? Whoever does is of singularly good taste.
The thing is...
This is using GPT-4, which is much more powerful than 3.5.
The only issue I have with GPT is that, as far as I know, it can't actually go online like the Bing AI, but the Bing AI is using GPT-4, so I don't know what's going on there...
As for mistakes, sure, it'll make them, but I don't think they'll be as frequent as people on here think.
Magic: The Gathering
You mention that you have also tested it on Magic: The Gathering cards. How does it handle the different color symbols that indicate which color of mana is required? And how does it handle the associated numbers?
I know that it should be able to say the name of the card and the associated text, since Seeing AI can do this as well.
Thanks, AnonyMouse, for always bringing high-quality podcast files.
I'm happy you selected a Mexican restaurant menu for this podcast, lol.
I'm curious how this would work to describe a scene; it could give us a broad description of how to navigate an area, even if it is only a general description.
Love it and hope to try it as early as possible.
Greetings from Mexico, the home of enchiladas.
RE: Question about pictures on Facebook and Web
In the podcast earlier, I talked about two ways to use the Be My Eyes Virtual Volunteer. One of them is the Share Sheet. So, let's say you're in Twitter, Facebook, Safari, or any other similar app and you come across an image. You can do a one-finger double tap and hold on the image, which brings up a few options, including Share. From there you can send it to familiar things like AirDrop, Mail, Messages, and other apps you can share to. One of the options is Describe with Be My Eyes. Once you select it with a double tap, it will automatically start the conversation with what it sees, and you can then follow up with any questions you have.
Hope that helps! :)
Using it to see a message on a TV?
If you have a code on the TV, or something you need read on a TV, could this be done? If so, how would you do it?
thanks - was Re: Pictures on Facebook and web
Thanks for that tip. I didn't know about the context menu for pictures in those apps and the web. Very useful.
BTW, as you suggested in one of your previous podcasts, people should try that double tap and hold gesture on more items, like the app icons on the home screen and, as I now know, pictures on the web! In that podcast you also mentioned the triple-finger tap that one can use instead of the one-finger double tap and hold. I'm trying to retrain my muscles to use that triple tap instead of the double tap and hold. Another nice tip!
re: using it to see a message on a tv.
You'd take a picture and ask the Virtual Volunteer about it.
For example, can you tell me if there's a code in this image?
RE: Magic: The Gathering
Here is an example of a Magic card that one of my kids had laying out.
AI: This is an image of a trading card. It is a blue dragon card with the text "Flying - Creature - Dragon" on it. The dragon is shown with its wings spread and it has a golden chest plate. The background is a dark forest with yellow lightning. The card has a power level of 500 and additional text that reads "When Blue Dragon enters the battlefield, until your next turn, target creature gets -2/-0, and up to one other target creature gets -3/-0."
What else can you tell me about this card?
AI: The card is from the trading card game, Magic: The Gathering. It is an English version of the card and the artist is Steve Prescott. The card number is 289 and it was released in 2021 by Wizards of the Coast. Additionally, the card has a rarity of rare and a converted mana cost of 5.
Please read everything on this card for me
AI: Sure. The card reads:
Blue Dragon Flying - Creature - Dragon
When Blue Dragon enters the battlefield, until your next turn, target creature gets -2/-0, and up to one other target creature gets -3/-0.
289 - EN - Steve Prescott
© 2021 Wizards of the Coast"
Additional details on the card include:
- Rarity: Rare
- Converted mana cost: 5
- Released by Wizards of the Coast in 2021
- Artist: Steve Prescott
- Card number: 289
Hope that helps.
I haven't downloaded/installed Be My Eyes yet, primarily due to time constraints, but this sounds way cool. Thanks for this excellent walkthrough. I kinda hate to say this, but it seems like I'm just about the only person in my area who is supportive of all this new and cool technology. My personal assistant, for example, does not like Apple and has been very skeptical whenever I've mentioned any of this to him, almost to the point of not letting me use these things, though I don't think it's quite there yet, and I've been more or less racking my brain trying to think of a way around it. In a sense I understand where he's coming from: he doesn't want to lose his job. But on the other hand, I love all this techy stuff and I have a right to use it. Others around here have been perhaps a bit more understanding/accepting, and my family has of course been very supportive of me using all this stuff.
RE: Magic: The Gathering
thank you AnonyMouse
From the description, I couldn't quite make out what the mana cost of the card was.
For example, it could be 4 blue and 1 neutral.
As I understand it, the colors are not letters, but shown as a color symbol followed by a number.
@Ekaj your personal assistant is going to have to get used to it
Or lose their job. They may end up being let go anyway due to better tech out there.
RE: Magic: The Gathering
Just for giggles and fun, I asked the Virtual Volunteer to let me know what the mana cost is for this card. Here is the response I got.
AI: Yes, there is a small blue circle in the top right corner of the card with the number "5" in it. This is likely the mana cost for the card in the game.
So I pressed for more about the mana cost, and this is what I got.
AI: The mana cost for this card is 5, as indicated by the small blue circle in the top right corner of the card with the number "5" in it. This means that in the game, a player would need to spend 5 mana in order to play this card. Mana is a resource in the game that is used to play cards and activate abilities.
Hope that helps and gives you a glimmer of hope that you might be able to use this to play Magic.
RE: Magic: The Gathering
Sorry for my late response.
I had a sighted person look at the Blue Dragon card, and they got 5 any and 2 blue out of it.
It's very possible that we were looking at a different version of the card, which could explain the different result.
I assume you also cross-referenced with a sighted person?
Otherwise, would it be possible for you to try with another card?
Sorry, I just want to be absolutely sure that it's accurate.
Sorry for bringing up the topic again, but I think it wouldn't make sense to create another one since this one already exists.
I have started listening to the podcast again, and I have a few questions.
Firstly, and most importantly, for those who are already enjoying the tool: How long did it take, on average, for you to gain access?
My second question is, how are the commands given to the virtual volunteer? Since the podcast only provides answers, I was wondering what the user interface would be like. Is it similar to a virtual assistant, such as Google's, for example?
Lastly, does it support multiple languages, or just English at the moment?
Thank you in advance for the answers.
Interface and languages.
The interface is very simple. You first upload a picture, which Virtual Volunteer gives you a description of, and it tries to guess what you might like to know; there's then a text box you can either type in or dictate into. You then send the message, and in a second or two you get the reply spoken by VoiceOver. At that point you can either upload additional pictures or ask it questions about the picture you already uploaded. The conversation builds, and it remembers your previous requests and its replies, so there's always context until you close the window. As for languages, it is capable of speaking in other languages and can in fact translate as it goes; it seems to know languages it wasn't even trained on, which surprised the devs.
Thank you very much for the response!
I thought it was something like a voice assistant, with a microphone and such, but it turns out it's a text box, which I consider much better for privacy concerns, and for not asking the whole world for a description of someone in the photo.
Regarding languages, it's not that I don't understand English well enough, but having feedback in Portuguese ends up being much better in terms of formulating questions in a way that can be concise and clear, both for me and for it.
How long will it take to start granting access?
Once again, thank you for the response.