Picture This, or Maybe Not: a Review of the New Image Description Feature in iOS 11
This post has been updated to include tips for using the image description feature, based on feedback received from the AppleVis Editorial Team and members of the AppleVis community.
When I first heard at WWDC in June that VoiceOver was going to include a new image description feature, I was excited, to say the least. I thought to myself: finally, I will be able to know what all of those funny memes on Facebook say! Yes, I know, a super productive application of the feature, right? Well, when I finally got my hands on iOS 11, I was very disappointed. Below is a summary of the new feature, as well as my experience putting it through its paces over the past week on my iPhone 6.
Using the Image Description Feature
Apple has made it very simple to access image descriptions without having to import the image into another application. When you encounter an image that you would like described, you simply tap once with three fingers, and VoiceOver speaks a general description of what is in the picture, along with any text it detects. The only improvement I would like to see to this gesture is the ability to perform it one-handed, as the three-finger single tap can be a bit awkward.
Image Descriptions in the Photos App
The first place I test drove this feature was in the Photos app. Prior to iOS 11, VoiceOver attempted to guess what was in each picture as you browsed through the photo library. Oftentimes you would hear whether the image was sharp or blurry, whether there were faces in the image, and descriptions of possible objects in the image, such as cars or animals. In iOS 11, it appears that this behavior has been completely replaced by the image description gesture. As I scrolled through my library, the only descriptions I heard were whether the image was sharp or blurry, with no mention of people or objects in the picture.
When focused on a picture and using the three-finger single tap, I received only minimally more feedback: again the sharpness and brightness of the picture, plus the page the picture could be found on. Many of the pictures did contain text; however, more often than not, VoiceOver did not detect that any text was present.
Image Descriptions in the Facebook App
The next place I tested this feature was on Facebook. So many pictures, many of which contain text, are shared on this platform daily, and I was really intrigued by the idea that I could finally have access to all of the funny and sometimes thought-provoking images that my friends and family share. I promptly scrolled through my News Feed, of course not finding any images immediately (because this is always how it goes when you want to find photos). I finally found a photo that a friend had shared. I was so excited; the Facebook alternative text even said that the image may contain text. "Yes!" I thought, I can finally be part of the conversation. I double tapped on the status update that contained the photo, scrolled to the image, invoked the three-finger single tap, and... all I got was that the image was sharp. I was so disappointed, especially after all of the hype that this feature received. Thinking that this surely must be a fluke, I continued to scroll through my News Feed to find another picture. Time and time again, when I encountered pictures on Facebook, I went into the status, found the picture, and almost every time was given even less description of the photo from VoiceOver than the alternative text that Facebook already provides.
And What About GIFs and Images in the Messages App?
The final place I tested the image description feature was within the Messages app, in the Images iMessage app. Last year, when Apple launched iMessage apps, including the ability to search for and send GIFs and images, I was very disappointed that no alternative text or descriptions were built into the native app. When Apple announced the new image description feature, I thought that surely it would work amazingly in their own native app. Once again, I was disappointed. When scrolling through the list of potential images and invoking the three-finger single tap to access descriptions, I was again provided with minimal useful information about the image. I was not even told whether there were people or animals in the picture, and there definitely was not any text extracted and spoken from any of the images.
Tips for Optimizing Image Descriptions
Although the image description feature is not perfect, there are some settings you can adjust to increase the likelihood that VoiceOver will provide accurate descriptions.
Firstly, this feature will not work well with the Screen Curtain turned on. If the Screen Curtain is on, you will likely hear a description indicating that the image is dark or blurry. Secondly, your screen brightness must be turned all the way up to 100%, especially for images that contain text. I performed more testing after the official release of iOS 11 and found that the descriptions improved significantly after making these adjustments.
As you can see, I had very high hopes for this new image description feature. As many of you know, I absolutely love Apple; however, they definitely missed the mark on this accessibility feature. The descriptions provided are of no assistance to a blind or visually impaired user, and the claim that text will be described is simply unreliable. I understand that there are many different types of typography and layouts that may affect how well VoiceOver can read and describe an image; however, the feature is so unreliable that, in my opinion, it is completely useless at this point in time.
I will definitely keep an eye out for future updates in the hope of seeing improvements to this feature. For now, though, if you were hoping to upgrade to iOS 11 for this feature, I would recommend waiting until some additional accessibility bugs and improvements can be addressed, as there are not many other significant updates and improvements in iOS 11 this year.
Have you tested out this feature yet? What was your experience? Share in the comments below.
I just tried out the feature with the Screen Curtain on, and surprisingly I received poorer results. Images that were previously described as sharp with the Screen Curtain off were described as blurry with it on. Maybe there is a trick to getting this feature to work, but I have not found it yet. :)
I have tried this on various screenshots within the App Store, and I think it did a great job describing text within the image. I have also managed to use it in apps that do not have labeled buttons, and it has read me the name of the button in question. I have not tried this extensively, but so far I am able to use it to get more info about things than before.
Thanks for this excellent blog post Serina. I don't currently own an iOS device but am planning on getting an iPhone for Christmas this year. The image description feature sounds great, but based on what I've read thus far about iOS 11 accessibility I think I'll probably get an iPhone running a prior version such as iOS 10. I, too, love the Apple products which I currently own but it seems the accessibility people at Apple have a little more work to do before a new iPhone user like me can install this new update. That said, I am looking forward to updating my Mac next week or shortly thereafter due in part to this improved image recognition.
I have only briefly tested this feature, and was confused about how it works. Therefore, I tested it together with a sighted person and figured out the following:
The feature does not seem to do straight OCR on the picture. Think of it more as a combination of OCR and image description.
First I took a screenshot of my home screen, which worked great. Then I tested it with a lot of photos from Facebook, which gave only poor results. Then I asked the sighted person to take a picture of some random text, which totally failed: VoiceOver recognized that there was some text in one of the photos, which was true, but it didn't read any of the text at all. Then I did something by mistake which gave me some very interesting results:
My goal was as follows:
1. Take a screenshot of our conversation in the messages app.
2. Let VoiceOver do some OCR, description, or whatever it does.
3. Send the screenshot in an SMS to the sighted person, so she could see what I was trying to recognize.
4. Copy exactly what VoiceOver said right after it recognized the screenshot, by tapping four times with three fingers.
The result was as follows:
Without noticing, I took the screenshot without hiding the keyboard in the Messages app. Therefore, the keyboard and only a very small portion of the conversation were on the screenshot.
However, what Voiceover read out to me was:
1. The person's name, which wasn't shown on the actual screenshot.
2. Some random parts of the conversation which weren't on the screenshot.
3. It didn't read one single letter from the keyboard which covered half of the screenshot.
I called her to get those descriptions, and I was, and still am, very puzzled by this behavior. The only explanation I can come up with is the following:
VoiceOver is using a mix of OCR and descriptions to describe what's in the photo. Maybe this description feature can process a bigger part of the photo than a standard iPhone screen can show, I don't know. I'm still puzzled about where VoiceOver gets information that is not shown in the actual picture. Therefore, I think this is a description process, and not an OCR process. Or maybe I'm just confused and totally wrong. :)
I've had the best luck with this feature where no alternative text is provided. For example, if I'm on a web page and I encounter an image with no label, this feature will often extract text if there is text to be found. If, on the other hand, the image has a useless alt tag like "img000202_ggk", the feature will not activate its OCR routines. I think Apple needs to tune this so that OCR will still activate even if a label is present. I've had the same result within applications: completely unlabeled controls get recognized, whereas controls with uninformative but still present labels do not. Clearly, priority is given to internal labels, whereas it should be given to OCR, since it follows that we'd activate this feature when what we're hearing is not enough.
The best I've seen it do was recognise an App Store screenshot as a television. I also had a few very small spurious attempts at OCR: one in Crafting Kingdom that just said "City", and one in a public transport app where it read out the text on a ticket normally displayed as an image. I was just about to complain about how this feature doesn't work, but I gave it another go, and I can't stress how important it is for you to turn off the Screen Curtain. It makes a massive difference. I took screenshots of an email from Duolingo and the game board of Word Search War, and I also had an image of my home screen. VoiceOver was able to pick up bits and pieces from all of them. Then I went ahead and brought up the App Store and skimmed through some screenshots. Not all of them worked, but a few did. As a final test, I brought up Twitterrific and switched it into Centre Stage, which extracts all tweets with attached images and puts them in fullscreen. It picked up that someone had taken a picture of a nightclub, and correctly recognised a few blurbs from iDownloadBlog articles about iOS 11, i.e. what was recognised as a sign which said "QuickType: type like a pro 1-handed", the image of a whip with text advertising the movie "Whiplash", and it correctly detected that there was a chart under an article talking about Valve adding charts to Steam reviews. So, the takeaway from all this is: don't expect this to replace your KNFB Reader, but certainly give it a try if someone sends you a picture or screenshot. It works surprisingly well for something done locally.
Finally, in addition to turning off your Screen Curtain, you want to make sure the image fills as much of the screen as possible. The text in screenshots I had in Photos and on App Store pages was only detected after I tapped on the thumbnails to have the images fill the screen; the same goes for recognising details in photos. If you just have it detect a smaller version of the photo, it'll probably pick out faces, but don't expect to get any more details of what may be in the background.
That is super interesting, thank you for sharing...hopefully this will get more consistent in the next update.
Although I have found the results vary, you do have to have screen curtain off for this feature to work optimally. This has been noted by beta testers and in articles/publications. Good luck!
That is definitely very interesting... There were times in my testing on Facebook where, with some alternative text present, some text was extracted and spoken, but this was very rare for sure.
I used it with the screen curtain off and it works for me. I haven't tried the Facebook app yet.
Tried it last night, and I didn't have much luck. I'll try turning on my screen today and let you all know how I fared.
With there being such a mix of experiences with the image description feature, we reached out to Apple for clarification of some of the issues mentioned in this thread; and here is what they told us:
Firstly, the screen brightness level does not affect how image description works.
This was somewhat surprising, as it directly contradicts the experiences of some on here. However, it also supports the experiences of others, for whom screen brightness has not affected performance of image descriptions.
Secondly, the Screen Curtain does currently need to be disabled for image description to work consistently, although Apple says that when you turned the Screen Curtain on can be a factor here. Specifically, image description is likely to work if you turned the Screen Curtain on after opening the app containing the image that you want described.
However, some testing of our own suggests that things may be even more inconsistent.
For example, we carried out some testing with an image in Twitterrific.
With the Screen Curtain off, we got the following description of an image:
sharp, , Possible Text, iPod 21:26, Q Search, PLEASE HELP BY SHARING MY POST, area with another girl who was found in, top and dark grey leggings and has no coat, or mobile with her., She is extremely vulnerable and at risk and, depression and epilepsy since Sunday., She may have tried to get to the Luton area, where her fathers relatives are but she, doesn't know where they live. After, Monday morning., If anyone has any information please contact, Bromley missing persons unit, GIF, page 1 of 1 Centre of screen,
As you will probably agree, this was a great result, and a good demonstration of the potential value of this feature.
We then turned the Screen Curtain on with a three-finger triple tap. Immediately afterwards, we performed a three-finger single tap on the same image. We got the same description.
So, in this scenario, whether screen curtain was on or off made no difference.
However, things quickly deteriorated.
We allowed our device to auto-lock. After unlocking it again, the screen was still displaying the same image in Twitterrific. Now, a three-finger single tap produced the following feedback:
very blurry, , page 1 of 1 Centre of screen
Yuck! Where did that great description go?
We tried toggling the Screen Curtain off; on; off; on ... and always the result was the same.
We tried other images in Twitterrific, and repeatedly got the same type of result. The only variations were in whether the image was reported as “blurry” or “sharp”.
But, then we manually locked the device; came back to it a little later; and the image description was back to being present and great.
We tried other test scenarios, such as closing Twitterrific via the App Switcher, and then re-opening it ... with the screen curtain off; with the screen curtain on. We locked and unlocked the device, we moved around in Twitterrific before returning to our test image.
We won't detail all of these test cases, but the performance of image description was consistently inconsistent.
The bottom line is that image description was very inconsistent, unreliable, and generally frustrating.
But, going back to that response we received from Apple ...
Apple is currently working to address these inconsistencies, and a future iOS update should allow you to use image description with the Screen Curtain turned on.
So, watch this space for the next iOS release.
In the meantime, we can only suggest that you persevere and experiment, as our testing does suggest that image descriptions can be really good.
Hi, does anyone know how the software gains its info, i.e. which OCR engine, etc.? I ask because, whilst the descriptions are random, even when it finds something, it doesn't necessarily read all of the text. For example, I just looked at a Facebook post which was a joke. It told me there was a clear image, possible text. Then it gave the first line of the text. That was it. So does that mean that a sighted person would only see the one line? I doubt it, so I don't get what it is looking at.
Hi all, what I was getting at was: if it detects a sharp image, then why does it not read all of the text it finds? I've seen a few text images where it just seems to randomly read the start and then stops.
Hi all. First, I didn't have great luck with the description feature. I'd also definitely like the gesture to change. As it is, Apple just piggybacked onto one of their own gestures. What's wrong with a two-finger triple tap, or a one-finger triple tap? At least those you can probably do one-handed. I'll give up on the description feature until we have better results. It's not worth the frustration with all those memes.
Hi all, I too have not had very good luck with this feature. Nine times out of ten, all I get on Facebook pictures is "sharp", or "blurry", or "page one of one, center of screen". No great descriptions.
I have had some success with Facebook memes. I tap on the post in my News Feed, then find the photo. In the actual post, I do the three-finger single tap, and I usually get a similar description to what Facebook said; and if there is text, I can usually read some, if not all, of it. This is a big improvement over knowing text is present and having no clue what it is unless I jump through hoops to get the photo into another app.
Is it great? Nope. Is it better than what I had? Yup. Am I hopeful it will get better? You bet.
Hello all! I have a question I've been wondering about. I'm still on 10.3.3, listening to my sensible side. lol I am, however, excited to get this update for the image descriptions and maybe a few other things. Does anyone know if it will describe GIFs or not? I'd love to be able to enjoy these as much as my sighted peers do. And I know that there's an app out there for that, but a girl can dream, right?
Hi Dawn, to my knowledge, no, it does not. I had the same hope. However, in Facebook's GIF picker you still only get "GIF" stated. Double tapping posts the GIF without you knowing what it is, and the three-finger single tap did nothing when I tried it. It may be something for the future, though.
I tried this feature on several photos in my Photos app, but the only thing I hear is that the photo is sharp or blurry and which page I am on. No image description at all.
Apple announced this feature as an accessibility feature. In iOS 10, I had very good image descriptions in the Photos app; in iOS 11, I have no descriptions at all. I really do not see what has improved here.
OK guys, in iOS 11.2 I did find an improvement: text found was read out, and it appeared to read all of the text. However, in 11.2.1 it seems to have gone back to the bad old days of either not reading anything or reading just bits of what it sees. Not good, really.
Has anyone been able to once again get descriptions of pictures in the Photos app like there were in iOS 10?