New AI app for describing images and video: PiccyBot

By Martijn - Sparkling Apps, 1 March, 2024

Hello guys,

I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I have earlier created the app 'Talking Goggles' which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!

Thanks and best regards,

Martijn van der Spek

Comments

By Martijn - Sparkling Apps on Friday, August 8, 2025 - 10:29

Blindpk, I have added GPT5 to the model list as well. But be warned, it is really slow at the moment, possibly because of the first-day traffic. For practical use, picking the nano model probably makes more sense for now.

By Winter Roses on Friday, August 8, 2025 - 12:14

So I have tried out the copy-paste feature from Facebook, and it works pretty well. The only caveat is that whenever I allow the option, if I want to copy and paste a second image, I have to click the allow option again. Is there a way to optimize this feature so that I give the permission once and don’t have to repeat it?
Regarding the GPT-5 model and description, yes, I have to say that it's quite snappy and efficient. I do notice that the initial descriptions are shorter: initially it gives me a summary of what’s in the pictures and video content, but if I send out a message, I can ask for more specific, detailed descriptions. I don’t know if you’re able to make it so we can get the long description without having to prompt for it. Right now, the model responds faster, but, yeah, the description is shorter on the first attempt. It's perfectly fine the way it is though.
I don’t know if I mentioned this in my last feedback, but it would be nice to have the magic tap to stop and start the description. Sometimes, when I’m dictating a message into the box while asking a follow up question, if I double tap, the voice starts speaking out the description. I’m not sure if this is something you can change so that I can dictate without the speech starting again, or maybe I should be pausing it. Either way, having the magic tap to stop and start the description and ensuring that when I’m dictating into the text box, the voice isn’t speaking when I tap, would help.
These are some wonderful changes—big improvement. Very proud of you, please continue to keep up the good work. I appreciate this app so much, so thank you for dedicating yourself and putting in the hard work to ensure this product is the best it can be.
I want to clarify some confusion here. I think where a lot of people might have an issue is when copying a picture from Facebook. I was initially looking in the share sheet for the option, where you have the option to share the link, post to your story, or share to other profiles, but that’s not where the copy option is located. It’s actually in the same section where you have the option to save a photo to your device.
When you come across a picture on Facebook that has alt text (usually the text automatically generated by Facebook), it might look something like “Photo, may be an image of dog and grass.” When you hear this, you’re going to double tap on it, assuming you’re using VoiceOver, and it will bring up a “More” option within that section. When you double tap “More,” it will show you an option that says “Copy Photo.” Double tap that, and it will copy the photo to your clipboard. Be careful not to copy anything else until you get the description.
When you switch over to PiccyBot, go into the text box where it says something like “What’s in this photo?” or “What’s in this video?” Double tap and hold, and a box will pop up asking if you want to allow pasting from Facebook. Double tap “Allow,” and it will automatically send off that image so you can receive the description.

By blindpk on Friday, August 8, 2025 - 13:17

Thanks a lot! Yes, GPT-5 is pretty much unusable right now with the processing time taking so long, GPT-5 Mini seems to work well though.
I found the copy image option in Facebook at last. It is really weird, I knew it used to be there but earlier today it didn't show up at all for some reason. Anyway, the copy/paste feature works well. I agree with Winter Roses though, is it possible to only have to give permission once?

By blindpk on Friday, August 8, 2025 - 13:40

There is a model in the API called GPT-5 Chat, which is the same version of GPT-5 powering ChatGPT. When I test it, it responds MUCH faster than standard GPT-5, although with rather short descriptions. It might be something to look into and add instead of the standard one. Here is the page about it:
https://platform.openai.com/docs/models/gpt-5-chat-latest

By Michael Hansen on Friday, August 8, 2025 - 14:20

Member of the AppleVis Editorial Team

Hi Martijn,

First, thank you for all of the work you have done and continue to do for our community with PiccyBot. I never thought that such a service would exist, especially having access to multiple models.

I do have one question about how the AI models use the data we share through PiccyBot. I know that you/PiccyBot do not store or save the pictures uploaded by users; I am wondering, however, if you have any information about what the various AI companies do with the pictures that PiccyBot sends? I enjoy comparing descriptions from the various models, but I am uncomfortable having described pictures of family/friends/anyone else if these services are storing/utilizing the pictures users send. Regardless of what the AI companies do, please understand that this is not a reflection on PiccyBot or the work that you have done for our community.

Thanks for any insight!

By Enes Deniz on Friday, August 8, 2025 - 14:33

I live in a region where cellular connection is more reliable than Wi-Fi but don't think this has anything to do with connection stability. I can't even get a description when I upload a photo. I just get a Retry button but the result doesn't change no matter how many times I retry. Other apps like Be My Eyes can provide image descriptions without any problems though.

By Martijn - Sparkling Apps on Friday, August 8, 2025 - 15:41

Enes, if you feel PiccyBot is 'stuck' while the network is fine, please either restart your phone or even reinstall the app. It should work again. It's an elusive issue that I will try to fix in the near future.

Michael, as said, PiccyBot doesn't store any media or prompts. And there is an additional layer of privacy since all requests to the providers come from the PiccyBot address, not yours. However, the AI providers can use your data in some cases. OpenAI says they won't, but you never know. Anthropic (Claude) has quite a good reputation and Mistral being European is very privacy conscious. Safest is Llama 4, since that is running on a local server and all data is removed immediately after use. The worst is likely Google. But hard to avoid them, especially with the Gemini 3 model around the corner of which I have high expectations.

By mr grieves on Friday, August 8, 2025 - 15:58

Firstly, thank you so much for this new feature - I have been wanting an easy way to get facebook images described for ages.

I think I am being thick though as I can't find the option to paste.

In Facebook, I go to an image, double tap to view it, then double tap and hold for a bit to get the menu, and then I select Copy image. I then switch to PiccyBot... but where is paste? I presume I am repeating the same action as per Facebook - double tap and holding. But I can't find the option to paste. What should have the focus when I do this? I've tried the text box, heading and some of the buttons.

Sorry I know I'm always the last one to figure these things out. I think I am on the latest version - there was an update pending so I installed it before trying.

By Enes Deniz on Friday, August 8, 2025 - 16:41

I can't interact with the AI model selection dropdown in the settings. Double-tapping does nothing. Also, there's this button labeled as "gear.badge.questionmark" that should be labeled more properly, likely "Help".

By Martijn - Sparkling Apps on Friday, August 8, 2025 - 16:54

Enes, you somehow cannot access the Firebase database with the PiccyBot settings. Can you use a VPN or other network and try again?
Actually I have included a built in offline list, but that backup feature wasn't included in the latest release. I'll provide an update by Monday.

Mr Grieves, you can paste in the main view in PiccyBot with a long press. Press in the middle of the screen. It will then prompt 'PiccyBot would like to paste from Facebook', 'Do you want to allow this?', and then you can select 'Allow paste'.

By blindpk on Friday, August 8, 2025 - 17:27

Having checked the API terms of the different companies, I found they say basically the same thing. All of them store your data for a limited time to check if it complies with their usage policies. How long this time is varies (and some are vague about it). AFAIK the PiccyBot server is in the EU, which means the companies have to follow GDPR, but of course you never really know. I'm not sure that any of the big companies are "better" or "worse" than others in that regard.

By Enes Deniz on Friday, August 8, 2025 - 18:22

I could access the model list and select Grok 4 to find out how it would describe images and videos but then I forgot that it wouldn't be able to describe videos and captured a video. I did get a description afterwards, but the video was probably described by GPT4 or whatever the default/free model is. And now when I open the app, I have the free version interface with the Subscription button and an ad on the screen. I may try restarting the device or reinstalling the app but just wanted to inform you in case you work on fixing such issues. Also, I'd love to know whether I will have to keep the VPN on even after selecting the AI model, to access the server and retrieve descriptions at all times, or only once. Another thing is, why don't we have DeepSeek, Qwen or other models among the available ones? What models do provide video descriptions if not Grok 4? Can I not select any other model apart from GPT4 if I want to be able to get video descriptions as well as image descriptions? And can I not customize the default/initial system prompt? This would be quite handy. It's actually somewhat strange that this feature is missing when we can even customize the personality of the voice and possibly the description as well. Or does the personality customization thing apply to the style and intonation of the voice only rather than the content of the description? Finally, adding Piper voices as an option might create a free option and help reduce costs for you. They're also neural voices even though they lack style customization. They're also open-source and can be deployed on any server. Wait, why not just use the system voice then?
* Update: I did uninstall and reinstall the app while writing this, but now the Restore Purchases button doesn't bring up the App Store screen to let me restore the purchase. Let me also add that the Turkish localization is incomplete.
* Update 2: Just disabled the VPN and finally got the premium screen back after double-tapping on the Restore Purchase button several times.

By Enes Deniz on Friday, August 8, 2025 - 18:38

It appears that my model configuration is stored on the server, not the device itself. I completely uninstalled and reinstalled the app as I mentioned above, and the other settings were reset to the defaults, but it was still Grok 4 that was selected as the AI model in use.
And here comes the question: What is PiccyBot Mix and how is it supposed to work?
And here's a suggestion: Can we not set the description to match the language of the content if it is in any of the languages we specify in the settings? This could be useful for bilingual/multilingual people and those learning foreign languages etc.
Update (more questions): What is "Blind native Style"? Is it a model? How exactly does the length parameter work? Does the number let you set the number of words per response? If so, should the description length depend on the content itself to a certain degree? What if we prompt PiccyBot to describe a long video? Will it still stick to the same description length and truncate the response?

By Martijn - Sparkling Apps on Saturday, August 9, 2025 - 09:02

Thanks for the feedback!

The available models vary from time to time. PiccyBot had DeepSeek, but I replaced it with Llama4 as that was a similar open source model and I want to keep the list manageable. I also removed GPT4o mini recently, as we now have the GPT5 models.

Regarding the video descriptions, only the Gemini, Amazon and Reka models do that. The other models are image only. PiccyBot will default to Gemini Flash 2.5 for a video description when a different main model has been selected.
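
The fallback described above could be pictured roughly like this. It is only a Python illustration, not PiccyBot's actual code: the function name `pick_model`, the prefix check on the model name, and the identifier strings (including `gemini-2.5-flash`) are assumptions made for the example.

```python
# Hypothetical sketch of the video fallback; all names and identifiers are assumed.
VIDEO_CAPABLE_PREFIXES = ("gemini", "amazon", "reka")  # model families said to handle video
DEFAULT_VIDEO_MODEL = "gemini-2.5-flash"  # assumed identifier for Gemini Flash 2.5


def pick_model(selected_model: str, media_type: str) -> str:
    """Return the model to use for this request."""
    if media_type != "video":
        # Images go to whatever model the user selected.
        return selected_model
    if selected_model.lower().startswith(VIDEO_CAPABLE_PREFIXES):
        # The selected model can describe video itself.
        return selected_model
    # Image-only model selected: fall back to the default video model.
    return DEFAULT_VIDEO_MODEL


print(pick_model("grok-4", "image"))
print(pick_model("grok-4", "video"))
```

With these assumed names, the first call keeps the selected model and the second one switches to the default video model.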

The personality affects the tone of the voice and will make some adjustments to the style of the content. Turn it off for a clean description. I will likely add a few more voice options in the coming week. For the system voice, you can set the voice to 'None'.

I hope the network and VPN issues will improve, I will add more local settings and backup options to ensure the settings remain accessible even if the network cannot connect to the Firebase server or the PiccyBot server.
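
A backup of this kind might look something like the Python sketch below. It is a guess at the general pattern rather than the real implementation: `load_model_list`, the `fetch_remote` callable, and the placeholder model names are all hypothetical.

```python
from typing import Callable

# Placeholder names standing in for a list bundled with the app (assumed).
BUNDLED_MODELS = ["model-a", "model-b"]


def load_model_list(fetch_remote: Callable[[], list[str]]) -> list[str]:
    """Try the remote settings first; use the bundled list if that fails."""
    try:
        models = fetch_remote()
    except Exception:
        # Network blocked or server unreachable: use the offline copy.
        return list(BUNDLED_MODELS)
    # An empty remote answer is treated the same as a failure.
    return models or list(BUNDLED_MODELS)


def unreachable() -> list[str]:
    raise ConnectionError("settings server not reachable")


print(load_model_list(unreachable))
print(load_model_list(lambda: ["model-c"]))
```

The point of the pattern is that the settings screen always has something to show, even when the remote database cannot be reached.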

PiccyBot Mix uses a combination of descriptions given by OpenAI, Google and Mistral models, and uses only the elements that are common to all. This should in principle all but avoid any hallucination in the description. So use this model for the most accurate description. Image only.
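
To make "only the elements that are common to all" a bit more concrete, here is a small Python sketch. It is an illustration under heavy assumptions, not how PiccyBot Mix is actually built: it pretends each model's description has already been reduced to a set of short element labels (how that step would work is not described here), and the function name `common_elements` and the sample data are made up.

```python
def common_elements(descriptions: list[set[str]]) -> set[str]:
    """Keep only the elements that every model mentioned."""
    if not descriptions:
        return set()
    # Normalize case and whitespace so "Dog" and "dog" count as the same element.
    normalized = [{item.strip().lower() for item in d} for d in descriptions]
    return set.intersection(*normalized)


# Made-up element sets standing in for three models' outputs.
model_a = {"dog", "grass", "red ball"}
model_b = {"Dog", "grass", "fence"}
model_c = {"dog", "grass", "red ball", "tree"}

print(sorted(common_elements([model_a, model_b, model_c])))  # ['dog', 'grass']
```

Anything mentioned by only one or two of the models ("fence", "tree", "red ball") is dropped, which is why this kind of intersection tends to filter out hallucinated details.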

Blind Native style uses an inbuilt prompt to ensure the description is relevant for people born blind, with more focus on touch and no reference to colors etc.

The length parameter basically determines the number of tokens used with the model. Set to 100, it will result in more lengthy descriptions, while 10 will give a concise description. The response speed will be slower with a higher length setting. For a long video, set length to 100 for the maximum detail in the description.
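
As a rough illustration of how a length setting could translate into a token limit, consider the Python sketch below. The 10 to 100 range, the linear mapping, the `tokens_per_step` value, and the function name `max_tokens_for_length` are all assumptions for the example; the real conversion PiccyBot uses is not given here.

```python
def max_tokens_for_length(length: int, tokens_per_step: int = 10) -> int:
    """Map the length setting to a hypothetical output token cap."""
    clamped = max(10, min(100, length))  # keep the setting inside the assumed 10..100 range
    return clamped * tokens_per_step


for setting in (10, 50, 100):
    print(setting, max_tokens_for_length(setting))
```

A larger cap allows a longer answer, and a longer answer takes longer to generate, which matches the slower responses at higher length settings.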

The video quality setting determines the amount of compression of the video when sending it to the server. Low is high compression (for free users) while high is no compression. Setting it to high will give more exact results at a cost of slower processing.

Hope this helps!

By mr grieves on Saturday, August 9, 2025 - 12:07

I'm not sure how that works with VoiceOver. I don't really have a "main form" that I can give focus to as far as I know. I can select all the elements in it, but not the form itself.

I have managed to get it to work a couple of times but I think it was pure luck.

Has anyone managed to do this with VoiceOver?

By Martijn - Sparkling Apps on Saturday, August 9, 2025 - 12:55

Mr Grieves, it should be double tap and hold. But you are not the only one having trouble getting it to work. I will try to make it automatic in the next update. So, if PiccyBot finds you have an image on your clipboard, it will ask whether you wish to paste it.
However, Apple is tricky with this, as they want only user initiated actions, not automatic ones, so they may not approve. Let's see..

By Winter Roses on Saturday, August 9, 2025 - 13:17

I use VoiceOver, and yes, I have gotten the copy-paste feature on Facebook to work with the app. I will say though, it can be tricky, because from what I understand, you have to be positioned right at the start of the line in the text box for it to work, and it has to be done pretty precisely. Isn’t there a way this could be part of the rotor?

You know how, when using the phone, there’s usually a box or menu with edit options on the rotor that include “Select,” “Select All,” “Copy,” “Paste,” “Share,” and other relevant commands? Is there a way you could enable something similar so that, for example, if I have an image on the clipboard, I could go into the text box manually, switch to the rotor, go to “Edit,” and then double-tap the “Paste” option? This would essentially put the picture into the box, just like how it works on the iPhone directly.

Right now, if I copy a picture directly to my clipboard from my iPhone camera, I can paste that image into the Notes app without a problem, but it doesn’t seem to work anywhere else. I don’t know if this could be implemented here, but it’s an option worth looking into.
As it stands, the edit menu is there, but none of the options show up. It would also be good to have some kind of text representation to show that there’s content in the box. Maybe it could display something like “Image” or even a short code such as JPG, GIF, PNG, or a series of numbers and letters. Basically, anything that would give an indication that there’s media content processing in the app. Having a completely blank box with no indication feels a bit strange, because there’s no way to know that there’s actually media there if you can’t see it.

By mr grieves on Saturday, August 9, 2025 - 13:23

Thanks for the reply. I think the problem with VoiceOver is that it needs a specific child element to interact with and if it needs to be done on the background container then it becomes a bit tricky.

I wonder if popping up when an image is detected could get annoying. If I am using my phone I don't typically do much copy/pasting unless I am also on my Mac. So if I have an image in clipboard, it's likely to stay there for a long time. So if PiccyBot prompts me every time, then I would need to try to find some text to copy just to stop it happening?

Is there enough space on the screen to add a paste button amongst the other buttons, but only display it if there is an image to paste? Or maybe do something with the rotor actions?

Anyway, thanks again for this - once it becomes a bit easier this is going to be another really big advantage of PiccyBot compared to everything else. I usually ignore Facebook as I just feel excluded and I'm too lazy to save files all round the place just to have them described. I've been wanting something like this for ages.

By Michael Hansen on Saturday, August 9, 2025 - 15:26

Member of the AppleVis Editorial Team

Hi Martijn,

Could you please implement an easy way to copy just the image description to the clipboard? Right now, this is accomplished by pressing the Share button and copying the described text to the clipboard. Once I get to where I want to paste the description (usually into a message to someone), I have to do some editing to remove the link to PiccyBot. Would it be possible to add a "Copy" function to the VoiceOver Rotor when focus is on the text containing the description, and to please remove the PiccyBot App link from what is copied?

By Enes Deniz on Saturday, August 9, 2025 - 21:25

Thanks...
Question 1: Does the "None" option not disable the voice entirely? How does that let you use the system voice unless you have VoiceOver or Speak Screen enabled and use the appropriate gestures to hear the description? What I mean by "system voice", however, is the ability to have the system voice read out the description even if no such feature is enabled.
Question 2: What LLM does PiccyBot Mix use to perform text generation and generate the response? What do you mean by "elements"? Does PiccyBot Mix compare the text responses provided by the different models you mentioned, or does it compare the raw image processing results, find the elements common to all of them, and then generate the text response itself?

By Winter Roses on Sunday, August 10, 2025 - 05:30

So the video description is done by Gemini? Why did I think it was a variant of ChatGPT? Wasn’t this a thing once, or did it change? I personally prefer the descriptions from ChatGPT, so I’m wondering if that could be done for video descriptions too.

Also, would it be possible—without making it overly complicated—to have a PiccyBot mix for videos as well? Basically, the idea would be to run the video through different models, then compile the details that at least three or more of them agree on. I know that’s probably super complex, and you’d have to figure out how to merge everything, but I think it could be really useful.
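
One way to picture the "at least three agree" idea is a simple vote count. The Python sketch below is purely illustrative and not an existing PiccyBot feature: it assumes each model's video description has already been broken into short detail labels, and the function name `agreed_details`, the `min_votes` parameter, and the sample data are invented for the example.

```python
from collections import Counter


def agreed_details(per_model: list[set[str]], min_votes: int = 3) -> list[str]:
    """Return the details mentioned by at least `min_votes` models."""
    votes = Counter()
    for details in per_model:
        # Each model votes at most once per detail, ignoring case and stray spaces.
        votes.update({d.strip().lower() for d in details})
    return sorted(d for d, n in votes.items() if n >= min_votes)


# Made-up detail sets standing in for four models' video descriptions.
outputs = [
    {"man walking", "dog", "park"},
    {"dog", "park", "bench"},
    {"man walking", "dog", "rain"},
    {"dog", "park", "man walking"},
]

print(agreed_details(outputs))  # ['dog', 'man walking', 'park']
```

Compared with keeping only what every model agrees on, a threshold like this keeps a detail even if one model misses it, at the cost of letting through slightly more errors.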