New AI app for describing images and video: PiccyBot

By Martijn - Sparkling Apps, 1 March, 2024


Hello guys,

I have created a free app, PiccyBot, that speaks a description of any photo or image you give it. You can then ask detailed questions about it.

I have adjusted the app to make it as low-vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully more useful!

Thanks and best regards,

Martijn van der Spek


Comments

PiccyBot on macOS

PiccyBot is now available for macOS as well. You need a Mac with an M1 chip or later. You can use any existing subscription by signing in with the same Apple account. Just go to the App Store on your Mac and search for PiccyBot; it should pop right up.
The camera on the Mac won't be available, but you can describe any video or image stored on your Mac and it has all the regular PiccyBot features.

Guidance for taking pictures

I'm already subscribed, but I would pay again for this. It would be a wonderful thing to have, especially if we could refine it to work fairly decently.

Pasting an image

Firstly, great news about the Mac app - thank you. I was initially unsure I was downloading the right thing as the App Store suggested it wasn't a Mac app, but I installed it and opened it up and it seems to work well.

On the Mac, my main use case for this sort of thing is getting images on the clipboard described. But as this is my work computer, I don't really want automatic clipboard detection turned on. I noticed there is an option to turn it off. Is there another way I can paste an image in? I tried, but I only ever managed to paste the last text I copied into the text box, not the last image.

Whilst the feature is fantastic, I'm still a little unsure how much I like the automatic option on the phone either. For example, I could be in an app and share an image to PiccyBot. The app opens and I'm immediately asked whether I want to paste in the text I happened to have copied to the clipboard earlier. It's a minor thing, but I would prefer to be in control of when it happens. Is the only other way to do this on iOS the one from the original version? I never really got the hang of it, because I don't think it was really set up for VoiceOver.

I would quite like an easy way to paste on my own terms in both applications.

And going back to the Mac, the other thing I would really, really like is a way to have images described locally without going to the cloud. I'm a bit reluctant to put work images in here. I probably will if I have to, but I would much rather have something fully local in those cases. Please correct me if I am wrong, but I don't think such an option exists now, so could it be added as a feature request?

Thanks again for continuing to work on this. The number of new features that have come in since I first subscribed is incredible.

Photo guidance on iOS and shortcuts for the macOS version

Guys,

In the latest update of PiccyBot, I have added a photo guidance mode. Switch to the front-facing camera while using VoiceOver, and you will get spoken guidance on whether your face is centered and whether you are at the correct distance. Hope this helps! It even works in all PiccyBot languages.

For macOS, the keyboard shortcuts Command-I and Command-V now work to select images or videos. This should make keyboard-only control of PiccyBot on macOS easier.

Thanks for the feedback as always!

Tried it out.

It works as well as the Guided Frame feature on Google Pixel devices. It has come out really well for a first implementation. But it would be really nice and useful if it could be developed into a dedicated photography tool for visually impaired users, with detailed instructions on framing and capturing pictures using both cameras, not just for portraits but also for landscapes and other scenes. One might have to surrender some creative autonomy to the AI in such a tool, but I'd be fine with that.

Added haptic feedback as well

The latest version adds haptic feedback when you are centered and at the correct distance for a selfie.

This selfie mode is quite popular, and I am considering releasing a separate app for just this feature. PiccyBot is getting a bit heavy on features, many of which are not used often. Separate 'one thing' apps may be more practical.

Dedicated app

I think that would be great. The way I see it, the primary function of PiccyBot is image and video recognition, and the primary function of the new app should be photography. As I said in my last comment, I'm really looking forward to that app becoming an actual photography tool for blind camera users. That would be empowering for so many people on so many levels.

Feature Request: Rear Camera Guidance with Multiple Face Detection

Hello Martijn and the PiccyBot community,
Thank you for developing such a fantastic and useful app! I find PiccyBot's image and video description features incredibly helpful.
I have a suggestion for a feature that would greatly enhance the experience for taking photos of people:
Could you please consider adding guidance for using the rear camera that also incorporates the ability to detect and count multiple faces?
Currently, using the rear camera for photos of people, especially groups, can be challenging. Adding audio guidance (like "Move left", "Two faces detected", "Closer" or "Further away") would make it much easier to frame the shot correctly and ensure everyone is in view before taking the picture.
This would be a game-changer for group photos and is a feature many users would appreciate.
Thanks again for all your hard work!

Agreed

Yes, thanks so much for continuing to improve the app.
I agree that having this kind of feature for the rear camera too would be great, and in both cases it should work for single and multiple faces or subjects.
If you were to go down the road of spinning it off into a separate app, I wonder whether you could make it an app that can be launched with the Camera Control button?
Dave

Added another setting

The idea of a camera spin-off app that gives rough, quick initial feedback as a guide is interesting, and I am looking into it. For now, I have added an option in settings that lets PiccyBot automatically take a photo and describe it when you are using the front-facing camera and it detects that a face is properly in focus. This was a requested feature and is available in the latest update.

Updates

PiccyBot has been updated with the latest models this week: GPT 5.1, Gemini 3 Pro and Grok 4.1. Note that these are used for image descriptions only; for video descriptions, PiccyBot still relies on earlier versions.

Working on integration with Meta glasses

Hi guys, with Meta now gradually releasing the SDK for their glasses, developers can access the live feed from the glasses within third-party apps.

This is the first test I have done with PiccyBot processing this feed. The next step is to process it hands-free and do video descriptions.

https://www.youtube.com/watch?v=L-0U7bc3ucE

This is brilliant!

@Martijn, you continue to be one of the first to bring these promised exciting features to the community! All the best with the good work!

Super excited for this!

I'm super excited to get my hands on this!
Having alternatives to meta AI will be a welcome change for many of us.
I do have a suggestion for the mobile app.
Would it be possible for the guided selfie mode to have an option to start a countdown and capture selfies automatically?
This was one of my favorite features of SelfieX before it was discontinued.

Sounds great

Thanks again for the update, Martijn. I love seeing all these new things appear in PiccyBot and can't wait to give this a go.

Next step in the Meta integration: hands-free

Gokul, Quinton, Mr Grieves, thanks a lot! I have taken it a step further by adding a voice trigger to process images from the Meta video stream. The API is limited, and Meta promises more features by the end of next month, but let's see what we can cobble together already:

https://youtu.be/a1Ue8M6dWaM

It's definitely coming along

I'm looking forward to seeing this evolve, as more tools become available. :-)

Does this work right now?

For everyone, I mean?

Hands-free

That sounds amazing. Can you just clarify what is going on?

I think you are opening PiccyBot as normal. Is it then listening for a voice command, which you speak through the microphone in the glasses? And at that point it takes a picture and does its thing?

So if I was going out and about, could I just leave PiccyBot running and then talk to either Meta or PiccyBot as needed? Does PiccyBot need to be in the foreground? Does it matter if the phone is locked?

Anyway, I'm really excited by this. I love how this app always seems to be ahead of the pack with new features, and genuinely useful ones at that.

Clarification

Gokul, no this is not yet available, I am working on it. Expect an integrated release next month.

Mr Grieves, you open PiccyBot as usual and, in settings, select that you want to link with Meta glasses. It will then start streaming the Meta output to the PiccyBot app.
The voice command is currently picked up only by the app, not by the glasses. Meta has indicated they will add this to the SDK in January.
Right now (in the development version), PiccyBot needs to be running in the foreground, with a separate audio input. So you can start it and say either "Hey Meta" (picked up by the glasses) or "Capture" (picked up by the phone). But with the current version you have to keep it running constantly, which would not be practical, nor good for the battery life of your phone and glasses. There is still lots of work to be done.

Description and a bug

I found a bug in the app: on the subscription page, with VoiceOver on, tapping the price of a purchase option doesn't activate that option. I had to turn off VoiceOver and tap the right spot. Secondly, a question: I sent a video, and the app produced a text description that my chosen voice read out in one go, without the video's audio playing underneath. Would it be possible to play the video and the description at the same time, with the description generated to fit the silences in the video, as in real audio description for films or TV shows? That is, it would describe what happens at the moment it happens in the video.
Lastly, could we have more voice options for other languages? OpenAI voices such as Fable, Onyx, etc. are very good in English but struggle with other languages like Italian, where ElevenLabs voices are much better.

Voiceover bug

Knut, thanks for pointing this out! I will release a fix for this either today or tomorrow.

Regarding the audio description, I will look into it. Earlier attempts to synchronise video and audio description didn't work out due to model costs and slow performance, but now that new AI models such as Gemini 3 are available, I will check it again.

Regarding the AI voices: quite a few users actually don't use them at all and just rely on their preferred VoiceOver voice. Using the ElevenLabs voices through the API was very expensive when I last checked. I would have to raise the PiccyBot subscription price by quite a bit, which I fear would not be appreciated.

Ask More

When I'm in Ask More to ask follow-up questions, the app hangs. I've noticed this in the latest update, and it seems to happen quite frequently across the different LLM engines.