Most people are aware that Facebook can easily recognize faces. Recently, I discovered that on Instagram, every photo is run through an extensive image processing algorithm. An "accessibility caption" is added to the page's source code so that people using a screen reader can have the content of the post read to them. Instagram recently added the ability to write your own accessibility caption, or "alt text," in the app when you post a photo.
The two photos below are from former Instagram CEO Kevin Systrom's account. You can see how accurate the captions the image processing algorithm gave them are.
"Image may contain: 1 person, airplane, sky and outdoor"
"Image may contain: 3 people, people sitting"
After seeing these captions, I decided to go on a quest to gather as many photos and captions as I could to see what else Instagram can recognize.
Out of the roughly three thousand posts I gathered, I found 113 unique descriptors that Instagram uses in its accessibility captions. To avoid accidentally catching captions that people had written themselves, I only analyzed captions that started with "Image may contain: " and then listed one or more descriptors in a "description, description, and description" format.
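For the curious, here's a rough sketch of the kind of filtering and tallying I'm describing. The example captions and names below are hypothetical stand-ins rather than my actual scraping code, and the splitting logic just follows the "description, description, and description" pattern I observed:

```python
import re
from collections import Counter

def parse_caption(caption):
    """Return the descriptors from an automatic accessibility caption,
    or None if the caption doesn't match Instagram's auto-generated format."""
    prefix = "Image may contain: "
    if not caption.startswith(prefix):
        return None  # likely alt text the poster wrote themselves
    body = caption[len(prefix):]
    # Split "a, b and c" / "a, b, and c" style lists into individual descriptors
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", body)
    return [p.strip() for p in parts if p.strip()]

# Hypothetical captions standing in for the scraped data
captions = [
    "Image may contain: 1 person, airplane, sky and outdoor",
    "Image may contain: 3 people, people sitting",
    "My own hand-written alt text",
]

tag_counts = Counter()
for caption in captions:
    descriptors = parse_caption(caption)
    if descriptors:
        tag_counts.update(descriptors)

print(tag_counts.most_common())
```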
We'll start with what I think is the least objectionable talent the algorithm has: counting people.
Just over a third of the photos had a tag that said exactly how many people were in the photo, and several hundred more said that the photo contained one or more people. There was even a crowd tag that caught five photos. These were usually street scenes where it would be impossible to count the exact number of people in the photo.
While Instagram probably does tap into Facebook's vast knowledge of individual faces, they wisely do not share that information in the publicly available accessibility captions. However, they do describe people in other ways.
There are categories for baby and child. Of the thousands of photos I scraped, only six were tagged with baby, but each of those was correctly tagged. The child tag caught four times as many photos but wasn't very accurate: many of them showed people who are clearly adults, and one even showed an elderly woman.
All the photos below came up with the child tag.
Although the child tag was pretty inaccurate overall, a few standouts give us insight into how the image recognition algorithm decides that a photo contains a child. Most notably, the photo of a child overlooking a body of water is impressive because it doesn't actually show the child's face. Looking at this photo along with some that were misidentified, I suspect the algorithm is keying on a combination of very smooth skin and the person's proportions to decide whether the image contains a child.
Images were also tagged with beard and eyeglasses, but I did not find that other identifying features such as hair styles or color, body size and shape, or perceived gender were used.
I was surprised by how many actions the image processing algorithm can identify. In just the three thousand photos I gathered, Instagram had tagged posts with the following qualities:
People on a stage
People eating
People playing musical instruments
People playing sports
People sitting
People sleeping
People smiling
People standing
People walking
Riding a bicycle
Swimming
Of all of these, I found that people smiling and people sleeping weren't very accurate. This is unsurprising: ask ten people to smile and you will likely see vastly different results. However, the descriptions of specific activities were highly accurate. I've attached some examples of playing musical instruments, playing sports, riding a horse, swimming, people eating, and riding a bicycle below.
Instagram is covered in wedding photos, so it's no wonder that the image recognition software has had a lot of practice discerning those. More surprisingly, indoor locations came up. Two of the photos below are tagged living room, one office, and another kitchen. The kitchen photo was also tagged 1 person and indoor. I guess the algorithm doesn't yet have the words to describe what is going on in the rest of that photo.
Most photos, however, are just tagged indoor or outdoor. Tagging scenes is understandably hard: about a thousand photos were tagged outdoor, three hundred were tagged indoor, and fewer than fifty had an additional, more specific location tag.
Most images that contained text had that text transcribed perfectly in the accessibility caption. I saw this in the classic motivational quotes superimposed over a mountain. However, some photos didn't have their text transcribed and were instead tagged as meme. Some of those photos are in the classic "meme font," like the cat meme below, and others had the word "meme" written on the image, like the iCarly meme below. But many had no hard indication that they weren't motivational quotes, other than the placement of the words in relation to the picture. I am quite impressed with this level of identification.
Instagram also has lots of other objects it can identify, from clothing to phones to motorcycles to mountains.
While Instagram announced in a blog post in November 2018 that it was adding these captions to create a "more accessible Instagram," it is highly unlikely that is the only reason. Image processing is a hugely complex project to undertake purely for something that makes the company no money. It is much more likely that Facebook has been working on this for other reasons, and automatically generated accessibility captions are a happy byproduct.
Did you post about your recent engagement on Instagram? Even if you didn't use the words "getting married" in your caption, Instagram and Facebook could easily determine that photos that get an unusually high number of likes and have captions that contain phrases like "the rest of my life" and "best friend" commonly lead to wedding photos. Even if you post no caption at all, the image processing can likely pick out a close up of a hand with a ring on it. Now they can send you ads for wedding dresses and popular honeymoon destinations.
After wedding-related photos have been posted, ads for budgeting applications or home loan services can be directed toward you. While Instagram doesn't seem to be able to identify pregnancy just yet, it does have a tag for ultrasounds. Later, when it sees a baby in your posts it can send you ads for diapers.
Anyway, I'm not suggesting that you should stop posting photos on Instagram and Facebook; goodness knows this pales in comparison to the multitude of other privacy violations by social media giants. What I am saying is that the next time you insist "Facebook is listening to my conversations!" because you just got an ad for the thing you were talking about, think about the many other ways Facebook can get information about your life and decide if that's something you're okay with.
You can easily find the accessibility captions for your own photos by going to your Instagram profile on the web. Right-click on your profile page and select "View Source," then press Ctrl+F and search for "accessibility_caption".
Viewing the source of your profile page will show you the accessibility captions of the nine or so most recent photos you've posted. You can also open an individual photo and view its source to see that photo's accessibility caption.
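If you'd rather not hunt through the source by hand, a quick-and-dirty alternative is to save the page source to a file and pull every accessibility_caption value out with a regular expression. This is only a sketch based on the JSON-like structure I saw in the source; the file name is a placeholder, and Instagram's markup may well change:

```python
import json
import re

# Placeholder path: save the page source (right-click, "View Source", copy all)
# into this file before running.
with open("profile_source.html", encoding="utf-8") as f:
    source = f.read()

# Captions appear as JSON string values keyed by "accessibility_caption".
# The pattern captures the quoted value and tolerates escaped quotes inside it.
pattern = r'"accessibility_caption"\s*:\s*"((?:[^"\\]|\\.)*)"'

for raw in re.findall(pattern, source):
    # Re-wrap in quotes and let json handle escape sequences like \" and \uXXXX
    print(json.loads(f'"{raw}"'))
```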
Keep in mind that Instagram only started adding the automatic accessibility captions around 2015, and back then they only said whether there were people in a photo. Sometime around February 2018, the alt text started to have much more detailed descriptions of photos, identifying locations and objects within the post. Your photos are only processed at the time they're posted, so your old photos will likely not have any alt text.
Because there are three thousand photos, I haven't been able to verify the accuracy of every one of the 113 tags I found. So, all of the tags that I found are listed in this Excel file. The file also contains links to each of the photos that matched each tag so you can look through it for yourself.
Let me know what you find! I'd love to see a study of the accuracy of these tags, although I bet there's a whole wing at Facebook that's collecting that data to improve the algorithm as we speak.
Note about the origin of the photos: While I started by trying to use only public figures' photos, I was limited in the number of photos I could pull from each profile and found that searching public hashtags such as "#photographer" and "#business" was much more effective. I have tried not to pick more personal-looking photos to use as examples, but I haven't looked through every one of the three thousand photos that are linked in the data file at the end of this post. All the photos are publicly available on Instagram, and if you have any concerns you can make your posts private to keep people from accessing them. If you find any objectionable photos or wish for your photo to be removed from this site, please contact me and I will do so as soon as possible.