The Real Reason Microsoft Is Building So Many Computer Vision Apps
Turns out Microsoft isn’t as interested in rating mustaches or guessing ages as it is in helping the visually impaired navigate the world.
For the past few years, Microsoft has been steadily releasing goofy little apps that use neural networks to perform tricks ranging from guessing your age and rating your mustache to describing photographs (often comically) and even telling you what kind of dog you look like.
But why? Entertaining though these apps are, they all seemed a little random until a couple of weeks ago at Build 2016, when Microsoft revealed that these experiments are more than the sum of their parts. In fact, they represent stepping stones on the road leading to Seeing AI, an augmented-reality project for the visually impaired that aims to give the blind the next best thing to sight: information.
Built by Microsoft Research, Seeing AI is an app that runs on either smartphones or Pivothead-brand smart glasses. It takes all of the tricks Microsoft developed using those “goofy” machine learning apps and combines them into a digital Swiss Army knife for the blind. By helping visually impaired users line up and snap a photograph using their device, the app can tell them what they’re “looking” at; it can read menus or signs, tell them how old the person they’re talking to is, or even describe what’s happening right in front of them: say, that they’re in a park, watching a golden retriever catch an orange frisbee. Presumably, it has some excellent mustache detection skills, too.
“This isn’t the first app for the blind,” admits project lead Anirudh Koul. “But those apps are extremely limited.” One app might be dedicated just to helping you know what color you’re looking at. Another might read menus and signs, or tell you what box you’re holding in the grocery store based on the barcode. There are even photography apps for the blind.
But the problem with all these apps is fragmentation. For a blind person, switching among them is like having to screw in a different set of eyes every time you want to read a paper or identify a color. Seeing AI can do all of the above, and more, within the same app.
Of course, having so much functionality introduces its own design challenges. According to Margaret Mitchell, Seeing AI’s vision-to-language guru, context is key when trying to translate visual information into text. “If you’re outside, for example, you don’t want it to describe the grass as a green carpet any more than you want it to describe a blue ceiling as a clear sky when you’re indoors,” she says. It’s also challenging to know how much information Seeing AI should give users at any given moment. Sometimes it might be more useful to list what’s around a user, while at other times a scene description is better, so knowing when to automatically switch between modes becomes important.
These are just some of the problems the Seeing AI team is trying to work out before its software becomes a consumer-facing product. But already, Seeing AI’s software is proving indispensable to Microsoft software engineer Saqib Shaikh, who lost his sight at the age of seven. He has helped the Seeing AI team test and tweak its software, as well as identify features that sighted people might not think of as useful, but which the visually impaired really need. For example: finding an empty seat in a restaurant. “His guidance has been amazing,” says Mitchell. “He can exactly identify what we should be returning and why.”
Although Microsoft Garage routinely releases apps that use the company’s machine-learning algorithms, neither Koul nor Mitchell could say when Seeing AI would be available for everyone to download. They say only that it is a “research project under development.” But this isn’t just some silly web toy. When released, Seeing AI will be an app that can fundamentally change a person’s life, while continuing the grand tradition of accessibility pushing design forward in exciting directions.