
Reid My Mind Radio – Microsoft Seeing AI – Real & Funky

Wednesday, August 2nd, 2017

[Image: T.Reid wearing a hat with a "T" while the Seeing AI logo is superimposed on his shades]
Okay, I don’t usually do reviews, but why not go for it! All I can tell you is I did it my way; that’s all I can do!
It took a toll on me… entering my dreams…
I’m going to go out on a limb and say I have the first podcast to include an Audio Described dream! So let’s get it… hit play and don’t forget to subscribe and tell a friend to do the same.

Resources:

Transcript


TR:

Wasup good people!
Today I am bringing you a first of sorts, a review of an app…

I was asked to do a piece on Microsoft’s new app called Seeing AI for Gatewave Radio.

The interesting thing about producing a tech-related review for Gatewave is that the Gatewave audience most likely doesn’t use smart phones and maybe not even the internet. However, they should have a chance to learn about how this technology is impacting the lives of people with vision loss. Chances are they won’t learn about these things through any mainstream media so… I took a shot… And if there’s anything I am trying to get across with the stories and people I profile, it’s that we’re all better off when we take a shot and not just accept the status quo.

[Audio: From Star Trek’s Next Generation… Captain La Forge fires at a chasing craft. Ends with crew mate exclaiming… Got em!]
[Audio: Reid My Mind Radio theme Music]

[Audio: Geordi La Forge from Star Trek talks to crew from enemy craft…]
TR:
Geordi La Forge from Star Trek’s Next Generation, played by LeVar Burton, was blind. However, through the use of a visor he was able to see far more than the average person.

While this made for a great story line, it also permanently sealed LeVar Burton and his Star Trek character as the default reference for any new technology that proposes to give “sight” to the blind.

[Audio: From intro above, ending with Geordi saying…
“If you succeed, countless lives will be affected”]
TR:
What exactly though, is sight?

We know that light is passed through the eye and that information is sent to the brain, where it is interpreted and quickly established to represent shapes, colors, objects and people.

A working set of eyes, optic nerves and brain are a formidable technological team.
They get the job done with maximum efficiency.

Today, with computer processing power growing exponentially and devices getting smaller, the idea that devices like smart phones could serve as an alternative input for eyes is less science fiction and, well, easier to see.

There are several applications available that bring useful functionality to the smart phone:
* OCR or optical character recognition which allows a person to take a picture of text and have it read back using text to speech
* Product scanning – makes use of the camera and bar codes which are read and the information is spoken aloud again, using text to speech
* Adding artificial intelligence to the mix we’re seeing facial and object recognition being introduced.

Microsoft has recently jumped into the seeing business, with their new iOS app called Seeing AI… as in Artificial Intelligence!
There’s no magic or anything artificial about these results, they’re real!

In this application, functions like reading a document or recognizing a product’s bar code are split into channels. The inclusion of multiple channels in one application is already a plus for the user, eliminating the need to open multiple apps.

Let’s start with reading documents.

For those who may have once had access to that super-fast computer interface called eyes, you’re probably familiar with the frustration of the lost ability to quickly scan a document with a glance and make a quick decision.

Maybe:
* You’re looking for a specific envelope or folder.
* You want to quickly grab that canned good or seasoning from the cabinet.

With other reading applications you have to go through the process of taking a picture and hoping you’re on the print side of the envelope or can. After you line it up and take the picture you find out the lighting wasn’t right so you have to do it again.

Using Microsoft’s Seeing AI, you simply point the phone’s camera in the direction of the text.

[Audio: App in process]

Once it sees text, it starts reading it back! The quick information can be just enough for you to determine what you’re looking for. In fact, during the production of this review, I had a real life use case for the app.

My wife reminded me that I was contacted for jury duty and I needed to follow up as indicated in the letter. The letter stated I would need to visit a specific website to complete the process. I forgot to put the letter in a separate area in order to scan it later and read the rest of the details. So rather than asking someone to help me find the letter, I grabbed the pile of mail from the table and took out my iPhone.

I passed some of my other blindness apps and launched Microsoft Seeing AI. I simply pointed the camera at each individual piece of paper until finding the specific sheet I was seeking. The process was a breeze. In fact, it was easier than asking someone to help me find the form. Ladies and gentlemen, that’s glancing!

Now that I found the right letter, I could easily get additional information from the sheet by scanning the entire document. I don’t need to open a separate app, I can simply switch to a different channel, by performing the flick up gesture.

Similar to a sighted person navigating the iPhone’s touch screen interface, anyone can non-visually accomplish the same tasks using a set of different gestures designed to work with VoiceOver, the built-in screen reader that reads aloud information presented on the screen.

Using the document channel I can now take a picture of the letter and have it read back.

One of the best ways to do this is to place the camera directly on the sheet in the middle and slowly pull up as the edges come into view. I like to pull my elbows toward the left and right edges to orient myself to the page, forming a triangle with my phone at the top center. The app informs you if the edges are in view or not.
Once it likes the positioning of the camera and the document is in view, it lets you know it’s processing.

[Audio: Melodic sound of Seeing AI’s processing jingle]

You don’t even have to hit the take picture button. However, if you are struggling to get the full document into view, you could take the picture and let it process. It may be good enough for giving you the information you’re seeking.

If you have multiple sheets to read, simply repeat.

Another cool feature here is the ability to share the scanned text with other applications. That jury duty letter, I saved it to a new file on my Dropbox, enabling me to access it again from anywhere without having to scan the original letter.

Let’s try using the app to identify some random items from my own pantry.

To do this, I switch the channel to products.

[Audio: Seeing AI processing an item from my pantry…]

What you hear, is the actual time it took to “see” the product. All I’m doing is moving the item in order to locate the bar code.
As the beeps get faster I know I am getting closer. When the full bar code is in range, the app automatically takes the picture and begins processing.

[Audio: Seeing AI announces the result of the bar code scan… “Goya Salad Olives”]

It’s pretty clear to see how this would be used at home, in the work environment and more.

Now let’s check out the AI, or artificial intelligence, in this application.

By artificial intelligence, I mean the machine is going to use its ability to compute and validate certain factors in order to provide the user with information.

First, I’ll skip to the channel labeled Scene Beta…
Beta is another term for almost ready for prime time. So, if it doesn’t work, hey, it’s beta!

Take a picture of a scene and the built-in artificial intelligence will do its best to provide you with information that helps you understand something about that scene.

[Seeing AI reports a living room with a fireplace.]

This could be helpful in cases like checking whether a child or someone else is asleep on the couch.

[Audio: Action Movie sound design]

I can even picture a movie, starring me of course, where I play a radio producer who is being sought by the mob. In the final scene I use my handy app to see the hitman approaching me. I do a roundhouse kick… Ok, sorry, I get a little carried away at the possibilities.

While no technology can replace good mobility travel skills, I can imagine a day where the scene identification function will provide additional information about one’s surroundings, making it another mobility tool for people who are blind or visually impaired.

Now for my final act… oh wait it’s not magic remember!

Microsoft Seeing AI offers facial recognition.
That’s right, point your camera at someone and it should tell you who that person is… Well, of course you have to first train the app.

To do this we have to first go into the menu and choose facial recognition.
To add a new person we choose the Add button.
In order to train Seeing AI you have to take three pictures of the person.
We elected to do different facial expressions like a smile, sad and no expression.
Microsoft recommends you let sighted family and friends take their own picture to get a good quality pic.

The setup requirement, while understandable at this point, sort of reduces that sci-fi feel.

After Seeing AI is trained and you are in the people channel, pointing your camera in the direction of the person’s face lets the app recognize them and tell you the person is in the room.

[Audio: Seeing AI announces Raven about 5 feet in front.]

Seeing AI does a better job recognizing my daughter Raven when she smiles. That to me is not artificial intelligence because we all love her smile!

The application isn’t perfect. It struggled a bit with creased labels, which made it difficult to read the bar code.

Not all bar codes are in the database. It would be great if users could submit new products for future use.

As a first version launch with the quick processing, Seeing AI really gives me something to keep an eye on. Or maybe I should say AI on!

Peering into the future I can see:

* Faster processing power that makes recognition super quick
* Interfacing with social media profiles to automatically recognize faces and access information from people in your network
* Lenses that can go into any set of glasses, sending the information directly to the application without requiring the user to point their phone at an item or person, and privately delivering the information via wireless headset
That could greatly open up the use cases.

In fact, interfacing with glasses is apparently already in development, and the team includes a lead programmer who is blind.

Microsoft says a currency identification channel is coming in the future, making Seeing AI a go-to app for almost anything we need to see!

The Microsoft Seeing AI app is available from the Apple App Store for Free 99. Yes, it’s free!

I’m Thomas Reid
[Audio: As in artificial intelligence!]
For Gatewave Radio, audio for independent living!

[Audio: Voice of Siri in Voice Over mode announcing “More”]

I don’t know if that’s considered a review in the traditional sense, but honestly I am not trying to be traditional.

The thing is, thinking about the application started to extend past the time when I was working on the piece…

That little jingle sound the app makes when it’s processing… it started to seep into my dreams…
[Audio: Dream Harp]

[Audio: “Funky Microsoft Seeing AI” An original T.Reid Production]

The song is based around the processing tone used in the app with the below lyrics.

(Audio description included in parens)

(Scene opens with Thomas asleep in bed with a dream cloud above his head)

The processing sound becomes a sound with Claps…

(We see a darkened stage)

(As the chorus is about to begin spotlight shines on Thomas & the band)

Chorus:
Microsoft Seeing AI
Helping people see without their eyes

Microsoft Seeing AI
Helping people see without their eyes

(Thomas rips off his shirt!)

Verse:
Download the app on my iPhone

{Background sings… “Download it, Download it!”}

Checking out things all around my home

(Thomas dances on stage)

Point the camera from the front
Huh!
Point the camera from the back!

I’m like;
what’s that, what’s this
Jump back give my phone a kiss!
Hey! (James Brown style yell!)

(Thomas spins and drops into a split)

Chorus:
Microsoft Seeing AI
Helping people see without their eyes

Microsoft Seeing AI
Helping people see without their eyes

(Back in the bed we see Thomas with a fading dream cloud above his head)

Ends with the app’s processing sound.

TR:
Wow, definitely time to move on to the next episode…

With that said, make sure you Subscribe wherever you get your podcasts. Tell a friend to do the same – I have some interesting things coming up I think you’re going to like.
And something you may not have expected!

[Audio: RMMRadio Outro]
TR:
Peace!
