Posts Tagged ‘AI’

Blind Centered Audio Description Chat: When AI Comes for the Blind

Wednesday, March 8th, 2023

While all the world is talking about AI, Artificial Intelligence, the Blind community has been dealing with what appears to be the imminent takeover for the past few years. That’s the adoption of AI and Text to Speech in Audio Description.

In this last BCAD Chat of 2022 we wanted to discuss the pros and cons of AI and TTS voices narrating Audio Description.

Use this letter as a template to personalize and express your concerns about TTS in AD.

Shout out to Scott Blanks & Nefertiti Matos Olivares for the draft.

Join Us Live

The BCAD Live Chats can take place on a variety of platforms including Twitter and LinkedIn.

To stay up to date with the latest information and join us live follow:
* Nefertiti Matos Olivares
* [Cheryl Green](https://twitter.com/whoamitostopit)
* [Thomas Reid](https://twitter.com/tsreid)

Listen

Transcript – Created By Cheryl Green

Music begins
THOMAS: Welcome to the Blind-Centered Audio Description Chats. These are the edited recordings of the Blind-Centered Audio Description Live Chats!
CHERYL: The live is the most fun part! We get together, we start with a question, and then we invite up anybody from the audience who wants to come and chat with us, agree, disagree, shed light on something that we hadn’t thought about before, which is Nefertiti’s favorite. [electric whoosh]
NEFERTITI: I’m Nefertiti Matos Olivares, and I’m a bilingual professional voiceover artist who specializes in audio description narration! I’m also a fervent cultural access advocate and a community organizer.
CHERYL: I’m Cheryl Green, an access artist, audio describer and captioner.
THOMAS: And I’m Thomas Reid, host and producer of Reid My Mind Radio, voice artist, audio description narrator, consultant, and advocate.
SCOTT B: Hi, I’m Scott Blanks. I’m a passionate advocate for the highest quality audio description in all of the arts. I’m the co-founder of the LinkedIn Audio Description Group and the Twitter AD community.
SCOTT N: Scott Nixon here. I’m an audio description consumer and advocate, hoping to be an audio description narrator very, very soon. [electronic whoosh]
THOMAS: Hey, Nef, why don’t you tell people how they could join the live recording?
NEFERTITI: That’s really simple. Just follow us on social media to keep up with important details, such as dates, times, and what platform we’ll be using. On Twitter, I’m @NefMatOli. Cheryl?
CHERYL: I’m @WhoAmIToStopIt.
THOMAS: I’m @TSReid, you know, R to the E I D.
NEFERTITI: How about you, Scott?
SCOTT B: I’m @BlindConfucius. That’s Blind Confucius.
SCOTT N: And you can catch me on my social media, Twitter only. That’s @MisterBrokenEyes, Capital M r Capital Broken Capital E y e s.
[smartphone selection beeps]
CHERYL: Recording now!

NEFERTITI: Welcome, welcome. Welcome. And welcome, everyone! Tell a friend if you haven’t already. This is a conversation all about TTS, text to speech, and audio description. Place, the place for TTS in audio description. Is there one? What is it? Do you hate it? Do you love it? If you love it, we are particularly interested in hearing from you tonight. I’d love to hear from people who might change my mind or might make me think a little differently about this topic, because, frankly, I am strongly against TTS.
THOMAS: So, the conversation is all about AI and TTS, mainly TTS. But I figure we should have a little conversation about both, because they’re sort of used together. And I think there is a little bit of a difference when folks talk about AI, or artificial intelligence, in the audio description space and then TTS or text to speech. And so, a little bit of the difference: the AI, artificial intelligence, that usually refers to, so, that is computers sort of learning on their own and adjusting, making changes, and doing the things that humans would usually have to program. But the artificial intelligence and TTS sort of amounts to, if you, which I’m pretty sure everyone here has probably heard audio description when the speech comes in, and then the sound is ducked, well, that’s the artificial thing that’s happening right there. There’s sometimes when it’s done via AI. It’s not a human who’s actually sort of mixing the sound. The artificial intelligence is saying, “Okay, I’m gonna put it here. I’m gonna duck this down, go back up when the speech is finished. I’m gonna duck now. The speech is coming in, so I’m gonna duck the track.” And then you’re gonna hear mainly the speech. “And then when it’s finished saying the audio description, I’m gonna go back up with the track.” And so, the film itself will start playing louder. It’s not clean. It’s sort of a jumpy thing, kind of takes you out. And it’s kind of annoying. It’s kind of annoying. So, that’s part of the artificial intelligence.
There’s some AI that I think they’re also working on when it comes to, I don’t know if anyone’s doing that right now, but actually writing the audio description, as far as I heard. I think that might be in the works if it’s not actually out there. If someone knows if it’s been done, you tell me. But then the TTS part is what we all know as the text to speech or the synthesized speech. That’s the computer who is taking the job of the narrator. And I just mean that on that particular film. I’m not making a blanket statement about these things taking jobs from narrators, but, you know, in a way it’s happening. [laughs] So, yeah.
And so, the questions that we usually get into, the discussion usually is sort of like pro or con. Do you like it? Do you not? Are you okay with it? So, we can start off there. But that’s really, I don’t think that’s really where we wanna stay, because right now, whether we’re pro or con, I think we need to think about that the industry and those who are really offering this and pushing this, well, they’re very pro. And they’re pro, we know, not because of the artistic value of synthetic speech, but they’re pro because they wanna save some money, and as I like to say, so Jeff Bezos can go to space and whatever else he wants to do with all that money. That’s a whole nother conversation. I don’t know what you can do with all that money, but whoo. Anyway. But apparently what you cannot do is provide good audio description! [laughs] I said it!
I wanted to frame the conversation, but Nef and everyone else, Cheryl, Scotts, I’d say the two Scotts, if y’all wanna talk about pro/con, because I think the thing that would be interesting, maybe we can make the argument, maybe we can even invite some folks up to take a side of pro and con first and just to sort of get that to hear why people might actually be pro and hear what their arguments are, because it’s always good to hear from folks.
CHERYL: Cheryl here, and I will say that I’m con. I’m firmly on the con side that Thomas laid out. But I wanna be clear that that is not because I’m a professional audio describer, and I am sad that a computer is taking my job. And I could be, but I feel like, as in the sighted describer community, the narrator community, voice talent community, we need to be careful that our main argument against it isn’t, “I might lose my job.” It is awful to lose the job, but the point that is important to me is that my job is about creating audio description for the audience to have a wonderful, immersive experience. So, it’s the audio description and the user’s perspective, I think, that really should be paramount here when we discuss it. I think it’s fine to have a conversation about jobs, but that might be a different space because these conversations are blind-centered audio description conversations. So, I would ask that if there’s voice talent in here, that we keep it centered on what is the experience the audience is getting? And I just don’t feel like TTS offers, and especially the AI-written stuff, it doesn’t offer the nuance. I’ve seen things where the description is focused just on that moment between dialogue, but there was no opportunity to hear a description about anything that happened before the dialogue. There’s no context. These lines sort of float in space and don’t seem to connect and make a cohesive whole. So, I’ll stop there and hand it over to anybody else. Thanks.
NEFERTITI: How about you, Scott Blanks?
SCOTT B: I am one who is against TTS in the vast majority of the media that is currently being audio described. Let me elaborate. So, when I think about the arts or entertainment broadly writ, I’m thinking about film, TV, stage, other creative presentations, art, artistic exhibits, things like that. I feel in those spaces that as things currently stand that TTS, as has already been mentioned by a few people, is, it is not what is going to make the experience a quality one, and it doesn’t make it an accessible one. The point of audio description is accessibility, and audio description can also be considered an art form. But even if you just consider it on the accessibility side, if the accessibility tool is a synthetic voice that is mispronouncing words, that is, as has been mentioned, there’s an odd rhythm or arrhythmia to it that takes you out of the experience, then your experience is not only not as immersive, it’s not as accessible. And that’s the point of audio description in all of the contexts that we know it right now, and in a lot of the context where we don’t know it.
And I would say if I were, if I were to be pro audio description through TTS narration, it might be in some of those spaces where there is no option right now. If there was a way to access information that scrolled on a TV screen, real-time, newsworthy information, that might be something that I could see because having the quickest access possible to that information is really critical. And I don’t think it would be feasible to think that we could have a human standing by 24/7 on literally thousands of different networks, TV stations, feeds, whatever to provide that. But I think we have to kind of keep our focus. Most of the professionals here, the professionals on this panel that I’m fortunate to be alongside here are audio describing or writing for audio description or providing other contributions to the audio description field through arts and entertainment. And in that space, I don’t see that TTS has a place in the provision of audio description in 2022.
NEFERTITI: Beautifully said. I could not agree more. All right, Scott Nixon, let’s hear from you!
SCOTT N: All right. I would like to conduct a small thought exercise for the sighted people in the audience today. You’re at an art museum. You’re, well, you’re at the Louvre. Okay. You’re standing in front of the Mona Lisa itself, its glory, its majesty, its beauty. You’re drinking it in with your eyes. Imagine for a moment you couldn’t actually see the painting. You couldn’t experience it the way everybody else experiences it, so you have an audio description device plugged into your ear. Would you prefer a member of the artistic community talking passionately about the magnificent painting you’re seeing before you, [imitates stiff robotic voice] or would you like a robotic voice explaining to you what it looks like? [back to regular voice] That is what we’re talking about with TTS.
I myself am vehemently anti-TTS for audio description because it robs something you’re watching of its soul, okay? I have watched sitcoms and movies and various other forms of media with TTS audio description, as, you know, as a curiosity over the years, and it really does take something away from the experience. Why should we as a blind community have to have a lesser experience than everyone else just because a company wants to save a couple of thousand bucks? It is literally a matter of a couple of thousand bucks between bad TTS and even minimally good audio description. So, why not do it? The simple answer is they don’t think we matter enough.
So, at the end of the day, this is something I always say when I’m talking to people about accessibility, audio description, accessible websites, all that sort of stuff, “You are a company. You are ostensibly here to make money. If you make a quality product and an accessible product that vision-impaired and blind people will enjoy, we talk. We talk to each other. We tell people when something is good. If you build it, we will send you sack loads of money. So, why are you sitting on your butts doing something that you shouldn’t be doing?” And that’s me done for now.
THOMAS as Audio Editor:
Hey Y’all, I just need to interrupt for a moment.
During this live conversation, we had a challenge getting our technology to work. Well, that we is really me.
We wanted to play a clip in order to have a sample to discuss.

My technology is working today, so even though we didn’t have the chance to discuss it, you can have a chance to hear the sample.

Check this out!

Downton Abbey clip:

Text to Speech Audio Description Narrator:
In the English countryside, a turn of the century train barrels past the lake.

It rumbles by dead leaves and bare branch trees. Puffs of white steam ripple out from its engine and below on to the rolling green hills.

On board a large black haired man in his late 40s peers out his window. Steam envelops the wires of utility poles.

In a village, a wire travels between quaint stone houses to a telegraph office.

NEFERTITI: Amazon Prime is where you can find this example. It’s a show that was hugely popular called Downton Abbey from our British neighbors over there across the pond. What isn’t beautiful about this experience is that as majestic as the show is, it has TTS. And the TTS says a lot of the things or is guilty of a lot of the things that Scott Blanks mentioned: mispronouncing names, misnaming names. So, in addition to the audio description script being kind of crappy, then on top of that, you have this robotic voice who, for those of you who are blind and in the audience who use a screen reader, it’s worse than like the Eloquence screen reader. Eloquence, for those who are not aware, is the most popular, widely used screen reader that blind people use to get around on the Internet, on PCs, on Windows machines. So, I mean, it’s just super distracting, really kind of offensive, and just not at all in keeping with the content, which is very dramatic and passionate! But then you have this [imitates robotic voice] TTS voice: The train rides down the rails. You know, it’s just, it’s, and not for nothing, but I sound great compared to the TTS just now. So, it’s just, it’s just so inappropriate.
SCOTT B: It’s Scott Blanks just to jump in. And if you’ve not ever enabled audio description on Prime video, you can do that once you start playback of an item. There should be an audio and subtitles option on your playback screen that you can access, and in there, you would wanna choose, in the case of Downton Abbey, English audio description.
SCOTT N: Just as an example of bad versus good, the American sitcom, The Big Bang Theory, huge hit in its day. Audio description turned up on Amazon Prime here in Australia about a year ago, and I was all gung-ho and ready to listen to it. I put on the first episode, bang. TTS. Completely robbed the show of its humor and its charm. I gave up after two episodes. Now, this year, HBO Max in America have apparently provided a human-narrated audio description track, and I was played a brief sample of it: 12,000% improvement. It gave the humor of the show. The audio description narrator was playing along with the jokes, smiling at the right times, frowning at the right times with his voice and all that sort of stuff. And it really did enhance the experience. So, the difference between TTS and human AD is like night and day. It’s just really a really important thing. And like I said, it helped to bring the soul of the show alive to the people who can’t see the soul that they put up on the screen. And that’s me.
THOMAS: In this conversation, TTS is sort of the demon. It’s the bad guy. But, you know, the technology’s not the bad guy. Like, we use TTS as blind people, as people with disabilities in general. We use TTS. TTS can, I love my screen reader. It gives me access. The screen reader is my input. It’s the way that I take in information. That is, that’s my access. The screen reader, that’s my guy! [laughs] Like, you know what I mean? Because he’s helping me out all the time. And then in order for me to have digital output, screen reader’s my guy. Like, I need him or her or them, right? And so, it shouldn’t necessarily be demonized. And I think that sometimes there are other people with other disabilities that make use of access technology, of TTS as well. So, Cheryl, you wanna talk about that?
CHERYL: Thanks, Thomas. I feel like you’ve framed it up so beautifully. The point that I wanted to make is that I have listened to different panels and read things and heard people arguing against TTS, which again, to reiterate, [chortles] I am not for TTS, especially as the Scotts pointed out, in a museum or a work of, a film, an art piece, an art film. But what troubles me is sometimes the reasons given end up incorporating a lot of ableist slurs and a lot of really harsh language, which I’ve heard none of tonight. But what I want us to be careful is, like Thomas said, to not demonize the technology. And for those folks who have a lot of communication through one of these systems where they’re typing or selecting images and some kind, a synthesized voice comes out, that’s communication. And so, it’s not the voice that’s “awful and soulless and inhuman.” I just want us to be careful. And when you leave this session and you go out and you promote or you speak about the harms and the problems with TTS, that you be careful to not be too ableist and throw augmentative and alternative communication users under the bus while insulting the sounds of these voices. It’s not the sound of the voice, it’s the application. And like Thomas was talking about, when the AI adjusts the volume of the soundtrack for this TTS to come in, it is like my head starts spinning. It’s just so jumpy. It’s so, it’s not artistic, and it doesn’t fit the vision of the film or the show. So, I’ll pause there.
THOMAS: And I also wanted to jump in with two podcasts ‘cause I think Cheryl, you had a podcast with an AAC user, and I think, so, if folks wanna kind of get to see how people use these devices and how it’s so intertwined with their life, that’s one. So, what’s the name of that podcast, Cheryl? I think it’s called Pigeonhole.
CHERYL: Oh, my! No, no. People should go to endever’s podcast. AAC Town is the podcast that endever* and their comrade, Sam, run. They’re both AAC users full-time or nearly full-time, and they have a podcast. It’s all transcribed. But yes, I did have endever* on my show, Pigeonhole, one time.
THOMAS: That’s what I was talking about.
CHERYL: Yeah.
THOMAS: You had them on your show.
CHERYL: But it was to talk about—
THOMAS: Let me do what I gotta do, Cheryl! [laughs]
CHERYL: OK! [laughs]
THOMAS: You always shout out my podcast, and so I wanna shout out yours. But mainly because it applies, right? I don’t wanna just shout, you know, I’m not just randomly shouting out podcasts. Although I do that around here. Every two hours I open up my window and I shout it out, “Pigeonhole!”
BOTH: [guffaw]
THOMAS: The other one is I was gonna say, now I am gonna do a promo of mine, is because I had a conversation with Lateef McLeod. And Lateef McLeod is an AAC user. And in that episode, we really go into some of the other issues around TTS that never necessarily get talked about. Lateef McLeod is an African American, and the voices that he had all his life don’t really represent him until he got a voice that was a synthesized voice of a Black man. And so, you know, these issues are big, right? We always talk about it, like, these issues are really big. And so, and in that, Nefertiti was actually in that episode, too, where we did a little bit of a little skit about TTS that touches on a bunch of these things. So, anyway, it was a cool episode, I think. And so, both of those, check it out, and we get into these conversations as well. That’s it. I’m Thomas. I’m done.
NEFERTITI: Yeah. So, I think the summary here is let’s express ourselves, but be mindful to not sort of turn around while advocating for one accessibility, mm, putting down another, you know, or minimizing, punching down on another. So, I think that’s a great point. And we had a great clip to show you related to that, where human narration meets TTS and how it was used judiciously, minimally, but in a way that really drove home the point of where maybe it’s appropriate.
Scream Trailer from Social Audio Description Collective

AD Narrator – Nefertiti Matos Olivares:
The lights are on in a white suburban house at night. A silver cordless landline rings with the ID, Unknown Name.

In the kitchen, Tara pushes reject on the cordless while holding her smartphone. She is a thin light skinned Latina teen with long wavy dark hair pulled back in a ponytail.

She’s just texted

TTS Receiving Text Message:
Mom’s out of town again you should come over here. Free dinner, Many binge watch options.
AD Narrator – Nefertiti Matos Olivares:
Amber responds.
TTS Sending Text Message:
Have to do better.

TTS Receiving Text Message:
Unlock liquor cabinet.

— Landline phone rings

TTS Receiving Text Message:
You should answer it

TTS Sending Text Message:
How did you know my landline was ringing?
Amber?

TTS Receiving Text Message:
This isn’t Amber.

Tara speaking on landline:
This isn’t funny, Amber.

Deep Menacing Voice over landline:
Would you like to play a game? Tara?

Suspenseful Crescendo closes the scene.

THOMAS: That had some really different reactions that I wonder where people stand with.
NEFERTITI: First, I wanna say that this is for a Scream trailer that the Social Audio Description Collective described. It’s a bit of a hacker horror type film. And we had a human narrating the audio description, but there was a scene between somebody who was on camera and somebody who was off camera, and they were texting one another. And so, we decided, full disclosure, I’m part of the Social Audio Description Collective, we decided that why not use a synth to say what those lines of texts were rather than having the human describer say them? Just like we blind people experience TTS all the time with our screen readers, etc., why not just put one of those voices to those text messages? And it was a very brief exchange but still sort of drove home the point.
CHERYL: It worked so beautifully because I’m watching the screen, and I’m seeing basically a computer screen, words pop up on a computer screen. So, hearing that screen reader voice read it was really cool. And it really uplifts, in my mind, the ingenuity and creativity of disability community. Like, who would’ve thought to do that besides people who interact with these voices all the time? I thought it was such an add to the, it elevated the art, I thought.
THOMAS: Yeah, and I think I remember that there were some comments from folks who I don’t think were blind who were very negative toward that text to speech being included in there. And it was like, wow, like this is totally my experience. This is a text message. That was what a text message sounds like to us.
NEFERTITI: Mmhmm.
THOMAS: You know? And so, again, to me, highlighting that no, audio description should always center blind people. And so, blind people need to be a part of this, and blind people need to be a part of that conversation, which is part of my issue personally with, and so, advancing this a little bit, is the framing of this conversation of audio description and the way it’s been framed within the community by those outside of the community, those creating it, the corporations, right? Is that hey, TTS is good because you will get more. So, it’s either, if you want more audio description, then you take TTS. And that, framing TTS that way, is the biggest problem that I have with this entire subject is because we are being told we are being given options, and it’s two options, and we have never been consulted. And if they tell me that, “Oh, no, you were consulted as a community because we issued a survey that some folks got to fill out,” I don’t care about that. Because the thing is, is that it’s still based on that option that you give me. So, a lot of people would say, “Well, if these are my only two options, TTS or no audio description,” I can see why a lot of people would go that route.
NEFERTITI: Mmhmm.
THOMAS: But that’s not the route, that’s not the choice that we should be given. Why are you giving us those two choices? Those aren’t really even choices. And so, that’s my, really, my biggest problem with this whole conversation. I think history sort of says that when large corporations get their hands on something and have it in their cold hands and their cold hearts [chuckles] to do something and to get it done and to save a penny or two, they’re gonna do it. It’s gonna happen. And so, right now, my concern is that is this conversation about pro or con, does it even matter at this point? Is this inevitable?
And so, should the conversation actually move into something else like, “Hey, Amazon, hey, you corporations, why don’t you, you should, you need to be including us in this conversation”? Because like Scott, I think Scott B., you mentioned some other opportunities where, you know, okay, wait. Text to speech, I’ll take it here. This would work. This would help my life here in this particular case. And I’m wondering if there are other examples of that, that apply to film. Can we talk about either this framing of no, of more AD with text to speech or not, but also, is this inevitable? Do y’all think it’s inevitable? Do we still have a chance to say no? Or should we be talking about, hey, let’s come to the “negotiation table” and have these conversations and find out where the blind community says, “Okay, this would work for us”? I wanna throw that out there.
NEFERTITI: Mmhmm. And the blind community and our allies. Let’s never underestimate the power of allyship and togetherness. You know, this is accessibility.
THOMAS: Absolutely.
NEFERTITI: But it’s not to exclude our sighted allies. We center blind people here, but we are here, and we want to be part of the conversation just as much as our sighted allies have been already.
But I would like to hear a little bit from Scott Blanks about this idea that I’m sure is not exclusively something that, you know, just sort of a light bulb went off in his head, but something that he has taken and done something about. And it’s all about advocacy and a campaign of sorts. Because, Thomas, what you were saying about so, what are our choices? No description or TTS? And is this sort of the end of the road? Do we just let them do them being the cold-hearted, cold-handed, as you put it, companies out there to save a penny go the way of TTS, or do we do something about it? Can we do something about it? And I think that Scott has come up with a way that we can, if people get behind it. Scott Blanks, do you wanna talk a little bit about that?
SCOTT B: I have found that there are a lot of different ways to engage with companies, not just talking about audio description, but in so many different things. And particularly if you’re a person with a disability and unfortunately, you have to fight for a lot of stuff, small things, large things. Sometimes small things feel big. And there’s more of that than there should be. That’s a different topic, though. But I find that engaging with companies, it’s very easy to do that in places like social media and in sort of those public spaces. But what tends to be a little bit more of a lift for us, but also, I think has more impact on these companies, is when you start writing to them directly and when they start hearing from people in numbers.
So, one of the things that I did a few months ago was I took a run at a very basic, it’s sort of a template of sorts, a very, very rudimentary template that someone can take and use however they would like to reach out to, if they know of an entity who is providing TTS audio description, and they would like to talk about why they feel like that company should look at doing it a different way. This is a, it’s in a Google document that anyone can access. I would say the best thing you can do is connect to the LinkedIn audio description group, the Twitter community, or you can come find me on LinkedIn or anyone here really would probably be able to get you access to that link. It’s a public link, and it is available for anyone to view, copy, and do with as you see fit. But I believe it’s important. If companies don’t hear from us, and they’re doing a thing, then they think they’re doing that thing correctly. They believe that that’s how it should be done unless they start hearing from people.
And listen, I’m not under any sort of illusion that writing a bunch of letters is guaranteed to make a change. But I don’t like the idea of something becoming so rooted in, and the expectation is that this will be the way things are for now and evermore and thinking that we didn’t try hard enough. And I believe that part of advocacy is it’s not as flashy, but it’s getting those letters written. It’s getting that contact to these companies. And all of these c
THOMAS: Cool. Well, that concludes this week’s conversation. Why don’t y’all keep the conversation going on social media.
CHERYL: Use #ADFUBU, for us by us, #DescribeEverything, and #AudioDescription.
NEFERTITI: And hey, you know we’re out here, right? Mmhmm! Gathered and galvanized y’all. If you haven’t joined us yet, what are you waiting for?! You can find us in the LinkedIn Audio Description group and the AD Twitter community. We know that your participation will only make these spaces better.
Music fades out!

Reid My Mind Radio – Microsoft Seeing AI – Real & Funky

Wednesday, August 2nd, 2017

[Image: T.Reid wearing a hat with a "T" while the Seeing AI logo is imposed on his shades]
Okay, I don’t usually do reviews, but why not go for it! All I can tell you is I did it my way; that’s all I can do!
It took a toll on me… entering my dreams…
I’m going to go out on a limb and say I have the first podcast to include an Audio Described dream! So let’s get it… hit play and don’t forget to subscribe and tell a friend to do the same.

Resources:

Transcript

TR:

Wasup good people!
Today I am bringing you a first of sorts, a review of an app…

I was asked to do a piece on Microsoft’s new app called Seeing AI for Gatewave Radio.

The interesting thing about producing a tech related review for Gatewave is that the Gatewave audience most likely doesn’t use smart phones and maybe even the internet. However, they should have a chance to learn about how this technology is impacting the lives of people with vision loss. Chances are they won’t learn about these things through any mainstream media so… I took a shot… And if there’s anything I am trying to get across with the stories and people I profile, it’s we’re all better off when we take a shot and not just accept the status quo.

[Audio from Star Trek’s Next Generation… Captain La Forge fires at a chasing craft. Ends with crew mate exclaiming… Got em!]
[Audio: Reid My Mind Radio theme Music]

[Audio: Geordi La Forge from Star Trek talks to crew from enemy craft…]
TR:
Geordi La Forge from Star Trek’s Next Generation, played by LeVar Burton, was blind. However, through the use of a visor he was able to see far more than the average person.

While this made for a great story line, it also permanently sealed LeVar Burton and his Star Trek character as the default reference for any new technology that proposes to give “sight” to the blind.

[Audio: From intro above ending with Geordi saying…
“If you succeed, countless lives will be affected”]
TR:
What exactly though, is sight?

We know that light is passed through the eye and that information is sent to the brain where it is interpreted and quickly established to represent shapes, colors, objects and people.

A working set of eyes, optic nerves and brain are a formidable technological team.
They get the job done with maximum efficiency.

Today, with computer processing power growing exponentially and devices getting smaller, the idea that devices like smart phones could serve as an alternative input for eyes is less science fiction and, well, easier to see.

There are several applications available that bring useful functionality to the smart phone:
* OCR or optical character recognition which allows a person to take a picture of text and have it read back using text to speech
* Product scanning – makes use of the camera and bar codes which are read and the information is spoken aloud again, using text to speech
* Adding artificial intelligence to the mix we’re seeing facial and object recognition being introduced.

Microsoft has recently jumped into the seeing business, with their new iOS app called Seeing AI… as in Artificial Intelligence!
There’s no magic or anything artificial about these results, they’re real!

In this application, functions like reading a document or recognizing a product’s bar code are split into channels. The inclusion of multiple channels in one application is already a plus for the user, eliminating the need to open multiple apps.

Let’s start with reading documents.

For those who may have once had access to that super-fast computer interface called eyes, you’re probably familiar with the frustration of the lost ability to quickly scan a document with a glance and make a quick decision.

Maybe:
* You’re looking for a specific envelope or folder.
* You want to quickly grab that canned good or seasoning from the cabinet.

With other reading applications you have to go through the process of taking a picture and hoping you’re on the print side of the envelope or can. After you line it up and take the picture you find out the lighting wasn’t right so you have to do it again.

Using Microsoft’s Seeing AI, you simply point the phone’s camera in the direction of the text.

[Audio: App in process]

Once it sees text, it starts reading it back! The quick information can be just enough for you to determine what you’re looking for. In fact, during the production of this review, I had a real life use case for the app.

My wife reminded me that I was contacted for jury duty and I needed to follow up as indicated in the letter. The letter stated I would need to visit a specific website to complete the process. I forgot to put the letter in a separate area in order to scan it later and read the rest of the details. So rather than asking someone to help me find the letter, I grabbed the pile of mail from the table and took out my iPhone.

I passed some of my other blindness apps and launched Microsoft Seeing AI. I simply pointed the camera at each individual piece of paper until finding the specific sheet I was seeking. The process was a breeze. In fact, it was easier than asking someone to help me find the form. Ladies and gentlemen, that’s glancing!

Now that I found the right letter, I could easily get additional information from the sheet by scanning the entire document. I don’t need to open a separate app, I can simply switch to a different channel, by performing the flick up gesture.

Similar to a sighted person navigating the iPhone’s touch screen interface, anyone can accomplish the same tasks non-visually using a set of different gestures designed to work with VoiceOver, the built-in screen reader that reads aloud information presented on the screen.

Using the document channel I can now take a picture of the letter and have it read back.

One of the best ways to do this is to place the camera directly on the sheet in the middle and slowly pull up as the edges come into view. I like to pull my elbows toward the left and right edges to orient myself to the page, forming a triangle with my phone at the top center. The app informs you if the edges are in view or not.
Once it likes the positioning of the camera and the document is in view, it lets you know it’s processing.

[Audio: Melodic sound of Seeing AI’s processing jingle]

You don’t even have to hit the take picture button. However, if you are struggling to get the full document into view, you could take the picture and let it process. It may be good enough for giving you the information you’re seeking.

If you have multiple sheets to read, simply repeat.

Another cool feature here is the ability to share the scanned text with other applications. That jury duty letter, I saved it to a new file on my Dropbox, enabling me to access it again from anywhere without having to scan the original letter.

Let’s try using the app to identify some random items from my own pantry.

To do this, I switch the channel to products.

[Audio: Seeing AI app processing an item from my pantry…]

What you hear is the actual time it took to “see” the product. All I’m doing is moving the item in order to locate the bar code.
As the beeps get faster I know I am getting closer. When the full bar code is in range, the app automatically takes the picture and begins processing.

[Audio: Seeing AI announces the result of the bar code scan… “Goya Salad Olives”]

It’s pretty clear to see how this would be used at home, in the work environment and more.

Now let’s check out the AI, or artificial intelligence, in this application.

By artificial intelligence, I mean the machine is going to use its ability to compute and validate certain factors in order to provide the user with information.

First, I’ll skip to the channel labeled Scene Beta…
Beta is another term for almost ready for prime time. So, if it doesn’t work, hey, it’s beta!

Take a picture of a scene and the built in artificial intelligence will do its best to provide you with the information enabling you to understand something about that scene.

[Audio: Seeing AI reports a living room with a fireplace.]

This could be helpful in cases like when a child or someone is asleep on the couch.

[Audio: Action Movie sound design]

I can even picture a movie starring me, of course, where I play a radio producer who is being sought by the mob. In the final scene, I use my handy app to see the hitman approaching me. I do a roundhouse kick…
OK, sorry, I get a little carried away at the possibilities.

While no technology can replace good mobility travel skills, I can imagine a day where the scene identification function will provide additional information about one’s surroundings, making it another mobility tool for people who are blind or visually impaired.

Now for my final act… oh wait it’s not magic remember!

Microsoft Seeing AI offers facial recognition.
That’s right, point your camera at someone and it should tell you who that person is… Well, of course you have to first train the app.

To do this we have to first go into the menu and choose facial recognition.
To add a new person we choose the Add button.
In order to train Seeing AI you have to take three pictures of the person.
We elected to do different facial expressions like a smile, sad and no expression.
Microsoft recommends you let sighted family and friends take their own picture to get a good quality pic.

The setup requirement, while understandable at this point, sort of reduces that sci-fi feel.

After Seeing AI is trained, once you are in the people channel and point your camera in the direction of the person’s face, it can recognize them and tell you the person is in the room.

[Audio: Seeing AI announces Raven about 5 feet in front.]

Seeing AI does a better job recognizing my daughter Raven when she smiles. That to me is not artificial intelligence because we all love her smile!

The application isn’t perfect. It struggled a bit with creased labels, which made it difficult to read the bar code.

Not all bar codes are in the database. It would be great if users could submit new products for future use.

As a first version launch with the quick processing, Seeing AI really gives me something to keep an eye on. Or maybe I should say AI on!

Peering into the future, I can see:

* Faster processing power that makes recognition super quick,
* Interfacing with social media profiles to automatically recognize faces and access information from people in your network,
* Lenses that can go into any set of glasses, sending the information directly to the application without requiring the user to point their phone at an item or person, with the information received privately via wireless headset.
That could greatly open up the use cases.

In fact, interfacing with glasses is apparently already in development and the team includes a lead programmer who is blind.

Microsoft says a currency identification channel is coming in the future, making Seeing AI a go-to app for almost anything we need to see!

The Microsoft Seeing AI app is available from the Apple App Store for Free 99. Yes, it’s free!

I’m Thomas Reid
[Audio: As in artificial intelligence!]
For Gatewave Radio, audio for independent living!

[Audio: Voice of Siri in VoiceOver mode announcing “More”]

I don’t know if that’s considered a review in the traditional sense, but honestly I am not trying to be traditional.

The thing is, thinking about the application started to extend past the time when I was working on the piece…

That little jingle sound the app makes when it’s processing… it started to seep into my dreams…
[Audio: Dream Harp]

[Audio: “Funky Microsoft Seeing AI” An original T.Reid Production]

The song is based around the processing tone used in the app with the below lyrics.

(Audio description included in parens)

(Scene opens with Thomas asleep in bed with a dream cloud above his head)

The processing sound becomes a sound with Claps…

(We see a darkened stage)

(As the chorus is about to begin spotlight shines on Thomas & the band)

Chorus:
Microsoft Seeing AI
Helping people see without their eyes

Microsoft Seeing AI
Helping people see without their eyes

(Thomas rips off his shirt!)

Verse:
Download the app on my iPhone

{Background sings… “Download it, Download it!”}

Checking out things all around my home

(Thomas dances on stage)

Point the camera from the front
Huh!
Point the camera from the back!

I’m like:
what’s that, what’s this
Jump back give my phone a kiss!
Hey! (James Brown style yell!)

(Thomas spins and drops into a split)

Chorus:
Microsoft Seeing AI
Helping people see without their eyes

Microsoft Seeing AI
Helping people see without their eyes

(Back in the bed we see Thomas with a fading dream cloud above his head)

Ends with the app’s processing sound.

TR:
Wow, definitely time to move on to the next episode…

With that said, make sure you Subscribe wherever you get your podcasts. Tell a friend to do the same – I have some interesting things coming up I think you’re going to like.
And something you may have not expected!

[Audio: RMMRadio Outro]
TR:
Peace!
