Bing Chat, Microsoft's new AI-powered chatbot, can go to some strange and dark places. It's wigging some people out. But machines that attempt to mimic human social behavior have a long history – and a long history of making us feel uncomfortable.
With a flood of new and ever more realistic AI helpers on the way, Meteor decided to take a look back at their roots, and at what comes next. So we caught up with Stanford Prof. Byron Reeves, who researches human emotional responses to computers and was on the team that brought to life Clippy – Microsoft's pushy and much-maligned cartoon helper that appeared in Office 97 and was retired by popular demand in 2007.
The following transcript of this interview was edited for clarity, style and length. It was created using MacWhisper, an eerily accurate AI speech-to-text utility. (The ums, I thinks, I guesses, you knows and I means were perfectly captured but have been removed for readability.)
EVAN HANSEN: I've been watching all of these developments in artificial intelligence and this new wave of personal assistants that appear to be coming online. I thought it would be really fun to go back to some of the original attempts to create a digital assistant, a desktop assistant.
Clippy obviously comes to mind. I know you were associated with that project.
BYRON REEVES: Yes, but we didn't actually draw or spec out particular characters or assistants.
My colleague Cliff Nass and I were at Microsoft Research in the mid-90s trying to help them figure out what to do with a computer that could display pictures and provide entertainment – much more than just a C:\ prompt and the organization of documents.
We were talking about moving computing more into an entertainment and interactive context, and consideration of virtual assistants was part of that.
There were a lot of people that were quite interested. Clippy and other animated characters were something we talked about a lot. I don't think it was the most important example of social interaction. But it was a good example of trying to bring a different form of interaction to computing.
EH: I think the reason it sticks in people's minds is that people love to hate it. Or maybe it was just one of the more bonkers graphical representations. I think that's what got people. Women said it was creepy.
BR: It was really new. You were used to doing spreadsheets and documents, and now there was this animated character. That was a real left turn with respect to what computing was.
I think a lot of the objections to those first assistants really came from the computational folks, the guardians of the gate, who didn't think computing should be getting into these soft areas of social interaction and pictures, or the natural use of your voice in a dialogue that's familiar to all of us – just 'cause we're human.
So many thought that "social" just didn't belong in computing. I think that was behind a lot of the objections. But there were some other things. Clippy interrupted while you were writing a document. I mean, that's a socially egregious thing to do. So it was impolite, but it was still really social. And the negative response was very much because it was in the context of social interruption.
EH: And tell me, what was your role in its development? You were trying to figure out the social aspects and give them advice on how to approach an assistant that would be more accepted, right?
BR: Yeah. Well, what we were trying to do was not unlike what folks are trying to do now. We were trying to make computing easier.
There were a whole lot of people who didn't want to write code, who didn't want to learn Windows or arbitrary prompts, and who wanted computing to be more natural, like so many contexts that we all take for granted right now.
So, that part of it was very similar to what Alexa does, what a generative AI system does. Let's just make this easier to use. And the way to do that is natural language, it's interaction, it's speaking in your voice, it's not having to write code, it's getting the AI to write the code for you. But I think that the motivation there was really representing people who just thought it was too damn hard.
EH: I find it interesting, the personification, and the way people read things into machines that act like people, even though the person isn't really there – maybe isn't there. But it feels like there was a detour after Clippy in which people stopped trying to personify the assistant. It just became an invisible helper, type-ahead, which can be really annoying too, but there's no personality to it, no face representing the feature to focus your emotions on. And now we have these AI chatbots popping up everywhere, and the social machine is back in a big way, along with the emotional connections it can create.
BR: Yeah. I have memories of so many different product groups that were very explicitly thinking about social interaction. General Magic, for example, created this wonderful device that was far ahead of its time. And it was audio-related.
They had a very good patent about personality and voice, and tried to do script writing and character development in ways that were explicitly social.
The Google Assistant and Alexa: I guarantee there are people there who worry about social interaction. They worry about "Who do people think Alexa is, and what's her personality, and how old is she, and what's her backstory?" And "Should she be more explicitly social? Would that increase the quality of the prompts that people give and the likability of the system?"
There are just many examples of all kinds of automated dialogue systems that I think were very similar. Clippy is not the best example. I'm not even sure it's the first. General Magic was working on their voice assistant pretty much at the same time.
And around the same time at Microsoft, there was a product group called Microsoft Agent that was working on a very sophisticated, controllable and graphical social actor (the first character was a wizard) that could talk to users on a website – for example, about what to buy – and the character could move around in a separate window.
So I think there were a lot of people that were interested in bringing that social interaction to computing.
And right now, I see a lot of similarities between AI and the old finite state machines, those IF/THEN computing systems that just tried to guess the best answer based on statistics. That's in large part what AI does now.
Today, I think, we've got much better ways of doing the stats (there was certainly no deep learning involved in Clippy), but they're trying to do similar things to bring naturalness to interactions.
EH: How do you rate the progress against this goal over the years you've been watching it?
BR: There certainly has been progress. I think in the voice area more than the visual area, but that could change as AI starts working on video. There are quite useful automated dialogue systems, from United Airlines to BMW to Alexa, and there are lots of companies that specialize in how to do those systems right. In mental health, Woebot is a good example.
In terms of social interaction, AI is similar to the kind of highly curated, well-written, useful, successful IF/THEN machine that we were trying to make years ago. If a user says this, tell them that. Only now the inputs can be complicated, much more so than we had back in the mid-90s, but they rely on the same social sensibilities, I think.
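To make that comparison concrete, here is a minimal sketch of the kind of IF/THEN dialogue machine Reeves describes, with its suggestions ordered by simple usage statistics. It's written in Python, and the states, options and counts are invented purely for illustration; nothing here is drawn from Clippy or any real product.

```python
# Minimal sketch of an IF/THEN dialogue machine whose suggestions are
# ordered by usage statistics. All states, options and counts below are
# hypothetical, for illustration only.

from collections import Counter

# Each state maps a user selection (the IF) to its follow-up options (the THEN).
DIALOGUE_STATES = {
    "start": ["write a letter", "make a spreadsheet", "get help"],
    "write a letter": ["formal letter", "resume", "fax cover sheet"],
    "make a spreadsheet": ["budget", "invoice", "schedule"],
}

# Hypothetical counts of how often users picked each option in the past;
# these statistics decide the order in which options are offered.
OPTION_COUNTS = Counter({
    "write a letter": 500, "make a spreadsheet": 300, "get help": 150,
    "formal letter": 120, "resume": 340, "fax cover sheet": 15,
    "budget": 200, "invoice": 90, "schedule": 60,
})


def suggest(state: str) -> list[str]:
    """Return the follow-up options for a state, most probable first."""
    options = DIALOGUE_STATES.get(state, [])
    return sorted(options, key=lambda option: OPTION_COUNTS[option], reverse=True)


if __name__ == "__main__":
    # Walk one branch of the tree: start, then pick "write a letter".
    print(suggest("start"))           # ['write a letter', 'make a spreadsheet', 'get help']
    print(suggest("write a letter"))  # ['resume', 'formal letter', 'fax cover sheet']
```

All of the "intelligence" lives in the counts: change the statistics and the machine's behavior changes, which is roughly the knob those mid-90s teams were turning by hand across walls of IF/THEN statements.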
EH: I think it's super interesting how people want to test the machine. People immediately try to thwart the guardrails, and they poke at it just to see: where are you in there? I wonder if there are Easter eggs hidden in this thing. I wonder, if I try X, Y, Z, what's it going to do?
And there's a sense of this potential, this potent thing lying behind a veil, with maybe a human-like personality, and you have to do all of these weird things to see what you can pull out of it. The process itself is engaging for people.
BR: I've noticed the same. That's what a lot of the students and colleagues want to talk about. Is it real? Is it what a real human would say? Is it authentic? Is the content correct? And there's a lot of concern in all those areas.
One of the most interesting effects for me is how AI changes work, how it changes productivity and efficiency. AI changes the speed with which I can get to an answer, the number of options I might consider, how we brainstorm. It’s great for rough drafts, putting together large blocks of thought or a bunch of code and then digging into the details. It’s often easier to edit than it is to create the first draft.
AI is a productivity tool – not a new telegraph or communication technology, but a new assembly line. That part of AI is most impressive!
One of my students brought in a job announcement for a prompt engineer, a couple hundred thousand dollars a year for a job title I'd not yet seen. The job was for somebody who really knew how to talk to generative AI. What words to use, how to iterate, just the nature of sifting and winnowing and funneling and specifying. Those skills are not always straightforward.
I'm not a specialist in what new work will come with AI, what new jobs will emerge. But I was really intrigued by the prompt engineer – it sounded right, given what I see when I watch people interacting with AI for the first time.
EH: I've interviewed a few AI artists – people debate whether that's an accurate label for what they do. What I've seen, though, is that the people on the very far edge of creativity and innovation are very surprising in how they engage generative AI imaging tools to extract the kinds of outputs they're looking for. And their methods are far from obvious.
BR: They're very far from obvious. And so I can see how this could become a useful specialty.
EH: I find that accidents have very interesting outcomes as well. You mistype, and mistype again, and now you're a pro. We actually have a whole section in our newsletter called AI Bloopers, because it's so entertaining to see what happens if you give it an egg. What will the AI do with that?
The results are often hilarious. It's maybe the most interesting thing about AI, not what it gets right but how far wrong it can go. Maybe one of the most human things about them.
BR: Yeah. But you know, going back to the similarities with what was being done 25 years ago, being in the rooms to draft the finite state model for the simplest of agents was really an interesting exercise. We would get entire walls covered with IF/THEN statements. If the user selects this interest, give these four options; if they pick the first option, do this; and so on.
And then here are the statistics on which options are most probable, which should determine how we order the responses. Those are simple things for an AI to work on now.
But there's a lot of similarity there in just trying to be statistically intelligent about guesses, or about ways to answer the question that was asked, for one. And then there's the fact that the interactions are me typing words or me speaking words. I haven't done the speaking yet. I don't know what that feels like with ChatGPT, to actually talk to it. I don't know if you've done that.
EH: I'm not aware of a speech interface yet for it.
BR: It would be seemingly trivial to gin that up. And it might feel a little different, even more natural. Maybe you'd interrupt more. "Wait, wait, wait, that's not what I want. I heard your first two. But let's eliminate this and get more of that."
EH: How far do you think we are from this?
BR: It's probably inevitable that we're kind of marching towards the Holodeck.
It just gets more real and more natural. I've got people all around me who are wondering about their future now. And they can really see the day when you can just kind of talk your way into a computer program that will do real work. Or at least do part of it, so that you could then hand it off to a professional that would clean it up.
I don't know how far away it is, but this seems like a step on the journey that's really important.
One of the things that really is interesting to me is to watch the students and how they respond to AI. I get cues from them as to how novel it is. I teach a course here at Stanford called Media Psychology, and we changed up one of the labs in the course to give students a chance to experiment with AI.
The lab was about writing a script for a two-minute segment of a movie using particular social scientific principles of realism that we had talked about in the lectures – for example, how you operate a camera and how you write a narrative that's engaging – and then storyboarding the script as well.
So they had to do this job of creating media based on research principles, using AI. And they worked on this for an hour and 15 minutes. There were 14 groups of five, and they just came up with the most interesting, innovative, creative bunch of ideas – "We're going to have a character that's half alien and half rat in a Hogwarts version of Stanford."
The students were both tickled to death and afraid. Some said, "I need to change my major," and some said, "We're all doomed." I've never heard such a range of extreme reactions.
EH: I think there are both ends of that spectrum because we don't know how far it goes. We don't know whether we have the ability to implement it in a humane way that helps lift people up. We have all of these forces that push for inequality in our society, so is it just going to exacerbate that, or are we going to find a way to redistribute the value? Who knows – it could go either way.
BR: I think it'll be both. On this campus I certainly feel there's more equitable attention to the pro and con columns than there was when past technical milestones were introduced.
There is a lot less of the blind energy of just taking the innovation and running with it – sure, there'll be some bad stuff that happens, but don't worry too much about exactly what that is. I think there's a lot more concern being expressed at the same time as the innovations are being tested. And there's more of an interdisciplinary take on the value of technology – from humanists, artists and social scientists as well as computational folks.
EH: Is it too soon for some of this stuff? Is it ready for release to the public?
BR: Well, I certainly wouldn't argue with anyone that wanted to cloister it until we knew more what to do with it. But I’m pretty sure that the cat is out of the bag already?
– February 14, 2023
About Byron Reeves
Byron Reeves, PhD, is the Paul C. Edwards Professor of Communication at Stanford University and Professor (by courtesy) in the Stanford School of Education. Byron has a long history of experimental research on the psychological processing of media and the resulting responses and effects. He has studied how media influence attention, memory and emotional responses, and has applied the research in the areas of speech dialogue systems, interactive games, advanced displays, social robots, and autonomous cars.
Byron has recently launched (with Stanford colleagues Nilam Ram and Thomas Robinson) the Human Screenome Project (Nature, 2020), designed to collect moment-by-moment changes in technology use across applications, platforms and screens.
At Stanford, Byron has been Director of the Center for the Study of Language and Information and Co-Director of the H-STAR Institute (Human Sciences and Technologies Advanced Research), and he was the founding Director of mediaX at Stanford, a university-industry program launched in 2001 to facilitate discussion and research at the intersection of academic and applied interests. Byron has worked at Microsoft Research and with several technology startups, and has been involved with media policy at the FTC, FCC, US Congress and White House. He is an elected Fellow of the International Communication Association, a recipient of the ICA Fellows Book Award for The Media Equation (with Prof. Clifford Nass), and a recipient of the Novim Foundation Epiphany Science and Society Award. Byron has a BFA from Southern Methodist University and a PhD in Communication from Michigan State University.