Chatbots, be they voice or text, face the same user interface problems that old-school command-line interfaces did 20 years ago, before windowing GUIs came about. When you start up, you're faced with a void. It's extremely difficult to convey to users what the acceptable entry points are and how to phrase them to get what you want. The inputs are far more generous than in the classic command-line world, but it's still vague enough to induce paralysis in end users. If I have a bunch of pull-down menus with clear directives on what I can and can't do, I'm going to be productive much more quickly.
I think the ultimate goal is to make the "acceptable entry points" so numerous and the variety of acceptable wordings so broad that you can approach the assistant with pretty much any goal you have in mind and it'll walk you through how to accomplish that.
Imagine if this was a realistic conversation with an assistant:
> "Hey Google, I'd like to order a pizza."
> "Sure, what kind?"
> "Let's see... cheese, pepperoni, sausage... and maybe some green peppers?"
> "Alright. What size?"
> "Hmm, so I need to feed 4 people..."
> "Sounds like a large?"
> "Sure, let's go with a large."
> "Alright. There's a Dominos nearby, I can order that for $8.99."
> "Sounds good."
> "Alright, I've ordered your pizza. Expected delivery in 15 minutes."
No need for the human to understand what the "entry point" is, because you can approach the assistant with pretty much _any_ entry point and it'll give you a useful response. We're still not there yet, unfortunately, and I think it'll be quite a while before we are.
You know, there are at least a dozen chatbot providers who can handle these unstructured dialogues with multiple entry points, ESPECIALLY with pizza.
In fact, the pizza order is the No. 1 scenario looked at by the chatbot providers.
In fact, it was exactly the scenario my old startup took as a case study and the first application we built with it. It could handle the different toppings, the sizes, and more. You could submit all your requests in one move, and they would be parsed and sorted into their little slots.
The problem? There are only a handful of scenarios similar to the pizza. In most cases in the real world, you have to select from an external database, look at proprietary product names, and more. Another staple of chatbot demos, plane ticketing, only works well when limited to North America (in the English-speaking world). Good luck asking for a flight to Kinshasa, Kuala Lumpur, or even Wagga Wagga in Australia.
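To make the "sorted into its little slots" part concrete, here is a minimal sketch of slot filling, not what any particular provider actually ships. The vocabulary (`SIZES`, `TOPPINGS`) and the function names are made up for illustration; a real system would pull them from the restaurant's menu.

```python
import re

# Hypothetical vocabulary, assumed for this sketch only.
SIZES = ("small", "medium", "large")
TOPPINGS = ("cheese", "green peppers", "mushrooms", "pepperoni", "sausage")


def fill_slots(utterance: str) -> dict:
    """Sort the recognized words of one utterance into 'size' and 'toppings' slots."""
    text = utterance.lower()
    slots = {"size": None, "toppings": []}
    for size in SIZES:
        if re.search(rf"\b{re.escape(size)}\b", text):
            slots["size"] = size
    for topping in TOPPINGS:
        if re.search(rf"\b{re.escape(topping)}\b", text):
            slots["toppings"].append(topping)
    return slots


def missing_slots(slots: dict) -> list:
    """Names of the slots the bot still has to ask the user about."""
    return [name for name, value in slots.items() if not value]


order = fill_slots("I'd like a large pizza with pepperoni, sausage and green peppers")
print(order, missing_slots(order))
```

Anything outside the vocabulary is silently dropped, which is exactly where the proprietary-product-name problem starts to bite.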
I am not even talking about the switchboards for multiple domains, like in Alexa. These ones only work with "leaky abstraction" (making the user learn magic keywords).
Another problem is really stupid. It's the availability of the datasets. The funny thing is, ye olde style semantic frameworks fare better than the machine learning ones, because there is not enough data for the machine learning chatbot frameworks, and without it, their mighty capabilities are pretty much the proverbial spherical cow in vacuum. But because the semantic paradigm is not kosher/kewl anymore, very few enterprises agree to deploy it.
None of that matters, though, because users never liked typing a lot. Back in the 1980s and 1990s, adventure-type computer games switched from a (mostly working) command-line interface to point-and-click, and very few users objected.
My take is, the key is a conversational UI with strong visual feedback. For the pizza scenario above, I would draw icons of cheese and numbers, so that the user can be sure it worked.
> plane ticketing, only works well when limited to North America
And when limited to text. I once saw a demo for a voice chat for ticket orders. It took a full minute for it to tell you the multitude of options for flying 'from the Netherlands to New York next Monday'. A human assistant can reason on what an acceptable flight may be based on a small set of parameters. A chat bot would need to know every preference in detail, like destination airport, time of day, budget, etc.
Well, on the other hand in a GUI - in pretty much a single page - I can see all options I have, all sizes, their exact price ...etc. With CMD and chatbots there's always the FOMO. "What if there are other options I haven't considered?". Humans like to be in control.
Exactly, I prefer using a GUI over calling over the phone for delivery because I might have communication or listening issues with the person on the other side of the phone.
The same thing happens at McDonald's and other places replacing cashiers with touch-screen terminals. Not only am I sure of how the system understood me, I can also easily navigate the several options without annoying an employee with dozens of questions. Going from that to a conversational UI is a step in the wrong direction.
Just this morning I stopped at McDonald's for breakfast. I gave my order, they entered it, and it showed up correctly on the screen at the drive through. However, the order taker read back something completely different (than what they just entered). Since I saw that the order in the system was correct I ignored that and simply said "sure that's right."
This is why I prefer an interface that bypasses the human order taker. They may say one thing and do another. Even in my case, they may have then thought "well he confirmed what I just said back, and that doesn't match what the screen shows, so I'll fix it to match what the customer agreed to." Then I would have the wrong order. That can't happen if I interact directly with the order system.
English is not my primary language, and I live in a country where I don't speak the local language (which is not English), so I have consistent issues with miscommunication.
Once I went to a McDonald's and ordered a double cheeseburger with no onions; when the order came, it was just the buns with no meat.
Especially in a place like McDonald's where they serve several people familiar with the menu, the staff seems annoyed if you don't know what the ingredients of one of their menu items are and often don't handle specific requests well.
It's not just language issues. My ex-girlfriend always asked a lot of questions when ordering at restaurants, sometimes checking with me, and you could visibly see the waiter getting tired or annoyed because of this. Not coincidentally, there was always some minor screw-up with the order because they forgot to write something down. The same happens if you go to a restaurant with 20+ people and the waiter comes in to take everyone's order: the chances of making a mistake go up.
I know! Reading your food order verbally from a paper-printed menu to a human who needs to memorize or write it down and walk back to deliver it to the food prep area seems to be such an error-prone and ambiguous way to order a product. Doubly silly that often the waiter will simply take the order and enter it right into some kind of computer or kiosk. Just give me the damn computer! It's shocking that the vast majority of restaurants still do it this way.
Wow I'm really shocked by all the responses. I think the majority of the world - ie outside of our tech bubble - would much rather deal with a human than enter their order in on a computer. Particularly in a restaurant. In fact I'd go further and say that a good proportion of non-technical people would not only prefer to deal with humans, but would trust another human more than themselves with a computer interface.
Personally I'm techy and I still prefer the human aspect. In fact part of the appeal of going to a restaurant is to be waited upon - otherwise I might as well just order a takeaway online. Sure they might occasionally screw up your order but this doesn't happen nearly as often as this thread would suggest. I do eat out a lot and I honestly don't think I've had my order messed up in the last 2 years. I wouldn't say I eat at particularly posh places either though I do actively avoid most fast food establishments (not a snobby thing, I just don't like the taste of McDonalds et al) so maybe the issue of reliability is more subject to the lowest paid positions in the food service?
In any case, even if you did have your way and entered your orders directly into a computer, you'd still have to deal with the fallibility of humans with the chef cooking your food, waiter / delivery driver distributing your food, and anyone else who exists along the chain. In fact I wouldn't be at all surprised if many of the mishaps described in this thread were actually failings of those individuals rather than the order takers whom you assumed had messed the order up.
I guess it depends on what type of restaurant you're going to. If you're going to a cheap, fast service restaurant then sure, I'd like to have a tablet I could use to order from.
But if I'm at a good restaurant, a server is actually part of the service. They can give you recommendations on dishes, and wine matches, and help you get exactly what service you want. Also the human experience is just part of the fine dining experience.
It would be really strange if I went to Eleven Madison Park or Bouchon and they just handed me an iPad to order my food from.
It's not just simple pairings, but includes nuances that would take many different fields to capture: Dish X can be made vegan, but tastes better if you then order with extra seasoning.
Discovery can be bad too. If you don't eat meat or pork or whatever, you often have to go through the menu in O(n) and look for your options. And then maybe it's a dish they stopped making months ago, but printing new menus would have been too expensive. (OK, that won't happen at McDonald's.)
With a tablet, you could filter the list with a single tap. I've thought about building such an app a few years ago because I'd love to use it, but I have no idea how you could sell it to restaurants. It seems most places are too conservative and cash-strapped to tie themselves to proprietary tech.
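A rough sketch of what that single-tap filter could look like, assuming each dish carries dietary tags and an availability flag. The `Dish` type, the tag names, and the sample menu are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dish:
    name: str
    tags: frozenset          # dietary tags, e.g. frozenset({"vegetarian"})
    available: bool = True   # False for dishes the kitchen stopped making


def filter_menu(menu, required_tag):
    """Return the names of available dishes that carry the given dietary tag."""
    return [dish.name for dish in menu if dish.available and required_tag in dish.tags]


# Hypothetical menu data.
MENU = [
    Dish("Margherita", frozenset({"vegetarian"})),
    Dish("Pepperoni", frozenset()),
    Dish("Garden Salad", frozenset({"vegetarian", "vegan"})),
    Dish("Veggie Calzone", frozenset({"vegetarian"}), available=False),
]

print(filter_menu(MENU, "vegetarian"))
```

The scan is still O(n), but the tablet does it instead of the diner, and the discontinued dish never shows up.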
They're tied to proprietary tech, glasses to boots. The ability to hire a tech person is so far outside restaurant spend that they outsource to a single company, which handles all their software, their registers, and provides support. Your best bet is to sell it as a code module to the PoS integrator.
Be careful what you wish for. The Burger King at the NYS Thruway rest stop just north of NYC has one. Think McDonalds has them on Long Island, but I haven’t used one.
It’s awful — probably an order of magnitude more awful than dealing with a non-English speaking cashier. As with anything digital, it’s a sales funnel designed to trick you into ordering whatever their profit driver is. Have fun not buying a value meal.
I remember when Google rolled out a hands free payment system at McDonalds and the identity verification step involved speaking your initials to the McDonalds employee. Despite so much automation (not even needing to pull out your phone) the system totally failed on the verbal interpretation of the initials: "Did you say "J" or "G"??" and gesturing the shape of the letter with fingers.
Relatedly, I've found it a little unfortunate that, whenever I'm asked to spell out something to a human, chances are the receiver has either never heard of the NATO phonetic alphabet or doesn't understand it.
You might be right for people who have ordered a dominos pizza many times and already know exactly what they want. Rattling off your requests to a person might be faster and less stressful than navigating a UI.
But what if you don't know what your options are and how much they cost? You have to ask the person to list off the possible pizza styles, sizes, toppings, and the prices for each one. Then they have to tell you about all the available side dishes and desserts (with prices, again) and how there's a half-off deal if you get THIS side with THAT pizza on a Tuesday, and on and on and on. It'd probably take a good 20 minutes to convey all that over the phone (I hope you have a good memory or are taking notes), and by the time they're done, the poor employee is probably so frustrated that they're ready to strangle you.
Or, I can suck up all that information at a glance on dominos.com, and I won't have to repeat my credit card number over the phone 5 times before they get it right.
Restaurant websites always, without exception, suck. Dominos, while the pizza is garbage, has a wonderful ordering website. But even then the actual menu is awful.
They are always a sales funnel first, menu second. They cannot give accurate ETA, ever. It’s harder to display multiple choices well on a screen vs a sheet of paper.
Unless you are a place with 5 menu items, the paper menu is superior in almost every scenario.
Well, restaurants (simple or fancy) provide an excellent analog counterexample: there's basically always a menu that you look at and select items from - the definition of a graphical user interface.
Conversing with a person tends to be more intuitive than using a machine, but also more ambiguous and sometimes more difficult to get clear information. If I already have all the information I need, or if I want an opinion instead of objective fact, then going through a person can be better.
Otherwise, they're just acting as a voice interface for a GUI that I'm perfectly able to navigate on my own, and I'm not a fan of voice interfaces when dealing with technical systems.
I think this is an assumption that might not hold true, outside of the tech sphere.
Us? Of course we want imperative interfaces! E.g. "Order me a pizza using JSON with my pizza_now script with options -g and -b and promo code CASEOFTHEMONDAYS"
People who are not technically inclined? End-state declarative. I see most as fine saying "Order me a pizza with pepperoni" and having it show up at their door.
Absolutely. I wish engineers could use better examples to explain capabilities.
Ordering a pizza as shown in the above example is very contrived. No one needs that; a GUI is much better for this use case. But the power of chatbots will light up if they can answer "Would this pizza be too spicy?" or "Can you deliver this after 4pm?". What I mean is when chatbots can take over more of the customer queries which otherwise might be directed to the store via phone call, or anything that requires deep knowledge of the product, where not every corner case can be put on the GUI menu.
Well, the command line/voice interface is really useful once you know what you want, it is less useful or even a deterrent if you are in an exploratory mode. I've had several occasions where I just want my regular pizza ordered without having to click a thousand buttons and sit through an IVR. 10 kids are coming home for a party - they'll all eat cheese pizzas with no toppings.
"Ok Google, get me 10 Margherita pizzas by this evening" is super convenient.
I'd prefer a chatbot that just serves up different GUIs based on my initial query. There would be a pizza GUI, an Uber GUI, a map GUI, a chat GUI, a YouTube GUI and so on.
> Well, on the other hand in a GUI - in pretty much a single page - I can see all options I have, all sizes, their exact price ...etc
A while back, one of the big features Google advertised was the ability to start a transaction by voice, and then continue it on a device with a screen.
Obvious problem is you didn’t order pizza, you ordered Dominos, and your selection is limited to whoever signs a deal with Google or pays them off for top billing.
Correcting that means now you’re arguing with Google/Alexa/Siri, which is an infuriating experience.
And now we reach a whole new level of "you'll never get [a chatbot] to understand something when [the company that produced the chatbot] getting paid depends on them not understanding it."
Ideally it would have some kind of anthropomorphized graphical avatar applicable to the context. Research out of Stanford[1] as far back as the '90s has suggested such interfaces as a means for improving human-computer interaction. If I was writing a letter, for example, perhaps an animated document fastener would be appropriate. In this case, why not an animated, anthropomorphic pizza that morphs into the Domino's logo as paid-for branding?
The sad part is that it might still be faster to do this online even if the conversational intelligence was so good. If I was rich and had a real human assistant, I might say "order us pizza" and they could figure out most of the details on their own. However, given that I don't, it is just quicker for me to order a pizza from my phone filling in the details rather than talking about them.
I would never do something like that.
If I order a pizza, either I order it from some place that I know can do a decent pizza, or I’ll thoroughly look at the reviews of all the places that can deliver to me.
Seriously...Dominos...
That's ok, you just open with "Hey Google, I want a pizza from Good Pizza House" instead. Anything you leave open is up to the assistant to try and interpret what might be a good choice.
For the average consumer who otherwise didn't specify, Dominos is a good choice: it's cheap, reliable, they understand the menu and deals, and it's consistent across time and place.
I'm sure this is technically possible, but it may never be practical. Much like human operators, you're going to want to drive the user down a certain path, because order is often extremely important.
What if Dominos doesn't have green peppers? What if they don't like the pizza brand? What if they're asking because they have a coupon? You may end up repeating sections of the conversation multiple times, and in the end the user will just end up so confused they give up.
More importantly, how do you know Dominos (or someone else with an incentive) is not driving you down a certain path?
This happens today already with ads, the Fb feed, Instagram, etc., but there is something distinctly suspicious about taking away specific control points in the decision-making process.
Agreed that Domino's Pizza would pay for this referral. The key difference, though, is that search engines generate more potential bidding on keywords because they render multiple results.
This isn't that far away. If Dominos had an external API you could build this right now.
The only difference is that you would want a confirmation step with the VUI - i.e. "Alright, you want a large cheese, pepperoni, sausage, & pepper pizza from Dominos, which will take 15 minutes to deliver. Place the order?"
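A minimal sketch of that confirmation step, assuming the order has already been parsed into a size, a list of toppings, a vendor, and a delivery estimate. The function names and the set of accepted replies are my own invention, not part of any real assistant API.

```python
# Replies treated as a "yes"; an assumed, deliberately tiny list.
AFFIRMATIVE = {"yes", "yeah", "sure", "ok", "okay", "sounds good"}


def confirmation_prompt(size, toppings, vendor, eta_minutes):
    """Read the whole order back so the user can catch a misunderstanding."""
    if len(toppings) > 1:
        topping_text = ", ".join(toppings[:-1]) + " & " + toppings[-1]
    else:
        topping_text = "".join(toppings) or "plain"
    return (
        f"Alright, you want a {size} {topping_text} pizza from {vendor}, "
        f"which will take {eta_minutes} minutes to deliver. Place the order?"
    )


def is_confirmed(reply):
    """Only an explicit yes places the order; anything else is treated as a no."""
    return reply.strip().lower().rstrip(".!") in AFFIRMATIVE


print(confirmation_prompt("large", ["cheese", "pepperoni", "sausage", "pepper"], "Dominos", 15))
```

Treating every unrecognized reply as a "no" is the safe default here, since a wrongly placed order costs more than one extra question.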
Yeah I don't understand, I can literally call up a pizzeria and do this exact song and dance today whenever I want. I prefer ordering online without talking to anyone. I fail to see the benefit for consumers.
There are a few things glossed over that I don't think AI can solve. For example, if I want to deliver flowers, I kinda want to choose what they look like, and how do you do that without essentially displaying a website with images, which would turn it into a glorified Google search? If I want a restaurant recommendation... it basically opens Yelp and tells me the first hit. And I kinda want to see pictures of the place and read reviews; otherwise, what's the point of searching at all?
You can reduce it to a flow of information. And chat is not always the best way to convey that information.
When I want to deliver flowers I want $x worth of flowers to arrive, avoid roses - make sure this poem I wrote is on the tag. I don't care what they look like, any flower is the same (except my wife doesn't like roses much) - the important part is the flower came with a poem.
If we had a secretary (think 1950 when they were common) doing this job mine would choose something nice, yours would come back with your choices. Either way flowers would arrive. Your secretary would probably learn your preferences eventually and soon come with one picture for final approval (which would be right). Mine would remind me a week before our anniversary to start working on the poem.
Both perspectives are correct. There is no reason an AI couldn't handle both, and every reason they should - but of course it will be years (at best) before they do. The first time the AI comes with "I found these flowers, see the pictures [on whatever screen is handy]". I respond with "whatever, just pick something", and the AI decides to not bother asking again. I'm depending on the AI flower ordering algorithm to not make a bad choice - making me happy is harder than making you happy.
I don’t think an AI assistant and visual feedback like google search are mutually exclusive at all.
Imagine a fashion designer having a chat with a human assistant. The designer might tell them a kind of shirt they want to use for a shoot, and if it’s not obvious what to buy the assistant will come back to them with a list of options (pictures of each on a piece of paper, for ex). The designer can ask questions about each one, ask for different shirts, and simply speak when they decide: points “this one”.
This all seems very doable with current AI assistants. Google for ex already has results from recent voice searches waiting for you on your phone.
I have never had much use for AI assistants except using Alexa on my FireTV. I can press a button on the remote or app to start it listening, give it some keywords to search and get results on screen to scroll through and click on. Now that I can direct those voice searches to a browser, it's even better.
If you have a real person as an assistant, and you ask them to buy flowers for somebody, you also don't get to pick exactly what they look like ... unless you ask them to facilitate that with pictures or descriptions.
Real AI would be fully capable of doing this just as well as a person, but it would take a lot more "intelligence" than currently exists.
The real difference is human assistants are paid well to act as their principal's agent. If a PA had a choice buying a nice bunch of flowers or a weedy bunch with a $5 kickback, they'd be dumb to take the kickback and risk their job.
But if I ask Facebook/Google/Amazon to order some flowers for me, why _wouldn't_ they monetise it? Gotta pay for those tens of thousands of engineers somehow.
Ideally. It's possible that the Google business machine might not do a superb job collecting and synthesizing the vast number of these very fine-tuned details and pieces of feedback, at least at this early stage.
This is something a good personal assistant can do easily.
Though eventually AI will gain those capabilities, one way or another.
I think you missed the point. The point isn't chat == text-only, it's that chat == natural-language-style UX. This could well be augmented with images (as it is with Siri). Basically "is there a button to do what I want" versus "do I indicate what I want in a human-ish way" - imho that's the defining feature of a "chatbot", not that it's text-driven per se.
Chat bots are on a spectrum between command line and a real human assistant. If I hire an assistant, I probably don't need a manual telling me how to interact with them. A command line is pure mystery. I might be stretching a bit for the metaphor, but the difference is how well they understand human language.
A commandline requires, essentially, learning a new language in order to converse with a computer. A human only requires a small amount of learning in getting the social dynamic relationship right.
Chat bots, virtual assistants, etc. are all pretending to be much closer to the human side of language understanding than they are and they won't be all that useful until they make quite a bit of progress. (and actually I would appreciate a much more commandline-like interface with a voice assistant for now because I'd bet it would be much more likely to give me what I want)
It's like going into a restaurant and asking for a menu, only for the server to respond with "we don't even need menus, you can order whatever you like!"
Just like in the real world, Burger King won't actually let you Have It Your Way, or they'd give me a nice burger that's not overcooked dressed with avocado slices, jalapenos, a thick spread of blue cheese and some strong mustard, in a whole wheat sourdough bun that's not so artificial that it reverts back to dough under compression.
That is very true, and it's something we've struggled with on our bot, but there's also a corollary - every time you add a feature, you have to add UI for it, which adds complexity and can have support costs - I have heard anecdotally that when banking apps add a new feature they get a spike of support requests asking "what is this new xyz…?"
This isn't the issue that's addressed by the article. It's another issue entirely. In fact, the article seems to suggest not that users were paralyzed, but that users were overactive and expected too much.
Pull-down menus are already being used to replace the chatbot experience and are used by millions daily on high-traffic websites. Check out https://airim.co
Anyone who is dealing with this chatbot is probably aware of its main scope -- to provide tech support/answers. Look at the example prompt that is given: e.g.: Reset my Microsoft account password
Now try to find that on the Microsoft Support page as it is:
My first instinct is to do a Ctrl-F for "password" or "security"; the first term brings nothing, the second term brings up topics about viruses and ransomware. Even if you realize that you have to go to the "Microsoft account help", you still won't find it. Right now, there's a link for "How to reset your Microsoft account password" but only under the subhed of "Trending topics" -- and I would have never found it if I didn't use Ctrl-F. And past research has found that the vast majority of Internet users do not know how to use Ctrl-F [0] (and I bet it's much worse today, given how most people now just use phone browsers).
Of course the smart thing to do is to use the Search box at the top of the support page -- in fact, I wouldn't even use that because I just use Google for everything, even for finding Stackoverflow answers. But I'm guessing most people don't.
I'd argue that the average user, unlike most anyone who even knows what a command-line interface is, does not care about an interface having "acceptable entry points". And in many such cases, it may be near impossible to design an interface for such users, whose needs and means of expressing them are infinite.
For example, I know that if I can't login to my Microsoft account because my cookies expired and I forgot my password, that I should be searching for "reset microsoft password", or even, "reset microsoft live password". However, I bet there are a lot of people who will state this:
my email doesn't work or i cant see my email
The chatbot helpfully provides a list of options, including "Signing in to Outlook.com", but also, just in case, options like "Unable to send email". If you search for "my email doesn't work" using the support.microsoft.com search bar, you get a list of Google-like listings pointing to the answers.microsoft.com forums, for topics like "Why doesn't my Hotmail work any more?"
I guess there's no reason why the support searchbar couldn't be tweaked to return the kind of options that the chatbot does, but the chatbot has one key advantage: there's always a "None of the above" option, which takes you to a secondary list of options. This kind of funnel IMHO is way more useful/comfortable than a search interface, in which it's not clear whether you need to rewrite your query, or keep paging through pages of increasingly irrelevant or confusing search results.
Theoretically, the chatbot funnel gets better at funneling people, based on analysis of past users' paths through the decision tree. But even a very dumb bot that doesn't intelligently respond has one more advantage: if you hit "None" a few times in succession, you'll be given a link to "Talk to a person", which signals to the user that, "OK, you win, no need to keep trying search queries, let's get you some human help".
That signal doesn't exist in a traditional webpage or search interface, so the risk is that the user keeps searching the pages in vain until they get so pissed off they just give up. Not everyone has the wherewithal to demand manual help.
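The funnel described above can be sketched in a few lines. This is a toy model of the behavior as I experienced it, not Microsoft's implementation; the class name, the option pages, and the two-miss threshold are assumptions.

```python
from dataclasses import dataclass

NONE_OF_THE_ABOVE = "None of the above"
TALK_TO_A_PERSON = "Talk to a person"


@dataclass
class SupportFunnel:
    pages: list          # lists of options, most likely answers first
    max_misses: int = 2  # assumed threshold before offering a human
    misses: int = 0

    def options(self):
        """Current choices: the next page of options, or the human escape hatch."""
        if self.misses >= self.max_misses or self.misses >= len(self.pages):
            return [TALK_TO_A_PERSON]
        return self.pages[self.misses] + [NONE_OF_THE_ABOVE]

    def choose(self, option):
        """Record a miss on 'None of the above'; otherwise return the chosen topic."""
        if option == NONE_OF_THE_ABOVE:
            self.misses += 1
            return None
        return option


funnel = SupportFunnel(pages=[
    ["Signing in to Outlook.com", "Unable to send email"],
    ["Reset my Microsoft account password"],
])
funnel.choose(NONE_OF_THE_ABOVE)
funnel.choose(NONE_OF_THE_ABOVE)
print(funnel.options())
```

Even this dumb version gives the user the "OK, you win" signal after a bounded number of misses, which a search box never does.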
Wow, I knew it was a limited beta, but I didn't know it was that limited. As one of the lucky few that had it, I now feel both special and sad that it's gone.
But not that sad -- I didn't use it once after a few easy questions the first day I got access. It would occasionally pop up some suggestions as I was typing messages, but I would always ignore them as they were irrelevant.
I think the biggest issue is that they weren't up front about it. They tried to make it seem like it was an AI doing all the work.
I think if they had straight up said, "this is a human and we're training an AI", it would have been a lot better. It would have allowed them to do things to get stronger feedback, like asking, "was this the right suggestion?" Then when I got irrelevant suggestions, I could give them feedback as to what was wrong and why. But it never asked me for that so I never gave any feedback.
I thought I was just one of millions training the AI and that they would get plenty of signal with all those users. I had no idea it was so limited -- I definitely would have been much more active in giving feedback had I known.
It sounds like you're talking about M suggestions, which are still going strong and have AI doing all of the work. The article is talking about a different aspect of M, the text-based virtual assistant, which you could ask anything and was backed by humans.
The idea behind training like this is a reverse Turing test. People speak differently to humans than they do to computers. If the users think they're talking to a machine, they'll talk to the human like they talk to a machine - which gives the training data that is needed for when it is a machine.
M is dead because it failed to be automated, but the headline extends that to all chatbots. Some of the hype around chatbots was their potential for a uniform user interface and less need to download new apps. Has there been any evidence that chat as an interface has failed?
I've always been bearish about the appeal of chatbots but I did see how they could seem to be a useful interface for many users, especially on support-type sites. They basically seemed to function as a friendlier version of site search. Yes, ultimately they are an unnecessary middleman facade -- in the same way that writing a Google search as a formal question -- e.g. "Where are the best pizza places near me?" -- is unnecessary when you could simply query "best pizza".
But perhaps the perception that your question was being interpreted in an intelligent human way caused users to think differently and rephrase their questions in a way that made it easier to find the most relevant help/support links? I remember how interesting Ask Jeeves seemed to be -- though to be fair, Google wasn't much of a presence in 1997.
If they get good enough, I can totally imagine them being keyboard-driven too. I'd love to be able to just type a quick "email" saying "order paper towels" when I'm at work and not have to shout into my phone in a quiet office.
(To be sure, the tech for a lot of this already exists; they're just not exposing a text-based version.)
I wouldn't want to be quite as verbose for a text-based version, but oftentimes it really is easier to type more versus less if you're confident the recipient will read and parse the intent of the whole phrase.
Chatbots are, at present, fancy command-line interfaces. I believe that they have great potential, but I don't think they will transcend the CLI until they are conversational and able to understand language at the level of IBM's Watson. At present, each chatbot has a list of keywords with aliases that activate commands with parameters, and they are unable to act unless they recognize a keyword. They need to be able to infer beyond keywords, and they need to be able to hold the context of a conversation in memory.
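To illustrate the keyword-plus-alias design I mean, here is a deliberately naive sketch. The alias table and `parse_command` are hypothetical, not taken from any real bot framework.

```python
# Hypothetical alias table: each trigger word maps to a canonical command.
ALIASES = {
    "order": "order", "buy": "order", "get": "order",
    "weather": "weather", "forecast": "weather",
}


def parse_command(utterance):
    """Return (command, parameters) for the first known keyword, else None."""
    words = utterance.lower().split()
    for index, word in enumerate(words):
        command = ALIASES.get(word)
        if command:
            # Everything after the keyword is passed along as raw parameters.
            return command, words[index + 1:]
    return None  # no keyword recognized, so the bot cannot act


print(parse_command("please get me 10 margherita pizzas"))
print(parse_command("I'm hungry"))
```

The second call is the whole problem in miniature: the intent is obvious to a human, but without a keyword the bot has nothing to dispatch on, and nothing is remembered between calls.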
> any evidence that chat as an interface has failed?
I love chat as an interface to deployments! Hubot is a great framework/bot for hooking into your own environment. Typing deploy prod master in a Slack channel is great. Why is that better than ssh'ing into a jump box and typing cap deploy prod? It's multi-user! Everyone else can see what's going on.
I think the article overreaches in conflating the problems with chatbots -- which typically don't do the kind of things M promised -- with the cost/scope overrun of M. But I think it's too simplistic to say that Facebook should have just optimized their chatbots. The Facebook service has a much broader and more diverse userbase and functionality -- think of the criticism that FB gets for seemingly taking over every aspect of our lives (everything from messaging friends, photo management, video broadcasting, news publishing, gaming, financial transactions).
What would an optimized, limited-scope chatbot for Facebook even look like? Though come to think of it, I can think of a few use cases if Facebook wasn't out to dominate everything about real life. For example:
- When traveling to a new city: "Do I know anyone who lives here or is currently visiting?"
- When wanting to read about or discuss news topics, but only from my current network: "Are any of my friends talking about the election?"
- When bored: "What games are my friends playing?" (I'm thinking back to the time when FB was a games platform for things like Words with Friends)
All of these may be findable through a combination of searches, but I'm not a power user, and I bet most people aren't. I think if I go to the "New York, NY" location page, there's a section that lists friend connections, but a bot that processed a natural language query would be so much smoother.
And what about queries like: "What are my friends doing this weekend?". Searching that exact question brings up nothing of relevance. When I do a search for "weekend", the top results are for things like "Vampire Weekend". I have to scroll down to find a section for "Posts from Friends", and that only contains posts (even from months ago) that contain the literal word, "weekend".
I don't really know how to improve those results without hurting some other kind of expected functionality. But a chatbot that purports to deal with everyday human questions might be the right interface for everyday quality-of-life questions.
I don't disagree; my suggestion was tongue-in-cheek. There is no value in optimizing an application that is so flawed that it shouldn't exist to begin with.
Those flaws derive from a wildly optimistic use case for the technology, though. A much cleaner use case would have been a bot intended for Facebook Help (instead of, or to complement, a KB -- assuming people still need that).
More ambitious maybe, but perhaps not impossible, would be a bot that looks for signs of suicidal tendencies in posts or comments and engages the user in therapeutic conversation. (?)
>"Messaging app Kik staked its company’s future on bots and “chatvertising.”
then:
>"Kik pivoted to blockchain technology."
Is there actually any logical pivot from chatbots to blockchain? I am wondering what of your core tech in the former could allow you to pivot to the latter. Or is this simply grasping at funding?
The gap between the levels of abstraction at which humans and machines operate is much bigger than most AI researchers think. No amount of computation for various kinds of gradients can compensate for that. The next AI breakthrough will be a radical development in knowledge modeling and representation.
I suspect perhaps that the AI community were used to (for decades) solving _no_ problems. Now that _some_ problems are solved in their field (e.g. facial recognition for social networking purposes, playing pong, playing chess) the thinking is that now all problems are going to be solvable. I think we are going to learn that this isn't the case. Perhaps there is just a threshold of problem hardness beyond which we can't get yet at any particular point in technology-time, or perhaps there's a hard barrier waiting to be discovered beyond which current approaches just can never take us regardless of cleverness or hardware speed/density?
>> I suspect perhaps that the AI community were used to (for decades) solving _no_ problems.
Where's that coming from? There's certainly been some important advances recently, but to claim that no progress was made is strange.
Just to give a few blatant examples: Deep Blue defeated Kasparov in 1997; Chinook had fully solved draughts (checkers) by 2007; TD-Gammon played backgammon consistently at world champion level by 1995; two computer programs, JACK and WBRIDGE5, won five bridge championships between 2001 and 2007. All of those are 10 years older than AlphaGo/Zero and each has a very long history, going back to the 1950s in the case of draughts AI [1].
You probably haven't heard about most of them because they were not advertised by a company with the media clout of Google or Facebook, but they were important successes that solved hard problems. There are many, many more results all over the AI bibliography.
And, just to settle this once and for all: this bibliography starts a lot earlier than 2012.
Wasn't it sort of the same way during the first AI boom? Expert systems providing some real value in limited domains, and sexy prototypes working with a simplified blocks world that hinted at something powerful and world-changing. Then...
> the thinking is that now all problems are going to be solvable
...and then failure and the "AI winter" for a generation after the initial promise had been discredited.
Probably some convergence between the two. You're probably used to changing some syntax around to get Siri/Alexa/* to understand you. That puts some mental tax on you, but you get used to it. Devices will seek to lower that mental tax, but ultimately you'll probably get used to it enough that the tax will feel free and devices won't need to evolve the syntax much past a certain point.
What seems missing in a lot of these threads is the idea of "context", and I think that's where there's lots of room for innovation. Current voice-assistants work "okay" for single-sentence queries, but if the device doesn't understand (or if I fail to phrase things in a way that it's expecting), it doesn't ask clarifying questions, and it doesn't use past exchanges to inform future ones (beyond perhaps some voice training data). It also limits the kinds of things it can do by requiring that all of the necessary information be presented in one utterance. It also raises the "mental tax" on doing "real things" because I know I have to say a long phrase just right or start over, and that's sure to raise anyone's anxiety-levels...
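For what it's worth, the "context" idea can be sketched as slot filling: keep what was already said across turns and ask only for what is still missing. The slot names, vocabularies, and questions below are assumptions for illustration, and real assistants do far more than word matching.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical vocabularies and prompts for a pizza order.
SIZES = {"small", "medium", "large"}
TOPPINGS = {"cheese", "pepperoni", "sausage", "mushrooms"}
QUESTIONS = {"size": "What size?", "toppings": "Which toppings?"}

@dataclass
class OrderContext:
    size: Optional[str] = None
    toppings: set = field(default_factory=set)

    def update(self, utterance):
        """Fill any slots mentioned in this utterance; keep earlier ones."""
        words = set(utterance.lower().replace(",", " ").split())
        found_size = words & SIZES
        if found_size:
            self.size = found_size.pop()
        self.toppings |= words & TOPPINGS

    def next_question(self):
        """Return a clarifying question for the first empty slot, or None."""
        if self.size is None:
            return QUESTIONS["size"]
        if not self.toppings:
            return QUESTIONS["toppings"]
        return None
```

Calling update("I'd like a pizza with pepperoni and sausage") leaves the size empty, so next_question() asks "What size?"; a follow-up update("large, please") completes the order without the user having to restate the toppings.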
Not only do you need to be a native speaker, you also can't speak with an accent. Most speech recognition currently only works with a very "clean" language free of regional expressions or accents.
Good point. I changed Siri to Spanish to practice my speaking and comprehension, and it was a huge challenge. I assumed it was because Spanish support wasn't as developed as English, but I guess it's that there's not enough training for accents in any language.
"It was easy for M’s leaders to win internal support and resources for the project in 2015, when chatbots felt novel and full of possibility."
Chatbots were new in 2015? I think it might be more accurate to say they were new in the early 1990s, but they had a revival of interest around 2015, driven by the possibility that advances in AI and NLP would allow them to do more.
One place where chatbots still have a large opportunity in front of them is in automobiles. The driver is not supposed to hold their cell phone while driving. But they can talk to the phone, and voice-to-text allows them to interact with chatbots. Someone in the industry told me that Toyota has inked a deal with Pandorabots:
Likewise, during and after my time at Celelot [1], I talked to a lot of salespeople, and they told me that was the #1 thing they'd like to see, as an interface for SalesForce. They wanted to be able to meet a client, make a sale, and then drive home, and while they were driving, they could talk to their cell phone and the Celelot service would put all the data into SalesForce for them.
I have a question for AI/NLP experts (and I might use the wrong terms).
When our voices are processed in systems like these, are the results compared to a corpus of our own speech, a local (geo) population, or language speakers as a whole?
I’ve been curious how slang and people with poor grammar affect results of other users. Will we start seeing “thicc” instead of “thick” and “dat” instead of “that” over time?
One of the reasons I’ve been pondering this (anecdote alert) is that I frequently see iOS dictation spelling bizarrely.
The dictionary in keyboard apps is often crowdsourced, especially sentence predictions. Your own common phrases are made more significant as well. I would assume speech-to-text systems would do the same. Coherent sentences are very important for the recognition algorithms, and you can often see word corrections happening live as you complete your sentence.
Which is exactly how much of today's "AI" in "AI" startups works, as I've heard from a knowledgeable person. "We're doing AI and initially handing over some parts to cheap human labor" sounds much more growthy than "we mainly resell cheap human labor".
Obvious sign that chatbots are dead (which was also stated in the article's original title): FB's Messenger doesn't show featured chatbots on the home tab anymore.
Voice chatbots are a great way to convert your multi-tasking capable pc/phone into a single-tasking one.
The benefit for text chatbots is you can concurrently interact with multiple bots. So one can be talking to a customer over the phone and use multiple chatbots to find shipping, products, place reservations on stock, verify a credit card and so on.
IMO chatbots are still hugely valuable in the Enterprise app space, but also in any traditional multi-tasking environment like customer support, telesales, and environments where it is too complex to get a bot to handle everything... sometimes it is better just to let the human brain be (literally) the 'meat in the sandwich' to glue all the chatbot feeds together and create the outcome required. Production line planning is a potential example: ask bots to tell you about the current environment, stock levels, backlog, shift resources, cashflow, and then a human decides which work gets done today. Most good planners I have met can do the planning vastly better, and quicker, than powerful systems with optimisation algorithms and TB of data.
Sounds like despite their best efforts, NLP + full automation still has a long way to go. Unless they didn't get the resources they need to push it over the edge?
> That’s because most of the tasks fulfilled by M required people.
> One source familiar with the program estimates M never surpassed 30 percent automation. Last spring, M’s leaders admitted the problems they were trying to solve were more difficult than they’d initially realized.
> But as it became clear that M would always require a sizable workforce of expensive humans, the idea of expanding the service to a broader audience became less viable.
The scope was just way too large. Think about it. You could ask M for anything. How many services do you use or pull data from on a day to day basis? Now how many of them have an API?
Until nearly everything has a publicly accessible API I don’t see how something like M could ever happen without extensive human interaction. I’ve been giving this a lot of thought lately. I pull information from a few intranet sites routinely for managing travel for my job. It’s almost exclusively online and because of the scattered sites it can be annoying to manage at times. It’s perfect for automation and an AI assistant but without any kind of programmable interface how would an AI assistant ever work with the sites?
I think Alexa and Google Assistant ultimately have the right idea, privacy issues aside. Start with a small enough scope and attract so many users that eventually services are compelled to support the devices. Over time the scope covers almost everything.
I agree with this. It's a matter of scope and therefore a product problem. What can be done using NLP today is not a secret and I'm sure Facebook was already aware of the shortcomings. They could have reduced the scope to a more tractable problem, but they chose to cover everything that a consumer might want.
I think there are a lot of good use cases that can be automated today.
As someone who had access to M, initially my friends and I mostly just asked it to do all kinds of obscure things, which at scale would be very difficult (and which I imagine most users would enjoy using it for as well, for the sake of entertainment).
We asked it to draw pictures of us, and made tons of other queries around memes, etc.
Outside of that, the only other use case was having it remind me to wish friends a happy birthday when they otherwise didn't list it on Facebook.
I wonder how many other products are secretly hand cranked. When you talk to Alexa, and the algorithm can't tell what you are saying, is there a guy in a call centre somewhere typing up what you are saying? Either to service your request or to provide a dataset of transcribed tricky samples?
Alexa is bad enough, the idea that a human really is listening to me in my home is somehow worse.
I think it's interesting that they killed it because it was a "cost center" and never even tried to monetize it. I wonder how much it would actually cost if you measured the contractors to the minute and only charged for the tasks that humans needed to perform?
I have only played around with Magic. Really like the idea of it but never got any real use out of it. They executed all the tasks I threw at them so slowly, which in turn cost me too much money (asking them to warn me each morning if it is going to rain... well, that cost me 40 minutes the first day, then I canceled that task).
Since then I got a credit card with concierge service, which I pay around 10 USD/month for. Solves all my easy tasks for a much better price.
I used Magic recently but I had never heard of the other three, so thanks for that info! I asked them to cancel a gym membership and it took them 70 minutes. Ugh. After that happened I am hesitant to ask them to do anything else. What credit card are you using for concierge service and what kind of tasks do you use it for? I assume there are limits to the kinds of tasks that Magic could do but your concierge can't.
There are certainly tasks that a virtual assistant can handle and not a concierge-service, but in my personal case those did not deserve the extra cost.
I use the concierge service over email, and they usually take a few hours to respond. Magic started to work on my tasks within a few minutes, often faster, which already changes what you can use it for. It is possible to call the concierge by phone when you need it faster, but email has served my needs.
Here are a few things I have used them for over the last month or so:
* I was away traveling for two weeks recently, and wanted to leave my car at a car service the day before and have them fix it and store it until I was back home. Had the concierge call around and book that for me. Booked a change of winter tires this way as well.
* Wanted to get a haircut at a certain time on a holiday. Had them book that as I did not know of any place that was open.
* Called them 30 minutes ahead of a fully booked train and they managed to get me a ticket (still not sure how they did this).
* Tried to buy sold-out tickets to a concert, which they did not manage to do. They wrote back that they were sorry.
* Investigated the possibility of booking a meeting room within a 500-meter area.
All this cost me around 10 USD/month, with no premium when you purchase things. I have just started; possibly I will find a better use for it in the future.
I live in Sweden and use https://www.supremecard.se. Most credit cards buy the same service from the same third party concierge service, so I just got the cheapest credit card as the service is the same.
It seemed obvious that it was and would remain a cost-center, but I figured it would be the kind of loss-leader that would eventually turn into a service that would be scalable and profitable. It sounds like the scope of M's service was too ambitious, but that doesn't necessarily mean that it couldn't eventually provide insights as to the optimal (and profitable) balance between AI and human intervention. Given Facebook's overall profit margins and ability to throw money at moonshots, maybe M's losses were extreme to the point that a monetization effort was seen as unlikely?
The chatbot/AI race reminds me of AOL, CompuServe and France's Minitel back in the day. They wanted to be THE network, but we all know what happened.
So Google, Amazon, Facebook, Apple, etc. are all trying to win the AI game, but in the end, to truly have an AI that is flexible and can learn on its own, we would need to go beyond APIs and create a way for machines to understand things, concepts and ideas. This is a massive task that cannot be done by one company alone. It is a new way of writing apps and services.
The reason why it died has nothing to do with the underlying AI-like technology (which was awful to begin with). It's because chatting, talking with someone takes more effort (mentally, physically, emotionally) than selecting from a list. In terms of unit quantity, how many motions are required?
The article suggests that Facebook is ending chatbot support, but that's not the case. Services like https://jumper.ai/ are still alive; it's an AI-assisted chatbot with payments inside Messenger.
You would think the largest social media company would be able to make a chatbot and virtual assistant. I guess Facebook isn't really hiring the best machine learning talent, as all of their AI projects suck.
I wonder if they've already pre-written the 2020 version of this piece, "Samsung's Virtual Assistant Bixby Is Dead", which will go on to call out the hubris of the voice interface gold rush.
Interestingly, they actually have a chatbot interface: All Erin Griffith has to do is say, "Update Virtual Assistant article for Bixby," and an AI will substitute in references to Samsung and Bixby and immediately publish.
I use two voice interfaces at least a few times every day, and I think that’s becoming common. Alexa is being built into devices all over. Voice interfaces are obviously different from the subject of this piece.
Do you use them at home or out? I can see the usefulness of voice interfaces in the home although I don't find the idea compelling. What I really don't see is people using voice regularly outside the home. I don't see it happening at all right now and we've had voice assistants for a number of years now, so it seems unlikely that people will suddenly start liking the idea.
I don't know whether this qualifies as chatbots, but I simply like the command-response interface of chat based services for selecting options without an AI layer interpreting what I mean.
I usually prefer the chat interface to a clunky browser UI where companies and govt agencies provide it, and I also like that in some cases you can add them to your address book like any other contact.
I guess I'm one of those computer technologists who haven't bought into the whole AI taking over the world hype.
I've been thinking about this for a while, there are some cool things you could do with command-based interfaces.
My take on it atm is to have a flexible (should work if you mistype a letter) command-based interface which is both voice and text based, where you can perform commands like:
play songs from coldplay
set alarm to friday at 3pm
make list with words a, b and c. Give me a random item from the list
There are some tricky parts though:
- Should it be context aware? Notice how I, in the second part of the last command mentioned "the list". I think it should, and maybe even ask "which list?" in case there is more than one.
- How do you define commands in a way that makes it easy to add and compose commands, and reduces or eliminates the ambiguity for the parser?
- Is the kind of parsing you do in voice recognition similar enough to be compatible with text parsing?
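A rough sketch of the "should work if you mistype a letter" part, using Python's standard difflib to match the first word against a list of known verbs. The verb list and the 0.7 cutoff are assumptions picked for illustration, not a recommendation.

```python
import difflib

# Hypothetical command verbs; a real tool would register many more.
KNOWN_VERBS = ["play", "set", "make", "give"]

def parse_command(text, cutoff=0.7):
    """Split a typed command into (verb, rest), tolerating small typos in the verb."""
    parts = text.strip().split(maxsplit=1)
    if not parts:
        return None
    verb = parts[0].lower()
    rest = parts[1] if len(parts) > 1 else ""
    # get_close_matches returns the best candidates at or above the cutoff.
    matches = difflib.get_close_matches(verb, KNOWN_VERBS, n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], rest
```

So "plya songs from coldplay" still resolves to the play command, while something like "dance for me" matches nothing and returns None. This only covers the verb; the context and composition questions above are the harder part.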
If anyone knows of a tool similar to this, please let me know.
$ Play all Coldplay songs that were the most popular of their album
$ Play all Coldplay songs about love
$ Play the list of Coldplay songs for my 8 a.m. alarm
Now, how would the UI go about specifying which previously defined list you want to refer to? Sometimes you'll want to pick the most recent one; sometimes the one most closely matching the definition; sometimes the one best matching the "alarm clock" format. Sometimes you'll want to offer the user a choice among all objects similar enough to the description (and if the object was built iteratively, do you offer intermediary objects as possible choices?). And sometimes you'll want to ask a short question to restrict the possible choices, if there seems to be a clear criterion that improves understanding quickly enough relative to the time it takes to ask (whether to favor a fast or a precise answer can depend on the user, environment, and situation).
To me the difficulty is that humans can build the choice of strategy on the fly depending on context, usually without building an understanding of all possible strategies but instead by just magically guessing a strategy which seems to fit well enough. This means being able to learn strategies based on previous experiences and building an evolving understanding of contexts.
Now this is probably not necessary to build a functioning interactor, but this is a reasonable description of normal human interaction, and the capacity for systems to adapt to contexts without much more outside help than humans do is going to be a good way to rate them.
Notice that your first and last example are somewhat simple to explain conceptually, so they should also be simple to build by composing the right building blocks:
"Play" gives away the fact you're looking for a song or list of songs, songs which you'll hopefully have in an internal knowledge base.
"all Coldplay songs": just filter by Coldplay. If your library is big enough, you could figure out that it refers to the artist.
"that": we need to filter again by the condition which follows.
Speeding up... "were the most popular of their album": take each song and corresponding album, sort album by property "popularity" and check it's the first one. You'll need to know "popular" refers to the property "popularity".
The third one is pretty similar. The second one could be easy too, but the piece "about love" is complicated on its own.
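As a sketch of what "composing the right building blocks" could look like for the first example, assuming a local song library with a numeric popularity field (the Song type, the field names, and the helper functions are all hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    album: str
    popularity: int  # hypothetical score; higher means more popular

def by_artist(name):
    """Building block for 'all <artist> songs'."""
    return lambda songs: [s for s in songs if s.artist.lower() == name.lower()]

def most_popular_per_album(songs):
    """Building block for 'the most popular of their album'."""
    best = {}
    for s in songs:
        if s.album not in best or s.popularity > best[s.album].popularity:
            best[s.album] = s
    return list(best.values())

def compose(*steps):
    """Chain building blocks left to right into one query."""
    def run(songs):
        for step in steps:
            songs = step(songs)
        return songs
    return run
```

The first command then becomes compose(by_artist("Coldplay"), most_popular_per_album) applied to the library. Mapping the words of the sentence onto those blocks is the part this sketch leaves out.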
IMO this means three things: firstly, words don't map directly to commands / capabilities, which means that having a composable way to define capabilities is hard. You'll likely need to define many ways to do each thing, but you could add them one by one, over time. Secondly, the tool should be able to tell that it doesn't know what you're talking about (what is this "about love" thing about?!?). Lastly, it should be interactive, so that it can ask/tell you when it doesn't know ("what list do you mean, A or B").
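The second and third points could be sketched like this: a resolver that either picks a list, asks which one you mean, or admits it has nothing to go on. The three-way return value and the word-matching rule are assumptions for illustration only.

```python
def resolve_list(phrase, lists):
    """Resolve a phrase like 'the list' against known lists.

    Returns ("ok", name), ("ask", question) or ("unknown", message).
    """
    words = set(phrase.lower().split())
    # Lists whose name is mentioned explicitly in the phrase.
    named = [name for name in lists if name.lower() in words]
    if len(named) == 1:
        return "ok", named[0]
    candidates = named or list(lists)
    if not candidates:
        return "unknown", "I don't know about any lists yet."
    if len(candidates) == 1:
        return "ok", candidates[0]
    options = " or ".join(sorted(candidates))
    return "ask", f"Which list do you mean, {options}?"
```

With two lists named A and B, "the list" comes back as a question rather than a guess, and "list b" resolves directly.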
Your comment about context is on point. We can't expect the tool to understand context it doesn't know about, which is why we cannot expect human level from this. But it doesn't have to be human level, it just needs to be good enough to be useful, and we do have a lot of space to improve.
We spend years learning this stuff, we could slowly teach a few tricks to the computer ;)
Yeah, I agree. Because "AI" is out of our reach.
If we get hold of something, we give it another name (like neural networks) and move on.
Continuing the search for "AI".