Thoughts on Siri and Artificial Intelligence

When I was at Uni, I did some work that covered a bit of Artificial Intelligence.  Now AI is a huge field, but my project was on a basic speech recognition engine.  There are many areas being researched in AI, but one large problem is “learning”.  If we could train an AI system by feeding it more input, and if it could continually get better at categorising/recognising/classifying that input, then it would appear to get more “intelligent”.

One problem is getting enough input.  For my speech recognition project at Uni, I had access to a CD with a handful of sound samples – of different people saying the same set of words.  The idea is that the more it heard different people saying, e.g., the number “One”, the better it should eventually get at recognising anyone saying “One”.  It would have been wonderful to record everyone in my unit, or even everyone doing Engineering, as input to my project – but the logistics were too hard.
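To make that idea a bit more concrete, here is a toy sketch in Python.  It is not the engine from my Uni project: the single-number "feature" per recording, the speaker offsets and the function names are all made up for illustration, and real speech recognition works on far richer audio features.  It only shows how averaging over more speakers can move the learned "One" and "Two" closer to where a new voice will land.

```python
from collections import defaultdict

# Hypothetical "true" feature value for each spoken word (an assumption for this sketch).
TRUE_VALUE = {"One": 1.0, "Two": 2.0}


def record(speaker_offsets):
    """Simulate recordings: each speaker shifts every word by their own offset."""
    return [
        (word, value + offset)
        for offset in speaker_offsets
        for word, value in TRUE_VALUE.items()
    ]


def train(samples):
    """Learn one average feature value per word (a nearest-centroid model)."""
    grouped = defaultdict(list)
    for word, feature in samples:
        grouped[word].append(feature)
    return {word: sum(values) / len(values) for word, values in grouped.items()}


def recognise(model, feature):
    """Return the word whose learned average is closest to the new feature."""
    return min(model, key=lambda word: abs(model[word] - feature))


# One training speaker versus four training speakers.
small_model = train(record([0.6]))
large_model = train(record([0.6, -0.4, -0.2, 0.0]))

# A new, unseen speaker says "Two" with their own offset.
new_sample = TRUE_VALUE["Two"] - 0.3

print(recognise(small_model, new_sample))  # One (wrong: the model only knows one voice)
print(recognise(large_model, new_sample))  # Two (right: the averages are less biased)
```

With a single training speaker the model has really learned that one person's voice rather than the word itself, which is exactly why a bigger, more varied bank of input matters.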

The problem of getting a large bank of input is not just for my speech recognition project – it’s for any AI learning system.

And that’s one thing that I’m so excited about for Siri – the new Personal Assistant Application on the iPhone 4S.

Imagine this – you gather over 300 researchers from 25 of the top research institutions in the world to work on an Artificial Intelligence project to build an AI assistant.  Then you put the findings and results of their work into a common handheld device, combine it with a voice recognition system, and unleash it on millions of people around the world who might use it regularly, even if only to play with it for novelty’s sake.

That is one MASSIVE bank of input data from an Artificial Intelligence point of view.

Just imagine the amount of new input data that is being received and processed.  On one hand, I wonder if Apple are collecting and storing the phrases people are saying into Siri and the privacy concerns around this.  But on the other hand, if Apple are using this data for Siri to continually improve its responses and intelligence, then you’d hope that it will get better at producing more useful results as time moves on.

The other thing I’m excited about (wow, readers must think I’m a real Apple fanboy) is that, even though speech/voice recognition is not new by any means, Siri’s conversational interface is in line with the touch UI of the Apple iPad/iPhone – in that users shouldn’t need to THINK about HOW to interact with the technology; it should just work naturally.

All those patents about how the iPhone touch interface bounces at the end of scrolling, and how it slows down – those are concepts to make the screen appear and act like a physical object with momentum and inertia – it makes it seem more “real”.  And so with Siri – I know it’s not perfect, but you don’t have to remember some strange sequence of words, re-work sentences into specific structures, or speak slowly one syllable or word at a time to make it understand you.  You should be able to talk to it like any other person, and it should just understand.  I like where Apple is heading with this – where technology really is there to “assist” humans when required, and we “humans” don’t need to try to translate our intentions into the computer’s view of the world.

Parking machine with Windows errors


I was at the Perth airport on the weekend picking up Wifey.  Went to the parking machine to pay for the ticket and saw this amusing error.

Seems like the parking machine computer needs a bit of an update!  I was surprised to see it running Windows too, but maybe it’s Windows CE.  I tried to press the Start button, but (unfortunately) the screen is not a touch screen.

How did parking machines get so complicated?  I remember when the Perth airport carpark used to give you a punch card on entry that you had to give to an attendant on exiting.  I guess running 6 or so PCs and the associated software and maintenance is cheaper in the long run.