It's easy to forget that the speech recognition isn't happening on your phone at all: it's happening on Google's servers. Google's vast database of speech data is what makes the recognition work so well, and it would be hard to pack all of that into a local device.
His second idea is one many companies are working on, and one Nokia has outright said it is aiming for: using sensor technology as input, and providing suitable programming hooks and APIs around it.
Could a phone recognize the gesture of raising the camera and then holding it steady to launch the camera application? Could we talk to the phone to adjust camera settings? (There's a constrained language around lighting, speed, and focus that should be easy to recognize.)
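To make the gesture half of that concrete, here's a toy sketch in Java of how such a detector might work on a stream of accelerometer magnitudes: wait for an acceleration burst (the raise), then for the readings to settle near gravity for a fraction of a second (the hold). The class name, the state machine, and every threshold and sample rate here are illustrative assumptions on my part, not any real phone's API.

```java
// Hypothetical sketch: spot a "raise, then hold steady" gesture from
// accelerometer magnitudes. All constants are illustrative guesses.
public class RaiseAndHoldDetector {
    private enum State { IDLE, RAISING, STEADY }

    private static final double GRAVITY = 9.81;          // m/s^2
    private static final double RAISE_THRESHOLD = 3.0;   // assumed: burst that means "lifting"
    private static final double STEADY_TOLERANCE = 0.5;  // assumed: jitter allowed while held
    private static final int STEADY_SAMPLES = 30;        // assumed: ~0.6 s at a 50 Hz sensor

    private State state = State.IDLE;
    private int steadyCount = 0;

    /** Feed one acceleration magnitude; returns true when the gesture completes. */
    public boolean onSample(double magnitude) {
        double deviation = Math.abs(magnitude - GRAVITY);
        switch (state) {
            case IDLE:
                // A burst well beyond resting gravity suggests the phone is being raised.
                if (deviation > RAISE_THRESHOLD) state = State.RAISING;
                break;
            case RAISING:
                // Once motion settles back near gravity, start timing the "hold".
                if (deviation < STEADY_TOLERANCE) { steadyCount = 0; state = State.STEADY; }
                break;
            case STEADY:
                if (deviation > STEADY_TOLERANCE) {
                    state = State.IDLE;           // jolted: wait for a fresh raise
                } else if (++steadyCount >= STEADY_SAMPLES) {
                    state = State.IDLE;
                    return true;                  // raised and held steady: launch the camera
                }
                break;
        }
        return false;
    }

    public static void main(String[] args) {
        RaiseAndHoldDetector detector = new RaiseAndHoldDetector();
        // Simulated stream: at rest, a lift burst, then a steady hold.
        double[] samples = new double[60];
        java.util.Arrays.fill(samples, 0, 5, 9.8);    // resting
        java.util.Arrays.fill(samples, 5, 10, 14.0);  // raising
        java.util.Arrays.fill(samples, 10, 60, 9.8);  // held steady
        for (double s : samples) {
            if (detector.onSample(s)) System.out.println("Gesture detected: launch camera");
        }
    }
}
```

A real implementation would presumably also check the phone's orientation (camera facing outward, screen upright) before firing, but the point stands: with sensor data exposed through proper APIs, gestures become a first-class input alongside the keypad.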
Certainly worth reading both the article and the detailed comments from other readers.