Research in human-computer interaction is no longer limited to the design of devices and psychological experiments on window layouts; it has advanced to a new stage: intelligent interaction. One aspect of this shift is that computers should be able to accept auditory and visual sensory input, analyze and interpret it, and then provide intuitive feedback by synthesizing speech, video, or actions. Fundamentally, beyond speech recognition, computers should be able to recognize, interpret, and understand human actions and behaviors from visual input.