This paper proposes an approach to searching for human behaviors in videos using spatial-temporal words, which are learnt from unlabelled data containing various human behaviors through unsupervised learning. Both the query and the searched videos are represented by codeword frequencies, which capture the intrinsic motion and appearance information of human behaviors. This representation further enables us to use integral histograms to accelerate the search procedure. Performance also benefits from our feature representation, which, through a MAX-like operation, may simulate the cortical equivalent of the machine-vision "window of analysis". Examples of challenging sequences with complex behaviors, including tennis and ballet, are shown.
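To make the acceleration step concrete, the following is a minimal sketch of the integral-histogram idea over the temporal axis: prefix sums of per-frame codeword histograms let any window's codeword-frequency vector be computed with one subtraction. The vocabulary size and the per-frame codeword lists here are hypothetical placeholders, not the paper's actual feature pipeline.

```python
import numpy as np

def integral_histogram(frame_words, vocab_size):
    """Prefix sums of per-frame codeword histograms.

    frame_words[t] is an array of codeword indices detected in frame t
    (a hypothetical stand-in for the paper's spatial-temporal words).
    integral[t] holds the histogram of all codewords in frames 0..t-1,
    so any window [a, b) costs O(vocab_size) instead of O(b - a).
    """
    T = len(frame_words)
    integral = np.zeros((T + 1, vocab_size), dtype=np.int64)
    for t, words in enumerate(frame_words):
        hist = np.bincount(words, minlength=vocab_size)
        integral[t + 1] = integral[t] + hist
    return integral

def window_histogram(integral, a, b):
    # Codeword-frequency histogram of frames a..b-1, by one subtraction.
    return integral[b] - integral[a]
```

With the integral table built once per video, a sliding-window search only pays the per-window histogram comparison, not a rescan of every frame.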