Learning Scene-Independent Group Descriptors for Crowd Understanding


Groups are the primary entities that make up a crowd. Understanding group-level dynamics and properties is thus scientifically important and practically useful in a wide range of applications, especially for crowd understanding. In this paper, we show that fundamental group-level properties, such as intra-group stability and inter-group conflict, can be systematically quantified by visual descriptors. This is made possible through learning a novel collective transition prior, which leads to a robust approach for group segregation in public spaces. From this prior, we further devise a rich set of group-property visual descriptors. These descriptors are scene-independent and can be effectively applied to public scenes with a variety of crowd densities and distributions. Extensive experiments on hundreds of public scene video clips demonstrate that these property descriptors are complementary to each other and scene-independent, and that they convey critical information about the physical states of a crowd. The proposed group-level descriptors show promising results and potential in multiple applications, including crowd dynamics monitoring, crowd video classification, and crowd video retrieval.

IEEE Transactions on Circuits and Systems for Video Technology (IEEE T-CSVT), 2017