Along with recent advances in technology, large quantities of multi-modal
data have arisen and become prevalent. Hence, effective and efficient retrieval,
organization and analysis of such data constitute a major challenge. News
photographs on the web and news videos on television are both examples of such
data, offering rich sources of information. People are usually the main subject
of the news; therefore, queries related to a specific person are often desired.
In this study, we propose a graph-based method to improve the performance of person queries in large news video and photograph collections. We exploit the multi-modal structure of the data by associating text and face information. Assuming that a person's face is likely to appear when his/her name is mentioned in the news, we first select only the faces associated with the query name, which limits the search space for that name. Then, we construct a similarity graph of the faces in this limited search space, where nodes correspond to the faces and edges correspond to the similarities between them. Among these faces, there may be many faces of the queried person in different conditions, in different poses and at different times. There may also be faces of other people in the news, or non-face images caused by errors in the face detection method used. However, in most cases, the faces of the queried person will be the most numerous, and they will be more similar to each other than to the remaining faces. The problem is therefore transformed into a graph problem, in which we seek the densest component of the graph. This most similar subset (the densest component) is likely to correspond to the faces of the queried person. Finally, the result of the graph algorithm is used as a model for further recognition when new faces are encountered. We also show that the graph approach can be used to detect the faces of anchorpersons without any supervision.
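
To make the densest-component idea concrete, the following is a minimal sketch, not the implementation used in this study. It assumes the faces associated with a query name have already been compared pairwise, giving a symmetric similarity matrix `sim` with non-negative entries, and it assumes density is defined as total edge weight divided by the number of nodes. It uses a standard greedy peeling heuristic: repeatedly remove the node with the smallest weighted degree and keep the intermediate node set with the highest density. The function name `densest_component` and the toy similarity values are hypothetical.

```python
def densest_component(sim):
    """Greedy peeling on a weighted similarity graph.

    sim: symmetric n x n list of lists with non-negative similarities
         (hypothetical input; the diagonal is ignored).
    Returns (sorted node indices of the densest set found, its density),
    where density = total edge weight / number of nodes (an assumption).
    """
    n = len(sim)
    alive = set(range(n))
    # Weighted degree of each node with respect to the nodes still alive.
    degree = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(degree) / 2.0  # every edge was counted from both endpoints

    best_density, best_set = -1.0, set()
    while alive:
        density = total / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        # Remove the node that is least similar to the remaining nodes.
        v = min(alive, key=lambda i: degree[i])
        alive.remove(v)
        total -= degree[v]
        for u in alive:
            degree[u] -= sim[v][u]
    return sorted(best_set), best_density


# Toy example: faces 0-2 are mutually similar (the queried person),
# faces 3 and 4 are unrelated faces or false detections.
sim = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
]
faces, density = densest_component(sim)
print(faces)  # -> [0, 1, 2]
```

With real-valued similarities, this density measure can favor large node sets, so one option (again an assumption, not something stated above) is to threshold the similarities into a binary graph before peeling.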