With recent advances in technology, large quantities of multi-modal data have
arisen and become prevalent. Hence, effective and efficient retrieval,
organization and analysis of such data constitute a major challenge. Both news
photographs on the web and news videos on television are data of this kind and
offer rich sources of information. People are usually the main subject of the
news; therefore, queries related to a specific person are often desired.
In this study, we propose a graph-based method to improve the performance of
person queries in large news video and photograph collections. We exploit the
multi-modal structure of the data by associating text and face information.
Assuming that a person's face is likely to appear when his/her name is mentioned
in the news, we first limit the search space for a query name by selecting only
the faces associated with that name. Then, we construct a
similarity graph of the faces in this limited search space, where nodes
correspond to the faces and edges correspond to the similarity between the
faces. Many of these faces may belong to the queried person under different
conditions, in different poses and at different times. Others may belong to
other people in the news, or may be non-face images caused by errors in the
face detection method used. However, in most cases, the faces of the queried
person will be numerous, and they will be more similar to each other than to
the others. Hence, the problem is
transformed into a graph problem, in which we seek the densest component of the
graph. This most similar subset (the densest component) is likely to correspond
to the faces of the queried person. Finally, the result of the graph algorithm
is used as a model for further recognition when new faces are encountered. We
also show that the graph approach can be used to detect the faces of
anchorpersons without any supervision.
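
The densest-component step described above can be illustrated with a minimal
sketch. The text does not state which algorithm is used, so the code below
assumes a standard greedy peeling heuristic for the densest subgraph: it
repeatedly removes the node of lowest degree and keeps the node set with the
highest ratio of edges to nodes. The function name densest_component, the
similarity threshold of 0.5 and the toy similarity matrix are all hypothetical
and serve only as illustration.

```python
def densest_component(sim, threshold=0.5):
    """Greedy peeling sketch for the densest subgraph.

    sim is assumed to be a symmetric matrix (list of lists) of pairwise
    face similarities. Pairs with similarity >= threshold become edges.
    Density is measured as (number of edges) / (number of nodes).
    """
    n = len(sim)
    # Turn the similarity matrix into unweighted adjacency sets.
    neighbors = [
        {j for j in range(n) if j != i and sim[i][j] >= threshold}
        for i in range(n)
    ]
    alive = set(range(n))
    edges = sum(len(nb) for nb in neighbors) // 2
    best_set = set(alive)
    best_density = edges / n if n else 0.0

    while len(alive) > 1:
        # Remove the node with the fewest remaining neighbors
        # (ties are broken by the smaller index).
        v = min(alive, key=lambda u: (len(neighbors[u]), u))
        alive.remove(v)
        edges -= len(neighbors[v])
        for u in neighbors[v]:
            neighbors[u].discard(v)
        neighbors[v] = set()

        # Keep the densest node set seen so far.
        density = edges / len(alive)
        if density > best_density:
            best_density = density
            best_set = set(alive)

    return sorted(best_set), best_density


# Toy data (assumed): faces 0-3 resemble each other, faces 4-5 are outliers.
sim = [
    [1.0 if i == j else (0.9 if i < 4 and j < 4 else 0.1) for j in range(6)]
    for i in range(6)
]
members, density = densest_component(sim)
print(members, density)
```

In this toy example the four mutually similar faces are returned as the densest
component and the two outliers are discarded. A real system would derive the
similarities from face descriptors and might use weighted edges rather than a
fixed threshold.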