21.3.4.1 Combined Audio Visual Recognition

Real Time Vision. Application, Lipreading. Speech.

Wu, J.X.[Jian-Xiong], Chan, C.[Chorkin],
Recognition of phonetic labels of the timit speech corpus by means of an artificial neural network,
PR(24), No. 11, 1991, pp. 1085-1091.
WWW Version. 0401
BibRef

Wu, J.T.[Jian-Tong], Tamura, S.[Shinichi], Mitsumoto, H.[Hiroshi], Kawai, H.[Hideo], Kurosu, K.[Kenji], Okazaki, K.[Kozo],
Neural network vowel-recognition jointly using voice features and mouth shape image,
PR(24), No. 10, 1991, pp. 921-927.
WWW Version. 0401
BibRef

Lavagetto, F.,
Time-Delay Neural Networks for Estimating Lip Movements from Speech Analysis: A Useful Tool in Audio Video Synchronization,
CirSysVideo(7), No. 5, October 1997, pp. 786-800.
IEEE Top Reference. 9710
BibRef

Movellan, J.R., Mineiro, P.,
Robust Sensor Fusion: Analysis and Application to Audio-Visual Speech Recognition,
MachLearn(32), No. 2, August 1998, pp. 85-100. 9810
BibRef

Wachsmuth, S.[Sven], Socher, G.[Gudrun], Brandt-Pook, H.[Hans], Kummert, F.[Franz], Sagerer, G.[Gerhard],
Integration of Vision and Speech Understanding Using Bayesian Networks,
Videre(1), No. 4, Winter 2000, pp. xx-yy. 0005
BibRef
Earlier: A1, A3, A2, A4, A5:
Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks,
CVS99(231 ff.).
HTML Version. 0209
BibRef

Chien, J.T., Lin, M.S.,
Frame-synchronous noise compensation for hands-free speech recognition in car environments,
VISP(147), No. 6, December 2000, pp. 508-515. 0101
BibRef

Patel, D., Turner, L.F.,
Effects of ATM network impairments on audio-visual broadcast applications,
VISP(147), No. 5, October 2000, pp. 436-444. 0101
BibRef

Aleksic, P.S.[Petar S.], Williams, J.J.[Jay J.], Wu, Z.L.[Zhi-Lin], Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features,
JASP(2002), No. 11, November 2002, pp. 1213.
HTML Version. 0304
BibRef
Earlier:
Audio-visual continuous speech recognition using MPEG-4 compliant visual features,
ICIP02(I: 960-963).
IEEE Abstract. IEEE Top Reference. 0210
BibRef

Aleksic, P.S.[Petar S.], Katsaggelos, A.K.[Aggelos K.],
Audio-Visual Biometrics,
PIEEE(94), No. 11, November 2006, pp. 2025-2044.
IEEE DOI Link 0611
BibRef

Aleksic, P.S.[Petar S.], Katsaggelos, A.K.[Aggelos K.],
Speech-to-video synthesis using MPEG-4 compliant visual features,
CirSysVideo(14), No. 5, May 2004, pp. 682-692.
IEEE Abstract. IEEE Top Reference. 0407
BibRef
And:
Comparison of MPEG-4 Facial Animation Parameter Groups with Respect to Audio-Visual Speech Recognition Performance,
ICIP05(III: 501-504).
IEEE DOI Link 0512
BibRef

Jiang, J.T.[Jin-Tao], Alwan, A.[Abeer], Keating, P.A.[Patricia A.], Auer Jr., E.T.[Edward T.], Bernstein, L.E.[Lynne E.],
On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics,
JASP(2002), No. 11, November 2002, pp. 1174.
HTML Version. 0304
BibRef

Sodoyer, D.[David], Schwartz, J.L.[Jean-Luc], Girin, L.[Laurent], Klinkisch, J.[Jacob], Jutten, C.[Christian],
Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli,
JASP(2002), No. 11, November 2002, pp. 1165.
HTML Version. 0304
BibRef

Zotkin, D.N.[Dmitry N.], Duraiswami, R.[Ramani], Davis, L.S.[Larry S.],
Joint Audio-Visual Tracking Using Particle Filters,
JASP(2002), No. 11, November 2002, pp. 1154.
HTML Version. 0304
BibRef

Heckmann, M.[Martin], Berthommier, F.[Frédéric], Kroschel, K.[Kristian],
Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1260.
HTML Version. 0304
BibRef

Nefian, A.V.[Ara V.], Liang, L.H.[Lu-Hong], Pi, X.B.[Xiao-Bo], Liu, X.X.[Xiao-Xing], Murphy, K.[Kevin],
Dynamic Bayesian Networks for Audio-Visual Speech Recognition,
JASP(2002), No. 11, November 2002, pp. 1274.
HTML Version. 0304
BibRef

Nefian, A.V.[Ara V.], Liang, L.H.[Lu-Hong], Fu, T.Y.[Tie-Yan], Liu, X.X.[Xiao-Xing],
A Bayesian Approach to Audio-Visual Speaker Identification,
AVBPA03(761-769).
HTML Version. 0310
BibRef

Patterson, E.K.[Eric K.], Gurbuz, S.[Sabri], Tufekci, Z.[Zekeriya], Gowdy, J.N.[John N.],
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus,
JASP(2002), No. 11, November 2002, pp. 1189.
HTML Version. 0304
BibRef

Gurbuz, S.[Sabri], Patterson, E.K.[Eric K.], Tufekci, Z.[Zekeriya], Gowdy, J.N.[John N.],
Affine-Invariant Visual Features Contain Supplementary Information to Enhance Speech Recognition,
AVBPA01(175).
HTML Version. 0310
BibRef

Garg, A.[Ashutosh], Pavlovic, V.[Vladimir], Rehg, J.M.[James M.],
Boosted learning in dynamic Bayesian networks for multimodal speaker detection,
PIEEE(91), No. 9, September 2003, pp. 1355-1369.
IEEE DOI Link 0309
BibRef
Earlier:
Audio-visual speaker detection using dynamic Bayesian networks,
AFGR00(384-390).
IEEE DOI Link 0003
BibRef

Pavlovic, V.[Vladimir], Garg, A.[Ashutosh], Rehg, J.M.[James M.], Huang, T.S.[Thomas S.],
Multimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks,
CVPR00(II: 34-41).
IEEE Abstract. IEEE Top Reference.
WWW Version. 0005
BibRef

Pavlovic, V., Berry, G., Huang, T.S.,
Integration of Audio/Visual Information for Use in Human-Computer Intelligent Interaction,
ICIP97(I: 121-124).
IEEE DOI Link BibRef 9700

Choudhury, T.[Tanzeem], Rehg, J.M., Pavlovic, V., Pentland, A.P.,
Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection,
ICPR02(III: 789-794).
IEEE DOI Link 0211
BibRef

Pavlovic, V.[Vladimir],
Multimodal tracking and classification of audio-visual features,
ICIP98(I: 343-347).
IEEE DOI Link 9810
BibRef

Rehg, J.M.[James M.], Murphy, K.P.[Kevin P.], Fieguth, P.W.[Paul W.],
Vision-Based Speaker Detection Using Bayesian Networks,
CVPR99(II: 110-116).
IEEE Abstract. IEEE Top Reference.
WWW Version. More particularly, detecting the person who is talking. BibRef 9900

Kalberer, G.A.[Gregor A.], Müller, P.[Pascal], Van Gool, L.J.[Luc J.],
Visual speech, a trajectory in viseme space,
IJIST(13), No. 1, 2003, pp. 74-84.
WWW Version. 0308
BibRef

Sharma, R., Yeasin, M., Krahnstoever, N., Rauschert, I., Cai, G., Brewer, I., MacEachren, A.M., Sengupta, K.,
Speech-gesture driven multimodal interfaces for crisis management,
PIEEE(91), No. 9, September 2003, pp. 1327-1354.
IEEE DOI Link 0309
BibRef

Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.,
Recent advances in the automatic recognition of audiovisual speech,
PIEEE(91), No. 9, September 2003, pp. 1306-1326.
IEEE DOI Link 0309
BibRef

Kaynak, M.N., Zhi, Q., Cheok, A.D., Sengupta, K., Jian, Z., Chung, K.C.,
Analysis of Lip Geometric Features for Audio-Visual Speech Recognition,
SMC-A(34), No. 4, July 2004, pp. 564-570.
IEEE Abstract. IEEE Top Reference. 0407
BibRef

Foo, S.W.[Say Wei], Lian, Y.[Yong], Dong, L.[Liang],
Recognition of visual speech elements using adaptively boosted hidden Markov models,
CirSysVideo(14), No. 5, May 2004, pp. 693-705.
IEEE Abstract. IEEE Top Reference. 0407
BibRef

Albiol, A.[Alberto], Torres, L.[Luis], Delp, E.J.[Edward J.],
Fully automatic face recognition system using a combined audio-visual approach,
VISP(152), No. 3, June 2005, pp. 318-326.
WWW Version. 0510
BibRef
Earlier:
A Fast Anchor Person Searching Scheme in News Sequences,
AVBPA01(366).
HTML Version. 0310
BibRef
And:
An Unsupervised Color Image Segmentation Algorithm for Face Detection Applications,
ICIP01(II: 681-684).
IEEE Abstract. IEEE Top Reference. 0108
BibRef
Earlier:
Optimum Color Spaces for Skin Detection,
ICIP01(I: 122-124).
IEEE Abstract. IEEE Top Reference. 0108
BibRef

Kleindienst, J.[Jan], Macek, T.[Tomáš], Serédi, L.[Ladislav], Šedivý, J.[Jan],
Interaction framework for home environment using speech and vision,
IVC(25), No. 12, 3 December 2007, pp. 1836-1847.
WWW Version. 0710
BibRef
Earlier:
Djinn: Interaction Framework for Home Environment Using Speech and Vision,
CVHCI04(153-164).
WWW Version. 0505
Multi-modal; Computer-vision; Context-aware; Speech recognition BibRef

Palanivel, S., Yegnanarayana, B.,
Multimodal person authentication using speech, face and visual speech,
CVIU(109), No. 1, January 2008, pp. 44-55.
WWW Version. 0801
Multimodal person authentication; Face tracking; Eye location; Visual speech; Multiscale morphological dilation and erosion; Autoassociative neural network BibRef

Talantzis, F., Pnevmatikakis, A., Constantinides, A.G.,
Audio-Visual Active Speaker Tracking in Cluttered Indoors Environments,
SMC-B(39), No. 1, February 2009, pp. 7-15.
IEEE DOI Link 0902
BibRef
Earlier: SMC-B(38), No. 3, June 2008, pp. 799-807.
IEEE DOI Link 0711
The first citation is the special-issue version; the paper appeared earlier in the other issue. BibRef

Chetty, G.[Girija], Wagner, M.[Michael],
Robust face-voice based speaker identity verification using multilevel fusion,
IVC(26), No. 9, 1 September 2008, pp. 1249-1260.
WWW Version. 0806
BibRef
Earlier:
Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features,
PReMI07(469-478).
Springer DOI Link 0712
Lip; 3D Face; Voice; Biometric; Identity verification; Robust; Multilevel fusion BibRef

Delakis, M.[Manolis], Gravier, G.[Guillaume], Gros, P.[Patrick],
Audiovisual integration with Segment Models for tennis video parsing,
CVIU(111), No. 2, August 2008, pp. 142-154.
WWW Version. 0808
Hidden Markov Models; Segment Models; Multimodal fusion; Video indexing; Video summarization BibRef

Vajaria, H.[Himanshu], Sankar, R.[Ravi], Kasturi, R.[Ranga],
Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization,
CirSysVideo(18), No. 11, November 2008, pp. 1608-1617.
IEEE DOI Link 0811
BibRef

Vajaria, H.[Himanshu], Islam, T.[Tanmoy], Sarkar, S.[Sudeep], Sankar, R.[Ravi], Kasturi, R.[Ranga],
Audio Segmentation and Speaker Localization in Meeting Videos,
ICPR06(II: 1150-1153).
WWW Version. 0609
BibRef

Hospedales, T.M.[Timothy M.], Vijayakumar, S.[Sethu],
Structure Inference for Bayesian Multisensory Scene Understanding,
PAMI(30), No. 12, December 2008, pp. 2140-2157.
IEEE DOI Link 0811
Audio-visual inputs. Speakers in meetings. BibRef

Liu, Z.C.[Zi-Cheng], Cohen, M., Bhatnagar, D., Cutler, R., Zhang, Z.Y.[Zheng-You],
Head-Size Equalization for Improved Visual Perception in Video Conferencing,
MultMed(9), No. 7, November 2007, pp. 1520-1527.
IEEE DOI Link 0905
BibRef

Liu, Z.C.[Zi-Cheng], Cutler, R.[Ross], Cohen, M.[Michael], Zhang, Z.Y.[Zheng-You],
System and method for head size equalization in 360 degree panoramic images,
US_Patent7,184,609, Feb 27, 2007
WWW Version. BibRef 0702

Cutler, R.[Ross],
User interface for a system and method for head size equalization in 360 degree panoramic images,
US_Patent7,149,367, Dec 12, 2006
WWW Version. BibRef 0612

Cutler, R.[Ross], Kapoor, A.[Ashish],
System and method for audio/video speaker detection,
US_Patent7,343,289, Mar 11, 2008
WWW Version. BibRef 0803

Heracleous, P., Aboutabit, N., Beautemps, D.,
Lip Shape and Hand Position Fusion for Automatic Vowel Recognition in Cued Speech for French,
SPLetters(16), No. 5, May 2009, pp. 339-342.
IEEE DOI Link 0903
BibRef

Zhang, C.[Cha], Yin, P.[Pei], Rui, Y.[Yong], Cutler, R., Viola, P., Sun, X.D.[Xin-Ding], Pinto, N., Zhang, Z.Y.[Zheng-You],
Boosting-Based Multimodal Speaker Detection for Distributed Meeting Videos,
MultMed(10), No. 8, December 2008, pp. 1541-1552.
IEEE DOI Link 0905
BibRef

Lee, J.S.[Jong-Seok], Park, C.H.[Cheol Hoon],
Robust Audio-Visual Speech Recognition Based on Late Integration,
MultMed(10), No. 5, August 2008, pp. 767-779.
IEEE DOI Link 0905
BibRef

Saenko, K.[Kate], Livescu, K.[Karen], Glass, J.[James], Darrell, T.J.[Trevor J.],
Multistream Articulatory Feature-Based Models for Visual Speech Recognition,
PAMI(31), No. 9, September 2009, pp. 1700-1707.
IEEE DOI Link 0907
Lip opening, lip rounding features. BibRef

Saenko, K.[Kate], Livescu, K.[Karen], Siracusa, M.[Michael], Wilson, K.[Kevin], Glass, J.[James], Darrell, T.J.[Trevor J.],
Visual Speech Recognition with Loosely Synchronized Feature Streams,
ICCV05(II: 1424-1431).
IEEE DOI Link 0510
BibRef

Schuller, B.[Björn], Müller, R.[Ronald], Eyben, F.[Florian], Gast, J.[Jürgen], Hörnler, B.[Benedikt], Wöllmer, M.[Martin], Rigoll, G.[Gerhard], Höthker, A.[Anja], Konosu, H.[Hitoshi],
Being bored? Recognising natural interest by extensive audiovisual integration for real-life application,
IVC(27), No. 12, November 2009, pp. 1760-1774.
Elsevier DOI Link
WWW Version. 0910
Interest recognition; Affective computing; Audiovisual processing BibRef

Althoff, F.[Frank], McGlaun, G.[Gregor], Lang, M.K.[Manfred K.], Rigoll, G.[Gerhard],
Evaluating Multimodal Interaction Patterns in Various Application Scenarios,
GW03(421-435).
WWW Version. 0405
BibRef


Lee, J.S.[Jong-Seok], Ebrahimi, T.[Touradj],
Two-Level Bimodal Association for Audio-Visual Speech Recognition,
ACIVS09(133-144).
Springer DOI Link 0909
BibRef

Marchegiani, M.L.[Maria Letizia], Pirri, F.[Fiora], Pizzoli, M.[Matia],
Multimodal Speaker Recognition in a Conversation Scenario,
CVS09(11-20).
Springer DOI Link 0910
BibRef

Kumar, K.[Kshitiz], Navratil, J.[Jiri], Marcheret, E.[Etienne], Libal, V.[Vit], Ramaswamy, G.[Ganesh], Potamianos, G.[Gerasimos],
Audio-visual speech synchronization detection using a bimodal linear prediction model,
Biometrics09(53-59).
IEEE DOI Link 0906
BibRef

Karam, W.[Walid], Mokbel, C.[Chafic], Greige, H.[Hanna], Chollet, G.[Gérard],
Audio-Visual Identity Verification and Robustness to Imposture,
ICB09(796-805).
Springer DOI Link 0906
BibRef

El-Sallam, A.A.[Amar A.], Mian, A.S.[Ajmal S.],
Speech-Video Synchronization Using Lips Movements and Speech Envelope Correlation,
ICIAR09(397-407).
Springer DOI Link 0907
BibRef

Rebillat, M.[Marc], Katz, B.F.G.[Brian F.G.], Corteel, E.[Etienne],
SMART-I2: Spatial Multi-user Audio-visual Real-time interactive interface, A broadcast application context,
3DTV09(1-4).
IEEE DOI Link 0905
BibRef

Eisenstein, J.[Jacob],
Gesture in Automatic Discourse Processing,
Ph.D.Thesis, MIT, May 2008. Also: CSAIL-2008-027, May 2008.
WWW Version. BibRef 0805

Das, A.[Amitava], Manyam, O.K.[Ohil K.], Tapaswi, M.[Makarand],
Audio-Visual Person Authentication with Multiple Visualized-Speech Features and Multiple Face Profiles,
ICCVGIP08(39-46).
IEEE DOI Link 0812
BibRef

Cao, Y.[Yu], Baang, S.[Sung], Liu, S.H.[Shih-Hsi], Li, M.[Ming], Hu, S.[Sanqing],
Audio-visual event classification via spatial-temporal-audio words,
ICPR08(1-5).
IEEE DOI Link 0812
BibRef

Terry, L.H.[Louis H.], Shiell, D.J.[Derek J.], Katsaggelos, A.K.[Aggelos K.],
Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition,
ICIP08(1316-1319).
IEEE DOI Link 0810
BibRef

Naseem, I.[Imran], Mian, A.S.[Ajmal S.],
User Verification by Combining Speech and Face Biometrics in Video,
ISVC08(II: 482-492).
Springer DOI Link 0812
BibRef

Ettinger, E.[Evan], Freund, Y.[Yoav],
Coordinate-free calibration of an acoustically driven camera pointing system,
ICDSC08(1-9).
IEEE DOI Link 0809
BibRef

Hung, H.[Hayley], Friedland, G.[Gerald],
Towards Audio-Visual On-line Diarization Of Participants In Group Meetings,
M2SFA208(xx-yy). 0810
BibRef

Liu, Y.[Yuyu], Sato, Y.[Yoichi],
Finding Speaker Face Region by Audiovisual Correlation,
M2SFA208(xx-yy). 0810
BibRef

Kelly, D.[Damien], Pitie, F.[Francois], Kokaram, A.[Anil], Boland, F.[Frank],
A Comparative Error Analysis of Audio-Visual Source Localization,
M2SFA208(xx-yy). 0810
BibRef

Katsarakis, N.[Nikos], Talantzis, F.[Fotios], Pnevmatikakis, A.[Aristodemos], Polymenakos, L.[Lazaros],
The AIT 3D Audio / Visual Person Tracker for CLEAR 2007,
MTPH07(xx-yy).
Springer DOI Link 0705
See also AIT 2D Face Detection and Tracking System for CLEAR 2007, The. See also AIT Multimodal Person Identification System for CLEAR 2007, The. BibRef

Pachoud, S., Gong, S., Cavallaro, A.,
Video Augmentation for Improving Audio Speech Recognition under Noise,
BMVC08(xx-yy).
PDF Version. 0809
BibRef

Horii, Y.[Yu], Kawashima, H.[Hiroaki], Matsuyama, T.[Takashi],
Speaker detection using the timing structure of lip motion and sound,
CVPR4HB08(1-8).
IEEE DOI Link 0806
BibRef

Rúa, E.A.[Enrique Argones], Castro, J.L.A.[José Luis Alba], Mateo, C.G.[Carmen García],
Quality-Based Score Normalization for Audiovisual Person Authentication,
ICIAR08(xx-yy).
Springer DOI Link 0806
BibRef

Wang, L.[Lei], Tjondronegoro, D.[Dian], Liu, Y.[Yuee],
Clustering and Visualizing Audio-Visual Dataset on Mobile Devices in a Topic-Oriented Manner,
Visual07(310-321).
Springer DOI Link 0706
BibRef

Zajdel, W., Krijnders, J.D., Andringa, T., Gavrila, D.M.,
CASSANDRA: audio-video sensor fusion for aggression detection,
AVSBS07(200-205).
IEEE DOI Link 0709
BibRef

Stødle, D.[Daniel], Bjørndalen, J.M.[John Markus], Anshus, O.J.[Otto J.],
A System for Hybrid Vision- and Sound-Based Interaction with Distal and Proximal Targets on Wall-Sized, High-Resolution Tiled Displays,
CVHCI07(59-68).
Springer DOI Link 0710
BibRef

van Hengel, P.W.J., Andringa, T.C.,
Verbal aggression detection in complex social environments,
AVSBS07(15-20).
IEEE DOI Link 0709
BibRef

Ikeda, O.[Osamu],
Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement,
ISVC07(II: 602-610).
Springer DOI Link 0711
BibRef

Das, A.[Amitava],
Audio Visual Person Authentication by Multiple Nearest Neighbor Classifiers,
ICB07(1114-1123).
Springer DOI Link 0708
BibRef

Xin, L.[Le], Tao, J.H.[Jian-Hua], Tan, T.N.[Tie-Niu],
Dynamic Audio-Visual Mapping using Fused Hidden Markov Model Inversion Method,
ICIP07(III: 293-296).
IEEE DOI Link 0709
BibRef

Casanovas, A.L.[Anna Llagostera], Monaci, G.[Gianluca], Vandergheynst, P.[Pierre],
Blind Audiovisual Source Separation using Sparse Representations,
ICIP07(III: 301-304).
IEEE DOI Link 0709
BibRef

Barzelay, Z.[Zohar], Schechner, Y.Y.[Yoav Y.],
Harmony in Motion,
CVPR07(1-8).
IEEE DOI Link 0706
Audio-visual analysis. BibRef

O'Donovan, A.[Adam], Duraiswami, R.[Ramani], Neumann, J.[Jan],
Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing,
CVPR07(1-8).
IEEE DOI Link 0706
BibRef

Abbas, J.[Jehanzeb], Dagli, C.K.[Charlie K.], Huang, T.S.[Thomas S.],
A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video,
SLAM07(1-8).
IEEE DOI Link 0706
BibRef

Kushal, A.[Akash], Rahurkar, M.[Mandar], Fei-Fei, L.[Li], Ponce, J.[Jean], Huang, T.[Thomas],
Audio-Visual Speaker Localization Using Graphical Models,
ICPR06(I: 291-294).
WWW Version. 0609
BibRef

Tsuji, T.[Tokuo], Yamamoto, K.[Kenkichi], Ishii, I.[Idaku],
Real-time Sound Source Localization Based on Audiovisual Frequency Integration,
ICPR06(IV: 322-325).
WWW Version. 0609
BibRef

Monaci, G.[Gianluca], Vandergheynst, P.[Pierre],
Audiovisual Gestalts,
PercOrg06(200).
IEEE DOI Link 0609
BibRef

Zhu, Z.G.[Zhi-Gang], Li, W.H.[Wei-Hong], Molina, E.[Edgardo], Wolberg, G.[George],
LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System,
MSCSAS07(1-2).
IEEE DOI Link 0706
BibRef

Zhu, Z.G.[Zhi-Gang], Li, W.H.[Wei-Hong], Wolberg, G.,
Integrating LDV Audio and IR Video for Remote Multimodal Surveillance,
OTCBVS05(III: 10-10).
IEEE DOI Link 0507
BibRef

Chetty, G.[Girija], Wagner, M.[Michael],
Face-Voice Authentication Based on 3D Face Models,
ACCV06(I: 559-568).
Springer DOI Link 0601
BibRef

Wu, Z.Y.[Zhi-Yong], Cai, L.H.[Lian-Hong], Meng, H.[Helen],
Multi-level Fusion of Audio and Visual Features for Speaker Identification,
ICB06(493-499).
Springer DOI Link 0601
BibRef

Yang, P.[Pu], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Exploiting Glottal Information in Speaker Recognition Using Parallel GMMs,
AVBPA05(804).
Springer DOI Link 0509
BibRef

Lei, Z.C.[Zhen-Chun], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
An UBM-Based Reference Space for Speaker Recognition,
ICPR06(IV: 318-321).
WWW Version. 0609
BibRef

Li, D.D.[Dong-Dong], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Dynamic Bayesian Networks for Audio-Visual Speaker Recognition,
ICB06(539-545).
Springer DOI Link 0601
BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Data Association in Multi-Target Tracking Using Belief Theory: Handling Target Emergence and Disappearance Issue,
AVSBS05(517-521).
IEEE DOI Link 0602
BibRef

Megherbi, N., Ambellouis, S., Colot, O., Cabestaing, F.,
Joint audio-video people tracking using belief theory,
AVSBS05(135-140).
IEEE DOI Link 0602
BibRef

Lei, Z.C.[Zhen-Chun], Yang, Y.C.[Ying-Chun], Wu, Z.H.[Zhao-Hui],
Constructing the Discriminative Kernels Using GMM for Text-Independent Speaker Identification,
IWBRS05(165).
Springer DOI Link 0601
BibRef
And:
Speaker Identification Using the VQ-Based Discriminative Kernels,
AVBPA05(797).
Springer DOI Link 0509
BibRef

Fox, N.A.[Niall A.], O'Mullane, B.A.[Brian A.], Reilly, R.B.[Richard B.],
VALID: A New Practical Audio-Visual Database, and Comparative Results,
AVBPA05(777).
Springer DOI Link 0509
WWW Version. Dataset, Faces. BibRef

Sharma, P.[Prag], Reilly, R.B.[Richard B.],
The UCD Colour Face Image Database for Face Detection,
Online1998.
WWW Version. Dataset, Faces. BibRef 9800

Fox, N.A.[Niall A.], O'Mullane, B.A.[Brian A.], Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities,
AVBPA05(787).
Springer DOI Link 0509
BibRef

Zhang, X.Q.[Xiao-Qin], Hu, W.M.[Wei-Ming], Zhao, Z.X.[Zi-Xiang], Wang, Y.G.[Yan-Guo], Li, X.[Xi], Wei, Q.D.[Qing-Di],
SVD based Kalman particle filter for robust visual tracking,
ICPR08(1-4).
IEEE DOI Link 0812
BibRef

Li, X.[Xin], Sun, L.[Luo], Tao, L.M.[Lin-Mi], Xu, G.Y.[Guang-You], Jia, Y.[Ying],
A Speaker Tracking Algorithm Based on Audio and Visual Information Fusion Using Particle Filter,
ICIAR04(II: 572-580).
WWW Version. 0409
BibRef

Zhang, D., Ghobakhlou, A., Kasabov, N.,
An adaptive model of person identification combining speech and image information,
ICARCV04(I: 413-418).
IEEE DOI Link 0412
BibRef

Kratt, J.[Jan], Metze, F.[Florian], Stiefelhagen, R.[Rainer], Waibel, A.[Alex],
Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit,
DAGM04(488-495).
WWW Version. 0505
BibRef

Hanafiah, Z.M., Yamazaki, C., Nakamura, A., Kuno, Y.,
Understanding inexplicit utterances using vision for helper robots,
ICPR04(IV: 925-928).
IEEE DOI Link 0409
BibRef

Hermann, T.[Thomas], Henning, T.[Thomas], Ritter, H.[Helge],
Gesture Desk an Integrated Multi-modal Gestural Workplace for Sonification,
GW03(369-379).
WWW Version. 0405
BibRef

Merola, G.[Giorgio],
The Effects of the Gesture Viewpoint on the Students' Memory of Words and Stories,
GW07(272-281).
Springer DOI Link 0705
BibRef

Merola, G.[Giorgio], Poggi, I.[Isabella],
Multimodality and Gestures in the Teacher's Communication,
GW03(101-111).
WWW Version. 0405
BibRef

Kranstedt, A.[Alfred], Kühnlein, P.[Peter], Wachsmuth, I.[Ipke],
Deixis in Multimodal Human Computer Interaction: An Interdisciplinary Approach,
GW03(112-123).
WWW Version. 0405
BibRef

Saeed, K.[Khalid], Kozlowski, M.[Marcin],
An Image-Based System for Spoken-Letter Recognition,
CAIP03(494-502).
WWW Version. 0311
BibRef

Ho, P.[Purdy], Armington, J.[John],
A Dual-Factor Authentication System Featuring Speaker Verification and Token Technology,
AVBPA03(128-136).
HTML Version. 0310
BibRef

Fox, N.A.[Niall A.], Reilly, R.B.[Richard B.],
Audio-Visual Speaker Identification Based on the Use of Dynamic Audio and Visual Features,
AVBPA03(743-751).
HTML Version. 0310
BibRef

Czyz, J.[Jacek], Bengio, S.[Samy], Marcel, C.[Christine], Vandendorpe, L.[Luc],
Scalability Analysis of Audio-Visual Person Identity Verification,
AVBPA03(752-760).
HTML Version. 0310
BibRef

Bengio, S.[Samy],
Multimodal Authentication Using Asynchronous HMMs,
AVBPA03(770-777).
HTML Version. 0310
BibRef

Lucey, S.[Simon], Chen, T.H.[Tsu-Han],
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy,
AVBPA03(929-936).
HTML Version. 0310
BibRef

Krahnstoever, N., Schapira, E., Kettebekov, S., Sharma, R.,
Multimodal human-computer interaction for crisis management systems,
WACV02(203-207).
IEEE Abstract. IEEE Top Reference. 0303
BibRef

Kettebekov, S., Yeasin, M., Sharma, R.,
Improving continuous gesture recognition with spoken prosody,
CVPR03(I: 565-570).
IEEE Abstract. IEEE Top Reference. 0307
BibRef

Higgins, J.E., Damper, R.I.,
An HMM-Based Subband Processing Approach to Speaker Identification,
AVBPA01(169).
HTML Version. 0310
BibRef

Poh, N.[Norman], Korczak, J.[Jerzy],
Hybrid Biometric Person Authentication Using Face and Voice Features,
AVBPA01(348).
HTML Version. 0310
BibRef

Nakamura, S.[Satoshi],
Fusion of Audio-Visual Information for Integrated Speech Processing,
AVBPA01(127).
HTML Version. 0310
BibRef

Sullivan, K.P.H.[Kirk P.H.], Pelecanos, J.[Jason],
Revisiting Carl Bildt's Impostor: Would a Speaker Verification System Foil Him?,
AVBPA01(144).
HTML Version. 0310
BibRef

Geiger, G.[Gadi], Ezzat, T.[Tony], Poggio, T.[Tomaso],
Perceptual Evaluation of Video-Realistic Speech,
MIT AI Memo AIM-2003-003, February 28, 2003.
WWW Version. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. 0306
BibRef

Blake, A., Gangnet, M., Perez, P., Vermaak, J.,
Integrated tracking with vision and sound,
CIAP01(354-357).
IEEE Top Reference. 0210
BibRef

Zhang, X.Z.[Xiao-Zheng], Mersereau, R.M., Clements, M.,
Bimodal fusion in audio-visual speech recognition,
ICIP02(I: 964-967).
IEEE Abstract. IEEE Top Reference. 0210
BibRef

Graf, H.P., Cosatto, E., Strom, V., Huang, F.J.[Fu Jie],
Visual prosody: facial movements accompanying speech,
AFGR02(381-386).
IEEE DOI Link 0206
BibRef

Qi, Y.[Yuan],
Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches,
UMD--TR4174, August 2000.
WWW Version. BibRef 0008

Nankaku, Y., Tokuda, K., Kitamura, T.,
Normalized Training for HMM-based Visual Speech Recognition,
ICIP00(Vol III: 234-237).
IEEE Abstract. IEEE Top Reference. 0008
BibRef

Zhang, Y.[You], Levinson, S.[Stephen], Huang, T.S.[Thomas S.],
Speaker Independent Audio-Visual Speech Recognition,
ICME00(TP8). 0007
BibRef

Pan, H.[Hao], Huang, T.S.[Thomas S.],
A New Approach to Integrate Audio and Visual Features of Speech,
ICME00(TP8). 0007
BibRef

Potamianos, G.[Gerasimos], Verma, A.[Ashish], Neti, C.[Chalapathy], Iyengar, G.[Giri], Basu, S.[Sankar],
A Cascade Image Transform for Speaker Independent Automatic Speech Reading,
ICME00(TP8). 0007
BibRef

Pan, H., Liang, Z.P., Huang, T.S.,
Fusing Audio and Visual Features of Speech,
ICIP00(Vol III: 214-217).
IEEE Abstract. IEEE Top Reference. 0008
BibRef

Faruquie, T.A., Majumdar, A., Rajput, N., Subramaniam, L.V.,
Large Vocabulary Audio-visual Speech Recognition Using Active Shape Models,
ICPR00(Vol III: 106-109).
IEEE DOI Link
HTML Version. 0009
BibRef

Yu, K., Jiang, X., Bunke, H.,
Combining Acoustic and Visual Classifiers for the Recognition of Spoken Sentences,
ICPR00(Vol II: 491-494).
IEEE DOI Link
HTML Version. 0009
BibRef

Nam, J., Alghoniemy, M., Tewfik, A.H.[Ahmed H.],
Audio-visual content-based violent scene characterization,
ICIP98(I: 353-357).
IEEE DOI Link 9810
BibRef

Luettin, J.[Juergen], Dupont, S.[Stéphane],
Continuous Audio-Visual Speech Recognition,
ECCV98(II: 657).
WWW Version. BibRef 9800

Yang, J.[Jie], Xiao, J.[Jing], Ritter, M.[Max],
Automatic Selection of Visemes for Image-based Visual Speech Synthesis,
ICME00(TP8). 0007
BibRef

Sharma, R.[Rajeev], Cai, J.Y.[Jiong-Yu], Chakravarthy, S.[Srivatsan], Poddar, I.[Indrajit], Sethi, Y.[Yogesh],
Exploiting Speech/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration,
AFGR00(422-427).
IEEE DOI Link 0003
BibRef

Yamamoto, E., Nakamura, S., Shikano, K.,
Lip Movement Synthesis from Speech Based on Hidden Markov Models,
AFGR98(154-159).
IEEE DOI Link BibRef 9800

Roy, D., Pentland, A.P.,
Automatic spoken affect classification and analysis,
AFGR96(363-367).
IEEE DOI Link 9610
BibRef

Petajan, E.D.[Eric D.],
An Architecture for Automatic Lipreading to Enhance Speech Recognition,
CVPR85(40-47). (AT&T Bell Labs) Application, Lipreading. A real hardware implementation of a system that tracks the nostrils and mouth; it improves on recognition from the acoustic data alone. BibRef 8500

Chapter on Face Recognition, Detection, Tracking, Gesture Recognition, Fingerprints, Biometrics continues in
Mouth Location, Lip Location, Detection.


Last update: Nov 16, 2009 at 19:35:14