In this thesis we present an approach for a visual communication application for a dark, theater-like interactive virtual simulation training environment. Our system visually estimates and tracks the body position, orientation, and limb configuration of the user. The system uses a near-IR camera array to capture images of the trainee from different angles in the dimly lit theater. Image features such as silhouettes and intermediate silhouette body axis points are then segmented and extracted from the image backgrounds. 3D body shape information, such as 3D body skeleton points and visual hulls, is reconstructed from these 2D features across multiple calibrated images. For body pose estimation, we propose a particle-filtering-based method that fits an articulated body model to the observed image features. Currently we focus on estimating the pose of the upper body. From the fitted articulated model we can derive information used by the HCI system, such as the position on the screen to which the user is pointing. We use current graphics hardware to accelerate processing so that the system can run in real time. The system serves as part of a multi-modal user-input device in the interactive simulation.
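
As an illustration of how a pointing position could be derived from a fitted articulated model, the following minimal sketch intersects an arm ray with a planar screen. It rests on assumptions not stated above: the pointing direction is taken as the ray from the shoulder joint through the hand joint, the screen is modeled as an infinite plane given by a point and a normal, and all names (`pointing_target`, `shoulder`, `hand`, `screen_point`, `screen_normal`) are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pointing_target(
    shoulder: Vec3,
    hand: Vec3,
    screen_point: Vec3,
    screen_normal: Vec3,
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Intersect the shoulder-to-hand ray with the screen plane.

    Assumption: the pointing direction is the shoulder-to-hand ray.
    Returns the 3D intersection point, or None if the arm is parallel
    to the screen or points away from it.
    """
    direction = _sub(hand, shoulder)
    denom = _dot(direction, screen_normal)
    if abs(denom) < eps:
        return None  # ray is (nearly) parallel to the screen plane
    # Ray parameter t such that shoulder + t * direction lies on the plane.
    t = _dot(_sub(screen_point, shoulder), screen_normal) / denom
    if t < 0:
        return None  # the plane is behind the user
    return (
        shoulder[0] + t * direction[0],
        shoulder[1] + t * direction[1],
        shoulder[2] + t * direction[2],
    )
```

In practice the 3D hit point would still need to be converted into 2D screen coordinates and smoothed over time, since pose estimates from a particle filter are noisy from frame to frame.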