TY - GEN
T1 - Low bit-rate video streaming for face-to-face teleconference
AU - Wen, Zhen
AU - Liu, Zicheng
AU - Cohen, Michael
AU - Li, Jin
AU - Zheng, Ke
AU - Huang, Thomas
PY - 2004
Y1 - 2004
N2 - Face-to-face video teleconferencing is very important for real-time communication. Current teleconferencing applications use standard video codecs, such as MPEG-1/2/4, to compress face video. These codecs either require high bandwidth to transmit high-quality video or produce blurred face video at low bit-rates. In this paper, we present a system for real-time coding of face video at low bit-rate. There are two main contributions. First, we improve the technique of long-term memory prediction by selecting frames into the database in an optimal way. A new frame is added to the database only when it is significantly different from the frames already in the database. In this way, the database can cover a wider range of images. Second, we incorporate prior knowledge about faces into the long-term memory prediction framework. The prior knowledge includes: (1) facial motions are repetitive, so most of them can be reconstructed from multiple reference frames; and (2) different components of the face and the background can tolerate different levels of error because they differ in perceptual importance. Experiments show that, at similar PSNR, the proposed system runs much faster and achieves better visual quality than the standard H.264/JVT codec.
UR - http://www.scopus.com/inward/record.url?scp=11244341196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=11244341196&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:11244341196
SN - 0780386035
SN - 9780780386037
T3 - 2004 IEEE International Conference on Multimedia and Expo (ICME)
SP - 1631
EP - 1634
BT - 2004 IEEE International Conference on Multimedia and Expo (ICME)
T2 - 2004 IEEE International Conference on Multimedia and Expo (ICME)
Y2 - 27 June 2004 through 30 June 2004
ER -