The usage and number of available video conferencing (VC) applications are rising as the high-bandwidth, low-latency networks on which they depend become increasingly prevalent. Because VC applications support real-time human interaction, performance problems that impair interactivity become social problems. Since VC applications are proprietary, their performance cannot currently be measured easily. Such measurements would nevertheless be valuable: they would let researchers evaluate the performance impact of architectural and design decisions, compare VC applications quantitatively, and determine service level agreement (SLA) compliance. In this paper, we present a tool called AvCloak that measures several key performance metrics of proprietary VC applications: mouth-to-ear latency and jitter, capture-to-display latency and jitter, and audio-visual synchronization skew. AvCloak takes these measurements by wrapping ("cloaking") the VC application's audio/video inputs and outputs and transmitting timestamp data through them. On the sender side, AvCloak synthesizes media data that encode timestamps and feeds them into the VC application's media inputs; on the receiver side, AvCloak decodes the timestamps from the VC application's media outputs. Because AvCloak interacts with the target VC application only through its media inputs and outputs, it treats the application as a black box and is therefore applicable to arbitrary VC applications. We provide extensive analyses of AvCloak's overhead and show how to improve its measurement accuracy, using two popular VC applications: Skype and Google+ Hangouts.
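To illustrate how the decoded timestamps translate into the reported metrics, the following is a minimal sketch, not AvCloak's actual implementation. It assumes the sender and receiver clocks are synchronized (for example, both ends run on one host), that each decoded timestamp is paired with the local time at which it was rendered, and that jitter is taken as the population standard deviation of latency. The function names (`latencies`, `jitter`, `av_skew`) are hypothetical.

```python
from statistics import mean, pstdev


def latencies(sent_ms, received_ms):
    """Per-sample latency: local render time minus the timestamp decoded from the media.

    Assumes synchronized sender/receiver clocks (hypothetical setup).
    """
    return [r - s for s, r in zip(sent_ms, received_ms)]


def jitter(lat_ms):
    """Jitter, defined here (an assumption) as the population std. dev. of latency."""
    return pstdev(lat_ms)


def av_skew(audio_lat_ms, video_lat_ms):
    """Mean audio-visual sync skew for matched timestamps.

    Positive values mean audio is rendered later than the corresponding video.
    """
    return mean(a - v for a, v in zip(audio_lat_ms, video_lat_ms))


# Usage example with made-up sample times in milliseconds.
audio = latencies([0, 100, 200], [150, 260, 350])   # mouth-to-ear: [150, 160, 150]
video = latencies([0, 100, 200], [120, 220, 330])   # capture-to-display: [120, 120, 130]
print(mean(audio), round(jitter(audio), 3), av_skew(audio, video))
```

Pairing audio and video latencies by a shared timestamp is what makes the skew computation meaningful: both media streams carry the same encoded capture time, so their latency difference isolates the synchronization error.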