TY - JOUR
T1 - Creating TikToks, Memes, Accessible Content, and Books from Engineering Videos? First Solve the Scene Detection Problem
AU - Angrave, Lawrence
AU - Li, Jiaxi
AU - Zhong, Ninghan
N1 - The research reported here was supported by a Microsoft Corporation gift to the University of Illinois as part of the 2019 and 2020 Lighthouse Accessibility Microsoft-Illinois partnership, a GIANT award (GIANT2021-03) from the IDEA institute [22], an award from the Center for Innovative Teaching and Learning, and the Institute of Education Sciences, U.S. Department of Education through Grant R305A180211 to the Board of Trustees of the University of Illinois. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education. We acknowledge the support of NCSA and the many former students who worked on the ClassTranscribe source code and early implementations of the scene detection and phrase hinting code, including Akhil Vyas and Vicky Cai.
PY - 2022/8/23
Y1 - 2022/8/23
N2 - To efficiently create books and other instructional content from videos and further improve the accessibility of our course content, we needed to solve the scene detection (SD) problem for engineering educational content. We present the pedagogical applications of extracting video images for digital book generation and other shareable resources, within the themes of accessibility, inclusive education, and universal design for learning, and describe how we solved this problem for engineering education lecture videos. Scene detection refers to the process of merging visually similar frames into a single video segment and the subsequent extraction of semantic features from that segment (e.g., title, words, transcription segment, and representative image). In our approach, local features were extracted from inter-frame similarity comparisons using multiple metrics. These include numerical measures based on optical character recognition (OCR) and pixel similarity with and without face and body position masking. We analyze and discuss the trade-offs in accuracy, performance, and computational resources required. By applying these features to a corpus of labeled videos, a support vector machine determined an optimal parametric decision surface to model whether adjacent frames were semantically and visually similar. The algorithm design, data flow, and system accuracy and performance are presented. We evaluated our system using videos from multiple engineering disciplines whose content comprised different presentation styles, including traditional paper handouts, Microsoft PowerPoint slides, and digital ink annotations. For each educational video, a comprehensive digital book composed of lecture clips, slideshow text, and audio transcription content can be generated based on our new scene detection algorithm. Our new scene detection approach was adopted by ClassTranscribe, an inclusive video platform that follows Universal Design for Learning principles.
We report on the subsequent experiences and feedback from students who reviewed the generated digital books as a learning component. We highlight remaining challenges and describe how instructors can use this technology in their own courses. The main contributions of this work are: identifying why automated scene detection of engineering lecture videos is challenging; creation of a scene-labeled corpus of videos representative of multiple undergraduate engineering disciplines and lecture styles suitable for training and testing; description of a set of image metrics and a support vector machine-based classification approach; evaluation of the accuracy, recall, and precision of our algorithm; use of an algorithmic optimization to obviate GPU resources; student commentary on the digital book interface created from videos using our SD algorithm; publishing of a labeled corpus of video content to encourage additional research in this area; and an independent open-source scene extraction tool that can be used pedagogically by the ASEE community, e.g., to remix and create fun, shareable instructional memes, and to create accessible audio and text descriptions for students who are blind or have low vision. Text extracted from each scene can also be used to improve the accuracy of captions and transcripts, improving accessibility for students who are hard of hearing or deaf.
AB - To efficiently create books and other instructional content from videos and further improve the accessibility of our course content, we needed to solve the scene detection (SD) problem for engineering educational content. We present the pedagogical applications of extracting video images for digital book generation and other shareable resources, within the themes of accessibility, inclusive education, and universal design for learning, and describe how we solved this problem for engineering education lecture videos. Scene detection refers to the process of merging visually similar frames into a single video segment and the subsequent extraction of semantic features from that segment (e.g., title, words, transcription segment, and representative image). In our approach, local features were extracted from inter-frame similarity comparisons using multiple metrics. These include numerical measures based on optical character recognition (OCR) and pixel similarity with and without face and body position masking. We analyze and discuss the trade-offs in accuracy, performance, and computational resources required. By applying these features to a corpus of labeled videos, a support vector machine determined an optimal parametric decision surface to model whether adjacent frames were semantically and visually similar. The algorithm design, data flow, and system accuracy and performance are presented. We evaluated our system using videos from multiple engineering disciplines whose content comprised different presentation styles, including traditional paper handouts, Microsoft PowerPoint slides, and digital ink annotations. For each educational video, a comprehensive digital book composed of lecture clips, slideshow text, and audio transcription content can be generated based on our new scene detection algorithm. Our new scene detection approach was adopted by ClassTranscribe, an inclusive video platform that follows Universal Design for Learning principles.
We report on the subsequent experiences and feedback from students who reviewed the generated digital books as a learning component. We highlight remaining challenges and describe how instructors can use this technology in their own courses. The main contributions of this work are: identifying why automated scene detection of engineering lecture videos is challenging; creation of a scene-labeled corpus of videos representative of multiple undergraduate engineering disciplines and lecture styles suitable for training and testing; description of a set of image metrics and a support vector machine-based classification approach; evaluation of the accuracy, recall, and precision of our algorithm; use of an algorithmic optimization to obviate GPU resources; student commentary on the digital book interface created from videos using our SD algorithm; publishing of a labeled corpus of video content to encourage additional research in this area; and an independent open-source scene extraction tool that can be used pedagogically by the ASEE community, e.g., to remix and create fun, shareable instructional memes, and to create accessible audio and text descriptions for students who are blind or have low vision. Text extracted from each scene can also be used to improve the accuracy of captions and transcripts, improving accessibility for students who are hard of hearing or deaf.
UR - https://www.scopus.com/pages/publications/85138317256
UR - https://www.scopus.com/pages/publications/85138317256#tab=citedBy
M3 - Conference article
AN - SCOPUS:85138317256
SN - 2153-5965
JO - ASEE Annual Conference and Exposition, Conference Proceedings
JF - ASEE Annual Conference and Exposition, Conference Proceedings
T2 - 129th ASEE Annual Conference and Exposition: Excellence Through Diversity, ASEE 2022
Y2 - 26 June 2022 through 29 June 2022
ER -