TY - GEN
T1 - Free for All! Assessing User Data Exposure to Advertising Libraries on Android
AU - Demetriou, Soteris
AU - Merrill, Whitney
AU - Yang, Wei
AU - Zhang, Aston
AU - Gunter, Carl A.
N1 - This work was supported in part by HHS 90TR0003-01, NSF CNS 13-30491, NSF CNS 12-23967 and NSF CNS 15-13939. The views expressed are those of the authors only. The authors are grateful to Hari Sundaram for his comments on advertising, Andrew Rice for sharing the Device Analyzer [49] dataset, and the NDSS shepherd, Venkat Venkatakrishnan, for his valuable assistance in improving the final version of this paper.
PY - 2016
Y1 - 2016
N2 - Many studies have focused on detecting and measuring the security and privacy risks associated with the integration of advertising libraries in mobile apps. These studies consistently demonstrate the abuses of existing ad libraries. However, to fully assess the risks of an app that uses an advertising library, we need to take into account not only the current behaviors but all of the allowed behaviors that could result in the compromise of user data confidentiality. Ad libraries on Android have the potential for greater data collection through at least four major channels: using unprotected APIs to learn other apps’ information on the phone (e.g., app names); using protected APIs via permissions inherited from the host app to access sensitive information (e.g., Google and Facebook account information, geolocation); gaining access to files that the host app stores in its own protection domain; and observing user inputs into the host app. In this work, we systematically explore the potential reach of advertising libraries through these channels. We design a framework called Pluto that can be leveraged to analyze an app and discover whether it exposes targeted user data (such as contact information, interests, demographics, medical conditions, and so on) to an opportunistic ad library. We present a prototype implementation of Pluto that embodies novel strategies for using natural language processing to illustrate what targeted data an ad network can potentially learn from files and user inputs. Pluto also leverages machine learning and data mining models to reveal what advertising networks can learn from the list of installed apps. We validate Pluto with a collection of apps for which we have determined ground truth about the targeted data they may reveal, together with a data set derived from a survey we conducted that gives ground truth for targeted data and corresponding lists of installed apps for about 300 users. We use these to show that Pluto, and hence also opportunistic ad networks, can achieve 75% recall and 80% precision for selected targeted data coming from app files and inputs, and even better results for certain targeted data based on the list of installed apps. Pluto is the first tool that estimates the risk associated with integrating advertising in apps based on the four available channels and arbitrary sets of targeted data.
AB - Many studies have focused on detecting and measuring the security and privacy risks associated with the integration of advertising libraries in mobile apps. These studies consistently demonstrate the abuses of existing ad libraries. However, to fully assess the risks of an app that uses an advertising library, we need to take into account not only the current behaviors but all of the allowed behaviors that could result in the compromise of user data confidentiality. Ad libraries on Android have the potential for greater data collection through at least four major channels: using unprotected APIs to learn other apps’ information on the phone (e.g., app names); using protected APIs via permissions inherited from the host app to access sensitive information (e.g., Google and Facebook account information, geolocation); gaining access to files that the host app stores in its own protection domain; and observing user inputs into the host app. In this work, we systematically explore the potential reach of advertising libraries through these channels. We design a framework called Pluto that can be leveraged to analyze an app and discover whether it exposes targeted user data (such as contact information, interests, demographics, medical conditions, and so on) to an opportunistic ad library. We present a prototype implementation of Pluto that embodies novel strategies for using natural language processing to illustrate what targeted data an ad network can potentially learn from files and user inputs. Pluto also leverages machine learning and data mining models to reveal what advertising networks can learn from the list of installed apps. We validate Pluto with a collection of apps for which we have determined ground truth about the targeted data they may reveal, together with a data set derived from a survey we conducted that gives ground truth for targeted data and corresponding lists of installed apps for about 300 users. We use these to show that Pluto, and hence also opportunistic ad networks, can achieve 75% recall and 80% precision for selected targeted data coming from app files and inputs, and even better results for certain targeted data based on the list of installed apps. Pluto is the first tool that estimates the risk associated with integrating advertising in apps based on the four available channels and arbitrary sets of targeted data.
UR - http://www.scopus.com/inward/record.url?scp=85014812568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014812568&partnerID=8YFLogxK
U2 - 10.14722/ndss.2016.23082
DO - 10.14722/ndss.2016.23082
M3 - Conference contribution
AN - SCOPUS:85014812568
T3 - 23rd Annual Network and Distributed System Security Symposium, NDSS 2016
BT - 23rd Annual Network and Distributed System Security Symposium, NDSS 2016
PB - The Internet Society
T2 - 23rd Annual Network and Distributed System Security Symposium, NDSS 2016
Y2 - 21 February 2016 through 24 February 2016
ER -