TY - GEN
T1 - Needle in a haystack
T2 - 2018 Internet Measurement Conference, IMC 2018
AU - Tian, Ke
AU - Jan, Steve T.K.
AU - Hu, Hang
AU - Yao, Danfeng
AU - Wang, Gang
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/10/31
Y1 - 2018/10/31
N2 - Today's phishing websites are constantly evolving to deceive users and evade detection. In this paper, we perform a measurement study on squatting phishing domains, where the websites impersonate trusted entities not only at the page content level but also at the web domain level. To search for squatting phishing pages, we scanned five types of squatting domains over 224 million DNS records and identified 657K domains that are likely impersonating 702 popular brands. We then built a novel machine learning classifier to detect phishing pages among both the web and mobile pages under the squatting domains. A key novelty is that our classifier is built on a careful measurement of the evasive behaviors of phishing pages in practice. We introduce new features from visual analysis and optical character recognition (OCR) to overcome heavy content obfuscation by attackers. In total, we discovered and verified 1,175 squatting phishing pages. We show that these phishing pages are used for various targeted scams and are highly effective at evading detection. More than 90% of them successfully evaded popular blacklists for at least a month.
UR - http://www.scopus.com/inward/record.url?scp=85058190822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058190822&partnerID=8YFLogxK
U2 - 10.1145/3278532.3278569
DO - 10.1145/3278532.3278569
M3 - Conference contribution
AN - SCOPUS:85058190822
T3 - Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC
SP - 429
EP - 442
BT - IMC 2018 - Proceedings of the Internet Measurement Conference
PB - Association for Computing Machinery
Y2 - 31 October 2018 through 2 November 2018
ER -