Needle in a haystack: Tracking down elite phishing domains in the wild

Ke Tian, Steve T.K. Jan, Hang Hu, Danfeng Yao, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Today's phishing websites are constantly evolving to deceive users and evade detection. In this paper, we perform a measurement study on squatting phishing domains, where the websites impersonate trusted entities not only at the page content level but also at the web domain level. To search for squatting phishing pages, we scanned five types of squatting domains over 224 million DNS records and identified 657K domains that are likely impersonating 702 popular brands. We then built a novel machine learning classifier to detect phishing pages among both the web and mobile pages hosted under the squatting domains. A key novelty is that our classifier is built on a careful measurement of the evasive behaviors of phishing pages in practice. We introduce new features from visual analysis and optical character recognition (OCR) to overcome the heavy content obfuscation used by attackers. In total, we discovered and verified 1,175 squatting phishing pages. We show that these phishing pages are used for various targeted scams and are highly effective at evading detection: more than 90% of them successfully evaded popular blacklists for at least a month.
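The squatting-domain scan described in the abstract can be illustrated with a minimal sketch. The abstract does not name the five squatting types, so the three categories used here (typo, wrong-TLD, and combo squatting), the function names, and the simple `label.tld` domain format are all assumptions for illustration, not the authors' method.

```python
def typo_variants(label: str) -> set[str]:
    """Single-edit typo candidates (assumed model): one character
    omitted, or two adjacent characters swapped."""
    variants = set()
    for i in range(len(label)):
        variants.add(label[:i] + label[i + 1:])  # omit character i
    for i in range(len(label) - 1):
        # swap characters i and i+1
        variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    variants.discard(label)
    variants.discard("")
    return variants


def find_squatting(brand_domain: str, observed: list[str]) -> list[tuple[str, str]]:
    """Label observed domains that look like squatting on brand_domain.

    Assumes every domain has the simple form 'label.tld'.
    """
    brand_label, _, brand_tld = brand_domain.partition(".")
    typos = typo_variants(brand_label)
    hits = []
    for domain in observed:
        domain = domain.lower()
        if domain == brand_domain:
            continue  # the legitimate domain itself
        label, _, tld = domain.partition(".")
        if label in typos:
            hits.append((domain, "typo"))
        elif label == brand_label and tld != brand_tld:
            hits.append((domain, "wrong-tld"))
        elif brand_label in label and "-" in label:
            hits.append((domain, "combo"))
    return hits


observed = ["paypal.com", "paypa.com", "paypal-login.com", "paypal.net", "example.org"]
matches = find_squatting("paypal.com", observed)
```

A scan at the scale the abstract reports (224 million DNS records, 702 brands) would need an indexed lookup rather than this linear pass; the sketch only shows the matching idea.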

Original language: English (US)
Title of host publication: IMC 2018 - Proceedings of the Internet Measurement Conference
Publisher: Association for Computing Machinery
Number of pages: 14
ISBN (Electronic): 9781450356190
State: Published - Oct 31 2018
Externally published: Yes
Event: 2018 Internet Measurement Conference, IMC 2018 - Boston, United States
Duration: Oct 31 2018 - Nov 2 2018

Publication series

Name: Proceedings of the ACM SIGCOMM Internet Measurement Conference, IMC


Other: 2018 Internet Measurement Conference, IMC 2018
Country/Territory: United States

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications

