In this paper, we give an overview of knowledge discovery (KD) and information extraction (IE) techniques for the World Wide Web (WWW). We aim to answer the following questions: What additional uncertainty challenges does the WWW setting introduce for basic KD and IE techniques? What fundamental techniques can reduce this uncertainty and achieve reasonable KD and IE performance on the WWW? What is the impact of each novel method? How can these techniques and information networks interact so that each benefits the other? How can the results be used in more interesting applications? What challenges remain, and how might they be addressed? We hope this overview provides a road map for advancing KD and IE on the WWW to a higher level of performance, portability, and utilization.
Keywords
- natural language processing
- text analysis
- text mining
ASJC Scopus subject areas
- Electrical and Electronic Engineering