TY - GEN
T1 - Segmentation of publication records of authors from the Web
AU - Zhang, Wei
AU - Yu, Clement
AU - Smalheiser, Neil
AU - Torvik, Vetle
PY - 2006
Y1 - 2006
N2 - Publication records are often found in the authors' personal home pages. If such a record is partitioned into a list of semantic fields of authors, title, date, etc., the unstructured texts can be converted into structured data, which can be used in other applications. In this paper, we present PEPURS, a publication record segmentation system. It adopts a novel "Split and Merge" strategy. A publication record is split into segments; multiple statistical classifiers compute their likelihoods of belonging to different fields; finally adjacent segments are merged if they belong to the same field. PEPURS introduces the punctuation marks and their neighboring texts as a new feature to distinguish different roles of the marks. PEPURS yields high accuracy scores in experiments.
AB - Publication records are often found in the authors' personal home pages. If such a record is partitioned into a list of semantic fields of authors, title, date, etc., the unstructured texts can be converted into structured data, which can be used in other applications. In this paper, we present PEPURS, a publication record segmentation system. It adopts a novel "Split and Merge" strategy. A publication record is split into segments; multiple statistical classifiers compute their likelihoods of belonging to different fields; finally adjacent segments are merged if they belong to the same field. PEPURS introduces the punctuation marks and their neighboring texts as a new feature to distinguish different roles of the marks. PEPURS yields high accuracy scores in experiments.
UR - http://www.scopus.com/inward/record.url?scp=33749625896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749625896&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2006.137
DO - 10.1109/ICDE.2006.137
M3 - Conference contribution
AN - SCOPUS:33749625896
SN - 0769525709
SN - 9780769525709
T3 - Proceedings - International Conference on Data Engineering
SP - 120
BT - Proceedings of the 22nd International Conference on Data Engineering, ICDE '06
T2 - 22nd International Conference on Data Engineering, ICDE '06
Y2 - 3 April 2006 through 7 April 2006
ER -