TY - GEN

T1 - Variable selection for ad prediction

AU - Bhat, Suma Pallathadka

AU - Church, Kenneth

PY - 2008/12/1

Y1 - 2008/12/1

N2 - We consider the problem of predicting the probability that an advertisement will be clicked when the click/no-click outcome is described by a large number of variables. Many, if not most, of these variables are only weakly related to whether the ad is clicked. A traditional approach that treats every variable on an equal and blind footing therefore sacrifices interpretability in explaining the process underlying the outcome; such an approach is also computationally expensive and may generalize poorly. We investigate the forward selection method for variable subset selection in the domain of advertisement click-through-rate (CTR) prediction. Forward selection proceeds sequentially, rewarding a set of variables by how much information it provides about the outcome while penalizing the set by the number of variables it contains. Concretely, we propose a logistic regression model for estimating the conditional expectation of the outcome given the ensemble of variables. The resulting model compares favorably with one obtained by an exhaustive search through the model space. We also observe that the set of variables chosen by forward selection has better predictive power than a set chosen by the variables' individual statistical significance. Thus we show that forward selection for variable subset selection produces a good model for predicting ad click-through rates.

AB - We consider the problem of predicting the probability that an advertisement will be clicked when the click/no-click outcome is described by a large number of variables. Many, if not most, of these variables are only weakly related to whether the ad is clicked. A traditional approach that treats every variable on an equal and blind footing therefore sacrifices interpretability in explaining the process underlying the outcome; such an approach is also computationally expensive and may generalize poorly. We investigate the forward selection method for variable subset selection in the domain of advertisement click-through-rate (CTR) prediction. Forward selection proceeds sequentially, rewarding a set of variables by how much information it provides about the outcome while penalizing the set by the number of variables it contains. Concretely, we propose a logistic regression model for estimating the conditional expectation of the outcome given the ensemble of variables. The resulting model compares favorably with one obtained by an exhaustive search through the model space. We also observe that the set of variables chosen by forward selection has better predictive power than a set chosen by the variables' individual statistical significance. Thus we show that forward selection for variable subset selection produces a good model for predicting ad click-through rates.

KW - Click-through-rate

KW - Model selection

KW - Variable selection

KW - Web advertising

UR - http://www.scopus.com/inward/record.url?scp=70349152916&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349152916&partnerID=8YFLogxK

U2 - 10.1145/1517472.1517478

DO - 10.1145/1517472.1517478

M3 - Conference contribution

AN - SCOPUS:70349152916

SN - 9781605582771

T3 - Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD'08

SP - 45

EP - 49

BT - Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD'08

T2 - 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, ADKDD'08

Y2 - 24 August 2008 through 24 August 2008

ER -