How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model

Empirical Inference Conference Paper 2003

Hyunjung (Helen) Shin

Abstract: Various machine learning methods have rapidly been adopted for response modeling in search of improved performance, and the support vector machine (SVM) has been attracting much attention lately. This paper presents an SVM response model. We focus on how to circumvent practical obstacles: how to handle the class imbalance problem, how to produce scores from an SVM classifier for lift chart analysis, and how to evaluate the models on accuracy and profit. In addition, to cope with the intractability of SVM training on large marketing datasets, a previously proposed pattern selection algorithm is introduced. The time complexity of SVM training is cubic in the training set size. The pattern selection algorithm selects the important training patterns before SVM response modeling. We compared the SVM training results obtained with the pattern selection algorithm and with random sampling. Three aspects of the SVM response models were evaluated: accuracy, lift chart analysis, and computational efficiency. The SVM trained with the selected patterns showed high accuracy, a high uplift in profit and in response rate, and high computational efficiency.
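The abstract mentions producing scores from an SVM classifier for lift chart analysis but does not say how the scores are obtained. The following is a minimal sketch, not the authors' method: it assumes the signed decision value of a linear SVM is used as the score, ranks customers by it, and reports cumulative lift per bin. The function names `decision_score` and `cumulative_lift`, the bin count, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def decision_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed decision value w.x + b of a linear SVM (assumed scoring rule).

    A larger value is read here as "more likely to respond".
    """
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def cumulative_lift(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """Return (fraction of customers targeted, cumulative lift) after each bin.

    Lift is the response rate among the top-scored customers divided by the
    overall response rate. Labels are 1 for respondents and 0 otherwise.
    """
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    total_respondents = sum(labels)
    if total_respondents == 0:
        raise ValueError("lift is undefined when there are no respondents")
    base_rate = total_respondents / n

    # Rank labels by descending score; ties keep their original order.
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]

    table = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)
        if cutoff == 0:
            continue  # bin too small to contain any customer
        hits = sum(ranked[:cutoff])
        table.append((cutoff / n, (hits / cutoff) / base_rate))
    return table


if __name__ == "__main__":
    # Toy data: 10 customers, 2 respondents, both ranked at the top.
    toy_scores = [0.9, 0.8, 0.1, -0.2, -0.5, -0.7, -0.9, -1.0, -1.2, -1.5]
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    for fraction, lift in cumulative_lift(toy_scores, toy_labels, n_bins=5):
        print(f"top {fraction:.0%}: lift {lift:.2f}")
```

A lift above 1 in the top bins means that targeting the highest-scored customers reaches respondents at a higher rate than contacting customers at random; the lift always returns to 1 once everyone is included.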

Author(s):	Shin, H. and Cho, S.
Links:	PDF
Journal:	Proc. of the Korean Data Mining Conference
Pages:	93-107
Year:	2003
Month:	December
Day:	0

Bibtex Type:	Conference Paper (inproceedings)

Event Name:	Korean Data Mining Conference
Event Place:	Seoul, Korea

Digital:	0
Electronic Archiving:	grant_archive
Institution:	Seoul National University, Seoul, Korea
Note:	Best Paper Award
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTex

@inproceedings{2709,
  title = {How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response
  Model},
  booktitle = {Proc. of the Korean Data Mining Conference},
  abstract = {Various machine learning methods have rapidly been adopted for response modeling in search of improved
  performance, and the support vector machine (SVM) has been attracting much attention lately. This paper presents an
  SVM response model. We focus on how to circumvent practical obstacles: how to handle the class imbalance problem,
  how to produce scores from an SVM classifier for lift chart analysis, and how to evaluate the models on accuracy
  and profit. In addition, to cope with the intractability of SVM training on large marketing datasets, a previously
  proposed pattern selection algorithm is introduced. The time complexity of SVM training is cubic in the training
  set size. The pattern selection algorithm selects the important training patterns before SVM response modeling.
  We compared the SVM training results obtained with the pattern selection algorithm and with random sampling.
  Three aspects of the SVM response models were evaluated: accuracy, lift chart analysis, and computational
  efficiency. The SVM trained with the selected patterns showed high accuracy, a high uplift in profit and in
  response rate, and high computational efficiency.},
  pages = {93-107},
  organization = {Max-Planck-Gesellschaft},
  institution = {Seoul National University, Seoul, Korea},
  school = {Biologische Kybernetik},
  month = dec,
  year = {2003},
  note = {Best Paper Award},
  slug = {2709},
  author = {Shin, H. and Cho, S.},
  month_numeric = {12}
}