
Double-bootstrapping source data selection for instance-based transfer learning

Published: 01 August 2013

Abstract

Instance-based transfer is an important paradigm for transfer learning, in which data from related tasks (source data) are combined with the data for the current learning task (target data) to train a learner for the current (target) task. In most application scenarios, however, the benefit of the source data is unclear: the source may contain instances that help the target learning as well as instances that harm it, and simply combining the source with the target data may degrade performance (negative transfer). Selecting the instances from the source data that will benefit the target task is therefore a key step in instance-based transfer learning. Most existing instance-based transfer methods either lack such selection or mix source selection with the training for the target task, so the training may use source data harmful to the target. We propose a simple yet effective method for instance-based transfer learning in settings where the usefulness of the source is unclear. The method employs a double-selection process, based on bootstrapping, to reduce the impact of irrelevant or harmful instances in the source. Experimental results show that in most cases our method produces greater improvement through transfer than TrBagg (Kamishima et al., 2009) and TrAdaBoost (Dai et al., 2009), and it can also handle a wider range of transfer learning scenarios.
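The general idea of bootstrap-based source selection can be illustrated with a minimal sketch. This is not the paper's exact double-bootstrapping algorithm; all names here (knn_predict, select_source, the toy data) are hypothetical. Each source instance is scored by the accuracy, on the target data, of a simple classifier trained on bootstrap samples that happen to include it, and only instances whose average score reaches the overall average are kept.

```python
import random

def knn_predict(train, x, k=3):
    """Majority vote among the k nearest (1-D feature, label) pairs."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def accuracy(train, test):
    """Fraction of test pairs the k-NN rule classifies correctly."""
    return sum(knn_predict(train, x) == y for x, y in test) / len(test)

def select_source(source, target, rounds=50, seed=0):
    """Keep source instances whose bootstrap samples score at least the
    average accuracy on the target, a crude proxy for 'helpful'."""
    rng = random.Random(seed)
    score = [0.0] * len(source)
    count = [0] * len(source)
    total = 0.0
    for _ in range(rounds):
        # Bootstrap: draw len(source) source indices with replacement.
        idx = [rng.randrange(len(source)) for _ in range(len(source))]
        boot = [source[i] for i in idx] + list(target)
        acc = accuracy(boot, target)  # score the combined sample on the target
        total += acc
        for i in set(idx):
            score[i] += acc
            count[i] += 1
    mean_acc = total / rounds
    return [s for i, s in enumerate(source)
            if count[i] and score[i] / count[i] >= mean_acc]

# Toy data: the target task separates 'a' (near 0) from 'b' (near 10);
# helpful source instances agree with that labeling, harmful ones are flipped.
target  = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
helpful = [(0.5, "a"), (1.5, "a"), (8.5, "b"), (9.5, "b")]
harmful = [(0.2, "b"), (0.8, "b"), (9.2, "a"), (9.8, "a")]

kept = select_source(helpful + harmful, target)
```

Evaluating on target data that is also part of the training sample is a deliberate simplification in this sketch; a faithful implementation would hold out target folds, and the paper's method applies a second, separate selection round rather than this single scoring pass.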

References

[1]
Kamishima, T., Hamasaki, M., Akaho, S., 2009. TrBagg: A simple transfer learning method and its application to personalization in collaborative tagging. In: Proc. of IEEE International Conference on Data Mining, pp. 219-228.
[2]
Dai, W., Yang, Q., Xue, G.-R., Yu, Y., 2009. Boosting for transfer learning. In: Proc. of International Conference on Machine Learning, pp. 193-200.
[3]
Thrun, S., 1995. Is learning the nth thing any easier than learning the first? In: Proc. of Ann. Conf. Neural Information Processing Systems, pp. 640-646.
[4]
Caruana, R., 1997. Multitask learning. Machine Learning 28 (1), 41-75.
[5]
Jiang, J., Zhai, C., 2007. Instance weighting for domain adaptation in NLP. In: Proc. of Ann. Conf. for the Assoc. Computational Linguistics, pp. 264-271.
[6]
Huang, J., Smola, A., Gretton, A., Borgwardt, K., 2007. Correcting sample selection bias by unlabeled data. In: Proc. Ann. Conf. Neural Information Processing Systems, pp. 601-608.
[7]
Freund, Y., Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1), 119-139.
[8]
Pan, S.J., Yang, Q., 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), 1345-1359.
[9]
Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y., 2007. Self-taught learning: transfer learning from unlabeled data. In: Proc. International Conference on Machine Learning, pp. 759-766.
[10]
Ando, R., Zhang, T., 2005. A high-performance semi-supervised learning method for text chunking. In: Proc. Ann. Meeting of the Assoc. for Computational Linguistics, pp. 1-9.
[11]
Blitzer, J., McDonald, R., Pereira, F., 2006. Domain adaptation with structural correspondence learning. In: Proc. Conf. Empirical Methods in Natural Language Processing, pp. 120-128.
[12]
Lawrence, N., Platt, J., 2004. Learning to learn with the informative vector machine. In: Proc. International Conference on Machine Learning, p. 65.
[13]
Bonilla, E., Chai, K., Williams, C., 2008. Multi-task Gaussian process prediction. In: Proc. Ann. Conf. Neural Information Processing Systems, pp. 145-154.
[14]
Schwaighofer, A., Tresp, V., Yu, K., 2005. Learning Gaussian process kernels via hierarchical Bayes. In: Proc. Ann. Conf. Neural Information Processing Systems, pp. 1209-1216.
[15]
Evgeniou, T., Pontil, M., 2004. Regularized multi-task learning. In: Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 109-117.
[16]
Mihalkova, L., Huynh, T., Mooney, R., 2007. Mapping and revising Markov logic networks for transfer learning. In: Proc. Assoc. for the Advancement of Artificial Intelligence (AAAI) Conf., pp. 608-614.
[17]
Mihalkova, L., Mooney, R., 2008. Transfer learning by mapping with minimal target data. In: Proc. Assoc. for the Advancement of Artificial Intelligence Workshop on Transfer Learning for Complex Tasks, pp. 1163-1168.
[18]
UCI Machine Learning Repository. <http://archive.ics.uci.edu/ml/>
[19]
WEKA. <http://www.cs.waikato.ac.nz/ml/weka/>

    Published In

Pattern Recognition Letters, Volume 34, Issue 11 (August 2013), 109 pages

    Publisher

    Elsevier Science Inc.

    United States


    Author Tags

    1. Bagging
    2. Instance-based transfer learning
    3. Source data selection
    4. Transfer learning

    Qualifiers

    • Article
