Abstract |
The recently coined term \'learning from label proportions\' refers to a new learning paradigm where training data is given by groups (also denoted as \'bags\'), and the only known information is the label proportion of each bag. The aim is then to construct a classification model to predict the class label of an individual instance, which differentiates this paradigm from the one of multi-instance learning. This learning setting presents very different applications in political science, marketing, healthcare and, in general, all fields in relation with anonymous data. In this paper, two new strategies are proposed to tackle this kind of problems. Both proposals are based on the optimisation of pattern class memberships using the data distribution in each bag and the known label proportions. To do so, linear discriminant analysis has been reformulated to work with non-crisp class memberships. The experimental part of this paper sets different objetives: 1) study the difference in performance, comparing our proposals and the fully supervised setting, 2) analyse the potential benefits of refining class memberships by the proposed approaches, and 3) test the influence of other factors in the performance, such as the number of classes or the bag size. The results of these experiments are promising, but further research should be encouraged for studying more complex data configurations. © 2016 IEEE. |