Below, I present my publications and other ongoing research work, together with related information and links where they can be consulted.
Abstract: In this paper, we consider the problem of making skeptical inferences for the multi-label ranking problem. We assume that our uncertainty is described by a convex set of probabilities (i.e., a credal set) defined over the set of labels. Instead of learning a single prediction (i.e., a complete ranking over the labels), we thus seek skeptical inferences in the form of set-valued predictions consisting of complete rankings.
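To make the decision criterion concrete, here is one standard way to define skeptical, set-valued predictions from a credal set P and a loss ℓ (a sketch of the general idea, not necessarily the exact criterion used in the paper): a ranking is kept only if no other ranking has strictly lower expected loss under every distribution in the credal set.

```latex
% Maximality-style skeptical prediction from a credal set \mathcal{P} (illustrative sketch)
y' \succ y \iff \mathbb{E}_{P}\!\left[\ell(y, Y)\right] > \mathbb{E}_{P}\!\left[\ell(y', Y)\right]
\quad \text{for all } P \in \mathcal{P},
\qquad
\widehat{\mathcal{Y}} \;=\; \{\, y \;:\; \nexists\, y' \text{ such that } y' \succ y \,\}.
```

When the credal set reduces to a single distribution, this prediction set reduces to the usual single loss-minimising ranking.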
Abstract: Gaussian discriminant analysis is a popular classification model that, in its precise form, can produce unreliable predictions under high uncertainty (e.g., due to scarce or noisy data). While imprecise probability theory offers a nice theoretical framework to address such issues, it has not yet been applied to Gaussian discriminant analysis. This work remedies this by proposing a new Gaussian discriminant analysis based on robust Bayesian analysis and near-ignorance priors. The model delivers cautious predictions, in the form of set-valued classes, in case of limited or imperfect available information. We present and discuss experimental results on real and synthetic datasets, where for the latter we corrupt the test instances to see how our approach reacts to non-i.i.d. samples. Experiments show that including an imprecise component in the Gaussian discriminant analysis produces reasonably cautious predictions, and that set-valued predictions correspond to instances on which the precise model performs poorly.
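As an illustration of how such set-valued predictions can be produced (a minimal sketch only; the function name and the interval-dominance rule are my assumptions, not the paper's exact procedure), one can compute lower and upper class posteriors over the set of priors and keep every class that is not dominated:

```python
import numpy as np

def set_valued_prediction(lower_post, upper_post):
    """Return the set of non-dominated classes given lower/upper posteriors.

    lower_post, upper_post: arrays of shape (n_classes,), one interval per class,
    e.g. obtained by minimising/maximising the posterior over a set of priors.
    A class k is discarded if some other class j dominates it, i.e. the lower
    posterior of j exceeds the upper posterior of k (interval dominance).
    """
    lower_post = np.asarray(lower_post, dtype=float)
    upper_post = np.asarray(upper_post, dtype=float)
    keep = []
    for k in range(len(lower_post)):
        dominated = any(lower_post[j] > upper_post[k]
                        for j in range(len(lower_post)) if j != k)
        if not dominated:
            keep.append(k)
    return keep

# Example: class 1 is clearly best, class 2 cannot be excluded -> cautious prediction [1, 2]
print(set_valued_prediction(lower_post=[0.05, 0.40, 0.20],
                            upper_post=[0.15, 0.70, 0.45]))
```

With precise posteriors (lower equal to upper and no ties), this rule reduces to predicting the single most probable class.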
Abstract: In this paper, we consider the problem of making distributionally robust, skeptical inferences for the multi-label problem, or more generally for Boolean vectors. By distributionally robust, we mean that we consider a set of possible probability distributions, and by skeptical we understand that we consider as valid only those inferences that are true for every distribution within this set. Such inferences will provide partial predictions whenever the considered set is sufficiently big. We study in particular the Hamming loss case, a common loss function in multi-label problems, showing how skeptical inferences can be made in this setting. Our experimental results are organised in three parts: (1) the first illustrates, on synthetic data sets, the computational gains obtained from our theoretical results; (2) the second shows that our approaches produce relevant cautiousness on those hard-to-predict instances where their precise counterpart fails; and (3) the last demonstrates experimentally that our approach copes with imperfect information (generated by a downsampling procedure) better than the partial abstention [31] and rejection rules.
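For the Hamming loss, skeptical inference is often described label-wise; the rule below is an illustrative sketch of that idea, written with the lower and upper probabilities of a label being relevant (the paper's exact characterisation may differ):

```latex
% Label-wise skeptical prediction under Hamming loss (illustrative)
\hat{y}_i \;=\;
\begin{cases}
  1 & \text{if } \underline{P}(Y_i = 1) > \tfrac{1}{2},\\[2pt]
  0 & \text{if } \overline{P}(Y_i = 1) < \tfrac{1}{2},\\[2pt]
  \ast & \text{otherwise (abstain on label } i\text{)}.
\end{cases}
```

The prediction is thus a partial binary vector: labels on which the whole credal set agrees are decided, and the others are left open.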
Abstract: We present two different strategies to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) to describe our uncertainty, rather than a single precise distribution. The main reasons for using such estimates are (1) to make cautious predictions (or no decision at all) when high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding biases caused by early decisions in the chaining. We adapt both strategies to the case of the naive credal classifier, showing that these adaptations are computationally efficient. Our experimental results on missing labels, which investigate how reliable these predictions are in both approaches, indicate that our approaches produce relevant cautiousness on those hard-to-predict instances where the precise models fail.
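To give an idea of how a chain can propagate imprecision, here is a simplified sketch under my own assumptions: it branches on every abstention, which is only one of several possible strategies and not necessarily the one adopted in the paper, and its `predict` interface is hypothetical.

```python
def imprecise_chain_predict(credal_classifiers, x):
    """Chain prediction that propagates imprecision by branching on abstentions.

    credal_classifiers: list of per-label models; each model.predict(x, prev_labels)
    is assumed to return 0, 1, or None (abstention). This interface is hypothetical.
    Returns, for every label, the set of values reachable along some branch.
    """
    branches = [[]]  # partial label vectors explored so far
    for model in credal_classifiers:
        new_branches = []
        for prev in branches:
            pred = model.predict(x, prev)
            candidates = [0, 1] if pred is None else [pred]
            for value in candidates:
                new_branches.append(prev + [value])
        branches = new_branches
    n_labels = len(credal_classifiers)
    return [{b[i] for b in branches} for i in range(n_labels)]
```

Naive branching of this kind grows exponentially with the number of abstentions, which is why dedicated, computationally efficient adaptations such as the one proposed for the naive credal classifier matter.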
Abstract: In this paper, we consider the problem of making distributionally robust, skeptical inferences for the multi-label problem, or more generally for Boolean vectors. By distributionally robust, we mean that we consider a set of probability distributions, and by skeptical we understand that we consider as valid only those inferences that are true for every distribution within this set. Such inferences will provide partial predictions whenever the considered set is sufficiently big. We study in particular the Hamming loss case, a common loss function in multi-label problems, showing how skeptical inferences can be made in this setting. We also perform experiments illustrating the practical interest of our results.
Abstract: Ranking problems are difficult to solve due to their combinatorial nature. One way to address this issue is to adopt a decomposition scheme, splitting the initial difficult problem into many simpler ones. The predictions obtained from these simplified settings must then be combined into one single output, possibly resolving inconsistencies between the outputs. In this paper, we consider such an approach for the label ranking problem, where in addition we allow the predictive model to produce cautious inferences in the form of sets of rankings when it lacks information to produce reliable, precise predictions. More specifically, we propose to combine a rank-wise decomposition, in which every sub-problem becomes an ordinal classification one, with a constraint satisfaction problem (CSP) approach to verify the consistency of the predictions. Our experimental results indicate that our approach produces predictions with appropriately balanced reliability and precision, while remaining competitive with classical, precise approaches.
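The consistency check behind such a rank-wise decomposition can be phrased as a small constraint satisfaction problem: each label receives a set of admissible ranks, and the predictions are consistent if every label can be assigned a distinct rank from its set. The sketch below is my own minimal backtracking formulation, not the paper's solver:

```python
def consistent_assignment(rank_sets):
    """Check whether each label can take a distinct rank from its admissible set.

    rank_sets: list of sets; rank_sets[i] contains the ranks allowed for label i
    (e.g. produced by a cautious ordinal classifier). Returns one consistent
    assignment (one rank per label) if it exists, else None. Simple backtracking CSP.
    """
    n = len(rank_sets)

    def backtrack(i, used, assignment):
        if i == n:
            return assignment
        for rank in sorted(rank_sets[i] - used):
            result = backtrack(i + 1, used | {rank}, assignment + [rank])
            if result is not None:
                return result
        return None

    return backtrack(0, set(), [])

# Example with 3 labels: label 0 may be ranked 1st or 2nd, label 1 only 1st, label 2 only 3rd.
print(consistent_assignment([{1, 2}, {1}, {3}]))  # -> [2, 1, 3]
```

If no consistent assignment exists, the per-rank predictions contradict each other and must be reconciled before producing a ranking.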
Abstract: Gaussian discriminant analysis is a popular classification model that, in its precise form, can produce unreliable predictions under high uncertainty. While imprecise probability theory offers a nice theoretical framework to solve this issue, it has not yet been applied to Gaussian discriminant analysis. This work remedies this, by proposing a new Gaussian discriminant analysis based on robust Bayesian analysis and near-ignorance priors. The model delivers cautious predictions, in the form of set-valued classes, in case of limited or imperfect available information. Experiments show that including an imprecise component in the Gaussian discriminant analysis produces reasonably cautious predictions, in the sense that the number of set-valued predictions is not too high, and that those predictions correspond to hard-to-classify instances, that is, instances for which the accuracy of the precise classifier drops.
Abstract: Learning to predict label rankings is a difficult problem because of its combinatorial nature. One way around this is to split the initial problem into several simpler sub-problems. The predictions obtained from these simplified sub-problems must then be combined into a single output, resolving any inconsistencies between them. In this work, we adopt such an approach while allowing the sub-problems to produce cautious inferences, in the form of sets of ranks, when the uncertainty attached to the data would make precise predictions unreliable. More specifically, we propose to combine a rank-wise decomposition, in which every sub-problem becomes a cautious ordinal regression one, with a constraint satisfaction problem (CSP) approach to check the consistency of the predictions. Our experimental results indicate that our approach produces predictions with balanced reliability and precision, while remaining competitive with classical approaches.
Abstract: The goal of this paper is to propose a new cautious classification approach based on robust Bayesian inference and linear discriminant analysis. The model is designed to take into account, in its posterior inferences, the lack of information in the data. The principle of the approach is to use a set of prior distributions to model the initial ignorance, rather than a single (often so-called "non-informative") prior that can strongly influence the results when little data is available. Initial experiments show that adding imprecision makes it possible to be cautious in case of doubt without degrading the quality of the model, while keeping the computation time reasonable.
Abstract: In this paper, we present two different ways to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) to describe our uncertainty, rather than a single precise distribution. The main reasons one could have for using such estimates are (1) to make cautious predictions (or no decision at all) when high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding biases caused by early decisions in the chaining. We perform experiments on missing and noisy labels to investigate how accurate and how precise these predictions are in both approaches. Our experimental results indicate that while our approach produces relevant cautiousness (i.e., forgoes predictions likely to be erroneous), results regarding possible bias correction using a minimax approach are less encouraging, except when high adversarial noise affects the labels, in which case our approach outperforms its precise counterpart.
Abstract: Ranking problems are usually difficult to solve, due to their combinatorial nature. One way to circumvent this issue is to adopt a decomposition scheme, in which the initial difficult problem is split into a set of simpler problems. The predictions obtained from these simplified settings must then be combined into one single output, possibly resolving observed inconsistencies between the outputs. In this paper, we consider such an approach for the label ranking problem, where in addition we allow the predictive model to produce cautious inferences in the form of sets of rankings when it lacks information to produce reliable, precise predictions. More specifically, we propose to combine a rank-wise decomposition, in which every sub-problem becomes an ordinal classification one, with a constraint satisfaction problem (CSP) approach to verify the overall consistency of the predictions.
Abstract: (Coming soon)
Abstract:
Abstract: Markov models are powerful statistical tools for analyzing and assessing the costs and health consequences of new health-care interventions (i.e., for health-economic evaluation). They are often used by health economists, who have tended to implement them with tools that are simple to use but not necessarily the most suitable or scalable. The search for better tools to implement this kind of model has thus become a major challenge: spreadsheets (such as Microsoft Office Excel) are a source of errors, limit traceability and quality control, and are not specialized in solving statistical problems. Capionis, a biostatistics consulting firm, now seeks to improve its implementation process with tools that can optimize and/or automate parts of the overall process (e.g., automated graphical reporting of sensitivity analyses). For this purpose, the heemod package for R seemed to us an interesting alternative. We therefore focused on evaluating and analyzing the package by implementing several real cases of health-economic evaluation already developed and validated by the biostatistics team at Capionis. Despite the novelty of the package and the difficulties encountered in reproducing health-economic results due to missing features, we found no problem with the accuracy of the numerical results; however, its flexibility for implementing or migrating real cases is not yet satisfactory compared with Microsoft Excel. We therefore cannot claim to judge the efficiency and effectiveness of the heemod package relative to Microsoft Excel, which has been in use for over 20 years, unlike the package (around one year). In addition, we also developed a simple web platform for end users and/or the community. This platform uses the heemod package and was fully implemented in R with Shiny.
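For readers unfamiliar with such models, the core computation of a Markov cohort model (the kind of model implemented by packages like heemod) can be summarized by a few equations; the notation below is mine and is given only as an illustration:

```latex
% Markov cohort model: state occupancy and expected discounted outcomes (illustrative notation)
\pi_{t+1} = \pi_t \, T,
\qquad
\text{Cost} = \sum_{t=0}^{H} \frac{\pi_t \, c}{(1+r)^{t}},
\qquad
\text{QALY} = \sum_{t=0}^{H} \frac{\pi_t \, u}{(1+r)^{t}},
```

where $\pi_t$ is the row vector of state occupancy at cycle $t$, $T$ the transition matrix, $c$ and $u$ the per-cycle cost and utility vectors, $r$ the discount rate, and $H$ the time horizon.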
Abstract: (Coming soon)
Abstract: Nowadays, crowdsourcing services are increasingly used in a variety of applications, because they allow small tasks to be published and performed by a large group of networked people at low cost. These services are structured into processes, or stages, that ensure the quality of the work, such as task assignment, estimation of workers' skills, and quality estimation. This study focused on deepening the quality estimation stage of crowdsourcing by proposing a model of user behavior that enables us to evaluate state-of-the-art crowdsourcing solutions. We used a confusion matrix to represent a user's knowledge in a multi-class classification problem and then modeled four different user profiles (e.g., expert and amateur). These profiles follow a discrete probability distribution (e.g., a logarithmic distribution), and the user's knowledge is generated from this distribution via Monte Carlo simulation. We also explored another direction by analyzing real data from the Tela-Botanica website in order to extract realistic profiles. With the help of this model, we then perform a set of random simulations in order to validate our profiles and to evaluate two crowdsourcing inference methods.
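As an illustration of the kind of simulation involved (a minimal sketch with assumed profile parameters, not the exact model of the study), a worker can be described by a confusion matrix whose rows give the probability of each answer conditionally on the true class, and answers are then sampled from it:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_answers(confusion_matrix, true_classes):
    """Sample a worker's answers given their confusion matrix.

    confusion_matrix[k, j] = P(worker answers class j | true class is k); rows sum to 1.
    true_classes: array of true class indices for the tasks assigned to the worker.
    """
    confusion_matrix = np.asarray(confusion_matrix, dtype=float)
    return np.array([rng.choice(confusion_matrix.shape[1], p=confusion_matrix[k])
                     for k in true_classes])

# Hypothetical 3-class profiles: an "expert" is mostly diagonal, an "amateur" less so.
expert = np.array([[0.9, 0.05, 0.05],
                   [0.05, 0.9, 0.05],
                   [0.05, 0.05, 0.9]])
amateur = np.array([[0.6, 0.2, 0.2],
                    [0.2, 0.6, 0.2],
                    [0.2, 0.2, 0.6]])

true_classes = rng.integers(0, 3, size=10)
print(simulate_answers(expert, true_classes))
print(simulate_answers(amateur, true_classes))
```

Repeating such draws over many simulated tasks and workers gives the Monte Carlo data on which inference methods can then be compared.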