Below, I present my publications and other ongoing research work, together with related information and links where they can be consulted.
Abstract: In this paper, we consider the problem of making skeptical inferences for the multi-label ranking problem. We assume that our uncertainty is described by a convex set of probabilities (i.e., a credal set) defined over the set of labels. Instead of learning a single prediction (i.e., a complete ranking over the labels), we thus seek skeptical inferences in the form of set-valued predictions consisting of complete rankings.
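To make the decision criterion concrete, here is one standard way to define skeptical, set-valued predictions from a credal set P and a loss ℓ (a sketch of the general idea, not necessarily the exact criterion used in the paper): a ranking is kept only if no other ranking has strictly lower expected loss under every distribution in the credal set.

```latex
% Maximality-style skeptical prediction from a credal set \mathcal{P} (illustrative sketch)
y' \succ y \iff \mathbb{E}_{P}\!\left[\ell(y, Y)\right] > \mathbb{E}_{P}\!\left[\ell(y', Y)\right]
\quad \text{for all } P \in \mathcal{P},
\qquad
\widehat{\mathcal{Y}} \;=\; \{\, y \;:\; \nexists\, y' \text{ such that } y' \succ y \,\}.
```

When the credal set reduces to a single distribution, this prediction set reduces to the usual single loss-minimising ranking.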
Abstract: Gaussian discriminant analysis is a popular classification model that, in its precise form, can produce unreliable predictions under high uncertainty (e.g., due to scarce or noisy data). While imprecise probability theory offers a nice theoretical framework to address such issues, it has not yet been applied to Gaussian discriminant analysis. This work remedies this by proposing a new Gaussian discriminant analysis based on robust Bayesian analysis and near-ignorance priors. The model delivers cautious predictions, in the form of set-valued classes, in case of limited or imperfect available information. We present and discuss experimental results on real and synthetic datasets, where for the latter we corrupt the test instances to see how our approach reacts to non-i.i.d. samples. Experiments show that including an imprecise component in the Gaussian discriminant analysis produces reasonably cautious predictions, and that set-valued predictions correspond to instances on which the precise model performs poorly.
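As an illustration of how such set-valued predictions can be produced (a minimal sketch only; the function name and the interval-dominance rule are my assumptions, not the paper's exact procedure), one can compute lower and upper class posteriors over the set of priors and keep every class that is not dominated:

```python
import numpy as np

def set_valued_prediction(lower_post, upper_post):
    """Return the set of non-dominated classes given lower/upper posteriors.

    lower_post, upper_post: arrays of shape (n_classes,), one interval per class,
    e.g. obtained by minimising/maximising the posterior over a set of priors.
    A class k is discarded if some other class j dominates it, i.e. the lower
    posterior of j exceeds the upper posterior of k (interval dominance).
    """
    lower_post = np.asarray(lower_post, dtype=float)
    upper_post = np.asarray(upper_post, dtype=float)
    keep = []
    for k in range(len(lower_post)):
        dominated = any(lower_post[j] > upper_post[k]
                        for j in range(len(lower_post)) if j != k)
        if not dominated:
            keep.append(k)
    return keep

# Example: class 1 is clearly best, class 2 cannot be excluded -> cautious prediction [1, 2]
print(set_valued_prediction(lower_post=[0.05, 0.40, 0.20],
                            upper_post=[0.15, 0.70, 0.45]))
```

With precise posteriors (lower equal to upper and no ties), this rule reduces to predicting the single most probable class.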
Abstract: In this paper, we consider the problem of making distributionally robust, skeptical inferences for the multi-label problem, or more generally for Boolean vectors. By distributionally robust, we mean that we consider a set of possible probability distributions, and by skeptical we understand that we consider as valid only those inferences that are true for every distribution within this set. Such inferences will provide partial predictions whenever the considered set is sufficiently big. We study in particular the Hamming loss case, a common loss function in multi-label problems, showing how skeptical inferences can be made in this setting. Our experimental results are organised in three parts: (1) the first illustrates, on synthetic data sets, the computational gains obtained from our theoretical results; (2) the second shows that our approaches produce relevant cautiousness on those hard-to-predict instances where their precise counterpart fails; and (3) the last demonstrates experimentally that our approach copes with imperfect information (generated by a downsampling procedure) better than the partial abstention [31] and rejection rules.
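For the Hamming loss, skeptical inference is often described label-wise; the rule below is an illustrative sketch of that idea, written with the lower and upper probabilities of a label being relevant (the paper's exact characterisation may differ):

```latex
% Label-wise skeptical prediction under Hamming loss (illustrative)
\hat{y}_i \;=\;
\begin{cases}
  1 & \text{if } \underline{P}(Y_i = 1) > \tfrac{1}{2},\\[2pt]
  0 & \text{if } \overline{P}(Y_i = 1) < \tfrac{1}{2},\\[2pt]
  \ast & \text{otherwise (abstain on label } i\text{)}.
\end{cases}
```

The prediction is thus a partial binary vector: labels on which the whole credal set agrees are decided, and the others are left open.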
Abstract: We present two different strategies to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) to describe our uncertainty, rather than a single precise distribution. The main reasons for using such estimates are (1) to make cautious predictions (or no decision at all) when high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding biases caused by early decisions in the chaining. We adapt both strategies to the case of the naive credal classifier, showing that these adaptations are computationally efficient. Our experimental results on missing labels, which investigate how reliable these predictions are in both approaches, indicate that our approaches produce relevant cautiousness on those hard-to-predict instances where the precise models fail.
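To give an idea of how a chain can propagate imprecision, here is a simplified sketch under my own assumptions: it branches on every abstention, which is only one of several possible strategies and not necessarily the one adopted in the paper, and its `predict` interface is hypothetical.

```python
def imprecise_chain_predict(credal_classifiers, x):
    """Chain prediction that propagates imprecision by branching on abstentions.

    credal_classifiers: list of per-label models; each model.predict(x, prev_labels)
    is assumed to return 0, 1, or None (abstention). This interface is hypothetical.
    Returns, for every label, the set of values reachable along some branch.
    """
    branches = [[]]  # partial label vectors explored so far
    for model in credal_classifiers:
        new_branches = []
        for prev in branches:
            pred = model.predict(x, prev)
            candidates = [0, 1] if pred is None else [pred]
            for value in candidates:
                new_branches.append(prev + [value])
        branches = new_branches
    n_labels = len(credal_classifiers)
    return [{b[i] for b in branches} for i in range(n_labels)]
```

Naive branching of this kind grows exponentially with the number of abstentions, which is why dedicated, computationally efficient adaptations such as the one proposed for the naive credal classifier matter.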
Abstract: In this paper, we consider the problem of making distributionally robust, skeptical inferences for the multi-label problem, or more generally for Boolean vectors. By distributionally robust, we mean that we consider a set of probability distributions, and by skeptical we understand that we consider as valid only those inferences that are true for every distribution within this set. Such inferences will provide partial predictions whenever the considered set is sufficiently big. We study in particular the Hamming loss case, a common loss function in multi-label problems, showing how skeptical inferences can be made in this setting. We also perform experiments illustrating the practical interest of our results.
Abstract: Ranking problems are difficult to solve due to their combinatorial nature. One way to address this issue is to adopt a decomposition scheme, splitting the initial difficult problem into many simpler ones. The predictions obtained from these simplified settings must then be combined into one single output, possibly resolving inconsistencies between the outputs. In this paper, we consider such an approach for the label ranking problem, where in addition we allow the predictive model to produce cautious inferences in the form of sets of rankings when it lacks information to produce reliable, precise predictions. More specifically, we propose to combine a rank-wise decomposition, in which every sub-problem becomes an ordinal classification one, with a constraint satisfaction problem (CSP) approach to verify the consistency of the predictions. Our experimental results indicate that our approach produces predictions with appropriately balanced reliability and precision, while remaining competitive with classical, precise approaches.
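The consistency check behind such a rank-wise decomposition can be phrased as a small constraint satisfaction problem: each label receives a set of admissible ranks, and the predictions are consistent if every label can be assigned a distinct rank from its set. The sketch below is my own minimal backtracking formulation, not the paper's solver:

```python
def consistent_assignment(rank_sets):
    """Check whether each label can take a distinct rank from its admissible set.

    rank_sets: list of sets; rank_sets[i] contains the ranks allowed for label i
    (e.g. produced by a cautious ordinal classifier). Returns one consistent
    assignment (one rank per label) if it exists, else None. Simple backtracking CSP.
    """
    n = len(rank_sets)

    def backtrack(i, used, assignment):
        if i == n:
            return assignment
        for rank in sorted(rank_sets[i] - used):
            result = backtrack(i + 1, used | {rank}, assignment + [rank])
            if result is not None:
                return result
        return None

    return backtrack(0, set(), [])

# Example with 3 labels: label 0 may be ranked 1st or 2nd, label 1 only 1st, label 2 only 3rd.
print(consistent_assignment([{1, 2}, {1}, {3}]))  # -> [2, 1, 3]
```

If no consistent assignment exists, the per-rank predictions contradict each other and must be reconciled before producing a ranking.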
Abstract: Gaussian discriminant analysis is a popular classification model that, in its precise form, can produce unreliable predictions under high uncertainty. While imprecise probability theory offers a nice theoretical framework to solve this issue, it has not yet been applied to Gaussian discriminant analysis. This work remedies this, by proposing a new Gaussian discriminant analysis based on robust Bayesian analysis and near-ignorance priors. The model delivers cautious predictions, in the form of set-valued classes, in case of limited or imperfect available information. Experiments show that including an imprecise component in the Gaussian discriminant analysis produces reasonably cautious predictions, in the sense that the number of set-valued predictions is not too high, and that those predictions correspond to hard-to-classify instances, that is, instances for which the accuracy of the precise classifier drops.
Abstract: Learning to predict label rankings is a difficult problem because of its combinatorial nature. One way around this is to split the initial problem into several simpler sub-problems. The predictions obtained from these simplified sub-problems must then be combined into a single output, resolving any inconsistencies between them. In this work, we adopt such an approach while allowing the sub-problems to produce cautious inferences, in the form of sets of ranks, when the uncertainty attached to the data would make precise predictions unreliable. More specifically, we propose to combine a rank-wise decomposition, in which every sub-problem becomes a cautious ordinal regression one, with a constraint satisfaction problem (CSP) approach to check the consistency of the predictions. Our experimental results indicate that our approach produces predictions with balanced reliability and precision, while remaining competitive with classical approaches.
Abstract: The goal of this paper is to propose a new cautious classification approach based on robust Bayesian inference and linear discriminant analysis. The model is designed to take into account, in its posterior inferences, the lack of information in the data. The principle of the approach is to use a set of prior distributions to model the initial ignorance, rather than a single (often so-called "non-informative") prior that can strongly influence the results when little data is available. Initial experiments show that adding imprecision makes it possible to be cautious in case of doubt without degrading the quality of the model, while keeping the computation time reasonable.
Abstract: In this paper, we present two different ways to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) to describe our uncertainty, rather than a single precise distribution. The main reasons one could have for using such estimates are (1) to make cautious predictions (or no decision at all) when high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding biases caused by early decisions in the chaining. We perform experiments on missing and noisy labels to investigate how accurate and how precise these predictions are in both approaches. Our experimental results indicate that while our approach produces relevant cautiousness (i.e., forgoes predictions likely to be erroneous), results regarding possible bias correction using a minimax approach are less encouraging, except when high adversarial noise affects the labels, in which case our approach outperforms its precise counterpart.
Abstract: Ranking problems are usually difficult to solve, due to their combinatorial nature. One way to circumvent this issue is to adopt a decomposition scheme, in which the initial difficult problem is split into a set of simpler problems. The predictions obtained from these simplified settings must then be combined into one single output, possibly resolving observed inconsistencies between the outputs. In this paper, we consider such an approach for the label ranking problem, where in addition we allow the predictive model to produce cautious inferences in the form of sets of rankings when it lacks information to produce reliable, precise predictions. More specifically, we propose to combine a rank-wise decomposition, in which every sub-problem becomes an ordinal classification one, with a constraint satisfaction problem (CSP) approach to verify the overall consistency of the predictions.
Abstract: (Coming soon)
Abstract:
Abstract: Markov models are powerful statistical tools for analyzing and assessing the costs and health consequences of new health-care interventions (i.e., for health-economic evaluation). They are often used by health economists, who have tended to implement them with tools that are simple to use but not necessarily the most suitable or scalable. The search for better tools to implement this kind of model has thus become a major challenge: spreadsheets (such as Microsoft Office Excel) are a source of errors, limit traceability and quality control, and are not specialized in solving statistical problems. Capionis, a biostatistics consulting firm, now seeks to improve its implementation process with tools that can optimize and/or automate parts of the overall process (e.g., automated graphical reporting of sensitivity analyses). For this purpose, the heemod package for R seemed to us an interesting alternative. We therefore focused on evaluating and analyzing the package by implementing several real cases of health-economic evaluation already developed and validated by the biostatistics team at Capionis. Despite the novelty of the package and the difficulties encountered in reproducing health-economic results due to missing features, we found no problem with the accuracy of the numerical results; however, its flexibility for implementing or migrating real cases is not yet satisfactory compared with Microsoft Excel. We therefore cannot claim to judge the efficiency and effectiveness of the heemod package relative to Microsoft Excel, which has been in use for over 20 years, unlike the package (around one year). In addition, we also developed a simple web platform for end users and/or the community. This platform uses the heemod package and was fully implemented in R with Shiny.
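For readers unfamiliar with such models, the core computation of a Markov cohort model (the kind of model implemented by packages like heemod) can be summarized by a few equations; the notation below is mine and is given only as an illustration:

```latex
% Markov cohort model: state occupancy and expected discounted outcomes (illustrative notation)
\pi_{t+1} = \pi_t \, T,
\qquad
\text{Cost} = \sum_{t=0}^{H} \frac{\pi_t \, c}{(1+r)^{t}},
\qquad
\text{QALY} = \sum_{t=0}^{H} \frac{\pi_t \, u}{(1+r)^{t}},
```

where $\pi_t$ is the row vector of state occupancy at cycle $t$, $T$ the transition matrix, $c$ and $u$ the per-cycle cost and utility vectors, $r$ the discount rate, and $H$ the time horizon.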
Abstract: (Coming soon)
Abstract: Nowadays, crowdsourcing services are increasingly used in a variety of applications, because they allow small tasks to be published and performed by a large group of networked people at low cost. These services are structured into processes, or stages, that ensure the quality of the work, such as task assignment, estimation of workers' skills, and quality estimation. This study focused on deepening the quality estimation stage of crowdsourcing by proposing a model of user behavior that enables us to evaluate state-of-the-art crowdsourcing solutions. We used a confusion matrix to represent a user's knowledge in a multi-class classification problem and then modeled four different user profiles (e.g., expert and amateur). These profiles follow a discrete probability distribution (e.g., a logarithmic distribution), and the user's knowledge is generated from this distribution via Monte Carlo simulation. We also explored another direction by analyzing real data from the Tela-Botanica website in order to extract realistic profiles. With the help of this model, we then perform a set of random simulations in order to validate our profiles and to evaluate two crowdsourcing inference methods.
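As an illustration of the kind of simulation involved (a minimal sketch with assumed profile parameters, not the exact model of the study), a worker can be described by a confusion matrix whose rows give the probability of each answer conditionally on the true class, and answers are then sampled from it:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_answers(confusion_matrix, true_classes):
    """Sample a worker's answers given their confusion matrix.

    confusion_matrix[k, j] = P(worker answers class j | true class is k); rows sum to 1.
    true_classes: array of true class indices for the tasks assigned to the worker.
    """
    confusion_matrix = np.asarray(confusion_matrix, dtype=float)
    return np.array([rng.choice(confusion_matrix.shape[1], p=confusion_matrix[k])
                     for k in true_classes])

# Hypothetical 3-class profiles: an "expert" is mostly diagonal, an "amateur" less so.
expert = np.array([[0.9, 0.05, 0.05],
                   [0.05, 0.9, 0.05],
                   [0.05, 0.05, 0.9]])
amateur = np.array([[0.6, 0.2, 0.2],
                    [0.2, 0.6, 0.2],
                    [0.2, 0.2, 0.6]])

true_classes = rng.integers(0, 3, size=10)
print(simulate_answers(expert, true_classes))
print(simulate_answers(amateur, true_classes))
```

Repeating such draws over many simulated tasks and workers gives the Monte Carlo data on which inference methods can then be compared.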