Data is at the heart of digital transformations. However, between data and companies' business cases there is still a big entry barrier: the technology.
With our ASGARD research project, Euranova wants to be present on the market within 2 or 3 years with the technologies and knowledge necessary to create value for our customers. The goal is to help companies create more value with less investment, more reliability and, above all, full compliance with the legislation.
The ASGARD research project came into being thanks to co-financing from the Walloon Region. It concentrates its research on the four main stages of the data value chain and addresses, through its research tracks, the following challenges:
Rune
The RUNE track addresses the “Making data available” challenge while respecting the law, and more specifically the GDPR: how to automate access to data according to the legal basis under which the data scientist/analyst exploits it and the legal basis under which it was acquired. This track covers NLP, legal NLP, GDPR ontology, first-order inference, …
Yggdrasil
The YGGDRASIL track addresses “Data exploitation and modelling” challenges. The largest ML conferences (ICML, IJCAI, NeurIPS, AAAI, ICLR, INTELISYS, ECML, KDD, ECAI, ACL, ICIP, Big Data) generate around 2,500 papers per year, so it is extremely difficult for data scientists to stay up to date with the state of the art in their field, all the more so when they spend 100% of their time on industrial projects. In addition, a new category of problems requires analysing data from different media (images, text, databases, graphs, etc.). This track therefore helps assist and automate machine learning tasks in these multi-media cases, covering the following topics: AutoML, multi-modal representation learning and GAN-based feature extraction.
Mjolnir
The MJOLNIR track addresses “Deployment in production” challenges. Production deployments require data pipelines generally composed of several data-processing and storage frameworks. However, optimising these frameworks is very complicated when a single variable can positively or negatively influence 500 others! We should therefore try to use ML to automatically find the configuration parameters that optimise the pipeline according to the jobs, the data and the available budget. Such an optimisation could improve performance, and therefore reduce cost, by a factor of 10 to 60. This track covers auto-tuning, reinforcement learning, batch and stream processing, and meta-learning.
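To make the tuning idea concrete, here is a minimal, hypothetical sketch: the two configuration knobs, the search strategy and the cost function are illustrative stand-ins (a real system would measure cost by actually running jobs on the cluster, and the track targets more sophisticated RL-based tuners).

```python
# Minimal sketch of automatic pipeline tuning; everything here is a toy stand-in.
import random

# Hypothetical search space for two configuration knobs of a data pipeline.
SEARCH_SPACE = {
    "executor_memory_gb": [2, 4, 8, 16],
    "shuffle_partitions": [50, 100, 200, 400],
}

def run_pipeline(config):
    """Stand-in for executing the pipeline and measuring its runtime cost."""
    # Toy cost surface: pretend 8 GB / 200 partitions is the sweet spot.
    return (abs(config["executor_memory_gb"] - 8)
            + abs(config["shuffle_partitions"] - 200) / 50
            + random.random())  # measurement noise

def tune(budget=30):
    """Epsilon-greedy search: a crude proxy for an RL-based tuner."""
    best_config, best_cost = None, float("inf")
    for _ in range(budget):
        if best_config is None or random.random() < 0.3:  # explore
            config = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        else:  # exploit: perturb one knob of the best config found so far
            config = dict(best_config)
            knob = random.choice(list(SEARCH_SPACE))
            config[knob] = random.choice(SEARCH_SPACE[knob])
        cost = run_pipeline(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost

print(tune())
```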
Vadgelmir
The VADGELMIR track addresses “Control in production, data quality” challenges. It is usually in production that we realise that the data is very different from our training data, and a large part of the differences come from data quality. Many quality-processing tools exist today, but they are very expensive to implement and can sometimes take 3 to 5 years. Instead, we are looking for a solution that does not recover the true data but limits the impact of the quality defect on the ML task, just as a denoising filter can be applied to an image blurred by noise. We consider the quality defect to be noise added to the initial data, so we only have to remove it. This track covers GANs, missing-data imputation, auto-encoders and deep neural networks.
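The core intuition can be shown with a minimal denoising auto-encoder sketch in PyTorch: treat the quality defect as additive noise and train a network to reconstruct the clean data. Dimensions and the noise model below are illustrative assumptions, not the project's.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features=20, latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.randn(256, 20)  # stand-in for clean training data

for step in range(200):
    noisy = clean + 0.3 * torch.randn_like(clean)       # simulated quality defect
    loss = nn.functional.mse_loss(model(noisy), clean)  # learn to reconstruct clean data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```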
Mass Estimation of Planck Galaxy Clusters using Deep Learning
Galaxy cluster masses can be inferred indirectly from measurements in the X-ray band, the Sunyaev-Zeldovich (SZ) effect signal or optical observations. Unfortunately, all of these are affected by some bias. As an alternative, we provide an independent estimation of the cluster masses in the Planck PSZ2 catalogue of galaxy clusters using a machine-learning method.
Automatic Parameter Tuning for Big Data Pipelines
Big data pipelines are generally composed of several frameworks, each playing a different role. This makes tuning big data pipelines an important yet difficult task given the size of the search space. We propose to use a deep reinforcement learning algorithm to tune a fraud detection big data pipeline.
Multimodal Classifier For Space Target Recognition
We propose a multi-modal framework to tackle the SPARK Challenge by classifying satellites using RGB and depth images. Our framework is mainly based on Auto-Encoders to embed the two modalities in a common latent space in order to exploit redundant and complementary information between the two types of data.
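The following sketch illustrates the common-latent-space idea (not the paper's exact architecture): two encoders map RGB and depth images into a shared embedding, and an alignment loss pulls paired views together so either modality can feed a downstream classifier. Shapes and the loss choice are assumptions.

```python
import torch
import torch.nn as nn

def make_encoder(in_channels, latent=64):
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(latent),  # infers the flattened size automatically
    )

rgb_encoder = make_encoder(3)    # RGB: 3 channels
depth_encoder = make_encoder(1)  # depth: 1 channel

rgb = torch.randn(8, 3, 64, 64)
depth = torch.randn(8, 1, 64, 64)
z_rgb, z_depth = rgb_encoder(rgb), depth_encoder(depth)

# Alignment term: paired RGB/depth views should embed close together.
alignment_loss = nn.functional.mse_loss(z_rgb, z_depth)
```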
AMI-Class: Towards a Fully Automated Multi-view Image Classifier
In this paper, we propose an automated framework for multi-view image classification tasks. The proposed framework is able to, all at once, train a model to find a common latent representation and perform data imputation, choose the best classifier and tune all necessary hyper-parameters.
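As a hedged illustration of the "choose the best classifier and tune all necessary hyper-parameters" step only, here is a scikit-learn model-selection sketch on placeholder features; the paper's joint training of representation, imputation and classifier is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# Candidate classifiers, each with its own hyper-parameter grid.
candidates = [
    (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    (SVC(), {"C": [0.1, 1.0, 10.0]}),
]

best_score, best_model = -1.0, None
for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=3).fit(X, y)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(best_model, best_score)
```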
Policy-based Automated Compliance Checking
Under the GDPR requirements and privacy-by-design guidelines, access control for personal data should not be limited to a simple role-based scenario. For the processing to be compliant, additional attributes, such as the purpose of processing or legal basis, should be verified against an established data processing agreement or policy.
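A minimal sketch of such an attribute-based check follows; the policy structure, dataset name and attributes are hypothetical examples, not the paper's model.

```python
POLICY = {
    # dataset -> processing terms agreed with the data subject
    "customer_emails": {
        "allowed_purposes": {"fraud_detection", "service_improvement"},
        "allowed_legal_bases": {"consent", "legitimate_interest"},
    },
}

def is_compliant(dataset, purpose, legal_basis):
    """Grant access only if both purpose and legal basis match the agreement."""
    terms = POLICY.get(dataset)
    if terms is None:
        return False  # no agreement on record: deny by default
    return (purpose in terms["allowed_purposes"]
            and legal_basis in terms["allowed_legal_bases"])

assert is_compliant("customer_emails", "fraud_detection", "consent")
assert not is_compliant("customer_emails", "marketing", "consent")
```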
Anomaly Detection: How to Artificially Increase your F1-Score with a Biased Evaluation Protocol
Anomaly detection is a widely explored domain in machine learning. Many models are proposed in the literature, and compared through different metrics measured on various datasets.
The most popular metrics used to compare performance are the F1-score, AUC and AVPR.
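The three metrics can be computed with scikit-learn on toy anomaly scores, as sketched below (AVPR is average precision, the area under the precision-recall curve). Note that the F1-score requires a threshold choice, which is exactly where evaluation protocols can diverge, whereas AUC and AVPR are threshold-free.

```python
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score

y_true = [0, 0, 0, 0, 1, 0, 1, 1]  # 1 = anomaly
scores = [0.1, 0.3, 0.2, 0.4, 0.9, 0.35, 0.6, 0.8]

# F1 needs binary predictions, hence an (arbitrary) threshold.
y_pred = [int(s >= 0.5) for s in scores]
print("F1:  ", f1_score(y_true, y_pred))
print("AUC: ", roc_auc_score(y_true, scores))
print("AVPR:", average_precision_score(y_true, scores))
```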
DAEMA: Denoising Autoencoder with Mask Attention
Missing data is a recurrent and challenging problem, especially when using machine learning algorithms for real-world applications. For this reason, missing data imputation has become an active research area, in which recent deep learning approaches have achieved state-of-the-art results. We propose DAEMA: Denoising Autoencoder with Mask Attention.
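A simplified sketch of the mask-attention idea follows: the missingness mask is fed to the network so that attention weights can emphasise observed features. The architecture below is a deliberately reduced assumption, not the paper's exact model.

```python
import torch
import torch.nn as nn

class MaskedImputer(nn.Module):
    def __init__(self, n_features=10, hidden=32):
        super().__init__()
        self.attn = nn.Linear(n_features, n_features)  # mask -> feature weights
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x, mask):
        # mask is 1 where a feature is observed, 0 where it is missing
        weights = torch.sigmoid(self.attn(mask))  # attention derived from the mask
        return self.net(x * mask * weights)       # reconstruct all features

x = torch.randn(4, 10)
mask = (torch.rand(4, 10) > 0.3).float()
imputed = MaskedImputer()(x, mask)
```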
Estimating Expected Calibration Errors
Uncertainty in probabilistic classifiers predictions is a key concern when models are used to support human decision making, in broader probabilistic pipelines or when sensitive automatic decisions have to be taken.
Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities.
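For reference, the standard binned estimator of the Expected Calibration Error buckets predictions by confidence and averages the gap between per-bin accuracy and per-bin confidence, weighted by bin size. A minimal sketch on toy data, with the common default of equal-width bins:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

print(expected_calibration_error([0.9, 0.8, 0.7, 0.95], [1, 1, 0, 1]))
```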
A Combined Rule-Based and Machine Learning Approach for Automated GDPR Compliance Checking
The General Data Protection Regulation (GDPR) requires data controllers to implement end-to-end compliance. Controllers must therefore ensure that the terms agreed with the data subject and their own obligations under GDPR are respected in the data flows from data subject to controllers, processors and sub-processors (i.e. data supply chain).
MIC: Multi-view Image Classifier using Generative Adversarial Networks for Missing Data Imputation
In this paper, we propose a framework for image classification tasks, named MIC, that takes as input multi-view images, such as RGB-T images for surveillance purposes. We combine auto-encoder and generative adversarial network architectures to ensure the multi-view embedding in a common latent space.
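As an illustrative GAN-style imputation sketch (not MIC itself): a generator fills in missing values while a discriminator tries to tell imputed entries from observed ones, pushing imputations toward the real data distribution. Architecture and losses below are simplified assumptions.

```python
import torch
import torch.nn as nn

n_features = 10
generator = nn.Sequential(nn.Linear(n_features * 2, 32), nn.ReLU(),
                          nn.Linear(32, n_features))
discriminator = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                              nn.Linear(32, n_features), nn.Sigmoid())

x = torch.randn(16, n_features)
mask = (torch.rand(16, n_features) > 0.3).float()  # 1 = observed

# Generator sees the observed values plus the mask, proposes a full vector.
filled = generator(torch.cat([x * mask, mask], dim=1))
imputed = x * mask + filled * (1 - mask)           # keep observed values as-is

# Discriminator predicts, per entry, whether it was observed or imputed.
d_loss = nn.functional.binary_cross_entropy(discriminator(imputed), mask)
g_loss = -d_loss  # generator tries to fool the discriminator (one GAN variant)
```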
Privacy Policy Classification with XLNet
The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While the GDPR gives data subjects more rights and control over the use of their personal data, the length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is…
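A minimal sketch of privacy-policy sentence classification with XLNet via the Hugging Face transformers library follows. The label set is a made-up example, and the classification head here is untrained; the paper's exact categories and fine-tuned weights are not shown.

```python
import torch
from transformers import XLNetTokenizer, XLNetForSequenceClassification

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=3)  # e.g. data collection / sharing / user rights

sentence = "We may share your personal data with third-party advertisers."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted class:", logits.argmax(dim=-1).item())  # demo only: head is untrained
```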
Towards Privacy Policy Conceptual Modeling
Since GDPR enforcement in May 2018, the problem of implementing privacy by design and staying compliant with regulations has been more prominent than ever for businesses of all sizes, as is evident from the frequent cases brought against companies and the significant fines paid due to non-compliance. Consequently, numerous research works have been emerging in this area…