Eura Nova RD
Eura Nova

Activity

In this section you will find EURA NOVA’s latest news and activities.

Activity

11-06-2021

AMU-EURANOVA at CASE 2021 Task 1: Assessing the stability of multilingual BERT

This paper explains our participation in Task 1 of the CASE 2021 shared task. This task is about multilingual event extraction from the news. We focused on sub-task 4, event information extraction. This sub-task has a small training dataset, and we fine-tuned a multilingual BERT to solve it. We studied the instability problem on the dataset and tried to mitigate it.

Léo Bouscarrat, Antoine Bonnefoy, Cécile Capponi, Carlos Ramisch, AMU-EURANOVA at CASE 2021 Task 1: Assessing the stability of multilingual BERT, In Proc. of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021). Association for Computational Linguistics (ACL), 2021.

The final paper will be published after the conference. 

Activity

10-05-2021

Our research director invited as PC member at IEEE Big Data

We are very proud of our research director Sabri Skhiri for joining the program committee of IEEE Big Data 2021!

He will be the only Belgian and one of the few Europeans on the program committee of this top-tier research conference in Big Data.

Congratulations, Sabri!

We look forward to this international collaboration!

 

Activity

05-03-2021

Our research director is co-chair at DEBS 2021 [Call for Papers]

Congratulations to our research director Sabri Skhiri on his appointment as industry co-chair of the international conference on distributed and event-based systems.

He will work alongside talented PC members: Martin Strohbach (AGT International), Dimitris Zissis (University of the Aegean), Bogdan Ghit (Databricks), Zbigniew Jerzak (Zalando SE), Romain Rouvoy (Université catholique de Lille), Julius Rückert (ABB Corporate Research Center), Timos Sellis (Facebook), Guozhang Wang (Confluent).

If you are a researcher or an industry practitioner working in distributed and event‐based computing, submit your latest work on the DEBS 2021 industrial track!

Check out the call for papers for more details.

Activity

19-02-2021

DMMM: Data Management Maturity Model

Assessing the progress of a digital transformation is essential for evaluating how mature a data-driven company is in terms of data capabilities and for planning improvement actions. Maturity models evaluate the performance and the execution of processes against the predefined goals and strategies that the organisation has set for its long-term alignment with its values and culture. For this purpose, we developed a maturity model assessment. The value proposition is to evaluate the current maturity state of an enterprise from a data and information management point of view and to draw the target maturity state that the organisation would like to reach based on its resources, goals, and ambitions. The model proposes an evolution path from the current state to the target state, which can be used as a compass to navigate the digital transformation journey.

In this paper, we present a new perspective on how to construct maturity models to assess companies’ maturity in terms of data management and advanced analytics. We focus on building a set of tools that ease the application of our model and create a fact-based roadmap to evolve from the current state to the target maturity state, which is also defined by this same model. Our Data Management Maturity Model (DMMM) was designed to support the digital transformation from an initial level to an optimised one. It covers the different aspects that can be encountered in the majority of organisations: the organisational structure, the systems, the data dimensions, and operations. This paper also presents the technical tools we developed to ease the model’s implementation through the DMMM user interface. It describes the methodologies behind the development of the maturity scoring system, the model architecture, the assessment practice, as well as the maturity levels resulting from the evaluation of the different data dimensions present across organisations. Additionally, we set forth the technicalities behind the capabilities of the model, their mapping for a data-centric vision, and the links between them that bring consistency and traceability.

 

Syrine Ferjaoui, Oumaima Belghith, Cyrine Zitoun, Sabri Skhiri, DMMM: Data Management Maturity Model, Proc. of the International Conference on Advanced Enterprise Information System, 2021.

If you are interested in learning more about the drivers of AI success for your business, do not hesitate to reach out.

The final paper will be published after the conference. 

 


Activity

19-02-2021

MIC: Multi-view Image Classifier using Generative Adversarial Networks for Missing Data Imputation

In this paper, we propose a framework for image classification tasks, named MIC, that takes as input multi-view images, such as RGB-T images for surveillance purposes. We combine auto-encoder and generative adversarial network architectures to ensure the multi-view embedding in a common latent space. Then, the resulting features are fed to the classification stage. The proposed framework is able to, all at once, train the multi-view embedding model to find a shared latent representation for the different views, perform data imputation (generate the missing views) and ensure the classification task by predicting the labels. Experiments on the MNIST dataset with a panoply of classifiers and several missingness ratios show the effectiveness of our solution.

 

Gianmarco Aversano, Mahmoud Jarraya, Maher Marwani, Ichraf Lahouli and Sabri Skhiri, MIC: Multi-view Image Classifier using Generative Adversarial Networks for Missing Data Imputation,  Proc. of the 18th IEEE International Multi-conference on Systems, Signals and Devices, 2021

Download file (.pdf)

Activity

26-01-2021

MDPI Data Journal, special issue – Paper Submission Opening

After the success of five international workshops co-located at IEEE Big Data, the MDPI Data Journal is dedicating a special issue to real-time stream analytics, stream mining, CER/CEP and stream data management in big data.

Data (ISSN 2306-5729) is a peer-reviewed open-access journal on data in science, with the aim of enhancing data transparency and reusability. The journal is now included in the Emerging Sources Citation Index – ESCI (Web of Science), Scopus, and Inspec (IET).

Data has received its first CiteScore 2.1, ranking Q2 in the Scopus category “Information Systems and Management” (Real-time CiteScore 3.2 based on CiteScoreTracker 2020).

We invite researchers in this field to submit papers about scalable online learning, incremental learning on stream processing infrastructures, complex event processing, and composite event recognition. We also encourage submissions on data stream management, data architecture using stream processing, and on Internet of Things (IoT) data streaming. Additionally, we appreciate submissions that deal with the usage of stream processing in new innovative architectures.

The full CFP can be found here: https://www.mdpi.com/journal/data/special_issues/bigdata20

 

Research Topics

The topics of interest include but are not limited to:

  • New stream processing architecture for big data.
  • Complex event processing (CEP) for big data, pattern matching engines for big data.
  • Composite event recognition (CER).
  • Stream reasoning.
  • Scalable real-time decision algorithms.
  • Scalable stream processing architecture, algorithms or models.
  • Stream mining.
  • Online and incremental learning.
  • Stream SQL and other continuous query languages on big data frameworks.
  • Data pipelines and data management with Streams.
  • Stream ETL and real-time data warehouses.
  • Stream mining and algorithms.
  • Online and incremental learning and algorithms.
  • New or innovative architecture patterns leveraging stream processing.
  • IoT analytics


Important Dates

The deadline for manuscript submission is March 1st, 2021.

 

Special Issue Editors

Sabri Skhiri, EURA NOVA, BE

Albert Bifet, Télécom ParisTech, FR

Alessandro Margara, Politecnico di Milano, IT

 

Activity

15-01-2021

Towards a continuous evaluation of calibration

For safety-critical systems involving AI components (such as in planes, cars, or healthcare), safety and the associated certification tasks are among the main challenges, and they can be costly and difficult to address.

One key aspect is to ensure that the decisions a machine-learning classifier makes are properly calibrated. This Thursday, at the MLSC workshop, our engineer Nicolas presented part of the research on classifier calibration carried out with our senior data scientist Antoine Bonnefoy.

The Machine Learning in Certified Systems workshop brought together machine learning researchers with international authorities and industry experts to present the main open questions and methods for verification and certification of critical software. The objective was also to define the future research agenda towards the medium-term goal of certifying critical systems involving AI components. The workshop included invited talks, a poster session and panel discussions.
Nicolas talked about improving the calibration of classifiers and its evaluation through the introduction of continuous estimators of related errors.
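
As a point of reference for what a calibration error measures, here is a minimal sketch of the standard binned expected calibration error (ECE). It is not the continuous estimator presented in the poster; the function name and the toy data are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0
    # Bin i collects confidences in [i / n_bins, (i + 1) / n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Toy data (made up): top-class confidence and whether the prediction was right.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, True, False, True, False]
ece = expected_calibration_error(confidences, correct)
```

A well-calibrated classifier that reports 95% confidence should be right about 95% of the time; here it is right only 75% of the time in that bin, which dominates the error. The result of a binned estimator depends on the choice of `n_bins`, which is one motivation for looking at continuous alternatives.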

Watch him present his poster on YouTube.

 

You can find the poster pdf below!

 

Download file (.pdf)

Activity

13-01-2021

Reinforcement Learning Course at ENSI

Reinforcement learning is one of the most active research areas in artificial intelligence and applies to a wide range of use cases in different sectors. What makes the technology unique is that it creates autonomous systems that learn from trial-and-error interaction to maximise the total reward they receive while interacting with a complex, uncertain environment.
To provide students with the skills needed in a transforming AI landscape, the ENSI school (National School for Computer Science) invited us to give a training course on the subject. Last week, our research engineer Nourchène gave a 15-hour module aimed at final year engineering students in the Software Engineering programme and the Intelligent Systems master.

During the course, she gave an introduction to reinforcement learning and its main principles:

  • What makes it different from traditional machine learning techniques (supervised and unsupervised ML)
  • Modelling by Markovian processes
  • The three main families of RL algorithms
  • The RL taxonomy with examples of algorithms from each family
  • A practical part with the OpenAI Gym toolkit to familiarise students with the different environments available and test some of them.
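
To give a flavour of the trial-and-error learning covered in the course, here is a minimal sketch of tabular Q-learning. It is not taken from the course material: to stay self-contained it uses a hand-written five-state corridor instead of an OpenAI Gym environment, and all names and hyper-parameters are illustrative assumptions.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Hypothetical toy environment: reward 1.0 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap on episode length
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate towards
            # reward + discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action in each non-terminal state (+1 means "move right").
greedy_policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
```

After training, the greedy policy moves right in every state, and the learned values shrink with the distance to the goal because of the discount factor `gamma`.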

 

If you wish to know more about RL, do not hesitate to reach out to research.euranova.eu!

Thank you to the ENSI team for the invitation. It was a pleasure for us to exchange with students. If you are a student interested in the field of Reinforcement Learning, we offer graduation projects on the subject. You can find all the internship offers on our website.

Activity

13-01-2021

Padhoc: a computational pipeline for pathway reconstruction on the fly

Molecular pathway databases represent cellular processes in a structured and standardized way. These databases support the community-wide utilization of pathway information in biological research and the computational analysis of high-throughput biochemical data. Although pathway databases are critical in genomics research, the fast progress of biomedical sciences prevents databases from staying up-to-date. Moreover, the compartmentalization of cellular reactions into defined pathways reflects arbitrary choices that might not always be aligned with the needs of the researcher. Today, no tool exists that allows the easy creation of user-defined pathway representations.

Here we present Padhoc, a pipeline for pathway ad hoc reconstruction. Based on a set of user-provided keywords, Padhoc combines natural language processing, database knowledge extraction, orthology search and powerful graph algorithms to create navigable pathways tailored to the user’s needs. We validate Padhoc with a set of well-established Escherichia coli pathways and demonstrate usability to create not-yet-available pathways in model (human) and non-model (sweet orange) organisms.

Salvador Casaní-Galdón, Cecile Pereira, Ana Conesa, Padhoc: a computational pipeline for pathway reconstruction on the fly, Bioinformatics, Volume 36 (2):i795–i803, December 2020.

DOI : https://doi.org/10.1093/bioinformatics/btaa811

Download file (.pdf)

Activity

13-01-2021

2Be3-Net : Combining 2D and 3D convolutional neural networks for 3D PET scans predictions

Radiomics – high-dimensional features extracted from clinical images – is the main approach used to develop predictive models based on 3D Positron Emission Tomography (PET) scans of patients suffering from cancer. Radiomics extraction relies on an accurate segmentation of the tumoral region, which is a time-consuming task subject to inter-observer variability. On the other hand, data-driven approaches such as deep convolutional neural networks (CNN) struggle to achieve strong performance on PET images due to the absence of large available PET datasets combined with the size of 3D networks. In this paper, we assemble several public datasets to create a PET dataset of 2800 scans and propose a deep learning architecture named “2Be3-Net” that associates a 2D feature extractor with a 3D CNN predictor. First, we take advantage of a 2D pre-trained model to extract feature maps out of 2D PET slices. Then we apply a 3D CNN on top of the concatenation of the previously extracted feature maps to compute patient-wise predictions. Experiments suggest that 2Be3-Net has an improved ability to exploit spatial information compared to 2D-only or 3D-only CNN solutions. We also evaluate our network on the prediction of clinical outcomes of head-and-neck cancer. The proposed pipeline outperforms PET radiomics approaches on the prediction of loco-regional recurrences and overall survival. Innovative deep learning architectures combining a pre-trained network with a 3D CNN could therefore be a great alternative to traditional CNN and radiomics approaches while empowering small and medium-sized datasets.

Ronan Thomas, Elsa Schalck, Damien Fourure, Antoine Bonnefoy and Inaki Cervera-Marzal, 2Be3-Net : Combining 2D and 3D convolutional neural networks for 3D PET scans predictions, Proc. of the 2nd International Conference on Medical Imaging and Computer-Aided Diagnosis, 2021.

 

Watch Ronan present the paper in the video below:

https://www.youtube.com/watch?v=e11A7ikxS9c&t=5308s

 

The final paper will be published after the conference. 

Activity

04-12-2020

Talking graph analytics with students

Last Saturday, our Tunisian team Safa, Ichraf Hamza and Amine took part in the ENSI (Ecole Nationale des Sciences de l’Informatique) virtual forum to share their experience and meet the students! 

Our graph specialist Amine Ghrab talked to students about the power of graph analytics. 

Did you know that domains such as social networks, transportation, and biological networks are naturally modelled as graphs? He explained how a multitude of emerging problems can be represented using graph models and are efficiently solved using graph algorithms: 

“Over the past decade, business and social environments have become increasingly complex and interconnected. As a result, graphs have emerged as a widespread abstraction tool at the core of the information infrastructure that supports these environments. In the presentation, I discussed:

  • The value of Graphs and their emergence in a multitude of domains
  • The growing graph ecosystem of industrial graph tools
  • Data analytics beyond the Euclidean space: with examples of graph querying, mining, and Graph ML
  • The integration of graphs within established BI systems, where graph warehouses extend current information systems with graph management and analysis capabilities.”
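
The idea that such problems map naturally onto graph algorithms can be sketched with a small example. The code below is an illustration, not material from the presentation: it models a hypothetical transport network as an adjacency list and uses breadth-first search to find a route with the fewest hops. The station names are made up.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: returns a path with the fewest hops, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical network: each station lists the stations it connects to.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
route = shortest_path(network, "A", "E")
```

The same adjacency-list representation works for social or biological networks; only the meaning of nodes and edges changes.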

If you wish to know more about graphs or have access to the slides, do not hesitate to reach out to research.euranova.eu!

Kudos to the ENSI team for the organisation. It was a pleasure for us to exchange with students. If you are a student interested in the field of graphs, Amine proposes a graduation project on the subject. You can find all the internship offers on our website.

 

Academic Programmes Activity

19-11-2020

INTERNSHIPS 2021

This document presents internships supervised by our software engineering department or by our research & development department. Each project is an opportunity to feel both empowered and responsible for your own professional development and for your contribution to the company.

 

If you are interested in one of our offers, please send us your application to career@euranova.eu, including your CV and motivation regarding your top three internship positions (described in the document).

 

If you wish to read the testimonies of students who have done an internship at Eura Nova, visit our blog.

If you are interested in working on a topic that is not in our range of offers, we would be delighted to hear your proposition and invite you to get in touch.

Internship subjects and application guidelines are available here: Internship Offers.

Download file (.pdf)

Academic Programmes Activity

19-11-2020

MASTER THESIS & PFE 2021

This document introduces you to master thesis and graduation projects supervised by our research & development department. Each project offers you the chance to be actively involved in the development of solutions to address tomorrow’s challenges in ICT and implementing them today!

If you are interested in one of our offers, please send us your application to career@euranova.eu, including your CV and motivation regarding your favourite master thesis subject (described in the document).

If you are interested in working on a topic that is not in our range of offers, we would be delighted to hear your proposition and invite you to get in touch.

Master thesis subjects and application guidelines are available here: Master Thesis Offers.

Download file (.pdf)

Activity

05-11-2020

ECML 2020 – The Keynotes

A few weeks ago, the biggest European conference on machine learning was held: ECML 2020. Our research engineer Nourchène, our R&D consultant Gianmarco, and our data scientist Ronan attended the event from Tunisia, Belgium and Marseille. In this article, they tell you about the different keynote talks they attended. 

Gemma Galdon-Clavell – Algorithmic Auditing: how to open the black-box of ML

Nourchène says: “I loved the talk given by Gemma Galdon-Clavell during which she addressed the problem of ethics in AI, as computer science engineers do not often question what they are producing from a moral standpoint. In her talk, Gemma points out the importance of data used to train a machine learning model. Data are provided by humans, but people are not perfect and are likely to make wrong decisions. The model will then learn to behave the same way, so we might end up creating an unethical model. This can lead to two different behaviours: users will either follow the system’s recommendations at any cost or decide not to if they find the decisions unreasonable. Data will then continue to be biased, which creates a sort of deadlock.”

 

Ronan adds: “Algorithms do not produce biases out of nowhere; they reproduce and amplify biases they find in the data they ingest. As a result, we have to pay attention first to the quality of the data we use. Gemma emphasizes that algorithmic auditing is the key to understanding if the algorithm meets the expectations and if it complies with the regulations. The audit does not only cover the technical part and the way the algorithm was coded. It also focuses on how the problem was approached and the means deployed to solve it.”

 

Nourchène explains: “The speaker suggests that before creating a product, computer science engineers and developers need to ask the following questions: Is the product desirable and what is the problem that it tries to solve? Is it acceptable and does it involve users? Is it legal? Finally, does it use the right data? Gemma also suggests that ethics be taught in engineering schools. I totally agree with that because nowadays technology does not always seek to solve real problems; its goal is rather to make a fortune out of the proposed product.”

 

Max Welling – Amortized and Neural Augmented Inference

Gianmarco says: “My favourite talk was the one held by Max Welling. It clearly showed and unified the underlying theoretical grounds of many superficially different models, without failing to provide real-world applications. More concretely, the talk showed how to develop hybrid amortized methods that combine classical learning, inference and optimization algorithms with learned neural networks, which is of strong interest, especially in physics-related fields.

It provided a comprehensive and complete exposition of the topic of amortized neural inference and, as a consequence, it did not fail in bringing the spectator up-to-date with applications in that regard. Max Welling presented how a learned neural network can augment or correct a classical solution (attained by means of expert-knowledge or classical equations), or reversely, how a neural network can be fed useful information computed by a classical method.”

 

Been Kim – Interpretability for everyone

Gianmarco says: “I was exposed to many new topics and applications I was not familiar with. Talks like Interpretability for everyone that offered more abstract research were the ones that caught my attention the most. The talk presented the latest discoveries and tools in terms of interpretability quantification. It also introduced how to extract interpretability from a black-box end-to-end model, which I find very important for the construction of more robust models and model diagnosis.”

 

Doina Precup – Building Knowledge For AI Agents With Reinforcement Learning

Ronan says: “I really liked the talk given by Doina Precup on how to build knowledge in the field of reinforcement learning. I had only a little knowledge of this field. Thankfully, Doina quickly introduced us to the key concepts of reinforcement learning. She also presented some big successes of RL and different RL mechanisms, then moved on to the problem of using existing knowledge to build a life-long learning agent. Doina concluded her talk with a lot of open and inspiring questions: How can we exploit previously learned knowledge and apply it to new environments not related in any manner to the previous ones? How well is an agent preserving and enhancing its knowledge? These questions might not have definitive answers, or any answers at all, but I found the questions she raised on how we can represent knowledge very relevant and interesting.”

 

Stephan Günnemann – Certifiable Robustness of ML Models for Graphs

Ronan says: “In this technical talk, Stephan presented different methods to assess GNN robustness. To certify the robustness of a GNN, its sensitivity to perturbations needs to be evaluated. For example, you can search for a worst-case scenario and verify that the margin is positive to ensure the model is robust. Stephan’s talk was very pleasant to listen to, as he accompanied it with several examples and applications of the methods he presented. Finally, he concluded that ML models for graphs aren’t reliable but that we can apply certificates and robustification principles to provide guarantees for a reliable use of GNNs.”
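
The worst-case margin idea can be illustrated on a much simpler model than a GNN. The sketch below is an analogue for intuition only, not one of the methods from the talk: for a linear binary classifier and feature perturbations bounded by `epsilon` in the L-infinity norm, the worst-case margin has a closed form, and a positive value certifies the prediction. All names and numbers are illustrative assumptions.

```python
def worst_case_margin(weights, bias, x, label, epsilon):
    """Smallest signed margin of a linear classifier over all perturbations
    delta with max(|delta_i|) <= epsilon. label must be +1 or -1."""
    clean_margin = label * (sum(w * xi for w, xi in zip(weights, x)) + bias)
    # The adversary shifts each feature by epsilon against the sign of
    # label * w_i, which lowers the margin by epsilon * sum(|w_i|).
    return clean_margin - epsilon * sum(abs(w) for w in weights)


def is_certified(weights, bias, x, label, epsilon):
    """True if no allowed perturbation can flip the prediction."""
    return worst_case_margin(weights, bias, x, label, epsilon) > 0


# Made-up classifier and input point.
weights, bias = [2.0, -1.0], 0.5
x, label = [1.0, 0.5], +1

small_budget = is_certified(weights, bias, x, label, epsilon=0.5)
large_budget = is_certified(weights, bias, x, label, epsilon=1.0)
```

With the small perturbation budget the margin stays positive, so the prediction is certified; with the larger budget the worst-case margin becomes negative and no guarantee can be given. Certificates for graph models follow the same logic, but the worst case over graph perturbations has to be bounded rather than computed in closed form.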

 

Watch the talks: 

If you wish to catch up on talks we mentioned or those you missed, all the sessions, paper and presentation recordings are available (for a limited time) from the ECML website.

Gemma Galdon-Clavell

Max Welling

Been Kim

Doina Precup

 

Stephan Günnemann 

Activity

05-11-2020

ECML 2020 – A Summary

A few weeks ago, the biggest European conference on machine learning was held: ECML 2020. Our research engineer Nourchène, our R&D consultant Gianmarco, and our data scientist Ronan attended the event from Tunisia, Belgium and Marseille. What were the big trends and their favourite talks? What did they think of the online remote format? Let’s find out with them!

 

The Big Trends

Overall, the conference was well up to date with the outside world’s latest trends and needs. Gianmarco explains: “The conference was rich in presentations which covered nearly all possible topics in machine learning. However, I had the impression that Graph Neural Networks and Generative Models had a little more presence than other models. Transfer learning was another topic that seemed to be very relevant throughout the conference.”

 

Remote Format For The First Time

Due to the COVID-19 pandemic, the conference was fully virtual. The talks were pre-recorded and made available prior to the conference. The live sessions were dedicated to questions and answers, with a very brief presentation at the beginning of the session. 

Nourchène explains: “The downside was that we had to watch the whole presentation beforehand, otherwise it was difficult to follow the discussion and to interact with the speaker. Fun fact: there was a session where even the moderator was not aware of this Q&A aspect and asked the speaker why the presentation was so short! The good thing is that, since the presentations were pre-recorded, it was possible to watch the presentations from sessions running in parallel.”

Gianmarco adds: “I have not attended many remote conferences in my life, but I was genuinely surprised to see how well-organised this one was. The remote framework was very well-designed, the web interface was fully functional, and they took advantage of all the benefits that a remote event can have, like re-watchable presentations.”

Kudos to the organising committee for pulling it off!

 

The Keynotes

We wrote an article with more details about different keynotes that you can find on this link, but here is a teaser: 

Gemma Galdon-Clavell – Algorithmic Auditing: how to open the black-box of ML

In her talk, Gemma points out the importance of data used to train a machine learning model. According to her, algorithmic auditing is the key to understanding if the algorithm meets the expectations and if it complies with the regulations. This audit does not only cover the technical part and the way the algorithm was coded. It also focuses on how the problem was approached and the means deployed to solve it. Read our detailed review here.

 

Max Welling – Amortized and Neural Augmented Inference

The talk showed and unified the underlying theoretical grounds of many superficially different models, without failing to provide real-world applications. It provided a comprehensive and complete exposition of the topic of amortized neural inference and, as a consequence, it did not fail in bringing the spectator up-to-date with applications in that regard. Read more here.

 

Been Kim – Interpretability for everyone

The talk presented the latest discoveries and tools in terms of interpretability quantification. It also introduced how to extract interpretability from a black-box end-to-end model. Read more in our article.

 

Doina Precup – Building Knowledge For AI Agents With Reinforcement Learning

Doina Precup talked about how to build knowledge in the field of reinforcement learning. She also presented some big successes of RL and different RL mechanisms, then moved on to the problem of using existing knowledge to build a life-long learning agent. Discover more!

 

Stephan Günnemann – Certifiable Robustness of ML Models for Graphs

Stephan presented different methods to assess GNN robustness, which requires evaluating the model’s sensitivity to perturbations. Learn more with Ronan here.

 

Interesting Paper?

Si-An Chen; Voot Tangkaratt; Hsuan-Tien Lin; Masashi Sugiyama – Active deep Q-learning with demonstration

Nourchène says: “The authors presented their paper proposing different groups of techniques for learning from demonstration in Reinforcement Learning, like RL Expert Demonstration (RLED) or Active RL Demonstration (ARLD). These techniques can be used to speed up the learning process of an RL agent. They also propose an uncertainty-based query strategy named Active Deep Q-Network, based on DQN, to dynamically estimate the uncertainty of recent states and use the queried demonstration data.”

 

Favourite tutorial

Learning With Imbalanced Domains and Rare Event Detection

Ronan says: “This tutorial was interesting and well-structured. Imbalanced domains and rare-event prediction concern a lot of domains (financial, medical, data distribution…) and will always remain a centre of attention in designing the appropriate solution to a problem. As a consequence, they will remain a core problem in research. I particularly liked this tutorial as it covered and compared a lot of different approaches: unsupervised (statistical-based, proximity-based, clustering-based), supervised and semi-supervised. As there is no ideal solution that can be applied to every problem, you have to know what exists before choosing the one that best fits your problem. The tutorial also covered different methods to properly evaluate the performance of an algorithm on an imbalanced task.”

 

Conclusion

The conference provided a wide range of machine learning topics in the form of presentations about the latest trends, technologies and applications. As Nourchène says: “It is an optimal platform to stay up-to-date, to widen one’s perspectives and/or dig deeper into a specific topic.”

 

Watch the talks: 

If you wish to catch up on talks we mentioned or those you missed, all the sessions, paper and presentation recordings are available (for a limited time) from the ECML website.

 

Gemma Galdon-Clavell

 

Max Welling 

 

Been Kim

 

Doina Precup

 

Stephan Günnemann

 

Active deep Q-learning with demonstration: Read the paper 

Activity

30-10-2020

Our engineer Amine Ghrab presented his PhD public defense on the BI on Graph Project

Last Thursday, our engineer Amine Ghrab presented the BI on Graph project during his PhD public defense. Amine did an amazing job at the edge between Industry & Academia. Amine’s thesis was done in collaboration with the CODE/WIT Lab of the Université Libre de Bruxelles and the Universitat Politècnica de Catalunya, with the support of Prof. Oscar Romero & Prof. Esteban Zimanyi!

In his PhD thesis, Amine defined how BI environments can be enriched with graph data structures. Over the past decade, business and social environments have become increasingly complex and interconnected. As a result, graphs have emerged as a widespread abstraction tool at the core of the information infrastructure that supports these environments. In particular, the integration of graphs into data warehouse systems has appeared as a way to extend current information systems with graph management and analysis capabilities. Going forward, Amine redefined the concept of the multidimensional cube on graphs and showed how it can open new doors for data analysts. Finally, he showed how a graph data warehouse architecture can be defined.

Congratulations on your achievements!


You can find below a list of related publications:

Activity

24-08-2020

Privacy Policy Classification with XLNet

The popularisation of privacy policies has become an attractive subject of research in recent years, notably after the General Data Protection Regulation came into force in the European Union. While GDPR gives Data Subjects more rights and control over the use of their personal data, the length and complexity of privacy policies can still prevent them from exercising those rights. An accepted way to improve the interpretability of privacy policies is through assigning understandable categories to every paragraph or segment in said documents. The current state of the art in privacy policy analysis has established a baseline in multi-label classification on the dataset containing 115 privacy policies, using BERT Transformers. In this paper, we propose a new classification model based on XLNet. Trained on the same dataset, our model improves the baseline F1 macro and micro averages by 1-3% for both majority vote and union-based gold standards. Moreover, the results reported by our XLNet-based model have been achieved without fine-tuning on domain-specific data, which reduces the training time and complexity, compared to the BERT-based model. To make our method reproducible, we report our hyper-parameters and provide access to all used resources, including code. This work may, therefore, be considered as a first step to establishing a new baseline for privacy policy classification.
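
For readers unfamiliar with the two averages mentioned above, here is a minimal sketch of how micro and macro F1 are computed in a multi-label setting. It is not the paper's evaluation code: the category names and the toy annotations are made up, and returning 0 for a label with no true or predicted occurrences is an assumed convention.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(true_sets, predicted_sets, labels):
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for truth, predicted in zip(true_sets, predicted_sets):
        for label in labels:
            if label in predicted and label in truth:
                counts[label][0] += 1
            elif label in predicted:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    return micro, macro


# Hypothetical categories and annotations for four policy segments.
labels = ["collection", "sharing", "retention"]
truth = [{"collection"}, {"collection", "sharing"}, {"retention"}, {"collection"}]
predicted = [{"collection"}, {"collection"}, {"sharing"}, {"collection", "retention"}]
micro, macro = micro_macro_f1(truth, predicted, labels)
```

In this toy case the frequent label is predicted perfectly and the two rare labels are always missed, so the micro average (0.6) looks much better than the macro average (about 0.33). Reporting both, as the paper does, shows whether gains come from common or rare categories.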

Majd Mustapha, Katsiaryna Krasnashchok, Anas Al Bassit and Sabri Skhiri, Privacy Policy Classification with XLNet, Proc. of the 15th DPM International Workshop on Data Privacy Management, Surrey, UK, 2020.

Activity

21-08-2020

Towards Privacy Policy Conceptual Modeling

After GDPR enforcement in May 2018, the problem of implementing privacy by design and staying compliant with regulations has been more prominent than ever for businesses of all sizes, which is evident from frequent cases against companies and significant fines paid due to non-compliance. Consequently, numerous research works have been emerging in this area. Yet, to date, no publicly available model can offer a comprehensive representation of privacy policies written in natural language that is machine-readable, interoperable and suitable for automatic compliance checking. Meanwhile, privacy policies remain one of the main means of communication between a business (Data Controller) and a Data Subject when it comes to the use of personal data. In this paper, we propose a conceptual model for fine-grained representation of privacy policies. We reuse and adapt existing Semantic Web resources in the spirit of interoperability. We represent our model as an ODRL profile and demonstrate how existing privacy policies can be translated into ODRL-like policies, consisting of deontic rules. We enrich our model with vocabularies for describing personal data processing in great detail, making it suitable for further usage in downstream applications, such as access control tools, to support adoption and implementation of privacy by design. We also demonstrate our model’s capability of handling personal data processing rules in other types of documents, namely data processing agreements, essential for controlling data privacy in a relationship between a Controller and a Processor.

The paper is available online on Springer. Unfortunately, it is currently available only to subscribers, but do not hesitate to reach out to us for more information!

Krasnashchok K., Mustapha M., Al Bassit A., Skhiri S. Towards Privacy Policy Conceptual Modeling. In Dobbie G., Frank U., Kappel G., Liddle S.W., Mayr H.C. (eds), Proc. of the 39th International Conference on Conceptual Modeling, LNCS 12400, 2020. Springer, Cham.

DOI: https://doi.org/10.1007/978-3-030-62522-1_32

Activity

30-03-2020

TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes

Graphs are a fundamental structure that provides an intuitive abstraction for modelling and analyzing complex and highly interconnected data. Given the potential complexity of such data, some approaches proposed extending decision-support systems with multidimensional analysis capabilities over graphs. In this paper, we introduce TopoGraph, an end-to-end framework for building and analyzing graph cubes. TopoGraph extends the existing graph cube models by defining new types of dimensions and measures and organizing them within a multidimensional space that guarantees multidimensional integrity constraints. This results in defining three new types of graph cubes: property graph cubes, topological graph cubes, and graph-structured cubes. Afterwards, we define the algebraic OLAP operations for such novel cubes. We implement and experimentally validate TopoGraph with different types of real-world datasets.
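
To give a flavour of what aggregating a graph along a dimension means, here is a minimal Python sketch of a graph-cube style roll-up: nodes are grouped by one attribute, and the edges between the resulting groups are counted. It is a simplified illustration of the general idea only, not TopoGraph's model or API; the toy graph, the attribute names and the function `roll_up` are hypothetical, and edges are assumed to be undirected.

```python
from collections import Counter

# Toy attributed graph: node -> attributes (the candidate dimensions).
nodes = {
    "alice": {"city": "Brussels", "role": "analyst"},
    "bob": {"city": "Brussels", "role": "engineer"},
    "carol": {"city": "Paris", "role": "analyst"},
    "dave": {"city": "Paris", "role": "engineer"},
}
# Undirected edges between nodes (an assumption of this sketch).
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")]


def roll_up(nodes, edges, dimension):
    """Aggregate the graph along one dimension.

    Returns two count measures:
      - number of nodes per dimension value,
      - number of edges per (unordered) pair of dimension values.
    """
    node_counts = Counter(attrs[dimension] for attrs in nodes.values())
    edge_counts = Counter(
        tuple(sorted((nodes[u][dimension], nodes[v][dimension])))
        for u, v in edges
    )
    return node_counts, edge_counts


# The same graph viewed along two different dimensions.
city_nodes, city_edges = roll_up(nodes, edges, "city")
role_nodes, role_edges = roll_up(nodes, edges, "role")
print(dict(city_nodes), dict(city_edges))
print(dict(role_nodes), dict(role_edges))
```

Each choice of dimension yields a different aggregated graph over the same data, which is the kind of view a multidimensional analysis over graphs lets an analyst switch between.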

The paper will soon be published in Information Systems Frontiers and is already available online on Springer. Unfortunately, it is currently available only to subscribers, but do not hesitate to reach out to us for more information!

Amine Ghrab, Oscar Romero, Sabri Skhiri, Esteban Zimányi, TopoGraph: an End-To-End Framework to Build and Analyze Graph Cubes, published in Information Systems Frontiers (2020).
