We propose a multi-modal framework to tackle the SPARK Challenge by classifying satellites from RGB and depth images. The framework is built around auto-encoders that embed the two modalities in a common latent space, exploiting both the redundant and the complementary information carried by the two types of data.
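To illustrate the idea of a shared latent space, the following is a minimal sketch, not the paper's implementation: two hypothetical linear auto-encoders (one per modality) whose loss combines per-modality reconstruction with an alignment term that pulls the two embeddings of the same satellite together. All dimensions and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: flattened RGB patch, depth patch, shared latent code.
d_rgb, d_depth, d_latent = 48, 16, 8

# Hypothetical linear encoder/decoder weights for each modality.
W_enc_rgb = rng.normal(scale=0.1, size=(d_latent, d_rgb))
W_dec_rgb = rng.normal(scale=0.1, size=(d_rgb, d_latent))
W_enc_depth = rng.normal(scale=0.1, size=(d_latent, d_depth))
W_dec_depth = rng.normal(scale=0.1, size=(d_depth, d_latent))

def encode(W, x):
    return W @ x

def decode(W, z):
    return W @ z

# One sample of each modality for the same (synthetic) satellite.
x_rgb = rng.normal(size=d_rgb)
x_depth = rng.normal(size=d_depth)

z_rgb = encode(W_enc_rgb, x_rgb)
z_depth = encode(W_enc_depth, x_depth)

# Per-modality reconstruction losses.
rec_rgb = np.mean((decode(W_dec_rgb, z_rgb) - x_rgb) ** 2)
rec_depth = np.mean((decode(W_dec_depth, z_depth) - x_depth) ** 2)

# Alignment term: embeddings of the same object should coincide,
# which is what makes the latent space "common" to both modalities.
align = np.mean((z_rgb - z_depth) ** 2)

loss = rec_rgb + rec_depth + align
```

In practice the encoders would be convolutional and trained by gradient descent; the key design choice shown here is the single objective coupling both modalities through the alignment term.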
In this paper, we propose an automated framework for multi-view image classification tasks. The framework simultaneously trains a model that learns a common latent representation and performs data imputation, selects the best classifier, and tunes all necessary hyper-parameters.
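The classifier-selection and tuning step can be sketched as a simple validation-based search. This is an illustrative toy, not the proposed framework: the candidate classifiers (k-NN with several k, nearest centroid), the synthetic latent features, and the split sizes are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class latent features standing in for learned embeddings.
z0 = rng.normal(loc=-1.0, size=(60, 8))
z1 = rng.normal(loc=+1.0, size=(60, 8))
Z = np.vstack([z0, z1])
y = np.array([0] * 60 + [1] * 60)

# Hold out a validation set to score each candidate configuration.
idx = rng.permutation(len(y))
train, val = idx[:80], idx[80:]

def knn_predict(k, Ztr, ytr, Zq):
    # Brute-force k-nearest-neighbour vote.
    d = ((Zq[:, None, :] - Ztr[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)

def centroid_predict(Ztr, ytr, Zq):
    # Assign each query to the closer class centroid.
    c0 = Ztr[ytr == 0].mean(0)
    c1 = Ztr[ytr == 1].mean(0)
    return (((Zq - c1) ** 2).sum(1) < ((Zq - c0) ** 2).sum(1)).astype(int)

# Candidate classifiers and hyper-parameter values to search over.
candidates = {
    f"knn_k={k}": (lambda k=k: knn_predict(k, Z[train], y[train], Z[val]))
    for k in (1, 3, 5)
}
candidates["nearest_centroid"] = lambda: centroid_predict(Z[train], y[train], Z[val])

# Score every candidate on the validation split and keep the best.
scores = {name: (pred() == y[val]).mean() for name, pred in candidates.items()}
best = max(scores, key=scores.get)
```

A full AutoML pipeline would add cross-validation and a larger search space, but the selection principle is the same: every classifier/hyper-parameter pair is scored on held-out data and the best is retained.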
In this paper, we propose a framework for image classification, named MIC, that takes multi-view images as input, such as RGB-T image pairs used for surveillance. We combine auto-encoder and generative adversarial network architectures to embed the multiple views in a common latent space.
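One common way to combine an auto-encoder with an adversarial component, sketched below under assumptions of our own (a linear discriminator, random stand-in embeddings), is to have a discriminator guess which view a latent code came from while the encoders are trained to fool it, making the shared space view-agnostic:

```python
import numpy as np

rng = np.random.default_rng(2)
d_latent = 8

# Stand-in embeddings of the same scene from the RGB and thermal views.
z_rgb = rng.normal(size=d_latent)
z_thermal = rng.normal(size=d_latent)

# Hypothetical linear discriminator: predicts P(view == RGB | z).
w = rng.normal(scale=0.1, size=d_latent)
b = 0.0

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

p_rgb = sigmoid(w @ z_rgb + b)
p_thermal = sigmoid(w @ z_thermal + b)

# Discriminator loss: label RGB latents 1 and thermal latents 0.
d_loss = -np.log(p_rgb) - np.log(1.0 - p_thermal)

# Adversarial loss for the encoders: push the discriminator toward
# the wrong answer, so latents carry no view-identity information.
g_loss = -np.log(1.0 - p_rgb) - np.log(p_thermal)
```

Alternating minimization of `d_loss` (discriminator) and `g_loss` plus reconstruction (encoders) is the standard GAN-style training loop this sketch abbreviates.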