Sviluppo e test di software di classificazione di immagini

Tosin, Elia (2022) Sviluppo e test di software di classificazione di immagini. Bachelor thesis, Scuola universitaria professionale della Svizzera italiana.

Abstract

This work aims to build a system that recognizes car logos. A logo uniquely identifies the manufacturer, which makes it a useful reference point for identifying a vehicle. To do this we use a neural network, that is, a composition of mathematical functions that processes input data and produces results. Specifically, the classifier is built on a convolutional neural network, an architecture particularly well suited to this task. As the baseline architecture I used a ResNet18, and the training images were taken from a public dataset covering eight manufacturers. Variants of these 'standard' images were added through augmentation. The modifications were chosen to make the source images closer to what a generic camera shot would produce, assuming that the logo may be rotated, that the lighting may not be ideal, or that the logo may not be centered. Further augmentations, such as noise and mirroring, were applied to make the network more robust to possible errors. After 25 epochs of training, we ran tests on the dataset and measured accuracy. These accuracy values were compared across different architectures (LeNet 5) and variants of the same ResNet; the best model obtained was selected, and it achieved an average accuracy of 55%. With this model, realistic tests were carried out on recorded and real-time video, framing the logos and displaying a bar chart on screen that indicates the network's choice, with good results.
During these tests, reflections proved particularly disruptive to prediction. One possible improvement is therefore to add images with reflections to the dataset, or to generate new ones with suitable augmentations that modify the brightness and simulate a reflection by coloring the logo. The dataset could also be broadened with additional manufacturers, so that the system is not limited to these eight brands.
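
As an illustration of the augmentation step described in the abstract, the following is a minimal sketch in plain Python, not the thesis code. It assumes a grayscale image stored as a list of rows with pixel values in 0..255, and every function name and parameter range here (mirror, adjust_brightness, shift, add_noise, augment, the 0.7 to 1.3 brightness range, the one-pixel shift) is hypothetical. Rotation by an arbitrary angle is left out because it needs interpolation.

```python
import random

def mirror(img):
    """Flip the image horizontally (mirroring)."""
    return [row[::-1] for row in img]

def adjust_brightness(img, factor):
    """Scale every pixel by `factor`, clamping the result to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def shift(img, dx, dy, fill=0):
    """Translate by (dx, dy) pixels so the logo is no longer centered.
    Pixels shifted in from outside the frame take the value `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

def augment(img, rng):
    """Produce one random variant of `img` (assumed parameter ranges)."""
    out = mirror(img) if rng.random() < 0.5 else img
    out = adjust_brightness(out, rng.uniform(0.7, 1.3))
    out = shift(out, rng.randint(-1, 1), rng.randint(-1, 1))
    return add_noise(out, sigma=5.0, rng=rng)

# Usage: generate one variant of a tiny 2x3 test image.
rng = random.Random(0)
logo = [[0, 50, 100], [150, 200, 250]]
variant = augment(logo, rng)
```

In practice a project like this would more likely rely on an image library's transforms; the sketch only shows what each named modification does to the pixel data.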

Item Type:	Thesis (Bachelor)
Course:	UNSPECIFIED
Supervisors:	Giusti, Alessandro and Chavez-Garcia, Ricardo Omar
Subjects:	Informatica
Divisions:	Dipartimento tecnologie innovative > Bachelor in Ingegneria informatica
URI:	http://tesi.supsi.ch/id/eprint/4613
