
Thesis Proposals

Compressed Representation of Darknet Traffic for Cybersecurity

Darknets are distinctive subnetworks on the Internet that host no devices and act as passive observers, recording every packet they receive. Because this traffic is unsolicited by definition, darknets work as "network telescopes" that offer insight into cybersecurity events such as network scans and exploit attempts.

The sheer volume of data collected by darknets makes it challenging to store and process. At the same time, the data often exhibits repetition and similarity, which suggests that it can be effectively compressed or summarized.
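As a rough illustration of why repetition helps, the sketch below collapses identical packet records into counts. The `Packet` fields, the synthetic trace, and the ratio metric are assumptions made for this example only; they are not the method proposed in the thesis, which targets learned representations.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; real traces carry many more fields."""
    src_ip: str
    dst_port: int
    protocol: str


def summarize(packets: Iterable[Packet]) -> Counter:
    """Collapse identical (src_ip, dst_port, protocol) records into counts."""
    return Counter(packets)


def compression_ratio(packets: List[Packet]) -> float:
    """Raw record count divided by the number of distinct summary rows."""
    summary = summarize(packets)
    return len(packets) / len(summary) if summary else 1.0


# Synthetic trace: one source probing port 23 repeatedly, plus two other probes.
trace = [Packet("198.51.100.7", 23, "tcp")] * 8 + [
    Packet("203.0.113.9", 445, "tcp"),
    Packet("203.0.113.9", 53, "udp"),
]
summary = summarize(trace)
ratio = compression_ratio(trace)
```

Here ten raw records reduce to three summary rows. Exact-match counting only captures verbatim repetition; capturing similarity between non-identical packets is where learned representations would come in.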

The primary goal of this thesis is to design and implement advanced algorithms that extract compressed representations from darknet traffic. By leveraging recent advances in machine learning, particularly Autoencoders, Generative Adversarial Networks, and Diffusion Models, the thesis aims to obtain compressed yet informative representations. These representations can be used for compression or as generative modules.

The study will focus on diverse datasets obtained from medium-sized operational darknets deployed in different countries. These datasets, spanning multiple years, will be processed using Big Data techniques and infrastructures due to their size.

Clustering Webpages for Realistic Experiments on the Internet

Experimenting with networked systems is fundamental for developing novel techniques, assessing the impact of design choices, and improving users' Quality of Experience. Testing the Web is typically done using lists of popular websites, e.g., the Alexa rank (https://www.alexa.com/topsites), which however offer only the homepages of the target websites. This is a strong limitation, as websites are known to have a diverse webpage structure that depends, for example, on the subsections in which content is organized. The goal of this thesis is to develop a system able to select a subset of the pages of a website so that they are representative of the diversity of its internal structure. To this end, it is necessary to leverage Data Science and Machine Learning techniques, clustering above all, to group similar pages together and choose the right representatives (and the right number of them). Using open datasets, and collecting additional data if needed, the student will apply Machine Learning tools to achieve this goal, using Big Data approaches if the dataset becomes large.
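The idea of grouping similar pages and picking one representative per group can be sketched as follows. This is a minimal illustration under stated assumptions: each page is described by a hypothetical set of structural features, similarity is Jaccard similarity, clustering is a simple leader-based pass, and the threshold of 0.6 is arbitrary. The thesis itself may well use different features and algorithms.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity between two feature sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clustering(pages: Dict[str, FrozenSet[str]], threshold: float) -> List[List[str]]:
    """Put each page in the first cluster whose leader is similar enough,
    otherwise start a new cluster with the page as its leader."""
    clusters: List[List[str]] = []
    for url, features in pages.items():
        for cluster in clusters:
            if jaccard(features, pages[cluster[0]]) >= threshold:
                cluster.append(url)
                break
        else:
            clusters.append([url])
    return clusters


def medoid(cluster: List[str], pages: Dict[str, FrozenSet[str]]) -> str:
    """Return the page with the highest total similarity to its cluster."""
    return max(cluster, key=lambda u: sum(jaccard(pages[u], pages[v]) for v in cluster))


# Hypothetical pages of one website, described by the page blocks they contain.
pages = {
    "/": frozenset({"nav", "hero", "footer"}),
    "/news/1": frozenset({"nav", "article", "comments", "footer"}),
    "/news/2": frozenset({"nav", "article", "comments", "video", "footer"}),
    "/news/3": frozenset({"nav", "article", "footer"}),
    "/shop/a": frozenset({"nav", "product", "cart", "footer"}),
    "/shop/b": frozenset({"nav", "product", "cart", "reviews", "footer"}),
}
clusters = leader_clustering(pages, threshold=0.6)
representatives = [medoid(c, pages) for c in clusters]
```

With this toy input the homepage, the news pages, and the shop pages end up in three separate clusters, so testing only the homepage would miss two of the three page structures. The number of clusters follows from the threshold here, which is one simple way to address the "right number of representatives" question.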

Analysis and Correlation of Behaviour on Online Social Networks

Online social networks, such as Instagram and Facebook, allow users to interact and debate with each other. In this research, the candidate will collect large quantities of data from social networks and from public repositories such as Wikidata. The data will be organized and analyzed using big data tools (such as PySpark). Then, the student will characterize the behaviour of different classes of users on the social network, with classes defined by attributes such as nationality, activity, language, and age. The student will analyze possible biases across these categories and how they change over time. The student may also use machine learning techniques, forecasting methods, and graph analysis.
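Characterizing user classes boils down to grouping records by an attribute and comparing statistics across the groups. The sketch below shows that step in plain Python as a stand-in for a big data framework; the record layout, the user IDs, and the activity values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical records: (user_id, language, posts_per_week).
Record = Tuple[str, str, int]


def mean_activity_by_class(rows: Iterable[Record]) -> Dict[str, float]:
    """Group users by class (here: language) and average their activity."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for _user_id, user_class, posts in rows:
        groups[user_class].append(posts)
    return {user_class: mean(values) for user_class, values in groups.items()}


records: List[Record] = [
    ("u1", "it", 12),
    ("u2", "it", 4),
    ("u3", "en", 7),
    ("u4", "en", 9),
    ("u5", "fr", 2),
]
activity = mean_activity_by_class(records)
```

On a real dataset the same group-and-aggregate logic would run on a distributed engine, and the small "fr" group in this toy input hints at the kind of class imbalance that the bias analysis would need to account for.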

Thesis Fast Track

A Bachelor Fast Track Thesis consists of producing an extended summary of a scientific article published in a high-quality international conference or journal. A description of what a Fast Track Thesis involves can be found here.

The available articles are: