Opportunities Abroad

Thesis Proposals

Network Trace Compression for ML-based Traffic Classification

Knowing domain names associated with traffic allows eavesdroppers to profile users without accessing packet payloads. Encrypting domain names transiting the network is, therefore, a key step to increase network confidentiality. The latest efforts include encrypting the TLS Server Name Indication (Encrypted Client Hello extension) and encrypting DNS traffic, with DNS over HTTPS (DoH) representing a prominent proposal.

Nevertheless, recent work shows that, using simple features and off-the-shelf machine learning models, a network administrator or even an attacker can uncover the domain names contacted by users relying on encrypted SNI or DoH. The biggest challenge, however, is storing and processing large-scale network traces, whose volume can easily reach several GB per day even for a mid-sized organization.

The goal of the thesis is to design, implement, and evaluate compression techniques that reduce the size of network traces while preserving good accuracy in the domain classification task. The thesis will leverage operational per-TCP-connection log files, which include rich features such as packet sizes and timing. Lossy techniques based on quantization will be explored first, but more advanced approaches based on neural network (NN) models can also be considered.
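As an illustration of the quantization idea, the following minimal sketch applies uniform quantization to a list of packet sizes. The sample values, the 64-byte step, and the 1500-byte maximum packet size are assumptions made for this example, not parameters of the project; the function names are hypothetical.

```python
import math

def quantize_uniform(values, step):
    """Lossy step: map each value to the index of its bin of width `step`."""
    return [int(v // step) for v in values]

def dequantize_uniform(indices, step):
    """Approximate reconstruction: use the centre of each bin."""
    return [(i + 0.5) * step for i in indices]

def bits_per_value(max_value, step):
    """Bits of a fixed-length code covering every bin from 0 to max_value."""
    n_bins = max_value // step + 1
    return math.ceil(math.log2(n_bins))

# Hypothetical packet sizes (bytes) of one TCP connection.
packet_sizes = [52, 1500, 1448, 60, 583, 1500]
STEP = 64          # assumed bin width in bytes
MAX_SIZE = 1500    # assumed maximum packet size in bytes

indices = quantize_uniform(packet_sizes, STEP)
restored = dequantize_uniform(indices, STEP)
max_error = max(abs(a - b) for a, b in zip(packet_sizes, restored))

print(indices)                                   # [0, 23, 22, 0, 9, 23]
print(max_error <= STEP / 2)                     # True
print(bits_per_value(MAX_SIZE, 1), "->", bits_per_value(MAX_SIZE, STEP))  # 11 -> 5
```

A larger step gives a smaller representation but a coarser feature; whether the classifier tolerates that loss is exactly the trade-off the thesis would have to measure.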



Project details: part of the COMPACT PRIN project and in collaboration with the PoliMI AntLab Research Group

Clustering webpages for realistic experiments on the Internet 

Experimenting with networked systems is fundamental for developing novel techniques, assessing the impact of design choices, and improving users' Quality of Experience. Testing the Web is typically done using lists of popular websites, e.g., the Alexa rank (https://www.alexa.com/topsites), which, however, only provide the homepages of the target websites. This is a strong limitation, as websites are known to have a diverse webpage structure depending, for example, on the subsections in which content is organized. The goal of this thesis is to develop a system able to select a subset of the pages of a website so that they are representative of the diversity of its internal structure. To this end, it is necessary to leverage Data Science and Machine Learning techniques, clustering above all, to group similar pages together and choose the right representatives, and the right number of them. Using open datasets, and collecting additional data if needed, the student will apply Machine Learning tools to achieve this goal, resorting to Big Data approaches if the dataset becomes large.
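The selection step can be sketched as follows: pages are described by feature vectors, grouped with a small k-means, and the page closest to each cluster centre is kept as its representative. This is only a sketch under stated assumptions: the page paths, the two features (number of images, number of scripts), and the fixed number of clusters k=2 are hypothetical, and choosing the right number of clusters is left open.

```python
import math

def farthest_first(points, k):
    """Deterministic initialisation: start from the first point, then
    repeatedly add the point farthest from the centres chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Label each point with the index of its nearest centre."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; stops early once the centres stop moving."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[i])  # keep the centre of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def pick_representatives(pages, k):
    """Return, for each non-empty cluster, the page nearest to its centre."""
    urls = list(pages)
    points = [pages[u] for u in urls]
    centroids, labels = kmeans(points, k)
    representatives = []
    for i, centre in enumerate(centroids):
        members = [u for u, label in zip(urls, labels) if label == i]
        if members:
            representatives.append(
                min(members, key=lambda u: math.dist(pages[u], centre))
            )
    return representatives

# Hypothetical pages of one website: (number of images, number of scripts).
pages = {
    "/news/a": (10.0, 4.0),
    "/news/b": (12.0, 6.0),
    "/news/c": (11.0, 5.0),
    "/shop/x": (50.0, 30.0),
    "/shop/y": (52.0, 31.0),
    "/shop/z": (54.0, 29.0),
}

representatives = pick_representatives(pages, k=2)
print(representatives)  # ['/news/c', '/shop/y']
```

In practice the features would need to be normalised so that no single one dominates the distance, and a library implementation would likely replace this hand-written loop on larger datasets.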

Thesis Fast Track

A Bachelor Fast Track Thesis consists of producing an extended summary of a scientific article published in a high-quality international conference or journal. A description of what a Fast Track Thesis means can be found here.

The available articles are: