Opportunities Abroad
Impact of novel network protocols (QUIC, MASQUE and HTTP/3): @École Normale Supérieure de Lyon, more details here
Other topics include:
Performance Evaluation of Admission Policies for Edge Compute Systems
Stressing systems security through on-the-fly network traffic generation using generative models
Measurements of the impact on performance of Apple Private Relay on mobile users
Video streaming quality inference at 100 Gbps using subsets of flows
Towards Scalable and Robust Solutions for Complex Distributed Systems: Machine Learning, Coordination Semantics, and Infrastructure Challenges: @TU Wien, more details here and here
Other topics include:
ML for predictive/proactive scaling in the cloud
Edge ML
Serverless Computing
Security for distributed systems
Fairness, accountability and transparency in ML
Novel Algorithms for Full-Stack Observability: @Cisco Systems, more details here
Applications of Large Language Models to solve complex reasoning tasks: @NEC Labs (Heidelberg), more details here and here
Other topics include:
Development of advanced interfaces for cyber threat intelligence leveraging generative AI and advanced machine learning techniques for data visualisation and analysis
Application of AI agent technologies to the investigation of cybersecurity incidents and intelligence or employment of AI Agents for red team and penetration testing activities
Development of algorithmic and optimisation techniques for machine learning frameworks, such as PyTorch
Thesis Proposals
Network Trace Compression for ML-based Traffic Classification
Knowing domain names associated with traffic allows eavesdroppers to profile users without accessing packet payloads. Encrypting domain names transiting the network is, therefore, a key step to increase network confidentiality. The latest efforts include encrypting the TLS Server Name Indication (Encrypted Client Hello extension) and encrypting DNS traffic, with DNS over HTTPS (DoH) representing a prominent proposal.
Nevertheless, recent work shows that, using simple features and off-the-shelf machine learning models, a network administrator or even an attacker can uncover the domain names visited by users relying on ECH (or its predecessor, eSNI) or DoH. In practice, the biggest challenge is storing and processing large-scale network traces, whose volume can easily reach several GB per day even for a medium-sized organization.
The goal of the thesis is to design, implement, and evaluate compression techniques that reduce the size of network traces while preserving good accuracy in the domain classification task. The thesis will leverage operational per-TCP-connection log files, which include rich features such as packet size and timing. Lossy techniques based on quantization will be explored as a first choice, but more advanced approaches based on neural network (NN) models can also be considered.
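To give a feeling for the quantization idea, here is a minimal sketch and not the project's actual method. It assumes packet sizes in bytes within a hypothetical 0-1500 range and a 4-bit budget per value; the function names (`quantize`, `dequantize`, `pack_nibbles`) are illustrative only.

```python
def quantize(values, lo, hi, bits):
    """Uniformly map each value in [lo, hi] to an integer code of `bits` bits."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)  # out-of-range values saturate
        codes.append(round((clipped - lo) / (hi - lo) * levels))
    return codes


def dequantize(codes, lo, hi, bits):
    """Approximate the original values from their codes (lossy)."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


def pack_nibbles(codes):
    """Pack 4-bit codes two per byte, padding with a zero nibble if needed."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


# Hypothetical packet sizes (bytes) of one TCP connection.
sizes = [40, 1500, 52, 1460, 600]
codes = quantize(sizes, lo=0, hi=1500, bits=4)
packed = pack_nibbles(codes)
approx = dequantize(codes, lo=0, hi=1500, bits=4)
print(codes, len(packed), approx)
```

With 4 bits the step is 100 bytes, so each reconstructed size is off by at most 50 bytes. Whether a classifier tolerates that error, and how many bits each feature really needs, is exactly the trade-off the thesis would measure.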
Prerequisites:
Python programming
Machine learning essentials
Internet Protocol Stack
References:
Trevisan, M., Soro, F., Mellia, M., Drago, I., & Morla, R. (2020). Does domain name encryption increase users' privacy?. ACM SIGCOMM Computer Communication Review, 50(3), 16-22.
Trevisan, M., Soro, F., Mellia, M., Drago, I., & Morla, R. (2023). Attacking DoH and ECH: Does Server Name Encryption Protect Users’ Privacy?. ACM Transactions on Internet Technology, 23(1), 1-22.
Project details: part of the COMPACT PRIN project and in collaboration with the PoliMI AntLab Research Group
Clustering webpages for realistic experiments on the Internet
Experimenting with networked systems is fundamental for developing novel techniques, assessing the impact of design choices, and improving users' Quality of Experience. Testing the Web is typically done using lists of popular websites, e.g., the Alexa rank (https://www.alexa.com/topsites), which, however, only offer the homepages of the target websites. This is a strong limitation, as websites are known to have a diverse webpage structure depending, for example, on the subsections in which content is organized.
The goal of this thesis is to develop a system able to select a subset of the pages of a website so that they are representative of the diversity of its internal structure. To this end, it is necessary to leverage Data Science and Machine Learning techniques, clustering above all, to group together similar pages and choose the right representatives, in the right number. Using open datasets, and collecting additional data if needed, the student will apply Machine Learning tools to achieve this goal, using Big Data approaches if the dataset becomes large.
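As a rough illustration of "cluster, then pick representatives", the sketch below runs a plain k-means and returns, for each cluster, the page closest to the centroid. It rests on several assumptions: each page is already described by a numeric feature vector (here two made-up features), the number of clusters `k` is given, and the URLs are hypothetical. A real system would need feature scaling and a principled way to choose `k`.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def representatives(urls, points, labels, centroids):
    """For each non-empty cluster, return the URL nearest to its centroid."""
    reps = {}
    for c, centre in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: sq_dist(points[i], centre))
            reps[c] = urls[best]
    return reps


# Hypothetical pages with two made-up features each.
urls = ["/home", "/about", "/contact", "/products/a", "/products/b", "/products/c"]
feats = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [10.0, 10.0], [10.5, 9.8], [9.7, 10.2]]
labels, centroids = kmeans(feats, k=2)
print(representatives(urls, feats, labels, centroids))
```

Picking the member nearest to the centroid guarantees that every representative is a real page of the site, which is what an experiment ultimately needs to visit.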
Thesis Fast Track
A Bachelor Fast Track Thesis consists of producing an extended summary of a scientific article published in a high-quality international conference or journal. A description of what a Fast Track Thesis means can be found here.
The available articles are:
Griffioen, H., Koursiounis, G., Smaragdakis, G., & Doerr, C. (2024). Have you SYN me? Characterizing Ten Years of Internet Scanning. In 2024 ACM Internet Measurement Conference.
Dong, H., Zhang, Y., Lee, H., Huque, S., & Sun, Y. (2024). Deciphering the Digital Veil: Exploring the Ecosystem of DNS HTTPS Resource Records. In 2024 ACM Internet Measurement Conference.
Sun, D., Chen, J. Q., Gong, C., Wang, T., & Li, Z. (2024). NetDPSyn: Synthesizing Network Traces under Differential Privacy. In 2024 ACM Internet Measurement Conference.
Lin, M., Lin, S., Wu, H., Wang, K., & Yang, X. (2024). Browsing without Third-Party Cookies: What Do You See?. In 2024 ACM Internet Measurement Conference.