We propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., web logs. Our approach is relevant because it simultaneously tackles the lack of publicly available data on web navigation logs and can be adopted in industry for the AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS for websites yet to be deployed.
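To make "generating navigational paths" concrete, here is a minimal sketch of a first-order Markov chain baseline. It counts page-to-page transitions in sample sessions and then samples new sessions from those counts. This is a swapped-in textbook technique, not the method proposed in the post. The function names `fit_transitions` and `generate_session` and the `<start>`/`<end>` markers are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical markers for the beginning and end of a browsing session.
START, END = "<start>", "<end>"


def fit_transitions(sessions):
    """Count page-to-page transitions, including session start and end."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        path = [START] + list(session) + [END]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return counts


def generate_session(counts, rng, max_len=20):
    """Sample one synthetic path by walking the transition counts."""
    page, path = START, []
    while len(path) < max_len:
        targets = list(counts[page].keys())
        weights = list(counts[page].values())
        # Next page is drawn proportionally to how often it followed `page`.
        page = rng.choices(targets, weights=weights, k=1)[0]
        if page == END:
            break
        path.append(page)
    return path
```

A first-order chain only remembers the current page, so it cannot reproduce longer-range patterns found in real sessions. That limitation is why richer sequence models are often considered for realistic log generation.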
Category: big data
Are open source projects governed by rich clubs?
We analyze open source projects to determine whether they exhibit rich-club behavior, that is, a phenomenon in which contributors with many collaborations are likely to cooperate with other well-connected contributors. The presence or absence of a rich club affects the sustainability and robustness of a project. We build and study a dataset of the 100 most popular projects on GitHub.
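The usual way to quantify this behavior is the rich-club coefficient. For a degree threshold k, phi(k) = 2 E_k / (N_k (N_k - 1)), which is the density of edges among the N_k nodes whose degree is greater than k. Below is a minimal sketch in plain Python. It assumes the collaboration network is given as a list of undirected contributor pairs; both that input format and the function name are hypothetical.

```python
from itertools import combinations


def rich_club_coefficient(edges, k):
    """Unnormalized rich-club coefficient phi(k) of an undirected graph.

    phi(k) = 2 * E_k / (N_k * (N_k - 1)), where N_k is the number of nodes
    with degree > k and E_k is the number of edges among those nodes.
    Returns None when fewer than two nodes pass the threshold.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # self-collaborations carry no information here
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    rich = [node for node, nbrs in adjacency.items() if len(nbrs) > k]
    n_k = len(rich)
    if n_k < 2:
        return None  # coefficient is undefined for 0 or 1 rich nodes
    e_k = sum(1 for a, b in combinations(rich, 2) if b in adjacency[a])
    return 2 * e_k / (n_k * (n_k - 1))
```

In practice phi(k) is compared with its value on randomized graphs that have the same degree sequence, because even random networks show phi(k) increasing with k. NetworkX's `rich_club_coefficient` offers such a normalization.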
Brand Community Analysis using Graph Representation Learning on Social Networks – with a Fashion Case
We exploit the network that forms around brands by encoding it into a graph model. We build a social network graph with users as nodes and friendship relations as edges, and then compare it with a heterogeneous graph model.
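The two graph models can be sketched as follows. This minimal sketch assumes friendships are given as user pairs and brand engagement as (user, brand) "follows" pairs; both input formats and all function names are hypothetical. The heterogeneous version keeps node and edge types, so it can express paths such as user → brand → user that the user-only graph cannot represent.

```python
from collections import defaultdict


def build_homogeneous(friendships):
    """User-only graph: nodes are users, edges are undirected friendships."""
    graph = defaultdict(set)
    for u, v in friendships:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def build_heterogeneous(friendships, follows):
    """Typed graph with 'user' and 'brand' nodes and typed relations."""
    node_type = {}
    edges = defaultdict(set)  # (source node, relation) -> set of targets
    for u, v in friendships:
        node_type[u] = node_type[v] = "user"
        edges[(u, "friend")].add(v)
        edges[(v, "friend")].add(u)
    for user, brand in follows:
        node_type[user] = "user"
        node_type[brand] = "brand"
        edges[(user, "follows")].add(brand)
        edges[(brand, "followed_by")].add(user)
    return node_type, edges


def co_followers(edges, user):
    """Users reachable via user -> brand -> user (shared brand interest)."""
    result = set()
    for brand in edges.get((user, "follows"), set()):
        result |= edges.get((brand, "followed_by"), set())
    result.discard(user)
    return result
```

With friendships ann–bob and bob–cy, and both ann and cy following the same brand, `co_followers` links ann and cy even though they are not friends. A graph representation learning method would then learn embeddings on top of either structure; this sketch covers only the graph construction step.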
Possible Theses in Data Science
Here is a presentation that summarizes some of the relevant topics currently available for theses within the Data Science Lab under my supervision. Feel free to get in touch if you are interested.
Data Cleaning for Knowledge Extraction and Understanding on Social Media
Social media platforms let users share their opinions through textual or multimedia content. In many settings, this becomes a valuable source of knowledge that can be exploited for specific business objectives. Brands and companies often ask to have social media monitored as a source for understanding the stance, opinion, and sentiment of their customers, audience and …
IEEE Big Data Conference 2017: take home messages from the keynote speakers
Here is the list of my write-ups of the first three keynote speeches of the conference: Human in the Loop Machine Learning (Carla E. Brodley, Northeastern Univ.); Enhancing Human Perception via Text Mining and IR (Cheng Zhai, Univ. Illinois); Graph Representation Learning (Jure Leskovec, Stanford and Pinterest).
Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information through Unsupervised Learning
Over one billion cars interact with each other on the road every day. Each driver has their own driving style, which can affect safety, fuel economy, and road congestion. Knowing a driver's style could help encourage "better" driving behaviour, either through immediate feedback while driving or by scaling auto insurance …
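The general idea of clustering trip segments without labels can be sketched as follows, under several assumptions. The GPS trace is assumed to be already converted into speed samples at 1 Hz (in m/s). Trips are cut into fixed-length windows rather than by the segmentation used in the post. Each segment is described by three hypothetical features: mean speed, variability of acceleration, and hardest braking. The segments are then grouped with plain k-means. All function names are illustrative.

```python
import random
import statistics


def segment_features(speeds, window=10):
    """Cut a 1 Hz speed trace (m/s) into fixed windows (window >= 2).

    Returns one (mean speed, std-dev of acceleration, max braking) tuple per
    complete window; a trailing partial window is dropped.
    """
    features = []
    for start in range(0, len(speeds) - window + 1, window):
        seg = speeds[start:start + window]
        accel = [b - a for a, b in zip(seg, seg[1:])]  # m/s^2, samples 1 s apart
        features.append((
            statistics.fmean(seg),
            statistics.pstdev(accel),
            max(0.0, -min(accel)),  # strongest deceleration in the window
        ))
    return features


def standardize(points):
    """Z-score each feature so that no single unit dominates the distances."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one label per point and the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels, centroids
```

Features are standardized before clustering so that speed, which has the largest numeric range, does not dominate the distance. The resulting clusters still have to be interpreted (for example, as calm or aggressive segments) by inspecting their centroids.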
How Fashionable is Digital Data-Driven Fashion?
FaST (Fashion Sensing Technology) is a project meant to design, experiment with, and implement an ICT tool that monitors and analyzes the activity of emerging Italian fashion brands on social media.
A Curated List of WWW 2017 Papers for Data Science and Web Science
This year, the WWW 2017 conference is placing strong emphasis on Web Science and Data Science. I'm recording here a list of papers on these topics that I found interesting at the conference. Disclaimer: the list may be incomplete, as I did not go through all the papers. So in case you want …
Myths and Challenges in Knowledge Extraction and Big Data Analysis
The knowledge we may try to extract from human-generated content, IoT and Web sources may be dispersed, informal, contradictory, unsubstantiated and ephemeral today, yet commonly accepted tomorrow.
The challenge is to capture and create consolidated knowledge that is new, not yet formalized in existing knowledge bases, and buried inside a big, moving target (the live stream of online data).
The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice for this objective.
I explore the problems one can face along this path.