Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

We propose a method that focuses on the generation of realistic navigational paths, i.e., web logs. Our approach is relevant for two reasons: it addresses the lack of publicly available data about web navigation logs, and it can be adopted in industry for the automatic generation of realistic test settings for Web sites yet to be deployed.

Are open source projects governed by rich clubs?

We analyze open source projects to determine whether they exhibit rich-club behavior, that is, a phenomenon where contributors with a high number of collaborations are likely to cooperate with other well-connected individuals. The presence or absence of a rich club has an impact on the sustainability and robustness of the project. We build and study a dataset with the 100 most popular projects on GitHub.

Data Cleaning for Knowledge Extraction and Understanding on Social Media

Social media platforms let users share their opinions through textual or multimedia content. In many settings, this becomes a valuable source of knowledge that can be exploited for specific business objectives. Brands and companies often ask to monitor social media as sources for understanding the stance, opinion, and sentiment of their customers, audience and …

IEEE Big Data Conference 2017: take home messages from the keynote speakers

I have collected here my write-ups of the first three keynote speeches of the conference: Human in the Loop Machine Learning (Carla E. Brodley, Northeastern Univ.); Enhancing Human Perception via Text Mining and IR (Cheng Zhai, Univ. Illinois); and Graph Representation Learning (Jure Leskovec, Stanford and Pinterest).

Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information through Unsupervised Learning

Over one billion cars interact with each other on the road every day. Each driver has his own driving style, which could impact safety, fuel economy and road congestion. Knowledge about the driving style of the driver could be used to encourage "better" driving behaviour through immediate feedback while driving, or by scaling auto insurance …

A Curated List of WWW 2017 Papers for Data Science and Web Science

This year, the WWW 2017 conference is definitely placing a lot of emphasis on Web Science and Data Science. I'm recording here a list of papers I found interesting at the conference, related to this topic. Disclaimer: the list may be incomplete, as I did not go through all the papers. So in case you want …

Myths and Challenges in Knowledge Extraction and Big Data Analysis

The knowledge we may try to extract from human-generated content, IoT and Web sources can be dispersed, informal, contradictory, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted.

The challenge is to capture and create consolidated knowledge that is new, has not been formalized yet in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data).

The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice for the objective.

I explore the problems that one can face along this path.