Weblogs represent the navigation activity generated by a specific amount of users on a given website. This type of data is fundamental because it contains information on the behaviour of users and how they interface with the company’s product itself (website or application). If a company could have a realistic weblog before the release of its product, it would have a significant advantage because it can use the techniques explained above to see the less navigated web pages or those to put in the foreground.

A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making it an expensive task.
To address this limit, we propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., web logs .
Our approach is extremely relevant because it can at the same time tackle the problem of lack of publicly available data about web navigation logs, and also be adopted in industry for AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS of Web sites yet to be deployed.
The generation has been implemented using deep learning methods for generating more realistic navigation activities, namely
- Recurrent Neural Network, which are very well suited to temporally evolving data
- Generative Adversarial Network: neural networks aimed at generating new data, such as images or text, very similar to the original ones and sometimes indistinguishable from them, that have become increasingly popular in recent years.
We run experiments using open data sets of weblogs as training, and we run tests for assessing the performance of the methods. Results in generating new weblog data are quite good, as reported in this summary table, with respect to the two evaluation metrics adopted (BLEU and Human evaluation).
Comparison of performance of baseline statistical approach, RNN and GAN for generating realistic web logs. Evaluation is done using human assessments and BLEU metrics
Our study is described in detail in the paper published at ICWE 2020 – International Conference on Web Engineering with DOI: 10.1007/978-3-030-50578-3. It’s available online on the Springer Web site. and can be cited as:
Pavanetto S., Brambilla M. (2020) Generation of Realistic Navigation Paths for Web Site Testing Using Recurrent Neural Networks and Generative Adversarial Neural Networks. In: Bielikova M., Mikkonen T., Pautasso C. (eds) Web Engineering. ICWE 2020. Lecture Notes in Computer Science, vol 12128. Springer, Cham
The slides are online too:
[slideshare id=235415400&doc=silvio-pavanetto-presentation-short-no-audio-200611141557]
Together with a short presentation video: