Our research on Hierarchical Transformers for User Semantic Similarity has been presented at ICWE 2023.
We discuss the use of hierarchical Transformers for computing user semantic similarity, in the context of analyzing user behavior and profiling social media users. The objectives of the research are to find the best model for computing semantic user similarity, to explore transformer-based models for this task, and to evaluate whether the resulting embeddings reflect the desired notion of similarity and can be reused for other tasks.
The full paper is published online by Springer in the official conference proceedings at this URL:
https://link.springer.com/chapter/10.1007/978-3-031-34444-2_11
This work aims to compute accurate user similarities on Twitter using only the textual content shared by users, a feature known to be easy and quick to collect. We design and train a two-stage hierarchical Transformer-based model: the first stage independently encodes single tweets, and the second stage combines the tweet embeddings to obtain user-level representations. To evaluate the model we design a ranking task involving many accounts, automatically collected and labeled without the need for human annotators, and we extensively investigate hyper-parameters to obtain the best model configuration.
We use a large dataset of Twitter users with an automatic labeling approach. The dataset consists of English tweets posted in November and December 2020, totaling about 27 GB of compressed data. Preprocessing includes filtering out short texts, cleaning user connections, and selecting a benchmark set of users for evaluation.
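As a rough illustration of the filtering step, something like the sketch below could drop overly short texts; the token threshold is a hypothetical value, not the paper's actual rule:

```python
def keep_tweet(text: str, min_tokens: int = 5) -> bool:
    """Discard tweets too short to carry useful semantic signal.
    The 5-token threshold is illustrative, not the paper's value."""
    return len(text.split()) >= min_tokens

def preprocess_timeline(tweets: list[str]) -> list[str]:
    """Keep only sufficiently long tweets from a user's timeline."""
    return [t for t in tweets if keep_tweet(t)]
```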
Tweet embeddings (Stage 1) are obtained using four Transformer-based models: RoBERTa, BERTweet, Sentence-BERT, and Twitter4SSE. We then test different techniques for aggregating the tweet embeddings into accurate user embeddings (Stage 2), including mean pooling, recurrence over BERT (RoBERT), and Transformer over BERT (ToBERT).
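As a minimal sketch of how Stage-1 tweet embeddings can be extracted, the snippet below mean-pools token embeddings from a pretrained encoder; the BERTweet checkpoint is used only for illustration, and this is not the paper's exact pipeline:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative Stage-1 encoder; the paper compares RoBERTa, BERTweet,
# Sentence-BERT and Twitter4SSE for this role.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
encoder = AutoModel.from_pretrained("vinai/bertweet-base")

def embed_tweets(tweets: list[str]) -> torch.Tensor:
    """Return one embedding per tweet (mean of its token embeddings)."""
    batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_emb = encoder(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()    # (B, T, 1)
    return (token_emb * mask).sum(dim=1) / mask.sum(dim=1)  # (B, H)
```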
Since Transformer architectures are known to work well on short texts but scale poorly to long inputs, we cannot apply them directly to the extensive collections of tweets that describe a user's activity. Therefore, we propose a hierarchical structure of Transformer models, as shown in this schema:

[Schema: the two-stage hierarchical architecture, with Stage 1 encoding single tweets and Stage 2 aggregating the tweet embeddings into a user embedding]
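A minimal PyTorch sketch of a ToBERT-style Stage 2 is below: a small Transformer encoder aggregates a user's tweet embeddings (produced by Stage 1) into a single user embedding. The layer sizes and pooling choice are illustrative assumptions, not the paper's tuned configuration.

```python
import torch
import torch.nn as nn

class Stage2Transformer(nn.Module):
    """ToBERT-style Stage 2: a small Transformer encoder that aggregates a
    user's tweet embeddings into one user-level embedding. Sizes are
    illustrative, not the tuned configuration from the paper."""

    def __init__(self, dim: int = 768, heads: int = 8, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tweet_embs: torch.Tensor) -> torch.Tensor:
        # tweet_embs: (num_tweets, dim) -> (1, num_tweets, dim) for batching
        out = self.encoder(tweet_embs.unsqueeze(0))
        # Mean-pool over the tweet axis to obtain the user embedding: (dim,)
        return out.mean(dim=1).squeeze(0)
```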
The models are evaluated on a set of 5,000 users: for each of them we rank 30 candidate users by similarity, 5 of which are labeled as similar and 25 as dissimilar. The evaluation metrics are mean average precision (MAP), mean reciprocal rank at 10 (MRR@10), and normalized discounted cumulative gain (nDCG).
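For reference, here is a sketch of how the three per-user metrics can be computed for one evaluated user (MAP and MRR@10 are then averaged over the 5,000 users); this is standard metric code, not extracted from the paper:

```python
import numpy as np

def ranking_metrics(scores: np.ndarray, relevant: np.ndarray, k: int = 10):
    """AP, RR@k and nDCG for one user, given similarity scores to the 30
    candidates and a boolean relevance mask (5 similar, 25 dissimilar)."""
    order = np.argsort(-scores)                   # rank candidates by similarity
    rel = relevant[order].astype(float)
    hits = np.cumsum(rel)                         # relevant found up to each rank
    ap = (rel * hits / (np.arange(len(rel)) + 1)).sum() / rel.sum()
    first = np.flatnonzero(rel[:k])               # first relevant hit in top k
    rr = 1.0 / (first[0] + 1) if first.size else 0.0
    discounts = np.log2(np.arange(len(rel)) + 2)  # positions 1..n -> log2(2..n+1)
    ndcg = (rel / discounts).sum() / (np.sort(rel)[::-1] / discounts).sum()
    return ap, rr, ndcg
```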
The optimization process involves selecting a suitable loss function and training with the AdamW optimizer, with hyper-parameters tuned experimentally.
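As a hedged sketch of such a setup, assuming a triplet-style loss and illustrative hyper-parameter values (not the ones reported in the paper):

```python
import torch
from torch.optim import AdamW

# Hypothetical training setup: learning rate, weight decay and the
# triplet loss are illustrative assumptions, not the paper's configuration.
model = Stage2Transformer()                       # Stage-2 sketch from above
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
loss_fn = torch.nn.TripletMarginLoss(margin=1.0)

def training_step(anchor, positive, negative):
    """One gradient step pulling an anchor user towards a similar user and
    away from a dissimilar one, each given as a (num_tweets, dim) tensor."""
    optimizer.zero_grad()
    loss = loss_fn(model(anchor), model(positive), model(negative))
    loss.backward()
    optimizer.step()
    return loss.item()
```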
We also check whether the obtained embeddings reflect our idea of similarity by testing them on further tasks, including community visualization, outlier detection, and polarization quantification.
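As a toy illustration of one such use, outliers can be flagged as users whose embedding lies unusually far (in cosine distance) from the centroid of their community; this is an assumption about how the embeddings could be used, not the paper's exact procedure:

```python
import numpy as np

def embedding_outliers(user_embs: np.ndarray, quantile: float = 0.99):
    """Flag users whose embedding is far from the community centroid."""
    normed = user_embs / np.linalg.norm(user_embs, axis=1, keepdims=True)
    centroid = normed.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    cos_dist = 1.0 - normed @ centroid            # distance to the centroid
    return np.flatnonzero(cos_dist > np.quantile(cos_dist, quantile))
```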
The results show that the hierarchical approach combining a Twitter4SSE model at Stage 1 with a Transformer model at Stage 2 performs best among the alternatives.
In conclusion, the research provides a large unbiased dataset for user similarity analysis, presents a hierarchical language model optimized for accurate user similarity computation, and validates the models’ performance on similarity tasks, with potential applications to related problems.
Future work includes investigating the impact of time and topic drift on the models' performance.
The paper can be cited as:
Marco Di Giovanni, Marco Brambilla (2023). Hierarchical Transformers for User Semantic Similarity. In: Garrigós, I., Murillo Rodríguez, J.M., Wimmer, M. (eds) Web Engineering. ICWE 2023. Lecture Notes in Computer Science, vol 13893. Springer, Cham. https://doi.org/10.1007/978-3-031-34444-2_11
