Online services are increasingly dependent on user participation. Whether for online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this project, we build an unsupervised system to capture dominant user behaviors from clickstream data (traces of users’ click events), and visualize the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process leverages iterative feature pruning to capture the natural hierarchy within user clusters and to produce intuitive features for visualizing and understanding the captured user behaviors.
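To make the similarity graph concrete, here is a minimal sketch of how edge weights between users could be computed from per-user pattern counts. It assumes cosine similarity as the clickstream similarity measure, which is an illustrative choice only; the metric actually used by the system is defined in the paper. The function names `cosine_similarity` and `build_similarity_graph` are hypothetical and not part of the released scripts, and the sketch does not cover the partitioning or feature-pruning steps.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two pattern-count dicts (assumed metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no recorded patterns is similar to no one
    return dot / (norm_a * norm_b)


def build_similarity_graph(users):
    """Return {(u, v): weight} for every unordered pair of users.

    `users` maps user_id -> {action pattern: count}. Nodes are users and
    each edge is weighted by the similarity of the two clickstreams.
    """
    ids = sorted(users)
    edges = {}
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            edges[(u, v)] = cosine_similarity(users[u], users[v])
    return edges


# Toy data: u1 and u2 have proportional pattern counts, u3 shares none.
users = {
    "u1": {"A": 1, "G": 10},
    "u2": {"A": 2, "G": 20},
    "u3": {"B": 5},
}
graph = build_similarity_graph(users)
for pair, weight in sorted(graph.items()):
    print(pair, round(weight, 3))
```

In this toy data, u1 and u2 get an edge weight close to 1 because their counts are proportional, while u3 gets a weight of 0 to both because it shares no pattern with them. Computing all pairs is quadratic in the number of users, so this form only suits small examples.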
This demo presents the clustering results on large-scale clickstream traces from an anonymous social network, Whisper. Our system effectively identifies previously unknown behaviors, e.g., dormant users and hostile chatters. In addition, we have successfully applied clickstream-based behavior models to detect new attacks in real-world online social networks including Renren and LinkedIn.
The project source code is available for download. This zip file contains a set of scripts that perform recursive hierarchical clustering on clickstream data, and generate clusters of user behaviors.
For details about the input/output formats and system configuration, please refer to the documentation. The algorithm itself is detailed in our paper.
A quick example is shown as follows.
$> python recursiveHierarchicalCustering.py input.txt output/
Each line of input.txt has the form:
user_id \t A(1)G(10)
Here A and G are action patterns, and 1 and 10 represent how many times the respective pattern appears in the user's clickstream.
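The input line format described above can be read with a few lines of Python. This is a minimal sketch, not code from the released scripts: the helper name `parse_line` and the sample user id are hypothetical, and it assumes that pattern names contain no parentheses or whitespace and that the separator is a single tab.

```python
import re
from collections import Counter

# Matches one "PATTERN(COUNT)" token, e.g. "A(1)" or "G(10)".
TOKEN = re.compile(r"([^()\s]+)\((\d+)\)")


def parse_line(line):
    """Split 'user_id<TAB>A(1)G(10)' into (user_id, pattern counts)."""
    user_id, _, patterns = line.rstrip("\n").partition("\t")
    counts = Counter()
    for name, n in TOKEN.findall(patterns):
        counts[name] += int(n)
    return user_id, counts


user, counts = parse_line("user_42\tA(1)G(10)")
print(user, dict(counts))  # user_42 {'A': 1, 'G': 10}
```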
output/result.json will be the output file for the clustering results.
We are a research team from the Department of Computer Science at the University of Chicago. If you have any questions, please don't hesitate to contact us.