Darknet marketplaces, or cryptomarkets, are Tor hidden services where users trade illicit drugs and a myriad of other illegal goods. Over the past few years, law enforcement agencies have started to study the networks of darknet vendors in an attempt to link them to their real-world identities. However, cryptomarket vendors usually operate multiple accounts, which makes it extremely difficult to uncover their identities.

A recently published research paper proposes an approach that relies on stylometry (the analysis of writing styles) and photo analytics to find links between multiple accounts of the same vendors on darknet marketplaces. In this article, we give an overview of the paper's proposed approach and its effectiveness in identifying darknet marketplace vendors.

Stylometry analysis:

Stylometry analysis is a technique that aims to identify the author of an anonymous text by analyzing its writing style. On darknet marketplaces, the texts in question are the product descriptions written by vendors. However, applying stylometry to darknet markets poses several challenges. Firstly, most product descriptions are very short. For instance, the median length of product descriptions on the Agora marketplace was only 118 words. Moreover, product descriptions are often created from templates, and vendors often reuse similar descriptions across multiple product listings. Also, most darknet marketplaces are global, and their vendors often write in multiple languages. All these challenges make it difficult to identify vendors' unique writing styles.

To use stylometry analysis to identify darknet vendors, the authors of the paper extracted a list of features to model the unique writing styles of vendors. These features included the percentage of words starting with an uppercase letter, the average word length, the overall percentage of uppercase letters, stop-word frequency, punctuation frequency, a histogram of word lengths, part-of-speech unigrams/bigrams/trigrams, character unigrams/bigrams/trigrams, and digit unigrams/bigrams/trigrams.
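To make the feature list above more concrete, here is a minimal sketch of how a few of these features could be computed using only the Python standard library. It is an illustration, not the paper's implementation: the function names, the whitespace tokenization, and the choice to cap the word-length histogram at 10 are all assumptions, and stop-word and part-of-speech features are left out because they need external resources.

```python
import string
from collections import Counter


def ngrams(seq, n):
    """Return all consecutive n-item windows over seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def stylometric_features(text, max_word_len=10):
    """Compute a small dictionary of writing-style features for one text.

    Hypothetical helper: names and normalizations are illustrative only.
    """
    # Simplified tokenization: split on whitespace, strip edge punctuation.
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    letters = [c for c in text if c.isalpha()]
    digits = [c for c in text if c.isdigit()]
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)

    features = {
        "pct_capitalized_words": sum(w[0].isupper() for w in words) / n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "pct_uppercase_letters":
            sum(c.isupper() for c in letters) / max(len(letters), 1),
        "punctuation_freq":
            sum(c in string.punctuation for c in text) / n_chars,
    }

    # Histogram of word lengths; words longer than the cap share one bucket.
    lengths = Counter(min(len(w), max_word_len) for w in words)
    for k in range(1, max_word_len + 1):
        features[f"word_len_{k}"] = lengths[k] / n_words

    # Character bigram frequencies (case-folded).
    for gram, count in Counter(ngrams(text.lower(), 2)).items():
        features["char2_" + "".join(gram)] = count / n_chars

    # Digit unigram frequencies, relative to all digits in the text.
    for digit, count in Counter(digits).items():
        features["digit_" + digit] = count / len(digits)

    return features


if __name__ == "__main__":
    sample = "Pure MDMA 5g, Fast Shipping!"
    for name, value in sorted(stylometric_features(sample).items())[:6]:
        print(name, round(value, 3))
```

Because every distinct n-gram becomes its own feature, the dictionary grows quickly with the amount of text, which is consistent with the very high dimensionality the paper reports.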

The NLTK library was used to perform sentence and word tokenization. The Stanford Log-linear Part-of-Speech Tagger was applied to obtain the part-of-speech features. Because of the feature vector's high dimensionality (around 100k), dimensionality reduction was also performed using stochastic singular value decomposition, which lowered the feature vector size to 1,000.

To create ground-truth data for the stylometry analysis, vendors whose product descriptions contained more than 2 × Tr′ words in total were split into two parts. Vendors whose descriptions were longer than Tr′ words were added to the training set as distractors. Two versions of the ground-truth dataset were created: one that considers all product descriptions (one description for each product listing) and includes duplicated sentences, and another that omits duplicated sentences. The non-duplicated dataset is meant to force the classifiers to identify the writing style rather than match identical sentences. Only English text was considered, and HTML entities and Unicode symbols were removed.
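The dataset construction described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' code: `build_ground_truth`, `unique_sentences`, the regex-based sentence splitter, and the choice to split at the level of whole descriptions (or sentences) rather than words are all hypothetical, and the `tr` parameter stands in for the paper's word threshold.

```python
import random
import re


def unique_sentences(descriptions):
    """Split descriptions into sentences and drop exact repeats, keeping order."""
    seen, kept = set(), []
    for text in descriptions:
        # Naive splitter: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            key = sentence.lower()
            if sentence and key not in seen:
                seen.add(key)
                kept.append(sentence)
    return kept


def build_ground_truth(vendor_texts, tr, dedupe=False, seed=0):
    """Return (pairs, distractors), both keyed by vendor name.

    vendor_texts maps a vendor name to a list of product descriptions.
    Vendors with more than 2 * tr words are split into two random parts;
    vendors with more than tr words (but not enough to split) are kept
    whole as distractors; everyone else is dropped.
    """
    rng = random.Random(seed)
    pairs, distractors = {}, {}
    for vendor, descriptions in vendor_texts.items():
        units = unique_sentences(descriptions) if dedupe else list(descriptions)
        n_words = sum(len(unit.split()) for unit in units)
        if n_words > 2 * tr:
            rng.shuffle(units)
            # Simplification: halves are balanced by unit count, not word count.
            half = len(units) // 2
            pairs[vendor] = (units[:half], units[half:])
        elif n_words > tr:
            distractors[vendor] = units
    return pairs, distractors


if __name__ == "__main__":
    texts = {
        "vendor_a": ["Great product. Fast shipping.",
                     "Great product. Stealth packaging.",
                     "Top quality. Fast shipping."],
        "vendor_b": ["Good stuff sold here."],
    }
    print(build_ground_truth(texts, tr=3))
    print(build_ground_truth(texts, tr=3, dedupe=True))
```

Running it with `dedupe=True` shows why the second dataset is harder: repeated sentences such as a shared shipping line appear only once, so the two halves of a vendor can no longer be matched on identical text.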

The results are shown in Table (1). They indicate that stylometry analysis can yield high accuracy when duplicated sentences are included (0.936-0.990). However, when duplicated sentences are omitted, the accuracy drops considerably, to 0.580-0.846. This significant decline suggests that the high accuracy in the first case comes mostly from matching duplicated sentences rather than from capturing unique writing styles.

Table (1): Accuracy of using stylometry analysis to match ground-truth vendors

These results indicate that the approach that worked well on darknet forums in previous studies has clear limitations when applied to darknet marketplaces. This is mostly because vendors usually rely on templates to create product descriptions.

Analyzing product photos:

The authors of the paper introduce a novel approach that links multiple accounts on darknet marketplaces by analyzing product photos. The goal is to establish reliable fingerprints that identify darknet marketplace vendors on the basis of their photos, whether within the same marketplace or across different marketplaces. The idea is motivated by the fact that darknet vendors usually take photos of their own products to prove that they actually possess the items they are selling, and such photos can reveal personal photography styles. To construct precise fingerprints, the authors built a system in which a set of deep neural networks automatically extracts distinctive features from vendors' photos. Furthermore, to fingerprint vendors who post only a small number of photos, transfer learning was applied: the deep neural network is first trained on large datasets of generic images and then fine-tuned on the vendor's own photos.

The proposed system was evaluated using datasets from three major darknet marketplaces (SilkRoad2, Evolution, and Agora), which included 7,641 vendors and 197,682 photos. First, a ground-truth evaluation was conducted by dividing each vendor's photos into two random sets and testing how accurately the system could link the two parts back together. The study's best-performing model yielded an accuracy of around 97.5% across all three marketplaces. The approach was also compared with the stylometry method described earlier, which modeled the writing styles of vendors on the basis of their product descriptions. The image-based model was found to outperform it both in classification accuracy and in the coverage of vendors that could be fingerprinted.
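The split-and-relink evaluation can be illustrated with a small sketch. Note the assumptions: the paper matches accounts with deep neural networks, whereas the code below swaps in a much simpler stand-in (averaging per-photo feature vectors and picking the nearest neighbor by cosine similarity). The function names and the toy feature vectors are hypothetical and serve only to show the shape of the protocol.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_and_link_accuracy(vendor_vectors, seed=0):
    """Split each vendor's photos in two and try to link the halves again.

    vendor_vectors maps a vendor name to a list of per-photo feature
    vectors (at least two photos per vendor, at least one vendor).
    Returns the fraction of first halves whose most similar second half
    belongs to the same vendor.
    """
    rng = random.Random(seed)
    first, second = {}, {}
    for vendor, vectors in vendor_vectors.items():
        shuffled = list(vectors)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first[vendor] = mean_vector(shuffled[:half])
        second[vendor] = mean_vector(shuffled[half:])

    correct = 0
    for vendor, query in first.items():
        # Stand-in matcher: nearest second half by cosine similarity.
        best = max(second, key=lambda other: cosine(query, second[other]))
        correct += best == vendor
    return correct / len(first)


if __name__ == "__main__":
    # Toy 2-D "photo features"; real features would come from a trained network.
    toy = {
        "vendor_a": [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.8, 0.0]],
        "vendor_b": [[0.0, 1.0], [0.1, 0.9], [0.0, 0.8], [0.1, 1.0]],
    }
    print(split_and_link_accuracy(toy))
```

The appeal of this protocol is that it needs no external labels: each vendor's own photos provide a known positive pair, so accuracy can be measured before the system is pointed at accounts whose relationship is unknown.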

Testing the system in the wild:

To examine the effectiveness of the proposed model, the authors of the paper applied their method to identify previously unknown Sybil accounts on the live network (i.e. in the wild). Using external evidence and manual examination, they show that the system identified 715 Sybil account pairs across different marketplaces, as well as 23 Sybil accounts within the same marketplaces. Further case studies offer insights into the collaborative activities of Sybil accounts, which range from manipulating prices and scamming buyers to reselling and stocking products and plagiarizing photos. For instance, the authors identified vendors on SilkRoad2 and Evolution who use Sybil accounts solely to sell a small number of products at very low prices. External evidence shows that some of these Sybil vendors scammed customers. Moreover, the identified Sybil pairs also revealed relationships between some vendors (e.g. retailers and suppliers), which helps in identifying the market's stakeholders.