The University of Southampton
University of Southampton Institutional Repository

Detection method for randomly generated user IDs: lift the curse of dimensionality


Ro, Inwoo, Kang, Boojoong, Seo, Choonghyun and Im, Eul Gyu (2022) Detection method for randomly generated user IDs: lift the curse of dimensionality. IEEE Access, 10, 86020-86028. (doi:10.1109/ACCESS.2022.3198687).

Record type: Article

Abstract

Internet services are essential to daily life, and user accounts are usually required to download or browse multimedia content from service providers such as Yahoo, Google, and YouTube. Attackers who perform malicious actions against these services use fake user accounts to hide their identity, or to continue malicious actions even after being caught by the service’s detection system. Using a random string generation algorithm to produce user identification (ID) strings is a common method for creating a large number of fake user accounts. To detect such IDs and defend against these attacks, researchers have proposed models that detect randomly generated IDs. Among these, the n-gram-based model using term frequency-inverse document frequency (TF-IDF) is regarded as the state of the art, but n-gram-based approaches suffer from the curse of dimensionality because the sparsity of the feature vector increases exponentially as n increases. As a result, n cannot be increased, which limits improvements in detection accuracy. This paper proposes two methods to detect randomly generated IDs more accurately. The first avoids the curse of dimensionality by compressing the feature dimension size. The second reduces false positives by using pattern matching and the Bhattacharyya distance. We tested our method with about 3 million normal user IDs collected from a real portal service, 1 million IDs generated by a random string generation algorithm, and 8,541 IDs found after being used for malicious behavior in real portal services. The experimental results showed that the proposed method can improve detection accuracy as well as inference performance.
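
The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not describe the compression or pattern-matching steps. It only shows how the character n-gram feature space grows with n and how a Bhattacharyya distance between two discrete distributions is computed. The 36-symbol alphabet (lowercase letters and digits) and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Assumption: IDs use lowercase letters and digits only (hypothetical alphabet).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def ngram_space_size(n, alphabet_size=len(ALPHABET)):
    """Number of distinct character n-grams, i.e. the feature dimension."""
    return alphabet_size ** n


def ngram_density(user_id, n):
    """Fraction of the n-gram feature space that one ID can populate."""
    distinct = {user_id[i:i + n] for i in range(len(user_id) - n + 1)}
    return len(distinct) / ngram_space_size(n)


def char_distribution(ids):
    """Relative frequency of each alphabet character across a list of IDs."""
    counts = Counter(ch for s in ids for ch in s if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no characters from ALPHABET found in ids")
    return [counts[ch] / total for ch in ALPHABET]


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for discrete distributions."""
    # Clamp the coefficient to 1.0 so rounding error cannot yield a negative distance.
    bc = min(sum(math.sqrt(a * b) for a, b in zip(p, q)), 1.0)
    return float("inf") if bc == 0 else -math.log(bc)
```

Under these assumptions, `ngram_space_size(3)` is 36³ = 46,656 dimensions, while an 8-character ID contains at most 6 trigrams, so almost every entry of its feature vector is zero. `bhattacharyya_distance` could then be applied to, for example, the character distribution of a candidate group of IDs and that of known normal IDs; how the paper actually uses the distance is not stated in the abstract.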

Text
Detection_Method_for_Randomly_Generated_User_IDs_Lift_the_Curse_of_Dimensionality - Version of Record
Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 16 August 2022
Published date: 16 August 2022
Additional Information: Publisher Copyright: © 2013 IEEE.
Keywords: Authentication, Computer crime, Identity management systems, Web sites

Identifiers

Local EPrints ID: 469986
URI: http://eprints.soton.ac.uk/id/eprint/469986
ISSN: 2169-3536
PURE UUID: 2cf3de6a-3295-4eda-856f-53e67a1422a8
ORCID for Boojoong Kang: orcid.org/0000-0001-5984-9867

Catalogue record

Date deposited: 29 Sep 2022 16:48
Last modified: 17 Mar 2024 04:05

Contributors

Author: Inwoo Ro
Author: Boojoong Kang
Author: Choonghyun Seo
Author: Eul Gyu Im
