DeltaShield: Information Theory for Human-Trafficking Detection
Abstract
1 Introduction
1.1 Application to the HT Domain
1.2 Application to Twitter Bot Detection
1.3 Our Method
2 Background and Related Work
2.1 HT Detection
2.2 Social Media Bot Detection
2.3 Document Embedding and Clustering
2.4 Multiple-Sequence Alignment
2.5 Minimum Description Length
3 Proposed Method: Theory
3.1 Intuition and Theory
Doc | Text |
---|---|
#1 | Hi gentlemen, Korea super model just arrived...Alma and Joan specially selected... |
#2 | Hi gentlemen, Korea super model just arrived...Paula and Miya specially selected... |
#3 | Hi gentlemen, Korean super model just arrived...Paula specially selected... |
#4 | Gentlemen, Korea super model just arrived...Miya is specially selected... |
#5 | I made 30K working on this job - call 123-456.7890 or visit scam.com |
#6 | I made 30K working from home - call 123-456.7890 or visit fraud.com |
#7 | Hello, Anna here! My hours are... |
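Documents #1 through #4 read as variations of one underlying template, #5 and #6 as variations of another, and #7 matches neither. The sketch below makes that intuition concrete. It assumes plain word-level Jaccard similarity and a threshold of 0.5, then groups documents by connected components using union-find. None of these choices comes from the paper, whose coarse step is described in Section 4.1. The names `tokens`, `jaccard`, `group` and `DOCS` are hypothetical.

```python
import re
from itertools import combinations

# Example documents #1-#7 copied from the table above ("..." marks elided text).
DOCS = {
    "#1": "Hi gentlemen, Korea super model just arrived...Alma and Joan specially selected...",
    "#2": "Hi gentlemen, Korea super model just arrived...Paula and Miya specially selected...",
    "#3": "Hi gentlemen, Korean super model just arrived...Paula specially selected...",
    "#4": "Gentlemen, Korea super model just arrived...Miya is specially selected...",
    "#5": "I made 30K working on this job - call 123-456.7890 or visit scam.com",
    "#6": "I made 30K working from home - call 123-456.7890 or visit fraud.com",
    "#7": "Hello, Anna here! My hours are...",
}

def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric word tokens; punctuation and '...' are dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group(docs: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Connected components of the graph that links documents whose
    word-level Jaccard similarity is at least `threshold` (union-find)."""
    parent = {d: d for d in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    toks = {d: tokens(text) for d, text in docs.items()}
    for a, b in combinations(docs, 2):
        if jaccard(toks[a], toks[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for d in docs:
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())

print(group(DOCS))  # [['#1', '#2', '#3', '#4'], ['#5', '#6'], ['#7']]
```

For example, #1 and #2 share 10 of their 14 distinct words, giving a similarity of about 0.71. #7 shares no words with any other document.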
3.2 Data Compression and Summarization
3.2.1 Template Encoding.
3.2.2 Alignment Encoding.
Doc | Temp. | Slots | Ins. | Del. | Sub. |
---|---|---|---|---|---|
#1 | \(T_1\) | {“Alma and Joan”} | | | |
#2 | \(T_1\) | {“Paula and Miya”} | | | |
#3 | \(T_1\) | {“Paula”} | | | 3: “Korean” |
#4 | \(T_1\) | {“Miya”} | | 1 | |
#5 | \(T_2\) | {“on this job”, “scam.com”} | | | |
#6 | \(T_2\) | {“from home”, “fraud.com”} | | | |
#7 | N/A | “Happy birthday to my dear friend Mike” | | | |
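The Ins., Del. and Sub. columns describe how a document deviates from its template. The sketch below assumes a plain word-level Levenshtein (edit-distance) alignment. This is a standard technique swapped in for illustration; the paper's own candidate alignment and consensus search are described in Section 4.2. The hypothetical helpers `words` and `align` recover such operations with 1-based template positions. Slots are ignored, and the fragment `T1` is a stand-in built from the shared opening of documents #1 and #2, not the paper's actual template.

```python
import re

def words(text: str) -> list[str]:
    """Lower-cased alphanumeric tokens in order (a hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def align(template: list[str], doc: list[str]) -> dict[str, list]:
    """Word-level Levenshtein alignment of `doc` against `template`.

    Positions are 1-based in the template: 'del' lists deleted positions,
    'sub' lists (position, new word), and 'ins' lists (position, new word)
    with the word inserted after that position (0 = before the first word).
    """
    n, m = len(template), len(doc)
    # cost[i][j] = edit distance between template[:i] and doc[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(template[i - 1] != doc[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion of a template word
                cost[i][j - 1] + 1,             # insertion of a document word
            )

    # Trace one optimal path back from the corner, preferring diagonal moves.
    ops: dict[str, list] = {"ins": [], "del": [], "sub": []}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(template[i - 1] != doc[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                if mismatch:
                    ops["sub"].append((i, doc[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops["del"].append(i)
            i -= 1
        else:
            ops["ins"].append((i, doc[j - 1]))
            j -= 1
    for found in ops.values():
        found.reverse()  # report operations left to right
    return ops

T1 = words("Hi gentlemen, Korea super model just arrived")
print(align(T1, words("Hi gentlemen, Korean super model just arrived")))
# {'ins': [], 'del': [], 'sub': [(3, 'korean')]}
print(align(T1, words("Gentlemen, Korea super model just arrived")))
# {'ins': [], 'del': [1], 'sub': []}
```

Ties are broken in favor of diagonal moves. A different aligner may therefore report a different but equally cheap set of operations.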
Symbol | Definition |
---|---|
\(N\) | Total number of documents in \(D\) |
\(t\) | Total number of templates |
\(V\) | Number of words in the vocabulary |
\(T_i\) | \(i\)-th template |
\(l_i\) | Length of template \(T_i\) |
\(s_{i}\) | Number of slots in \(T_i\) |
\(\hat{l}_d\) | Alignment length of document \(d\) |
\(w_{d, j}\) | Number of words in the \(j\)-th slot of aligned document \(d\) |
\(e_{d}\) | Number of unmatched words in aligned document \(d\) |
\(u_{d}\) | Number of substituted or inserted words in aligned document \(d\) |
\(\langle n \rangle\) | \(\approx 2\lg n + 1\): universal code length for a non-negative integer \(n\) |
\(\lg(L)\) | \(= \log_2 L\): code length for an integer \(i\) with \(1 \le i \le L\) |
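The last two rows of the table give the two primitive code lengths. The sketch below makes them concrete. It takes the Elias gamma code as the universal code; this is an assumption, since the table gives only the approximation \(2\lg n + 1\). Gamma codes positive integers, so a non-negative \(n\) is shifted to \(n+1\), giving exactly \(2\lfloor \lg(n+1) \rfloor + 1\) bits. \(\lg(L)\) is treated as the idealized, possibly fractional, cost of naming one of \(L\) equally likely integers. The function names are hypothetical.

```python
import math

def universal_code_length(n: int) -> int:
    """Bits in the Elias gamma code for a non-negative integer n.

    Gamma codes positive integers, so n is shifted to n + 1; the length is
    2 * floor(lg(n + 1)) + 1, i.e. roughly 2 lg n + 1 for large n.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return 2 * ((n + 1).bit_length() - 1) + 1

def elias_gamma(n: int) -> str:
    """Elias gamma bit string for n >= 0 (also shifted to n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    binary = bin(n + 1)[2:]                   # binary digits without '0b'
    return "0" * (len(binary) - 1) + binary   # unary length prefix + binary

def uniform_code_length(L: int) -> float:
    """lg(L): idealized bits to name one integer i with 1 <= i <= L."""
    if L < 1:
        raise ValueError("L must be at least 1")
    return math.log2(L)

print(elias_gamma(4), universal_code_length(4))  # 00101 5
print(uniform_code_length(8))                    # 3.0
```

The universal code needs no upper bound on \(n\) agreed in advance, which suits counts such as the number of templates. A fixed-length \(\lg(L)\) code suits choices from a known range \(1..L\). Under this idealization, naming one word of a vocabulary of size \(V\) would cost \(\lg(V)\) bits.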
3.2.3 Overall Encoding.
4 Proposed Method: Algorithms
4.1 InfoShield-coarse
4.1.1 Document Embeddings.
4.1.2 Clustering.
4.2 InfoShield-fine
4.2.1 Candidate Alignment.
4.2.2 Consensus Search.
4.2.3 Slot Detection.
4.2.4 Relative Length.
4.2.5 Overall Algorithm.
4.3 Complexity Analysis
5 Proposed Method: Incremental
5.1 DeltaShield-coarse
5.2 DeltaShield-fine
5.2.1 Preprocess.
5.2.2 Template Update.
6 Experiments
6.1 Description
6.1.1 Twitter Bot Data.
Dataset | Accounts | Tweets |
---|---|---|
Genuine accounts | 3,474 | 8,377,522 |
Social spambots #1 | 991 | 1,610,176 |
Social spambots #3 | 464 | 1,418,626 |
Test set #1 (spambots #1) | 1,982 | 4,061,598 |
Test set #2 (spambots #3) | 928 | 2,628,181 |
6.1.2 HT Data: Trafficking10k Dataset.
6.1.3 HT Data: Cluster Trafficking.
6.1.4 Baselines.
6.1.5 Metrics.
6.2 Results
6.3 Q1: Practical
6.4 Q2: Interpretable
6.4.1 Twitter Data.
6.4.2 HT Data.
6.4.3 Relative Length.
6.5 Q3: Robust
6.6 Q4: Incremental
6.6.1 DeltaShield-coarse.
6.6.2 DeltaShield-fine.
7 Discussion and Discoveries: InfoShield at Work
8 Conclusion
Footnote
References
Information
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Funding Sources
- National Science Foundation Graduate Research Fellowship