Polymorphic code: Difference between revisions

Content deleted Content added
Restored revision 1219374385 by A Shortfall Of Gravitas (talk): Unsourced sample
 
(33 intermediate revisions by 28 users not shown)
Line 1:
{{Short description|Self-modifying program code designed to defeat anti-virus programs or reverse engineering}}
{{distinguish|Polymorphism (computer science)}}
{{refimprove|date=November 2010}}
In computer terminologycomputing, '''polymorphic code''' is code that uses a [[polymorphic engine]] to mutate while keeping the original [[algorithm]] intact. That- that is, the ''code'' changes itself eachevery time it runs, but the ''function'' of the code (its [[semantics]]) willstays not changethe at allsame. For example, 1+the simple math expressions 3+1 and 6-2 both achieve the same result, whileyet usingrun with different values[[machine andcode]] operationsin a [[Central processing unit|CPU]]. This technique is sometimes used by [[computer virus]]es, [[shellcode]]s and [[computer worm]]s to hide their presence.<ref name="rugha">{{cite thesis |last=Raghunathan, |first=Srinivasan (|date=2007). ''|title=Protecting anti-virus software under viral attacks''. |type=M.Sc. Thesis, |publisher=Arizona State University |citeseerx=10.[https://pdfs1.semanticscholar1.org/0676/da6041ea51a3e8d80c597e503680e925aed093.pdf]796}}</ref>
 
[[Encryption]] is the most common method to hide code. With encryption, the main body of the code (also called its [[Payload (computing)|payload]]) is encrypted and will appear meaningless. For the code to function as before, a decryption function is added to the code. When the code is ''executed'', this function reads the payload and decrypts it before executing it in turn.
 
Encryption alone is not polymorphism. To gain polymorphic behavior, the encryptor/decryptor pair areis mutated with each copy of the code. This allows different versions of some code which all function the same.<ref name="wongstamp">{{cite journal |last1=Wong, |first1=Wing; |last2=Stamp, |first2=M. (2006). ''|title=Hunting for Metamorphic Engines''. |journal=Journal in Computer Virology. Department|volume=2 of|issue= Computer3|pages=211–229 Science, San|date=2006 Jose State University|doi=10. [http:/1007/wwws11416-006-0028-7 |citeseerx=10.truststc1.org/pubs/237/hunting1.pdf]108.3878|s2cid=8116065 }}</ref>
 
== Malicious code ==
Line 13 ⟶ 14:
Malicious [[programmer]]s have sought to protect their encrypted code from this virus-scanning strategy by rewriting the unencrypted decryption engine (and the resulting encrypted payload) each time the virus or worm is propagated. Anti-virus software uses sophisticated pattern analysis to find underlying patterns within the different mutations of the decryption engine, in hopes of reliably detecting such [[malware]].
 
Emulation may be used to defeat polymorphic obfuscation by letting the malware demangle itself in a virtual environment before utilisingutilizing other methods, such as traditional signature scanning. Such a virtual environment is sometimes called a [[Sandbox (computer security)|sandbox]]. Polymorphism does not protect the virus against such emulation if the decrypted payload remains the same regardless of variation in the decryption algorithm. [[Metamorphic code]] techniques may be used to complicate detection further, as the virus may execute without ever having identifiable code blocks in memory that remainremains constant from infection to infection.
 
The first known polymorphic virus was written by Mark Washburn. The virus, called [[1260 (computer virus)|1260]], was written in 1990. A better-known polymorphic virus was created in 1992 by the hacker [[Dark Avenger]] (a [[pseudonym]]) as a means of avoiding pattern recognition from antivirus software. A common and very virulent polymorphic virus is the file infecter [[Virut]].
 
== Example ==
This example is not really a polymorphic code but will serve as an introduction to the world of encryption via the [[xor|XOR]] operator.
For example, in an algorithm using the variables A and B but not the variable C, there could be a large amount of code that changes C, and it would have no effect on the algorithm itself, allowing it to be changed endlessly and without heed as to what the final product will be.
 
lots of encrypted code
...
Decryption_Code:
C = C + 1
A = Encrypted
Loop:
B = *A
C = 3214 * A
B = B XOR CryptoKey
*A = B
C = 1
C = A + B
A = A + 1
GOTO Loop IF NOT A = Decryption_Code
C = C^2
GOTO Encrypted
CryptoKey:
some_random_number
 
The encrypted code is the payload. To make different versions of the code, in each copy the garbage lines which manipulate C will change. The code inside "Encrypted" ("lots of encrypted code") can search the code between Decryption_Code and CryptoKey and each algorithm for new code that does the same thing. Usually the coder uses a zero key (for example; A [[xor]] 0 = A) for the first generation of the virus, making it easier for the coder because with this key the code is not encrypted. The coder then implements an incremental key algorithm or a random one.
 
==- Polymorphic encryption ==
Polymorphic code can be also used to generate encryption algorithm. This code was generated by the online service StringEncrypt.<ref name="stringencrypt">[https://www.stringencrypt.com Wójcik, Bartosz (2015). ''String & File Encryption'']</ref> It takes the string or a file content and encrypts it with random encryption commands and generates polymorphic decryption code in one of the many supported programming languages:
<source lang="cpp">
// encrypted with https://www.stringencrypt.com (v1.1.0) [C/C++]
// szLabel = "Wikipedia"
wchar_t szLabel[10] = { 0xB1A8, 0xB12E, 0xB0B4, 0xB03C, 0x33B9, 0xB30C, 0x3295, 0xB260,
0xB5E5, 0x35A2 };
for (unsigned int tUTuj = 0, KRspk = 0; tUTuj < 10; tUTuj++)
{
KRspk = szLabel[tUTuj];
KRspk ^= 0x2622;
KRspk = ~KRspk;
KRspk --;
KRspk += tUTuj;
KRspk = (((KRspk & 0xFFFF) >> 3) | (KRspk << 13)) & 0xFFFF;
KRspk += tUTuj;
KRspk --;
KRspk = ((KRspk << 8) | ( (KRspk & 0xFFFF) >> 8)) & 0xFFFF;
KRspk ^= 0xE702;
KRspk = ((KRspk << 4) | ( (KRspk & 0xFFFF) >> 12)) & 0xFFFF;
KRspk ^= tUTuj;
KRspk ++;
KRspk = (((KRspk & 0xFFFF) >> 8) | (KRspk << 8)) & 0xFFFF;
KRspk = ~KRspk;
szLabel[tUTuj] = KRspk;
}
wprintf(szLabel);
</source>
As you can see in this C++ example, the string was encrypted and each character was stored in encrypted form using [[UNICODE]] widechar format. Different encryption commands were used like bitwise [[XOR]], [[Bitwise NOT|NOT]], addition, subtraction, bit rotations. Everything is randomized, encryption keys, bit rotation counters and encryption commands order as well. Output code can be generated in [[C/C++]], [[C Sharp (programming language)|C#]], [[Java (programming language)|Java]], [[JavaScript]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], [[Haskell (programming language)|Haskell]], [[MASM]], [[FASM]] and [[AutoIt]]. Thanks to the randomization the generated algorithm is different every time. It's not possible to write generic decryption tools and the compiled code with polymorphic encryption code has to be analyzed each time it's re-encrypted.
 
== See also ==
* [[Timeline of notable computer viruses and worms]]
* [[Metamorphic code]]
* [[Self-modifying code]]
* [[Alphanumeric codeshellcode]]
* [[Shellcode]]
* [[Software cracking]]
* [[Security cracking]]
* [[Obfuscated code]]
* [[Oligomorphic code]]
Line 87 ⟶ 28:
== References ==
<references/>
{{refbegin}}
*{{cite journal |author-link= |last=Spinellis, |first=Diomidis; [|url=http://www.spinellis.gr/pubs/jrnl/2002-ieeetit-npvirus/html/npvirus.html ''|title=Reliable identification of bounded-length viruses is NP-complete''], |journal=IEEE Transactions on Information Theory, |volume=49( |issue=1):280–284, |pages=280–4 |date=January 2003. {{doi|doi=10.1109/TIT.2002.806137}}
{{refend}}
 
[[Category:Types of malware]]