Analysis of a Malicious Code Injection Vulnerability
in the Deflate Compression Algorithm

Journal of The Korea Institute of Information Security & Cryptology
VOL.32, NO.5, Oct. 2022 | ISSN 1598-3986

Jung-hoon Kim

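ABSTRACT

This study identifies a vulnerability in Deflate, one of the most widely used compression algorithms. Of the three types of compressed data block that Deflate produces, the non-compressed block with no source data (No-Payload Non-Compressed Block; NPNCB) can be generated arbitrarily and inserted between normal compressed blocks according to a pre-designed attack scenario, which makes it possible to conceal malicious code or arbitrary data.

The header of a non-compressed block contains a data area that exists only for byte alignment, which this study names the DBA (Disposed Bit Area). We were able to hide various kinds of malicious code and malicious data in the DBA. In experiments, existing commercial programs decompressed the data normally and without warning even though contaminated blocks had been inserted between the normal compressed blocks, and a malicious decoder was able to decode the hidden data and execute the malicious code.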

Keywords: Deflate, Zip, Non-compressed block, Steganography, Malware

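I. Introduction

1.1 Compressed Blocks in the Deflate Algorithm

Deflate is the core compression algorithm of Zip, one of the most widely used compression formats in the world, and it is also applied in many other file formats such as .apk, gzip, zlib, .pdf, and .xlsb. Depending on the characteristics of the data being compressed, it produces three kinds of block: Dynamic blocks, Fixed blocks, and non-compressed blocks.

1.2 Structure of the Non-Compressed Block and the DBA Concept

The header of a non-compressed block contains BFINAL (1 bit) and BTYPE (2 bits), followed by a padding area that is ignored and exists only to reach the next byte boundary. This study names that area the DBA. Because decoders ignore this data, it becomes a space in which malicious code can be hidden.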
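The 0 to 7 bit range of the DBA follows from simple bit arithmetic. The sketch below is illustrative only: `dba_bits` and its `header_bit_offset` parameter are names introduced here, not taken from the paper, and it assumes the 3-bit header (BFINAL, BTYPE) can start at any bit position inside a byte.

```python
def dba_bits(header_bit_offset: int) -> int:
    """Padding bits between a stored-block header and the next byte boundary.

    header_bit_offset is the bit position (0-7) inside the current byte at
    which BFINAL starts. This name is hypothetical, used for illustration.
    """
    if not 0 <= header_bit_offset <= 7:
        raise ValueError("header_bit_offset must be in 0..7")
    bits_used = (header_bit_offset + 3) % 8   # position after BFINAL + BTYPE
    return (8 - bits_used) % 8                # bits left until the byte boundary


# Tabulate the DBA size for every possible starting position.
for offset in range(8):
    print(f"header starts at bit {offset}: DBA = {dba_bits(offset)} bits")
```

A header that starts on a byte boundary leaves 5 padding bits, and a header that starts at bit 5 ends exactly on the boundary, which corresponds to the case where the DBA does not exist.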

Fig. 1. General structure of non-compressed block in the Deflate algorithm

Fig. 2. Variation of DBA size in the non-compressed block (variable, 0 to 7 bits)

Fig. 3. The case where the DBA does not exist

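1.3 Non-Compressed Block with No Source Data (NPNCB)

An NPNCB is a special form of non-compressed block that carries no literal data (LEN=0). It also has legitimate uses, for example in network protocols (PPP, TLS, SSH), where it forces the compressed stream to end on a byte boundary (flush).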
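The legitimate flush case can be observed with Python's standard `zlib` module. This is a minimal sketch, not the paper's tooling: `deflate_then_sync_flush` is a name introduced here, and `wbits=-15` is used to obtain a raw Deflate stream without a zlib or gzip wrapper. A sync flush completes the current block and appends an empty stored block, whose LEN and NLEN fields are the bytes 00 00 FF FF.

```python
import zlib


def deflate_then_sync_flush(data: bytes) -> bytes:
    """Compress data as raw Deflate and end with a sync flush."""
    comp = zlib.compressobj(wbits=-15)  # negative wbits: raw Deflate stream
    return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)


chunk = deflate_then_sync_flush(b"hello, deflate " * 8)

# The last four bytes are LEN=0x0000 and NLEN=0xFFFF of the empty stored block.
print(chunk[-4:].hex())  # 0000ffff
```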

Fig. 4. The structure of No-Payload Non-Compressed Block (NPNCB)

Fig. 5. Consecutive compressed data blocks dividing with NPNCB in the PPP, TLS, SSH

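1.4 Security Vulnerability

Because an NPNCB produces zero bytes of original data when decompressed, it has no effect on the CRC value of the whole file. As a result, the inserted blocks bypass detection by ordinary antivirus and archiver programs that rely on the decompressed output and its CRC.

II. Main Body

2.1 Overview of the NPNCB Injection Attack

Using a malicious encoder, any number of arbitrary NPNCBs can be inserted between normally generated compressed blocks of every type.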
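The CRC argument can be checked with a benign experiment using Python's `zlib`. The sketch below is an assumption-laden illustration rather than the paper's encoder: `deflate_with_empty_blocks` and `EMPTY_STORED` are names introduced here, the extra blocks use all-zero padding bits, and a sync flush is used only to reach a byte boundary before each extra block is appended.

```python
import zlib

# Byte-aligned empty stored block: BFINAL=0, BTYPE=00, five zero padding
# bits, then LEN=0x0000 and NLEN=0xFFFF.
EMPTY_STORED = b"\x00\x00\x00\xff\xff"


def deflate_with_empty_blocks(chunks, extra_blocks=1):
    """Raw Deflate stream with empty stored blocks added after each chunk."""
    comp = zlib.compressobj(wbits=-15)
    out = bytearray()
    for chunk in chunks:
        out += comp.compress(chunk)
        out += comp.flush(zlib.Z_SYNC_FLUSH)   # stream is now byte-aligned
        out += EMPTY_STORED * extra_blocks     # adds zero output bytes
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out)


chunks = [b"alpha" * 100, b"beta" * 100, b"gamma" * 100]
original = b"".join(chunks)

plain = zlib.compressobj(wbits=-15)
normal_stream = plain.compress(original) + plain.flush()
padded_stream = deflate_with_empty_blocks(chunks, extra_blocks=4)

restored = zlib.decompress(padded_stream, wbits=-15)
print(restored == original)
print(zlib.crc32(restored) == zlib.crc32(original))
print(len(normal_stream), len(padded_stream))
```

The compressed stream grows, but the decompressed bytes and their CRC-32 do not change, so a check that looks only at the output cannot tell the two streams apart.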

Fig. 6. The variability of NPNCB injection attack scenario design

Fig. 7. The process of injecting malignant NPNCBs while initially encoding by malignant encoder

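2.2 Performing the NPNCB Injection and Decompressing

We ran tests on an m4a file and an xlsx file. Even though an NPNCB (an 'X' block) was inserted after every block, the files decompressed normally.

Table 1 & 2. m4a file: normal compression (174 blocks) vs. malicious NPNCB injection (348 blocks)
Encoder	Block sequence	Total blocks
Normal Encoder (Table 1)	DDDD...DDND...	174
Malignant Encoder (Table 2)	DXDXDX...DXNX...	348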

Fig. 8. The process of injecting NPNCBs to normal compressed blocks while the M4A file being encoded

Fig. 9. NPNCBs inserted compressed file (.m4a) is normally decompressed by commercial encoder without any alerts

Table 3 & 4. xlsx file: normal compression (135 blocks) vs. malicious NPNCB injection (270 blocks)
Encoder	Block sequence	Total blocks
Normal Encoder (Table 3)	DNNN...DDD...	135
Malignant Encoder (Table 4)	DXNX...DXDX...	270

Fig. 10. The process of injecting NPNCBs to xlsx file compressed blocks

Fig. 11. NPNCBs inserted compressed file (.xlsx) is normally decompressed by commercial encoder without any alerts

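2.3 ~ 2.5 Decoding the Compressed Data and Executing the Code

Commercial compression programs on the market (Bandizip, ALZip, and others) ignore the NPNCBs and decode only the normal data. A malicious decoder, by contrast, extracts the bit fragments from each DBA while decompressing, reassembles them into the malicious code, and executes it immediately in memory.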
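The behaviour of an ordinary decoder can be reproduced with Python's standard `zlib` module, used here as a stand-in for the commercial archivers (an assumption: those products were not tested in this sketch). The helper names `empty_stored_block` and `build_stream` are introduced for illustration, and the padding value is an arbitrary harmless marker. The point is only that a conforming inflate implementation discards the padding bits without any error or warning.

```python
import zlib


def empty_stored_block(padding_bits: int) -> bytes:
    """Byte-aligned, non-final stored block with LEN=0.

    Bit 0 is BFINAL=0, bits 1-2 are BTYPE=00, and bits 3-7 are the five
    padding bits that RFC 1951 tells decoders to ignore.
    """
    if not 0 <= padding_bits < 32:
        raise ValueError("padding_bits must fit in 5 bits")
    return bytes([padding_bits << 3]) + b"\x00\x00\xff\xff"


def build_stream(data: bytes, padding_bits: int) -> bytes:
    comp = zlib.compressobj(wbits=-15)        # raw Deflate, no wrapper
    out = comp.compress(data)
    out += comp.flush(zlib.Z_SYNC_FLUSH)      # ends on a byte boundary
    out += empty_stored_block(padding_bits)   # extra block with marker bits
    out += comp.flush(zlib.Z_FINISH)
    return out


data = b"sample text " * 50
stream = build_stream(data, 0b10101)

# A standard decoder returns the original bytes and reports nothing unusual.
restored = zlib.decompress(stream, wbits=-15)
print(restored == data)
```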

Fig. 12. Decoding the malignant NPNCB block injected compressed data by normal compression program

Fig. 13. Decoding the malignant NPNCB block injected compressed data by malignant decoder

Fig. 14. Malignant NPNCB block can be inserted between normal compressed blocks by designed attack scenario

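III. Attack Case Implementation

3.1 Overview of the Malicious Executable Code Injection

We prepared a 7,680-byte malicious test application. Each byte was split into a 5-bit piece and a 3-bit piece, and the pieces were distributed (steganography) across the DBAs of a total of 15,360 NPNCBs.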
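The fragment count is plain arithmetic: two pieces per byte gives 7,680 × 2 = 15,360 pieces. The sketch below illustrates the split on harmless sample bytes. It assumes the 5-bit piece is the upper five bits and the 3-bit piece the lower three; the source does not state the order, and `split_5_3` and `join_5_3` are hypothetical names.

```python
def split_5_3(byte: int) -> tuple[int, int]:
    """Split one byte into (upper 5 bits, lower 3 bits). Order is assumed."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    return byte >> 3, byte & 0b111


def join_5_3(high5: int, low3: int) -> int:
    """Inverse of split_5_3."""
    return (high5 << 3) | low3


sample = bytes(range(256)) * 30          # 7,680 harmless bytes
pieces = [part for b in sample for part in split_5_3(b)]
print(len(sample), len(pieces))          # 7680 15360

# Reassembling consecutive pairs restores the original bytes.
rebuilt = bytes(join_5_3(pieces[i], pieces[i + 1]) for i in range(0, len(pieces), 2))
print(rebuilt == sample)
```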

Fig. 15. The running result of test malignant code (TEST_SECURE.exe) which will be fragmented into NPNCBs' each DBA

Fig. 16. NPNCB injection attack test case by malignant decoder compared with normal encoder

Fig. 17. The summary of malignant code dividing into each NPNCB DBA in the test attack case

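As a result, despite this massive modification, commercial tools such as ALZip decompressed the archive normally, and when the research malicious decoder was used, the hidden popup program was reconstructed intact and executed.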

Fig. 18. Malignant zip file decompression success by normal decoder

Fig. 19. Malignant zip file decompression success by malignant decoder

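IV. Conclusion

By exploiting this structural weakness of the Deflate algorithm, an attacker can conceal a large amount of malicious code while bypassing the CRC check and evading antivirus detection. Even identical code yields a different binary depending on the characteristics of the original data, which makes detection harder still. This paper demonstrates this wide-reaching vulnerability and calls for a prompt security patch.

About the Author | Jung-hoon Kim

Pharmacist Programmer
Graduated from the College of Pharmacy at Seoul National University, completed doctoral coursework in public health at the same university's graduate school, and earned a master's degree in SW convergence from the Department of Computer Engineering at Kyung Hee University. Interested in medical informatics, information theory, information security, and high-performance desktop application architecture.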

Malicious Code Injection Vulnerability Analysis
in the Deflate Algorithm

Jung-hoon Kim (CEO, BinaryLab)

ABSTRACT

In this study, we found that, among the three types of compressed data block generated by the Deflate algorithm, the No-Payload Non-Compressed Block (NPNCB), which has no literal data, can be generated arbitrarily and inserted between normal compressed blocks.

The header of the non-compressed block contains a data area that exists only for byte alignment. We call this area the DBA (Disposed Bit Area), and an attacker can hide various malicious code and data in it. We thus identified a vulnerability that conceals malicious code or arbitrary data by inserting NPNCBs with infected DBAs. Experiments show that commercial programs decoded the contaminated zip file normally without any warning, and that the malicious code could be executed by a malicious decoder.

Keywords: Deflate, Zip, Non-compressed block, Steganography, Malware

I. Introduction

The Deflate algorithm, widely used in Zip, apk, pdf, and other formats, creates three types of blocks: dynamic, fixed, and non-compressed. After its 3-bit header, the non-compressed block contains a padding area used only for byte alignment, which we call the DBA. Because RFC 1951 directs decoders to ignore these bits, the area can be exploited to carry hidden data.

NPNCBs (LEN=0) contain no payload. When NPNCBs carrying malware fragments in their DBAs are injected, the decompressed output, and therefore its CRC, remains identical to the original, so CRC-based integrity checks in commercial programs do not detect the change.

Fig. 1. General structure of non-compressed block in the Deflate algorithm

Fig. 2. Variation of DBA size in the non-compressed block

Fig. 4. The structure of No-Payload Non-Compressed Block (NPNCB)

II. Main Body

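NPNCBs can be freely injected between normal blocks. In tests with .m4a and .xlsx files, one NPNCB was injected after every compressed block, doubling the block counts from 174 to 348 and from 135 to 270, and both files still decompressed normally.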

Fig. 6. The variability of NPNCB injection attack scenario design

Standard decoders simply ignore the DBA bits. A specially crafted malignant decoder, however, actively extracts these bits from the DBA space during decompression and combines them to execute the payload in memory.

Fig. 12. Decoding the malignant NPNCB block injected compressed data by normal compression program

Fig. 13. Decoding the malignant NPNCB block injected compressed data by malignant decoder

III. Attack Case Implementation

We split each byte of a 7,680-byte Windows test malware (TEST_SECURE.exe) into a 5-bit piece and a 3-bit piece, hiding the pieces across the DBAs of 15,360 NPNCBs.

Fig. 17. The summary of malignant code dividing into each NPNCB DBA in the test attack case

The resulting archive passed all commercial archiver checks (ALZip, Bandizip) silently. Only the malignant decoder reassembled and executed the hidden popup application successfully.

Fig. 19. Malignant zip file decompression success by malignant decoder

IV. Conclusion

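This structural gap in Deflate allows binary steganography that evades CRC checks, because NPNCBs add no bytes to the decompressed output. It also hinders signature-based detection, because the same payload is encoded differently depending on the original data and the resulting DBA sizes. This paper highlights the critical need for new validation algorithms to prevent silent data infection inside standard archives.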
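One coarse direction for such validation is to flag streams with an unusual density of empty stored blocks. The sketch below is a heuristic introduced here, not a method from the paper: it counts the byte pattern 00 00 FF FF (LEN=0 followed by NLEN=0xFFFF) in a raw Deflate stream. The function names and the `max_per_kib` threshold are assumptions, the pattern can also occur by chance inside compressed data, and legitimate flushes produce it too, so a reliable check would have to parse every block and inspect the padding bits.

```python
import zlib

EMPTY_STORED_TAIL = b"\x00\x00\xff\xff"  # LEN=0x0000 followed by NLEN=0xFFFF


def empty_stored_candidates(raw_deflate: bytes) -> int:
    """Count byte patterns that may be empty stored blocks (may over-count)."""
    return raw_deflate.count(EMPTY_STORED_TAIL)


def looks_suspicious(raw_deflate: bytes, max_per_kib: float = 1.0) -> bool:
    """Flag streams whose candidate density exceeds an assumed threshold."""
    kib = max(len(raw_deflate) / 1024, 1e-9)
    return empty_stored_candidates(raw_deflate) / kib > max_per_kib


# Build a test stream: normal data, a sync flush, then 50 empty stored blocks.
comp = zlib.compressobj(wbits=-15)
crafted = comp.compress(b"ordinary content " * 40)
crafted += comp.flush(zlib.Z_SYNC_FLUSH)
crafted += b"\x00\x00\x00\xff\xff" * 50
crafted += comp.flush(zlib.Z_FINISH)

print(empty_stored_candidates(crafted))
print(looks_suspicious(crafted))
```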

About the Author | Jung-hoon Kim

Pharmacist Programmer
B.S. Pharmacy, Ph.D. Candidate in Public Health (Seoul National University) / M.S. SW Convergence (Kyung Hee University). Specializes in Medical Informatics, Information Theory, and Cybersecurity.