Antivirus | Vithurshan SELVARAJAH

Introduction to Antivirus

In the early stages, antivirus software used only signatures to uniquely identify malware. A signature represents a piece or function of malware. For example, two signatures can be used to represent the exact same malware: one targeting the malware file on disk and another detecting its network communication.

In 2014, a signature language named YARA was open-sourced, aiming to help malware researchers identify and classify malware samples. YARA can also be used to query VirusTotal, a malware search engine that allows users to search for known malware.

NOTE: If a sample uploaded to VirusTotal is new or unknown, it is stored and a hash is produced to uniquely identify it. As an alternative to VirusTotal, we can use AntiScan.Me. This service scans the sample against 30 different AV engines and claims not to divulge any submitted sample to third parties.

Modern AV solutions, including Windows Defender, now include a Machine Learning (ML) engine that is queried whenever an unknown file is discovered on a system. These ML engines can detect unknown threats but require an internet connection since they operate in the cloud.

Additionally, today's AV solutions consist of many other engines, which can strain the resources of the host computer. To overcome these limitations, Endpoint Detection and Response (EDR) solutions were introduced. EDR software is responsible for analyzing data received from end devices, generating security-event telemetry, and forwarding it to a Security Information and Event Management (SIEM) system, which stores data from every company host. A security analyst team can review this data to gain a full overview of any past or ongoing attacks affecting the organization.

Detection Methods

Signature-based antivirus detection scans the filesystem for known malware signatures and quarantines identified files. A signature can be as simple as the hash of the malware file itself or a set of multiple patterns, such as specific binary values and strings unique to that malware. Relying solely on the file hash as the detection mechanism is a weak strategy because changing a single bit in the file results in a completely different hash.
Heuristic-based detection involves disassembling the program's machine code into assembly, or decompiling it, to obtain a more comprehensive map of the program. The idea is to search for patterns and program calls that are considered malicious, rather than for simple byte sequences.
Behavior-based detection executes the file in question in an emulated environment, such as a small virtual machine, and searches for behaviors or actions that are considered malicious.
Machine-learning detection uses ML models and heuristics produced by the client ML engine. These are passed to the cloud ML engine, which analyzes the submitted sample against a metadata-based model built from all submitted samples.
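
To illustrate why a file hash alone is a weak signature, here is a minimal Python sketch. The sample bytes, the marker string, and all function names are made up for illustration and do not come from any real AV product. It flips a single bit in a harmless stand-in "sample" and compares a hash-based check with a simple pattern-based check.

```python
import hashlib

# Hypothetical stand-in for a malware file; the bytes are harmless.
SAMPLE = b"MZ\x90\x00...EVIL_MARKER_1337...payload"

# Hash-based signature: the SHA-256 digest of the exact known file.
KNOWN_HASHES = {hashlib.sha256(SAMPLE).hexdigest()}

# Pattern-based signature: byte strings that must all appear in the file.
KNOWN_PATTERNS = [b"MZ", b"EVIL_MARKER_1337"]


def flip_lowest_bit_of_last_byte(data: bytes) -> bytes:
    """Return a copy of data that differs from it by exactly one bit."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


def matches_hash(data: bytes) -> bool:
    """True if the file's digest is in the known-hash set."""
    return hashlib.sha256(data).hexdigest() in KNOWN_HASHES


def matches_patterns(data: bytes) -> bool:
    """True if every known byte pattern occurs somewhere in the file."""
    return all(pattern in data for pattern in KNOWN_PATTERNS)


variant = flip_lowest_bit_of_last_byte(SAMPLE)

print("original, hash match:   ", matches_hash(SAMPLE))
print("variant,  hash match:   ", matches_hash(variant))
print("variant,  pattern match:", matches_patterns(variant))
```

The one-bit variant no longer matches the stored hash, while the pattern check still fires because the marker bytes are untouched. Real signatures are far richer than this, but the trade-off is the same.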

AV Engines

A modern antivirus is often equipped with the following components:

File Engine
Memory Engine
Network Engine
Disassembler
Sandbox
Browser Plugin
Machine Learning (ML) Engine

Each of the engines above works simultaneously with the signature database to rank specific events as either benign, malicious, or unknown.

The file engine is responsible for both scheduled and real-time file scans. When the engine performs a scheduled scan, it simply walks the entire file system and sends each file's metadata or data to the signature engine. Real-time scans involve detecting and reacting to any file system actions, such as downloading new malware from a website or writing malware to disk. To detect such operations, real-time scanners need to identify events at the kernel level via a specially designed driver called a mini-filter. This is why modern antivirus software needs to operate in both kernel and user space.
The memory engine inspects every process's memory space at runtime for well-known binary signatures or suspicious API calls that might indicate memory injection attacks.
The network engine inspects the incoming and outgoing traffic on the network interfaces. Once a signature is matched, a network engine might attempt to block the malware from communicating with its Command and Control (C2) server.
Malware often uses encryption and decryption to conceal its true nature. Antivirus solutions counter this strategy by disassembling the malware's packers/ciphers and loading the malware into a sandbox. The disassembler engine is responsible for translating machine code into assembly language, reconstructing the original program code sections, and identifying any encoding/decoding routines. Once the malware is unpacked/decoded and running in the emulator, it can be thoroughly analyzed against known signatures.
Because browsers are protected by a sandbox, modern AVs often install a browser plugin to gain better visibility and detect malicious content that might be executed inside the browser.
The machine learning component enables detection of unknown threats by relying on cloud-enhanced computing resources and algorithms.
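
The point about encoded malware and unpacking can be sketched in a few lines of Python. This is a toy model under loud assumptions: the "packer" is a single-byte XOR, the signature string is hypothetical, and the "unpacking" step simply tries every key, whereas a real engine would disassemble or emulate the decoding routine. It only shows why a signature that fails on the encoded bytes can succeed once the content is decoded.

```python
from typing import Optional

SIGNATURE = b"EVIL_MARKER_1337"  # hypothetical known byte pattern


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (encoding equals decoding)."""
    return bytes(b ^ key for b in data)


def static_scan(data: bytes) -> bool:
    """Signature check on the bytes exactly as they are stored."""
    return SIGNATURE in data


def scan_after_decoding(data: bytes) -> Optional[int]:
    """Try each single-byte XOR key and rescan the decoded bytes.

    Returns the first key that reveals the signature, or None if no key does.
    """
    for key in range(256):
        if SIGNATURE in xor_bytes(data, key):
            return key
    return None


# Harmless stand-in payload, "packed" with an arbitrary key.
payload = b"header " + SIGNATURE + b" trailer"
packed = xor_bytes(payload, 0x5A)

print("static scan on packed bytes:", static_scan(packed))
print("key found after decoding:   ", scan_after_decoding(packed))
```

The static scan misses the packed bytes, while the decode-then-scan step recovers the key and exposes the signature. Brute-forcing keys only works for this trivial encoding, which is why the text describes a disassembler and sandbox rather than guessing.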