Submitted Abstract
The current momentum of Android has attracted the interest of malicious app writers, who contribute an increasingly large volume of malware distributed through official and alternative markets. A malicious app is typically designed and advertised as providing specific user-desired functionalities, yet is implemented to behave in a way that contradicts the user's interests. The palette of techniques used by malware ranges from simple use of sensitive API methods (such as sendSMS) to more sophisticated exploitation of new vulnerabilities (such as data residue attacks after uninstallation of popular apps). Malware writers can further leverage evasion techniques to complicate the work of security analysts, either by challenging static analysis approaches through the use of reflection, native code and string encryption, or by limiting the effectiveness of dynamic analysis techniques by withholding malicious behavior when running in an emulated environment. Nonetheless, reports from antivirus vendors and studies in the literature regularly highlight the predominance of a set of malware families, within which samples are categorized based on the runtime behavior of app code (i.e., the malware activation process as well as the actions and data used by the malicious payloads). In CHARACTERIZE, we build on the key assumption that malicious behavior types are, to some extent, instantiated by similar code patterns across the variety of malware samples. The first challenge is then, for each family, to identify recurring samples of malicious pieces of code and to infer their common patterns or features in order to “characterize” them. The second challenge is to leverage these patterns or features to detect Android malware.
To that end, CHARACTERIZE will explore two parallel directions: (1) Explainable Per-Family Machine Learning Malware Detection; (2) Pattern Matching Based Malware Detection.
We foresee the following contributions: (a) By relying on our previous work and on our AndroZoo dataset, CHARACTERIZE will release a unique dataset of piggybacked apps and apps with lineage (i.e., different versions of the same app). (b) CHARACTERIZE provides a large-scale characterization of malware families by extracting patterns and features by means of “diff” computations. The innovation lies in our capacity to learn from the localized pieces of code that implement maliciousness in an app. (c) CHARACTERIZE proposes per-family machine learning malware detectors using new (semantic) features. This per-family detection with semantic features makes it possible to explain why an app has been classified as malware. (d) CHARACTERIZE proposes a pattern-matching based approach, robust against obfuscation, to detect occurrences of malicious code fragments. The analyzed apps will first be instrumented to reduce the impact of obfuscation and ease the analysis (i.e., the pattern finding). The analysis will be static or dynamic, depending on the level of sophistication of the obfuscation.
The novelty of CHARACTERIZE lies in building on our capacity to learn from the localized pieces of code that implement maliciousness in an app. A funded research project on this theme will allow us to reach new levels of expertise and yield significant contributions to the field of mobile malware detection in app markets.
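As a rough illustration of the “diff”-based characterization in contribution (b) and the pattern matching in contribution (d), the following minimal sketch assumes that each app has already been reduced to a set of feature strings (for example, invoked API methods). The function names, the thresholds and the sample feature strings are all hypothetical and chosen for illustration only; they are not the project's actual tooling or data.

```python
from collections import Counter


def added_features(original, piggybacked):
    """Features present in the piggybacked app but absent from its original carrier."""
    return set(piggybacked) - set(original)


def family_patterns(pairs, min_support=0.5):
    """Keep the features that recur in at least `min_support` of the per-pair diffs.

    `pairs` is a list of (original_features, piggybacked_features) tuples
    assumed to belong to the same malware family.
    """
    diffs = [added_features(orig, pig) for orig, pig in pairs]
    counts = Counter(feature for diff in diffs for feature in diff)
    threshold = min_support * len(diffs)
    return {feature for feature, count in counts.items() if count >= threshold}


def matches_family(app_features, patterns, min_overlap=1.0):
    """Flag an app when it contains at least `min_overlap` of the family patterns."""
    if not patterns:
        return False
    hits = patterns & set(app_features)
    return len(hits) / len(patterns) >= min_overlap


# Hypothetical feature sets, for illustration only.
PAIRS = [
    (
        {"Activity.onCreate", "HttpURLConnection.connect"},
        {"Activity.onCreate", "HttpURLConnection.connect",
         "SmsManager.sendTextMessage", "TelephonyManager.getDeviceId"},
    ),
    (
        {"Activity.onCreate"},
        {"Activity.onCreate", "SmsManager.sendTextMessage",
         "TelephonyManager.getDeviceId", "Runtime.exec"},
    ),
]

# Only features added in every pair survive with min_support=1.0.
PATTERNS = family_patterns(PAIRS, min_support=1.0)
```

Here `matches_family(candidate_features, PATTERNS)` would flag a candidate app that contains every recurring feature. A plain set difference is only a stand-in for the code-level diff described above, and it ignores obfuscation, which the proposal plans to handle through instrumentation before the pattern search.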