Swap Detector Uses “Big Code Analysis” Techniques to Catalog Known Good Call Patterns and Identify Anomalies Associated with API-Usage Bugs
Modern software development relies on third-party APIs, libraries, and frameworks that are complex, rapidly evolving, and sometimes poorly documented. According to industry estimates, open-source components can represent up to 90% of the code in the average application. Meanwhile, API-usage errors are a common source of security and reliability vulnerabilities. Swap Detector enables developers and DevOps teams to identify errors caused by swapped function arguments, including errors that are already present in deployed code.
“Traditional static-analysis techniques do not take advantage of the vast wealth of information on what represents error-free coding practices available in the open-source domain,” says Alexey Loginov, Vice President of Research at GrammaTech. “With Swap Detector we applied Big Data analysis techniques, what we call Big Code analysis, to the Fedora RPM open-source repository to baseline correct API usage. This allowed us to develop error-detection capabilities that far exceed the scalability and accuracy of conventional approaches to program analysis.”
Swap Detector takes as input information about a call site and, optionally, the function declaration that corresponds to that call site. If it detects a potential swapped-argument error at the call site, it outputs a warning message and a score for the warning. The Swap Detector interface integrates with a variety of static-analysis tools, such as the Clang Static Analyzer, Clang-Tidy, and PyLint. Although initially focused on C/C++ programs, Swap Detector is applicable to programs in other languages and is especially beneficial for languages that are interpreted rather than compiled.
Swap Detector layers multiple error-detection techniques together to increase accuracy. For example, it compares the argument names used at call sites with the parameter names used in the corresponding declarations. In addition, it uses “Big Code” techniques: it collects statistical information about “known good” API-usage patterns from a large corpus of code and flags statistically anomalous usages as potential errors. To improve the precision of the reported warnings, Swap Detector applies false-positive reduction strategies to the output of both techniques.
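The name-comparison check described above can be illustrated with a minimal Python sketch. This is not Swap Detector's implementation or its interface: the names `SwapWarning`, `name_similarity`, and `check_call_site`, the use of `difflib` string similarity, and the 0.5 threshold are all assumptions made for illustration. The sketch takes the argument names at a call site and the parameter names from the declaration, and reports a warning with a score when a pair of arguments matches the parameter names better in swapped order than as written.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


@dataclass
class SwapWarning:
    """Hypothetical warning record: a message plus a score for the warning."""
    message: str
    score: float


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check_call_site(
    callee: str,
    arg_names: List[str],
    param_names: List[str],
    threshold: float = 0.5,  # assumed cutoff, not a documented value
) -> List[SwapWarning]:
    """Flag argument pairs that fit the declaration better when swapped."""
    warnings = []
    count = min(len(arg_names), len(param_names))
    for i, j in combinations(range(count), 2):
        # How well the two arguments match their own parameters as written.
        as_written = (name_similarity(arg_names[i], param_names[i])
                      + name_similarity(arg_names[j], param_names[j]))
        # How well they would match if the two arguments traded places.
        if_swapped = (name_similarity(arg_names[i], param_names[j])
                      + name_similarity(arg_names[j], param_names[i]))
        score = (if_swapped - as_written) / 2
        if score >= threshold:
            warnings.append(SwapWarning(
                message=(f"{callee}: arguments '{arg_names[i]}' and "
                         f"'{arg_names[j]}' may be swapped (parameters "
                         f"'{param_names[i]}', '{param_names[j]}')"),
                score=score,
            ))
    return warnings


# Declaration: copy_file(source, dest); call site: copy_file(dest, source)
for warning in check_call_site("copy_file", ["dest", "source"], ["source", "dest"]):
    print(f"{warning.score:.2f} {warning.message}")
```

A check like this only works when the call-site arguments are named variables that resemble the parameter names, which is one reason to layer it with a statistical technique and with false-positive reduction rather than rely on it alone.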
Swap Detector was developed based on research sponsored by DHS S&T (contract numbers HHSP233201600062C, 70RSAT19C00000056). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DHS.