There have been hundreds of software dependency supply chain attacks in the past, exploiting a range of vectors to great effect. The July 2020 paper by Marc Ohm et al. reports that, on average, a malicious package is available for 209 days (min = −1, max = 1,216, σ = 258, x̃ = 67), so naturally any method to reduce this number would be well received.
In this post, I want to document a really interesting detection vector that I am attempting to operationalize using software health metrics drawn from readily available metadata. My aim is to begin a compendium of metadata-based detection aids for some of the attack vectors related to this problem.
This method is not really applicable or intended for typosquatted packages (although a lack of metadata is a signal in and of itself), but more for situations where a threat actor gains publishing permissions or control over an existing repository.
From an attacker’s perspective, modifying a known good package’s source code has several stages in the OODA loop:
1) Identification of a target. Notwithstanding opportunistic account acquisition (i.e., compromised credentials), projects with low maintainer counts, long periods with no commits, and longstanding open issues historically make for good targets. Tooling like CHAOSS and OSSF Metrics is useful for both attackers and defenders in selecting targets that meet these criteria.
2) Determination of malicious code entry vector. Social engineering (working to obtain trusted status on the repositories, or submitting PRs for issues), obtaining publishing rights, and repo takeovers are common entry vectors and the ones we’re concerned with here. This process is closely related to the target identification step.
3) Commit(s) of malicious content. Once code is committed to the repository, the attacker is exposed to any gating mechanisms and to thousands of eyes on their work. Lengthy peer review periods or branch protections put the attack at risk, so the attacker is incentivized to move briskly through this process, pushing their code through to main as soon as possible so that the malicious additions execute.
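As a rough illustration, the precursor conditions from step 1 (few maintainers, long gaps between commits, longstanding open issues) could be expressed as simple flags over repository metadata. This is only a sketch: the `RepoHealth` record, its field names, and the default thresholds are hypothetical and are not taken from CHAOSS or OSSF Metrics.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RepoHealth:
    """Hypothetical metadata snapshot; field names are illustrative only."""
    name: str
    maintainer_count: int
    last_commit: date
    oldest_open_issue: Optional[date]  # None when no issues are open


def risk_flags(repo: RepoHealth, today: date,
               min_maintainers: int = 2,
               max_idle_days: int = 365,
               max_issue_age_days: int = 365) -> List[str]:
    """Return the precursor conditions this repo meets (thresholds are assumptions)."""
    flags = []
    if repo.maintainer_count < min_maintainers:
        flags.append("few_maintainers")
    if (today - repo.last_commit).days > max_idle_days:
        flags.append("stale_commits")
    if (repo.oldest_open_issue is not None
            and (today - repo.oldest_open_issue).days > max_issue_age_days):
        flags.append("stale_issues")
    return flags


# Made-up example values, not measurements of any real project.
example = RepoHealth("example-pkg", 1, date(2016, 1, 1), date(2015, 6, 1))
print(risk_flags(example, today=date(2018, 9, 1)))
```

A repository that raises several flags is not compromised; it is simply a candidate for closer monitoring when new activity appears.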
Analysis 1: Event-Stream
At the time of the exploit, Event-Stream was used by 1,600 other packages and was downloaded on average 1.5 million times a week.
The new publisher added a dependency called Flatmap-Stream to Event-Stream, along with a minor version increment. At the time, Flatmap-Stream had 1 commit and no users. Its malicious behavior was targeted: it was designed to look for targets’ cryptocurrency wallets and to take no action if none were found.
After a few days (and millions of installs), the publisher removed the dependency and added a major version increment, leaving a large number of installs but ‘cleaner’ looking source code if it were to be inspected.
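One signal in this sequence is a release that introduces a dependency with almost no history of its own. A minimal sketch of such a check follows, assuming commit and dependent counts for each dependency are already available from some metadata source. The `DepStats` type, the function name, and the thresholds are hypothetical; the only values taken from the account above are Flatmap-Stream’s 1 commit and no users.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class DepStats:
    """Hypothetical per-dependency metadata."""
    commits: int
    dependents: int


def suspicious_new_deps(old_deps: Iterable[str],
                        new_deps: Iterable[str],
                        stats: Dict[str, DepStats],
                        min_commits: int = 5,
                        min_dependents: int = 10) -> List[str]:
    """List dependencies added between two releases that have little history.

    A dependency is flagged if it has no known stats, or if either its
    commit count or its dependent count is below the (assumed) threshold.
    """
    added = sorted(set(new_deps) - set(old_deps))
    flagged = []
    for dep in added:
        info = stats.get(dep)
        if info is None or info.commits < min_commits or info.dependents < min_dependents:
            flagged.append(dep)
    return flagged


# "dep-a" and "well-known-lib" are placeholders, not real dependencies.
stats = {
    "flatmap-stream": DepStats(commits=1, dependents=0),
    "well-known-lib": DepStats(commits=500, dependents=2000),
}
print(suspicious_new_deps(["dep-a"],
                          ["dep-a", "flatmap-stream", "well-known-lib"],
                          stats))
```

Because the dependency was removed again a few releases later, a check like this would need to run on every published version rather than only on the latest one.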
What does the metadata tell us about Event-Stream?
The data shows that the project had not received any updates for a substantial period of time, which was surely a factor in the attacker’s reconnaissance and target selection.
The metadata shows an unusual trend in reviews at the time: less than half the length of any prior review, coupled with a new user publishing:
These data points don’t indicate anything untoward on their own. For instance, the picture below shows that it was common for the project to have one-off or “drive by” contributors (very common in open source), but together the data starts to paint a picture.
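The idea that neither signal matters alone but the combination does can be sketched as a small rule: raise an alert only when a review is shorter than half of every prior review and the publisher has not been seen before. The function name, the input shapes, and the sample numbers below are assumptions for illustration, not the actual Event-Stream data.

```python
from typing import Dict, Iterable, Sequence


def review_signals(prior_review_hours: Sequence[float],
                   new_review_hours: float,
                   publisher: str,
                   known_publishers: Iterable[str]) -> Dict[str, bool]:
    """Combine two weak signals into one alert (illustrative rule only)."""
    # "Less than half the length of any prior review" means less than
    # half of the shortest prior review; with no history, do not flag.
    short_review = (bool(prior_review_hours)
                    and new_review_hours < 0.5 * min(prior_review_hours))
    new_publisher = publisher not in set(known_publishers)
    return {
        "short_review": short_review,
        "new_publisher": new_publisher,
        "alert": short_review and new_publisher,
    }


# Made-up numbers: prior reviews of 48, 30 and 72 hours, new review of 10 hours.
print(review_signals([48.0, 30.0, 72.0], 10.0, "new-user", {"maintainer-a"}))
```

Requiring both conditions keeps ordinary drive-by contributions, which are common in open source, from triggering the alert on their own.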
Of course, once the damage is done we see a significant increase in issues reported:
I’ll be studying and posting about other, similar attacks on trusted repositories. I believe that by observing packages through the eyes of an attacker, it is possible to first isolate ‘unhealthy’ projects and then subject those repos to additional interdiction when anomalous activity commences, so that we can reduce the discovery time for malicious repositories in open source software. The OSSF Metrics group is already collecting excellent data points on these precursor conditions, which I am using for the isolation process.