SEC Insider Information

Project Leaders: Elijah Grubbs, Justin Paul

Team Members: Anjika Jain, Jacob Iwashyna, Jake Gwinn, Kevin Wang, Malachi Mealoy, Matthew Inda, Riley Rich, Karl Mohy El Din, Zakarai Zerrouki

About This Project

What comes to mind when you hear the words “insider trading”? Do you think of high-up executives using their position to make outrageous returns on investment? Maybe you think of congressmen who abuse their inside knowledge of legislation to profit in the stock market? This project aims to analyze the trading patterns of such insiders.

We specifically focused on the SEC filings from Q1 of 2020. Because this dataset encompasses the stock market crash caused by COVID-19, we hypothesized that the instability in the market could help highlight strange trading patterns. This was a beginner-friendly project that progressed from learning the basics of dataframes to creating visualizations with our data.

Dataset in Brief

To collect the data, we processed historical SEC filings. For readability, a cropped version of a filing can be found below. As web scraping was not the focus of this project, we won’t dwell on the implementation. The main job of the web scraper was converting the nested (NoSQL-style) XML of each filing into a tabular CSV format compatible with pandas. We ran into data issues that we believe stem from human error, such as dates indicating shares were bought 10 years into the future. Of note, because collection was extremely time-consuming, we did not pull every filing, but rather a subset of 23,000 filings, which still formed a large dataset. This means trades involving Elon Musk and Tesla were not considered in our analysis.
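To give a feel for the XML-to-table step, here is a minimal sketch in Python. It is not our actual scraper: the sample filing, its tag names, and the helper `filing_to_rows` are all hypothetical and heavily simplified (real SEC filings have many more fields and different tags). It also shows one possible way to drop rows with impossible dates.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical, simplified filing. Real filings use different tag names.
SAMPLE_XML = """
<filing>
  <issuer><ticker>ABC</ticker></issuer>
  <owner><name>Jane Doe</name></owner>
  <transactions>
    <transaction>
      <date>2020-03-16</date><code>A</code>
      <shares>1000</shares><price>12.50</price>
    </transaction>
    <transaction>
      <date>2030-03-16</date><code>D</code>
      <shares>200</shares><price>13.00</price>
    </transaction>
  </transactions>
</filing>
"""

def filing_to_rows(xml_text):
    """Flatten one nested filing into one flat dict per transaction."""
    root = ET.fromstring(xml_text)
    ticker = root.findtext("issuer/ticker")
    owner = root.findtext("owner/name")
    rows = []
    for tx in root.iterfind("transactions/transaction"):
        rows.append({
            "ticker": ticker,
            "owner": owner,
            "date": tx.findtext("date"),
            "code": tx.findtext("code"),  # A = acquired, D = disposed
            "shares": float(tx.findtext("shares")),
            "price": float(tx.findtext("price")),
        })
    return rows

df = pd.DataFrame(filing_to_rows(SAMPLE_XML))
df["date"] = pd.to_datetime(df["date"])

# Keep only rows dated inside the quarter under study; this removes
# entries such as transactions dated years into the future.
clean = df[df["date"].between("2020-01-01", "2020-03-31")]

# CSV text that pandas can read back in later.
csv_text = clean.to_csv(index=False)
print(csv_text)
```

In this toy example the second transaction, dated 2030, is filtered out, leaving a single row in the CSV output.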


The findings of our analysis can be broken up into two groups, each capturing a different way we approached the data to try to sniff out suspicious trades. First, we present the results of generating questions before looking at the data, using only contextual knowledge as a foundation for analysis. Second, we present visualizations from a data-first approach and the conclusions drawn from them.

The Top-Down Approach

In this approach, we formulated several hypotheses that, if validated, would reveal a statistical edge held by company insiders. The hard part was using the data to definitively test our hypotheses and back them up with data analysis. Below are two hypotheses, their motivations, and the results of our investigations into each.

Scatterplot of Transaction Amount Versus Potentially Realized 6-Month Transaction Returns

Does Higher Amount Purchased Correlate to Higher Returns?

All company insiders are more or less in the same millionaire, potentially billionaire, tax bracket. Although some may be wealthier than others, it's fair to say that a million dollars is still a million dollars, whether you have 50 or 200 of them. From this, we reasoned that the amount an insider buys is a good indicator of their conviction in their company, and thus that higher amounts purchased would correlate with higher returns.

Unfortunately, we found no relationship between the transaction amount and the corresponding return. The line of fit in the figure to the left looks flat, and by constructing a correlation matrix between the transaction amount and 6-month returns, we find that the correlation between the two is nearly 0 (0.034).
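For readers who want to try this kind of check themselves, here is a minimal sketch of building a correlation matrix and a line of fit with pandas and NumPy. The data below is randomly generated purely for illustration, and the column names `transaction_amount` and `return_6m` are assumptions, not the names used in our dataset. Note that the off-diagonal entry of a correlation matrix is a correlation coefficient, which always lies between -1 and 1.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: amounts and returns are drawn independently,
# so we expect a correlation close to zero.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "transaction_amount": rng.lognormal(mean=11, sigma=1.5, size=n),
    "return_6m": rng.normal(loc=0.05, scale=0.4, size=n),
})

# Pearson correlation matrix (the pandas default).
corr_matrix = toy.corr()
r = corr_matrix.loc["transaction_amount", "return_6m"]

# Least-squares line of fit: return_6m ~ slope * amount + intercept.
slope, intercept = np.polyfit(toy["transaction_amount"], toy["return_6m"], deg=1)

print(corr_matrix.round(3))
print(f"slope of fitted line: {slope:.3e}")
```

A near-zero `r` together with a nearly flat fitted line is the pattern described above.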

Does Increasing Position Size Correlate with Increased Returns?

To correct for differences in net worth that could have been a problem in the previous analysis, we decided to look at changes in the amount owned. If insiders had a strong conviction about the direction of their company's stock price, we would expect them to load up on shares before a big run-up or dispose of most of their shares before a meltdown.

The percentage change in ownership refers to (# shares traded) / (# shares owned prior). We wanted to analyze this statistic because a large positive percentage change in ownership indicates that someone is buying a lot of shares relative to what they already hold. For an insider, this could mean that something drastic is happening in the company. The opposite could be said of someone who is dumping a lot of shares (indicating a negative percentage change in ownership). However, once again, no correlation between the change in ownership and 6-month returns was found.
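As a rough illustration of this statistic, the sketch below computes a signed percentage change in ownership for a few made-up trades. The column names and the A/D coding (acquired/disposed) are assumptions for the example. Treating a prior holding of zero as undefined is one possible choice, not necessarily the one we made.

```python
import pandas as pd

# Made-up trades for illustration only.
trades = pd.DataFrame({
    "insider": ["A", "B", "C"],
    "code": ["A", "D", "A"],  # A = acquired, D = disposed
    "shares_traded": [500, 2_000, 100],
    "shares_owned_prior": [1_000, 8_000, 0],
})

# Acquisitions count as positive changes, disposals as negative.
sign = trades["code"].map({"A": 1, "D": -1})

# Avoid dividing by zero: a prior holding of 0 becomes NaN (undefined).
prior = trades["shares_owned_prior"].where(trades["shares_owned_prior"] > 0)

trades["pct_change_ownership"] = sign * trades["shares_traded"] / prior
print(trades)
```

Here insider A increases their position by 50%, insider B reduces theirs by 25%, and insider C has no prior position, so the ratio is left undefined.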

Scatterplot of Change in Insider Ownership Versus Potentially Realized 6-Month Transaction Returns

Data-Driven Visualizations

Instead of sitting back, thinking up a plausible theory, and then looking for data to justify it, we let our data guide our next steps.

What you see to the left is a graph of the count of transactions acquired and disposed of each day from January 1st to March 31st, 2020. The dips to zero every so often represent the weekends (you can't trade securities on the weekend). You will notice a spike on New Year's Day, which isn't surprising, since there is a whole host of plausible business reasons, tax reasons, and whatnot that would explain it. What is interesting is the dominance of acquisitions and the decrease in disposals during early and mid-March. That big break lines up with the near-bottom of the market in mid-March 2020. Logically, the next thing to ask is: do insiders who bought during the market decline experience better returns?
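A daily count like the one plotted can be built with a pandas group-by. The sketch below uses a handful of made-up transactions. The column names and the A/D coding are assumptions for illustration. Reindexing over every calendar day is what makes weekends show up as zeros.

```python
import pandas as pd

# Made-up transactions: Thursday, Friday, and the following Monday.
tx = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-03-12", "2020-03-12", "2020-03-12",
        "2020-03-13", "2020-03-16", "2020-03-16",
    ]),
    "code": ["A", "A", "D", "A", "D", "A"],  # A = acquired, D = disposed
})

# Count transactions per (day, code), then pivot codes into columns.
daily = (
    tx.groupby(["date", "code"]).size()
      .unstack("code", fill_value=0)
      .rename(columns={"A": "acquired", "D": "disposed"})
)

# Include every calendar day so days without trades appear as zeros.
all_days = pd.date_range(tx["date"].min(), tx["date"].max(), freq="D")
daily = daily.reindex(all_days, fill_value=0)
print(daily)
```

Calling `daily.plot()` on a table like this (with matplotlib installed) gives one line per transaction type.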

Histogram of Transaction 6-Month Returns for the period Jan 1st - Feb 26th 2020

Histogram of Transaction 6-Month Returns for the period Feb 27th - Mar 31st 2020

Unfortunately, the two distributions of returns look almost exactly the same. The transactions that took place during/after the market crash show a general shift to the right (higher returns). However, the overall market shifted by the same magnitude.

The only difference we can observe is that the distribution for the later period has slightly fatter tails, meaning the insiders who won big during/after the crash won bigger than the big winners before the crash.
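One way to put numbers on a comparison like this, rather than eyeballing two histograms, is to split the transactions at February 27th and compare summary statistics and tail quantiles for each period. The sketch below does this on randomly generated data. The column names are assumptions, and the quantile levels are an arbitrary choice for illustration, not what we used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data covering Q1 2020.
rng = np.random.default_rng(1)
days = pd.date_range("2020-01-01", "2020-03-31", freq="D")
toy = pd.DataFrame({
    "date": days[rng.integers(0, len(days), size=1000)],
    "return_6m": rng.normal(loc=0.1, scale=0.3, size=1000),
})

# Label each transaction by period, using the same split as the histograms.
split = pd.Timestamp("2020-02-27")
toy["period"] = np.where(toy["date"] < split, "before", "during_after")

# Tail quantiles and the mean, side by side for the two periods.
grouped = toy.groupby("period")["return_6m"]
summary = grouped.quantile([0.05, 0.5, 0.95]).unstack()
summary["mean"] = grouped.mean()
print(summary.round(3))
```

A fatter right tail in one period would show up as a larger gap between its 0.95 quantile and its median.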

As a group, it seems that insiders who bought at the bottom didn't do better than their contemporaries who bought earlier in the quarter. So next, we looked at the insiders who performed the best and tried to find connections between them.

As seen in the graph to the right, the top 5 insiders with the highest average returns all bought during the market bottom. But this seems a little contradictory. If the histograms above are very similar, save for the positive shift also experienced by the general market, we'd expect the insiders responsible for the fat tails of the returns before the market bottom to show up on the graph to the right. Well, it turns out that one insider is responsible for the entire dataset's fat tails. And what's more, they never lost money.
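Ranking insiders by average return, and checking who never lost money, takes only a few lines of pandas. The sketch below uses made-up insiders and returns. The column names are assumptions for illustration.

```python
import pandas as pd

# Made-up per-transaction returns for six hypothetical insiders.
tx = pd.DataFrame({
    "insider": ["A", "A", "B", "B", "C", "D", "E", "F", "F"],
    "return_6m": [0.10, 0.30, -0.20, 0.05, 0.50, 0.15, -0.05, 1.20, 0.80],
})

# Per-insider mean return, worst single return, and number of trades.
stats = tx.groupby("insider")["return_6m"].agg(["mean", "min", "count"])

# Top 5 insiders by average return.
top5 = stats.nlargest(5, "mean")

# Insiders whose worst transaction still made money.
never_lost = stats[stats["min"] > 0].index.tolist()

print(top5)
print(never_lost)
```

Looking at the `count` column alongside the mean is worthwhile, since an insider with a single lucky trade can top a ranking like this.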

Introducing the Champion: Allison Abraham

- Chairman of the board of directors for

- On the Board of Directors since 2002

Overstock is a discount online furniture and home store, a business model that saw increasing success during the COVID-19 pandemic and the government-mandated lockdown.

We believe that Ms. Abraham embodies everything we were trying to uncover: an insider with a keen understanding of their business who trades on publicly available information a little too well. With all of the years of experience she has at Overstock, it's clear that she understood the downstream effects a lockdown would have on her company, and really put her money where her mouth was.

Simulating an Insider ETF

It feels like we have a few conflicting messages. On one hand, the distribution of insider returns looks the same before and after the crash and simply tracks the market. On the other, we found a perfect example of the kind of insider we set out to prove existed. To see how these two can coexist, we simulated what it would be like to invest alongside the insiders.

6-Month Returns of all of Allison Abraham's transactions. She never lost money trading her company's stock.

Price History of Overstock. Dates of stock acquisitions indicated by the black lines

To spare some details, we took all the transactions on a given day and created a weighted portfolio of the tickers traded, where each weight corresponded to that ticker's fraction of the total money spent in that day's bucket of transactions. Then, we tracked that portfolio for 6 months.
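The weighting scheme described above can be sketched as follows. This is a simplified illustration, not our actual code: the transactions are made up, the column names are assumptions, and the 6-month return of each ticker is taken as a precomputed column rather than tracked from price history.

```python
import pandas as pd

# Made-up transactions; return_6m is the ticker's assumed 6-month return
# measured from that day.
tx = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-16"] * 3 + ["2020-03-17"] * 2),
    "ticker": ["AAA", "BBB", "AAA", "CCC", "BBB"],
    "dollars": [1000.0, 3000.0, 1000.0, 500.0, 1500.0],
    "return_6m": [0.10, 0.20, 0.10, -0.10, 0.30],
})

# Total money spent per ticker within each day's bucket.
by_ticker = tx.groupby(["date", "ticker"], as_index=False).agg(
    dollars=("dollars", "sum"),
    return_6m=("return_6m", "first"),
)

# Weight = ticker's share of that day's total spending.
day_total = by_ticker.groupby("date")["dollars"].transform("sum")
by_ticker["weight"] = by_ticker["dollars"] / day_total

# Portfolio return per day = weighted sum of ticker returns.
by_ticker["contribution"] = by_ticker["weight"] * by_ticker["return_6m"]
portfolio_return = by_ticker.groupby("date")["contribution"].sum()

# Average the daily portfolios within each calendar week.
weekly = portfolio_return.resample("W").mean()
print(portfolio_return)
print(weekly)
```

In this toy example, the March 16th portfolio is 40% AAA and 60% BBB, so its return is 0.4 × 0.10 + 0.6 × 0.20 = 0.16.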

The graph to the left depicts the results for each hypothetical portfolio, averaged over a given week. The average at zero can be ignored: it is due to an off-by-one error made when creating the graphic, and unfortunately we did not have time to correct it and re-run our analysis. That aside, it looks like investing in the "Insider ETF" does not lose you money, and the graph puts into perspective the returns you would have gotten investing alongside insiders instead of investing in the market.

These results do not constitute financial advice. Try to replicate them at your own risk. This was done purely for educational purposes.

Limitations and Future Work

We had several limitations, starting with the data that we used. The first is that the data was not well set up for exploratory data analysis (EDA), as it was in a non-tabular format. Additionally, the 23,000 filings that composed our dataset were just a subset of the Q1 2020 filings. The other, perhaps biggest, challenge was that this is a hard problem in a hard field. Analysts at the SEC do a lot of work in this area to catch illegal instances of insider trading, and it is not easy to come up with a good question to ask even with lots of time, let alone in our limited 10 work sessions.

Due to time constraints (both for data collection and data analysis), we weren’t able to flesh this project out as much as we would have liked, but hopefully it is enough to interest YOU, the reader, to reach out and revive this project!