This page documents project Edge: Finding an Edge in the Options Market by Analyzing Aggressive Option Order Flow. It is a work in progress. I have been working on collecting the data and writing analysis scripts for over a year.
Part I: Thesis
The working thesis is aggressive trades in the option market can lead to momentum in the underlying stock. By analyzing aggressive order flow, I hope to find an indication of market direction, as well as confirmation that the underlying stock will move in that direction with momentum.
Part II: Data Collection and Cleaning
The model includes data pulled from multiple places. The following elements are include so far:
- Option order flow
- OHLCV Stock data
- SPX Put/Call ratio
- VIX Put/Call ratio
Option Order Flow
The option order flow I collect is a curated list of aggressive option trades. The un-curated, raw data is from Trade-Alert‘s Real-Time Alerts: Option blocks and sweeps. The curating is done by the Sang Lucci/Wall St. Jesus trading group.
I collect the data twice a month: 15th, and last day.
The website from which I get the data is not conducive to web scrapping. I tried writing a python script using Beautiful Soup, but it was unsuccessful due to their web design. Since I was unable to get the data automatically, and their website doesn’t allow native copying, I use a Google Chrome extension called Allow Copy to enable copying in the browser. I copy the data manually from their trading room. I paste the data into a spreadsheet to clean it up. The raw data I collect looks like this:
|trending_down||14:59:57||T||09-19 M||30||30.67||$105K||Long Put|
|trending_down||14:59:56||UVXY||03/22-19 W||41||37.9||$612K||Long Put|
Before importing the data into a Microsoft SQL database, I enhance it by adding dates and transforming the dollar value into a numeric value. The google-sheets formula for transforming the dollar value is this:
If the value ends with K, it knows to match and multiply by 1000; if it ends in M, match and multiply by 1000000.
Once completed, the dataset looks like this:
|trending_down||14:59:57||T||09-19 M||30||30.67||$105K||Long Put||03/15/2019||105000|
|trending_down||14:59:56||UVXY||03/22-19 W||41||37.9||$612K||Long Put||03/15/2019||612000|
After being enhanced in Google-Sheets, I save the data as a CSV file and use the import function to pull the data into my database. Since each collection is imported, I run a script to insert the data into a master table called FlowTotal:
INSERT INTO FlowTotal ( flowID ,Flow ,Received ,Sym ,Expiration ,Strike ,Ref ,Size ,Sentiment ,DATE ,spent ) SELECT FlowID ,Flow ,Received ,Sym ,Expiration ,Strike ,Ref ,Size ,Sentiment ,DATE ,spent FROM d0315
Flow total contains two additional columns: Sector, and Expiration Date (expirDate).
To populate the additional columns I use a series of scripts. First, I update the sectors in the flowTotal table from a master list in my sectors table as follows:
update f set sector = s.sector from flowTotal f inner join sectors s on s.sym = f.sym where f.sector IS NULL
If I do not have the sector in the master list, I add it to the table and re-run the update. Here is the process:
update f set sector = s.sector from flowTotal f inner join sectors s on s.sym = f.sym where f.sector IS NULL select distinct sym from flowTotal where sector is null INSERT INTO Sectors (sector, sym) select 'Cyclical Consumer Goods', 'LAUR' --Repeat the insert process until all nulls are accounted for --Rerun update
The process to insert the expiration dates is similar. I use a master table of expiration values and expiration dates. I keep a count of the expiration as well. The entire process looks like this:
--Find the expiration values that do not yet exist --in the ExpiRank table select f.expiration, count (f.expiration) as expicount from flowtotal f left join ExpiRank e on e.Expiration = f.Expiration where e.expiration is null group by f.expiration --Insert null the expiration values Insert into ExpiRank(Expiration, ExpiCount) select f.expiration, count (f.expiration) as expicount from flowtotal f left join ExpiRank e on e.Expiration = f.Expiration where e.expiration is null group by f.expiration --Update the date value associated with --the expiration value: Update expiRank set expiDate = '07/27/2018' where expiration = '07/27-18 W' --Update the number of times the expiration appears update expirank set expiCount = t.expicount from (select count (expiration) as expiCount, expiration from flowTotal group by expiration)t where t.expiration = expiRank.expiration --Update flowTotal to add the expiration dates --to the expirDate column update f set expirDate = e.expidate from ExpiRank e inner join FlowTotal f on f.Expiration = e.Expiration where expirDate is null
Once updated, the FlowTotal data looks like this. TotalID is the flowTotal primary key. FlowID is the key from the import.
|1||1||trending_up||14:58:55||WLL||12/01-17 W||24||23.22||$18K||Long Call||2017-11-29||18000||Energy||2017-12-01|
|2||2||trending_up||14:58:47||TEVA||12-17 M||16||15.05||$20K||Long Call||2017-11-29||20000||Healthcare||2017-12-15|
OHLCV Stock Data
In addition to collecting the option order flow, I collect Open, high, low, close, volume (OHLCV) data using Google Finance on Google sheets. I copy the following formula into a spreadsheet, add a column for the Stock symbol, and then add the sector in the SQL database:
=GOOGLEFINANCE("SPY", "all", "10/01/2017", "01/21/2019", "DAILY")
I’ve only included 42 stock symbols in the stock data collection of the 2148 unique stock symbols within the option order flow. Majority of them were selected based on record count within the Flow Total database. A wider variety of symbols appear during earnings season likely because earnings increases volatility and acts as a binary event. In time, exclusively analyzing earnings order flow might be interesting, but is currently outside the scope of this analysis.