Enter Analytics: From Boot Camp to working in Data Science

When researching boot camps online, there always seem to be a lot of mixed opinions. There are a lot of them out there, some more expensive than others, and they all have slightly different focuses while claiming they can churn out analysts, full stack devs, and/or data scientists in as little as 12 weeks. In my experience, education tends to boil down to you get out what you put in, and my experience taking a Data Visualization and Analytics boot camp was no different.


Why a Boot Camp for Me?

I’ve always been interested in computers, but never thought computer science was for me. Growing up, I never felt like I was that good with numbers so I didn’t try that hard when learning math. In high school, I took a course on Java and a course on computer networking fundamentals. Neither of them captivated me, so I decided to keep computers a hobby and ended up studying Scientific and Technical Communication in college.

When I landed my first job out of college as an intern technical writer at a software company, I got my first taste of what computer science was really about. I fell in love with a language I had never seen before: SQL. When my internship ended, since I loved the company, I decided to take a job with their support team. It gave me an opportunity to write every day, pick up technical skills like SQL, and learn about the software development life cycle.

After six-ish years of climbing the ladder in the support department, I felt like I hit a wall. I had transformed into a product expert who was reading and writing SQL all day. Although I had some skills, I felt like getting to the next level was going to take credentials I didn’t have. Additionally, my interest in analyzing data was beginning to take off. For example, I was obsessed with the stock market.

While researching some stocks one night, I came across an advertisement saying something like, “Learn Data Analytics in just 24 weeks at the University…” Since it was advertised through my alma mater university, I decided to click the link and download a brochure. After doing more research, I learned it was a coding boot camp put together by a company named Trilogy, and it focused on teaching data visualization and analytics skills using coding languages like VBA, Python, and JavaScript. The boot camp was expensive, and even though I didn’t enjoy coding in high school, I decided I had matured enough as a person to give it another shot. I signed up for the boot camp and hoped it would take me down a path into data analytics!


Starting the Boot Camp

While researching what I had gotten myself into, I read a lot of reviews about people not liking their teacher. I was super lucky in that regard and got a fantastic teacher. He was humble, patient, knowledgeable, and good at keeping the class interesting. One of the benefits of taking a boot camp that offers in-class instruction is getting to meet like-minded and motivated people face to face. It allowed me to make some new friends as we all struggled through the process of going back to school and tackling fast-paced content.

I went into the course with very little Python experience. I had written a couple scripts to scrape some stock data from finance websites, and I messed around with scikit-learn to make a few predictive models, but majority of my technical skill set was SQL. Within the first third of the boot camp, we had worked through fundamentals of Excel and VBA, and started working with python, APIs and JSON data. It was an overwhelming amount of content, but it really made me push myself to focus and learn. My free-time evaporated as I dedicated it to homework.


Creating a Portfolio of Projects

The boot camp focused heavily on creating a portfolio of projects that could get employers interested. They were mostly group projects which was nice since it forced us to strategize merging code and gave us experience developing on a team. We were able to form our own groups, pick whatever topics we wanted (for the most part), and then were off to the races. We had almost zero restrictions as long as we met bare minimum requirements.

If you’re an over achiever or have experience with any of the languages, the projects acted as great opportunities show off a bit. Some of the people in my cohort had advanced degrees, and the collective background of the class was broad. There were quite a few people who had a PhD or MBA, and there were people who had never written a line of code in their life. Because so many people came from different fields, there was no shortage of interesting project topics. The projects also gave us a chance to work on our presentation skills since we had to present them to the class once they were complete.

You can see some of my projects here:

Generating Wine Recommendations
Visualizing World Cup Data
Extract Transform Load with Comic book data and MySQL
Analyzing Midwest Murder Data
Visualizing Options Trades

Criticisms of the Boot Camp

We covered a lot in a short amount of time… almost too many topics, actually. Just when you start getting comfortable and ready to do more advanced things, they change the topic. It is really up to you to decide what direction you want to take things outside of the classroom. For example, I am focused on python the most because I wanted to get into data science.

I do wish there was a larger focus on analytics in the class. Although we learned some basic statistics and fundamental analysis techniques, there was a large chunk of class devoted to web technologies like HTML, CSS, and JavaScript. I understand putting it in there so we can learn to code interactive dashboards and web pages to show off our projects, but it ended up taking a third of the class. I didn’t mind it because I work at a web-based software development company, so understanding JavaScript was applicable to my job, but others in the class found little value learning HTML and CSS. Another reason they cover basic web design is to make it easier to segue into web-scraping.

The coursework focused more on visualizing data than analyzing it, which might be great if that is what you want to learn. I was hoping to get more feedback on my assignments so I could gauge how I did with my analysis. The only stats we covered were things like Gaussian distributions, standard deviation, ANOVA, Chi-square, student-t, hypothesis testing, and a few other stats functions that I can’t remember from the SciPy Python library.

If you have no coding or comp-sci experience, I recommend checking out some online resources first to get a bit of knowledge because the course is really fast paced. Beyond Excel, VBA, Python and JavaScript, the course also touches on SQL, R, tableau, machine learning, and big data. I left the boot camp having exposure to all kinds of languages, but it just isn’t possible to remember them all because we only cover them briefly. I had to supplement the class room learning with additional online courses to get a deeper understanding of the languages.

Life After Boot Camp

After boot camp, I felt like I accomplished a lot and was able to develop a foundation of coding skills. It opened the door for me to advanced my career. A couple months after completing my certification, I landed a job as a Software Product Analyst on the Data Science team at my employer. Now I get a chance to work with experienced data scientists, software engineers, data engineers and data analysts. I developed a passion for coding and an enthusiasm for data science. Beyond the coding and career development, I made some new friends in the boot camp who I still hope to see from time to time. The boot camp changed my life for the better because I busted my ass and kept a positive attitude even though I had to sacrifice a lot of time and dedicate myself to learning something new.

My last day before starting my new career post boot camp

Final Thoughts

Boot camps aren’t for everyone, and if you’re thinking about taking one, make sure to research what the course offers. If you think you’re up for doing the work, they can definitely change your life and start you on a new path. The boot camp worked for me, but I already knew SQL and already had a passion for data. If you’re on the fence, check out all of the free MOOCs or online content that can help you decide if the topics are right for you.

Check out my github if you’re interested in seeing some of the assignments and projects I completed in the boot camp.

https://github.com/bendgame?tab=repositories

Medium Articles

I’ve been writing articles on Medium. Check them out!

Exploring Beef Jerky Data using Python https://medium.com/@erickleppen01/exploring-beef-jerky-data-using-python-115054741608

My Trick to learning List Comprehensions in Python https://towardsdatascience.com/my-trick-to-learning-list-comprehensions-in-python-8a54e66d98b

Pivoting your Data using Python, SQL or spreadsheets https://towardsdatascience.com/pivoting-your-data-using-python-sql-or-spreadsheets-62736ba23921

Analyzing Wine Descriptions using the Natural Language Tool Kit https://towardsdatascience.com/analyzing-wine-descriptions-using-the-natural-language-toolkit-in-python-497ac1e228d5

Web Scraping board game descriptions with Python https://towardsdatascience.com/web-scraping-board-game-descriptions-with-python-7b8f6a5be1f3

Affordable options for hosting your Data Science Projects https://medium.com/@erickleppen01/affordable-options-for-hosting-your-data-science-projects-7d290e8a5c11

Embracing Manual Data Collection https://medium.com/@erickleppen01/embracing-manual-data-collection-9ebccbac578d

Generating Wine Recommendations using the universal sentence encoder https://towardsdatascience.com/generating-wine-recommendations-using-the-universal-sentence-encoder-d086edd13d00

Big Changes

Life is going great! I’m getting married in nine days, I’ll be starting a new career in three weeks, I’m planning my next coding project, and studying hard to keep some of the skills I learned in the Data Analytics boot camp I completed a couple months ago. I have a lot of big changes coming up, and I know I will accomplish a lot of great things as a husband, Product Analyst, and data science enthusiast.

I’m proud of the wine recommendation app I was able to put together and host at robotsdodream.com, but I need to make a few improvements to enhance the user experience. I am working on redesigning the site so it returns the results on the same page, instead of redirecting the user away from the search tool. I’ll also be embedding links within the results so you can shop for the recommendations on wine.com.

The last feature I have planned for the coming months is a “Random” button. Not just any-old random button, though… I want to engineer a feature that could generate a random search query to produce the recommendations. Not only could I practice some NLP data manipulation techniques, but also I’d be able to provide some insight into wine descriptions. For example, I am not a wine connoisseur and don’t know a whole lot about ways in which people describe wine. These are some of my typical queries:

  • red and easy to drink
  • sweet and fruity
  • buttery rich chardonnay
  • full bodied and flavorful

As you can see, I don’t have an exhaustive vocabulary when it comes to wine. Using data science to better understand some of the key words used in the descriptions will hopefully help me produce better recommendations.

While I’m working on those features, I’ve started planning my next project. My soon-to-be wife and I have been talking about getting into some board games. As a life long gamer, I think analyzing some of the top board games on boardgamegeeks.com would be a fun way to decide which game we try next. I created a plan for the dataset and started working on a python script to scrape some data. I’ll write up a tutorial on what I did once I’m done with the data engineering phase.

Other than the coding projects, my wedding is right around the corner and I landed a new job that allows me to apply the experience I’ve gained over the last 8 years in new and exciting ways. Life is busy, but I am enjoying my upward spiral and can’t wait to see what challenges I’ll overcome next.

Generating Wine Recommendations using the Universal Sentence Encoder

Natural Language Processing (NLP) has fascinated me since I first read about the Turing testwhile studying rhetorical theory and technical communication in college. The complexities and subtleties of our communication always seemed like such a defining factor in what makes us a distinct and intelligent species, so training a machine to understand language transforms communication from something that can be so ambiguous, persuasive, and soulful into a something that seems mechanical, ordered, and predictable. Once I started coding, it wasn’t long before my curiosity drove me to better understand how we can use machine learning to gain new insight into natural language and derive nuances we might have missed. For example, a recent paper was published discussing how NLP was used to make new discoveries in materials science.

One of the NLP tools I’ve been playing with is the Universal Sentence Encoder (USE) hosted on Tensorflow-hub. USE is a pre-trained model that encodes text into a 512 dimensional vector. It is optimized for greater-than-word length text and is trained on a variety of data sources. There are a few different versions of USE. I choose the model that was trained using Deep Averaging Network (DAN) since it is lighter on resources than the Transformer based model. My first project using the tool was to generate wine recommendations based on the semantic similarity between wine descriptions and my search query.


The Data

The wine data encoded by the model comes from a wine review dataset found on kaggle.com. It contains around 130,000 rows of data and includes columns like country, description, title, variety, winery, price, and rating. After I put the data into a dataframe, I dropped rows that contained duplicate descriptions and rows that had null price. I also limited the data to wine varieties that had more than 200 reviews.

#import dependancies
import numpy as np
import pandas as pd
import sqlite3
from sqlite3 import Error#create a connection to the sqlite database.
conn = sqlite3.connect('db\wine_data.sqlite')
c = conn.cursor()#read the table in the database.
wine_df = pd.read_sql('Select * from wine_data', conn)#Drop the duplicate descriptions.
wine_df = wine_df.drop_duplicates('description')#drop null prices.
wine_df = wine_df.dropna(subset=['price'])#filter the dataframe to include only varieties with more than 200 reviews.
wine_df = wine_df.groupby('variety').filter(lambda x: len(x) > 200)

Reducing the data by excluding varieties with less than 200 reviews left me with 54 varieties of wine. By googling the remaining varieties, I was able to added a Color column so the user can limit their search by desired wine color.

#create a column named color.
wine_df["color"] = ""#used to update the database with the wine color. Manually updated each wine variety.
c.execute("update wine_data set color = 'red' where variety = 'Aglianico' ")#commit the update to the database so it saves.
conn.commit()#remove all the records without a color.
wine_df = pd.read_sql("select country, description,rating,price,province,title,variety, winery, color from wine_data where color in ('red', 'white', 'other')", conn)
wine_df.to_sql('wine_data', conn, if_exists = "replace")

After cleaning the data, I was left with 100,228 rows.


Setting up the Universal Sentence Encoder

The DAN based model is around 800mb, so I felt it was important to host it locally. Using the OS library, I set where the model gets cached and am able to call it from a local directory instead of downloading it each time.

import os#create the directory in which to cache the tensorflow universal sentence encoder.
os.environ["TFHUB_CACHE_DIR"] = 'C:/Users/Admin/Downloads'
download = tfhub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")

After downloading the model, you will see a file appear in the directory named something like 1fb57c3ffe1a38479233ee9853ddd7a8ac8a8c47.

Creating the Functions

Even with the model downloaded, the first few iterations of the app were resource intensive and annoyingly slow. After a bit of research and revision, I decided to use a function as a means of reducing the overhead and time it takes for tensorflow to build a graph.

def embed_useT():
with tf.Graph().as_default():
text_input = tf.compat.v1.placeholder(dtype = tf.string, shape=[None])
embed = tfhub.Module('C:/Users/Admin/Downloads/1fb57c3ffe1a38479233ee9853ddd7a8ac8a8c47')
em_txt = embed(text_input)
session = tf.compat.v1.train.MonitoredSession()
return lambda x:session.run(em_txt, feed_dict={text_input:list(x)})#run the model.
embed_fn = embed_useT()#encode the wine descriptions.
result = embed_fn(wine_df.description)

Encoding all of the descriptions eats away at system resources and takes up two or more gigabytes of RAM. If you have limited access to memory in your environment, I recommend you save the numpy array of encoded values to the SQLite database. Calling the array from the database instead of the encoding it on the fly consumes more hard drive space, but it uses half of the RAM based on my testing. You can save the numpy array to the database using this solution I found on Stackoverflow:

def adapt_array(arr):
'''
http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
'''
out = io.BytesIO()
np.save(out, arr)
out.seek(0)
return sqlite3.Binary(out.read())

def convert_array(text):
out = io.BytesIO(text)
out.seek(0)
return np.load(out)

# Converts np.array to TEXT when inserting.
sqlite3.register_adapter(np.ndarray, adapt_array)# Converts TEXT to np.array when selecting,
sqlite3.register_converter("array", convert_array)c.execute("create table embeddings (arr array)")conn.commit()c.execute("insert into embeddings (arr) values (?)", (result, ))conn.commit()#return the array
c.execute("select * from embeddings")
data = c.fetchone()[0]

After encoding the wine descriptions, I created a function that outputs wine recommendations by encoding a user’s query and finding the dot product of the two arrays:

def recommend_engine(query, color, embedding_table = result):wine_df = pd.read_sql('Select * from wine_data', db.session.bind)embedding = embed_fn([query])#Calculate similarity with all reviews
similarity_score = np.dot(embedding, embedding_table.T)recommendations = wine_df.copy()
recommendations['recommendation'] = similarity_score.T
recommendations = recommendations.sort_values('recommendation', ascending=False)#filter through the dataframe to find the corresponding wine color records.
if (color == 'red'):
recommendations = recommendations.loc[(recommendations.color =='red')]
recommendations = recommendations[['variety', 'title', 'price', 'description', 'recommendation'
, 'rating','color']]
elif(color == "white"):
recommendations = recommendations.loc[(recommendations.color =='white')]
recommendations = recommendations[['variety', 'title', 'price', 'description', 'recommendation'
, 'rating','color']]
elif(color == "other"):
recommendations = recommendations.loc[(recommendations.color =='other')]
recommendations = recommendations[['variety', 'title', 'price', 'description', 'recommendation'
, 'rating','color']]
else:
recommendations = recommendations[['variety', 'title', 'price', 'description', 'recommendation'
, 'rating','color']]

return recommendations.head(3).T

Test the function:

query = "fruity, rich, easy to drink, sweet"
color = 'red'recommendation = recommend_engine(query, color)
print(query)recommendation.head(3).T

It was fun exploring all of the wine data and coming up with a some-what light weight way to generate recommendations based on a search query. I plan on continuing to explore the Universal Sentence Encoder and think of new projects to challenge myself and improve my code. Check out the code on my github here:

https://github.com/bendgame/WineRecommend

The Final Project

After six months of sacrifice and hard-work, I am proud to say I have completed the data visualization and analytics boot camp I was attending at the University of Minnesota. For my final project, I wanted to push myself and try to figure out whether or not I got my money’s worth. In the end, I realized I started the boot camp with nothing more than an understanding of SQL, and ended with the knowledge and experience of putting machine learned models into a production environment. Beyond that, I gained experience working with a variety of popular coding languages like JavaScript, Python, and SQL. I created a lot of interactive visualizations, honed my presentation skills, and collaborated with peers to design git merging strategies and workflows.

A couple months into the bootcamp, our student coordinator invited our cohort to the quarterly meetups at which alums present their best work to their peers and local recruiters. I remember seeing some pretty cool projects and meeting some brilliant people, but I always had the same question when looking at their work: How do you make money from this? I understand not everyone thinks this way, and that everyone has their reason for pursuing a particular topic; however, that question was burned into my mind as I approached my final project. I not only wanted to build fancy machine learning models, but also wanted to approach them from a business perspective instead of an academic perspective.

With my mind made up, I approached my team with the idea of creating a recommendation engine. They thought the idea sounded great and we all began speculating on topics. We sifted through a lot of datasets for potential ideas and eventually settled on analyzing a collection of 130,000 wine reviews.

We only had eight days to complete our project. The original idea was to create a tool in which a user could enter a query that describes their ideal wine, and then our model will spit out recommendations; we were an ambitious bunch! Unfortunately, after a bit more research and a couple discussions with our professor, we learned the complexities of training a neural network for natural language processing might be beyond what we can accomplish in the allotted time, so we scaled the project back and decided to build a model that predicts wine prices. Regardless of the difficulty, I never gave up on our original idea and sought a less-complicated way to turn my ideas into tools.

As a student of rhetorical theory and technical communication, I became fascinated with natural language processing when I came across the topic a few years back. To me, the complexities and subtleties of our communication are a defining factor in what makes us a distinct and intelligent species. Training a machine to understand language transforms communication from something that seems so organic, persuasive, and soulful into a something mechanical, ordered, and predictable. For better or worse, I see a lot of potential in the field and wanted to push myself into figuring it out. I spent a few hours researching tools and came across Tensorflow’s Universal Sentence Encoder (USE). It is a pre-trained language processing model. I read through a bit of documentation and decided to use it as a mechanism for accomplishing our initial idea.

Following a few examples, I quickly coded up a prototype in python and showed it off to the group. It blew them away! With a functional prototype representing our original idea, I shifted gears and began focusing on developing a price prediction model. Since there are so many machine learning algorithms to choose from, we split the workload and each worked on a model. They explored various models such as linear regression, random forests, and deep learning while I dug into Scikit – Learn’s ensemble, Gradient Boosting Regressor. After tuning my hyper -parameters, our model produced an R^2 of .49 and a Mean Absolute Error of 9.14. This told us that our features weren’t able to predict the price of wine very well, but they were close enough to be used as estimates. Several factors that influence the price of wine such as age and winery were not included in our model. Given more time, perhaps those are things we could have included.

With our price prediction model figured out, I put my focus back onto our recommendation engine. Although it took a few iterations to overcome minor performance hurdles, I successfully used the pre-trained model to output recommendations based on the dot product (linear algebra) of the encoded user query and the encoded wine reviews. With the functions figured out, the next challenge was to figure out how to host the models on our website. I began writing a flask app in python.

Beyond the performance issues with the recommendation engine, the second biggest challenge was putting our price prediction model into production. I had to decide how to handle the non-numeric values. Since the machine learning algorithm requires our data be numerical, I needed to figure out the best way for users to select words from a drop list, yet pass numbers into our model. I decided to assign all of our text-based categorical values a numeric value. Then I saved all of the values to our dataset instead of using a function that encoded them on the fly. I figured doing it this way would reduce the risk of a performance bottleneck.

A few late nights and countless iterations later, we successfully built a web page that hosted two working models in less than two weeks time. My team was fantastic at encouraging each other and staying positive. It was a crew of intellectually bright and lighthearted individuals that came together to produce something great, and for that I’m extremely thankful. Although it was beyond the scope of the project, the next step is to figure out a way in which we can host the tools we developed so users from anywhere online can play with them and figure out what wine to try next.

Our two models on our web page.
The output of the Wine Price Predictor.

The output from our wine recommendation engine.
Residual plots from our experimental price prediction models.
Graphs made during the exploratory data analysis phase of our project

Links to my notebooks:

https://github.com/bendgame/WineRecommend/blob/master/Create_Wine_Recommendations.ipynb

https://github.com/bendgame/WineRecommend/blob/master/Create_Gradient_Boost_Regressor_Wine_Price.ipynb

My Brief Data Tech 2019 Recap

I spent the day at a data science event held by the Minneanalytics community. I summoned my inner sponge and absorbed as much information as possible. The event was called Data Tech 2019! I listened to a few talks on machine learning, creating data lakes, and natural language processing. I took these themes away from the event:

Just like technical writing, knowing your audience is very important. Think about who will use the data. How quickly will they need it and what do they need it for? Is this data to be used for machine learning models or is it data a business analyst needs in a report every morning? These are the types of questions you should be asking to avoid turning your data lakes into data swamps, and to create a positive data culture from which to structure and plan your data governance.

Data Governance is key for successful data lakes and big data storage. Things like cataloging and securing data are of utmost importance. The data should also be structured in ways to make searching it simple and auditable. If the data cannot be easily audited, things like duplication can occur and human error might not get caught quickly.

Automation is essential to scale big data. Managing data lakes with hundreds or thousands of pipelines is not possible to scale manually. The cleaning and loading techniques need to be automated to make incorporating future data easier.

Ensembles are being increasingly used. Whether it is for evaluating features or comparing outcomes, it seems like more and more tools were incorporating the ability to generate different kinds/sets of models and allow you to select the one with the best fit. For example, one of the sessions discussed using a set of 90+ features to find machine learned models that could predict SP500 future prices. Unfortunately the speaker concluded beating the market using models trained on features derived from technical indicators might still be beyond AI.

Finding Strength

Although most of it has been easy, I’ve faced turmoil and adversity in many ways and times throughout my life. Becoming successful takes a lot of energy. Even if the rewards for doing so are unknown and beyond sight, I must look deep within myself and find the strength to accomplish what might look easy on paper. Accomplishments are the things I want to collect.

When I feel like I’ve accomplished a lot, I step out of myself and see how little I’ve done. From there I ask myself, “what more can I do?” I close my eyes and imagine what I think a successful person does. Is that person sitting around being stupefied by substances and stimulation in some sort of hedonistic ritual, or is that person honing their mind and acquiring skills through study, practice and reflection? Is that person spending their time alone and trying to be a one-man team, or is that person trying to inspire others and make connections? Is that person hiding behind delusion of grandeur, or is that person putting himself into the thick of the competition even if there is no chance at winning? I close my eyes and imagine the person I must be if I want to accomplish more. I close my eyes and imagine a better me.

It isn’t enough to ask myself, “what more can I do?” I also need to ask myself, “how can I do it?” Life is all about finding an answer to that question, and it seems like the root of all responses worth investigating begin with a bit of discomfort and struggle. When I think about the events from which I’ve grown the most, they always involve an uncomfortable amount of change. Life would likely be boring if everything was as easy as I assumed it would be.

One of the toughest parts of my adult life was recovering from the loss of my closest friend. I lost someone who knew more about me than I’ll ever remember, and it still brings me to tears to realize he’s gone. Although it feels like life will never be the same, emerging from the fog of grief left an unfathomable fire in my soul that I channel when chasing accomplishments. The burn not only reminds me to succeed, but also to never forget the people I love.

In juxtaposition to emotions of loss, struggle and discomfort can come from positive events too. My work sent me to live in Hawaii for nearly two months. Wrapped in what sounds like a dream to many was actually a package of stress. Although I had an advantage of having a few coworkers with me on the island, being that far away from my friends and family was a huge change for me. Fitting in and making friends were never things I excelled at, and in general I’m fairly shy, so I had to rediscover who I was and fall in love with putting myself “out-there” so I could develop relationships and meet people. I gained a new perspective and a lot of self-confidence on Oahu, and I came home feeling like a mature adult.

Shortly after returning from Hawaii, I met the woman I’m going to marry. Commitment is slightly terrifying, but she is amazing and pushes me to accomplish so much more than I imagined I could: A house, a dog, a technical skill-set, and better relationships with my family… She is the teammate I’ve always wanted, and she makes finding the strength within myself a little easier everyday.

Exploring Ally Financial’s API

One of my passions is analyzing stock and option data. Although I love trading, I find looking for an edge in the data to be the most fun part of the game (besides making money, of course). Since I learned about using API’s in the Data Analysis class in which I’m currently enrolled, I decided to flex my new skills and connect to a broker’s API so I can start collecting more data for my first trading algorithm.

For anyone interested using the Ally Financial API to collect and analyze data, I have put together a brief cheat sheet to help you get started making the API calls and outputting the data to a Pandas’ dataframe.

It contains the following examples:

  • Time and Sales for Stocks
  • Options Market Depth
  • Options Extended Quote

Click here to check out my Ally Financial API Cheat Sheet on my github. It is written for Python 3

#!/usr/bin/env python
# coding: utf-8

# # Ally Financial API Cheat Sheet
# * Refer to the documentation for additional routes:
# * https://www.ally.com/api/invest/documentation/getting-started/

import requests
from requests_oauthlib import OAuth1
from config import (api_key, secret, oath_token, oath_secret)

import pandas as pd
import sqlalchemy
import numpy as np

import sqlite3
from sqlite3 import Error

import matplotlib.pyplot as plt
import datetime as dt


#authentication 
auth = OAuth1(api_key, secret, oath_token, oath_secret)


# # Time and Sales for Stocks Example
# * documentation: https://www.ally.com/api/invest/documentation/market-timesales-get/ 
# * base url: https://api.tradeking.com/
# * route: v1/market/timesales.json
# * query: ?symbols=MSFT&startdate=2019-05-03&interval=1min


#url 
url = 'https://api.tradeking.com/v1/market/timesales.json?symbols=MSFT&startdate=2019-05-03&interval=1min'


#api request
response = requests.get(url, auth = auth).json()


#send to data frame and format data types
df = pd.DataFrame(response["response"]["quotes"]["quote"])
df = df.sort_values(['datetime'], ascending = False)
df['date'] = pd.to_datetime(df['date'])
df['datetime'] = pd.to_datetime(df['datetime'],  utc=False).dt.tz_convert('US/Central')
df['hi'] = df["hi"].astype(float)
df['incr_vol'] = df["incr_vl"].astype(float)
df['last'] = df["last"].astype(float)
df['lo'] = df["lo"].astype(float)
df['opn'] = df["opn"].astype(float)
df['vl'] = df['vl'].astype(float)
df.head()



#resample the time value to be greater than 1 min as needed. Example: 30 min resample for last price
df.set_index(df['datetime'], inplace = True)
df.head()
df_resample30 = df.resample(rule = '30min', label = 'right').last()
df_resample30.head()


# # Options Search Example
# * Provides market depth for options
# * Documentation: https://www.ally.com/api/invest/documentation/market-options-search-get-post/
# * base url: https://api.tradeking.com/
# * route: v1/market/timesales.json
# * query: ?symbol=MSFT&query=xyear-eq%3A2019%20AND%20xmonth-eq%3A06%20AND%20strikeprice-eq%3A140
# 
# 
# * Query breakdown:
#     * exipiration year equals 2019:
#     * xyear-eq%3A 2019
# 
#     * and:
#     * %20AND%20
# 
#     * expiration month equals 06:
#     * xmonth-eq%3A 06
# 
#     * and strike price equals 140:
#     * %20AND%20 strikeprice -eq%3A 140
# 
# 
# * Operators:
#     * lt :	less than
#     * gt :	greater than
#     * gte :	greater than or equal to
#     * lte :	less than or equal to
#     * eq :	equal to


url = 'https://api.tradeking.com/v1/market/options/search.json?symbol=MSFT&query=xyear-eq%3A2019%20AND%20xmonth-eq%3A06%20AND%20strikeprice-eq%3A140'
response = requests.get(url, auth = auth).json()


df = pd.DataFrame(response["response"]["quotes"]["quote"])
df


# # Extended Quote Example (Option)
# * Works for stocks too
# * Documentation: https://www.ally.com/api/invest/documentation/market-ext-quotes-get-post/
# * base url: https://api.tradeking.com/
# * route: v1/market/ext/quotes.json
# * query: ?symbols=MSFT190607C00140000
# 
# * Option Symbol naming convention:
#     * Underlying symbol - MSFT
#     * 2 digit expiration year - 19
#     * 2 digit expiration month - 06
#     * 2 digit expiration day - 07
#     * "C" for Call or "P" for Put - C
#     * 8 digit strike price - 00140000
#     
# 
# * Specify desired fields in the query as needed using fids: 
#     * i.e. fids=ask,bid,vol


url = 'https://api.tradeking.com/v1/market/ext/quotes.json?symbols=MSFT190607C00140000'
response = requests.get(url, auth = auth).json()


df = pd.DataFrame(response["response"]["quotes"]["quote"], index = [0])
df

Getting Trade Data from Alpha Vantage

Trading is often a game of information, analysis, and luck. Having better information than the next guy can be the difference between making a profit and losing money. I try to have as much information as I can get my hands on. Recently, a few algo traders introduced me to the website Alpha Vantage which is a free to use API for downloading market data, including 1 min time series and a bunch of indicators. The downside to using Alpha Vantage is that it only allows 5 calls per minute and 500 calls per day. Regardless, it is still a great free tool.

Since I’ve been learning how to write python scripts for the past two months in my Data Analysis class, I figured it would be a good exerciser to figure out a way to get 1 min data for a few stocks I follow out of Alpha Vantage and into my SQL database where I store the rest of my market data for my project code named Edge.

I shared my script with Reddit and a user pointed out that it is actually bad practice to write SQL directly into the script, and that I should be using Object Relational Mapping (ORM) instead. I am looking into that and will be updating the script once I have a handle on what I need to do to replace the SQL. I also know it is very bad practice to have a visible API key within the script, but this is just meant to be a “plug and play” script that is easy for a noobie like me to use. In the meantime, my python script does what I need it to do. It downloads the 1 min data as a CSV file, and then uploads that data into a table in my Microsoft SQL Server Database:

#Dependencies
import requests
import datetime as dt
import urllib.request as req
import time
import pyodbc
import csv

#Get today's datetime information
date = dt.datetime.now()

#API KEY from Alpha Vantage
api_key = ''

#value used to iterate through the syms list
i = 0

#list of stock symbols to track
syms = ['MSFT', 'SPY', 'AAPL', 'DIA', 'QQQ', 'GS', 'GE', 'IBM', 'JPM', 'XLF', 'AA', 'BABA', 'TWTR', 'XHB', 'INTC', 'C', 'CZR', 'MGM'
    'SQ','BAC', 'AMD', 'FB', 'VXX', 'TSLA', 'IWM', 'GLD', 'SVXY', 'EEM', 'FCX', 'WMT']

#variables to store year/month/day numbers to append to file name.
y = date.strftime("%Y")
m = date.strftime("%m")
d = date.strftime("%d")

#connects to SQL database
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=your server here;'
                      'Database= your database here;'
                      'Trusted_Connection=yes;')

cursor = conn.cursor()

#for loop to iterate through the symbol list
for sym in syms:
    
#sleep for 15 seconds to avoid API call limit
    time.sleep(15)
    i+=1

#Create the URL for the API call
    url = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=' + syms[i] + '&interval=1min&outputsize=full&apikey='+ api_key+'&datatype=csv'

#create the file path to save the csv
    save_path = 'your save path here'+ syms[i] + y + m + d + '.csv' 

    print(url)
  
#read the page and store it   
    get_url = req.urlopen(url)
    data = get_url.read()
    get_url.close()

#write the data to a file    
    tnsfile = open(save_path,'wb')
    tnsfile.write(data)
    
#close python connection to the file
    tnsfile.close()

#create the table name 
    tableName = syms[i] + y + m + d   

#set the database you want to use, create the table
    cursor.execute('use YOUR DATABASE HERE; Create Table ' + tableName + '(symID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL, sym varchar(5), [timestamp] datetime ,[open] float,[high] float,[low] float,[close] float, [volume] float)')

#Open the csv file, build the query.   
    with open (save_path, 'r') as f:
        reader = csv.reader(f)
        columns = next(reader) 
        query = "insert into " + tableName +" (sym, timestamp,[open],[high],[low],[close],[volume]) values (" +"'"+ syms[i]+"'"+ ",{1})"
        query = query.format(','.join(columns), ','.join('?' * len(columns)))
        cursor = conn.cursor()
        #print(query)

#write the query to the database
        for rows in reader:
            #print(rows)
            #print(query)
            cursor.execute(query, rows)
        cursor.commit()

Before getting into writing web scrapping scripts, I was scrapping the data manually, and it was tedious and time consuming. Now I can spend more time do analytics and less time manually building datasets! School is paying off already.

A Simple Git Workflow

We started our first group project in the Data Analysis and Visualization bootcamp. Coordinating a workflow for five inexperienced people using git and github was one of the first hurdles to jump. I was appointed the team’s “git master” since I had a tiny bit of experience using it before the bootcamp began. Combining my love for troubleshooting and technical writing, I came up with a simple set of instructions for my team and wanted to share it here.

Common console commands: 
cd - change directory
mkdir - make directory
ls - view the files/folders in directory

NOTE: Exit VIM if needed ctrl + c then type :qa! and push enter
NOTE: If file is not in local repo, manually move the file into
the correct folder (outside of console)

--------------------------------------------
Managing your Local Repo
--------------------------------------------
NOTE: If you need to hard reset your local repo to match
the remote master use the following commands:
$ git fetch origin
$ git reset --hard origin/master

Undo the act of committing, leaving everything else intact:
$ git reset --soft HEAD^:

Undo the act of committing and everything you'd staged,
but leave the work tree (your files intact):
$ git reset HEAD^

Completely undo it, throwing away all uncommitted changes,
resetting everything to the previous commit:
$ git reset --hard HEAD^

--------------------------------------------
BEGIN WORKFLOW
--------------------------------------------
Clone the Repo to local machine:
$ git clone https://github.com/bendgame/Project-one.git

Make sure the local master is up-to-date:
$ git pull origin master

Create new branch:
$ git banch branch_name

Move to branch:
$ git checkout branch_name

Navigate file structure as needed:
$ ls
$ cd folder_name

Add the files to the branch:
$ git add .

Verify file:
$ git status

Commit the files:
$ git commit -m "comment"

Add branch and files to the Remote Repo:
$ git push -u origin branch_name

Go to the github website to manage pull request and merge.

Switch back to local master so you can delete the local branch:
$ git checkout master

Delete local branch:
$ git branch -d branch_name
OR
$ git branch -D branch_name

If you don't want to go to the website, you can merge your branch
to the master locally and push the new master to the remote repo:

Switch back to master branch:
$ git checkout master

Merge the branch with the local master:
$ git merge branch_name -m "comment"

Push the local master to the remote master:
$ git push origin master

Delete local branch:
$ git branch -d branch_name
OR
$ git branch -D branch_name

There you have it! Our simple git workflow used by five noobies to manage our first group project on github.