Generating Wine Recommendations using the Universal Sentence Encoder

Natural Language Processing (NLP) has fascinated me since I first read about the Turing test while studying rhetorical theory and technical communication in college. The complexities and subtleties of our communication always seemed like a defining factor in what makes us a distinct and intelligent species, so training a machine to understand language transforms communication from something ambiguous, persuasive, and soulful into something that seems mechanical, ordered, and predictable. Once I started coding, it wasn't long before my curiosity drove me to better understand how we can use machine learning to gain new insight into natural language and uncover nuances we might have missed. For example, a recent paper discussed how NLP was used to make new discoveries in materials science.

One of the NLP tools I've been playing with is the Universal Sentence Encoder (USE) hosted on TensorFlow Hub. USE is a pre-trained model that encodes text into a 512-dimensional vector. It is optimized for greater-than-word-length text and is trained on a variety of data sources. There are a few different versions of USE; I chose the model trained with a Deep Averaging Network (DAN) since it is lighter on resources than the Transformer-based model. My first project using the tool was to generate wine recommendations based on the semantic similarity between wine descriptions and my search query.
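
To illustrate the idea before diving into the project code, here is a minimal sketch of encoding a couple of sentences and comparing them with a dot product. It assumes a TF2 environment and the newer v4 module on TensorFlow Hub; the project below uses the TF1-style v2 module instead.

import numpy as np
import tensorflow_hub as hub

#load the pre-trained encoder (downloads and caches the module on first use)
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "A bold red with notes of dark cherry and oak.",
    "fruity, rich, easy to drink, sweet",
]

#each sentence becomes a 512-dimensional vector
vectors = embed(sentences).numpy()
print(vectors.shape)  # (2, 512)

#the dot product of the (approximately unit-length) vectors measures semantic similarity
print(np.inner(vectors[0], vectors[1]))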


The Data

The wine data encoded by the model comes from a wine review dataset found on kaggle.com. It contains around 130,000 rows and includes columns like country, description, title, variety, winery, price, and rating. After I put the data into a dataframe, I dropped rows with duplicate descriptions and rows with a null price. I also limited the data to wine varieties that had more than 200 reviews.

#import dependencies
import numpy as np
import pandas as pd
import sqlite3
from sqlite3 import Error

#create a connection to the sqlite database.
conn = sqlite3.connect('db\wine_data.sqlite')
c = conn.cursor()

#read the table in the database.
wine_df = pd.read_sql('Select * from wine_data', conn)

#drop the duplicate descriptions.
wine_df = wine_df.drop_duplicates('description')

#drop null prices.
wine_df = wine_df.dropna(subset=['price'])

#filter the dataframe to include only varieties with more than 200 reviews.
wine_df = wine_df.groupby('variety').filter(lambda x: len(x) > 200)

Reducing the data by excluding varieties with fewer than 200 reviews left me with 54 varieties of wine. By googling the remaining varieties, I was able to add a color column so the user can limit their search by desired wine color.

#create a column named color.
wine_df["color"] = ""

#used to update the database with the wine color. Manually updated each wine variety.
c.execute("update wine_data set color = 'red' where variety = 'Aglianico' ")

#commit the update to the database so it saves.
conn.commit()

#remove all the records without a color.
wine_df = pd.read_sql("select country, description, rating, price, province, title, variety, winery, color from wine_data where color in ('red', 'white', 'other')", conn)
wine_df.to_sql('wine_data', conn, if_exists = "replace")

After cleaning the data, I was left with 100,228 rows.


Setting up the Universal Sentence Encoder

The DAN-based model is around 800 MB, so I felt it was important to host it locally. Using the os module, I set where the model gets cached so I can load it from a local directory instead of downloading it each time.

import os
import tensorflow_hub as tfhub

#set the directory in which to cache the tensorflow universal sentence encoder.
os.environ["TFHUB_CACHE_DIR"] = 'C:/Users/Admin/Downloads'
download = tfhub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")

After downloading the model, you will see a folder appear in the cache directory with a name like 1fb57c3ffe1a38479233ee9853ddd7a8ac8a8c47.

Creating the Functions

Even with the model downloaded, the first few iterations of the app were resource-intensive and annoyingly slow. After a bit of research and revision, I wrapped the model in a function that builds the TensorFlow graph and session once, reducing the overhead of rebuilding the graph on every call.

import tensorflow as tf

def embed_useT():
    with tf.Graph().as_default():
        text_input = tf.compat.v1.placeholder(dtype = tf.string, shape=[None])
        embed = tfhub.Module('C:/Users/Admin/Downloads/1fb57c3ffe1a38479233ee9853ddd7a8ac8a8c47')
        em_txt = embed(text_input)
        session = tf.compat.v1.train.MonitoredSession()
    return lambda x: session.run(em_txt, feed_dict={text_input: list(x)})

#run the model.
embed_fn = embed_useT()

#encode the wine descriptions.
result = embed_fn(wine_df.description)

Encoding all of the descriptions eats away at system resources and takes up two or more gigabytes of RAM. If you have limited access to memory in your environment, I recommend you save the numpy array of encoded values to the SQLite database. Calling the array from the database instead of encoding it on the fly consumes more hard drive space, but it uses half the RAM based on my testing. You can save the numpy array to the database using this solution I found on Stack Overflow:

import io

def adapt_array(arr):
    '''
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    '''
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)

# Converts np.array to TEXT when inserting.
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts TEXT to np.array when selecting (the converter only runs when the
# connection is opened with detect_types=sqlite3.PARSE_DECLTYPES).
sqlite3.register_converter("array", convert_array)

#create a table for the encoded descriptions and insert the array.
c.execute("create table embeddings (arr array)")
conn.commit()
c.execute("insert into embeddings (arr) values (?)", (result, ))
conn.commit()

#return the array.
c.execute("select * from embeddings")
data = c.fetchone()[0]

After encoding the wine descriptions, I created a function that outputs wine recommendations by encoding a user’s query and finding the dot product of the two arrays:

def recommend_engine(query, color, embedding_table = result):
    #pull the wine data and encode the user's query.
    wine_df = pd.read_sql('Select * from wine_data', conn)
    embedding = embed_fn([query])

    #calculate similarity between the query and all of the reviews.
    similarity_score = np.dot(embedding, embedding_table.T)

    recommendations = wine_df.copy()
    recommendations['recommendation'] = similarity_score.T
    recommendations = recommendations.sort_values('recommendation', ascending=False)

    #filter the dataframe to the requested wine color, if one was given.
    if color in ('red', 'white', 'other'):
        recommendations = recommendations.loc[recommendations.color == color]

    recommendations = recommendations[['variety', 'title', 'price', 'description',
                                       'recommendation', 'rating', 'color']]

    return recommendations.head(3).T

Test the function:

query = "fruity, rich, easy to drink, sweet"
color = 'red'
recommendation = recommend_engine(query, color)

print(query)
recommendation.head(3).T

It was fun exploring all of the wine data and coming up with a somewhat lightweight way to generate recommendations based on a search query. I plan to keep exploring the Universal Sentence Encoder and think of new projects to challenge myself and improve my code. Check out the code on my GitHub here:

https://github.com/bendgame/WineRecommend

The Final Project

After six months of sacrifice and hard work, I am proud to say I have completed the data visualization and analytics boot camp I was attending at the University of Minnesota. For my final project, I wanted to push myself and try to figure out whether or not I got my money's worth. In the end, I realized I started the boot camp with nothing more than an understanding of SQL, and ended with the knowledge and experience of putting machine learning models into a production environment. Beyond that, I gained experience working with a variety of popular coding languages like JavaScript, Python, and SQL. I created a lot of interactive visualizations, honed my presentation skills, and collaborated with peers to design git merging strategies and workflows.

A couple months into the bootcamp, our student coordinator invited our cohort to the quarterly meetups at which alums present their best work to their peers and local recruiters. I remember seeing some pretty cool projects and meeting some brilliant people, but I always had the same question when looking at their work: How do you make money from this? I understand not everyone thinks this way, and that everyone has their reason for pursuing a particular topic; however, that question was burned into my mind as I approached my final project. I not only wanted to build fancy machine learning models, but also wanted to approach them from a business perspective instead of an academic perspective.

With my mind made up, I approached my team with the idea of creating a recommendation engine. They thought the idea sounded great and we all began speculating on topics. We sifted through a lot of datasets for potential ideas and eventually settled on analyzing a collection of 130,000 wine reviews.

We only had eight days to complete our project. The original idea was to create a tool in which a user could enter a query describing their ideal wine, and our model would spit out recommendations; we were an ambitious bunch! Unfortunately, after a bit more research and a couple of discussions with our professor, we learned the complexities of training a neural network for natural language processing might be beyond what we could accomplish in the allotted time, so we scaled the project back and decided to build a model that predicts wine prices. Regardless of the difficulty, I never gave up on our original idea and sought a less complicated way to turn my ideas into tools.

As a student of rhetorical theory and technical communication, I became fascinated with natural language processing when I came across the topic a few years back. To me, the complexities and subtleties of our communication are a defining factor in what makes us a distinct and intelligent species. Training a machine to understand language transforms communication from something that seems so organic, persuasive, and soulful into something mechanical, ordered, and predictable. For better or worse, I see a lot of potential in the field and wanted to push myself into figuring it out. I spent a few hours researching tools and came across TensorFlow's Universal Sentence Encoder (USE), a pre-trained language processing model. I read through a bit of documentation and decided to use it as a mechanism for accomplishing our initial idea.

Following a few examples, I quickly coded up a prototype in Python and showed it off to the group. It blew them away! With a functional prototype representing our original idea, I shifted gears and began focusing on developing a price prediction model. Since there are so many machine learning algorithms to choose from, we split the workload and each worked on a model. They explored approaches such as linear regression, random forests, and deep learning while I dug into Scikit-Learn's GradientBoostingRegressor from the ensemble module. After tuning my hyperparameters, our model produced an R^2 of 0.49 and a mean absolute error of 9.14. This told us that our features weren't able to predict the price of wine very well, but they were close enough to be used as estimates. Several factors that influence the price of wine, such as age and winery, were not included in our model. Given more time, perhaps those are things we could have included.
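
For anyone curious what that modeling step can look like in code, here is a minimal sketch of fitting a gradient boosting regressor and scoring it with R^2 and mean absolute error. The feature columns below are placeholders I made up for illustration, not the exact features we engineered.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

#hypothetical numeric features; assumes wine_df from the earlier section
features = wine_df[['rating', 'variety_code', 'color_code']]
target = wine_df['price']

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42)

#fit the gradient boosting regressor with a few common hyperparameters
model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                  max_depth=3, random_state=42)
model.fit(X_train, y_train)

#score the model the same two ways described above
predictions = model.predict(X_test)
print('R^2:', r2_score(y_test, predictions))
print('MAE:', mean_absolute_error(y_test, predictions))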

With our price prediction model figured out, I put my focus back on our recommendation engine. Although it took a few iterations to overcome minor performance hurdles, I successfully used the pre-trained model to output recommendations based on the dot product (linear algebra) of the encoded user query and the encoded wine reviews. With the functions in place, the next challenge was figuring out how to host the models on our website, so I began writing a Flask app in Python.
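
As a rough sketch of what serving the recommendation engine from Flask could look like (the route name, form fields, and template here are hypothetical, not the exact app we shipped):

from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/recommend', methods=['POST'])
def recommend():
    #read the search text and desired wine color from the submitted form
    query = request.form['query']
    color = request.form.get('color', '')

    #reuse the recommendation function defined in the earlier section
    recommendations = recommend_engine(query, color)

    #hand the results to a template for display
    return render_template('recommendations.html',
                           tables=[recommendations.to_html()])

if __name__ == '__main__':
    app.run(debug=True)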

Beyond the performance issues with the recommendation engine, the second biggest challenge was putting our price prediction model into production. I had to decide how to handle the non-numeric values. Since the machine learning algorithm requires the data to be numerical, I needed to figure out the best way for users to select words from a drop-down list yet pass numbers into our model. I decided to assign each of our text-based categorical values a numeric code, and then I saved those codes to our dataset instead of using a function that encoded them on the fly. I figured doing it this way would reduce the risk of a performance bottleneck.
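
One simple way to pre-compute such a mapping with pandas, sketched against a hypothetical column; our project stored its own codes, so treat this only as an illustration:

#assumes wine_df is the cleaned dataframe from the recommendation section
#assign each wine variety a stable integer code
wine_df['variety_code'] = wine_df['variety'].astype('category').cat.codes

#keep a text-to-number lookup so a drop-down selection can be translated
#into the value the model expects
variety_lookup = dict(zip(wine_df['variety'], wine_df['variety_code']))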

A few late nights and countless iterations later, we successfully built a web page that hosted two working models in less than two weeks' time. My team was fantastic at encouraging each other and staying positive. It was a crew of intellectually bright and lighthearted individuals who came together to produce something great, and for that I'm extremely thankful. Although it was beyond the scope of the project, the next step is to figure out how we can host the tools we developed so users anywhere online can play with them and figure out what wine to try next.

Our two models on our web page.
The output of the Wine Price Predictor.

The output from our wine recommendation engine.
Residual plots from our experimental price prediction models.
Graphs made during the exploratory data analysis phase of our project

Links to my notebooks:

https://github.com/bendgame/WineRecommend/blob/master/Create_Wine_Recommendations.ipynb

https://github.com/bendgame/WineRecommend/blob/master/Create_Gradient_Boost_Regressor_Wine_Price.ipynb

My Brief Data Tech 2019 Recap

I spent the day at a data science event held by the MinneAnalytics community, called Data Tech 2019! I summoned my inner sponge and absorbed as much information as possible. I listened to a few talks on machine learning, creating data lakes, and natural language processing. I took these themes away from the event:

Just like technical writing, knowing your audience is very important. Think about who will use the data. How quickly will they need it and what do they need it for? Is this data to be used for machine learning models or is it data a business analyst needs in a report every morning? These are the types of questions you should be asking to avoid turning your data lakes into data swamps, and to create a positive data culture from which to structure and plan your data governance.

Data Governance is key for successful data lakes and big data storage. Things like cataloging and securing data are of utmost importance. The data should also be structured in ways to make searching it simple and auditable. If the data cannot be easily audited, things like duplication can occur and human error might not get caught quickly.

Automation is essential to scale big data. Data lakes with hundreds or thousands of pipelines cannot be managed manually. The cleaning and loading techniques need to be automated to make incorporating future data easier.

Ensembles are increasingly being used. Whether it is for evaluating features or comparing outcomes, it seems like more and more tools are incorporating the ability to generate different kinds or sets of models and let you select the one with the best fit. For example, one of the sessions discussed using a set of 90+ features to find machine-learned models that could predict S&P 500 future prices. Unfortunately, the speaker concluded that beating the market using models trained on features derived from technical indicators might still be beyond AI.

Finding Strength

Although most of it has been easy, I've faced turmoil and adversity at many points throughout my life. Becoming successful takes a lot of energy. Even if the rewards for doing so are unknown and beyond sight, I must look deep within myself and find the strength to accomplish what might look easy on paper. Accomplishments are the things I want to collect.

When I feel like I've accomplished a lot, I step out of myself and see how little I've done. From there I ask myself, "what more can I do?" I close my eyes and imagine what I think a successful person does. Is that person sitting around being stupefied by substances and stimulation in some sort of hedonistic ritual, or is that person honing their mind and acquiring skills through study, practice, and reflection? Is that person spending their time alone and trying to be a one-man team, or is that person trying to inspire others and make connections? Is that person hiding behind delusions of grandeur, or is that person putting himself into the thick of the competition even if there is no chance at winning? I close my eyes and imagine the person I must be if I want to accomplish more. I close my eyes and imagine a better me.

It isn’t enough to ask myself, “what more can I do?” I also need to ask myself, “how can I do it?” Life is all about finding an answer to that question, and it seems like the root of all responses worth investigating begin with a bit of discomfort and struggle. When I think about the events from which I’ve grown the most, they always involve an uncomfortable amount of change. Life would likely be boring if everything was as easy as I assumed it would be.

One of the toughest parts of my adult life was recovering from the loss of my closest friend. I lost someone who knew more about me than I’ll ever remember, and it still brings me to tears to realize he’s gone. Although it feels like life will never be the same, emerging from the fog of grief left an unfathomable fire in my soul that I channel when chasing accomplishments. The burn not only reminds me to succeed, but also to never forget the people I love.

In juxtaposition to emotions of loss, struggle and discomfort can come from positive events too. My work sent me to live in Hawaii for nearly two months. What sounds like a dream to many actually came wrapped in a package of stress. Although I had the advantage of having a few coworkers with me on the island, being that far away from my friends and family was a huge change for me. Fitting in and making friends were never things I excelled at, and in general I'm fairly shy, so I had to rediscover who I was and fall in love with putting myself "out there" so I could develop relationships and meet people. I gained a new perspective and a lot of self-confidence on Oahu, and I came home feeling like a mature adult.

Shortly after returning from Hawaii, I met the woman I’m going to marry. Commitment is slightly terrifying, but she is amazing and pushes me to accomplish so much more than I imagined I could: A house, a dog, a technical skill-set, and better relationships with my family… She is the teammate I’ve always wanted, and she makes finding the strength within myself a little easier everyday.

Exploring Ally Financial’s API

One of my passions is analyzing stock and option data. Although I love trading, I find looking for an edge in the data to be the most fun part of the game (besides making money, of course). Since I learned about using APIs in the Data Analysis class in which I'm currently enrolled, I decided to flex my new skills and connect to a broker's API so I can start collecting more data for my first trading algorithm.

For anyone interested in using the Ally Financial API to collect and analyze data, I have put together a brief cheat sheet to help you get started making the API calls and outputting the data to a pandas DataFrame.

It contains the following examples:

  • Time and Sales for Stocks
  • Options Market Depth
  • Options Extended Quote

Click here to check out my Ally Financial API Cheat Sheet on my github. It is written for Python 3.

#!/usr/bin/env python
# coding: utf-8

# # Ally Financial API Cheat Sheet
# * Refer to the documentation for additional routes:
# * https://www.ally.com/api/invest/documentation/getting-started/

import requests
from requests_oauthlib import OAuth1
from config import (api_key, secret, oath_token, oath_secret)

import pandas as pd
import sqlalchemy
import numpy as np

import sqlite3
from sqlite3 import Error

import matplotlib.pyplot as plt
import datetime as dt


#authentication 
auth = OAuth1(api_key, secret, oath_token, oath_secret)


# # Time and Sales for Stocks Example
# * documentation: https://www.ally.com/api/invest/documentation/market-timesales-get/ 
# * base url: https://api.tradeking.com/
# * route: v1/market/timesales.json
# * query: ?symbols=MSFT&startdate=2019-05-03&interval=1min


#url 
url = 'https://api.tradeking.com/v1/market/timesales.json?symbols=MSFT&startdate=2019-05-03&interval=1min'


#api request
response = requests.get(url, auth = auth).json()


#send to data frame and format data types
df = pd.DataFrame(response["response"]["quotes"]["quote"])
df = df.sort_values(['datetime'], ascending = False)
df['date'] = pd.to_datetime(df['date'])
df['datetime'] = pd.to_datetime(df['datetime'],  utc=False).dt.tz_convert('US/Central')
df['hi'] = df["hi"].astype(float)
df['incr_vl'] = df["incr_vl"].astype(float)
df['last'] = df["last"].astype(float)
df['lo'] = df["lo"].astype(float)
df['opn'] = df["opn"].astype(float)
df['vl'] = df['vl'].astype(float)
df.head()



#resample the time value to be greater than 1 min as needed. Example: 30 min resample for last price
df.set_index(df['datetime'], inplace = True)
df.head()
df_resample30 = df.resample(rule = '30min', label = 'right').last()
df_resample30.head()


# # Options Search Example
# * Provides market depth for options
# * Documentation: https://www.ally.com/api/invest/documentation/market-options-search-get-post/
# * base url: https://api.tradeking.com/
# * route: v1/market/options/search.json
# * query: ?symbol=MSFT&query=xyear-eq%3A2019%20AND%20xmonth-eq%3A06%20AND%20strikeprice-eq%3A140
# 
# 
# * Query breakdown:
#     * expiration year equals 2019:
#     * xyear-eq%3A 2019
# 
#     * and:
#     * %20AND%20
# 
#     * expiration month equals 06:
#     * xmonth-eq%3A 06
# 
#     * and strike price equals 140:
#     * %20AND%20 strikeprice-eq%3A 140
# 
# 
# * Operators:
#     * lt :	less than
#     * gt :	greater than
#     * gte :	greater than or equal to
#     * lte :	less than or equal to
#     * eq :	equal to
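
# A hypothetical convenience (not part of the official cheat sheet): build the
# encoded query string programmatically instead of hand-encoding %20 and %3A.
from urllib.parse import quote
raw_query = 'xyear-eq:2019 AND xmonth-eq:06 AND strikeprice-eq:140'
encoded_query = quote(raw_query)  # spaces become %20, colons become %3A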


url = 'https://api.tradeking.com/v1/market/options/search.json?symbol=MSFT&query=xyear-eq%3A2019%20AND%20xmonth-eq%3A06%20AND%20strikeprice-eq%3A140'
response = requests.get(url, auth = auth).json()


df = pd.DataFrame(response["response"]["quotes"]["quote"])
df


# # Extended Quote Example (Option)
# * Works for stocks too
# * Documentation: https://www.ally.com/api/invest/documentation/market-ext-quotes-get-post/
# * base url: https://api.tradeking.com/
# * route: v1/market/ext/quotes.json
# * query: ?symbols=MSFT190607C00140000
# 
# * Option Symbol naming convention:
#     * Underlying symbol - MSFT
#     * 2 digit expiration year - 19
#     * 2 digit expiration month - 06
#     * 2 digit expiration day - 07
#     * "C" for Call or "P" for Put - C
#     * 8 digit strike price - 00140000
#     
# 
# * Specify desired fields in the query as needed using fids: 
#     * i.e. fids=ask,bid,vol


url = 'https://api.tradeking.com/v1/market/ext/quotes.json?symbols=MSFT190607C00140000'
response = requests.get(url, auth = auth).json()


df = pd.DataFrame(response["response"]["quotes"]["quote"], index = [0])
df

Getting Trade Data from Alpha Vantage

Trading is often a game of information, analysis, and luck. Having better information than the next guy can be the difference between making a profit and losing money. I try to have as much information as I can get my hands on. Recently, a few algo traders introduced me to Alpha Vantage, a free-to-use API for downloading market data, including 1-minute time series and a bunch of indicators. The downside to using Alpha Vantage is that it only allows 5 calls per minute and 500 calls per day. Regardless, it is still a great free tool.

Since I've been learning how to write Python scripts for the past two months in my Data Analysis class, I figured it would be a good exercise to figure out a way to get 1-minute data for a few stocks I follow out of Alpha Vantage and into the SQL database where I store the rest of my market data for my project, code-named Edge.

I shared my script with Reddit, and a user pointed out that it is actually bad practice to write SQL directly into the script, and that I should be using Object Relational Mapping (ORM) instead. I am looking into that and will update the script once I have a handle on what I need to do to replace the raw SQL (there is a small sketch of that direction after the script below). I also know it is very bad practice to have a visible API key within the script, but this is just meant to be a "plug and play" script that is easy for a noobie like me to use. In the meantime, my Python script does what I need it to do. It downloads the 1-minute data as a CSV file, and then uploads that data into a table in my Microsoft SQL Server database:

#Dependencies
import requests
import datetime as dt
import urllib.request as req
import time
import pyodbc
import csv

#Get today's datetime information
date = dt.datetime.now()

#API KEY from Alpha Vantage
api_key = ''


#list of stock symbols to track
syms = ['MSFT', 'SPY', 'AAPL', 'DIA', 'QQQ', 'GS', 'GE', 'IBM', 'JPM', 'XLF', 'AA', 'BABA', 'TWTR', 'XHB', 'INTC', 'C', 'CZR', 'MGM',
    'SQ', 'BAC', 'AMD', 'FB', 'VXX', 'TSLA', 'IWM', 'GLD', 'SVXY', 'EEM', 'FCX', 'WMT']

#variables to store year/month/day numbers to append to file name.
y = date.strftime("%Y")
m = date.strftime("%m")
d = date.strftime("%d")

#connects to SQL database
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=your server here;'
                      'Database= your database here;'
                      'Trusted_Connection=yes;')

cursor = conn.cursor()

#for loop to iterate through the symbol list
for sym in syms:
    
#sleep for 15 seconds to avoid API call limit
    time.sleep(15)

#Create the URL for the API call
    url = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=' + sym + '&interval=1min&outputsize=full&apikey='+ api_key +'&datatype=csv'

#create the file path to save the csv
    save_path = 'your save path here' + sym + y + m + d + '.csv'

    print(url)
  
#read the page and store it   
    get_url = req.urlopen(url)
    data = get_url.read()
    get_url.close()

#write the data to a file    
    tnsfile = open(save_path,'wb')
    tnsfile.write(data)
    
#close python connection to the file
    tnsfile.close()

#create the table name 
    tableName = sym + y + m + d

#set the database you want to use, create the table
    cursor.execute('use YOUR DATABASE HERE; Create Table ' + tableName + '(symID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL, sym varchar(5), [timestamp] datetime ,[open] float,[high] float,[low] float,[close] float, [volume] float)')

#Open the csv file, build the query.   
    with open (save_path, 'r') as f:
        reader = csv.reader(f)
        columns = next(reader) 
        query = "insert into " + tableName + " (sym, timestamp,[open],[high],[low],[close],[volume]) values (" + "'" + sym + "'" + ",{1})"
        query = query.format(','.join(columns), ','.join('?' * len(columns)))
        cursor = conn.cursor()
        #print(query)

#write the query to the database
        for rows in reader:
            #print(rows)
            #print(query)
            cursor.execute(query, rows)
        cursor.commit()
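
As a rough idea of the ORM-style direction that Reddit comment pointed me toward, here is a minimal sketch that loads one of the downloaded CSV files with pandas and writes it through a SQLAlchemy engine instead of hand-built INSERT statements. The connection string, file name, and table name are placeholders, and this is not yet part of my script.

import pandas as pd
from sqlalchemy import create_engine

#placeholder connection string for SQL Server via pyodbc
engine = create_engine(
    'mssql+pyodbc://your_server/your_database'
    '?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes')

#read a downloaded CSV and push it into a table, letting pandas and
#SQLAlchemy generate the SQL instead of writing it by hand
df = pd.read_csv('your save path here' + 'MSFT20190503.csv')
df['sym'] = 'MSFT'
df.to_sql('MSFT20190503', engine, if_exists='replace', index=False)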

Before getting into writing web scraping scripts, I was gathering the data manually, and it was tedious and time-consuming. Now I can spend more time doing analytics and less time manually building datasets! School is paying off already.

A Simple Git Workflow

We started our first group project in the Data Analysis and Visualization bootcamp. Coordinating a workflow for five inexperienced people using git and github was one of the first hurdles to jump. I was appointed the team’s “git master” since I had a tiny bit of experience using it before the bootcamp began. Combining my love for troubleshooting and technical writing, I came up with a simple set of instructions for my team and wanted to share it here.

Common console commands: 
cd - change directory
mkdir - make directory
ls - view the files/folders in directory

NOTE: Exit VIM if needed ctrl + c then type :qa! and push enter
NOTE: If file is not in local repo, manually move the file into
the correct folder (outside of console)

--------------------------------------------
Managing your Local Repo
--------------------------------------------
NOTE: If you need to hard reset your local repo to match
the remote master use the following commands:
$ git fetch origin
$ git reset --hard origin/master

Undo the act of committing, leaving everything else intact:
$ git reset --soft HEAD^

Undo the act of committing and everything you'd staged,
but leave the work tree (your files) intact:
$ git reset HEAD^

Completely undo it, throwing away all uncommitted changes,
resetting everything to the previous commit:
$ git reset --hard HEAD^

--------------------------------------------
BEGIN WORKFLOW
--------------------------------------------
Clone the Repo to local machine:
$ git clone https://github.com/bendgame/Project-one.git

Make sure the local master is up-to-date:
$ git pull origin master

Create new branch:
$ git branch branch_name

Move to branch:
$ git checkout branch_name

Navigate file structure as needed:
$ ls
$ cd folder_name

Add the files to the branch:
$ git add .

Verify file:
$ git status

Commit the files:
$ git commit -m "comment"

Add branch and files to the Remote Repo:
$ git push -u origin branch_name

Go to the github website to manage pull request and merge.

Switch back to local master so you can delete the local branch:
$ git checkout master

Delete local branch:
$ git branch -d branch_name
OR
$ git branch -D branch_name

If you don't want to go to the website, you can merge your branch
to the master locally and push the new master to the remote repo:

Switch back to master branch:
$ git checkout master

Merge the branch with the local master:
$ git merge branch_name -m "comment"

Push the local master to the remote master:
$ git push origin master

Delete local branch:
$ git branch -d branch_name
OR
$ git branch -D branch_name

There you have it! Our simple git workflow used by five noobies to manage our first group project on github.

Thoughtful Thursday #8

Back at it again with another Thoughtful Thursday. School has kept me busy and I've been writing some different stuff, so I took a short break from these. For those who don't remember or know, this practice is inspired by Timothy Ferriss' 5-Bullet Friday, where he lists five things he is keeping up on.

Timothy Ferriss is a best-selling author and self-proclaimed human guinea pig. If you are unfamiliar with his work, I highly recommend checking out his blog here: https://tim.blog/

What I’m Doing: I’ve been extremely busy learning to analyze and visualize data using python. School has been going great so far! In previous blogs, I’ve posted things I’ve done for homework assignments.

What I’m Missing: Since I shifted my hours back to mornings, and since school is eating up all of my free time, I have put trading on the back burner and have barely touched the markets lately. Overall, I needed a break from it anyways so I can come back with fresh eyes. I am working on some pretty cool stuff for Project Edge, and I’ve been experimenting with API calls to my broker. I can’t wait to get back into the game!

What I’m Watching: I’ve been watching old Family Guy episodes lately. Even at 32, I love cartoons. I haven’t kept up with anything after season 9, but I can’t tell you how many times I’ve gone through seasons 1-9 especially 1-5… The golden years.

What I’m Excited About: DESSERT TASTING! I am going to taste cakes with Kristen this Sunday so we can decide what to serve at the wedding. Only six months away! Also, it is almost Rigby’s birthday! I can’t believe we’ve almost had him for a year. Time sure flies.

What I'm Eating: I've been cooking a lot of pasta lately. Kristen and I didn't eat it for the longest time, but lately we've been indulging. As an adult, I love red sauce! If kid me could see me now… he'd be disgusted.

One more for good measure:

What I’m Pondering: Carl Icahn’s summary of Anti-Darwinism in Corporate America, lol: https://oyc.yale.edu/economics/econ-252-08/lecture-15# An Anti-Darwinian Corporate America [00:06:10]

Assuming does What?

The assumption we’re all taught about assumptions is that they will make an a-s-s out of u-m-e… In other words, assuming too much can make you seem foolish. Color me foolish then because I find assumptions to be one of the most useful mental tools in my toolbox. I think they act as helpful mental shortcuts and can help maintain mental resilience. I make the following assumptions constantly; I am wrong a lot:

I assume I’ll be good at whatever I try. Why sell myself short before I even begin? When I am not good at something I feel I should be, instead of feeling discouraged, I find myself putting in more work and effort because I want my performance to sync with my mental framework. Even if my assumption is wrong and I end up being terrible at something, I know I put in effort to align my actions with my thoughts, and I don’t need to beat myself up about it.

I assume everything will be easy for me. Nothing comes without a learning curve, but believing something will be too hard before I begin is giving myself an excuse to avoid effort. For example, learning to code is one of the most challenging things I’ve done in a long time. Before beginning, I knew it was going to be challenging, but I believed it would be easy for me to overcome that challenge. I know it will take time to become fluent, but I believe it will be an easy road to travel if I pave it with patience, diligence, and honest feedback. I attribute my lack of proficiency to the steepness of the learning curve, not the difficulty of the task.

I assume that everything will be OK if my assumptions are wrong. Being wrong doesn’t have to be a blow to the ego. Just shrug it off and learn something! When I’m wrong, I consider it an opportunity to be honest with myself, and I give myself a chance to reflect. It allows me to self-correct and adapt my mental framework. I’d rather try and be wrong than be afraid and inactive.

Those three assumptions allow me to maintain a positive and resilient state of mind when I approach new tasks or need to learn new things. Although it might sound like overconfidence to some, it is an approach I use to reduce the weight of the external factors putting pressure on me. My assumptions act as mechanisms to help me align my thoughts and actions, and they act as frames for positive self talk. For example, instead of telling myself I’m terrible at writing python scripts when a concept doesn’t click, I make my assumptions and remind myself that I am good, and that it is easy, and that I am just not yet doing what it takes to perform at the level I imagine. My assumptions move me.

Two Python Scripts

For a homework assignment, we had to write a couple of Python scripts. The first analyzes monthly financial data to find the average monthly change, and the min and max changes along with their corresponding months. The second can be used to count votes for an election and determine the winner. Both scripts output to a text file, the second one uses pandas DataFrames, and both were written using Python 3. These and more are available on my github:
https://github.com/bendgame

Find Monthly Change:

import csv
import os

#fetch the file
file = os.path.join ('..', 'PyBank','PyBank_data.csv')

#create placeholder lists for the data
months = []
net_total = []

with open (file, newline = "") as csvfile:
    readcsv = csv.reader(csvfile, delimiter = ',')

    csv_header = next(csvfile)
    
    #put the data into lists
    for row in readcsv:
        months.append(row[0])
        net_total.append(int(row[1]))
        
    #count the number of months
    month_count = len(months)
    
    #set variables for loops
    x = 1
    y = 0
    
    #average change place holder
    average_change = (net_total[1]-net_total[0])
    
    #place holder list for changes 
    changes = []
    
    #for loop to calculate month to month change and dump the values into a list
    for month in range(month_count-1):
        average_change = (net_total[x] - net_total[y])
        changes.append(int(average_change))
        x+=1
        y+=1
        
    
    #Calculate the average monthly change and round it
    av_mon_chng = round(sum(changes)/(month_count -1),2)

    #find the min and max change
    min_change = min(changes)
    max_change = max(changes)

    #return the index to find the positions in the list
    chng_i_min = changes.index(min_change)
    chng_i_max = changes.index(max_change)
    
    #find the months for the min and max changes
    min_chng_month = months[chng_i_min + 1]
    max_chng_month = months[chng_i_max + 1]
  

#Print the values in console

print("Financial Analysis")
print("----------------------------")
print(f"Months: {len(months)}")
print(f"Total: ${sum(net_total)}")
print(f"Average Monthly Change: {av_mon_chng}")
print(f"Greatest Increase in Profits: {max_chng_month} (${max_change})")
print(f"Greatest Decrease in Profits: {min_chng_month} (${min_change})")

#Write the output to a text file
fin_analysis = open("Financial_Analysis.txt","w")

fin_analysis.write("Financial Analysis\n")
fin_analysis.write("----------------------------\n")
fin_analysis.write(f"Months: {len(months)}\n")
fin_analysis.write(f"Total: ${sum(net_total)}\n")
fin_analysis.write(f"Average Monthly Change: {av_mon_chng}\n")
fin_analysis.write(f"Greatest Increase in Profits: {max_chng_month} (${max_change})\n")
fin_analysis.write(f"Greatest Decrease in Profits: {min_chng_month} (${min_change})\n")

 
fin_analysis.close() 

Count votes and determine a winner:

import pandas as pd
import os

# Make a reference to the csv file path
csvfile = os.path.join ('..', 'PyPoll','election_data.csv')

# read the csv file as a DataFrame
elec_data = pd.read_csv(csvfile)

df_ed = pd.DataFrame(elec_data)

#create an empty list to store vote counts as percent
percent_count = []

#Create a list of unique(distinct) candidates
candidates = list(df_ed["Candidate"].unique())

#count the votes for each candidate, keeping the same order as the candidates list
counts = list(df_ed['Candidate'].value_counts()[candidates])

#get the total vote count
total_votes = sum(counts)

#set variable for for loop
x = 0

#for loop to calculate percentages
for candidate in candidates:
    percentage = round(counts[x]/total_votes,3)
    percentage="{:.3%}".format(percentage)
    percent_count.append(percentage)
    x+=1

#create a zipped list of data to rebuild data frame
data = list(zip(candidates, percent_count, counts))

#calculate counts to find winner
max_count = df_ed['Candidate'].value_counts()

#find the most counts
winner_count = max_count.max()

#put the zipped data into a data frame
df_data = pd.DataFrame(data)

#search the data frame for the candidate with the most counts put it into a list
winner = list(df_data.loc[df_data[2]== winner_count,0])

#rename the columns for df_data
df_data.columns = ["Candidate |", "Percent of Votes |", "Vote Count"]

#sort the columns by vote count in descending order for clean output
sorted_data = df_data.sort_values("Vote Count", ascending = False )

#print the output into console
print("Election Results")
print(" -------------------------")
print(f"Total Votes: {total_votes}")
print("--------------------------")
print(f"{sorted_data}")
print(f" -------------------------")
print(f"Winner: {winner}")
print(f"-------------------------")

#Write the outcome data to a txt file named election_winner

election_winner = open("election_winner.txt","w")

election_winner.write("Election Results\n")
election_winner.write("----------------------------\n")
election_winner.write(f"Total Votes: {total_votes}\n")
election_winner.write(f"--------------------------\n")
election_winner.write(f"{sorted_data}\n")
election_winner.write(f" -------------------------\n")
election_winner.write(f"Winner: {winner}\n")
election_winner.write(f"-------------------------")

election_winner.close()