Data Scientist with 5+ years of experience building models and pipelines that bridge academic rigor and real-world impact.
I specialize in statistical inference, machine learning, and optimization — with projects ranging from predicting NFL draft picks
and modeling NCAA brackets, to portfolio optimization and experimental lab evaluations.
My toolkit includes Python (Pandas, Scikit-learn, Plotly, Statsmodels), R (for teaching Probability and Statistics), and cloud tools like GCP and Looker Studio.
Whether it's identifying undervalued prospects or building AI-powered draft agents, I turn data into decisions.
Objective: Develop a model for trading challenges that selects the optimal
portfolio each week, maximizing profit in a return-risk trade-off.
- Placed among the top 3 portfolios for multiple weeks in Reto Actinver 2022.
- 5th Place National Award for Bloomberg Trading Challenge 2024.
- Served as an advisor for stock selection in trading challenges.
- The methodology (sketched in code after this list) was:
- Download the stock prices with the yfinance library
- Clean the data and calculate:
- Mean Returns
- Log Returns
- Portfolio Risk
- Portfolio Returns
- Sharpe Ratio
  - Create a function that generates random weights to build a 10-stock
    portfolio.
- Create 100,000 random portfolios.
  - Save the best portfolio found so far, given its return.
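A minimal sketch of this random-portfolio search. The tickers and the zero risk-free rate are placeholders, not the actual challenge universe:

```python
import numpy as np
import yfinance as yf

# Hypothetical 10-stock universe (not the challenge's actual stock list)
tickers = ["AAPL", "MSFT", "AMZN", "GOOGL", "META",
           "NVDA", "TSLA", "JPM", "V", "KO"]
prices = yf.download(tickers, period="1y")["Close"]

# Log returns, annualized mean returns and covariance
log_returns = np.log(prices / prices.shift(1)).dropna()
mu = (log_returns.mean() * 252).to_numpy()
cov = (log_returns.cov() * 252).to_numpy()

rng = np.random.default_rng(42)
best = {"sharpe": -np.inf}

for _ in range(100_000):
    weights = rng.random(len(tickers))
    weights /= weights.sum()                     # random weights summing to 1
    port_return = weights @ mu
    port_risk = np.sqrt(weights @ cov @ weights)
    sharpe = port_return / port_risk             # risk-free rate assumed to be 0
    if sharpe > best["sharpe"]:
        best = {"sharpe": sharpe, "return": port_return,
                "risk": port_risk, "weights": weights}

print(best["sharpe"], best["weights"].round(3))
```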
Objective: Identify players whose Combine measurements suggest they should have
been drafted but weren't, in a search for hidden talent.
- The goal was to identify the variables scouts weigh most when drafting a
  prospect.
- Developed hypotheses comparing positions and their measurement distributions.
- Found a statistically significant difference between positions in their
  Combine measurements.
- Compared Logistic Regression against a Random Forest Classifier to select the
  better model.
- Logit with a Lasso penalty was the best model (sketched below).
- 40-yard dash time and weight are the most important features.
- AUC=0.72
- The model recommended drafting UDFAs such as Jaylen Warren and Cameron Dicker.
- Brock Purdy was a strong recommendation to draft.
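A minimal sketch of the lasso-penalized logistic regression; the file name and feature columns are illustrative stand-ins for the Combine dataset:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

combine = pd.read_csv("combine_results.csv")         # hypothetical Combine measurements file
features = ["forty_yd", "weight", "bench", "vertical", "broad_jump"]
X, y = combine[features], combine["drafted"]          # 1 = drafted, 0 = undrafted

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# The L1 (lasso) penalty needs a solver that supports it, e.g. liblinear or saga
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.2f}")
```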
📑 Retrieval Augmented Generation (RAG) and Agent with the Gemini API for drafting players in the NFL
- As part of the Gen AI Intensive Course Capstone 2025Q1
- Developed an LLM with RAG grounded in all the publicly available scouting reports for the 2025 NFL Draft.
- The LLM is able to give an assessment as well as compare and contrast players at the same position.
- Added an agent capable of playing a mock draft with the user, making picks from the big board
  based on the rankings and team needs (a minimal retrieval sketch follows).
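A minimal retrieval sketch assuming the `google-generativeai` SDK; the embedding and generation model names and the tiny report list are placeholders, not the capstone's actual corpus or pipeline:

```python
import os
import numpy as np
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Hypothetical scouting-report snippets standing in for the 2025 Draft corpus
reports = [
    "QB prospect A: strong arm, quick release, limited mobility.",
    "QB prospect B: elite athlete, accuracy wavers under pressure.",
]

def embed(text: str) -> np.ndarray:
    resp = genai.embed_content(model="models/text-embedding-004", content=text)
    return np.array(resp["embedding"])

report_vecs = np.array([embed(r) for r in reports])

def answer(question: str) -> str:
    q_vec = embed(question)
    sims = report_vecs @ q_vec / (
        np.linalg.norm(report_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    top = np.argsort(sims)[-2:]                  # retrieve the two most similar reports
    context = "\n".join(reports[i] for i in top)
    prompt = f"Scouting reports:\n{context}\n\nQuestion: {question}"
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text

print(answer("Compare and contrast the two quarterback prospects."))
```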
- Leveraging the R programming language, students gained a deeper understanding
  of Probability and Statistics concepts such as:
- Conditional Probability
- Discrete Probability Distributions
- Continuous Probability Distributions
- Analysis of Variance
- Experimental Design
- Upcoming term: students will have a reference guide; you can read the WIP here.
Predicting NFL Matches with different ML Models and variables
- Used publicly available data, including tables scraped from Pro Football Reference,
  Sports History Odds, and NFLFastR.
- Trained on all games since 1999 to predict the 2023 season.
- 72% Accuracy Score
- Variables referenced in Delen (2012) were also relevant in our results.
- Beat many state-of-the-art algorithms for game prediction (see the sketch below).
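A hedged sketch of the model comparison; the games file, feature names, and model choices are illustrative stand-ins for the Pro Football Reference / NFLFastR features actually used:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

games = pd.read_csv("games_1999_2023.csv")           # hypothetical merged game-level dataset
features = ["home_elo", "away_elo", "spread", "rest_days_diff", "turnover_diff_l5"]

train = games[games["season"] < 2023]                # train on 1999-2022
test = games[games["season"] == 2023]                # hold out the 2023 season

for name, model in [
    ("logit", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=500, random_state=0)),
]:
    model.fit(train[features], train["home_win"])
    acc = accuracy_score(test["home_win"], model.predict(test[features]))
    print(f"{name}: {acc:.3f}")
```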
MCMC to test whether the lottery is strictly random or an underlying distribution can be found
- Worked with Mexican Powerball-style lotteries such as Chispazo and Melate Retro.
- Ran Markov Chain Monte Carlo (MCMC) simulations to test this claim (simplified sketch below).
- Log-likelihoods of -146 and -93 for the simulated distributions compared with
  the real distribution of draws.
- Failed to reject the hypothesis that the lottery draws are completely random.
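A simplified simulation sketch of the likelihood comparison rather than the full MCMC setup; the ball count follows Chispazo, but the draw history is a placeholder and within-draw without-replacement dependence is ignored:

```python
import numpy as np
from scipy.stats import multinomial

rng = np.random.default_rng(0)
n_balls, n_picks = 28, 2000                          # 28 balls as in Chispazo; hypothetical number of picks
uniform_p = np.full(n_balls, 1 / n_balls)

# Observed counts of how often each ball appeared (placeholder: simulated here,
# in the project these come from the real draw history)
observed = rng.multinomial(n_picks, uniform_p)

# Log-likelihood of the observed counts under a perfectly uniform lottery
observed_ll = multinomial.logpmf(observed, n=n_picks, p=uniform_p)

# Reference distribution: log-likelihoods of purely random simulated histories
sim_lls = np.array([
    multinomial.logpmf(rng.multinomial(n_picks, uniform_p), n=n_picks, p=uniform_p)
    for _ in range(5000)
])

# If the observed log-likelihood is not extreme relative to the simulations,
# we fail to reject the hypothesis that the lottery is random
p_value = np.mean(sim_lls <= observed_ll)
print(observed_ll, p_value)
```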
- Developed an end-to-end data pipeline providing ad hoc analytics for
  evaluating lab workers on correctly identifying blood cells through a specific
  methodology.
- The whole architecture is hosted on GCP, with the final product delivered
  through Looker Studio.
- Participants in this quality program are evaluated in two ways: by a monthly
  expert and by an expert consensus, to ensure an unbiased assessment.
- This project was entered in the March Machine Learning Mania 2022 - Men’s
  competition to predict the bracket.
- Logistic Regression with cross-validation was used to predict the bracket (sketched below).
- Average log loss of the algorithm: 0.68492.
- Calculated the probability of a win for each team.
- Beat the auto bracket (where all teams have an equal probability of winning).
- Predicted the St. Peter’s Peacocks upsets over No. 2 seed Kentucky and No. 3
  seed Purdue.
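A sketch of the cross-validated logistic regression; the matchup file and features are illustrative, not the exact Kaggle feature set:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import log_loss

matchups = pd.read_csv("ncaa_matchups.csv")          # hypothetical historical matchup file
features = ["seed_diff", "elo_diff", "win_pct_diff"]
X, y = matchups[features], matchups["team1_won"]

# Logistic regression with built-in cross-validation over the regularization strength
model = LogisticRegressionCV(cv=5, max_iter=1000).fit(X, y)

probs = model.predict_proba(X)[:, 1]                 # win probability for team1 in each matchup
print(f"log loss: {log_loss(y, probs):.5f}")
```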
- Took the Amazon Reviews dataset (you can find the dataset here).
- Looked at the most common reviews
- EDA
- Created a topic classifier with Latent Dirichlet Allocation (LDA), sketched below.
- Classified positive topics into 10 different categories based on their sentiment score.
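A sketch of the LDA topic model; the file and column names are assumptions and the preprocessing is simplified:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = pd.read_csv("amazon_reviews.csv")          # hypothetical local copy of the dataset
positive = reviews[reviews["sentiment_score"] > 0]["review_text"]

# Bag-of-words representation of the positive reviews
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
doc_term = vectorizer.fit_transform(positive)

lda = LatentDirichletAllocation(n_components=10, random_state=0)  # 10 positive-topic categories
topics = lda.fit_transform(doc_term)                 # per-review topic distribution

# Top words per topic, to label the categories
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-8:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```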
***
- The most recommended Kaggle competition for getting hands-on with ML.
- Voting Classifier that combined distinct methodologies (sketched below).
- Feature engineering on the deck to which each passenger was assigned.
- Filled null values in numeric columns with the column average.
- This submission scored 0.77751 on accuracy.
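A sketch of the voting ensemble and the imputation step, using the standard Kaggle Titanic columns; the chosen base estimators are an assumption:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

train = pd.read_csv("train.csv")                     # Kaggle Titanic training file

# Deck feature: first letter of the cabin, "U" when unknown
train["Deck"] = train["Cabin"].str[0].fillna("U")
train["Sex"] = (train["Sex"] == "female").astype(int)

numeric = ["Age", "Fare", "Pclass", "SibSp", "Parch"]
train[numeric] = train[numeric].fillna(train[numeric].mean())   # mean imputation on numeric columns

X = pd.get_dummies(train[numeric + ["Sex", "Deck"]], columns=["Deck"])
y = train["Survived"]

# Soft-voting ensemble over a few distinct model families
ensemble = VotingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.score(X, y))
```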
- Analyzed a dataset to predict whether a user would renew their insurance
  policy.
- After EDA, we identified that users without a license were the most likely
  not to buy insurance.
- Users with older vehicles were more inclined to buy insurance policies
- A Decision Tree was the best option for classifying users prone to buy an
  insurance policy (sketched below).
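A sketch of the decision tree classifier; the file and column names are illustrative stand-ins for the insurance dataset described above:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

policies = pd.read_csv("insurance_renewals.csv")     # hypothetical dataset
features = ["has_license", "vehicle_age", "previously_insured", "annual_premium"]
X, y = policies[features], policies["renewed"]

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Inspect which splits drive the renewal prediction (e.g. license, vehicle age)
print(export_text(tree, feature_names=features))
```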
📅 Schedule a Call
If you’d like to chat or collaborate, feel free to book a time with me: