
Michael Knight
Data Scientist/Machine Learning Engineer

"I have been working in Data Science since 2019 and obtained my Master’s Degree in Data Science in 2022. With a strong foundation in mathematics, I love using Machine Learning, AI, and Natural Language Processing techniques to solve complex puzzles."

01 PROFESSIONAL
MY DATA SCIENCE SKILL SET
MACHINE LEARNING
NLP
PYTHON
R
JAVA
SQL
JAVASCRIPT
NEURAL NETWORKS
DEEP LEARNING
AI
AWS
LLM


03 EXPERIENCE
2025-Present
HELIOS
Data Scientist
-
Designed and built an AI-driven model (Gemini 2.0, Vertex AI) that produces 80-110 word summary paragraphs explaining the predicted country-level wholesale and export prices for pineapples, grapes, and durum wheat across 10 countries for each year from 2025 to 2028, based on the output of a proprietary CNN price forecasting model (written in Python and deployed via Google Cloud BigQuery)
-
To produce the US wholesale explanations, designed and built an AI-driven model (Gemini 2.0, Vertex AI) that writes a summary paragraph for the forecasted price of each commodity in each terminal market in each country for a given year, then condenses that list of explanations into one summary paragraph per country
-
Designed and built an AI-driven model (Gemini 2.0, Vertex AI) that extracts from each price explanation the 1 to 2 primary drivers of the price prediction for every country, commodity, and year combination (1-3 words per driver)
-
Collaborated with cross-functional teams across Slack, JIRA, and Confluence to enhance product performance and innovation
2023-2024
CHERRY STREET ENERGY
Data Scientist (Contractor)
-
Within a two-week deadline, created linear, ridge, and LASSO regression models that predict the monthly energy usage (kWh/mo) and energy intensity (kWh/sqft/mo) of a building in the Southeastern U.S. with 90% accuracy, based on six inputs (square footage, stories, building profile, year constructed, weekly operating hours, and month)
-
Optimized the best performing model (ridge) using GridSearchCV to tune hyperparameters
-
Incorporated the regression model into a SEED calculator, which, in tandem with a function that calculates Billing Demand, determines how much a client would save on energy cost by switching to Cherry Street Energy
-
Designed and developed data pipelines supporting an end-to-end solution that accesses Georgia Power’s API to extract insights from commercial building data for the SEED calculator
2024
VIDOORI
Data Scientist
-
Designed and built an AI-driven resume grader (written in Python and deployed on AWS Lambda) that scores candidates on a scale of 0-5 based on 2 pass-fail disqualifying criteria, 3 weighted scoring criteria, and 3 bonus criteria, with 90% similarity to human grading
-
Designed 2 chatbots (HR Bot, Legal Bot) with distinct voices and exclusive access to their designated data using Llama 2 via LlamaIndex
-
Developed cross-functional PowerPoint presentations to educate coworkers on the company’s state-of-the-art Transformer-based deep neural network used to link people across two different surveys for the Census Bureau, as well as other AI/ML/NLP models and techniques
-
Tracked and maintained cross-functional project timelines in JIRA tickets using Agile methodology and Scrum practices
2021-2023
AMERICAN UNIVERSITY
Graduate Research Assistant (Machine Learning Engineer)
-
Developed human-assisted machine learning and natural language processing (NLP) approaches to infer information about chemical compounds from highly technical open literature sources
-
Enhanced existing machine learning techniques to predict 9 different properties of CNOHF molecules from their molecular structures, using a dataset of 439 unique chemical compounds
-
Designed and implemented nested K-fold cross-validation on a kernel ridge regression (KRR) model with a radial basis function (RBF) kernel to find the hyperparameters that give the best Mean Absolute Error (MAE), using the 439 28-dimensional Sum Over Bonds feature vectors as the feature matrix (X) and the 9 chemical properties as the target vectors (y1-y9)
-
Designed and implemented convolutional neural networks in PyTorch and PyTorch Geometric to create neural fingerprints for these compounds based on their graph representations (generated using RDKit)
2023
CAREFORGE AI
Data Scientist
-
Built and maintained a secure, organized data repository using PostgreSQL 16, ensuring data integrity and accessibility
-
Trained and optimized machine learning models enriched with key terms and search strings tailored to the platform's specific needs
-
Worked with the advisory boards and used NLP techniques to extract insights, automate processes, and enhance user interactions on the platform, keeping responses to user input relevant and accurate
-
Set up a local large language model (LLM) development pipeline

04 EDUCATION
2021-2022
AMERICAN UNIVERSITY
Washington, DC
Master of Science, Data Science
2019
GENERAL ASSEMBLY
Data Science Immersive (full-time)
Computer science bootcamp (data science)
2016-2017
COMMUNITY COLLEGE OF PHILADELPHIA
Philadelphia, PA
Additional computer science courses (Java)
2009-2011
UNIVERSITY OF MARYLAND
College Park, Maryland
Additional mathematics and computer science courses (JavaScript)
2001-2006
BARD COLLEGE
Annandale-on-Hudson, NY
Bachelor of Arts, Mathematics
Some computer science courses






