Michael Knight

Data Scientist/Machine Learning Engineer

"I have been working in Data Science since 2019 and obtained my Master’s Degree in Data Science in 2022. With a strong foundation in mathematics, I love using Machine Learning, AI, and Natural Language Processing techniques to solve complex puzzles."

01 PROFESSIONAL

MY DATA SCIENCE SKILL SET

MACHINE LEARNING

NLP

PYTHON

R

JAVA

SQL

JAVASCRIPT

NEURAL NETWORKS

DEEP LEARNING

AI

AWS

LLM

03 Experience

2025-Present

HELIOS

Data Scientist

  • Designed and built an AI-driven model (Gemini 2.0, Vertex AI) that produces 80-110 word paragraph summaries explaining the country-level wholesale and export price predictions for pineapples, grapes, and durum wheat across 10 countries for each year from 2025 to 2028, based on the results of a proprietary CNN price forecasting model (written in Python and deployed via Google Cloud BigQuery)

  • For the US wholesale explanations, designed and built an AI-driven model (Gemini 2.0, Vertex AI) that writes a summary paragraph for the forecasted price of each commodity in each terminal market in each country during a given year, then condenses that list of explanations into a summary paragraph for each country

  • Designed and built an AI-driven model (Gemini 2.0, Vertex AI) that extracts from each price explanation 1 to 2 of the primary drivers behind the price prediction for each country, commodity, and year combination (1-3 words per driver)

  • Collaborated with cross-functional teams across Slack, JIRA, and Confluence to enhance product performance and innovation

2023-2024

CHERRY STREET ENERGY

Data Scientist (Contractor)

  • Within a two-week deadline, created linear, ridge, and LASSO regression models that predict the monthly energy usage (kWh/mo) and intensity (kWh/sqft/mo) of a building in the U.S. Southeast within 90% accuracy, based on six inputs (square footage, stories, building profile, year constructed, weekly operating hours, and month)

  • Optimized the best performing model (ridge) using GridSearchCV to tune hyperparameters

  • Incorporated the regression model into a SEED calculator, which, in tandem with a function that calculates Billing Demand, determines how much a client would save on energy cost by switching to Cherry Street Energy

  • Developed and designed data pipelines to support an end-to-end solution for accessing Georgia Power’s API to extract meaningful insights on commercial building data for the SEED calculator

2024-2024

VIDOORI

Data Scientist

  • Designed and built an AI-driven resume grader (written in Python and deployed in AWS Lambda) that evaluates candidates on a scale of 0-5 based on 2 pass-fail disqualifying criteria, 3 weighted scoring criteria, and 3 bonus criteria, achieving 90% similarity to human grading

  • Designed 2 chatbots (HR Bot, Legal Bot) with distinct voices and exclusive access to their designated data using Llama 2 via LlamaIndex

  • Developed cross-functional PowerPoint presentations to educate coworkers on the company’s state-of-the-art Transformer-based deep neural network used to link people across two different surveys for the Census Bureau, as well as other AI/ML/NLP models and techniques

  • Precisely tracked and maintained project timelines cross-functionally in JIRA tickets using Agile methodology and Scrum practices

2021-2023

AMERICAN UNIVERSITY

Graduate Research Assistant (Machine Learning Engineer)

  • Developed human-assisted machine learning and natural language processing (NLP) approaches to infer information about chemical compounds from highly technical open literature sources

  • Enhanced existing machine learning techniques to predict 9 different properties of CNOHF chemical molecules from their molecular structures for 439 unique chemical compounds 

  • Designed and implemented nested K-fold cross-validation on a kernel ridge regression (KRR) model with a radial basis function (RBF) kernel to find the parameters that give the best possible mean absolute error (MAE) score, using the 439 vectors of the 28-dimensional Sum Over Bonds featurization method as the feature matrix (X) and the 9 chemical properties as the target vectors (y1-y9)

  • Designed and implemented convolutional neural networks in PyTorch and PyTorch Geometric to create neural fingerprints for these compounds based on their graph representations (as generated using RDKit)

2023-2023

CAREFORGE AI

Data Scientist

  • Built and maintained a secure and organized data repository using PostgreSQL 16, ensuring data integrity and accessibility throughout

  • Trained and optimized machine learning models enriched with key terms and search strings tailored to the platform's specific needs 

  • Worked with the advisory boards and used NLP techniques to extract insights, automate processes, and enhance user interactions on the platform, ensuring the most relevant and accurate responses to user input

  • Set up a local large language model (LLM) development pipeline

04 Education

2021-2022

AMERICAN UNIVERSITY

Washington, DC

Master of Science, Data Science

2019-2019

GENERAL ASSEMBLY

Data Science Immersive (full-time)

Computer science bootcamp (data science)

2016-2017

COMMUNITY COLLEGE OF PHILADELPHIA

Philadelphia, PA

Additional computer science courses (Java)

2009-2011

UNIVERSITY OF MARYLAND

College Park, Maryland

Additional mathematics and computer science courses (JavaScript)

2001-2006

BARD COLLEGE

Annandale-on-Hudson, NY

Bachelor of Arts, Mathematics

Some computer science courses

CONTACT

Thank you for visiting my website. If you have any questions or would like to discuss any opportunities, you can reach me here. I look forward to working with you.

mknight4714@gmail.com

Tel: 202-747-4509

  • LinkedIn
  • GitHub
  • Medium