Work Experience
Data Scientist at National Security Agency
02/2024 - Present
- Optimized LLM inference, cutting inter-token latency by 3x and accelerating response times without sacrificing model accuracy
- Deployed a performance monitoring application for LLMs self-hosted with vLLM, using Grafana and Prometheus
- Trained LLMs to exhibit reasoning behavior using supervised fine-tuning and reinforcement learning
- Designed an MLOps framework for NSA's big data analytics platform using open-source tools such as MLflow and Evidently to support 200+ AI practitioners
- Built a reference implementation for adopting the end-to-end MLOps framework and created a website with MkDocs to serve documentation and tutorials
Data Scientist at LMI
11/2022 - 02/2024
- Transformed and consolidated large-scale datasets with millions of records from disparate data sources using PySpark, creating reliable pipelines and data models for effective analysis and modeling
- Developed and deployed machine learning models to forecast monthly demand for different types of ammunition, improving existing forecasts by 52%
- Built an interactive dashboard on Palantir's data platform for managing the Army's ammunition supply chain across the U.S., serving 100+ users
Projects
VideoInsight AI
Chat with your videos!
- Built a multimodal RAG application with Python that allows users to ask questions about their videos
- Implemented a vector database and a custom CLIP-based embedding model to improve semantic search and retrieval accuracy
- Developed a user-friendly interface with Gradio, facilitating seamless interaction and allowing users to see the retrieved video segments
- Deployed the application with Docker on AWS Lightsail
Demand Forecasting for City Bike Rentals
A key factor in the success of bike-sharing programs is the efficient allocation of rental bikes across the city. This study aims to predict bike demand based on time differences by making hourly forecasts over a 24-hour horizon using time series, deep learning, and tree-based models.
- Generated hourly bike demand forecasts for the next day using time series and ML models in R and Python
- Built an ML model that reduced error by 58% compared to seasonal naive baseline forecasts
- Increased efficiency of city-wide bike allocation by making forecasts with an average error of 78 bikes per hour
Web Traffic Forecasting
Forecasting web traffic helps businesses better understand future trends and user behavior and can be leveraged to improve IT infrastructure. This analysis uses time series methods to forecast the number of daily website visits over a 30-day horizon.
- Designed time series models in R for predicting daily website visits over a 30-day horizon
- Analyzed user growth by identifying user trends and behaviors over the last 5 years
- Supported IT infrastructure planning by generating point forecasts with an average error of 9%
Loan Default Prediction
Delinquent borrowers are a major risk factor in peer-to-peer lending. This analysis implements machine learning methods to predict whether a person will default based on their loan application, allowing the business to approve higher-quality loan applicants and thereby increase returns to investors.
- Built an interactive UI with R Shiny that offers 17+ configurable components to calculate default probability
- Developed a classification model in R to identify high-risk loan applications with 93% accuracy
