I'm Patrick Salsbury

Who am I?

My name is Patrick Salsbury, and I am a data professional with a strong foundation in data science and engineering. I have built and deployed high-performance data systems, scalable pipelines, robust databases, and analytical tools that help organizations make smarter decisions. My passion for creating meaningful impact shows throughout my work, where I have led data projects from start to finish and transformed how existing teams operate using data.

Skills

Languages:
Python, SQL, Java
Databases & Table Formats:
Iceberg, ClickHouse, PostgreSQL, MySQL
Libraries:
Pandas, PySpark, FastAPI, Dash, DuckDB, scikit-learn, TensorFlow, SciPy, Plotly, Seaborn
Tools & Platforms:
AWS, Azure, Jira, Git, Databricks, Power BI, Tableau

Experience

Data Engineer

Tesla - Palo Alto, CA
January 2025 - Present
  • Led the proof-of-concept, design, and development of a ClickHouse database for time-series data generated by 100+ hardware devices, achieving 20x faster analytical queries than Iceberg and enabling engineers to analyze data faster and more productively.
  • Deployed a REST API using FastAPI with Pydantic to enable 100+ devices to store test metadata and metrics efficiently.
  • Designed and implemented a MySQL database to support the logging, tracking, and analysis of vehicle components across the wide range of reliability tests conducted by the Hardware Test team.
  • Built and maintained scalable data pipelines using Airflow and Spark to process petabytes of vehicle data and terabytes of hardware test data, integrating MySQL, S3 storage, Iceberg datasets, and external APIs to fulfill requests from various stakeholders.

Data Science Intern

Tesla - Palo Alto, CA
June 2024 - December 2024
  • Developed a comprehensive internal data analytics application with Plotly Dash, hosted on a Linux server, that lets the hardware test team monitor device usage, analyze key performance indicators, and receive notifications of statistically significant anomalies during live tests.
  • Analyzed over 10 billion entries of fleet data to identify root causes of vehicle fault errors, then built an interactive dashboard with Bokeh Panel and Trino that presents these key performance indicators and lets users identify issues themselves.
  • Collaborated with test engineers and project managers across teams to understand and meet data needs ranging from ad-hoc analysis reports to entirely new dashboards.

Data Science & Engineering Intern

EverCharge - Palo Alto, CA
June 2023 - September 2023
  • Identified previously unknown bugs, hardware issues, and installation problems affecting 5-10% of all products by performing time series analysis, inferential analysis, and anomaly detection.
  • Built Retool dashboards to surface data validation and integrity issues, and optimized the underlying PostgreSQL database queries.
  • Published 3 data science reports that answered stakeholders' questions and highlighted areas of concern.
  • Initiated a proof-of-concept project that involved training an ML anomaly detection model to identify abnormal device behavior.

Computer Science Coach

TheCoderSchool - Encinitas, CA
February 2023 - March 2023
  • Coached elementary through high school students in computer science and software engineering.
  • Designed personalized curricula, including projects and learning paths, for students of varying backgrounds and experience levels.
  • Guided students through building apps in Python and Scratch over 4-6 week periods while teaching the fundamentals of coding.

Data Science Intern

HM Electronics - Carlsbad, CA
June 2022 - September 2022
  • Analyzed customer device data and highlighted areas of concern, improving production quality and customer relationships.
  • Led sprint presentations showcasing findings built with Python, SQL, and Power BI to inform the decisions of product managers and software engineers.
  • Collected semi-structured console log data from 100+ customer devices and ingested it into an Azure Blob data lake.
  • Automated the ETL process for 30+ million raw records at once using Apache Spark on Databricks (Microsoft Azure).
  • Predicted customer device failures using classification and clustering ML models to isolate key failure indicators for tech support.

Mathematics Tutor

De Anza College - Cupertino, CA
August 2019 - April 2021
  • Taught college students mathematics subjects including Precalculus, Calculus, Differential Equations, and Linear Algebra.
  • Led individual study sessions and group workshops of 5-10 students at a time.
  • Helped individual tutees raise their test scores into the 90%+ range and increased pass rates by more than 50%.

Education

University of California, San Diego

September 2021 - Present

B.S. - Data Science

Relevant Coursework:
Data Management, Machine Learning Foundations, Intro to Deep Learning, Applications of Data Science, Data Analysis & Inference, Business Analytics
Involvements:
- Delta Sigma Pi: President, Chancellor, Vice President of Scholarships & Awards
- Data Science Student Society: Member

De Anza College

September 2018 - June 2021

A.S. - Computer Science

Relevant Coursework:
Data Science Fundamentals, Data Structures & Algorithms, Object Oriented Analysis and Design
Involvements:
- Honor Society: Member
- Computer Repair Technician

Additional Education

IBM Data Engineering (Coursera)
Skills:
MySQL, PostgreSQL, IBM DB2, Apache Spark, Apache Airflow
Google Data Analytics (Coursera)
Skills:
Python, SQL, Excel, Tableau, R

Projects

Mitigating Gender Bias In Coronary Heart Disease Prediction

(Capstone Project)
December 2024
  • Designed and implemented a feed-forward neural network to predict Coronary Heart Disease (CHD) using CDC’s NHANES dataset with over 37,000 patient records and 35 key features.
  • Integrated an adversarial debiasing framework to mitigate gender bias by penalizing correlations between predictions and gender.
  • Improved balanced accuracy from 0.71 to 0.77 and reduced the equal opportunity difference by 95.6%.
  • Contributed a reusable machine learning framework for bias mitigation applicable to broader healthcare AI models.

E-commerce Recommender System

December 2023
  • Developed a machine learning model for an e-commerce recommender system using scikit-learn, H2O AutoML, and TensorFlow.
  • Performed exploratory data analysis, feature engineering, model training and validation, and hyperparameter tuning to obtain a model with an accuracy, F1-score, and ROC-AUC of 0.71, 0.72, and 0.79, respectively.
  • Deployed the finalized collaborative filtering TensorFlow model using Flask and HTML to emulate a real-world recommender system.

Real-Time Weather Data Dashboard

April 2023
  • Constructed dynamic dashboards visualizing real-time weather updates using Tableau, MongoDB, and weather APIs.
  • Built an ETL batch processing pipeline with Apache Airflow that ingested, transformed, and loaded temperature, humidity, and other attributes while performing various data quality checks.

Popular YouTube Video Title Generator

March 2023
  • Used YouTube APIs to analyze the 50 most recent top trending videos and predict possible new trending video titles.
  • Established an ETL pipeline to populate an SQLite database after processing words using Pandas and NLTK.
  • Generated possible trending video titles using sentence linguistics and unigram and N-gram NLP models.

Recipe Ratings Analysis and Predictor

February 2023
  • Analyzed 234,429 recipe reviews from Food.com to discover underlying associations and trends in ratings.
  • Identified statistically significant results through exploratory data analysis and hypothesis testing.
  • Developed a decision tree model tuned with cross-validated grid search after comparing models in scikit-learn pipelines.

Best Companies to Work For

July 2021
  • Showcased the top 500 best companies to work for in 2021, according to Forbes.com, using Python and SQL.
  • Extracted and transformed the data using web scraping and data processing, then stored it in an SQLite3 database.
  • Visualized the rankings using embedded SQL queries and Matplotlib.

A Programmer's Pay Analysis

June 2021
  • Performed EDA in Python to investigate programmers' salaries and their contributing factors.
  • Used Pandas, NumPy, and Matplotlib to clean, query, and visualize 64,461 data entries.
  • Found positive associations between pay and hobbyist coding, programming language, education level, and experience.

Whether you have a role you think I would be a great fit for or simply want to chat, feel free to reach out! I always love meeting new people and learning new things, so please connect with me on LinkedIn or contact me by email. Thanks!

Designed by BootstrapMade