Below is a list of the projects I completed during my undergraduate and graduate studies, and during internships in that period.
Data Science for Business | New York University
Feb 2023 – May 2023
Developed a system to analyze and predict the success of future restaurants in the New York City tri-state area based on attributes such as price, location, and category. Using data from the Yelp API, we aimed to create recommendations for existing restaurants to improve their success. Due to the unavailability of exact revenue figures, we used a proxy for success that took into account both restaurant ratings and turnover, in order to better capture the profitability of a given venture.
Responsible Data Science | New York University
Mar 2023 – May 2023
Conducted technical audits on the top two Automated Decision-Making System (ADS) submissions in the MRI and Alzheimer’s categories of the Kaggle competition, focusing on early Alzheimer’s diagnosis. The first ADS was trained on both the OASIS-1 and OASIS-2 datasets, while the second ADS was trained solely on the OASIS-2 dataset. The OASIS-1 and OASIS-2 datasets provide cross-sectional and longitudinal MRI data, respectively. Our analysis aimed to address biases and errors in these healthcare systems and improve their accuracy and efficiency.
Cloud and Machine Learning | New York University
Nov 2022 – Dec 2022
Investigated the deployment of a deep learning model using MLflow on Kubernetes clusters across three major cloud platforms: IBM Cloud, Google Cloud Platform (GCP), and Amazon Web Services (AWS). We analyzed the performance of the model in terms of training time, job execution time, test loss, and data loading time on each platform, using MLflow metrics and tracking. Our findings provided insights into factors that may influence the choice of a cloud platform for MLflow deployment.
DevOps | New York University
Sep 2022 – Dec 2022
Deployed a containerized RESTful Flask microservice on Kubernetes that performs CRUD operations on customer data for an e-commerce application. The deployment ran through a CI/CD pipeline using GitHub, with agile planning. The customer resource includes basic information such as first and last names, a unique identifier, and optionally a user id and password, as well as at least one address. A subordinate REST API stores one or more addresses per customer, and the customer API supports activating and deactivating customer accounts.
Data Analytics and Visualization for Healthcare | New York University
Mar 2022 – May 2022
Explored the correlation between socioeconomic factors (such as GDP, Human Development Index, hospital infrastructure, and stringency index) and Covid-19 deaths and infection fatality rates. Data from 177 countries were analyzed, with a focus on the three countries with the highest case numbers: USA, Brazil, and India. Machine learning algorithms were employed to model fatality rates and assess the credibility of the chosen factors in predicting Covid-19 fatalities. The study concluded with proposals for future research topics to better predict infection fatality rates and aid in pandemic preparedness.
Data Science Intern | Axians
Jul 2021 – Aug 2021
Working with Axians’ Location Intelligence Team, we developed algorithms that use satellite images to determine whether vegetation growth would interfere with power line construction. The work involved using machine learning to identify and monitor power lines, geographical features, obstructions, and other changes in the landscape. The project relied heavily on Random Forest based ensemble methods and segmentation for pixel-based classification.
Applied Econometrics Capstone | University of San Francisco
Feb 2021 – May 2021
Studied how the COVID-19 pandemic affected the flow of deposits in banks across the United States. The paper analyzes changes in total deposits at multiple US banks, using panel data from multiple economic and banking sources alongside public health measurements to estimate the overall effect. The motivation was to see how the economic lockdown affected the inflow and outflow of funds at banks: these flows can reveal debit and credit patterns, and the simple net change can say a lot about overall economic activity in a specific area in relation to its COVID-19 cases.
Econometrics of Financial Markets | University of San Francisco
Sep 2020 – Dec 2020
The paper presents a time series analysis of stock prices, taking into account the impact of COVID-19 and market trends, using stock data from companies that became very popular and have seen tremendous growth since the stock market downturn in early 2020. The aim is to see how a pandemic that has affected the whole world affects financial markets. I focus on the returns of three companies that had a lot of momentum and growth in the stock market in 2020, choosing one stock from each of three market sectors: Zoom represents the communications sector, Moderna the healthcare sector, and Salesforce the technology sector.
Machine Learning | University of San Francisco
Oct 2020 – Dec 2020
Final project for our Machine Learning class, in which we built a model to predict the locations of optimally controlled wildfires using previous wildfire locations. The work involved substantial data processing, including geohashing the target variable and label encoding, followed by random forest classification, XGBoost, and neural networks.
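As an illustration of the geohashing step, here is a minimal sketch of the standard geohash encoding, which turns a latitude/longitude pair into a short string that can be used as a classification target. This is not the project's code; the function name `geohash_encode` and the default precision are assumptions made for the example.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0        # bits collected for the current character
    bit_count = 0
    use_lon = True   # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0

    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))  # → u4pru
```

Shorter geohashes cover larger cells, so the precision controls how coarse the predicted location classes are.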
Senior Team Project | University of San Francisco
Sep 2020 – Dec 2020
We used React, D3, and ChartJS to build a Healthcare Dashboard website to visualize data from the MIMIC-III Clinical Database in a simple and interactive way. The Dashboard’s visualization capabilities provide easy access to clinical data for healthcare researchers, hospital administrators, and data enthusiasts who do not have a computer science background, for example, individuals seeking to obtain insights from correlations between different clinical variables or through other data-based analyses that can be visualized.
Aug 2020 – Nov 2020
Built iOS apps in Xcode using navigation controls such as table views, tab bars, custom cells, collection views, modal segues, and tap gestures. Also explored external libraries with CocoaPods, networking with authentication APIs, and parsing for the backend and UI through bar input and image.
Data Visualization | University of San Francisco
Econometrics | University of San Francisco
Our model, built in Stata, analyzes the impact of COVID-19 and market trends in the form of stock data. The goal of the project is to examine the explanatory power of those independent variables for publicly traded corporations through the adjusted closing price of the S&P 500. The index was selected because it represents a variety of different aspects of the market: we wanted a broad approach in order to see the overall effect and uncover the explanatory power of coronavirus data in relation to the S&P 500.
Introduction to Tableau | University of San Francisco
Apr 2020 – May 2020
Analyzed five years of accident data from across the country to find contributing factors, with the aim of helping address a problem in which over 1.25 million people die every year. Car accidents can cause a lot of mental, physical, and emotional trauma, and their after-effects create substantial costs for private insurance companies, health care, and the government; they cost insurance companies billions of dollars per year. We examined the factors in our data to identify the causes and determinants of accidents, and analyzed those factors for trends and patterns. Our audience is mainly those trying to address the issue in order to reduce costs for insurance companies.
Shell Command Line Interface
Operating Systems | University of San Francisco
Mar 2020 – Apr 2020
I implemented a C-based command-line shell that supports both direct user input and scripted commands. I implemented built-in commands such as cd, history, and jobs, along with support for pipes. The shell also supports the up and down arrow keys for navigating the history, as well as tab completion. It is an interactive shell and uses multiple structs and data structures to store and manipulate its data.
Computer Architecture | University of San Francisco
Oct 2019 – Dec 2019
A C-based program that executes ARM machine code by emulating the register state of an ARM CPU and the execution of ARM instructions. Notable features include a representation of the registers and the stack, the ability to emulate ARM data processing and storing operations, and a direct-mapped cache simulation for instructions to help visualize how a cache works. It also includes dynamic analysis of function execution, which reports statistics on the command line such as the instruction count, number of branches taken, and cache ratios. We later reimplemented similar logic using digital design to run the C code and the ARM code from the earlier emulator.
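To illustrate the direct-mapped cache idea, here is a minimal sketch in Python rather than the project's C. The class name `DirectMappedCache`, the number of lines, and the 4-byte block size (one ARM instruction per block) are assumptions for the example, not details of the original emulator.

```python
class DirectMappedCache:
    """Tracks hits and misses for a direct-mapped cache (tags only, no data)."""

    def __init__(self, num_lines: int = 8, block_size: int = 4):
        self.num_lines = num_lines    # hypothetical cache size in lines
        self.block_size = block_size  # bytes per block (assumed: one instruction)
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate fetching `address`; return True on a hit."""
        block = address // self.block_size
        index = block % self.num_lines  # each block maps to exactly one line
        tag = block // self.num_lines   # identifies which block occupies the line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag          # miss: replace whatever was in the line
        self.misses += 1
        return False

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = DirectMappedCache(num_lines=4)
for _ in range(2):                      # fetch the same four instructions twice
    for addr in (0, 4, 8, 12):
        cache.access(addr)
print(cache.hits, cache.misses, cache.hit_ratio())  # → 4 4 0.5
```

Because every block has exactly one possible line, two addresses that share an index (here 0 and 16 with four lines) evict each other on every access, which is the kind of behaviour such a simulation makes visible.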
Software Engineering | University of San Francisco
Feb 2019 – May 2019
A Java-based implementation (about 2,000 SLOC) of an in-memory search engine that processes user queries and outputs ranked results on a web page. Designed the back end using the inverted index data structure. Implemented a partial and exact search system. Used multithreading and work queues. Supported JSON output. Used the Bulma CSS library for the front end. There is also a web crawler that parses the text and strips HTML using regex. Finally, sockets are used to implement a Jetty-based server.
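To illustrate the inverted index with exact and partial search, here is a minimal sketch in Python rather than the project's Java. The class and method names are hypothetical, and two behaviours are assumptions for the example: partial search is treated as prefix matching, and results are ranked by total match count.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each word to the documents and positions where it occurs."""

    def __init__(self):
        # word -> document id -> list of 1-based word positions
        self.index = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id: str, text: str) -> None:
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words, start=1):
            self.index[word][doc_id].append(position)

    def exact_search(self, query: str):
        """Match indexed words that equal a query word."""
        terms = set(query.lower().split())
        return self._rank(w for w in terms if w in self.index)

    def partial_search(self, query: str):
        """Match indexed words that start with a query word (assumed meaning)."""
        terms = set(query.lower().split())
        return self._rank(
            w for w in self.index if any(w.startswith(t) for t in terms)
        )

    def _rank(self, words):
        counts = defaultdict(int)
        for word in words:
            for doc_id, positions in self.index[word].items():
                counts[doc_id] += len(positions)
        # Highest match count first; ties broken by document id.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


idx = InvertedIndex()
idx.add("a.txt", "The cat sat. The cat ran.")
idx.add("b.txt", "A category of cats")
print(idx.exact_search("cat"))    # → [('a.txt', 2)]
print(idx.partial_search("cat"))  # → [('a.txt', 2), ('b.txt', 2)]
```

Storing positions rather than bare counts keeps the structure close to what a search engine needs for output such as JSON listings of where each word occurs.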