2018-12-07 · 4 min read
The internship was part-time (~15 hours per week), and I worked remotely with the NYC Credit Monitoring team within Capital One's Center for Machine Learning organization. Most of my coworkers were based in New York City.
The Credit Monitoring team is building a machine learning pipeline, called Malcolm, for use by data science and engineering teams within Capital One. Data scientists and engineers can submit a machine learning model to the pipeline and glean insights into it. Malcolm can help engineers find features that are strong candidates for further feature engineering, discover correlations between features and residuals, determine whether a model is over- or underestimating risk at the feature level, and much more.
Malcolm is a work in progress but is already used by two of the largest asset-generating programs within Capital One's Card organization (presumably for dogfooding). The Credit Monitoring team is working on making Malcolm production-ready so that teams beyond those two in Card can use it.
I worked on a web app for Malcolm, called malcolm-web.
The Malcolm pipeline deposits insights in AWS S3 buckets, mostly as HTML, CSV, PNG, and JSON files, and malcolm-web then displays these results for engineers. Before I started as an intern, one developer was working on the web app. That developer left Capital One and, apart from a README describing the folder structure and how to run the web app (the usual npm run build), there was little knowledge transfer.
The web app is built with React (front end) and Flask (back end). Flask is used to fetch assets from S3 to be served to the front end and as a proxy for authenticating with S3 (so the secret/password for S3 authentication is on the server and not in the user's browser). React is used to display assets from Flask.
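To make the proxy idea concrete, here is a minimal sketch of what such a Flask endpoint could look like. It is not the actual malcolm-web code: the bucket name, the route, and the create_app factory are all hypothetical, and I am assuming boto3 as the S3 client library. The point is only that the browser talks to Flask, and Flask talks to S3 using credentials that never leave the server.

```python
from flask import Flask, Response, abort

# Hypothetical bucket name, for illustration only.
BUCKET = "example-malcolm-results"


def create_app(s3_client=None):
    """Build a Flask app that serves S3 objects to the front end."""
    if s3_client is None:
        # Imported lazily so a stub client can be passed in for testing.
        import boto3

        # boto3 reads credentials from the server's environment,
        # so they are never sent to the user's browser.
        s3_client = boto3.client("s3")

    app = Flask(__name__)

    # Hypothetical route: the React app requests /api/assets/<key>.
    @app.route("/api/assets/<path:key>")
    def get_asset(key):
        try:
            obj = s3_client.get_object(Bucket=BUCKET, Key=key)
        except s3_client.exceptions.NoSuchKey:
            abort(404)
        # Pass the object's bytes and content type straight through.
        return Response(
            obj["Body"].read(),
            mimetype=obj.get("ContentType", "application/octet-stream"),
        )

    return app
```

Passing the client into the factory keeps the S3 dependency swappable, which makes an endpoint like this easy to exercise with Flask's test client and a fake S3 object.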
I added new components and routes to the front end and new API endpoints to the back end. Below is a detailed list of the additions and changes.
Front End Changes
Back End Changes
I also went through some of Andrew Ng's machine learning course on Coursera and built a machine learning model to predict charge-offs based on credit bureau data. It wasn't very accurate (AUROC = 0.55), but it was marginally better than flipping a coin. I used pandas to manipulate and preprocess the credit bureau data, then scikit-learn to train a random forest classifier.
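For readers who have not seen this workflow, the sketch below shows its general shape: preprocess with pandas, fit a random forest with scikit-learn, and score it with AUROC. It runs on synthetic data, not the credit bureau data, and every column name, the median imputation step, and the hyperparameters are assumptions chosen for illustration rather than what I actually used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for credit bureau data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "utilization": rng.uniform(0, 1, n),
    "num_delinquencies": rng.poisson(0.5, n),
    "months_on_file": rng.integers(6, 240, n).astype(float),
})

# Blank out some values to mimic missing bureau fields.
df.loc[rng.choice(n, 100, replace=False), "months_on_file"] = np.nan

# Label that is only weakly related to the features, as in a noisy real problem.
risk = 0.05 + 0.15 * df["utilization"] + 0.05 * df["num_delinquencies"]
df["charged_off"] = (rng.uniform(0, 1, n) < risk.clip(0, 1)).astype(int)

# Preprocessing with pandas: fill missing values with the column median.
df["months_on_file"] = df["months_on_file"].fillna(df["months_on_file"].median())

X = df.drop(columns="charged_off")
y = df["charged_off"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# AUROC is computed from predicted probabilities of the positive class.
scores = model.predict_proba(X_test)[:, 1]
auroc = roc_auc_score(y_test, scores)
print(f"AUROC = {auroc:.2f}")
```

An AUROC of 0.5 corresponds to random guessing, which is why a score of 0.55 is only slightly better than a coin flip.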
Since I was working remotely, almost all communications with the team were through Slack (messaging) and Zoom (video calling). Slack is the most complete chat application I've used. It makes GroupMe look primitive. But sometimes I was unsure whether a coworker had a positive, neutral, or negative reaction to something I said because some emojis in some contexts are ambiguous.
I think there should have been code reviews, both so that I could learn to write better code and so that other engineers would have an easier time maintaining and building on the code I wrote.
Coworkers were very friendly and I had a good experience.