Project List

First Hackathon Submission - LOCAL POACH

Idea: a Chrome extension that recommends local restaurants and food establishments over market-dominating chains.

This application scraped the user’s search terms, leveraged Google’s language-detection API to determine whether those terms were food related, and queried the Yelp Fusion API for restaurants. One major caveat with this project was the absence of location-related data.
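The Yelp lookup step can be sketched like this. The Fusion search endpoint is real, but the parameter choices are illustrative, and the food-relatedness check is assumed to have already happened:

```python
import json
import urllib.parse
import urllib.request

YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"

def build_yelp_query(term: str, location: str) -> dict:
    """Turn a food-related search term into Yelp Fusion query parameters.
    The category/limit choices here are illustrative, not the original values."""
    return {"term": term, "location": location, "categories": "restaurants", "limit": 5}

def search_local_restaurants(term: str, location: str, api_key: str) -> list:
    """Query Yelp Fusion and return the names of matching restaurants."""
    url = YELP_SEARCH_URL + "?" + urllib.parse.urlencode(build_yelp_query(term, location))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    return [business["name"] for business in payload["businesses"]]
```

Note the `location` parameter: without real location data from the user, this is exactly where the caveat above bites.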

https://devpost.com/software/localpoach

https://github.com/a8-s/Localpoach/tree/main/LocalProach

Second Hackathon Submission - ECO SWAP

Idea: a Chrome extension where, at the click of a button, you get recommended artisan, handmade versions of the item you’re shopping for on Amazon (or whichever e-commerce site).

This application scraped the product image from the page the user was shopping on, then parsed that image with the Google Vision API to map it to usable search terms. Those search terms were then used to query Etsy’s Developer API for recommendations from online artisans, to which the user could redirect.
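The two mapping steps can be sketched as follows. The Vision-style label shape is real (`description` plus `score`), but the Etsy endpoint and header names are written from memory of Etsy’s v3 Open API, so treat them as illustrative:

```python
import urllib.parse

ETSY_LISTINGS_URL = "https://openapi.etsy.com/v3/application/listings/active"

def labels_to_search_term(labels: list) -> str:
    """Collapse Vision-style label annotations into one Etsy search term,
    keeping only the highest-confidence labels."""
    top = sorted(labels, key=lambda lab: lab["score"], reverse=True)[:3]
    return " ".join(lab["description"] for lab in top)

def build_etsy_request(search_term: str, api_key: str):
    """Build the URL and headers for an Etsy active-listings search
    (endpoint and header names are assumptions, not verified)."""
    query = urllib.parse.urlencode({"keywords": search_term, "limit": 5})
    return ETSY_LISTINGS_URL + "?" + query, {"x-api-key": api_key}
```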

https://devpost.com/software/ecoswap

https://github.com/a8-s/Ecoswap

Internship Project 1 - DIRECT BAN

Idea: when I first joined the fraud-services team as an intern, we only supported bans through our automated pipeline, which was cumbersome whenever we found specific violating identifiers or users.

This feature was an additional endpoint on an HTTP service, “directBan/”, which decoded a minimal ban request and, through some business logic, persisted it to the right keyspace to place a ban. This taught me how to work with HTTP services and leverage libraries that interface with various data stores.
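The decode-and-route logic might have looked something like this; the identifier types and keyspace names are entirely hypothetical:

```python
import json

# Hypothetical mapping from identifier type to the keyspace holding its bans.
KEYSPACE_BY_TYPE = {
    "account_id": "account_bans",
    "device_id": "device_bans",
    "ip": "ip_bans",
}

def handle_direct_ban(raw_body: bytes):
    """Decode a minimal ban request and pick the keyspace to persist it to.
    Returns (keyspace, record) for the data-store layer to write."""
    req = json.loads(raw_body)
    id_type, value = req["type"], req["value"]
    if id_type not in KEYSPACE_BY_TYPE:
        raise ValueError(f"unsupported identifier type: {id_type}")
    record = {"identifier": value, "reason": req.get("reason", "manual")}
    return KEYSPACE_BY_TYPE[id_type], record
```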

Internship Project 2 - Event Processing w/ Flink

Idea: internally, Disney Streaming has a notion of client-side events, which were persisted to a Kinesis stream. One of our goals was to reduce false positives in ban decisions, and my immediate thought was that certain client-side interactions can indicate legitimate users. In other words, fraudulent users likely won’t be watching the latest season of The Mandalorian.

This project grew complex quickly because different events mean different things, and even more so in conjunction with one another. That is why I decided Apache Flink would be most appropriate. I laid the groundwork for event aggregations and got the project off the ground before the end of my internship.
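The spirit of the aggregation, stripped of Flink entirely: a toy windowed count of “legit-looking” events per account. The event types and window length are made up for illustration; the real groundwork used Flink’s keyed, windowed operators over the Kinesis stream:

```python
from collections import defaultdict

# Hypothetical event types that suggest a legitimate user.
LEGIT_SIGNALS = {"playback_start", "search", "profile_switch"}

def aggregate_legit_activity(events: list, window_seconds: int = 3600) -> dict:
    """Count legit-looking client-side events per account within the most
    recent window. Each event is a dict with 'account', 'type', and 'ts'."""
    if not events:
        return {}
    window_start = max(e["ts"] for e in events) - window_seconds
    counts = defaultdict(int)
    for e in events:
        if e["type"] in LEGIT_SIGNALS and e["ts"] >= window_start:
            counts[e["account"]] += 1
    return dict(counts)
```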

Class Software Project - ROCKET RECIPES

Idea: a CRUD app for persisting recipes

This application interfaced with a Firebase database to persist recipe-related data. I was the project manager for the team: working with stakeholders (the class TAs), assigning work, creating culture, and much more. While this seemed like a dead-easy project at first, it actually became very interesting: not only did we have to introduce user-specific data, but recipe data has a particular uniform shape. In retrospect, having well-defined models representing the recipe data would have made the process easier.
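The “well-defined models” takeaway might look like this in dataclass form; a sketch only, since the actual app stored free-form documents in Firebase:

```python
from dataclasses import dataclass, field

@dataclass
class Ingredient:
    name: str
    quantity: float
    unit: str

@dataclass
class Recipe:
    """The uniform shape recipe data tends to have, plus the
    user-specific owner field that complicated the project."""
    title: str
    owner_id: str
    servings: int
    ingredients: list = field(default_factory=list)
    steps: list = field(default_factory=list)
```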

Deployed site (until 2026) - https://www.rocket-recipes.com

https://github.com/cse112-sp22-group3/rocketrecipes

Work Contribution - Docusaurus Documentation

Idea: our documentation on API Fraud Services wasn’t the greatest, and for the longest time I’d been pondering what to do about it without introducing too much overhead.

I decided Docusaurus would be the best tool for us: it let me build a static-website layer over our existing Markdown documentation. This let us keep writing documentation the way we always had, while making it far more presentable.

Work Contribution - Automated Thresholds

Idea: we have something called thresholds, which, in short, narrow down which aggregates we need to examine more closely for fraud. The threshold values were ML-derived, but I noticed that every time we updated a threshold, we’d need to re-sync with the ML team. My thought was: why not automate this?

This was a very long-standing project that took a ton of stakeholder input, but eventually it was implemented. I set up a separate pipeline: the ML team persists live threshold updates to S3, a Lambda function parses each update and writes it to a separate S3 bucket, and our Apache Flink application picks up the live thresholds from there.
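The Lambda step could be sketched like this, assuming an S3-event trigger; the destination bucket name and the update schema are placeholders, not the production ones:

```python
import json

def parse_threshold_update(raw: bytes) -> dict:
    """Validate an ML-produced threshold update (hypothetical schema:
    {"thresholds": {name: value, ...}}) and normalize values to floats."""
    update = json.loads(raw)
    return {name: float(value) for name, value in update["thresholds"].items()}

def handler(event, context):
    """S3-triggered Lambda: read the ML team's update object and re-publish
    the parsed thresholds to the bucket the stream job watches."""
    import boto3  # provided by the Lambda runtime
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        thresholds = parse_threshold_update(raw)
        s3.put_object(
            Bucket="fraud-thresholds-live",  # placeholder destination bucket
            Key=key,
            Body=json.dumps(thresholds).encode(),
        )
```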

The below is a fair visual of how this works with API request aggregates. Now imagine the threshold was derived from live traffic instead of being a static value. That’s the idea of automated thresholds.

Work Contribution - Associated Identifiers

Idea: I was observing various devices that had an excessive number of accounts, which gave me the idea of leveraging associated identifiers as a datapoint in ban decision-making, and eventually laying the groundwork for tracking account sharing.

This feature went through repeated iterations. At first I thought it could be a direct signal we’d immediately act on after aggregating enough context; in reality, it turned into simply one factor in a future ML decision model, which taught me the valuable lesson of pivoting a project.

> df: pyspark.sql.dataframe…
>   ip: 1.2.3.4
>   accounts: 378
>   devices: 20
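The aggregation behind a row like the one above can be sketched in plain Python; the production version was a Spark job, and the field names here are illustrative:

```python
from collections import defaultdict

def associated_identifier_counts(rows: list) -> dict:
    """Per-IP distinct account and device counts, the core of the
    associated-identifiers signal. Rows are dicts with
    'ip', 'account_id', and 'device_id' (illustrative field names)."""
    accounts = defaultdict(set)
    devices = defaultdict(set)
    for row in rows:
        accounts[row["ip"]].add(row["account_id"])
        devices[row["ip"]].add(row["device_id"])
    return {
        ip: {"accounts": len(accounts[ip]), "devices": len(devices[ip])}
        for ip in accounts
    }
```

In Spark terms this is just a `groupBy("ip")` with two distinct counts; an IP with hundreds of distinct accounts, like the one above, is the anomaly the signal looks for.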

Work Contribution - IPv6 Subnet Problem

Idea: a frequently overlooked feature of IPv6 is that a client can, at any time, change its interface identifier and pretend to be a completely different host. This, in my opinion, is the most difficult problem we face in fraud today, because it’s commonly abused by bad actors.

At first I implemented this by aggregating IPv6 addresses with the interface identifier masked off, then handling the result like any other identifier. This was obviously a naive solution: we observed fraudulent users circumventing our bans by repeatedly regenerating their interface identifiers. I then devised a solution that not only directly bans specific IPv6 addresses, but also keeps an aggregate count of how many times each subnet has been banned. Past a certain count, we ban the entire subnet.
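The escalation logic can be sketched with Python’s ipaddress module; the /64 prefix length and the ban-count cutoff are illustrative, not the production values:

```python
import ipaddress
from collections import Counter

SUBNET_PREFIX = 64        # drop the interface identifier (low 64 bits)
SUBNET_BAN_THRESHOLD = 5  # illustrative cutoff, not the real one

class SubnetBanTracker:
    """Ban individual IPv6 addresses, and escalate to banning the whole
    subnet once enough distinct bans accrue within it."""

    def __init__(self):
        self.banned_addresses = set()
        self.bans_per_subnet = Counter()
        self.banned_subnets = set()

    def ban(self, addr: str) -> None:
        subnet = ipaddress.ip_network(f"{addr}/{SUBNET_PREFIX}", strict=False)
        self.banned_addresses.add(ipaddress.ip_address(addr))
        self.bans_per_subnet[subnet] += 1
        if self.bans_per_subnet[subnet] >= SUBNET_BAN_THRESHOLD:
            self.banned_subnets.add(subnet)

    def is_banned(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return ip in self.banned_addresses or any(
            ip in subnet for subnet in self.banned_subnets
        )
```

A fraudster who regenerates interface identifiers just accelerates their own subnet toward the threshold, which is exactly the point.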