Machine Learning Engineer at Noble.AI
Jan’18 – May’18
Built an offline ML pipeline, as part of UIE, that uses transfer learning to extract and structure information from semi-structured .docx documents.
Preprocessed the documents and created client-facing visualizations of the unstructured documents.
Created data visualizations for an R&D experiment dataset that exposed issues such as variance in the dependent variable.
Built the first working MVP of the Intelligent Recommendation Engine.
Tools Used: Python, sklearn, matplotlib, luminoth, TensorBoard, Django
Real Time Audio Event Detection on Edge (RA - Prof Yuvraj Agarwal, Synergy Labs)
Jan’18 – May’18
Built the entire ML and data pipeline from scratch, including feature extraction, feature engineering, and hyperparameter tuning.
Ran multiple experiments using classical ML algorithms such as logistic regression and SVMs to automatically detect audio events such as a vacuum cleaner, a drill machine, and a running faucet.
Built a parallel pipeline that runs multiple hyperparameter-tuning experiments for each label.
Performed data analysis to debug model performance using dimensionality reduction algorithms such as PCA.
Tools Used – Python, librosa, sklearn, Jupyter.
Speech Recognition using Wall Street Journal Data (Professor Bhiksha Raj)
Jan’18 – May’18
Used the labeled WSJ dataset at the frame and phoneme levels to recognize unlabeled speech signals.
Built and trained a 3-layer neural network on frame-level data, reaching 56% accuracy across 136 labels.
Built and trained a 4-layer CNN model on phoneme-level data, reaching 80% accuracy across 46 labels.
Preprocessed data to handle issues such as variable-length phoneme representations for CNN inputs.
Built an end-to-end ASR system using the Listen-Attend-Spell architecture with the CMUSphinx language model.
Tools Used – TensorFlow, PyTorch, Python
Audio Forensic for Maritime Recognition (Carnegie Mellon University – Prof. Rita Singh and Prof. Bhiksha Raj)
Aug’17 – Dec’17
Built a system to automatically identify maritime audio signatures such as boat and helicopter sounds, with applications in hoax call identification and criminal investigations.
Collected audio recordings from the YouTube-8M dataset using automated scripts that parsed video descriptions.
Used feature representations such as Constant-Q, correlograms, and modulation spectrograms. Also used a pretrained CNN model to extract proxy features from its fully connected layer.
Achieved 73% accuracy using decision trees and 77% using AdaBoost. Also proposed a full end-to-end architecture that could support more detailed analysis of sounds, such as the make or type of a helicopter or boat engine.
Tools Used – Python, Sklearn, Spark, MATLAB.
Data Science Intern, Walmart Labs
June’17 – Aug’17
Worked on the Walmart Performance Ads team to optimize the model Walmart uses to display relevant ads.
Predicted the click-through rate (CTR) of ads from contextual information, resulting in increased revenue.
Performed feature engineering (e.g., binning and polynomial and logarithmic feature transformations), identified new features, and ran experiments to tune hyperparameters.
Tools Used – Python, Spark (MLlib), Scala, Hive, Cassandra, Weka
Movie Recommendation System using MovieLens Dataset (Carnegie Mellon University)
May’17 - June’17
Used matrix factorization to recommend movies to users, following the Netflix Prize winner’s strategy, on the MovieLens dataset with 1 million ratings as the training set.
Implemented alternating least squares (ALS) optimization to minimize the RMSE objective function.
Performed experimental analysis to tune hyperparameters such as K and lambda.
Tools Used: Spyder, Python (NumPy, matplotlib, SciPy)
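The ALS update named above can be illustrated with a minimal sketch. It is not the project's code: the helper names `als_rank1` and `rmse` are hypothetical, and the number of latent factors K is fixed to 1 so that each regularized least-squares update reduces to a closed-form scalar (a full implementation would solve a K x K system per user and per item).

```python
def als_rank1(ratings, n_users, n_items, lam=0.1, n_iters=20):
    """Rank-1 ALS on observed (user, item, rating) triples.

    Hypothetical helper for illustration. Minimizes
    sum (r - u[user] * v[item])^2 + lam * (sum u^2 + sum v^2)
    by alternating closed-form updates of u and v.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    # Index the observed ratings by user and by item.
    by_user, by_item = {}, {}
    for user, item, r in ratings:
        by_user.setdefault(user, []).append((item, r))
        by_item.setdefault(item, []).append((user, r))
    for _ in range(n_iters):
        # Fix item factors; solve the ridge problem for each user factor.
        for user, obs in by_user.items():
            num = sum(r * v[item] for item, r in obs)
            den = lam + sum(v[item] ** 2 for item, _ in obs)
            u[user] = num / den
        # Fix user factors; solve the ridge problem for each item factor.
        for item, obs in by_item.items():
            num = sum(r * u[user] for user, r in obs)
            den = lam + sum(u[user] ** 2 for user, _ in obs)
            v[item] = num / den
    return u, v


def rmse(ratings, u, v):
    """Root-mean-square error of the rank-1 predictions on observed ratings."""
    se = sum((r - u[user] * v[item]) ** 2 for user, item, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Calling `als_rank1(ratings, n_users, n_items, lam=0.01)` on a small list of triples and comparing `rmse` before and after training is one simple way to check the effect of lambda.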
Home Depot Product Search Relevance (Carnegie Mellon University)
May’17 - June’17
Engineered features such as cosine similarity and edit distance, using NLP techniques such as word embeddings, on an unstructured dataset of product descriptions and attributes.
Preprocessed the data with stop-word removal, stemming, and typo correction before feature engineering.
Used machine learning algorithms such as random forest regression and linear regression to score each search query.
Tools Used: Python (NumPy, matplotlib), Big Data/Distributed Systems (Spark, PySpark), MongoDB
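The two similarity features mentioned above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the cosine similarity here is computed over plain bag-of-words counts rather than the word embeddings used in the project.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(query, text):
    """Cosine similarity between bag-of-words count vectors (an assumption;
    swap in embedding vectors to match the embedding-based variant)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = (math.sqrt(sum(c * c for c in q.values()))
            * math.sqrt(sum(c * c for c in t.values())))
    return dot / norm if norm else 0.0
```

Each (search query, product description) pair would yield one value per function, and those values become columns in the feature matrix fed to the regressors.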
Super Fridge: Automated Grocery List using Object Detection in Refrigerator (CMU)
Mar’17 - Apr’17
Built a Raspberry Pi application that uses the camera module to detect objects in a refrigerator and create a grocery list of missing items.
Built modules that call the Clarifai API to detect objects in pictures taken with the Pi camera and push the grocery list to Google Drive for users.
Tools Used: Python, Raspberry Pi & Camera, Clarifai API (Object Detection), Google Drive API
Musicon: Music playing based on User Activity Recognition: SteelHacks’17
24hr – Hackathon (Feb’17)
Built an Android app that used Google’s accelerometer (motion sensor) data to determine user activity (brisk walk, jogging, sprint, standing, etc.).
Integrated the user activity recognition module with the Spotify API to play songs matched to the user’s activity and switch between them as the activity changed.
Tools Used: Android JDK, Java, Google accelerometer (motion sensor) API, Spotify API
Project Intern at Talencea Inc, Pittsburgh
Oct’16 - May’17
Worked with a Pittsburgh-based startup founded by LTI Director Dr. Jaime Carbonell.
The first phase of the project involved working with big data from external sources such as client and social media platforms and building a skill repository.
The second phase involved building a cognitive model that matches candidates with appropriate job openings.
Data munging activities included data cleanup, indexing, classification, and redundancy removal.
Technologies Used – Python, MS Excel, VBA, Informatica Siperian, PL/SQL
Image classification to classify proteins into subcellular localization patterns (Carnegie Mellon University)
Aug’16 - Dec’16
Built an active learning framework with a pool-based data access model, an uncertainty-based querying strategy, and several base learners: SVM, Gaussian NB, KNN, and logistic regression.
Used the SelectKBest algorithm for feature selection.
Achieved an accuracy of 0.97 on test data using SVM as the base learner.
Tools Used: Spyder, Python (sklearn, NumPy, matplotlib, SciPy)
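The uncertainty-based querying step of a pool-based active learner can be sketched as below. The source does not state which uncertainty measure was used; this sketch assumes the common least-confidence measure, and the function name `least_confident_query` is hypothetical.

```python
def least_confident_query(pool_probs, batch_size=1):
    """Pick the pool samples the current model is least sure about.

    pool_probs: one list of class probabilities per unlabeled pool sample,
    e.g. the base learner's predicted probabilities.
    Returns the indices of the `batch_size` samples whose top-class
    probability is lowest (least-confidence uncertainty, an assumption).
    """
    uncertainty = [1.0 - max(probs) for probs in pool_probs]
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: uncertainty[i],
                    reverse=True)
    return ranked[:batch_size]
```

In a full loop, the returned indices would be sent for labeling, moved from the pool to the training set, and the base learner retrained before the next query.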
Stock Price Prediction using Probabilistic Graphical Model (Carnegie Mellon University)
Aug’16 - Nov’16
Transformed the stock prices of 6 companies (Apple, MS, Hecla, NEM Mining, GM, Ford) over the previous 5 days into log space.
Created a precision matrix from the transformed features and marginalized it to handle missing data.
Predicted Apple’s stock prices with a minimal error rate using only 3 days’ worth of data and the stock prices of MS, Hecla, and NEM.
Tools Used: Spyder, Python (NumPy, SciPy)
Linear and Forward Stagewise Regression on unknown Dataset (Carnegie Mellon University)
Aug’16 - Nov’16
Tools Used: Spyder, Python (NumPy, SciPy)
Paper Presentation “A Cloud Framework for Parameter Sweeping Data Mining Application”
Jan ’13 – Feb ’13
Explained the system framework, i.e., its architecture and the execution mechanism by which parameter sweeping can be achieved in data mining applications.
Concluded by presenting a performance evaluation with respect to clustering and classification algorithms.