Hi, I'm Muskan Goyal

A self-driven, passionate programmer and quick starter with a curious mind who enjoys solving complex, challenging real-world problems.

About

I am a Computer Science graduate student at the University of Colorado Boulder. I consider myself a lifelong learner, I believe in hard work, and I always strive to give 100% to the work I do. Previously, I worked as a research assistant at IIIT-Delhi, studying model explainability and privacy leaks in machine learning and deep learning models. During my undergraduate program, I presented papers at conferences and collaborated internationally on multiple projects that resulted in journal publications. I have hands-on experience implementing end-to-end ML pipelines and solid programming expertise in Java and Python.

Technology has always fascinated me with its ability to impact lives in every possible way. For me, it is the means through which I can bring my ideas to reality with a few lines of code. I have worked with various tools and frameworks such as NumPy, Pandas, Matplotlib, scikit-learn, spaCy, NLTK, TensorFlow, Keras, PyTorch, Hugging Face Transformers & Datasets, Excel, Tableau, Google Data Studio, Git, Colab, PostgreSQL, and MySQL.

  • Languages: Java, Python, SQL, HTML/CSS, JavaScript, C/C++
  • Databases: MySQL, PostgreSQL
  • Libraries: NumPy, Pandas, OpenCV, Matplotlib, scikit-learn, spaCy, NLTK, Hugging Face Transformers & Datasets
  • Frameworks: Vue.js, Keras, TensorFlow, PyTorch
  • Cloud Services: AWS (EC2, S3, RDS, Lambda)
  • Tools & Technologies: Git, AWS, Heroku, JIRA, Excel, Tableau, Google Data Studio

I am looking for a challenging position that combines my skills in Data Science, Machine Learning, and Software Engineering and offers professional development, interesting experiences, and personal growth.

Experience

Machine Learning Intern
  • Deployed the Google Dynamic World model and the Google Earth model on the Sentinel-2 dataset, improving classification of forest, land, and water compared to our existing model. Used Python libraries including geemap, rasterio, and Google Earth Engine to facilitate the implementation.
  • Partitioned the resulting Sentinel tiles into 256x256-pixel sections for download, then reconstructed the downloaded parts into a single tile for effective visualization in QGIS.
  • Developed a comprehensive, modular function in TensorFlow that accepts a Sentinel-1 ID as input and returns the corresponding Dynamic World TIF file with pixel-level alignment to the Sentinel-2 tile, reducing manual intervention and streamlining the workflow by up to 80%.
  • Currently building GANs in PyTorch for SAR-to-optical image transformation and SAR image denoising.
  • Tools: Python, PyTorch
July 2023 – Present | NY, USA
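The split-and-stitch idea behind the 256x256 tiling can be sketched in NumPy. This is a minimal illustration, not the production pipeline: it assumes the tile is already loaded as an (H, W, C) array, ignores georeferencing (which rasterio or geemap would handle), and uses hypothetical function names.

```python
import numpy as np


def split_into_chunks(image: np.ndarray, size: int = 256) -> dict:
    """Split an (H, W, C) array into size x size chunks keyed by their (row, col) offset.

    Chunks on the bottom and right edges are smaller when H or W is not a multiple of size.
    """
    height, width = image.shape[:2]
    chunks = {}
    for row in range(0, height, size):
        for col in range(0, width, size):
            chunks[(row, col)] = image[row:row + size, col:col + size]
    return chunks


def reconstruct(chunks: dict, shape: tuple) -> np.ndarray:
    """Paste every chunk back at its offset to rebuild a single array of the given shape."""
    dtype = next(iter(chunks.values())).dtype
    tile = np.zeros(shape, dtype=dtype)
    for (row, col), chunk in chunks.items():
        tile[row:row + chunk.shape[0], col:col + chunk.shape[1]] = chunk
    return tile


# Example: a 600 x 700 image yields a 3 x 3 grid of chunks that round-trips exactly.
image = np.arange(600 * 700 * 3).reshape(600, 700, 3)
chunks = split_into_chunks(image)
assert np.array_equal(reconstruct(chunks, image.shape), image)
```

Keying each chunk by its pixel offset means the pieces can be downloaded in any order, or retried individually, and still land in the right place.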
Research Assistant
  • Collaborated with a five-member team on a detailed analysis of privacy attacks on machine learning and deep learning models (model extraction, model inversion, membership inference, and property inference).
  • Reviewed literature on various defensive measures to protect the privacy and confidentiality of models against different attacks.
  • Reproduced black-box model attacks on ML systems in PyTorch using techniques such as ActiveThief and Knockoff Nets.
  • Proposed a shadow-model technique for black-box model explanation, which helps explore and understand the behavior of any black-box model in different feature spaces, implemented in PyTorch.
  • Tools: Python, PyTorch
Sept 2021 – July 2022 | Delhi, India
Research Engineer Intern
  • Collected and preprocessed chest X-ray data for the training and evaluation pipeline, applying resizing, normalization, and an image-hashing algorithm to remove duplicates.
  • Constructed a CNN architecture by adding custom layers to the VGG16 model and fine-tuned it on the new Covid dataset.
  • Designed and implemented CovidGAN, an Auxiliary Classifier Generative Adversarial Network (AC-GAN) based model that generates synthetic chest X-ray images, using Keras and TensorFlow.
  • Result: Visualized the results with Principal Component Analysis (PCA) and a confusion matrix. Adding the synthetic images produced by CovidGAN increased the CNN's Covid-19 detection accuracy from 85% (F1-score 0.83) to 95% (F1-score 0.95).
  • Paper published in IEEE Access.
  • Tools: Python, Keras, TensorFlow
Mar 2020 – May 2020 | Remote
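Duplicate removal with an image hash can be sketched as follows. This is an illustrative average-hash implementation in NumPy, not necessarily the algorithm used in the project; it assumes grayscale images as 2-D arrays and, for brevity, downsamples by nearest-neighbour sampling rather than area averaging. The function names are hypothetical.

```python
import numpy as np


def average_hash(gray: np.ndarray, size: int = 8) -> int:
    """Average hash: shrink to size x size, mark pixels brighter than the mean, pack as bits."""
    height, width = gray.shape
    rows = np.arange(size) * height // size  # nearest-neighbour sample positions
    cols = np.arange(size) * width // size
    small = gray[np.ix_(rows, cols)].astype(float)
    bits = (small > small.mean()).flatten()
    return int("".join("1" if bit else "0" for bit in bits), 2)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(images: list, max_distance: int = 0) -> list:
    """Keep an image only if its hash is farther than max_distance from every kept hash."""
    kept, kept_hashes = [], []
    for image in images:
        h = average_hash(image)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(image)
            kept_hashes.append(h)
    return kept
```

With `max_distance=0` only identical hashes are dropped; raising it by a few bits can also catch near-duplicates such as re-encoded copies of the same scan.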
Research Engineer Intern
  • Trained and evaluated 4 established CNN architectures for corn leaf disease classification: VGGNet, XceptionNet, EfficientNet, NASNet.
  • Performed model compression and developed an optimized DenseNet model for corn leaf disease identification with Keras and TensorFlow.
  • Showed that its performance was close to that of the established CNN architectures with significantly fewer parameters and less computation time.
  • Used grid search to find optimal hyperparameter values and analyzed the models’ performance through rigorous simulations.
  • Result: The proposed DenseNet was computationally cost-effective, achieving 98.06% accuracy with 0.07 million parameters and taking 3 minutes per epoch.
  • Paper published in Computers and Electronics in Agriculture, Elsevier.
  • Tools: Python, Keras, TensorFlow
Sept 2019 – Dec 2019 | Remote

Education

University of Colorado Boulder

Colorado, USA

Degree: Master of Science in Computer Science
CGPA: 3.883/4.00

    Relevant Coursework:

    • Design and Analysis of Algorithms
    • Natural Language Processing
    • Machine Learning
    • Foundations of Software Engineering
    • Big Data Architecture

Maharaja Agrasen Institute of Technology

Delhi, India

Degree: Bachelor of Technology in Computer Science and Engineering
CGPA: 8.31/10.00

    Relevant Coursework:

    • Data Structures
    • Object Oriented Programming
    • Advanced Database Management Systems
    • Operating Systems
    • Machine Learning
    • Applied Mathematics
    • Artificial Intelligence
    • Theory of Computation

Publications

Domain-Controlled Title Generation with Human Evaluation

In Proceedings of the International Conference on Innovative Computing and Communications (pp. 461–474). Springer, Singapore.

BloomNet: A Robust Transformer based model for Bloom’s Learning Outcome Classification

In Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021).

Projects

Excel Clone

A web application clone of MS-Excel

Accomplishments
  • Tools: HTML, CSS, JavaScript
  • Fully responsive design.
  • Features include text formatting, an address bar, formula evaluation, and multi-sheet handling.
  • Uses two-way binding for cell properties.
  • Uses a cycle-detection algorithm with color tracking to validate formulas against circular references.
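Formula cycle validation can be illustrated with the standard three-color depth-first search, where a "gray" cell is one still on the current DFS path. This Python sketch is an assumption about how the color tracking works (the app itself is written in JavaScript); it models the sheet as a hypothetical dict mapping each cell to the cells its formula reads from.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def has_cycle(graph: dict) -> bool:
    """Return True if any formula dependency chain loops back on itself.

    graph maps a cell id to the list of cells its formula reads from.
    """
    color = {cell: WHITE for cell in graph}

    def visit(cell: str) -> bool:
        color[cell] = GRAY
        for dep in graph.get(cell, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # dep is already on the current path: cycle found
                return True
            if state == WHITE and visit(dep):
                return True
        color[cell] = BLACK  # every path from this cell has been checked
        return False

    return any(color[cell] == WHITE and visit(cell) for cell in list(graph))


# Example: A1 = B1 + 1, B1 = C1 * 2, C1 = A1 forms a loop.
print(has_cycle({"A1": ["B1"], "B1": ["C1"], "C1": ["A1"]}))
```

In a spreadsheet, one way to use this is to add a newly entered formula's dependencies tentatively and reject the edit if `has_cycle` reports a loop.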
White Board

Open Board web application

Accomplishments
  • Tools: HTML, CSS, JavaScript, Canvas API, WebSockets
  • Supports real-time 2D drawing using the Canvas API and WebSockets.
  • Features include drawing, erasing, undo/redo, sticky notes, and downloading.
  • Implemented undo/redo using arrays as stacks that store the positions, color, and width of the pen and eraser strokes.
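The two-stack undo/redo approach can be sketched in Python (the app itself is JavaScript). Each stroke stores its points, color, width, and tool; the `Stroke` and `History` names are hypothetical. Undo moves the latest stroke onto a redo stack and the canvas is redrawn from what remains, while drawing a new stroke clears the redo stack.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stroke:
    """One pen or eraser stroke: the data needed to redraw it on the canvas."""
    points: Tuple[Tuple[float, float], ...]
    color: str
    width: float
    tool: str = "pen"  # "pen" or "eraser"


class History:
    """Undo/redo with two list-based stacks."""

    def __init__(self) -> None:
        self.undo_stack: list = []  # strokes currently on the board, oldest first
        self.redo_stack: list = []  # undone strokes, most recently undone last

    def record(self, stroke: Stroke) -> None:
        self.undo_stack.append(stroke)
        self.redo_stack.clear()  # a new stroke invalidates the redo history

    def undo(self) -> Optional[Stroke]:
        if not self.undo_stack:
            return None
        stroke = self.undo_stack.pop()
        self.redo_stack.append(stroke)
        return stroke

    def redo(self) -> Optional[Stroke]:
        if not self.redo_stack:
            return None
        stroke = self.redo_stack.pop()
        self.undo_stack.append(stroke)
        return stroke

    def visible(self) -> list:
        """Strokes to replay, in order, after clearing the canvas."""
        return list(self.undo_stack)
```

Replaying the whole `visible()` list after each undo keeps the design simple; storing canvas snapshots instead would trade memory for faster redraws.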
Camera Gallery

A simple Camera with Gallery application.

Accomplishments
  • Tools: HTML, CSS, JavaScript, Browser APIs
  • Developed a camera-and-gallery application that relies heavily on browser APIs, including MediaStream, MediaRecorder, MediaDevices, IndexedDB (client-side storage), and Canvas.
  • Features include image capture with filters, video recording, and a gallery backed by in-browser storage.
Code-mixed Visual Question Answering Dataset

Implemented code-mixed text generation to build a novel dataset.

Accomplishments
  • Created a novel dataset that helps multilingual speakers and medical practitioners.
  • The dataset consists of medical images paired with clinical questions in Hindi, English, or code-mixed (Hinglish: Hindi–English) language.
  • Used Stanza for POS tagging and the GIZA++ tool to create word alignments between English and Hindi questions.
  • Finally, calculated the Switch-Point Fraction (SPF) and Code-Mixing Index (CMI) for all code-mixed transliterations.
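Both code-mixing metrics can be sketched from per-token language tags. This assumes hypothetical tags "hi", "en", and "univ" (language-independent tokens such as names, numbers, or punctuation). The index follows the common utterance-level form 100 * (1 - max language count / (n - u)), where n is the number of tokens and u the number of language-independent ones; the switch-point value is taken here as language switches divided by word boundaries among language-tagged tokens, which is one of several possible conventions.

```python
from collections import Counter

UNIVERSAL = "univ"  # hypothetical tag for language-independent tokens (names, numbers, punctuation)


def code_mixing_index(tags: list) -> float:
    """Utterance-level CMI = 100 * (1 - max_lang_count / (n - u)); 0 if every token is universal."""
    n = len(tags)
    u = sum(1 for tag in tags if tag == UNIVERSAL)
    if n == u:
        return 0.0
    counts = Counter(tag for tag in tags if tag != UNIVERSAL)
    return 100 * (1 - max(counts.values()) / (n - u))


def switch_point_fraction(tags: list) -> float:
    """Language switches between consecutive language-tagged tokens, divided by the boundaries between them."""
    lang = [tag for tag in tags if tag != UNIVERSAL]
    if len(lang) < 2:
        return 0.0
    switches = sum(1 for prev, cur in zip(lang, lang[1:]) if prev != cur)
    return switches / (len(lang) - 1)


# Example: a Hinglish question tagged token by token.
tags = ["hi", "hi", "en", "hi"]
print(code_mixing_index(tags), switch_point_fraction(tags))
```

A monolingual question scores 0 on both metrics, and higher values indicate more evenly mixed or more frequently switching text.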
Sentiment Analyzer

Performed sentiment analysis on two datasets: Amazon reviews and IMDB.

Accomplishments
  • Conducted exploratory data analysis and built a sentiment analyzer for the Amazon reviews and IMDB datasets (64,000 reviews in total).
  • Analyzed stopwords, word frequencies, the distribution of rating scores, and the results.
  • Cleaned the data and compared several models (Naive Bayes, XGBoost, MLP, RNN, BERT). BERT achieved the highest accuracy: 89% on IMDB and 95% on the Amazon dataset.
Flipkart Phone Sales Analysis

Created a dashboard for Flipkart Phone Sales Analysis.

Accomplishments
  • Categorized the phone-model dataset by market segment: budget, midrange, and flagship.
  • Plotted various bar graphs and box plots in Colab to study the brand, rating, color, and selling price of various models.
  • Designed a 2-page dashboard in Google Data Studio to provide a high-level summary of Flipkart phone sales with the ability to drill down and reveal deeper information.
  • The dashboard also tracks phone specifications such as total memory and storage capacity, prices, and selling value per segment for each brand.
Foreign Direct Investment (FDI) Case Study

Created a Tableau dashboard for an FDI case study.

Accomplishments
  • Built a Tableau dashboard to visualize the behavior and variation of investment across service sectors, and examined FDI projections, trends, and growth over the years.
Budget-Based Recommendation System

Engineered a Single Page Application using Vue.js

Accomplishments
  • Engineered an SPA in Vue.js integrating 3+ services, such as S3, Weather API, and the Google Maps API.
  • Implemented 7+ features, including tailored, user-based activity suggestions, itinerary planning, and real-time trip metrics.
  • Deployed through a CI/CD pipeline with GitHub Actions.

Skills

Languages and Databases

Python
HTML5
CSS3
MySQL
PostgreSQL
Shell Scripting

Libraries

NumPy
Pandas
OpenCV
scikit-learn
matplotlib

Frameworks

Bootstrap
Keras
TensorFlow
PyTorch

Other

Git
AWS
Heroku

Contact