Gunjan B.
97% Job Success
Top Rated Plus

NLP, Multimodal, and Generative AI Expert | Top 1 Percent on Upwork

I am a research engineer focused on building and surpassing the state of the art in AI. My focus is on natural language processing (NLP) and multimodal (vision and text) AI, with a particular emphasis on large generative models.

I have extensive experience working with a wide variety of clients, from new startups backed by Y Combinator and AllenAI to large Fortune 500 companies. I have also provided consulting services, including to various venture capital firms and to former Microsoft, AWS, and RTL Group executives hoping to build their own AI businesses. I am certified as a top 1% expert on Upwork and ranked within the top 10% by Triplebyte. I also give talks across the industry; for example, I recently spoke on the Weaviate podcast about the current multimodal AI landscape.

In terms of models, I have worked with the vast majority of NLP and multimodal models, as well as a significant number of the speech and vision models, available on Huggingface. I regularly build my own custom model architectures to best fit client needs, because most models focus on a single advancement, while the clients I work with usually benefit from combining the improvements of several different Huggingface models. I am also familiar with proprietary AI APIs, particularly OpenAI's. I usually recommend these to clients who either do not expect heavy usage of their AI products for a while or for whom low maintenance costs are a top priority.

Some of my favorite projects include:
- Building a video summarizer that takes a video or audio file as input and outputs a summary of its contents in either text or audio format. I used my own custom punctuation addition model, chunking and batching of speech inputs into 10-second increments to speed up transcription, TGlobal attention for the summarization system, and many other tricks.
- Building a ChatGPT-style question answering chatbot entirely with open-source models and tools
- Creating a medical bill reimbursement system that took a picture of a medical bill as input, extracted the names of any relevant medical treatments, compared them using semantic search to reimbursable medical procedures in an insurer’s database, and automatically paid out money for all matches
- Adding fine-tuning capabilities to MiniGPT-4 and other multimodal LLMs
- Domain adapting CLIP on e-commerce data, reaching a 57% relative improvement in zero-shot classification after just 1 hour of training
- Creating a custom training framework to train encoder-based language models using the replaced token detection (RTD) objective, including support for weight sharing, custom generator sizes, training on multilingual datasets, adapting models not originally trained with RTD so they can be trained with it, etc.
- Building a custom multimodal model architecture using linear attention
- Reducing the size of the intent classification and NER models of an automotive company’s Alexa-style virtual assistant by 95-99% while improving accuracy
- Creating the state-of-the-art ASL to English translation system back in 2021 (though I surmise this has since been dramatically improved upon)

Tech Stack:
Model Development and Optimization - Huggingface, PyTorch, ONNX, Datasets, Pandas, NumPy, SciPy, scikit-learn, Sentence Transformers, PEFT, Accelerate, Loralib, TensorRT, Fairseq, LAVIS, TIMM, OpenNMT, TensorFlow, JAX, and many others
Visualization - Matplotlib, t-SNE, UMAP
LLM Tools - OpenAI API (GPT-4, GPT-3.5 Turbo, Ada through Davinci, etc.), LangChain, LlamaIndex, Weaviate, Pinecone, Faiss, ScaNN
Cloud - AWS (SageMaker, EC2), GCP, Azure

I prefer to avoid cloud companies’ end-to-end AI services (e.g., Amazon Comprehend), since they tend to be inferior to open-source alternatives or to what I can custom build, though I am open to using them if that is what you prefer!

Work history
