Hi, I'm Ilnar, a computational linguist and natural language processing developer with over 6 years of experience. I have an M.Sc. degree in Computational Linguistics from the University of Stuttgart.
Competencies
===========
- Programming Languages: Python, Racket, Clojure, Bash, SQL.
- Infrastructure: Azure ML, ONNX Runtime, Google Firebase, Terraform.
- ML libraries: NLTK, spaCy, scikit-learn, pandas, HF Transformers, HF Optimum.
- Web scraping: Requests, Selenium, BeautifulSoup.
- Web frameworks: FastAPI, Flask.
- Packaging & Deployment: PyInstaller, Docker, GNU Make.
- Rule-based NLP: HFST, VISL CG-3, Apertium, LFG/XLE.
- Operating systems: Ubuntu GNU/Linux.
- Version Control: Git, Subversion.
I am most proficient in Python, but I have also used Racket and Clojure for building static websites and prototyping desktop GUIs.
What I can do for you
================
- Natural language processing with NLTK, spaCy or Hugging Face Transformers.
= Can train or fine-tune an NLP model.
- Write code in Python.
= Can build a REST API using FastAPI or Flask.
- Convert models, e.g. models available on the Hugging Face Hub, into ONNX format, optimizing & quantizing them for specific hardware using Hugging Face Optimum.
= So that you don't waste money on infrastructure costs.
- MLOps: deploying and operating those models on Microsoft Azure ML.
= So that you don't have to.
- Scrape the web using Python or Clojure.
= Can assemble a dataset, if there isn't one already for your needs.
- Annotate texts with linguistic information, often semi-automatically, using rule-based frameworks or semi-supervised methods.
= So that you can train your models.
- Fine-tune an OpenAI model via its API.
= So that you don't have to train anything.
- Prompt engineering: (Chat)GPT, Claude 2, etc.
= Profit!
Experience
=========
Most recently, I worked with ProWritingAid, a writing assistant that helps millions of users improve their writing, communication, and storytelling.
At ProWritingAid, I was responsible for training, evaluating, fine-tuning, optimizing, and deploying NLP models on Azure ML, using libraries such as Hugging Face Transformers, Hugging Face Optimum, and ONNX Runtime. I also collected, documented, and scraped text datasets, and used large language models such as ChatGPT and Claude 2 for data generation and labeling. Projects I worked on included tone analysis, paraphrasing, and grammar & style checking. I used Python as my main programming language, and collaborated with a diverse and talented team of researchers and developers.
Before ProWritingAid, I built language corpora, morphological analysis and generation tools, part-of-speech taggers, dependency parsers, machine translators, and speech recognition systems [2].
While in Stuttgart, I wrote a statistical, transition-based dependency parser [1] and a statistical part-of-speech tagger from scratch.
As for human languages, I'm proficient in English, German, Russian and Tatar, and conversant in Turkish and Kazakh.
In my free time, I contribute to the localization of free/libre software and various educational sites like Khan Academy into my native language (Tatar). That work is showcased on selimcan.org.
[1] gitlab.com/selimcan/SDP
[2] scholar.google.com/citations?user=oeGuPdYAAAAJ