I hold the Expert-Vetted Talent (EVT) badge, awarded to the top 1% of Upwork freelancers, who are pre-screened by Upwork Talent Managers and experts in their field.
Out of ~30 million programmers worldwide, only a few thousand know Algorithms & Data Structures better than I do, as shown by my results in programming competitions. Please contact me if you need that skill level (top 0.01%).
I can do algorithmic/performance work in C/C++, Python, SQL, Java, MQL4, MQL5, C#, Assembly, JavaScript, and probably other languages.
I also work with AI, mostly in NLP and NLU: large language models including OpenAI GPT-3, Bloom, BloomZ, GPT-J 6B, LLaMA, Alpaca, etc.; HuggingFace Transformers and Accelerate; Petals, DeepSpeed, zfp/zfpy; CUDA, CPU, and MPS (Metal GPU on AArch64 M2 macOS) backends. I have some work experience with the Apple Neural Engine (ANE). In AI, I have also worked with XGBoost for predictions (including trading), LibSVM, TensorFlow, PyTorch, Scikit-learn, etc.
I have the hardware in my home office for training and inference with large language models and other AI.
English: C1 (Grammarly plugin says I use more unique words than 95% of other users, native speakers included). Polish: B1 (86%). Russian, Belarusian: Native.
- With unique skills in Algorithms & Data Structures, I improve programs asymptotically (often by 100x or more on large input data).
- 29 years of programming (started with BASIC and assembly on the ZX Spectrum), 24 years of C/C++, 16 years of commercial work experience + 3 years of research projects.
- Contributed to widely used open-source projects: LLVM/Clang (my contribution is the XRay profiler on ARM32 and AArch64 systems), Katana Graph (multiple small contributions, mostly driven by the proprietary part where I do GPU/CUDA), CBMC (C Bounded Model Checker; I contributed parallelized output of DIMACS-formatted Boolean satisfiability formulas), oatpp (C++ web framework; I contributed bug fixes), OWL (OptiX Wrapper Library; I contributed build fixes for Ubuntu), Galois (research project for distributed computations on graphs; I contributed GPU improvements).
- Actively participated in bug reporting and reproduction (for NVIDIA CUDA, the CaDiCaL and Kissat Boolean satisfiability solvers, JBoss, MariaDB, TensorFlow, Linux, etc.).
- Led several open-source projects of my own: ProbQA (a video game recommendation system based on a high-performance Bayesian inference engine with CUDA, SIMD, and multi-threading); InSoAr (automatic reconstruction of software architecture from source code); a multi-threaded Boolean satisfiability solver; etc.
Working for hire, I have implemented:
- efficient multi-threading, scaling real-world workloads almost linearly with the number of CPU cores (128x for AMD Ryzen Threadripper 3990X)
- SIMD vectorization (SSE/AVX), up to 8x improvement within a single computing thread, or even in plain copying (see my "Faster alternatives to memcpy" answer on Stack Overflow, URL upon request)
- cache-aware algorithms: up to 50x improvement on some workloads
- up to 20 trillion operations/second in CUDA (thousands of times faster than CPU)
- up to the theoretical limit (6.8 Gigarays/second on RTX 2080 laptop GPU) in ray-tracing with OWL and OptiX
- AVX512 and RTM (Restricted Transactional Memory) based acceleration, 16x improvement for float numbers
- up to 20x improvements to cryptocurrency miners on CPU using AVX512 and cache-friendly algorithms
13K reputation on Stack Overflow (1915854/serge-rogatch)
Topcoder SRM score: 1480, among the top 5K programmers in the world, top 0.02% (rSerge)
During my career, I also took (technical) leadership roles such as Team Lead, Manager, Chief Architect, Vice-President, and CTO.
I developed all kinds of networking applications, from Linux Kernel modules up to Web applications.
The majority of that work was, of course, done at the TCP/IP level with socket calls like send/recv/select.
Programming languages: C++, C++11/14/17/20, C, Python, x86/x64/ARM/AArch64 assembly, SQL, C# .NET, JavaScript, HTML, CSS, Java, MQL4, MQL5, XML, Cypher, Rust.
Libraries/Frameworks: PyTorch, TensorFlow, HuggingFace Transformers/Accelerate/Safetensors, Hivemind/Petals, OpenAI, tiktoken, Django, Flask, STL, LibSVM, XGBoost, libcurl, Selenium.
Technologies: OpenMP, CUDA, SIMD (AVX, SSE), RTM, Linux Kernel Modules, OptiX, OWL (OptiX Wrapper Library), RTX, ray tracing.
Theory/Principles/Know-how/Methodologies: Algorithms & Data Structures, Performance Optimization, Artificial Intelligence, Multithreading, Vectorization, Object-Oriented Programming, compiler implementation, linkers, Low-latency, High-frequency, Blockchain.
Open source code: Clang, LLVM, LLVM's compiler-rt library, Linux Kernel, a few of my own repositories, contributions to AI and Algorithm open-source projects such as Petals and CBMC.
Tools/APIs/Architectures/Platforms: PostgreSQL, MSSQL, MySQL, Neo4j, MATLAB, CMake, Git, MetaTrader 4, MetaTrader 5, Conda, PyCharm.
Virtual Machines / Containers: Docker, VMware, VirtualBox, QEMU, Hyper-V.
OSes: Windows, Linux, Android, macOS