You will get a Custom Group Activity Recognition System

Project details
I deliver a cutting-edge Group Activity Recognition system based on advanced Hierarchical Deep Temporal Models (CVPR research). Unlike standard "out-of-the-box" detectors that simply locate people, my solution utilizes a 2-Stage Deep LSTM architecture to understand the complex interactions and temporal dynamics between individuals.
You will receive a robust, custom-trained PyTorch model capable of analyzing multi-agent scenes—such as sports games, surveillance footage, or retail environments—to infer high-level collective behaviors. I handle the full pipeline: from person-tracking and feature extraction to training hierarchical Recurrent Neural Networks that achieve state-of-the-art accuracy. This is a research-grade computer vision system engineered for real-world application.
Machine Learning Tools
BERT, ChatGPT, GitHub Copilot, Google Sheets, GPT-3, Keras, Microsoft Excel, MLflow, NLTK, NumPy, OpenCV, pandas, Python, Python Scikit-Learn, PyTorch, scikit-learn, SciPy, Sonnet, SQL, TensorFlow, Word2vec, XGBoost
What's included
| Service Tiers | Starter ($80) | Standard ($450) | Advanced ($1,250) |
|---|---|---|---|
| Delivery Time | 2 days | 7 days | 14 days |
| Number of Revisions | 1 | 2 | 3 |
| Number of Model Variations | 1 | 2 | 3 |
| Number of Scenarios | 1 | 3 | 5 |
| Number of Graphs/Charts | 0 | 3 | 3 |
| Model Validation/Testing | - | ✓ | ✓ |
| Model Documentation | - | - | ✓ |
| Data Source Connectivity | - | ✓ | ✓ |
| Source Code | | | |
Optional add-ons
You can add these on the next page.
Fast Delivery: +$150 - $400
Additional Revision: +$50
Additional Scenario (+ 6 Days): +$200
Model Documentation (+ 1 Day): +$100
Frequently asked questions
About Boules
Machine Learning Engineer | Computer Vision & NLP Specialist
Sohag, Egypt
Whether you need to analyze video data, generate captions for images, or classify complex text patterns, I deliver clean, documented, and high-performance code.
My Core Services:
Computer Vision: Developing models for action recognition, object detection, and image classification (CNNs, ResNet, Transformers).
Natural Language Processing: Building text classification systems (e.g., spam detection, sentiment analysis) and utilizing LSTM/RNN architectures.
Model Optimization: Fine-tuning pre-trained models and creating custom Datasets/DataLoaders in PyTorch.
Featured Projects:
Group Activity Recognition: Built a system to analyze and classify group dynamics in sports (e.g., Volleyball) using spatio-temporal modeling.
Image Captioning: Developed an encoder-decoder model (CNN + Transformer) to automatically generate descriptive captions for images.
Fraud & Spam Detection: Created high-accuracy classification models for detecting anomalies in financial data and text messages.
I am passionate about turning complex data into actionable solutions. If you are looking for a dedicated engineer to bring your AI project to life, let’s connect.
Steps for completing your project
After purchasing the project, send requirements so Boules can start the project.
Delivery time starts when Boules receives requirements from you.
Boules works on your project following the steps below.
Revisions may occur after the delivery date.
Data Setup & Preprocessing
I will review your video dataset and set up the preprocessing pipeline. This involves running a person detector (e.g., Dlib or YOLO) on each frame, linking the detections into a "tracklet" for every individual in the scene, and converting each tracklet into a sequence of feature vectors.
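To give a feel for the tracklet step, here is a minimal sketch of how per-frame person boxes could be linked into tracklets. It assumes the detector has already produced boxes as `(x1, y1, x2, y2)` tuples for each frame and uses a simple greedy IoU match between consecutive frames. The names `iou` and `link_tracklets` and the `min_iou=0.3` threshold are hypothetical choices for illustration only; a production pipeline may well use a dedicated tracker instead.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed detector output


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tracklets(
    frames: List[List[Box]], min_iou: float = 0.3  # threshold is an illustrative guess
) -> Dict[int, List[Tuple[int, Box]]]:
    """Greedily extend each tracklet with the best-overlapping box in the next frame."""
    tracklets: Dict[int, List[Tuple[int, Box]]] = {}
    next_id = 0
    for frame_idx, boxes in enumerate(frames):
        unmatched = list(boxes)
        for track in tracklets.values():
            last_frame, last_box = track[-1]
            # Only tracklets seen in the previous frame can be extended.
            if last_frame != frame_idx - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= min_iou:
                track.append((frame_idx, best))
                unmatched.remove(best)
        # Any box left over starts a new tracklet.
        for box in unmatched:
            tracklets[next_id] = [(frame_idx, box)]
            next_id += 1
    return tracklets


if __name__ == "__main__":
    demo_frames = [
        [(0, 0, 10, 10), (50, 50, 60, 60)],
        [(1, 1, 11, 11), (51, 50, 61, 60)],
        [(2, 2, 12, 12)],
    ]
    for track_id, track in link_tracklets(demo_frames).items():
        print(track_id, [frame for frame, _ in track])
```

Each resulting tracklet is a list of `(frame_index, box)` pairs, which can then be cropped from the video and passed through a feature extractor to get one feature vector per person per frame.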
Model Configuration & Training
I will configure the Hierarchical Deep Temporal Model (PyTorch) for your specific classes. This includes training Stage 1 (Person-Level LSTM) to recognize individual actions and Stage 2 (Group-Level LSTM) to learn collective behavior.
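The two-stage idea can be sketched in PyTorch roughly as follows. This is a simplified illustration, not the delivered model: the class name `HierarchicalLSTM`, the layer sizes, the class counts, and the choice of max-pooling over people are all assumptions made for the example. Stage 1 runs one LSTM over each person's feature sequence, and Stage 2 runs a second LSTM over a per-frame scene descriptor pooled across people.

```python
import torch
import torch.nn as nn


class HierarchicalLSTM(nn.Module):
    """Sketch of a person-level LSTM feeding a group-level LSTM (sizes are placeholders)."""

    def __init__(self, feat_dim=128, person_hidden=64, group_hidden=64,
                 num_actions=9, num_activities=8):
        super().__init__()
        # Stage 1: temporal model shared across all individuals.
        self.person_lstm = nn.LSTM(feat_dim, person_hidden, batch_first=True)
        self.action_head = nn.Linear(person_hidden, num_actions)
        # Stage 2: temporal model over the pooled scene descriptor.
        self.group_lstm = nn.LSTM(feat_dim + person_hidden, group_hidden, batch_first=True)
        self.activity_head = nn.Linear(group_hidden, num_activities)

    def forward(self, x):
        # x: (batch, people, time, feat_dim) tracklet features
        b, p, t, d = x.shape
        flat = x.reshape(b * p, t, d)
        person_out, _ = self.person_lstm(flat)            # (b*p, t, person_hidden)
        # Per-person action logits from the last time step.
        action_logits = self.action_head(person_out[:, -1]).reshape(b, p, -1)
        # Concatenate input features with LSTM states, then max-pool over people.
        person_repr = torch.cat([flat, person_out], dim=-1).reshape(b, p, t, -1)
        scene = person_repr.max(dim=1).values              # (b, t, feat_dim + person_hidden)
        group_out, _ = self.group_lstm(scene)              # (b, t, group_hidden)
        activity_logits = self.activity_head(group_out[:, -1])
        return action_logits, activity_logits


if __name__ == "__main__":
    model = HierarchicalLSTM()
    dummy = torch.randn(2, 12, 10, 128)  # 2 clips, 12 people, 10 frames
    actions, activity = model(dummy)
    print(actions.shape, activity.shape)
```

In a setup like this, the two heads would typically be trained with a cross-entropy loss each, first on individual action labels and then on group activity labels, which mirrors the Stage 1 and Stage 2 split described above.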




