Muhammad Abdul J.

Yogyakarta, Indonesia

NodeJS System and Scraping Specialist

I am a scraping specialist with extensive experience scraping websites using Node.js. Because Node.js is asynchronous, it can handle many requests concurrently. As a reference, in my most recent projects I scraped 8,000,000 emails from Facebook Business Pages and 9,000,000 repository records from GitHub user pages, each in under one week. As a general rule I can scrape up to 120 pages per minute per machine (assuming the target site has low latency); using multiple machines multiplies that throughput. With large datasets, the connection is always a factor, so after developing a scraper locally I run it on my own private servers (up to 6 VPS) for time-critical jobs, since hosted servers have better connectivity.

My scraping experience includes:
- a number of flight ticket reservation sites: retrieving flight information (prices, schedules, seat availability), booking a seat, and issuing a ticket, all via automated scraping, then aggregating the results for storage or real-time search,
- a train ticket reservation site,
- a number of real estate sites: collecting sale and auction listings and aggregating them into a database for later access,
- a football stadium site: charter seat sales,
- a few online retail stores: product data, and many more,
- GitHub, a code-sharing site: user and repository data.

I also have experience with browser automation: submitting forms, logging in with credentials, AJAX requests, registering users, navigating paginated pages, and (in my latest project) booking a flight seat, issuing a flight ticket, and printing the ticket as a PDF. Using a Master-Worker pattern, I can run the scraping process on multiple machines at the same time, which yields correspondingly faster results.
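As a rough illustration, the concurrent fetching described above can be sketched in Node.js as a small worker pool; this is a minimal sketch, and `fetchPage` is a hypothetical placeholder for the real request-and-parse logic:

```javascript
// Minimal sketch: scrape a list of URLs with a fixed concurrency limit.
// `fetchPage` is a placeholder for real request logic (e.g. fetch + parse).
async function scrapeAll(urls, fetchPage, concurrency = 10) {
  const results = new Array(urls.length);
  let next = 0;
  // Each worker repeatedly claims the next unclaimed URL until none remain.
  // The claim (next++) happens between awaits, so workers never collide.
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      results[i] = await fetchPage(urls[i]);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results; // same order as the input URLs
}
```

With `concurrency` set near the 120-pages-per-minute budget mentioned above, this keeps a steady number of requests in flight without flooding the target site.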
Additionally, each machine has its own IP address, which mitigates rate limiting and blocks on repeated requests. If multiple IP addresses are not enough, we can use a Tor proxy or a premium proxy service. I can deliver the output in CSV format, compatible with Excel and Google Sheets, or insert it into a database to be reformatted later as desired.

The work is not limited to simple search queries; sometimes the actions are more complex, such as registering, logging in, submitting forms, and other interactions normally performed in a real browser. Some target sites run PHP / ASP / Java / Spring with XSS and CSRF token protection. Others are trickier and require research and trial-and-error: they rate-limit requests, require certain headers, block refreshed requests, or block credentials used in multiple sessions at once, and so on. A special note on ASPX sites: I know how to handle postback forms, popup windows, and logging in.

I also cover error logging, captcha solving, interval timing, adding delays between requests, retrying on error, backing off after a maximum number of errors, IP proxying, and clustering (distributing the workload across multiple machines to scrape faster).

If you have any questions or concerns, please do not hesitate to contact me. Looking forward to working with you, M. A. Jabbaar
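The delay-between-requests and retry-with-backoff behavior mentioned above can be sketched as follows; this is a minimal illustration, and `doFetch` is a hypothetical stand-in for a real HTTP request:

```javascript
// Minimal sketch: retry a request with exponential backoff between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(doFetch, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await doFetch();
    } catch (err) {
      lastError = err;
      // Back off: delayMs, 2*delayMs, 4*delayMs, ... before the next attempt.
      if (i < attempts - 1) await sleep(delayMs * 2 ** i);
    }
  }
  throw lastError; // all attempts exhausted
}
```

Wrapping each page fetch in a helper like this is what lets a long-running scrape survive transient blocks or timeouts instead of aborting mid-job.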
5.00 Feb 16, 2022 - May 13, 2022

"great job"

Private earnings
Jun 15, 2021 - Dec 22, 2021

No feedback given

Private earnings
Dec 1, 2018 - Jan 23, 2019

No feedback given

Private earnings
5.00 Oct 13, 2015 - Jul 31, 2016
Private earnings
5.00 Jun 20, 2015 - Jun 28, 2015
Private earnings
3.95 Apr 27, 2015 - May 15, 2015

"Developer had scheduling issues and we had to continue with another developer."

Private earnings

Muhammad Abdul J. has more jobs.
20 Total Jobs
120 Total Hours
Availability: More than 30 hrs/week