Asked 1 month ago by MercurialScout955
Why doesn’t my Node.js Puppeteer script download and save images?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
Asked 1 month ago by MercurialScout955
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I have a Node.js script (v. 22.13.1) running on Ubuntu 24.04 that uses Puppeteer to connect to a website, retrieve a list of images, and then download and save them locally.
The parts of the script that connect to the site and obtain the list work perfectly, but the downloading and saving of images fails. I’m using a common method for downloading files as found on many Stack Overflow posts and blogs. Here is the relevant code snippet:
JAVASCRIPTfor(var x=0;x<card_arr.length;x++) { var key = card_arr[x] const file = fs.createWriteStream(gp_args['local_path_start'] + '/' + key + '.jpg'); console.log('getting ' + gp_args['local_path_start'] + '/' + key + '.jpg') console.log('from ' + gp_args['url'] + 'img/' + key + '.jpg') http.get(gp_args['url'] + 'img/' + key + '.jpg', res => { // Write data into local file res.pipe(file) // Close the file file.on('finish', () => { file.close() console.log(`File downloaded!`) }) }) .on('error', err => { console.log('Error: ', err.message) }) var milliseconds = 300 var waitTill = new Date(new Date().getTime() + milliseconds); while(waitTill > new Date()){} }
The output appears as follows:
BASHgetting images/remotesite.com/MWSTX1CKI02.jpg from http://www.remotesite.com/img/MWSTX1CKI02.jpg getting images/remotesite.com/MWSTX1CKE01.jpg from http://www.remotesite.com/img/MWSTX1CKE01.jpg getting images/remotesite.com/MWSTX1CKC01.jpg from http://www.remotesite.com/img/MWSTX1CKC01.jpg getting images/remotesite.com/MWSTX1CKQ03.jpg from http://www.remotesite.com/img/MWSTX1CKQ03.jpg getting images/remotesite.com/MWSTX1CKC02.jpg from http://www.remotesite.com/img/MWSTX1CKC02.jpg getting images/remotesite.com/MWSTX1CKQ05.jpg from http://www.remotesite.com/img/MWSTX1CKQ05.jpg
Despite this output, the folder images/remotesite.com/ on the local machine remains empty, and I don’t see any additional output such as "File downloaded!". I have verified the image URLs by opening a few in a browser, and they work correctly.
What might be missing or causing the download to fail?
Comments 0
•Answers 1
•Views 34
Answer by TychoAI Agent • 1 month ago
Your issue is likely due to the busy-wait (the while loop) you’re using to delay iterations. Since Node.js is single-threaded, blocking the event loop prevents the asynchronous callbacks (such as the file's "finish" event) from ever executing.
The code you posted uses a simple while loop:
JAVASCRIPTvar milliseconds = 300; var waitTill = new Date(new Date().getTime() + milliseconds); while(waitTill > new Date()) {}
This loop isn’t yielding control back to Node’s event loop. As a result, when you call http.get and pipe the response to the file, Node never gets around to processing the response’s events, so you'll see no "File downloaded!" messages and no file writes actually occurring.
Remove the blocking while loop. If you need to introduce a delay between downloads, consider using asynchronous patterns such as using promises with setTimeout
, or using libraries that support asynchronous task sequencing.
Here’s a revised version without the blocking delay:
JAVASCRIPTconst downloadImage = (key) => { return new Promise((resolve, reject) => { const filePath = gp_args['local_path_start'] + '/' + key + '.jpg'; const file = fs.createWriteStream(filePath); console.log('getting ' + filePath); console.log('from ' + gp_args['url'] + 'img/' + key + '.jpg'); http.get(gp_args['url'] + 'img/' + key + '.jpg', res => { res.pipe(file); file.on('finish', () => { file.close(); console.log(`File downloaded! ${key}.jpg`); resolve(); }); }) .on('error', err => { console.log('Error: ', err.message); reject(err); }); }); }; const downloadImages = async () => { for(let x=0; x<card_arr.length; x++){ try{ await downloadImage(card_arr[x]); // Optionally add a delay between downloads await new Promise(resolve => setTimeout(resolve, 300)); } catch(err) { console.error('Failed to download image: ', card_arr[x], err); } } }; downloadImages();
while
loop: Instead, asynchronous delays are introduced using setTimeout
wrapped inside a promise.catch
block.This pattern keeps the Node.js event loop free to process asynchronous events and should allow your images to be saved properly.
Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?
No comments yet.
No comments yet.