I'm trying to scrape some information (using PhantomJS) from a page that uses lazy loading: http://www.myntra.com/women-sarees?nav_id=606. Below is my code snippet:
window.setInterval(function() {
    // Reads the display style of the lazy-load indicator
    // (not sure if this is the best way of doing it)
    var count = page.evaluate(function() {
        try {
            return document.getElementsByClassName('more-products-loading-indicator')[0].style.display;
        } catch (e) { return e.message; }
    });
    if ((count == 'none') && (k < 4)) { // Didn't find
        console.log('count none' + k);
        k = k + 1;
        page.evaluate(function() {
            // Scrolls to the bottom of page
            console.log('hey');
            //window.scrollBy(0,500);
            window.document.body.scrollTop = document.body.scrollHeight;
        });
        page.render('myn' + k + '.png');
    }
    else { // Found
        //Do what you want
        //console.log('len123');
        console.log('count block');
        page.evaluate(function() {
            // Scrolls to the bottom of page
        });
        try {
            var links = page.evaluate(function() {
                return [].map.call(document.querySelectorAll('a.clearfix'), function(link) {
                    return 'http://www.myntra.com' + link.getAttribute('href');
                });
            });
        } catch (e) {
            console.log(e.message); return [];
        }
        console.log(links.join(','));
        var result = links.join(',');
        console.log(links.length);
        page.render('myntra.png');
        phantom.exit();
    }
}, 5000); // Number of ms to wait between scrolls
But only the first six rows get scraped. Apparently, no further content is loaded after the page is scrolled down.
Actually, you don't seem to be scrolling anywhere. This call in your else branch:
page.evaluate(function() {
    // Scrolls to the bottom of page
});
is an empty function, so why would it scroll?
Here are some pointers on how to scroll: http://stackoverflow.com/questions/11715646/scroll-automatically-to-the-bottom-of-the-page. Basically, it's
window.scrollTo(0,document.body.scrollHeight);
You should experiment to see what works for your scenario, since on your page this command doesn't scroll to the real end but only to the next load point. So it should run in a loop until the real end of the page is reached, e.g. when the number of li elements in the product list equals the number in the heading "XXX products found".
Good luck.
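A rough sketch of that stop condition, in case it helps. The selectors `ul.results li` and `h1.title` are placeholders I made up (check the real markup), and `parseTotal`, `readProgress` and `isComplete` are hypothetical helper names. `page` is assumed to be a PhantomJS webpage object, relying on `page.evaluate` passing its extra arguments to the function that runs inside the page.

```javascript
// Placeholder selectors: assumptions, not taken from the real page markup.
var ITEM_SELECTOR = 'ul.results li';
var HEADING_SELECTOR = 'h1.title';

// Pull the first integer out of heading text such as "1,234 products found".
// Returns null when no number is present.
function parseTotal(headingText) {
    var match = /\d[\d,]*/.exec(headingText || '');
    return match ? parseInt(match[0].replace(/,/g, ''), 10) : null;
}

// Scroll to the current bottom of the page, then report how many items are
// in the DOM right now and what total the heading announces.
function readProgress(page) {
    var raw = page.evaluate(function(itemSelector, headingSelector) {
        window.scrollTo(0, document.body.scrollHeight);
        var heading = document.querySelector(headingSelector);
        return {
            loaded: document.querySelectorAll(itemSelector).length,
            headingText: heading ? heading.textContent : ''
        };
    }, ITEM_SELECTOR, HEADING_SELECTOR);
    return { loaded: raw.loaded, total: parseTotal(raw.headingText) };
}

// True once every product announced in the heading is present in the list.
function isComplete(progress) {
    return progress.total !== null && progress.loaded >= progress.total;
}
```

The idea would be to call `readProgress(page)` every few seconds and stop once `isComplete` returns true, or once `loaded` stops growing between two calls.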
Note that you don't initialize k. In some languages that might work, but not in JS. Open your console and type k = k+1 (or k++) and you'll get a ReferenceError, so I'm not sure the loop is even running. What probably happens is that it goes straight to the else clause (since you combine the conditions with "and", &&, rather than "or", ||), where there isn't any scrolling down (since the function is empty, as I explained in the previous answer). Then some rows are scraped and that's it: k is never incremented, the same rows are scraped again, etc.
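To see the difference for yourself, here is a tiny standalone check. The function names and `neverDeclaredCounter` are made up for the illustration; nothing here depends on PhantomJS.

```javascript
// Reading a variable that was never declared throws, so the increment
// never happens.
function bumpUndeclared() {
    try {
        neverDeclaredCounter = neverDeclaredCounter + 1; // the read on the right-hand side throws
        return 'incremented';
    } catch (e) {
        return e.name;
    }
}

// Declaring and initializing the counter first makes the same statement work.
function bumpDeclared() {
    var k = 0;
    k = k + 1;
    return k;
}

console.log(bumpUndeclared()); // ReferenceError
console.log(bumpDeclared());   // 1
```

So a `var k = 0;` before the `setInterval` call should be enough, assuming k isn't already declared somewhere in the part of the script you didn't post.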
You should also note that a simple while loop (or a recursive one, where the if clause calls the calling function again if there is still scrolling to do) would probably be much more helpful to you than this setInterval, which is probably an async nightmare to handle and a very inefficient solution (it could also time out before the scraping is done).
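Something along these lines is what I mean by the recursive variant. It is only a sketch: `scrollRecursively` and its parameters are names I invented, and it re-schedules itself with setTimeout rather than spinning in a blocking while loop, on the assumption that a busy loop would not give the page a chance to load the next batch.

```javascript
// Hypothetical helper: performs one scroll, then calls itself again after a
// delay for as long as hasMore() is true and the attempt budget is not used up.
// options.schedule defaults to setTimeout; it is a parameter only so the wait
// can be swapped out.
function scrollRecursively(scrollOnce, hasMore, options, onDone) {
    var maxAttempts = options.maxAttempts;
    var delayMs = options.delayMs;
    var schedule = options.schedule || setTimeout;
    var attempts = 0;

    function next() {
        if (attempts >= maxAttempts || !hasMore()) {
            onDone(attempts); // report how many scrolls were performed
            return;
        }
        attempts += 1;
        scrollOnce();
        schedule(next, delayMs); // give the page time to load the next batch
    }

    next();
}
```

In your script, `scrollOnce` would wrap a `page.evaluate` that scrolls down, `hasMore` would wrap whatever check tells you more products are still coming, and `onDone` would collect the links and call `phantom.exit()`. The `maxAttempts` cap is there so the script still terminates if the check never turns false.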
And I really think you should move to CasperJS, a tool built on top of PhantomJS that is much easier to program with, at least for your kind of scraping work.