PhantomJS QuerySelectorAll().textcontent Returns Nothing
Solution 1:
By default the virtual screen size of PhantomJS is 400x300.
var page = require('webpage').create();
console.log(page.viewportSize.width);  // 400
console.log(page.viewportSize.height); // 300
Some sites detect this and, instead of the normal version you see in a desktop browser, serve a stripped-down mobile version of the HTML and CSS. You can fix that by setting the viewport size you want before loading the page:
page.viewportSize = { width: 1280, height: 800 };
Other sites do user-agent sniffing and make decisions based on that. If they don't recognize your browser, they may show a mobile version to be on the safe side, or, if they don't want to be scraped, they may refuse to serve PhantomJS at all, because it honestly identifies itself:
console.log(page.settings.userAgent);
// Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
But we can set the desired user agent:
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0';
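The two overrides above can be bundled into one helper that runs before the page is opened. This is only a minimal sketch: the name applyDesktopProfile is hypothetical, the viewport and user-agent values are simply the ones used above, and the `typeof phantom` guard is an assumption made so that the demo part only runs inside PhantomJS.

```javascript
// Hypothetical helper: make a PhantomJS page look like a desktop Firefox.
// Both values are the ones used above; adjust them as needed.
function applyDesktopProfile(page) {
    // Replace the 400x300 default so responsive sites serve the desktop layout.
    page.viewportSize = { width: 1280, height: 800 };
    // Replace the default user agent, which contains "PhantomJS".
    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0';
    return page;
}

// Demo: only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = applyDesktopProfile(require('webpage').create());
    console.log(page.viewportSize.width + 'x' + page.viewportSize.height);
    console.log(page.settings.userAgent);
    phantom.exit();
}
```

Call the helper before page.open, so that the very first request already carries the new user agent.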
When working with something as fragile as web scraping, you really should pay attention to every error and system message you can get.
So no PhantomJS script should be without onError and onConsoleMessage callbacks:
page.onError = function (msg, trace) {
    var msgStack = ['ERROR: ' + msg];
    if (trace && trace.length) {
        msgStack.push('TRACE:');
        trace.forEach(function (t) {
            msgStack.push(' -> ' + t.file + ': ' + t.line + (t.function ? ' (in function "' + t.function + '")' : ''));
        });
    }
    console.log(msgStack.join('\n'));
};
page.onConsoleMessage = function (msg) {
    console.log(msg);
};
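One reason onConsoleMessage matters: code passed to page.evaluate runs inside the page's sandbox, so its console.log output never reaches your terminal unless the callback forwards it. Here is a small sketch of that. The helper name attachLogging and the example.com URL are placeholders introduced for illustration.

```javascript
// Hypothetical helper: forward page-side messages and errors to a logger.
function attachLogging(page, log) {
    page.onConsoleMessage = function (msg) {
        log('PAGE: ' + msg);
    };
    page.onError = function (msg) {
        log('PAGE ERROR: ' + msg);
    };
    return page;
}

// Demo: only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = attachLogging(require('webpage').create(), function (line) {
        console.log(line);
    });
    // Placeholder URL; substitute the site you are scraping.
    page.open('http://example.com/', function (status) {
        page.evaluate(function () {
            // Runs in the page context; visible only through onConsoleMessage.
            console.log('title is ' + document.title);
        });
        phantom.exit();
    });
}
```

Without the onConsoleMessage callback the script would print nothing at all, which is easy to mistake for a selector that matched nothing.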
Another vital technique for debugging PhantomJS scripts is taking screenshots. Are you sure that PhantomJS sees what you see in your Chrome?
page.render("google.com.png");
[Screenshot: before setting the user agent]
[Screenshot: after setting the Firefox user agent]
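Once a screenshot confirms that PhantomJS gets the desktop version, the text can be read with querySelectorAll. A NodeList cannot cross the page.evaluate boundary, because return values have to be JSON-serializable, so the sketch below copies each textContent into a plain array first. The helper name collectText, the 'h1' selector, the page.png file name and the example.com URL are placeholders for illustration, not part of the original answer.

```javascript
// Hypothetical helper: return the textContent of every element matching
// `selector` as a plain array of strings.
function collectText(page, selector) {
    return page.evaluate(function (sel) {
        // Runs in the page context. Convert the NodeList to plain strings,
        // because only JSON-serializable values survive the return trip.
        return Array.prototype.map.call(document.querySelectorAll(sel), function (el) {
            return el.textContent;
        });
    }, selector);
}

// Demo: only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.viewportSize = { width: 1280, height: 800 };
    // Placeholder URL; substitute the site you are scraping.
    page.open('http://example.com/', function (status) {
        if (status !== 'success') {
            console.log('Failed to load the page');
            phantom.exit(1);
            return;
        }
        page.render('page.png'); // check what PhantomJS actually sees
        console.log(JSON.stringify(collectText(page, 'h1')));
        phantom.exit();
    });
}
```

If the array comes back empty, compare the screenshot with your desktop browser before suspecting the selector.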