2019-004-2-Cendertron: Slider CAPTCHA Bypass Strategies for Dynamic Crawlers
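Earlier installments of the security dynamic crawler series in InfoSecurity-Notes covered the design of the security crawler and how to set up a crawler cluster. This installment discusses strategies for bypassing slider CAPTCHAs.
The strategy and code used here come from the article How to bypass “slider CAPTCHA” with JS and Puppeteer.
Bypassing slider verification in a crawler
Verification is one of the common anti-crawler measures, and many sites today use slider verification to check that a visitor is real. A well-known jQuery slide plugin is one example.
When simulating a login we often need to get past such a slider, and a Puppeteer-based dynamic crawler makes this convenient. The usual steps are: move the pointer to the center of the slider handle, press the mouse button, move the mouse, and release the button.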
const puppeteer = require("puppeteer");

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 },
  });
  const page = await browser.newPage();

  await page.goto("http://kthornbloom.com/slidetosubmit/");
  await page.type('input[name="name"]', "Puppeteer Bot");
  await page.type('input[name="email"]', "js@automation.com");

  // Bounding boxes of the slider track and of its draggable handle.
  const sliderElement = await page.$(".slide-submit");
  const slider = await sliderElement.boundingBox();
  const sliderHandle = await page.$(".slide-submit-thumb");
  const handle = await sliderHandle.boundingBox();

  // Move to the center of the handle and press the mouse button.
  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();

  // Drag across the full track width in 10 intermediate moves, then release.
  await page.mouse.move(handle.x + slider.width, handle.y + handle.height / 2, {
    steps: 10,
  });
  await page.mouse.up();

  // page.waitFor() is deprecated and absent from recent Puppeteer releases,
  // so a plain timer is used to keep the result visible for 3 seconds.
  await new Promise((resolve) => setTimeout(resolve, 3000));
  await browser.close();
}

run().catch(console.error);
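The coordinate arithmetic in the script above can be pulled out into a small pure function, which makes it easier to check without launching a browser. The following is only a sketch: `sliderDragPoints` is a hypothetical helper name, not part of Puppeteer, and the boxes passed to it in the example are made-up values standing in for what `boundingBox()` would return.

```javascript
// Hypothetical helper (not a Puppeteer API): derive the drag start and end
// points from the bounding boxes of the handle and of the slider track.
function sliderDragPoints(handle, slider) {
  const y = handle.y + handle.height / 2;
  return {
    // Center of the handle, where the mouse button is pressed.
    start: { x: handle.x + handle.width / 2, y },
    // Same target as the script above: one track width past the handle's left edge.
    end: { x: handle.x + slider.width, y },
  };
}

// Made-up boxes; real values would come from elementHandle.boundingBox().
const points = sliderDragPoints(
  { x: 100, y: 200, width: 40, height: 40 },
  { x: 100, y: 200, width: 300, height: 40 }
);
console.log(points);
```

In a crawler, `points.start` and `points.end` would be passed to `page.mouse.move()` before `page.mouse.down()` and before `page.mouse.up()` respectively.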
As a real-world case, we can take Taobao's registration page as an example:
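const puppeteer = require("puppeteer");

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 },
  });
  const page = await browser.newPage();

  // Before any page script runs, make navigator.webdriver report false
  // so the page does not see the automation flag.
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, "webdriver", {
      get: () => false,
    });
  });

  await page.goto("https://world.taobao.com/markets/all/sea/register");

  // The slider is assumed to live in the second frame of the page.
  const frame = page.frames()[1];
  await frame.waitForSelector(".nc_iconfont.btn_slide");

  // Bounding boxes of the slider track and of its draggable handle.
  const sliderElement = await frame.$(".slidetounlock");
  const slider = await sliderElement.boundingBox();
  const sliderHandle = await frame.$(".nc_iconfont.btn_slide");
  const handle = await sliderHandle.boundingBox();

  // Press on the center of the handle, drag in 50 intermediate moves, release.
  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();
  await page.mouse.move(handle.x + slider.width, handle.y + handle.height / 2, {
    steps: 50,
  });
  await page.mouse.up();

  // page.waitFor() is deprecated and absent from recent Puppeteer releases,
  // so a plain timer is used to keep the result visible for 3 seconds.
  await new Promise((resolve) => setTimeout(resolve, 3000));
  await browser.close();
}

run().catch(console.error);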
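The `steps` option used above makes Puppeteer emit evenly spaced intermediate `mousemove` events. If an uneven, decelerating drag is wanted instead, the offsets can be computed up front and fed to `page.mouse.move()` one at a time. The sketch below assumes a cubic ease-out curve; `easeOutOffsets` is a hypothetical helper, and the source article makes no claim that such a curve affects any particular verification service.

```javascript
// Hypothetical helper: horizontal offsets for a drag of `distance` pixels
// split into `steps` moves, fast at first and slowing down near the end
// (cubic ease-out). The last offset is always the full distance.
function easeOutOffsets(distance, steps) {
  const offsets = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps; // progress in (0, 1]
    offsets.push(distance * (1 - Math.pow(1 - t, 3)));
  }
  return offsets;
}

const offsets = easeOutOffsets(300, 4);
console.log(offsets);
```

Between `page.mouse.down()` and `page.mouse.up()`, each offset would be added to the starting x coordinate and passed to `page.mouse.move()` in turn.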
Another common type is the puzzle-style slider:
const puppeteer = require("puppeteer");
const Rembrandt = require("rembrandt");

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 },
  });
  const page = await browser.newPage();

  // Capture the puzzle's background image as it is downloaded.
  let originalImage = "";
  await page.setRequestInterception(true);
  page.on("request", (request) => request.continue());
  page.on("response", async (response) => {
    if (response.request().resourceType() === "image") {
      // Keep the previous image if this response body cannot be read.
      const buffer = await response.buffer().catch(() => null);
      if (buffer) originalImage = buffer;
    }
  });

  await page.goto("https://monoplasty.github.io/vue-monoplasty-slide-verify/");

  // Bounding boxes of the slider track and of its draggable handle.
  const sliderElement = await page.$(".slide-verify-slider");
  const slider = await sliderElement.boundingBox();
  const sliderHandle = await page.$(".slide-verify-slider-mask-item");
  const handle = await sliderHandle.boundingBox();

  let currentPosition = 0;
  // Position with the smallest difference (in percent) seen so far.
  const bestSlider = {
    position: 0,
    difference: 100,
  };

  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();

  // Sweep the handle across the track in 5px increments with a little
  // vertical jitter, comparing a screenshot against the original each time.
  while (currentPosition < slider.width - handle.width / 2) {
    await page.mouse.move(
      handle.x + currentPosition,
      handle.y + handle.height / 2 + Math.random() * 10 - 5
    );

    const sliderContainer = await page.$(".slide-verify");
    const sliderImage = await sliderContainer.screenshot();

    const rembrandt = new Rembrandt({
      imageA: originalImage,
      imageB: sliderImage,
      thresholdType: Rembrandt.THRESHOLD_PERCENT,
    });
    const result = await rembrandt.compare();
    const difference = result.percentageDifference * 100;

    if (difference < bestSlider.difference) {
      bestSlider.difference = difference;
      bestSlider.position = currentPosition;
    }
    currentPosition += 5;
  }

  // Return to the best-matching position and release the handle there.
  await page.mouse.move(
    handle.x + bestSlider.position,
    handle.y + handle.height / 2,
    { steps: 10 }
  );
  await page.mouse.up();

  // page.waitFor() is deprecated and absent from recent Puppeteer releases,
  // so a plain timer is used to keep the result visible for 3 seconds.
  await new Promise((resolve) => setTimeout(resolve, 3000));
  await browser.close();
}

run().catch(console.error);
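Here we use a simple image comparison: while dragging, the script takes a screenshot of the widget at each position, compares it with the original image, and records the position with the smallest difference. It then moves the handle back to that position and releases it, treating that as the successful slide.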
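To make the comparison step concrete, here is a minimal sketch of what a percentage difference between two images can mean and how the best position is chosen from the samples. It works on raw RGBA arrays and is not Rembrandt's actual algorithm; `fractionDifferent` and `pickBest` are hypothetical names, and the tiny two-pixel images are made-up data.

```javascript
// Fraction of pixels (0..1) in which any RGBA channel differs by more than
// `tolerance`. Both inputs are flat RGBA arrays of equal length.
function fractionDifferent(a, b, tolerance = 0) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("inputs must be RGBA data of equal length");
  }
  let different = 0;
  for (let i = 0; i < a.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        different++;
        break; // count each pixel at most once
      }
    }
  }
  return different / (a.length / 4);
}

// Choose the sample with the smallest difference, as the loop above does.
function pickBest(samples) {
  return samples.reduce((best, s) => (s.difference < best.difference ? s : best));
}

// Two made-up 2-pixel images: the second pixel differs slightly in red.
const imageA = [0, 0, 0, 255, 255, 255, 255, 255];
const imageB = [0, 0, 0, 255, 250, 255, 255, 255];
const strict = fractionDifferent(imageA, imageB);
const lenient = fractionDifferent(imageA, imageB, 10);

const best = pickBest([
  { position: 0, difference: 12 },
  { position: 5, difference: 3 },
  { position: 10, difference: 7 },
]);
console.log(strict, lenient, best);
```

A tolerance above zero helps when screenshots and the downloaded image differ slightly because of scaling or compression.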
Spider configuration
Cendertron provides a dedicated Slider Captcha Monkey. To enable it, add the following parameters to the SpiderOption that is passed in:
export interface SpiderOption {
  allowRedirect: boolean;
  depth: number;
  // Page plugins
  monkies?: {
    sliderCaptcha: {
      sliderElementSelector: string;
      sliderHandleSelector: string;
    };
  };
}
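As an illustration, an option object matching this interface might look like the following. This is a sketch only: the selector values are borrowed from the puzzle slider example earlier in this article, the `allowRedirect` and `depth` values are arbitrary, and how the object is handed to Cendertron is not shown in this article.

```javascript
// Illustrative values only; the selectors are the ones used in the puzzle
// slider example, and the other settings are arbitrary.
const spiderOption = {
  allowRedirect: false,
  depth: 1,
  // Page plugins
  monkies: {
    sliderCaptcha: {
      sliderElementSelector: ".slide-verify-slider",
      sliderHandleSelector: ".slide-verify-slider-mask-item",
    },
  },
};

console.log(JSON.stringify(spiderOption, null, 2));
```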