• January 7th, 2016
  • admin

Intelligently Scale your Resilience to BS

Whew! It’s been a while, and loads of good things have happened over the past few months! Lots of folks have needed all kinds of help with their web data solutions, with projects ranging across the board from workflow automation challenges to product pricing intelligence platforms, and even some higher-level data curation and analytical components. So, needless to say (but I will anyhow), it’s been a fun and exciting time, and with that in mind I bring you this new short blog post.

In every space there are a few terms that get bandied about, and the world of Web Scraping is no exception. You frequently hear people chirp about their Scale, claim that their solution is Resilient, or tout some level of Intelligent detection and processing. But what exactly do these terms mean, do they work… or better yet, do they matter? Unfortunately the answer to all of these questions is “…it depends.” There is just no one-size-fits-all approach when it comes to buying software, data, or a service.

What does scale mean?

I won’t sit here and posit that all platform companies (enterprise, cloud, desktop) out there are in the business of pulling the wool over your eyes, but they will definitely make truthful claims that can carry many meanings. “Oh yes, we can scale…” can mean that your configuration (license, servers, network) can go from one website to 20 with no real trouble, or it can mean that the software has no known bottlenecks when you add five servers to support those 20 sites. I am sure you can see where I am going here: it boils down to cost. What does licensing look like when you have to add five more servers, and what will the infrastructure, bandwidth, or even human capital cost? That idea of scale can quickly end up costing you more $$$ than you ever planned to spend on your solution.

So find out what scale actually means for each vendor. Have your solution providers demonstrate it against relevant sources, content, and data flow. As the $$$ of a deal goes up, so should the proof points you ask for and the vendor’s receptiveness to providing them. Don’t feel uncomfortable about asking for a protracted evaluation period. If at any point things start becoming wishy-washy, throw your hands up, let them know you are calling BS, and give them the latitude to respond. After all, they did just expend time, resources, and opportunity to bring you along this far.

What does it mean to be resilient?

Can you take a punch? It’s that simple: if data moves around on the page or in the source code, can the software still programmatically sniff it out? This isn’t a “Yes, it can!” sort of verification process on your part. Much like testing for scalability, determining the resilience of a solution is something that needs to be demonstrated. Ask for it just as you would ask for a demonstration of programmatic extraction. After all, it is one of the hot topics in the space.
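To make “demonstrated” a bit more concrete, here is a minimal sketch in Python of the kind of check you might ask a vendor to run. Everything in it is hypothetical: the two page versions, the class names, and the price are invented, and both extractors are toy stand-ins rather than any vendor’s actual technique. One extractor depends on the exact markup of the first page; the other looks for any element whose class mentions “price”. The check passes only if an extractor returns the expected value on both versions of the page.

```python
import re
from html.parser import HTMLParser

# Two hypothetical versions of the same product page. In the second,
# the site has been redesigned and the price has moved.
PAGE_V1 = """
<div class="product">
  <h1>Widget</h1>
  <span class="price">$19.99</span>
</div>
"""

PAGE_V2 = """
<section id="item">
  <div class="details"><p>In stock</p></div>
  <h2>Widget</h2>
  <div class="buy-box"><b class="product-price">$19.99</b></div>
</section>
"""


def brittle_extract(html):
    """Depends on the exact markup of version 1; returns None if it changes."""
    match = re.search(r'<span class="price">\$([\d.]+)</span>', html)
    return float(match.group(1)) if match else None


class PriceFinder(HTMLParser):
    """Collects text inside any element whose class attribute mentions 'price'."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while inside a price-like element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if self._depth or "price" in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts.append(data)


def resilient_extract(html):
    """Finds a price-like element anywhere, then parses the first dollar amount."""
    finder = PriceFinder()
    finder.feed(html)
    match = re.search(r"\$([\d.]+)", " ".join(finder.texts))
    return float(match.group(1)) if match else None


def resilience_check(extractor, pages, expected):
    """True only if the extractor returns the expected value on every page variant."""
    return all(extractor(page) == expected for page in pages)


if __name__ == "__main__":
    pages = [PAGE_V1, PAGE_V2]
    print("brittle:", resilience_check(brittle_extract, pages, 19.99))
    print("resilient:", resilience_check(resilient_extract, pages, 19.99))
```

Run as a script, it should report the brittle extractor failing and the class-based one passing. The class-name heuristic is itself just another assumption about the site; the point is to test the claim against layouts you actually care about, not to trust this particular rule.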

What is Intelligent web extraction?

So you have a smart thinking machine that is going to do the work for me? Tell me more… Artificial Intelligence (AI) and Machine Learning (ML) are not myths! But does your dear vendor actually have algorithms that demonstrate an autonomous level of sophistication, able to make a judgement call between a transposed Sale Price and List Price? Or better yet, what does it really mean to you on a practical level? I would bet that almost every platform, from data extraction all the way up to analytics stacks, has some machine learning algorithms built in to help with the identification and flow of data. But how does that translate into real $$$ with what you are buying? Is this something you should pay more for? Are you seeing a benefit from it because there is a measured increase in resilience, performance, or…? It’s up to you to decide whether this is just a bright shiny button that does or does not launch global thermonuclear war.
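One practical way to answer “does it matter?” is to compare a vendor’s intelligent feature against a dumb baseline. The minimal Python sketch below is a plain rule, not machine learning: it assumes hypothetical scraped records with `list_price` and `sale_price` fields and treats a sale price higher than the list price as a sign that the two were transposed. If an ML feature doesn’t catch meaningfully more than a rule like this on your own data, that tells you something about what you are paying for.

```python
from dataclasses import dataclass


@dataclass
class PriceRecord:
    # Hypothetical fields for one scraped product.
    sku: str
    list_price: float
    sale_price: float


def fix_transposed(record):
    """Swap the two prices when the sale price exceeds the list price.

    A simple rule, not machine learning: a sale price above the list price
    is treated as a sign the fields were extracted the wrong way round.
    Returns the (possibly corrected) record and whether a swap happened.
    """
    if record.sale_price > record.list_price:
        return PriceRecord(record.sku, record.sale_price, record.list_price), True
    return record, False


def swap_rate(records):
    """Fraction of records the rule had to correct (0.0 for an empty list)."""
    if not records:
        return 0.0
    swaps = sum(1 for r in records if fix_transposed(r)[1])
    return swaps / len(records)


if __name__ == "__main__":
    scraped = [
        PriceRecord("A-1", list_price=49.99, sale_price=39.99),
        PriceRecord("B-2", list_price=19.99, sale_price=24.99),  # looks transposed
    ]
    for rec in scraped:
        fixed, swapped = fix_transposed(rec)
        print(fixed, "swapped" if swapped else "ok")
    print("swap rate:", swap_rate(scraped))
```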

Finally, and my favorite, where does it break?

This is the fun proof point, and it is where you really start determining whether you are just another customer. A true solution seller should be walking down the aisle with you hand-in-hand, which means your prospective vendor should not be trying to “sell” you anything but should instead be helping demonstrate and build a future with their product cozily encapsulated in yours. As such, once you have earned that trust and value, there should be no hesitation on their end to help tease apart your use cases, sites, or workflow and point out not only where problems may arise but also solutions or workarounds for them. Unfortunately, you usually only find this behavior true to form when some serious $$$ is involved.

Series Note: This is the fifth in a series of publications on web data extraction. The goal of this series is to help shed some light on the full life cycle of web data extraction. If you have any specific questions, feel free to contact me at tom@frigginyeah.com or visit my website https://www.frigginyeah.com