Thursday, June 25, 2020

Web Scraping: Leave it All to AI or Add a Human Touch

What the US military once used as ARPANET (Advanced Research Projects Agency Network) is today known as the Internet. The data it carries has grown from a few gigabytes to 1.2 million terabytes. In 1995 the internet had 16 million users. Today there are more than 4.6 billion users, and the number grows with each passing second. The last two years alone account for 90 per cent of the data on the internet today.

This growth in internet users and their information has increased data storage needs exponentially. Whatever you do on the internet, you leave a digital trail. Even a random search by a random user counts towards internet trends and affects search engine indexing. Data centres now occupy the space of football fields. Major companies such as Google and Amazon provide cloud computing and cloud storage services to meet internet users’ demand for data storage. Because data must be replicated in case of natural catastrophe, dedicated servers consume even more space.

Surfing the internet can be fun for regular users like us, but for data scientists and businesses that need specific information it can become an uphill task. Finding a needle in a haystack is easier than finding the desired data on the internet manually. The amount of data created and stored by a single large company is so vast that it employs private data centres. From this, we can envision how much data is available on the internet.

The role of data science, data mining and data scraping has increased tremendously. Web scraping services are used mainly for data extraction and data analysis. Web scraping serves diverse purposes such as competitive business intelligence, research and analysis, consumer insights, security, and government work.

What is Web Scraping?



The extraction of data from websites is called web scraping or web harvesting. Specific data is copied from websites to a local database or spreadsheet. Web scraping services or data scraping services use the Hypertext Transfer Protocol (HTTP), directly or through a web browser, to fetch pages for data extraction. Scraping can also be done manually by visiting a particular page and copying its data into a spreadsheet.

Manual scraping is feasible when we are working for personal use and the data involved is limited. When we are dealing with a large amount of data, an automated process is essential. It is implemented using a bot.

Web scraping and web crawling are often mistaken for the same thing, but they are different. Web crawling is done by search engines to index hyperlinks, whereas web scraping extracts the data. Web scraping uses web crawling to fetch pages.

Websites nowadays are highly advanced, using GIFs, scripts, Flash animations and so on in an integrated ecosystem. Websites are developed with humans in mind, not bots, so data extraction becomes a challenging task. Data extraction relies on the data that websites store in text form. Mark-up languages such as HTML and XHTML provide the basic framework of any website. Specialised software uses this marked-up text for extraction.
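To make the idea of extracting text from mark-up concrete, here is a minimal sketch in Python using only the standard library's html.parser module. The sample page, the PriceTableParser class and the scrape_rows helper are made up for illustration; a real scraper would first fetch the page over HTTP and would have to cope with far messier HTML.

```python
from html.parser import HTMLParser


class PriceTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row being built, or None outside a <tr>
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # header rows (only <th>) stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment standing in for a downloaded product listing.
SAMPLE_HTML = """
<table>
  <tr><th>Phone</th><th>Price</th></tr>
  <tr><td>Model A</td><td>199</td></tr>
  <tr><td>Model B</td><td>249</td></tr>
</table>
"""


def scrape_rows(html):
    """Return the table's data rows as lists of cell strings."""
    parser = PriceTableParser()
    parser.feed(html)
    return parser.rows


rows = scrape_rows(SAMPLE_HTML)
```

The rows can then be written to a spreadsheet or database, which is the "copy to local storage" step described earlier.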

There are simple plug-ins for Google Chrome, such as Scraper and Data Scraper, used for web scraping. There is specialised software such as ParseHub and OutWit Hub for slightly more advanced web scraping. Major e-commerce companies such as Amazon and social networking companies such as Facebook provide APIs (application programming interfaces) for public data extraction.


AI is a necessary evil in data scraping. The sheer quantity of data has forced the adoption of AI. AI, as helpful as it can be, unnerves people with wild sci-fi fantasies that make the Matrix trilogy pale in comparison.


The legality of web scraping


“Just like the Wild West, the Internet has no rules.” Times have changed, in the Wild West and on the Internet. Computer fraud and abuse laws criminalise breaking into private computer systems and accessing data that is not publicly available. In 2016 hiQ Labs, a data science company, scraped publicly available LinkedIn profiles. LinkedIn termed this a violation of its policy against data usage without permission and authorization. hiQ took LinkedIn to court. In a landmark judgement for the legality of web scraping, the court ruled in favour of hiQ, finding that scraping publicly available data is not a violation of the Computer Fraud and Abuse Act.


The morality of web scraping


Web scraping is used in business for online price monitoring, price comparison and product review data. Real estate companies use it to gather competitors’ real estate listings. Websites use other websites’ public data for their own convenience without having to work for it. Web scraping lies in a moral grey area: sometimes its use can be justified under internet policy, and sometimes it is a complete violation of basic internet ethics.

If you are searching for phones within a certain price range and use a web scraping tool on a major e-commerce website to extract the data, that is quite ethical and can be justified. If you extract data from a content-based site whose USP is its unique content, such as a blogging website, and create a mirror site, that cannot be justified.

A basic moral conscience is necessary for making a righteous judgment in the age of the internet where the lines are quite blurred.

Concept of the good bot and bad bot



Support for web scraping is often linked with freedom of the internet and fair use of public data, but the picture is not as rosy as it seems. There are many bad bots, i.e. malicious automated software that can steal data by breaking into user accounts, overload servers with junk data, and harm websites.

AI bots get a bad reputation because of the malicious bots crawling the internet. Many websites prohibit web scraping. These websites use advanced tools to detect bots and prevent them from viewing their pages. Scrapers respond with techniques such as DOM parsing and simulation of human behaviour to extract data from sites.
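One way a well-behaved bot can respect a site's wishes is to read the site's robots.txt file before fetching anything. The article does not mention robots.txt, so treat this as an illustrative assumption about what a "good bot" might do. The sketch uses Python's standard urllib.robotparser; the rules, the bot name and the URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real bot would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "example-bot" and example.com are placeholders, not a real crawler or site.
can_read_products = allowed(ROBOTS_TXT, "example-bot", "https://example.com/products/phones")
can_read_private = allowed(ROBOTS_TXT, "example-bot", "https://example.com/private/accounts")
```

A bot that skips the pages it is not allowed to read is much less likely to be treated as a bad bot by the detection tools described above.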

Does it require adding a magic touch?



We are living in the age of artificial intelligence, the intelligence demonstrated by machines. Machines are incapable of thinking by themselves. Highly complex software is used to develop machine intelligence that learns, adapts and collects data. AI is now used in areas ranging from traffic regulation and pilot training in the aviation industry to critical fields such as nuclear reactors. AI has made it possible for a rover to roam on Mars.

People are apprehensive of AI and imagine a new world order where machines rule humans. This makes for a good sci-fi script or story, but the reality is far more mechanical. AI has made it possible to work in environments where humans could not survive. Sensitive areas such as the military and national security rely on AI for information processing. Human lives depend on AI working properly.

The internet is brimming with boundless data. Manual data extraction was tedious in the past, but with data storage reaching terabytes, it is nearly impossible. We have to implement AI for web harvesting and data mining services. AI can extract, store and process data from thousands of pages in a few seconds, whereas manual scraping covers only a few hundred pages in days. AI has made it possible to scrape websites with gigantic databases and analyse the data to form business strategies and predictions.

Does that mean AI has replaced humans, at least in web scraping? Well, the answer is not binary. AI does a spectacular job in web harvesting, but the human touch is indispensable. Data is extracted just as an ore is extracted: it has to go through processes such as flotation and smelting to be useful. The data gathered from a site could be repetitive, redundant or in the wrong format. When we extract this kind of data, we overload storage with unnecessary data. Data verification and data scrubbing cleanse inaccurate and corrupt records from the extracted data. These are state-of-the-art tools, but the ultimate power lies in the hands of a human.
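As a rough illustration of data scrubbing, the Python sketch below normalises scraped (name, email) pairs, drops records with an obviously malformed email or a missing name, and removes duplicates. The record layout, the sample data and the deliberately simple email pattern are assumptions made for the example, not a description of any particular scrubbing service.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def scrub_records(records):
    """Normalise (name, email) pairs and drop malformed or duplicate ones."""
    seen = set()
    cleaned = []
    for name, email in records:
        name = " ".join(name.split())      # collapse stray whitespace
        email = email.strip().lower()      # normalise case for comparison
        if not name or not EMAIL_PATTERN.match(email):
            continue                       # wrong format: discard
        if email in seen:
            continue                       # repetitive: discard
        seen.add(email)
        cleaned.append((name, email))
    return cleaned


# Hypothetical scraped records with typical defects.
RAW = [
    ("  Asha   Rao ", "Asha.Rao@example.com"),
    ("Asha Rao", "asha.rao@example.com "),
    ("Ben Lee", "ben.lee(at)example.com"),
    ("", "nobody@example.com"),
    ("Chen Wei", "chen.wei@example.org"),
]

clean = scrub_records(RAW)
```

Even after a pass like this, a person still has to decide whether the surviving records are actually relevant to the business question, which is the point the paragraph above makes.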

The intelligence of the machine is called artificial for a reason. An AI extracting data cannot judge whether the data fits the purpose the way a human does. Suppose a company wants to launch a new clothing line for teenage girls and is extracting data on what teenage girls find fashionable. Websites that want to stay on the first pages of search engine listings often use metadata incorrectly. The AI, being AI, will extract the data labelled as teenage fashion even though the data actually implies something else.


Many websites prevent web crawling by using CAPTCHAs, embedding information in media objects, requiring login access, changing the website HTML regularly, and so on. AI right now cannot get past these anti-scraping mechanisms.

In situations like these, the human touch becomes essential. As they say, “Artificial intelligence has the same relation to intelligence as an artificial rose has to a real rose.”

Tuesday, June 16, 2020

Scaling your eCommerce start-up in the age of Amazon


The popularity of Amazon is indisputable. From a humble bookselling start-up in 1994 to an eCommerce giant, Amazon has come a long way. Amazon holds its own in the cut-throat eCommerce business, and in many countries no competitor is anywhere near it. Amazon itself has an inventory of about 12 million items across all its categories and services, but counting all the items that Marketplace sellers list, that number expands to about 350 million. Amazon was responsible for 45% of US eCommerce spending in 2019, and its share is expected to rise to 47% in 2020.


With Amazon Web Services as its driving force, it seems impossible to stop this juggernaut. Amazon is a marketplace where cheap manufacturing meets sellers for branding. Amazon’s strategy of flooding the market with cheap products and free shipping has made survival harder for small and medium-scale competitors. Small e-commerce businesses have neither the capital nor the reach of Amazon. Some e-commerce businesses opt for competitive pricing. A competitor-based pricing strategy can be sustained during the initial stages of market entry, but it cannot be used effectively as you progress. An eCommerce business with low prices, high operating costs, and an expanding supply chain and distribution network will suffer losses.



The above statistics paint a dire picture indeed, but all hope is not lost. Amazon’s competitors in many countries have proven that even a giant like Amazon can be slain using the sword of ecommerce solutions and data mining services.


The ecommerce solutions to this modern-day problem can be found in the traditional business model which is still surviving and thriving in this online global marketplace.

Knowing yourself


Amazon has become so large, selling everything under the sun, that its brand image has turned generic. No consumer wants to connect with a bland and vapid brand image. It is a chink in the armor that can be exploited. Ecommerce businesses like Etsy, Myntra and Casper are successful because they know what they are offering. In the bookselling market, Barnes & Noble is giving Amazon a run for its money. Knowing your brand and building its image can help small ecommerce businesses hold their own in this competition. Be courageous enough to be authentic, just as a shopkeeper knows what his shop offers that the shop around the corner does not.

Knowing your market


It is like opening a frozen yogurt shop where everyone else is offering ice cream. The massiveness of Amazon can be turned to your advantage, as it cannot focus on every market opportunity. When you sell everything from toothpicks to table lamps, the focus on the quality of every listing becomes sloppy. Another ecommerce giant, Flipkart in India, has covered the market for mobiles and electronics extensively, which Amazon has not focused on. Myntra has covered the fashion market, where Amazon so far offers little. Nykaa has covered cosmetics, now a fast-growing market. There are many unexplored avenues in the ecommerce business.

Seeing through the eyes of the customer


A traditional shop owner keeps the focus on the ambiance and neatness of the shop. In ecommerce, that traditional customer experience is replaced by user experience (UX). E-commerce websites should be eye-pleasing and user friendly. Even an out-of-place colored icon can turn off some customers, who may vow never to return. A single bad experience on a website makes users 88% less likely to visit it again. Data mining techniques can capture various parameters of user behavior on the website. From the analysis of this data, various inferences about visitor behavior can be made.


Ecommerce sites can look at a customer’s time on site to gauge whether the site is to the visitor’s liking. A visit that is too short or too long is a problem either way. If it is too short, or the visitor abandons midway, the visitor has not found the site to their liking; in the case of abandonment, something just before the abandonment was not working for them. If the visitor stays too long, either the website was confusing or they did not find what they were looking for.
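The time-on-site reasoning above can be sketched as a small classifier. The thresholds (10 seconds and 30 minutes), the label names and the sample sessions are all assumptions chosen for illustration; a real site would tune them against its own analytics data.

```python
def classify_session(seconds_on_site, completed_purchase,
                     too_short=10, too_long=1800):
    """Label one visit using hypothetical time-on-site thresholds."""
    if completed_purchase:
        return "converted"
    if seconds_on_site < too_short:
        return "bounced"      # left almost immediately: site not to their liking
    if seconds_on_site > too_long:
        return "lingering"    # stayed very long without buying: confused or lost
    return "browsing"


def summarize(sessions):
    """Count how many sessions fall under each label."""
    counts = {}
    for seconds, purchased in sessions:
        label = classify_session(seconds, purchased)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical sessions: (seconds on site, whether a purchase was completed).
SESSIONS = [(5, False), (2400, False), (300, True), (120, False), (3, False)]
summary = summarize(SESSIONS)
```

A high share of "bounced" or "lingering" sessions would then point to the pages worth inspecting first.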

Not getting lost in the crowd


Amazon offers consumers quantity by flooding the market with products, but users sometimes prefer quality. This trade-off can be managed if a quality product is offered at somewhat competitive pricing. Presented with quality products, people will choose them even if they are pricey; the demand for Apple products is a living example. Yet even if you are offering a genuinely good product, without advertising you will be lost in a sea of cheap products. In traditional business, advertising is done through pamphlets and hoardings. In ecommerce, SEO helps you reach the first pages of search engine listings. The majority of web users view at most the first three pages of search results. Your presence on social media outlets such as Instagram and Facebook will build brand image, advertise new products, extend customer reach, and in turn increase site visits. As today’s customer base is millennial, aged 18 to 34, a social media presence is essential. Brand storytelling through videos and promotion of customer experiences have a high social impact on customers. Through social media presence alone, you can also reach prime search engine listings when users search with no particular product in mind. When users hit your site, even if no sale is made, it can improve your search engine presence.


Knowing your customer


When we visit a shop and the shopkeeper remembers our name, our preferences, or an item we recently purchased, we feel validated as customers. The same can be offered through websites too. Amazon also uses these tools, but customer engagement is deeper when the business is small and the customer connects with the brand image. Customer behavior can be tracked with customer behavior data analysis tools. A customizable site theme, suggestions based on previous searches, pitches for more items based on online searches, and “you may also like” suggestions give the customer an individualized experience and can lift sales of related items. Beautiful packaging adds to customer satisfaction. Make the packaging interesting enough that opening the product is itself an experience, inspiring customers to post on social media so that your product gets advertised for free. Amazon’s free shipping is difficult to compete with, but an efficient and reliable logistics service can overcome this hindrance. A clear shipping policy, package tracking, and a clear return policy can keep customers happy without free shipping.

Customer a true promoter and reviewer


Some shops offer customers suggestion boxes and ask them to review the shop experience. Collecting feedback helps improve services and close loopholes that competitors could exploit. On a b2b website, reviews, storytelling, and suggestions by customers can be included so that engagement is felt. Sometimes customers like your product but forget to review it. Email appending services, email verification services, or email data validation can be used, and emails can be sent to customers as a reminder to review. Online visitors are swayed more by customer reviews than by the seller’s product description when deciding on a purchase. Customer loyalty programs offering pre-sale benefits, exclusive discounts, and limited deals will incentivize customer loyalty and make customers feel connected and appreciated. Sellers can be offered basic and premium membership services through which they can create their homepage, reply to buying leads, and contact buyers. Amazon uses this strategy through its Amazon Prime service. There are over 150 million Amazon Prime members around the world, and they typically spend over $1,000 a year. The estimated percentage of Amazon Prime customers is 63%.

A customer forum helps customers connect and can even pitch products for sale. An FAQ section can help resolve doubts. Live chat tools can redress many customer issues. A site should provide room for customers to vent their frustration, or another site will. A proper customer complaint mechanism is important, and so is prompt resolution.


The foundation of Ecommerce Empire


All these strategies are just building blocks for an e-commerce business, but the driving force is SEO and data analytics tools. Skip tracing services, data verification, address search, web scraping services, data scrubbing, and data appending services play a major role.

These tools will help in b2b lead generation. Many web services provide ecommerce solutions, cloud computing services, and data servers for storage. With limited money to invest in dedicated servers, these online services can be used instead. Analytics tools are crucial to today’s marketing success. Tools such as Google Analytics, Bitly, Piwik, and Open Web Analytics are available for free.

Amazon is truly as far-reaching as the Amazon River, with a dominating presence in 16 countries. B2B giants such as Alibaba, IndiaMART, and eWorldTrade prove that, however uphill the task may be, it can be done.

Friday, June 5, 2020

5 Ways to Increase sales After the Corona Holidays

As the end of the corona holidays nears, the world rejoices as life returns to its normal routine. People are also filled with apprehension. While some countries such as South Korea, Taiwan, and Georgia have managed to contain the outbreak of COVID-19, other countries are still reeling from the pandemic. The death toll as of June is 380,265, and the virus is still infecting millions. India, despite sincere efforts, has been unable to contain it, with the death toll due to the coronavirus rising to 5,598. As a COVID-19 vaccine is still an elusive dream, learning to live with the virus is the only option.

The corona outbreak has shown the fragility of human life against nature. Corona has impacted the world on all levels: social, economic, and psychological. Its effects on the environment and on flora and fauna have been positive. The other repercussions are nowhere near positive.

The world economic ecosystem has been disrupted and, from the looks of it, recovery will take years. The world economy is estimated to lose 9 trillion dollars due to the coronavirus. The World Bank has estimated India’s growth rate to be the lowest in three decades. The lockdown has affected people socially and psychologically. As the world adjusts to social distancing, distance from profit has filled businesses with uncertainty. According to an IMF report, the world economy will shrink by 3 percent in 2020. This slowdown is the sharpest since the Great Depression of the 1930s.


The world economy is heading towards a deep recession. The COVID-19-induced economic shutdowns and market instability have led to a sharp rise in unemployment, a decline in government income, the collapse of the tourism and hospitality industries, stress on supply chains, and a reduction in the consumption of products and services. The majority of companies have been forced to reduce or completely stop production as demand has fallen. New start-ups are struggling for funds as the market has dried up. Stock markets in India have recorded their worst losses in history. The plight of migrant workers is not hidden. The government is obligated to spend its revenue on food security and medical facilities. The prime minister has announced a package of 20 lakh crore rupees. The effects of the loss of lives are immense.

The halting of industrial production, exports, and allied economic activities has sent b2b sales plummeting. Even after the corona holidays are over, whether b2b sales will regain their earlier momentum is still unpredictable. The one thing certain in business is uncertainty, and uncertainty fuels business with opportunities. With a well-thought-out action plan and courage, a rebound from these turbulent times is possible too. The five ways that will keep you afloat in stormy COVID-19 times are:

Keeping the spirit high



We cannot control the situations in our life, but we can control our reactions to them. An upbeat attitude will keep us sailing through tough corona times. The corona pandemic has affected people immensely. People’s hesitation and reluctance are genuine. The halting of the economic machine has broken the back of many and dried up the flow of money in the market. Many have faced income cuts while others have been laid off. Manufacturers and sellers have both faced major losses: manufacturers lost by stopping production, and sellers lost through the steep fall in purchases. The instability of the market has tied the hands of both manufacturers and sellers. These factors have in turn led to a drop in b2b sales, and new b2b leads are abysmal.

These are facts that should be accepted for what they are. These are the problems that every business is facing right now. The government’s economic packages are ventilators for an economy struggling with corona symptoms, but complete recovery is uncertain. Acceptance of the situation will open the doors for change. The belief that this too shall pass is the need of the hour. The eradication of corona is a hard nut to crack. People, along with businesses, have to learn to live with it.


Sense and sensitivity



With the world still recovering from the pandemic, a marketing strategy oblivious to the sensitivity of the matter will create a negative impact. An aggressive marketing approach in these times will leave a bad taste in your client’s mouth. A business in these times should come across as empathetic and concerned, not money-hungry and selfish. Brand image will take a serious hit if no compassion is shown even after this global tragedy.

B2b website content and social media communication channels should reflect the sincerity of the brand. Video content that educates people could be included on the site and on social media. The advertisements on the site should suit current conditions. It strikes the wrong note if people are apprehensive about travelling within their own town while your site advertises discounted world tours.

The valuable steps a company takes during corona times to help people and emergency service personnel demonstrate the humane nature of the brand. For example, the majority of companies have come forward to make and distribute masks, sanitizer, gloves, respirators, and other essential medical equipment. The steps taken as CSR (corporate social responsibility) should be publicized so that others are inspired. People connect with a brand that is aligned with their values and morals, not with an entity that is cold and unfeeling.

For b2b businesses involved in selling, hygiene and medical products could be included in the homepage listing. All e-commerce companies are taking this approach. Even in the apparel section, personal protective equipment (PPE), aprons, coveralls, etc. appear at the top of the listing.

Website themes and homepage content subliminally convey the intent of a business entity. These are small changes, but a series of minor changes can change the big picture too. Emails to the current customer base wishing customers well and expressing the brand’s eagerness to stand by them through tough times like this will create goodwill.

Without sensitivity in these times, a business will look like an evil corporation no one wants to associate with.


Tried and tested



Corona has also affected online trends. Keywords that once put a website on the first search pages have lost their charm. In this uncertainty over b2b lead generation, data mining and data analysis tools are useful like never before. What business intuition cannot predict, data analysis of customer behavior and SEO tools can. These customer insights are crucial in times like this, when once-stable sources of leads and sales are unstable.

Email appending, data scrubbing, address search, data scraping, and data verification are all tools at your disposal. The quarantine and lockdown have altered customer behavior. This shows in online trending topics, Twitter trends, etc.

Data gathered through these tools can help businesses include the right keywords in their content, which will increase their visibility online. With people relying on the web for news and updates on corona, correct use of data mining services works wonders. Even if b2b leads are not generated as expected, online presence will surely increase. That too will be beneficial in the long run.
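As a rough sketch of how gathered trend data might be turned into candidate keywords, the Python snippet below counts the most frequent words across a list of trending topics. The topic strings and the small stop-word list are invented for the example; real keyword research would draw on much larger data and dedicated SEO tools.

```python
from collections import Counter
import re

# Minimal, hypothetical stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "for", "in", "of", "to", "and", "on", "at"}


def top_keywords(topics, n=3):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    words = []
    for topic in topics:
        for word in re.findall(r"[a-z0-9]+", topic.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(n)


# Hypothetical trending searches collected during the lockdown.
TOPICS = [
    "Hand sanitizer in bulk",
    "Bulk face masks for offices",
    "Reusable face masks",
    "Face shields and masks",
]

keywords = top_keywords(TOPICS)
```

The words that surface most often are candidates to work into site content, subject to a human check that they actually fit the business.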

Exploring the unexplored



B2b leads and sales have taken a serious hit during corona times. Already established marketing channels cannot deliver lead generation; even as the corona holidays end, new leads on old channels have dried up. Unexplored marketing channels can be explored. If your business has never had a presence on Etsy, it’s time to create one. If you have never held a business account on VK, it is never too late. These explorations will open a gateway to a new customer base and new business opportunities. What is not in demand among your current clients may be in high demand among new ones. A customer outreach program can be run through online discussion forums. Many broadcasting sites have not yet been explored from a marketing point of view; that can be done now. Take this time to explore other channels for advertising. It will take time for life to return to its normal course, and with free time in hand, people find the online world safe to explore. While they do their part, you can do yours too.

Value the bird in the hand



While hoping for the best from new leads, remember that they may take time. The loyal customer stronghold should be safeguarded first. The current customer database can be used to identify high-value clients and focus the sales strategy on them. High-value clients offered discounts, premium membership benefits, and exclusive offers will respond positively. Clients still ordering during corona times should be valued. Once someone has made a purchase, they can be pitched other purchases or a reorder. It takes time to develop customer relations. A loyal customer base should always be cherished.
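Identifying high-value clients from an existing customer database can be as simple as ranking clients by total spend. The sketch below assumes orders are available as (client, amount) pairs; the client names, the amounts and the choice of total spend as the only measure of value are illustrative assumptions.

```python
from collections import defaultdict


def high_value_clients(orders, top_n=2):
    """Return the top_n client names ranked by total order value."""
    totals = defaultdict(float)
    for client, amount in orders:
        totals[client] += amount
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [client for client, _ in ranked[:top_n]]


# Hypothetical order history: (client name, order amount).
ORDERS = [
    ("Acme", 1200.0),
    ("Bolt", 300.0),
    ("Acme", 800.0),
    ("Crest", 950.0),
    ("Bolt", 150.0),
]

priority_clients = high_value_clients(ORDERS)
```

Order frequency or recency could be added as further criteria, for instance to single out clients who kept ordering during the lockdown.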

Every b2b business is different. What works for SaaS may not work for an e-commerce b2b. Keeping that in mind, these five ways are generalized for all b2b businesses. It’s up to you to choose the way that keeps your boat afloat. Corona times have been tough, and right now circumstances are grim. Opportunities can be found in the darkest of times if we remember to turn on the light. When staying home can save lives, staying in the game can save the business as well.