Pierre is an indie hacker living in France who has been building products with his lifelong friend, Kevin, for the past few years. After two failed businesses, the two co-founders built ScrapingBee, a web scraping API that handles proxies, headless Chrome, and CAPTCHAs for its users. Today, the tool is making over $3k/month and counting.
Hi Pierre! What's your background, and what are you currently working on?
Hey! My name is Pierre de Wulf, I’m 27, and I live in Paris, France. 18 months ago I quit my full-time job as a data engineer to go the indie hacker way with my lifelong friend, Kevin Sahin.
4 months ago, after trying many different things, we built ScrapingBee, a web scraping API that handles proxies, headless Chrome, and CAPTCHAs for you. Prior to this, we tried several other projects and businesses, but I’ll talk about those later.
ScrapingBee is particularly useful for people or companies who need to quickly scrape the web at scale and don’t want to spend too much time and energy handling that part, preferring to focus on extracting value from the data. It might be someone who needs to aggregate data for their next blog post about the aerospace industry, or a big company that needs to monitor thousands of restaurant reviews online.
At first, our API only returned the HTML of any URL you passed to it, but recently we’ve launched an API store that extracts structured data from specific services such as Instagram or Google.
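To give an idea of the developer experience, here is a rough sketch of what calling such an API could look like from Python. The endpoint and parameter names below are illustrative assumptions for this example, not the exact ScrapingBee specification.

```python
import requests

# Illustrative only: the endpoint and parameter names are assumptions
# made for this sketch, not the exact ScrapingBee API specification.
API_ENDPOINT = "https://app.scrapingbee.com/api/v1/"

def fetch_html(url: str, api_key: str, render_js: bool = True) -> str:
    """Ask the scraping API to fetch a URL and return the rendered HTML."""
    response = requests.get(
        API_ENDPOINT,
        params={"api_key": api_key, "url": url, "render_js": render_js},
        timeout=60,
    )
    response.raise_for_status()
    return response.text

# Example usage:
# html = fetch_html("https://example.com", "YOUR_API_KEY")
```

The point of such an API is that the headless browser, proxies, and CAPTCHA handling all happen behind that single HTTP call.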
Both Kevin and I work full-time on this, and remotely. We both have 100% technical backgrounds, and at first we were both tackling marketing and product tasks equally. Lately, Kevin has naturally taken the lead on the sales/marketing and operational aspects of the business, while I have on the tech/product side.
What's your backstory and how did you come up with the idea?
I think we can say that the whole ScrapingBee story actually began almost 3 years ago, when Kevin and I decided to build a fun side project together. I've known Kevin for the last 15 years, as we were in the same high school and later in the same university. Early on, he introduced me to the whole startup and indie hacking ecosystem by showing me the websites and books he loved. This piqued my interest, and Kevin and I constantly talked about startup ideas we had.
The thing was, we were only talking about ideas or coding some small stuff here and there, but we never really launched anything. Then, three years ago, my girlfriend gave me an idea that I found interesting: a product that would let you save all the products you are interested in buying online in one place and get a notification once they go on sale. I really loved the idea and, as always, I talked about it with Kevin. We decided to build it, and this time we would not make the same mistakes we had made with our previous, unfinished projects:
- We decided to use boilerplate code this time, so we did not have to build login, email validation, and user management from scratch.
- We only implemented the 2 most important features: saving a product from anywhere on the web with a Chrome extension, and receiving a notification once its price drops.
- Even if the result was not perfect, we would launch anyway.
At that time, I was working full-time in Paris and Kevin was writing his book about Java web scraping. We built ShopToList in about 1 month, mainly during weekends, decided to simply post it on three different subreddits, and it blew up. I remember it was r/frugalmalefashion, a subreddit dedicated to men's fashion discounts, that loved our idea the most: we quickly reached 1,000 upvotes, and in about 4 hours we had almost 600 users. We were ecstatic. For the first time, we had built something people liked, but more importantly, something people used!
The next week was all about Reddit posts and some content marketing. We also did a Product Hunt launch that went well, especially considering the state of the app. We quickly realized that while this app was probably a good idea, making a living out of it would be very difficult, and making a living out of our projects had always been the goal for us. It would be hard for several reasons:
- The product needed to be free, so monetization would come from affiliate links or ads, and both solutions need a huge user base to be profitable. We estimated that we needed at least 100k active users to reach that point with ShopToList.
- ShopToList is a B2C app, and consumers are used to incredibly well-polished apps, which ShopToList was not. We knew that in order to build a successful app we would also need to build a mobile app and add many social features.
- Acquisition is really expensive and hard, especially in the free B2C app market where your marginal revenue per new user is almost zero.
- Many ShopToList clones existed or used to exist, and it really seemed that only a few of them were making money.
Three months after launch, we decided to put ShopToList on hold. Even if it was not a million-dollar project, we considered it a success because it showed us that Kevin and I were capable of working together, and it gave us the confidence to build other products and show them to the world.
Fast forward a few months: out of curiosity, we looked at the ShopToList production database to understand our users. We wanted to know, on average, how many products people added, which websites were the most popular, and so on. We quickly discovered that out of 3,000 users, almost a dozen had added around 1,000 products each! This was huge, especially considering that we did not offer a way to bulk-add products to an account. We went into "FBI mode" and figured out that those users were actually e-commerce owners who were monitoring their competitors' pricing. EUREKA, Kevin and I thought: why not build a price monitoring system? The idea seemed really good on paper for many reasons:
- It is way easier to make money with B2B than B2C.
- Many price monitoring tools existed, which meant there were customers for them.
- Those tools were all very complicated to use.
- We could leverage ShopToList users to find our first clients.
We did things the usual way: we started by building a landing page and shared it in a dozen Facebook groups. We built an MVP in about 2 months, and thanks to a successful Product Hunt launch, we quickly got 150 users for our free beta.
In January, we decided to go out of beta and basically forced our customers to pay for the product if they still wanted to use it. I still perfectly remember that 10 minutes after sending this email, we had our first paying customers. Needless to say, things were looking great.
Over the following months, we tried to get as many users as possible by testing different marketing strategies, mainly content marketing. Our main problem was conversion: it was very low, approximately 1% (trial to customer). From day one we knew marketing would be key and that getting users to our website would be very challenging. But I think one of our mistakes was to underestimate how hard it can be to turn a user into a customer.
We tried many things to increase the conversion rate: we rebuilt our onboarding, wrote tons of content, had many calls, and it was just not working. We saw three big reasons for that:
- We failed to explain the value of price monitoring. We did not know e-commerce very well, and I think that this lack of knowledge played a key part in our failure to explain PricingBot's value to our users.
- Onboarding was long and tedious, as users had to match their products with their competitors'. We offered to do it for a fee, but no one was ready to pay for it (see the first point).
- We never knew who our target was. We never managed to understand what kind of e-commerce business needed our tool: was it small e-commerce websites? Drop shippers? Niche websites? We failed to find our market.
So basically, it all came down to our lack of knowledge about e-commerce and e-merchants. In June, we realized that PricingBot would never have the success we had hoped for, and we decided to launch something else.
And now I’ll finally talk about ScrapingBee.
Kevin and I are tech guys; we like to code and to build things. The common denominator of all our previous jobs and side projects was web scraping. We did a lot of it and saw many pain points that needed to be addressed. Because we used to pay for such an API and were not happy with the product, we decided to build one that would handle headless browsers, proxies, and CAPTCHAs for you.
We thought it was a good idea because:
- Web scraping is a really competitive market; if you have doubts about it, just type "web scraping" into Google and count the number of ads. That was a good sign, because it meant there was a lot of demand.
- This time we knew who our customers would be and how to reach them.
- Because we knew a lot about scraping, we had a lot to say about it, and we felt confident in our ability to explain the value of our product to potential customers.
How did you build ScrapingBee?
When building the initial version of ScrapingBee we had to keep 2 important things in mind. First, the solution had to be scalable and cost-efficient from day one; we are a 100% bootstrapped company and we could not afford to lose money. Secondly, we had to build ScrapingBee quickly, very quickly: our previous project, PricingBot, was not a success, and we could not afford to spend another 9 months with zero cash coming in.
With that in mind, we used our knowledge and past experience to make the most cost-efficient and time-efficient choices. We reused PricingBot's web app code for the (light) front-end and the billing management part of the app.
We deployed the whole thing, web application and database, on Heroku. The only thing we hosted elsewhere was Redis, as we needed lots of concurrent connections and those were really expensive on Heroku.
Doing this allowed us to have a usable version in a few days. But we had one big problem: we could not handle more than 10 concurrent connections at the time without spending tons of money. The reason was that our API requests a URL through a real Chrome browser in order to scrape web applications that rely heavily on JavaScript, and those computations are really CPU-intensive. This led us to serverless: we basically deployed a headless Chrome on steroids on AWS Lambda. We could now scale almost at will, as AWS Lambda allows up to 1,000 concurrent executions. The problem with AWS Lambda is that it is crazy expensive. To mitigate that, we paid for the $750 annual Pro plan of Product Hunt's Ship product to get $5,000 of AWS credits. Of course, that solution would only work for some time, but it left us enough time to build our own scalable system.
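To make the Lambda approach concrete, here is a minimal sketch of rendering a page with headless Chrome inside a Lambda function. It is an illustration of the technique, not ScrapingBee's actual code, and it assumes the pyppeteer library and a Lambda-compatible Chromium binary are bundled in the deployment package.

```python
# Minimal sketch: render a JavaScript-heavy page with headless Chrome inside
# an AWS Lambda function. Illustration only, not ScrapingBee's actual code;
# assumes pyppeteer and a Lambda-compatible Chromium are packaged with it.
import asyncio
from pyppeteer import launch

async def render(url: str) -> str:
    browser = await launch(
        headless=True,
        args=["--no-sandbox", "--single-process", "--disable-dev-shm-usage"],
    )
    try:
        page = await browser.newPage()
        # Wait until the page's JavaScript has (mostly) finished loading.
        await page.goto(url, waitUntil="networkidle2", timeout=30000)
        return await page.content()
    finally:
        await browser.close()

def handler(event, context):
    # Lambda entry point: each invocation renders one URL.
    html = asyncio.get_event_loop().run_until_complete(render(event["url"]))
    return {"statusCode": 200, "body": html}
```

Because each invocation renders a single URL, concurrency simply becomes the number of simultaneous Lambda invocations, which is what makes a ceiling of 1,000 concurrent renders reachable without running expensive always-on servers.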
Regarding proxies, we mixed our own private pool of IPs with several proxy providers from all around the world. The problem with proxy providers is that they are not all equal, and knowing which ones to trust is a challenge.
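As a simple illustration of what mixing proxy sources can look like, here is a small sketch of random proxy rotation with Python's requests library. The hostnames and credentials are placeholders, and a real rotation system would certainly add health checks, provider weighting, and retries.

```python
import random
import requests

# Placeholder proxy pool: a mix of a private IP pool and third-party providers.
# These hostnames and credentials are invented for the example.
PROXY_POOL = [
    "http://user:password@private-pool.example.com:8080",
    "http://user:password@provider-a.example.com:3128",
    "http://user:password@provider-b.example.com:8000",
]

def fetch_through_proxy(url: str) -> requests.Response:
    """Fetch a URL through a randomly chosen proxy from the pool."""
    proxy = random.choice(PROXY_POOL)
    # Route both HTTP and HTTPS traffic through the chosen proxy.
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
```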
That was it for our main product. For the landing page, thanks to Landen, a really nice landing page builder, it only took us 1 hour. Internally, we heavily used Slack for all synchronous communication between Kevin and me, and Notion as a knowledge base, CRM, project management tool, and finance tracker. Those two tools are really incredible in both their versatility and quality.
What were your marketing strategies to grow your business?
Our strategy came down to this: we knew web scraping and the developer "world" very well, so let's talk to developers about web scraping. Of course, in the beginning we posted ScrapingBee on dozens of startup listing websites, but our core strategy was the following.
We put CTAs on both of our blogs to redirect traffic to ScrapingBee and also wrote 3 (from our point of view) very successful pieces of content, each with more than 10,000 views:
- A small tutorial on how to scrape single-page applications
- An extensive general guide about web scraping without getting blocked
- A complete introduction to web scraping with Python
We don't believe in content marketing for the sake of content marketing. Sure, it is easy to write, or pay someone to write for you, three pieces of content per week that nobody reads but that Google will love. We did not want to do that; we really believed we could teach something to our readers and, hopefully, sell them ScrapingBee. This is why we spent almost 30 hours on every article. It is time-consuming, but we think it is well worth it, as it improves our SEO, brings huge traffic, and builds our community.
One company we think does this very well is Ahrefs. Of course, they have the resources to publish those kinds of posts almost twice per week, but if you read their blog you will learn a lot about SEO; it is actually interesting and insightful content, not just marketing nonsense.
We did not have the money for Facebook or Google Ads, and if you type "web scraping" into Google and count the number of advertisements, you can imagine how high the cost per click must be there. Instead, we discuss web scraping a lot on Facebook, Reddit, Dev.to, Hacker News, and some more niche forums. We don't always bring up ScrapingBee, but when we do, it converts pretty well.
What are your goals for the future?
As of right now, ScrapingBee has an MRR of $2,900. It is a weird position: of course we can't say it is a failure, but it is way too soon to talk about success. Our primary goal is to be able to make a living out of ScrapingBee, to reach ramen profitability as Paul Graham would say. We aim to reach this point as soon as possible. For us, it means reaching $5,000 in MRR.
Our second goal is to write many good pieces of content about web scraping and to finish our whole Python guide before the end of the year. We also want to seriously build a community around web scraping; we still don't know if we should do it on LinkedIn or Facebook, but we want to do it.
We also have an API store that we want to develop by supporting more and more websites, but we won't focus on this part before Q2 2020. We basically want to focus on as few things as possible and try to do them as well as possible. This is probably the most difficult part about running a business solo or with only one partner: there are just so many things we want to try and explore, but at the end of the day there are only 24 hours, and we have to prioritize. So from now on, our focus will be solely on content marketing and building a community.
On a more personal note, both Kevin and I want to prove to ourselves that we made the right choice in leaving our jobs a few years ago. We’ve been reading about startups and indie hacking for the last 6 years and want to know that it was not done in vain.
What were the biggest challenges you faced and the obstacles you overcame?
The first big misadventure we had with ScrapingBee was a legal one. Early on, ScrapingBee was called ScrapingNinja; we actually launched on Product Hunt with that name. One month after the PH launch, we received an email from a French company claiming to own the name and asking us to change it right away. This caused a lot of stress, made us lose a lot of time, and cost quite a bit in legal fees, because we had to do a whole rebrand in less than one month. We had to say goodbye to all our SEO efforts, find a brand new name, do proper legal research on it, find a new logo, and communicate about the change.
We are the only ones to blame for that mistake; had we done more thorough brand name research before launching, we would not have chosen ScrapingNinja. We will definitely not make the same mistake again.
The second one was about motivation. After working 8 months on PricingBot and seeing that nothing worked, we were really on the verge of quitting. We’ve known several people who were not able to give up on a bad business idea soon enough, and they ended up sticking with it for as long as 5 years. One of our biggest fears was to become one of those people. On the other hand, we knew that building a new business takes time and that you should not hope for overnight success, because it almost never happens. So we were at the point where we did not know if we had to quit, and we did not know if we had enough time to launch something else. We ended up building ScrapingBee because we knew we would never be sure of anything; better to build something out of this uncertainty than to just keep wondering.
What are your greatest disadvantages? What were your worst mistakes?
I’d say that our greatest disadvantage is that Kevin and I both have a purely technical background. Neither of us has ever worked in web marketing before, and we had to learn everything from the ground up. On the other hand, we’ve known each other for the last 15 years, so what we lack in complementary skills, we like to think we make up for in communication and in getting along with each other very well.
Secondly, we are completely bootstrapped, meaning that we can’t afford to waste money on costly experiments. At the beginning, at least, we had to be very careful about every one of our purchases. But as always, something good came out of it: because we could not afford to spend too much money, we chose to get the maximum out of every tool we were already paying for. 80% of our business is now run solely with Notion and Slack, and we learned a lot about how powerful those tools are. Now things are better: we had the chance to sell PricingBot last month, and in 2020 we will have much more money to spend. We can’t wait!
I think that our biggest mistakes were made with PricingBot, as explained before. I like to think that we did not make the same mistakes twice with ScrapingBee. But right now, ScrapingBee is at a level we never reached with PricingBot, so we have lots of new mistakes to make; we just haven’t had enough time to make them yet.
If you had the chance to do things differently, what would you do?
If I had the chance to talk to myself from 5 years ago, I think I would tell him not to build a price monitoring application, to start the web scraping work as soon as possible, AND to be really careful about brand names.
I would also tell him to double-check his math regarding the Product Hunt launch time. We actually launched one hour late, and it cost us the “Product of the Day” spot.
What are some sources for learning you would recommend for entrepreneurs who are just starting?
Two books I’ve read over the last two years really stood out:
About marketing, I can’t recommend enough "Traction" by Gabriel Weinberg and Justin Mares. What they basically say is that at the beginning, it is easy to lose a lot of time and energy trying every different acquisition channel (SEO, ads, content marketing, ...) out there. What you should do instead is find one channel that works well, or well enough, and stick with it to make the most out of it before exploring other solutions. We tried to do exactly that with ScrapingBee and it helped us tremendously.
The second book that really helped me was "Hello, Startup" by Yevgeniy Brikman. It's a technical book that explains all the things you should take care of, product-wise, to build maintainable and scalable products. I read it five years ago and learned a lot from it, especially as someone who, at that time, had never worked in a big company with a top-notch engineering team. Because it is hard to know what you don’t know, this book taught me a lot about the critical parts of building applications (logs, feedback, cloud, version control, …). I’d recommend this book only to people who have never worked in a tech company before.
As for websites, we’ve been Indie Hackers users, readers, and contributors for most of the last 3 years, and it really changed everything. 5 years ago, I thought that in order to build a successful company you needed to be a genius and have tons of money. How wrong I was. Indie Hackers taught me most of the “startup wisdom” I currently “have”.
I’ve recently been more involved in some Facebook groups, such as “SaaS Growth Hack”, and I am surprised at how useful and “ad-free” the contributions are over there.
I also learned to simply ask. Now, as soon as I see a fellow indie hacker or an entrepreneur doing something I am curious about, I just send them an email or even a LinkedIn message with my questions. So far I have a 100% response rate. People are very inclined to help each other; you just have to be respectful of their time (don’t ask them tons of things) and genuinely curious about them and their business.
Where can we go to learn more?
If you are interested in web scraping, I sincerely think you should check out our blog. We really do our best to write interesting content and have had really good results with it. Our most successful posts so far are our web scraping guide and our Python web scraping tutorial.
We are also trying to build a community on Twitter; you can follow us here. We currently have fewer than 100 followers, but considering we are not using any bots or tools to grow our audience, we are happy with the results. We will focus on that a lot in 2020.