Perspective journey, backend at scale

Lately I decided to make a change, call it a change of direction. My goal is to learn and explore the magical, secret and complicated dark side (backend) and use my knowledge of frontend and mobile to observe things from a different perspective. I have had the opportunity to do backend work several times in my career, but most of it was for specific services that were not built for scale and did not support multiple regions. Usually it was for startups at the beginning of their journey, and at that stage no one is thinking about it.

This post is mostly about our life as engineers at Agoda. I got the idea to share it because I get a lot of questions from candidates during interviews: What is Agoda? How does it work? It is actually very interesting, and hopefully it will help you understand us a little better.

Backend has a lot of challenges: how to handle high traffic load, different types of calls, sync vs async, communicating with other services (internal and external), dealing with peaks, billions of messages and more. At Agoda we work on the principle of microservices, and we are divided into scrums (Agile), where every scrum is responsible for specific services. I got the opportunity to work in the customer communication scrum, and the team's responsibility is to deliver messages to customers, whether through email, SMS or push notifications. Imagine that our service needs to process millions of messages and successfully deliver them to our users.

From my first days at Agoda, I learned that if you want to do this at scale you need clean code, a maintainable and extensible architecture and, no less important, as much automation as you can manage. Over the past few years our teams have grown a lot, and these points helped us a lot in the process, especially the automation part. For example, a project may start with several developers; if the code is built the right way from the beginning (with a modular structure, for example), it will be much easier to maintain and extend in the future when the team has more developers. If the process is fully automated, we remove the blocker of manual testing or manual deployment, and the developers can focus on the most important part, which is development.

Yes, you can say that it is a nice concept that is hard to put into practice, and you are right! But without it, at the moment when you need to move fast and grow fast you will get stuck, and your team will not be able to scale when you need it to.

Just to give you an example, our KPIs are very strict: we need to review PRs (pull requests) within 8 hours (especially if they come from other teams contributing to our code). This means that once a developer has finished their work, it should take no more than one day to reach production. To make that happen we use GitHub as our code storage solution and TeamCity as our CI/CD tool. We use the latest tools on the market for internal communication, so we use Slack, which is extensible, and we developed bots to improve our process.
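
To make the 8-hour review target concrete, here is a minimal sketch of the kind of check a reminder bot could run. It is not our actual bot: the function name and the choice to count plain wall-clock hours (rather than working hours) are assumptions made only for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 8-hour review target mentioned above. Counting wall-clock time
# is an assumption of this sketch, not a statement about the real KPI.
REVIEW_SLA = timedelta(hours=8)


def review_overdue(opened_at: datetime, now: datetime,
                   reviewed_at: Optional[datetime] = None) -> bool:
    """Return True if the PR waited (or is still waiting) longer than the SLA."""
    # If the PR was already reviewed, measure until that moment;
    # otherwise measure until the current time.
    end = reviewed_at if reviewed_at is not None else now
    return end - opened_at > REVIEW_SLA
```

A bot could call this for every open PR on a schedule and ping the assigned reviewers for the ones that come back as overdue.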

The review looks like this: a developer finishes development, opens a PR and assigns developers from the team to review it. The reviewers get notified by email (too many emails, people usually filter them out :)) and by a bot in Slack. After the review is done (after a couple of iterations if needed) and the PR is approved (for example with LGTM, "looks good to me", or a special label), another bot picks up this PR from Git and triggers a TeamCity build to run the unit tests, regression tests, automation tests, code analyzers and code coverage. If everything passes, the next step is to create the package and auto deploy.
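
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, the function names and the "ready-to-merge" label are hypothetical, and the checks are plain callables standing in for the real TeamCity build steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical approval markers: the post mentions "LGTM" or a special label.
APPROVAL_COMMENT = "LGTM"
APPROVAL_LABEL = "ready-to-merge"  # assumed label name, for illustration only

# Stage names follow the checks listed above, in the same order.
STAGES = ["unit tests", "regression tests", "automation tests",
          "code analyzers", "code coverage"]


@dataclass
class PullRequest:
    number: int
    comments: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)


def is_approved(pr: PullRequest) -> bool:
    """A PR counts as approved if a comment contains LGTM or the label is set."""
    has_lgtm = any(APPROVAL_COMMENT in c.upper() for c in pr.comments)
    return has_lgtm or APPROVAL_LABEL in pr.labels


def run_pipeline(pr: PullRequest,
                 checks: Dict[str, Callable[[PullRequest], bool]]) -> str:
    """Run every stage in order and stop at the first failure."""
    if not is_approved(pr):
        return "waiting for review"
    for stage in STAGES:
        if not checks[stage](pr):
            return f"failed: {stage}"
    # All checks passed: this is where packaging and auto deploy would happen.
    return "packaged and deployed"


if __name__ == "__main__":
    pr = PullRequest(number=1, comments=["lgtm, nice work"])
    always_pass = {stage: (lambda _pr: True) for stage in STAGES}
    print(run_pipeline(pr, always_pass))  # prints "packaged and deployed"
```

The point of the sketch is the shape of the process: nothing runs before approval, a single failing stage stops the PR, and no human is involved between approval and deployment.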

Today this is a market standard and it helps a lot. It may be a little hard to set up in the beginning, but after that you are ready to scale your team up because, as I mentioned before, you keep them focused on the code and not on testing and deployments.

The next topic is how our teams are built. We follow Agile methodology, and our scrums are self-contained (they have all the knowledge necessary to complete the team's tasks): developers, QA and a PO (product owner). Designers work with us when needed, but they are not part of the scrum. The goal of the scrum is to find the right balance between the technical debt it has, the maintenance the system needs, and business needs, which in our case means improving the product and generating more bookings. We work in sprints of 2 weeks: we have preplanning, planning, morning stand-ups to share progress and daily goals, and a retrospective to learn and improve, and then the cycle starts again.

Most of the tasks we work on run as experiments. Since our goal most of the time is to increase the number of bookings, we measure our success by the success of these experiments. From time to time it is a little challenging to find the right balance between business needs and the technical tasks we have, and it is up to the team to decide how to plan the work, as we need to achieve our KPIs every quarter.

The most interesting of these interview questions to answer is the one about the technology stack we use at Agoda. Most of our backend is written in Scala. This helps us a lot when, for example, we need some functionality from another team's system or need support (we have a strong internal community): because there is no new language to learn, nothing blocks us from extending their system and keeping a single source of truth. At Agoda we are driven a lot by data, which helps us make the right decisions. For that we use Hadoop to store our logs and events, and we process them with Spark and Kafka. This is just a small taste of what we have in the company, but it is a good starting point to see that we keep up well with the best and latest solutions.

For me specifically, it was very interesting to see and learn how our backend works across DCs (data centers) and processes such a large number of requests. It is amazing how a request comes in and gets its response in milliseconds, and how many systems are involved in the process of providing this data.

Stay tuned: in the next articles I will describe a little more about how we design our backend and the different architectural solutions we use to handle load in both sync and async modes, to handle availability of data across multiple DCs and more.

If you find this interesting and want to join our team, please apply at https://careersatagoda.com

Head of engineering @ Omise