Scaling Node.js in the Enterprise

Posted by Ancestry Team on March 31, 2015 in Development, Web

Last year we began an effort internally at Ancestry to determine if we could scale out Node.js within the frontend applications teams. Node.js is a platform that we felt could solve a lot of our needs as a business to build modern, scalable, distributed applications using one of our favorite languages: JavaScript. I want to outline the steps we took, the challenges we have faced and what we learned to this point after six months of officially scaling out a Node.js ecosystem within an enterprise organization.

Guild

We introduced the concept of a guild initially in Q4 of 2014 to get those who are doing anything with or who are interested in Node.js. The guild concept comes from Spotify and their agile engineering model which is a group of people who are passionate about a particular subject. In this case, we wanted to get everyone together to identify the steps we need to take to get Node.js adopted within the organization. We meet once a month and introduce topics related to Node.js which promotes a high level of transparency across the company and anyone is welcome to join and recommend topics. Once we established the guild, it was a great starting point to get passionate people in the same room.

Training

Before we began to invest in Node.js as a platform we wanted to ensure we had a consistent level of knowledge across our engineering group on building Node.js applications. We organized two training sessions for a small group of engineers both in our Provo, UT office and San Francisco offices which was lead by the awesome guys over at nearForm. The group was about 15 engineers in each session. The idea of keeping it small was that we wanted to provide a wide enough level of influence so that the individuals who were part of the training effectively start to build applications and in turn spread their own knowledge. This worked well as we had teams immediately starting to think about components that can be done in Node.js.

Interoperability

As you accumulate multiple technologies in your ecosystem you need to ensure they are all interopable. This means you need to decouple some of your systems, ensure you’re communicating over a common protocol everyone understands such as HTTP and using a good transport such as JSON. We have a lot of backend services in our infrastructure that were built with C#, so in order to support multiple technologies we needed to work with the dependent service teams to ensure we have pure REST services exposed.

We also distribute service clients via NuGet which is the standard package management system for C#, but this is not going to work for any other languages. You will need to ensure that you are building extremely thin clients with well documented API specifications. We want to treat our internal clients like we would with any external consumer of an API. This allows any platform to build on top of our backend services and allows us to prepare and scale for the future on any emerging technologies.

Monolithic Architecture

One of the biggest anti-patterns for Node.js applications are monolithic architectures. This is a pattern of the old where we build large applications that handle multiple responsibilities. The responsibilities were typically hosting the client-side components such as HTML, CSS and JavaScript, hosting an application API with many endpoints and responsibilities, managing the cache of all of the services, rendering the output of each page and so forth. This type of architecture has several problems and risks.

First, it’s extremely volatile for continuous deployment. Rolling out one feature of the application potentially can break the whole application, thus disrupting all of your customers.

It’s also extremely difficult to refactor or rewrite an application down the road if it’s all built as one big large application; having 3 or 4 separate components is easier to rebuild or throw away than 1 large one.

Last, everything should be a service. Everything. Having a large web application that is a combination of different responsibilities goes against this and they should be seperate.

As you begin to break down your monolithic applications, one recommendation is to use a good reverse proxy to route external traffic to new and separate applications while still maintaining integrity on your URI and endpoints.

Documentation

You need to document everything. We created an internal guide on anything and everything related to building Node.js applications at Ancestry. From architecture, best practices, use cases, web frameworks, supported versions, testing, deployment. Anyone within our engineering team who is interested in adopting Node.js is able to use this guide as a first step to get up and running. It ensures that we have an open and transparent model of how to setup, configure, build, test and deploy your applications. The document is an evolving document that we review often together as a group.

Define Governance

Since Node.js is so evolving, it would be wise to establish a small governance group to manage it within your organization. This group should be responsible for defining the standards, adoption of new frameworks, optimizations to architecture and so forth. Again, keep it transparent and open to provide a successful ecosystem. For example, this group decides which web application framework we use such as Express or Hapi.

Scaffolding

It’s extremely important to help engineers get started on a new platform. With technology stacks like Microsoft ASP.NET or Java Spring MVC the convention is a lot more defined. In the Node.js world, there are many different ways to do one thing so we want to make this process a bit more standardized and simple. We also want to ensure all engineers are including common functionality in their applications without having to individually add it in themselves one by one.

So we have built generators by using a tool called Yeoman. It allows you to define templates, or generators as they call them, to scaffold out new Node.js applications easily. This ensures consistency with the Node.js architecture, all common components and middleware is included, an initial set of unit tests with mocks and stubs are added, build tools are configured (such as Grunt to Gulp) and even scripting out your local hosting environment with Vagrant and Docker configuration.

Internal Modules

As your engineering teams begin to scale out efforts in Node.js, you will begin to need cross cutting functionality. One of the principals of Node.js is that it’s great at doing very lean things well. This is a core unix philosophy. In the case of Node.js it should also apply to your common functionality. The package management system for Node.js is NPM. When you build applications you’re essentially building a composite application from open source modules in the community. Today all of these are hosted on npmjs.org. But for larger companies who have security policies in place you do not want to publish your common functionality out to the public so you will need a way to host your modules internally.

Initially we went with Sinopia. It’a an open source NPM registry server that allows you to publish your modules internally to. It also acts as a proxy so that if the module isn’t hosted internally it will go fetch it from npmjs.org and cache it. This is great for hosting all of your common code as well as providing performance improvements since your build system doesn’t have to fetch the package every time.

Over time as more teams begun to publish packages we needed something that would scale better. We introduced Artifactory which provides a lot more functionality and also hosts many other package management systems such as NuGet, Maven, Docker, etc. This allows us to define granular rules around package whitelists, blacklists, aggregation of multiple package sources and more.

Ownership

Building common shared functionality across teams can be difficult to maintain. Our approach was more of an open source model. Each team has the ability to build common functionality that they need to implement a Node.js application, but they must follow a few rules to allow features, bug fixes and enhancements to go into modules. First, they have to define a clear readme.md in their git repository. Second, each module always has an active maintainer. This maintainer is listed right at the top of the readme.md and is the go to for questions or even pull requests. This allows for a flexible ownership model and transparency on these common bits of functionality. You absolutely must agree on your process as an organization for this to work.

Security

When you adopt any new platform you need to ensure security is a top concern. We’ve done this by using the helmet module which gives you a lot of protection against common web attacks like XSS and so forth which suites most of the OWASP Top 10. It’s easy enough for anyone to use and comes as Express middleware. We are also investing in authentication at our middleware layer as well.

You also want to make sure that the modules you’re using are trusted modules. Since the Node.js community is built by free and open source modules there is a risk that an engineer will use one without validating if it’s trusted or secure. We want to use only modules from trusted sources we know or who have a high level of confidence on npmjs.org. This is also where our internal npm registry comes in so that we can effectively blacklist npm modules that do not fit our criteria.

Last, ensure modules are validating your licensing model. Using a module that is MIT license is good. But as an enterprise you may have more strict requirements on other licenses. I recommend looking into off the shelf software to do this or initially investing in some open source tools. There are some npm modules that can do this for you.

DevOps

In your DevOps organization, you will need to make adjustments to support Node.js deployments most likely. A Node.js application deployment works differently than other applications but it’s actually quite simple. Here we use Chef for our provisioning of deployments, so we needed to make adjustments to our Chef recipes to add support for Node.js.

We needed to provision our servers to install Node.js, install Supervisor and install Nginx. We use this setup to gain the hiighest amount of throughput in a production environment.

Supervisor manages the Node.js process to ensure if it dies it is automatically restarted. It also manages the amount of instances of Node.js than run on the server. We take advantage of multiple cores on the server to scale both vertically and horizontally.

Nginx manages the inner process balancing of the incoming requests across the Node.js instances. Nginx is extremely efficient and is able to scale web requests really well. We prefer to use the tools that do a specific job and do it well.

If you have already used Node.js you are aware of the cluster module. The concern with using the cluster module to load balance your requests is that it’s still experimental according to the Node.js stability index. We prefer to build a long lasting model around deploying and managing Node.js instances in the case the cluster module changes it’s API or gets deprecated one day.

Community

The Node.js is a really amazing community. We leverage this community as much as we can in many ways. One way is we have reached out to others in the community to collaborate with them on how they overcame challenges with their adoption of Node.js. We’ve also brought in a few speakers to talk with our engineering group about the same topic as well as build a relationship with others in the community. For example, we’ve invited both Groupon and PayPal in to talk with our group which provided a lot of insight and you recognize than everyone has different business models but we’re after a lot of the same goals in regards to technology such as scalability, performance, security and so forth.

Envy

As we have continued to make progress and start to ship Node.js applications to production, something interesting started to happen. We’ve had other teams begin to want to do new applications in Node.js, prototype some new ideas in Node.js which effectively has started to create engineer envy. The way we want to roll an emerging platform out is through this model. If your engineering team feels there is a problem that is being solved, and it will help them be better at their job then they are much more inclined to adopt it. Happy engineers can ultimately lead to amazing products and new ideas.

Future

So what are our next steps in scaling Node.js here at Ancestry?

We’re continuing to invest in more common and cross cutting concerns. This is crucial to ensure as teams have common dependencies we get them built and in the right way. Optimizations in our architecture. Ensuring everything is exposed as a service communicating over common protocols and transports is crucial for some applications. We continue to make some more introductions to other industry leaders in the Node.js space and be more visible which is extremely helpful. More presence at Node.js meetups. We are also working to host Node.js meetups in our SF office soon.

This year we are also pushing to build our application service architecture around the microservices architecture. This includes also optimizing our application delivery platform with containerization and Docker.

Conclusion

Overall, it’s been an awesome learning experience for us but we are just getting started. But Node.js doesn’t come as free lunch and takes work. Hopefully this may help your adopt it efficiently and give you some tips. Oh, and we’re hiring!

Comments

Gleb

April 4, 2015 at 1:19 am

hi,
Great post. How do you handle testing and code quality across the organization?

Do you participate in local node meet ups and conferences related to nodejs and JavaScript

Reply
- Robert Schultz
  
  April 5, 2015 at 4:49 am
  
  Thanks for the feedback.
  
  Our testing and code quality is handed by a separate test automation team. Every engineer owns their quality and tests though as we like to ensure testing is part of the culture and is really a feature.
  
  I do personally, a few others as well in SF. We’re working to host a Node.js meetup soon in our SF office.
  
  Reply
David

November 21, 2015 at 10:09 pm

I have no clue what your actual involvement with Ancestry.com was last year, but I will say that I know what a disaster their new and improved services are. Apart from the fact that thousands of users find the new design revolting, website stability is a joke, constant functional problems arise daily. I don’t think I would be bragging about any involvement in their operation. It’s a joke and a fiasco. That of course is just my opinion.

Reply
George

December 17, 2015 at 7:43 pm

How do you store such huge volumes of interconnected data if not centrally located?

Reply

Join the Discussion

We really do appreciate your feedback, and ask that you please be respectful to other commenters and authors. Any abusive comments may be moderated. For help with a specific problem, please contact customer service.