Analytics with MongoDB (or.. who sharded??)
Previously, I told you about how much we at @LocalResponse love Node.js. Well, last night we had our first NYC Node.js Meetup and it totally rocked! It’s great to see the energy and the passion developers have for new technologies. Today, I want to delve a bit deeper into another technology that we love… MongoDB.
First, a super quick primer on MongoDB and how it differs from typical relational databases. Mongo is schema-less, which means the database is not confined to a strict schema like the ones imposed on MySQL databases. As a concrete example, consider a standard MySQL table with several columns, each with a type, and rows to fill those columns. If you’ve defined a column named amount as an int, then whatever data you put in there must always be an int, or else bad, bad things happen. With Mongo, there is no such constraint. Within one collection (the Mongo equivalent of a MySQL table), you can have completely different documents (MySQL’ers call these rows) with wildly differing data sets! I won’t get into more detail than that, except to say there are TONS of awesome resources out there to learn more about MongoDB.
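To make the schema-less point concrete, here’s a hedged sketch in plain JavaScript. The document shapes (`eventA`, `eventB` and their fields) are made up for illustration, not from any real collection of ours, but they show the kind of mixed documents that could happily live side by side in one Mongo collection:

```javascript
// Two hypothetical documents that could coexist in a single Mongo collection.
// In MySQL, a column typed as INT would reject the second `amount`.
const eventA = { user: "alice", amount: 42, tags: ["promo"] };
const eventB = { user: "bob", amount: "forty-two", location: { lat: 40.7, lng: -74.0 } };

// Same field name, completely different types and surrounding structure:
console.log(typeof eventA.amount); // "number"
console.log(typeof eventB.amount); // "string"
```

Mongo won’t complain about either insert; any type-checking you want is up to your application code.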
So one problem that we have is: how do we access LOTS of data quickly? We threw around a lot of ideas, and the one we settled on (for now) is to split one large collection into thousands and thousands of smaller collections (not that these smaller collections are all that small — er, each one can grow to ~50,000 documents). MongoDB’s sharding then allows these different collections to be stored on different machines, and we don’t need to worry about those technical details. Go Mongo!
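The splitting trick boils down to deterministically routing each document to one of many small collections. Here’s a minimal sketch of that idea — the function name `collectionFor`, the bucket count, and the hash choice are all assumptions for illustration, not our actual implementation:

```javascript
// Number of sub-collections to split the one big logical collection into.
// (Illustrative value — the real count would be tuned to keep each
// collection under the ~50,000-document mark.)
const NUM_BUCKETS = 4096;

// Simple deterministic string hash (djb2 variant).
function hashKey(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = ((h * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Map a key (e.g. a user id) to a collection name like "events_0372".
// The same key always lands in the same collection, so reads stay cheap.
function collectionFor(key) {
  const bucket = hashKey(key) % NUM_BUCKETS;
  return "events_" + String(bucket).padStart(4, "0");
}

console.log(collectionFor("alice")); // deterministic, e.g. "events_NNNN"
```

Because the mapping is deterministic, any reader or writer can compute the target collection locally with no lookup table, and Mongo is free to spread those collections across shards however it likes.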
We are now able to perform (near) real-time analytics on thousands (and thousands and thousands) of data points with minimal load on our system.
Share your thoughts, community! You can tweet me: @shamoons and let’s keep the conversation going. I’d love to hear how others have solved similar problems.
Till next time,
Shamoon Siddiqui, Director of Technical Magic