Apr 14

MongoDB Presentation for CS202

You can download the MongoDB presentation that I prepared for my CS202 course here.


Feb 14

MongoDB notes

Nowadays, I am working on an academic research project that requires a database server. Because of the amount of data we will have, and because we want to store JSON files directly in the database, we selected MongoDB, an open source NoSQL database and a very popular one.

Knowing only the traditional way to query a database like MySQL or SQLite, I faced a fairly steep learning curve before I understood the basic functionality. Once I grasped that, learning everything else was significantly faster.

So, here are some small notes that I gathered from lots of different sources and wanted to share.

For me, handling and designing data is like surgery. Let’s assume you have a growing amount of data, with the record count climbing into the hundreds of thousands. While you record the data, the source may deliver some fields corrupted or in a form you do not expect, even if you have read all of its documentation. (Well-documented data sources are rare, and even the ones you do find are not always perfect.) In my case, I was fetching data with geolocation features (latitude, longitude) from more than one source. I used inheritance and abstraction: an abstract base class defines the common point, and one subclass per source converts that source’s data according to its differences.
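A minimal sketch of that class layout in Python. The class names, the raw field names ("lat", "lng", "position") and the two sources are hypothetical, made up for illustration; only the idea of an abstract base with one converting subclass per source comes from the notes above.

```python
from abc import ABC, abstractmethod


class BasePoint(ABC):
    """Common base: every source must end up as a GeoJSON point."""

    def __init__(self, raw):
        self.raw = raw  # the record exactly as the source delivered it

    @abstractmethod
    def lon_lat(self):
        """Return (longitude, latitude) as floats."""

    def to_geojson(self):
        lon, lat = self.lon_lat()
        # GeoJSON stores longitude first, then latitude
        return {"type": "Point", "coordinates": [lon, lat]}


class SourceAPoint(BasePoint):
    """Hypothetical source with separate "lat" and "lng" fields."""

    def lon_lat(self):
        return float(self.raw["lng"]), float(self.raw["lat"])


class SourceBPoint(BasePoint):
    """Hypothetical source with one "lat,lon" string in "position"."""

    def lon_lat(self):
        lat, lon = self.raw["position"].split(",")
        return float(lon), float(lat)


point_a = SourceAPoint({"lat": 41.0, "lng": 29.0}).to_geojson()
point_b = SourceBPoint({"position": "41.0,29.0"}).to_geojson()
```

Calling float() in every subclass means a source that sends numbers as strings still ends up stored as numbers.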

However, several days after implementing these services, I wanted to run 2dsphere-based queries in Mongo to search for records close to a desired location. It didn’t work. I figured out that one source was the only one that gave its coordinates in String form. Fortunately, I didn’t have many records yet. I ran a small update in the MongoDB console to convert those strings to numbers, and magic happened! Then I changed my subclass immediately. I learned that I should not get lost in the joy of sending a JSON file without looking at the data. Testing, evaluating the results and testing again is very important.
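A rough sketch of that kind of clean-up in Python, assuming each record keeps its position as a GeoJSON-style point in a “loc” field and that a pymongo collection is passed in. The record layout and the function names are assumptions for illustration, not the command I actually ran.

```python
def coordinate_fix(record):
    """Build an update that stores the record's coordinates as floats."""
    lon, lat = record["loc"]["coordinates"]
    return {"$set": {"loc.coordinates": [float(lon), float(lat)]}}


def fix_all(collection):
    """Rewrite every record whose first coordinate is still a string.

    `collection` is assumed to be a pymongo Collection.
    """
    broken = {"loc.coordinates.0": {"$type": "string"}}
    for record in collection.find(broken):
        collection.update_one({"_id": record["_id"]}, coordinate_fix(record))


sample = {"_id": 1, "loc": {"type": "Point", "coordinates": ["28.9784", "41.0082"]}}
update = coordinate_fix(sample)
```

Filtering on the string type first keeps the loop away from records that are already correct.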

After that, queries like “get me the records that are at most this far away from me” were easy. (Note that $maxDistance is in meters when the point is given as GeoJSON in $geometry; otherwise it is in radians.)
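A minimal sketch of such a query document, built as a plain Python dict. The helper name near_query and the sample coordinates are assumptions for illustration; the “loc” field name is the one used in these notes.

```python
def near_query(longitude, latitude, max_distance_m):
    """Filter for records within max_distance_m meters of a point."""
    return {
        "loc": {
            "$near": {
                # GeoJSON point: longitude first, then latitude
                "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
                # meters, because the point is given as GeoJSON
                "$maxDistance": max_distance_m,
            }
        }
    }


# Records at most 1 km away from a sample location
query = near_query(28.9784, 41.0082, 1000)
```

With pymongo this dict would be passed to the collection's find() method.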

Before running this query, make sure that you have indexed the “loc” field as “2dsphere”. $near requires a geospatial index.
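A short sketch of creating that index from Python, assuming a pymongo collection object is passed in. The function name ensure_geo_index is hypothetical.

```python
# Index specification: field name paired with index type
GEO_INDEX = [("loc", "2dsphere")]


def ensure_geo_index(collection):
    """Create the 2dsphere index on "loc".

    `collection` is assumed to be a pymongo Collection, whose
    create_index() accepts a list of (field, type) pairs.
    """
    return collection.create_index(GEO_INDEX)
```

Creating an index that already exists with the same specification is harmless, so this can run at start-up.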

Other handy notes:

A basic query: find me 5 items whose “city” field is Istanbul and print them prettily. In the console that is a find() on the “city” field followed by limit(5) and pretty().
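The same idea sketched in Python, assuming a pymongo collection is passed in. The function names are hypothetical, and pprint stands in for the console's pretty-printing.

```python
from pprint import pformat

CITY_FILTER = {"city": "Istanbul"}
LIMIT = 5


def find_in_city(collection):
    """Return at most LIMIT records matching CITY_FILTER.

    `collection` is assumed to be a pymongo Collection: find() returns
    a cursor and limit() caps how many records it yields.
    """
    return list(collection.find(CITY_FILTER).limit(LIMIT))


def pretty(documents):
    """Format each record on its own block of lines."""
    return "\n".join(pformat(doc) for doc in documents)


text = pretty([{"city": "Istanbul"}])
```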

In this case, suppose you have customer data where customers were expected to enter two mail addresses or only a primary one. A query that combines $or with $exists gets the primary and secondary mail fields of all customers who actually have a “primary mail” field or a “secondary mail” field. Logical operations like this are very powerful and save you time.
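A sketch of such a filter and projection as plain Python dicts. The field names primary_mail and secondary_mail and the function name are assumptions for illustration; the real field names depend on your data.

```python
# Match customers that have at least one of the two mail fields
MAIL_FILTER = {
    "$or": [
        {"primary_mail": {"$exists": True}},
        {"secondary_mail": {"$exists": True}},
    ]
}

# Return only the two mail fields and leave out the _id
MAIL_PROJECTION = {"primary_mail": 1, "secondary_mail": 1, "_id": 0}


def customer_mails(collection):
    """`collection` is assumed to be a pymongo Collection."""
    return list(collection.find(MAIL_FILTER, MAIL_PROJECTION))
```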

If you want to get rid of a field, that is easy too: an update with the $unset operator removes it.
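A small sketch of removing a field with $unset, assuming a pymongo collection is passed in. The helper names and the sample field “fax” are hypothetical.

```python
def unset_field(field_name):
    """Build an update that removes field_name from a record.

    The value given to $unset is ignored, so an empty string is used.
    """
    return {"$unset": {field_name: ""}}


def drop_field(collection, field_name):
    """Remove field_name from every record.

    `collection` is assumed to be a pymongo Collection; the empty
    filter {} matches all records.
    """
    return collection.update_many({}, unset_field(field_name))


update = unset_field("fax")
```

Narrowing the empty filter to a real condition limits the removal to the matching records.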

For more details about writing a query you should also check this page.

I wish I could have written more about MongoDB and, of course, about using it with a Python adapter. Maybe next time.