We are able to work with all types of data: numeric or text, structured or unstructured, big or small. We have a highly capable team of data scientists, well versed in the principles of statistics and expert in programming in R and Python.

Some of the problems we have solved are described below.


Text Mining

Text mining refers to analyzing a corpus of text data and finding useful patterns in it. For example, we may collect tweets containing the keywords "presidential elections" over a week and try to find out how public sentiment toward the various candidates, as expressed by the tweeting population, is changing. This is called sentiment analysis. Similarly, we may be interested in finding the major topics of public discussion about these candidates. This is called topic identification. Moreover, we may want to assign each tweet, based on its content, to one of a set of predefined classes such as political, social or crime. This problem is called classification. We have created a framework which downloads tweets and then performs sentiment analysis, topic identification and classification on them. Tweets are particularly difficult to handle because they are short messages and people use a lot of slang in them.
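
To make the idea of sentiment analysis concrete, here is a minimal sketch in Python. It is not the framework described above: it assumes a tiny hand-made word list (`POSITIVE`, `NEGATIVE`), a few invented sample tweets, and hypothetical helper names (`tokenize`, `sentiment_score`, `top_terms`). Counting frequent terms is only a crude stand-in for topic identification, not a real topic model.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would need a far larger resource
# and special handling for slang, negation and sarcasm.
POSITIVE = {"great", "good", "love", "win", "strong"}
NEGATIVE = {"bad", "terrible", "hate", "weak", "lose"}
STOPWORDS = {"the", "from", "on", "for", "at", "a", "of", "and"}


def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return its words."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", cleaned)


def sentiment_score(tweet):
    """Positive word count minus negative word count (0 means neutral)."""
    tokens = tokenize(tweet)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def top_terms(tweets, n=3):
    """Most frequent non-stopword terms across all tweets."""
    counts = Counter(
        token
        for tweet in tweets
        for token in tokenize(tweet)
        if token not in STOPWORDS
    )
    return counts.most_common(n)


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Great debate tonight, love the plan from @candidate_a! http://example.com/x",
    "Terrible answer on taxes, bad night for #CandidateB",
    "Debate starts at nine",
]

for tweet in sample_tweets:
    print(sentiment_score(tweet), tweet)
print(top_terms(sample_tweets, n=1))
```

A word-list approach like this is easy to inspect but breaks down quickly on real tweets, which is one reason short, slang-heavy messages are hard to handle.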

Recommendation Systems

When we visit Amazon looking for a book, the site not only shows the book we are looking for, it also recommends other books which we may be interested in. This recommendation is a win-win for the visitor as well as Amazon. Amazon hosts millions of books, and it is not possible for a user to go through all of them to find the ones they may like. Thus, recommending books based on what the user may like is a great service to the user. It is also a great way for Amazon to sell. Most of these recommendations are based on collaborative filtering, which suggests items using the preferences of users with similar tastes. We have created user-user collaborative filtering algorithms and applied them to the publicly available MovieLens data set.
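
The following is a minimal sketch of user-user collaborative filtering in Python, written for illustration rather than taken from our implementation. It assumes a small invented ratings table (not MovieLens data), cosine similarity computed over co-rated items, and hypothetical function names (`cosine_similarity`, `predict`, `recommend`).

```python
from math import sqrt

# Invented ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items that both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    # None signals that no similar user has rated this item.
    return numerator / denominator if denominator else None


def recommend(ratings, user, n=2):
    """Return up to n unseen items, highest predicted rating first."""
    seen = set(ratings[user])
    all_items = {item for user_ratings in ratings.values() for item in user_ratings}
    scored = []
    for item in all_items - seen:
        score = predict(ratings, user, item)
        if score is not None:
            scored.append((score, item))
    return [item for score, item in sorted(scored, reverse=True)[:n]]


print(predict(ratings, "alice", "D"))
print(recommend(ratings, "alice"))
```

Here the prediction for alice leans toward bob's rating because their past ratings agree more closely than alice's and carol's do. In practice, ratings are often mean-centered per user before computing similarity, and only the most similar neighbours are used.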