kbabuji

Topic modeling with Gensim

I first tried topic modeling while playing around with the NIPS dataset. I used scikit-learn's LDA and NMF implementations and was happy with the results; for exploratory data science tasks like this, I think scikit-learn is one of the most appropriate tools available. But I had bigger plans: tackling even bigger datasets. That is when I was introduced to Gensim, another Python library, which is focused on topic modeling. Among its many features, it includes transformations such as online LDA, LSA and HDP, and wrappers for other popular libraries like scikit-learn, Vowpal Wabbit and Mallet.

The code can be viewed at my GitHub repository.


Shelter Animal Outcomes

A puppy taken in at a shelter

Picture taken from SOS

Shelter Animal Outcomes is a knowledge competition hosted by Kaggle. My goal was to learn and familiarize myself with the techniques, methods and tools required to solve classification problems. In my adventures I used pandas, matplotlib, seaborn, Jupyter and scikit-learn. The goal of the competition is to predict the outcome for a cat or a dog, given its age, gender (including whether it is spayed/neutered or not), breed and color. The possible outcomes are return-to-owner, adoption, transfer, euthanasia and died. In some cases we are provided with outcome subtypes too.
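Features like age and gender usually need some cleaning before a classifier can use them. The sketch below assumes the raw values are strings such as "2 years" or "Spayed Female"; that format, the helper names `age_to_days` and `split_sex`, and the rough days-per-unit conversion are assumptions for illustration, not necessarily what the notebooks do.

```python
# Approximate conversion factors; months and years are rounded.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def age_to_days(age):
    """Convert a string such as '2 years' to an approximate number of days."""
    if not age:
        return None  # missing value
    count, unit = age.split()
    # Strip a plural 's' so that 'years' and 'year' map to the same key.
    return int(count) * DAYS_PER_UNIT[unit.rstrip("s")]


def split_sex(value):
    """Split 'Neutered Male' into (sex, is_fixed); both are None when unknown."""
    if not value or value == "Unknown":
        return None, None
    status, sex = value.split()
    return sex, status in ("Neutered", "Spayed")


print(age_to_days("2 years"))      # 730
print(age_to_days("3 weeks"))      # 21
print(split_sex("Spayed Female"))  # ('Female', True)
print(split_sex("Intact Male"))    # ('Male', False)
```

Splitting the gender field this way gives the model two separate signals, the sex itself and whether the animal is spayed/neutered, instead of one combined category.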

All my work related to this can be found in this GitHub repository, which houses my Jupyter notebooks.
