Thursday, April 14, 2016

Logistic regression on R - faillll

^Xiaobai, one of the dogs that my family is interested in adopting from ASD! 

Today I thought I'd try logistic regression to find the variables that might predict whether a dog gets adopted. (I'm using the same dataset as in my previous post, from Kaggle.)

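#stringsAsFactors=TRUE keeps text columns as factors (needed for levels() in R >= 4.0)
dataset = read.csv("train.csv", header=TRUE, na.strings=c(""), stringsAsFactors=TRUE)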
dog = subset(dataset, AnimalType == "Dog")
dog = subset(dog, OutcomeType!="Return_to_owner")
#add column to check if adopted or not (vectorised, no loop needed)
dog$adopted = dog$OutcomeType == "Adoption"
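#remove columns that I don't want in the model
#note: column 4 (OutcomeType) is still kept here; since adopted is derived from it, it probably should be dropped too
dog_filtered = dog[-c(1:3,5,6)]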
#logistic regression
model <- glm(adopted ~.,family=binomial(link='logit'),data=dog_filtered)
summary(model)
#model did not converge, too many breeds and colors
levels(dog_filtered$Color)
#returned 365 results
levels(dog_filtered$Breed)
#returned 1380 results

But it failed terribly. R warned me that the model did not converge. I should have realised it would when I checked the levels for color and breed and both returned well over 100 unique values (365 colors and 1380 breeds).
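A quick way to catch this before fitting is to count the levels of every factor column in one go. Here is a minimal sketch on a made-up toy data frame (`toy` and its values are invented for illustration, they are not the Kaggle data):

```r
# Toy stand-in for dog_filtered; the values are invented for illustration
toy <- data.frame(
  Color = c("White", "Black/White", "Brown", "Black"),
  Breed = c("Pug", "Beagle Mix", "Pug", "Husky"),
  adopted = c(TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = TRUE
)

# Find the factor columns, then count the distinct levels in each one
factor_cols <- sapply(toy, is.factor)
level_counts <- sapply(toy[factor_cols], nlevels)
print(level_counts)
```

In a model like `glm(adopted ~ .)`, every level of a factor beyond the first gets its own coefficient, so a column with hundreds of levels adds hundreds of terms to the model.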

This is a screenshot of the results returned. Super fail ):


My next step is to create columns that indicate whether each color is present on the dog. For instance, I will have columns for white, black and brown. If a dog is white/black, it'll have "1" under the white and black columns and "0" under brown.
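Here is a rough sketch of how those indicator columns might be built. It assumes the colors in the Color column are separated by "/" and uses a made-up toy data frame and a hypothetical short list of base colors (`toy` and `base_colors` are my own illustration names):

```r
# Toy stand-in for the Color column; values are invented for illustration
toy <- data.frame(
  Color = c("White/Black", "Brown", "Black", "White"),
  stringsAsFactors = FALSE
)

# Hypothetical short list of base colors to flag
base_colors <- c("White", "Black", "Brown")

# Split each entry on "/", e.g. "White/Black" becomes c("White", "Black")
parts <- strsplit(toy$Color, "/", fixed = TRUE)

# One 0/1 column per base color: 1 if that color is among the dog's colors
for (col in base_colors) {
  toy[[tolower(col)]] <- as.integer(sapply(parts, function(p) col %in% p))
}

print(toy)
```

This only matches exact color names. If the real data has values with extra descriptors attached to a color, those would need some additional handling before they land in the right column.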

PS: I'm really taking baby steps with this blog! 
