Logistic regression in R - faillll
^ Xiaobai, one of the dogs that my family is interested in adopting from ASD!
Today I thought I'd try logistic regression to find the variables that might predict whether a dog gets adopted. (I'm using the same dataset as in my previous post, from Kaggle.)
# read the Kaggle training data, treating empty strings as NA
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which the levels() calls below rely on
dataset <- read.csv("train.csv", header = TRUE, na.strings = c(""),
                    stringsAsFactors = TRUE)
dog <- subset(dataset, AnimalType == "Dog")
dog <- subset(dog, OutcomeType != "Return_to_owner")
# add column to check if adopted or not (vectorised, no loop needed)
dog$adopted <- dog$OutcomeType == "Adoption"
# remove columns that I don't want in the model
# note: OutcomeType is column 4, so it is still kept here; because adopted is
# derived from it, it predicts adopted perfectly, which on its own can stop
# glm from converging
dog_filtered <- dog[-c(1:3, 5, 6)]
# logistic regression
model <- glm(adopted ~ ., family = binomial(link = "logit"), data = dog_filtered)
summary(model)
# model did not converge, too many breeds and colors
levels(dog_filtered$Color)
# returned 365 results
levels(dog_filtered$Breed)
# returned 1380 results
But it failed terribly: R returned the message "model did not converge". I should have realised this would happen when I checked the levels for color and breed, since both returned far more than 100 unique values (365 colors and 1380 breeds).
This is a screenshot of the results returned. super fail ):
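In hindsight, a quick count of the levels in each factor column before fitting would have flagged the problem. Here is a minimal sketch of that check. The toy data frame and its values are made up for illustration and only stand in for dog_filtered.

```r
# Toy stand-in for dog_filtered; the values are made up for illustration.
toy <- data.frame(
  Color = factor(c("White/Black", "Brown", "Black", "White")),
  Breed = factor(c("Pug", "Beagle Mix", "Pug", "Chihuahua Shorthair")),
  adopted = c(TRUE, FALSE, TRUE, FALSE)
)

# Find the factor columns, then count the levels in each one.
is_factor <- sapply(toy, is.factor)
level_counts <- sapply(toy[is_factor], nlevels)
print(level_counts)
```

Each factor level (minus one reference level) becomes its own coefficient in glm, so a column with hundreds of levels is a warning sign before the model is even fitted.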
The next step for me is to create columns that indicate whether each color is present on a dog. For instance, I will have columns for white, black and brown. If a dog is white/black, it'll have "1" under the white and black columns and "0" under brown.
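A rough sketch of how those indicator columns could be built is below. It assumes the Color values are color names separated by "/", as in the white/black example. The toy data frame and the short list in base_colors are my own placeholders, not the real data.

```r
# Toy colors in the "Primary/Secondary" style; made-up values.
toy <- data.frame(
  Color = c("White/Black", "Brown", "Black", "Brown/White"),
  stringsAsFactors = FALSE
)

# Base colors to flag (an assumed short list; extend as needed).
base_colors <- c("white", "black", "brown")

# Split each Color string on "/" into its lower-cased component colors.
parts <- strsplit(tolower(toy$Color), "/", fixed = TRUE)

# For each base color, add a 0/1 column: 1 if it is one of the components.
for (col in base_colors) {
  toy[[col]] <- as.integer(sapply(parts, function(p) col %in% p))
}
print(toy)
```

The match is exact on each component, so a compound name such as "Brown Brindle" would not count as brown here. Swapping the %in% test for grepl() would be one way to catch those, at the cost of looser matching.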
PS: I'm really taking baby steps with this blog!