Miscarriage rates in the US
I applied for the Google Squared Data and Analytics programme, and the test we had to do was an exploratory analysis of the Natality dataset.
It's hosted on Google's servers, and we had to access it through their BigQuery application, which is pretty cool.
https://bigquery.cloud.google.com/table/publicdata:samples.natality
But because we only had 3 days, and I was busy studying for my Six Sigma quiz at the same time, I didn't have time to explore the data fully and take the analysis up a notch.
Since we were only given around 5 slides to present the findings, I chose to focus on a specific topic: miscarriage rates and their possible causes. My findings are attached.
The data were retrieved from BigQuery using SQL queries. Surprisingly, I didn't have many problems with it. The downloaded data were then managed in Excel, and the charts were generated in Excel as well. I could have used R to feel more "pro", but honestly, I think Excel is the best way to customise charts, so I ended up doing most of the data work in Excel too.
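For anyone who wants to reproduce a similar pull in code instead of the BigQuery web interface plus Excel, here is a minimal Python sketch. It is an illustration under stated assumptions, not the exact query behind the findings: it assumes the standard-SQL copy of the table at `bigquery-public-data.samples.natality`, and it uses the `born_dead` and `ever_born` columns, grouped by `mother_age`, as a rough proxy for a miscarriage rate. The names `QUERY`, `fetch_rows` and `rates_by_group` are hypothetical, and it's worth checking the table schema for unknown-value codes before trusting the numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical standard-SQL query. Assumes the public sample table
# `bigquery-public-data.samples.natality` with columns mother_age,
# born_dead and ever_born; check the schema before relying on it.
QUERY = """
SELECT
  mother_age,
  SUM(born_dead) AS total_born_dead,
  SUM(ever_born) AS total_ever_born
FROM `bigquery-public-data.samples.natality`
WHERE born_dead IS NOT NULL
  AND ever_born IS NOT NULL
GROUP BY mother_age
ORDER BY mother_age
"""

Row = Tuple[int, int, int]  # (group, born_dead, ever_born)


def fetch_rows(query: str = QUERY) -> List[Row]:
    """Run the query on BigQuery (needs google-cloud-bigquery and credentials)."""
    # Imported lazily so rates_by_group below works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [
        (r["mother_age"], r["total_born_dead"], r["total_ever_born"])
        for r in client.query(query).result()
    ]


def rates_by_group(rows: Iterable[Row]) -> Dict[int, float]:
    """Sum born_dead and ever_born per group and return their ratio.

    Groups whose ever_born total is zero are skipped to avoid dividing by zero.
    """
    born_dead: Dict[int, int] = defaultdict(int)
    ever_born: Dict[int, int] = defaultdict(int)
    for group, dead, ever in rows:
        born_dead[group] += dead
        ever_born[group] += ever
    return {g: born_dead[g] / ever_born[g] for g in ever_born if ever_born[g] > 0}


if __name__ == "__main__":
    # Offline example with made-up rows: (mother_age, born_dead, ever_born).
    print(rates_by_group([(20, 1, 10), (30, 0, 5), (20, 1, 10)]))
    # prints {20: 0.1, 30: 0.0}
```

Returning raw sums and dividing afterwards keeps the rate correct even when the same group shows up in several downloaded batches, which would not hold if you averaged per-row rates instead.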
Some things I would have done if I had the luxury of time:
- Represent the data as a map using QGIS
- Build an R Shiny application that lets users interact with the data (this would probably take me a month though ahaha)
But overall, I think I did okay, and I actually found the process really fun. It has been a while since I've had a dataset I could work on in my own way. I always felt slightly restricted working on data in school projects, because I was confined to the methods my group agreed upon. But group work does have its advantages: it gives a wider perspective than the narrower one I would have taken had I delved into the data alone.