Playing around with Apple’s App Store Data and NoSQL Databases and Some Random Fun Facts
I’ve been playing around with Apple’s App Store Data (Enterprise Feed) and initially tried dumping it into RavenDB. I like the idea of a Lucene-powered NoSQL database, but after about 5 hours I decided to punt on RavenDB for a couple of reasons.
– The RavenDB administration user interface was not intuitive to me
– RavenDB costs $999 to use
The C# API to RavenDB was fine, but it did seem slow to insert all the data.
So I settled on MongoDB, which did a marvelous job of inserting the data. I found a third-party GUI tool called MongoVUE that seems decent enough, although it costs money after two weeks. The first thing I did was look for You Doodle in the App Store data, and it was there, safe and sound. Then I thought, “What is the largest app in the Apple App Store?”…
Answer: The Witcher 2: Assassins of Kings Enhanced Edition
It comes in at a whopping 21,040,027,421 bytes (that’s about 21 GB; yikes, hope you have an iPad 3 or 4 with 64 or 128 GB :))
And the tiniest app…
Well, that depends. If you throw out all the entries with a size of zero, then it is…
HiCon Lite at a measly 16,623 bytes. I was skeptical so I downloaded it. Before I could blink it was ready to open on my device. This cute little app lets you take a photo and turns it black and white.
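For anyone curious how a lookup like that might go, here is a minimal Python sketch. It is not the query I ran in MongoDB; it works on plain dictionaries, and the field names `title` and `file_size` are assumptions made up for illustration, not the feed’s real column names. The third record is a hypothetical placeholder standing in for the zero-size entries.

```python
def size_extremes(apps):
    """Return (largest, smallest_nonzero) records by file size in bytes.

    Records whose size is missing or zero are ignored, mirroring the
    "throw out all the zeroes" step. Returns (None, None) if nothing is left.
    """
    sized = [a for a in apps if (a.get("file_size") or 0) > 0]
    if not sized:
        return None, None
    largest = max(sized, key=lambda a: a["file_size"])
    smallest = min(sized, key=lambda a: a["file_size"])
    return largest, smallest


# Two sizes taken from the post, plus one hypothetical zero-size entry.
apps = [
    {"title": "The Witcher 2: Assassins of Kings Enhanced Edition",
     "file_size": 21_040_027_421},
    {"title": "HiCon Lite", "file_size": 16_623},
    {"title": "Placeholder with no size", "file_size": 0},
]

largest, smallest = size_extremes(apps)
print(largest["title"], largest["file_size"])
print(smallest["title"], smallest["file_size"])
```

Filtering before taking the minimum is the whole trick: without it, the zero-size entries win the “smallest app” contest.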
Anyway, I noticed several problems with Apple’s enterprise feed and have notified Apple about them.
– Not all of the screenshots are included
– They do not include release notes for past updates
– The app pricing is bundled with the iTunes music and collection pricing, which requires a staggering 200 GB download. I wish that it were a separate file.
– The items in the data files are not sorted by primary key, which required me to merge sort each file on the primary key before aggregating everything together
– Keywords are not included – I know this is a sensitive issue since this is how search works in the App Store, but come on, what if I want to make a search service for App Store apps? Guess I can do title, genre and description.
– Reviews are not included
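The sort-then-aggregate step from the list above can be sketched roughly like this in Python. This is an illustration under assumptions, not the code I actually used: each “file” is modeled as a list of `(primary_key, fields)` tuples, the sample rows are made up, and Python’s built-in `sorted` stands in for a hand-written merge sort. Files too large for memory would need an external sort instead.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sort_by_key(rows):
    """Sort one file's rows by primary key (the first tuple element)."""
    return sorted(rows, key=itemgetter(0))


def merge_on_key(*sorted_files):
    """Merge already-sorted files and combine rows that share a primary key."""
    merged = heapq.merge(*sorted_files, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        record = {"id": key}
        for _, fields in group:
            record.update(fields)
        yield record


# Hypothetical, unsorted sample data standing in for two feed files.
apps = [(3, {"title": "C"}), (1, {"title": "A"}), (2, {"title": "B"})]
details = [(2, {"file_size": 200}), (1, {"file_size": 100})]

records = list(merge_on_key(sort_by_key(apps), sort_by_key(details)))
for record in records:
    print(record)
```

Once every file is sorted on the same key, the merge is a single pass over each file, which is why having the feed pre-sorted would have saved a step.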
It’s nice that they provide the data, but it’s obviously a neglected process that isn’t given much love. In some cases, even the documentation on Apple’s website doesn’t match what’s in the files.