HLLs and Polluted Registers
It’s worth thinking about how things can go wrong, and what the implications of such occurrences might be. In this post, I’ll be taking a look at the HyperLogLog (HLL) algorithm for...
HyperLogLog++: Google’s Take On Engineering HLL
Matt Abrams recently pointed me to Google’s excellent paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm” [UPDATE: changed the link to the...
Doubling the Size of an HLL Dynamically – Unions
Author’s Note: This post is related to a few previous posts dealing with the HyperLogLog algorithm. See Matt’s overview of the algorithm, and see this post for an overview of “folding” or shrinking...
Doubling the Size of an HLL Dynamically – Extra Bits…
Author’s Note: This post is related to a few previous posts on the HyperLogLog algorithm. See Matt’s overview of the algorithm, and see this for an overview of “folding” or shrinking HLLs in order to...
Foundation Capital and Aggregate Knowledge Sponsor Streaming/Sketching...
We, along with our friends at Foundation Capital, are pleased to announce a one-day mini-conference on streaming and sketching algorithms in Big Data. We have gathered an amazing group of speakers from...
Data Science Summit – Update
I don’t think I’m going out on a limb saying that our conference last week was a success. Thanks to everyone who attended and thanks again to all the speakers. Muthu actually beat us to it and wrote up...
Sketch of the Day: Frugal Streaming
We are always worried about the size of our sketches. At AK we like to count stuff, and if you can count stuff with smaller sketches then you can count more stuff! We constantly have conversations...
Differential Privacy: The Basics
Hi! I’m Anthony Tockar. I am a master’s student at Northwestern University and have been working with the Science team for the summer. This is the first of two posts I will contribute on the topic of...
Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset
In my previous post, Differential Privacy: The Basics, I provided an introduction to differential privacy by exploring its definition and discussing its relevance in the broader context of public data...
Neustar at the 2015 Grace Hopper Celebration of Women in Computing
Author’s Note: Hello readers! I’m Julie Hollek, and I am a data scientist at Neustar, where I focus on understanding questions around identity. Before joining Neustar, I received my PhD in astronomy...