Hanukkah of Data

Hanukkah of Data is a set of data puzzles based around a storyline, released in 2022. I found out about this a few months past its release, and recently solved them using SQLite.

The styling for this page is inspired by the look of Jupyter Notebooks, however I solved these in the SQLite command line shell.

0. Noah’s Market

The story revolves around finding the missing rug of Noah, who’s the owner of a store called Noah’s Market in Manhattan. The dataset is a backup of the Noah’s Market database, created by Alex in 2017. However, it is password protected.

“Alex set up the backups to be password-protected. I can never remember the password itself, but it’s just the year in the Hebrew calendar when Alex set up the database.”

Some searching later, I found the password - 5777. With the dataset unlocked, I had a look at the tables and their schema.

.open noahs.sqlite
.schema
CREATE TABLE products (sku text,desc text,wholesale_cost decimal(10,2),dims_cm array);
CREATE TABLE customers (customerid integer,name text,address text,citystatezip text,birthdate text,phone text,timezone text,lat decimal(10,5),long decimal(10,5));
CREATE TABLE orders (orderid text,customerid text,ordered timestamp,shipped timestamp,total decimal(10,2),items array);
CREATE TABLE orders_items (orderid integer,sku text,qty integer,unit_price decimal(10,2));

I found myself having to join these tables pretty often, so I saved the output as a view for easy reference. I also learnt about NATURAL JOIN as a shorthand to JOIN ON primarykey = foreignkey, making the command shorter (and it also removes duplicate columns!)

CREATE VIEW IF NOT EXISTS noahsrug AS
SELECT * FROM customers
NATURAL JOIN orders
NATURAL JOIN orders_items
NATURAL JOIN products;

The next set of puzzles have a similar structure - the description contains clues to a customer of Noah’s Market who may have the rug, and submitting the customer’s phone number solves the puzzle.

1. The Investigator

This person’s phone number is the numeric representation of their last name. So each letter corresponds to the digits printed on the phone buttons - “abc” to 2, “def” to 3, and so on.

While the solution seemed easy conceptually, there’s no existing SQLite function that maps characters to numbers, making this a lot harder than I expected. This led me to look into custom extensions, and I used the rot13 extension code as the basis for mine, phonenum.

phonenum.c
#include "sqlite3ext.h"
SQLITE_EXTENSION_INIT1
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static void phonenum(sqlite3_context *context, int argc, sqlite3_value **argv) {
  assert(argc == 1);

  if (sqlite3_value_type(argv[0]) == SQLITE_NULL)
    return;

  const unsigned char *in = sqlite3_value_text(argv[0]);
  int len = sqlite3_value_bytes(argv[0]), i = 0;
  unsigned char out[len];

  for (; i < len; i++) {
    unsigned char c = tolower(in[i]);
    switch (c) {
    case 'a':
    case 'b':
    case 'c':
      c = '2';
      break;
    case 'd':
    case 'e':
    case 'f':
      c = '3';
      break;
    case 'g':
    case 'h':
    case 'i':
      c = '4';
      break;
    case 'j':
    case 'k':
    case 'l':
      c = '5';
      break;
    case 'm':
    case 'n':
    case 'o':
      c = '6';
      break;
    case 'p':
    case 'q':
    case 'r':
    case 's':
      c = '7';
      break;
    case 't':
    case 'u':
    case 'v':
      c = '8';
      break;
    case 'w':
    case 'x':
    case 'y':
    case 'z':
      c = '9';
      break;
    }
    out[i] = c;
  }
  out[i] = 0;
  sqlite3_result_text(context, (char *)out, i, SQLITE_TRANSIENT);
}

#ifdef _WIN32
__declspec(dllexport)
#endif
int sqlite3_phonenum_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi) {
  int rc = SQLITE_OK;
  SQLITE_EXTENSION_INIT2(pApi);
  
  (void)pzErrMsg; /* Unused parameter */
  
  if (rc == SQLITE_OK) { 
    rc = sqlite3_create_function(db, "phonenum", 1, 
                                SQLITE_UTF8 | SQLITE_INNOCUOUS | SQLITE_DETERMINISTIC,
                                0, phonenum, 0, 0);
  }
  
  return rc;
}

A few hours of debugging later, I compiled the code as a shared object and loaded it in SQLite.

.shell gcc -fPIC -I/opt/homebrew/opt/sqlite/include -shared -o phonenum phonenum.c
.load phonenum

I used a combination of substr() and instr() to get the last name from the full name, before passing it to phonenum().

SELECT name, phone
FROM customers
WHERE phonenum(substr(name, instr(name, ' ') + 1)) == replace(phone, '-', '')
LIMIT 1;
Sam Tannenbaum|826-636-2286

2. The Contractor

Theis person’s initials are ‘JP’ and they shopped from Noah’s Market in 2017, while they were tasked with the cleaning of the rug. One of the purchased items mentioned were cleaning supplies, so I first looked for related items in the products database.

SELECT sku, desc FROM products WHERE desc LIKE '%rug %';
HOM2761|Rug Cleaner
SELECT name, phone
FROM noahsrug
WHERE name LIKE 'J% P%'
  AND strftime('%Y', shipped) == '2017'
  AND desc == 'Rug Cleaner'
LIMIT 1;
Joshua Peterson|332-274-4185

3. The Neighbor

The person lives in the the same neighborhood as the Contractor, and is a Cancer born in the Year of the Rabbit.

First, I got the name of the neighborhood using the phone number I found earlier.

SELECT citystatezip FROM customers WHERE phone == '332-274-4185';
Jamaica, NY 11435

Then coming to the clues about the person’s birthdate. The astrological sign, Cancer, gives us a range for the day and month (21 June – 22 July), while the Chinese zodiac, Rabbit, gives a range for the birth year.

I checked for the earliest and latest birth years in the dataset, to add all valid birth years in the WHERE clause.

SELECT min(birthdate), max(birthdate) FROM customers;
1935-01-29|2001-12-31
SELECT name, phone
FROM customers
WHERE ((strftime('%d', birthdate) >= '21'
        AND strftime('%m', birthdate) == '06')
       OR (strftime('%d', birthdate) <= '22'
           AND strftime('%m', birthdate) == '07'))
  AND citystatezip == 'Jamaica, NY 11435'
  AND strftime('%Y', birthdate) IN ('1939', '1951', '1963', 
                                    '1975', '1987', '1999');
Robert Morton|917-288-9635

4. The Early Bird

This person met The Neighbor at 5AM with pastries they just bought from Noah’s. This means that the order happened somewhere close to 5AM. I filtered for pastry orders by checking if the sku starts with BKY for bakery items.

The database also contains the ordered and shipped timestamps for each order. Since they were purchased from the store (as opposed to being ordered from home for example), the ordered and shipped time would be the same for this order.

SELECT name, phone
FROM noahsrug
WHERE strftime('%H', shipped) == '04'
  AND sku LIKE 'BKY%'
  AND ordered == shipped
LIMIT 1;
Renee Harmon|607-231-3605

5. The Cat Lady

The Early Bird decided to give away the rug during their “Marie Kondo phase”, which would be sometime after 2019. They gave it to a person from Staten Island, who was wearing a “Noah’s Market” sweatshirt and owned 10 or 11 old cats.

This means they purchased lots of cat food from Noah’s. There are products for Adult and Senior cats, so I filtered for those, and searched for the person with the most cat related orders.

SELECT name, phone
FROM noahsrug
WHERE citystatezip LIKE 'Staten Island%'
  AND desc REGEXP '.*(Adult|Senior) Cat.*'
  AND strftime('%Y', shipped) >= 2019
GROUP BY name 
ORDER BY count(*) DESC
LIMIT 1;
Nicole Wilson|631-507-6048

The sweatshirt clue turned out to be a red herring, which I did spend quite some time on. In hindsight, focusing on the puzzle’s title would have helped save some time. 😅

The speedrun description of this puzzle didn’t have the Staten Island clue, so I removed it and…the solution didn’t work. After some attempts at trying different queries and nothing really working, I looked solutions online. Turns out, I misunderstood the phrase “most cat related orders”.

My solution finds the maximum number of orders…but doesn’t consider the “quantity” of each order. Since the person owns 10 or 11 cats, they would buy 10 or 11 quantities of the same item! Knowing this made the solution much simpler than the original, and worked for both the original puzzle and the speedrun.

SELECT name, phone
FROM noahsrug
WHERE desc REGEXP '.*(Adult|Senior) Cat.*'
  AND qty >= 10
LIMIT 1;
Nicole Wilson|631-507-6048

6. The Bargain Hunter

I was stuck on this for a long time, and I ended up slightly peeking at another solution to get unstuck. The solution comes from this one line in the description:

In fact I like to tease her that Noah actually loses money whenever she comes in the store.

This line always felt relevant from the beginning, but I didn’t know what to look for. Now that I was sure, I focused my attention on the columns in the database. Two of them stood out, wholesale_cost (the price Noah’s bought the item for) and unit_price (the price Noah’s sold the item for).

If the unit price is lesser than the wholesale cost, that would mean a loss. The person having the maximum transactions resulting in loss would be The Bargain Hunter.

SELECT name, phone
FROM noahsrug
WHERE unit_price < wholesale_cost
GROUP BY name
ORDER BY count(*) DESC
LIMIT 1;
Sherri Long|585-838-9161

7. The Meet Cute

This person was at Noah’s on the same day as the Bargain Hunter, and purchased the same item in a color that the Bargain Hunter wanted.

There are no clues on what day this happened, so I started with finding color related items purchased by the Bargain Hunter. All products with color variations have the sku starting with COL. Ordered and shipped time will also be the same here.

SELECT orderid, desc
FROM noahsrug
WHERE sku LIKE 'COL%'
  AND name == 'Sherri Long'
  AND ordered == shipped;
70503|Noah's Poster (azure)
124821|Noah's Action Figure (green)

Using these queries, I checked for orders around the same time for the same item manually, by comparing the substring of the product names (without the color) and checking for the last 5 orders before the Bargain Hunter’s order.

SELECT name, phone, orderid
FROM noahsrug
WHERE ordered == shipped
  AND desc LIKE 'Noah''s Poster%'
  AND orderid BETWEEN 70498 AND 70502;
Carlos Myers|838-335-7157|70502

The first one returned an answer, and it happens to be the previous order ID! However, just to be sure I checked for the Action Figure query too, which returned no result.

SELECT name, phone, shipped
FROM noahsrug
WHERE ordered == shipped
  AND desc LIKE 'Noah''s Action Figure%';
  AND orderid BETWEEN 124816 AND 124820;

However, this solution didn’t work for the speedrun for two reasons.

  1. There are items apart from the ones starting with COL that have color variations. The format of the items is Item name (color), so I changed the WHERE condition to look for a bracket in the description instead.

  2. The speedrun dataset returned more queries than the two I got here, so manually checking each order was time-consuming. I looked for a way to automate the checking, which I did using the WITH clause to save the initial orders as a temporary table. The table is then joined with noahsrug, and the WHERE conditions become JOIN conditions.

WITH a AS 
  (SELECT orderid, desc
   FROM noahsrug 
   WHERE desc LIKE '%(%' 
     AND name == 'Sherri Long'
     AND ordered == shipped)
SELECT b.name, b.phone
FROM a 
JOIN noahsrug AS b 
  ON substr(a.desc, 1, instr(a.desc, '(')) == substr(b.desc, 1, instr(b.desc, '(')) 
 AND b.orderid BETWEEN a.orderid - 5 AND a.orderid - 1;
Carlos Myers|838-335-7157

8. The Collector

Ending on an easier note after some tough puzzle. This person has a set of all Noah’s collectibles. These are items that contain “Noah’s” in their name, or more precisely, the sku starts with COL - short for collectibles. Finding the person with the maximum collectibles purchases solves this puzzle.

SELECT name, phone
FROM noahsrug
WHERE sku LIKE 'COL%'
GROUP BY name
ORDER BY count(*) DESC
LIMIT 1;
James Smith|212-547-3518

Speedrun Mode

Completing all the puzzles unlocks the speedrun mode, which lets you attempt the same puzzles with a different dataset and slightly different clues. I also got the idea of solving all puzzles at once while reading Ella Kaye’s writeup and wanted to do something similar for SQLite.

I considered writing a Python script, but then remembered that I can execute multiple queries written in a SQL file from SQLite directly. That seemed about right, so I started by adding all of the solution queries to noahs-speedrun.sql.

I saved each solution query to a temporary view, so that I could reference them in puzzles where information from a previous puzzle is required. Lastly, I added SELECT statements which display all of the solutions together.

noahsrug-speedrun.sql
.load phonenum
.mode list
.headers off

CREATE VIEW IF NOT EXISTS noahsrug AS
SELECT * FROM customers
NATURAL JOIN orders
NATURAL JOIN orders_items
NATURAL JOIN products;

CREATE TEMP VIEW investigator AS 
SELECT phone
FROM customers
WHERE phonenum(substr(name, instr(name, ' ') + 1)) == replace(phone, '-', '')
LIMIT 1;

CREATE TEMP VIEW contractor AS
SELECT phone, citystatezip
FROM noahsrug
WHERE name LIKE 'D% S%'
  AND strftime('%Y', shipped) == '2017'
  AND desc == 'Rug Cleaner'
LIMIT 1;

CREATE TEMP VIEW neighbor AS
SELECT phone
FROM customers
WHERE ((strftime('%d', birthdate) >= '23'
        AND strftime('%m', birthdate) == '09')
       OR (strftime('%d', birthdate) <= '22'
           AND strftime('%m', birthdate) == '10'))
  AND citystatezip == (SELECT citystatezip FROM contractor)
  AND strftime('%Y', birthdate) IN ('1943', '1955', '1967', 
                                    '1979', '1991');

CREATE TEMP VIEW earlybird AS
SELECT phone 
FROM noahsrug
WHERE sku LIKE 'BKY%'
  AND strftime('%H', shipped) == '04'
  AND ordered == shipped 
LIMIT 1;

CREATE TEMP VIEW catlady AS
SELECT phone
FROM noahsrug
WHERE desc REGEXP '.* (Adult|Senior) Cat.*'
  AND qty >= 10
LIMIT 1;

CREATE TEMP VIEW bargainhunter AS
SELECT name, phone
FROM noahsrug
WHERE unit_price < wholesale_cost
GROUP BY name
ORDER BY count(*) DESC
LIMIT 1;

CREATE TEMP VIEW meetcute AS
WITH a AS 
  (SELECT phone, desc, orderid 
   FROM noahsrug 
   WHERE desc LIKE '%(%' 
     AND name == (SELECT name FROM bargainhunter)
     AND ordered == shipped)
SELECT b.name, b.phone
FROM a
JOIN noahsrug AS b
  ON substr(a.desc, 1, instr(a.desc, '(')) == substr(b.desc, 1, instr(b.desc, '(')) 
 AND b.orderid == a.orderid - 1;

CREATE TEMP VIEW collector AS
SELECT name, phone
FROM noahsrug
WHERE sku LIKE 'COL%'
GROUP BY name
ORDER BY count(*) DESC
LIMIT 1;

SELECT phone FROM investigator;
SELECT phone FROM contractor;
SELECT phone FROM neighbor;
SELECT phone FROM earlybird;
SELECT phone FROM catlady;
SELECT phone FROM bargainhunter;
SELECT phone FROM meetcute;
SELECT phone FROM collector;
.open noahs-speedrun.sqlite
.read noahsrug-speedrun.sql
767-365-7269
838-351-0370
914-594-5535
716-789-4433
347-835-2358
838-295-7143
516-544-4187
516-638-9966