Big Data HS2025 — Q&A Session Quiz (Attempt 1) — Notion Notes


Q1 — True/False (Data + Big Data + RA)

Statement 1

Claim: Data independence refers to the decoupling of the logical view of data from its physical storage.

Answer: ✅ True

Why: Data independence means you can change the physical storage (files, indexes, partitions, hardware) without changing the logical schema seen by users and applications.
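
A minimal sketch of the idea, using Python's built-in sqlite3 module. The students table, its rows, and the index name are made up for illustration. The only thing that changes between the two runs is the physical layout (an index is added); the query text and its result stay the same.

```python
import sqlite3

# Toy in-memory database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO students (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "Zurich"), (2, "Bob", "Basel"), (3, "Cleo", "Zurich")],
)

query = "SELECT name FROM students WHERE city = 'Zurich' ORDER BY name"

# Result against the original physical layout.
before = conn.execute(query).fetchall()

# Physical change only: add an index on city. The logical schema is untouched.
conn.execute("CREATE INDEX idx_students_city ON students (city)")

# The exact same query text still works and returns the same rows.
after = conn.execute(query).fetchall()

print(before == after)
```

The application never has to know the index exists; the engine decides whether to use it.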

Statement 2

Claim: Big Data technologies primarily address the growing discrepancy between storage capacity, throughput, and latency.

Answer: ✅ True

Why: Storage capacity has grown fast and cheaply, but moving and processing data (throughput) and accessing it quickly (latency) have not kept pace. Big Data systems (distributed storage + parallel compute) are built around this bottleneck: parallelism helps with throughput, batch processing helps with latency.
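
A back-of-the-envelope sketch of why parallelism is the answer to the throughput gap. All numbers are assumed for illustration (they are not from the course material), and the parallel case is idealized: it ignores coordination and network overhead.

```python
# Assumed, illustrative numbers.
data_bytes = 10**12                  # 1 TB to scan
disk_throughput = 100 * 10**6        # 100 MB/s sequential read per disk
num_disks = 100                      # disks reading disjoint parts in parallel

# One disk has to read everything by itself.
single_disk_seconds = data_bytes / disk_throughput

# Ideal case: each disk reads 1/num_disks of the data at the same time.
parallel_seconds = single_disk_seconds / num_disks

print(single_disk_seconds, parallel_seconds)
```

Under these assumptions a single disk needs 10,000 seconds (almost 3 hours), while 100 disks in parallel need about 100 seconds.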

Statement 3

Claim: Renaming columns in SQL does not have an equivalent relational algebra operator.

Answer: ❌ False

Why: Relational algebra has the rename operator ρ (rho), which can rename relations and attributes. The SQL keyword AS corresponds to ρ.
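
A small sketch of the SQL side of this, using Python's built-in sqlite3 module. The artists table and its single row are hypothetical. In relational algebra the same rename would be written roughly as ρ with a mapping from name to artist_name applied to artists (the exact notation varies between textbooks).

```python
import sqlite3

# Toy in-memory table; schema and data are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artist_id INTEGER, name TEXT)")
conn.execute("INSERT INTO artists VALUES (1, 'Example Artist')")

# AS renames the attribute (name -> artist_name) and the relation (artists -> a).
cur = conn.execute("SELECT a.name AS artist_name FROM artists AS a")

# cursor.description holds one 7-tuple per result column; the first item is the column name.
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(column_names, rows)
```

The data is unchanged; only the attribute name in the result differs, which is exactly what ρ does.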

Statement 4

Claim: Big Data systems often discard normalization to improve scalability and performance.

Answer: ✅ True

Why: In distributed systems, joins are expensive (network + shuffle). Denormalization trades redundancy for fewer joins and faster reads at scale; the cost is duplicated data that has to be kept consistent when it is updated.
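
A toy sketch of the trade-off, again with Python's built-in sqlite3 module. All table names and rows are hypothetical. The normalized version needs a join to answer the question "total amount per customer"; the denormalized version repeats the customer name on every order row and answers it from one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: customer names are stored once and referenced by id.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 30), (11, 1, 20), (12, 2, 5);

-- Denormalized: the customer name is repeated on every order row.
CREATE TABLE orders_wide (order_id INTEGER PRIMARY KEY, customer_name TEXT, amount INTEGER);
INSERT INTO orders_wide VALUES (10, 'Ada', 30), (11, 'Ada', 20), (12, 'Bob', 5);
""")

# Normalized layout: a join is required.
joined = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Denormalized layout: a single-table scan, no join.
flat = conn.execute("""
    SELECT customer_name, SUM(amount)
    FROM orders_wide
    GROUP BY customer_name
    ORDER BY customer_name
""").fetchall()

print(joined == flat)
```

Both queries return the same totals. On one machine the join is cheap; in a cluster it would typically mean shuffling rows across the network, which is what the denormalized layout avoids. The price shows up on updates: renaming 'Ada' touches one row in customers but two rows in orders_wide.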


Q2 — SQL (Discogs): Artist with most different release countries

Task: Select the name of the artist whose albums were released in the most different countries (tie-breaker: alphabetical), excluding "Various Artists".

Answer: Technotronic

How to think about it: