How can data be made smaller, and when does compression lose information?
Topic 2.2 Data Compression: compression reduces the number of bits used to store data; lossless compression preserves all information, while lossy compression discards some to save more space.
A focused answer to AP CSP Topic 2.2, covering why compression matters, lossless versus lossy compression, run-length encoding as a lossless example, the trade-offs of lossy compression for images and audio, and how to choose between them.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 2.2) wants you to understand data compression: reducing the number of bits used to store or transmit data. You must distinguish lossless compression (the original data can be reconstructed exactly) from lossy compression (some data is permanently discarded to save more space), and reason about which to use for a given situation. Compression matters because smaller data is cheaper to store and faster to send.
Why compress data
Lossless compression
A simple lossless technique is run-length encoding (RLE), which replaces a run of repeated values with the value and a count. For example, the pixel run W W W W W can be stored as 5W, which takes fewer bits and is fully reversible: from 5W you recover exactly W W W W W.
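The run-length idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production format: the function names `rle_encode` and `rle_decode` and the choice to store each run as a `(count, value)` pair are assumptions made for this example.

```python
def rle_encode(values):
    """Collapse each run of repeated values into a (count, value) pair."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            # Same value as the previous item: extend the current run.
            runs[-1] = (runs[-1][0] + 1, v)
        else:
            # A different value starts a new run of length 1.
            runs.append((1, v))
    return runs


def rle_decode(runs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, v in runs:
        out.extend([v] * count)
    return out


pixels = ["W", "W", "W", "W", "W", "B", "B", "W"]
encoded = rle_encode(pixels)
print(encoded)                        # [(5, 'W'), (2, 'B'), (1, 'W')]
print(rle_decode(encoded) == pixels)  # True: nothing was lost
```

Decoding returns exactly the input, which is what makes the scheme lossless. Note that run-length encoding only saves space when the data contains long runs; a sequence such as W B W B produces one pair per value and ends up larger than the original.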
Lossy compression
The key fact about lossy compression is irreversibility: once data is thrown away, it is gone. Compress a photo aggressively and you cannot recover the original pixels from the compressed file.
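A toy example can make the irreversibility concrete. The sketch below is not how real image or audio formats work; it simply assumes a hypothetical scheme that stores each 8-bit brightness value as one of 16 coarse levels (4 bits). The names `quantize` and `reconstruct` and the sample values are made up for illustration.

```python
def quantize(samples, step=16):
    """Lossy step: keep only which bucket of width `step` each value falls in."""
    return [s // step for s in samples]


def reconstruct(levels, step=16):
    """Best-effort restore: map each bucket back to its midpoint."""
    return [level * step + step // 2 for level in levels]


original = [200, 201, 203, 198, 57, 60]  # made-up 8-bit brightness values
stored = quantize(original)              # each value now fits in 4 bits, not 8
restored = reconstruct(stored)

print(stored)                # [12, 12, 12, 12, 3, 3]
print(restored)              # [200, 200, 200, 200, 56, 56]
print(restored == original)  # False: the fine detail is gone for good
```

The restored values are close to the originals, so the picture would look similar, but the differences between 200, 201, 203 and 198 were discarded at the quantize step and no decoder can bring them back.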
Choosing between them
The decision is a trade-off between size and fidelity:
- If the data must be restored exactly (text, contracts, code, medical records), use lossless.
- If much smaller size is the priority and small quality loss is acceptable (photos, music, video), use lossy.
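For the "must be restored exactly" case, a quick check with a real lossless compressor shows what that guarantee looks like. The sketch below uses Python's standard `zlib` module; the sample sentence is invented for the example, and the size printed will depend on the input.

```python
import zlib

# Made-up sample text; the repetition gives the compressor something to exploit.
contract = ("The tenant shall pay rent on the first day of each month. " * 50).encode("utf-8")

compressed = zlib.compress(contract)
restored = zlib.decompress(compressed)

print(len(contract), "bytes before,", len(compressed), "bytes after")
print(restored == contract)  # True: every byte comes back unchanged
```

The file gets smaller and the round trip is exact, which is the property text, code and records need. A lossy scheme would typically shrink a photo or song further, but it could not pass the equality check on the last line.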
Try this
Q1. Why is lossless compression required for a computer program's source code? [2 points]
- Cue. Source code must be reconstructed exactly to run correctly; even one changed character could break it, so no information can be lost, which requires lossless compression.
Q2. State one advantage lossy compression has over lossless compression for storing photos. [1 point]
- Cue. Lossy compression can achieve much smaller file sizes than lossless, saving more storage and transmission cost.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2021 (style), 1 mark, multiple choice. A user wants to compress a text document containing the exact wording of a legal contract so that it can be perfectly restored. Which type of compression should be used, and why?
(A) Lossy, because it produces the smallest file.
(B) Lossless, because the original data must be reconstructed exactly.
(C) Either, because both restore the data exactly.
(D) Neither, because text cannot be compressed.
The answer is (B).
A legal contract must be restored exactly, so the compression must lose no information: that is lossless compression, which allows the original data to be perfectly reconstructed. (A) is wrong: lossy compression discards data and cannot perfectly restore a contract. (C) is wrong: only lossless compression restores the data exactly; lossy does not. (D) is wrong: text compresses well, usually with lossless methods.
Markers reward matching the requirement "restore exactly" to lossless compression.
AP 2023 (style), 2 marks, free response (short). A photo-sharing app must store millions of user photos using as little storage as possible, and small visual imperfections are acceptable. State which type of compression is appropriate and explain the trade-off involved.
A 2-point question on the lossy trade-off.
Point 1: Lossy compression is appropriate, because the app prioritizes minimizing storage and small imperfections are acceptable. Lossy compression can reduce file size much more than lossless by discarding data the human eye is unlikely to notice.
Point 2: The trade-off is that some information is permanently lost: the original photo cannot be reconstructed exactly, and aggressive compression can produce visible quality loss. The app trades perfect fidelity for greatly reduced storage. A common error is to claim lossy can still restore the original, which it cannot.
Related dot points
- Topic 2.1 Binary Numbers: computers represent all data with bits (binary digits); numbers, text, images and sound are encoded in binary, and fixed bit-widths cause overflow and rounding.
A focused answer to AP CSP Topic 2.1, covering bits and bytes, binary-to-decimal conversion, why all data is represented in binary, analog versus digital, fixed bit-width consequences (overflow and rounding errors), and abstraction in data representation.
- Topic 2.3 Extracting Information from Data: information is extracted from data through processing, filtering, transforming and combining data sets, and correlation does not imply causation.
A focused answer to AP CSP Topic 2.3, covering the difference between data and information, processing data to find patterns and trends, filtering and transforming, metadata, combining data sets, and the limits of data including correlation versus causation.
- Topic 2.4 Using Programs with Data: programs process large data sets through cleaning, filtering, classifying and transforming data, often using lists and iteration to scale to large amounts of data.
A focused answer to AP CSP Topic 2.4, covering why programs are essential for large data sets, cleaning and classifying data, filtering with conditionals, using lists and iteration to process data at scale, and visualizing results, with worked pseudocode.
- Topic 4.1 The Internet: the Internet is a network of networks that moves data in packets using protocols such as IP and TCP, with addressing, routing and standards enabling scalable communication.
A focused answer to AP CSP Topic 4.1, covering the Internet as a network of networks, IP addresses, packets and packet switching, protocols (IP, TCP, HTTP, DNS), bandwidth and latency, redundancy in routing, and why open standards enable scalability.
- Topic 3.2 Data Abstraction: data abstraction manages complexity by giving a collection of data a single name, most commonly using a list to represent many values as one variable.
A focused answer to AP CSP Topic 3.2, covering what data abstraction is, how a list represents many values under one name, the benefits for managing and modifying programs, the link to procedural abstraction, and why abstraction manages complexity.
Sources & how we know this
- AP Computer Science Principles Course and Exam Description — College Board (2025)