How do we turn raw data into useful information, and what makes data meaningful?
Topic 2.3 Extracting Information from Data: information is extracted from data through processing, filtering, transforming and combining data sets, and correlation does not imply causation.
A focused answer to AP CSP Topic 2.3, covering the difference between data and information, processing data to find patterns and trends, filtering and transforming, metadata, combining data sets, and the limits of data including correlation versus causation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 2.3) wants you to understand how raw data becomes useful information. Data on its own is just values; processing it (filtering, transforming, combining, summarizing) reveals patterns, trends and relationships that answer questions. You must also understand metadata, the value of combining data sets, and a critical limit: correlation does not imply causation. Programs and tools make this processing scalable.
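Data versus information
Data is raw values, such as individual bus arrival timestamps or daily sales counts. Information is the collection of facts and patterns extracted from data, such as which bus route is most often late. Raw data rarely answers a question directly. It has to be processed first.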
Ways to process data
Extracting information involves processing raw data in one or more of these ways:
- Filtering. Keep only the records relevant to a question (for example, only one bus route, or only weekdays).
- Transforming. Convert values into a more useful form (timestamps into delays, raw scores into percentages).
- Combining. Merge multiple data sets to reveal relationships none shows alone (weather data plus sales data).
- Summarizing. Compute statistics (average, maximum, count) or detect trends and patterns over time.
- Visualizing. Present data as charts so patterns are easier for humans to see.
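The steps above can be sketched in Python. This is a minimal illustration, not a prescribed method. The two small data sets (`sales` and `weather_c`), the 28 °C threshold and all variable names are hypothetical, invented only to show combining, transforming, filtering, summarizing and a crude text visualization in sequence.

```python
from statistics import mean

# HYPOTHETICAL data invented for illustration only (not real measurements).
# Data set 1: units of ice cream sold per day, keyed by date.
sales = {
    "2024-07-01": 130, "2024-07-02": 95, "2024-07-03": 150,
    "2024-07-04": 80, "2024-07-05": 160, "2024-07-06": 70,
}
# Data set 2: daily high temperature in degrees Celsius, same dates.
weather_c = {
    "2024-07-01": 30, "2024-07-02": 24, "2024-07-03": 32,
    "2024-07-04": 21, "2024-07-05": 33, "2024-07-06": 19,
}

# Combining: join the two data sets on their shared key, the date.
combined = [
    {"date": day, "sales": sales[day], "temp_c": weather_c[day]}
    for day in sales
    if day in weather_c
]

# Transforming: turn each raw temperature into a label.
# The 28 degree threshold is an arbitrary assumption for this sketch.
for row in combined:
    row["label"] = "hot" if row["temp_c"] >= 28 else "cool"

# Filtering: keep only the records relevant to each question.
hot_days = [row for row in combined if row["label"] == "hot"]
cool_days = [row for row in combined if row["label"] == "cool"]

# Summarizing: reduce each group to a single statistic.
avg_hot = mean(row["sales"] for row in hot_days)
avg_cool = mean(row["sales"] for row in cool_days)
print(f"hot days:  {len(hot_days)}, average sales {avg_hot:.1f}")
print(f"cool days: {len(cool_days)}, average sales {avg_cool:.1f}")

# Visualizing: a crude text bar chart (one # per 10 units sold).
for row in combined:
    print(f'{row["date"]} {row["label"]:>4} {"#" * (row["sales"] // 10)}')
```

Run as written, it reports an average of 146.7 units on the three hot days against 81.7 on the three cool days. Neither data set shows that pattern on its own; it appears only after the two are combined.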
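Metadata
Metadata is data about data. For a digital photo, the image itself is the data, while the date taken, the GPS location and the file size are metadata. Metadata helps people find, organize and manage data, and changing or deleting metadata does not change the primary data it describes.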
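Here is a minimal Python sketch, assuming a hypothetical `Photo` record that keeps a photo's pixel data (the primary data) separate from its metadata (date taken, a place name standing in for GPS coordinates, and file size). Searching and sorting use only the metadata, and editing the metadata leaves the pixels untouched. All names and values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    """Hypothetical photo record: primary data plus metadata about it."""
    pixels: bytes                                 # the primary data (the image)
    metadata: dict = field(default_factory=dict)  # data about the data


# HYPOTHETICAL photos; the pixel bytes are placeholders, not real images.
photos = [
    Photo(b"\x10\x20", {"taken": "2024-06-01", "place": "beach", "size_kb": 2048}),
    Photo(b"\x30\x40", {"taken": "2023-12-25", "place": "home", "size_kb": 1536}),
    Photo(b"\x50\x60", {"taken": "2024-08-15", "place": "beach", "size_kb": 3072}),
]

# Searching by metadata: find every photo taken at the beach.
beach = [p for p in photos if p.metadata["place"] == "beach"]

# Organizing by metadata: sort oldest to newest by date taken.
# Dates in YYYY-MM-DD form sort correctly as plain strings.
by_date = sorted(photos, key=lambda p: p.metadata["taken"])

# Changing metadata does not change the primary data.
original_pixels = photos[0].pixels
photos[0].metadata["place"] = "lake"
assert photos[0].pixels == original_pixels

print(len(beach), [p.metadata["taken"] for p in by_date])
```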
Correlation is not causation
A program can find that two quantities change together (a correlation), but that does not mean one causes the other: a third, hidden factor may drive both. Ice cream sales and swimming injuries both rise in summer because hot weather increases both, not because one causes the other. Treating a correlation as proof of cause is a classic data-analysis error, and the AP exam tests whether you can recognize it.
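A minimal Python sketch of how a hidden factor produces a correlation, assuming invented monthly data. `temps_c` plays the hypothetical third factor, and both `ice_cream_sales` and `swim_injuries` are computed from temperature alone, never from each other. The hypothetical `pearson` helper computes the standard Pearson correlation coefficient. Python 3.10 and later also provide `statistics.correlation`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length number lists (neither constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# HYPOTHETICAL monthly average temperatures (deg C), invented for illustration.
# Temperature plays the hidden third factor.
temps_c = [5, 8, 12, 16, 21, 26, 29, 28, 23, 17, 11, 6]

# Both series are computed from temperature alone, never from each other.
ice_cream_sales = [40 + 10 * t for t in temps_c]
swim_injuries = [1 + t // 4 for t in temps_c]

r = pearson(ice_cream_sales, swim_injuries)
print(f"sales vs injuries:       r = {r:.2f}")

# The strong r shows the two move together; it cannot say why.
# Here, by construction, temperature drives both.
print(f"sales vs temperature:    r = {pearson(ice_cream_sales, temps_c):.2f}")
print(f"injuries vs temperature: r = {pearson(swim_injuries, temps_c):.2f}")
```

The sales and injury series come out strongly correlated even though, by construction, neither influences the other. The program can detect the correlation. Only knowledge of how the data arose reveals the cause.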
Try this
Q1. Explain the difference between data and information. [2 points]
- Cue. Data is raw, unprocessed values; information is the meaning or insight extracted from data by processing it (filtering, summarizing, finding patterns).
Q2. Give one example of metadata for a digital photo and say how it could be useful. [2 points]
- Cue. The GPS location (or date taken) is metadata; it lets you organize or search photos by where (or when) they were taken.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2022 (style), 1 mark, multiple choice. A data set shows that as ice cream sales rise, the number of swimming-related injuries also rises. Which conclusion is best supported?
(A) Eating ice cream causes swimming injuries.
(B) Swimming injuries cause people to buy ice cream.
(C) The two are correlated, but a third factor such as hot weather may explain both.
(D) The data set contains an error because the two are unrelated.
The answer is (C).
The data show a correlation (the two rise together), but correlation does not imply causation. A likely third factor, hot weather, increases both ice cream sales and swimming. (A) and (B) wrongly assume one causes the other. (D) is wrong: nothing suggests an error in the data, and the two quantities are linked through a common cause even though neither causes the other.
Markers reward recognizing that correlation does not imply causation and that a confounding factor may explain both.
AP 2021 (style), 2 marks, free response (short). A city collects a large data set of bus arrival times. Describe two distinct processing steps the city could perform to extract useful information that could improve the bus service.
A 2-point question on processing data to extract information.
Point 1 (filtering or grouping): The city could filter the data to a single route or group arrivals by time of day, isolating the records relevant to a particular question (for example, only morning peak times on route 5).
Point 2 (transforming or summarizing): The city could transform the times into a summary statistic, such as the average delay per route, or detect a trend (delays growing on certain days). This turns raw timestamps into information that identifies where the service is worst. Any two distinct, valid processing steps that yield useful information earn the marks.
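The two steps in this answer could be sketched in Python as follows. Everything here is hypothetical: the `arrivals` records, the route numbers, the 07:00 to 09:00 peak window and the `to_minutes` helper are invented for illustration. The sketch also assumes every bus arrives on the same day it was scheduled, so it does not handle delays that cross midnight.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL bus records invented for illustration:
# (route, scheduled time, actual arrival time), times as "HH:MM" strings.
arrivals = [
    ("5", "07:30", "07:41"),
    ("5", "08:00", "08:09"),
    ("5", "13:00", "13:02"),
    ("12", "07:45", "07:47"),
    ("12", "08:15", "08:16"),
    ("12", "17:30", "17:44"),
]


def to_minutes(hhmm):
    """Transform an "HH:MM" string into minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Step 1, filtering: keep only morning-peak arrivals (assumed 07:00-09:00).
PEAK_START, PEAK_END = to_minutes("07:00"), to_minutes("09:00")
morning = [r for r in arrivals if PEAK_START <= to_minutes(r[1]) < PEAK_END]

# Step 2, transforming and summarizing: timestamps become delays in
# minutes (negative means early), then average delay per route.
delays = defaultdict(list)
for route, scheduled, actual in morning:
    delays[route].append(to_minutes(actual) - to_minutes(scheduled))

avg_delay = {route: mean(d) for route, d in delays.items()}
worst = max(avg_delay, key=avg_delay.get)

for route, d in avg_delay.items():
    print(f"route {route}: average morning delay {d:.1f} min")
print("worst route:", worst)
```

With these invented records, route 5 averages a 10-minute morning delay against 1.5 minutes for route 12, so route 5 is flagged as the worst.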
Related dot points
- Topic 2.1 Binary Numbers: computers represent all data with bits (binary digits); numbers, text, images and sound are encoded in binary, and fixed bit-widths cause overflow and rounding.
A focused answer to AP CSP Topic 2.1, covering bits and bytes, binary-to-decimal conversion, why all data is represented in binary, analog versus digital, fixed bit-width consequences (overflow and rounding errors), and abstraction in data representation.
- Topic 2.2 Data Compression: compression reduces the number of bits used to store data; lossless compression preserves all information, while lossy compression discards some to save more space.
A focused answer to AP CSP Topic 2.2, covering why compression matters, lossless versus lossy compression, run-length encoding as a lossless example, the trade-offs of lossy compression for images and audio, and how to choose between them.
- Topic 2.4 Using Programs with Data: programs process large data sets through cleaning, filtering, classifying and transforming data, often using lists and iteration to scale to large amounts of data.
A focused answer to AP CSP Topic 2.4, covering why programs are essential for large data sets, cleaning and classifying data, filtering with conditionals, using lists and iteration to process data at scale, and visualizing results, with worked pseudocode.
- Topic 3.10 Lists: a list is an ordered collection of elements accessed by index; AP CSP lists are 1-indexed and support traversal and modification with APPEND, INSERT and REMOVE.
A focused answer to AP CSP Topic 3.10, covering lists as ordered collections, 1-based indexing in AP CSP pseudocode, accessing elements, traversing with FOR EACH and REPEAT, list operations (APPEND, INSERT, REMOVE, LENGTH), and why lists scale data processing.
- Topic 5.3 Computing Bias: computing innovations can reflect existing human biases through biased data or design choices, and bias can be embedded intentionally or unintentionally.
A focused answer to AP CSP Topic 5.3, covering how bias enters computing systems through biased data and design, intentional versus unintentional bias, real effects on people, why biased data produces biased outputs, and how bias can be identified and reduced.
Sources & how we know this
- AP Computer Science Principles Course and Exam Description — College Board (2025)