How do we turn raw data into useful information, and what makes data meaningful?

Topic 2.3 Extracting Information from Data: information is extracted from data through processing, filtering, transforming and combining data sets, and correlation does not imply causation.

A focused answer to AP CSP Topic 2.3, covering the difference between data and information, processing data to find patterns and trends, filtering and transforming, metadata, combining data sets, and the limits of data including correlation versus causation.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this topic is asking
  2. Data versus information
  3. Ways to process data
  4. Metadata
  5. Correlation is not causation
  6. Try this

What this topic is asking

The College Board (Topic 2.3) wants you to understand how raw data becomes useful information. Data on its own is just values; processing it (filtering, transforming, combining, summarizing) reveals patterns, trends and relationships that answer questions. You must also understand metadata, the value of combining data sets, and a critical limit: correlation does not imply causation. Programs and tools make this processing scalable.

Data versus information

Data is raw, unprocessed values, such as a list of bus arrival timestamps. Information is the meaning or insight extracted from that data by processing it, for example by filtering, summarizing or finding patterns.

Ways to process data

Extracting information involves processing raw data in one or more of these ways:

  • Filtering. Keep only the records relevant to a question (for example, only one bus route, or only weekdays).
  • Transforming. Convert values into a more useful form (timestamps into delays, raw scores into percentages).
  • Combining. Merge multiple data sets to reveal relationships none shows alone (weather data plus sales data).
  • Summarizing. Compute statistics (average, maximum, count) or detect trends and patterns over time.
  • Visualizing. Present data as charts so patterns are easier for humans to see.
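
The first four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales figures, the weather labels and the `DAILY_TARGET` value are made up, and the field names are hypothetical rather than taken from any real data set.

```python
from datetime import date
from statistics import mean

# Made-up sample data, for illustration only.
sales = [
    {"day": date(2024, 6, 3), "units": 120},  # Monday
    {"day": date(2024, 6, 4), "units": 80},   # Tuesday
    {"day": date(2024, 6, 5), "units": 130},  # Wednesday
    {"day": date(2024, 6, 8), "units": 200},  # Saturday
]
weather = {
    date(2024, 6, 3): "sunny",
    date(2024, 6, 4): "rainy",
    date(2024, 6, 5): "sunny",
    date(2024, 6, 8): "sunny",
}
DAILY_TARGET = 100  # hypothetical sales target, in units

# Filtering: keep only weekdays (Monday is 0, Friday is 4).
weekday_sales = [r for r in sales if r["day"].weekday() < 5]

# Transforming: turn raw unit counts into a percentage of the target.
# Combining: attach the weather for the same date to each sales record.
combined = [
    {
        "day": r["day"],
        "pct_of_target": 100 * r["units"] / DAILY_TARGET,
        "weather": weather[r["day"]],
    }
    for r in weekday_sales
]

# Summarizing: average percentage of target for each weather condition.
def average_pct(records, condition):
    return mean(r["pct_of_target"] for r in records if r["weather"] == condition)

avg_by_weather = {w: average_pct(combined, w) for w in ("sunny", "rainy")}
print(avg_by_weather)
```

Neither data set alone can say how sales differ between sunny and rainy weekdays; the comparison only becomes possible once the two are combined on a shared field (here, the date).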

Metadata

Metadata is data about data: it describes a file or data set rather than being its main content. For a digital photo, the date taken and the GPS location are metadata, and they let you organize or search photos by when or where they were taken.

Correlation is not causation

A program can find that two quantities change together, a correlation, but that does not mean one causes the other. A third, hidden factor may drive both. Ice cream sales and swimming injuries both rise in summer because of hot weather, not because one causes the other. Treating a correlation as proof of cause is a classic data-analysis error that the College Board's Course and Exam Description (CED) expects you to recognize.
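
To see how a program can report a strong correlation without saying anything about cause, here is a minimal sketch that computes the Pearson correlation coefficient by hand. All of the numbers are invented for illustration, and `pearson` is a helper written for this example, not a library function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up weekly figures, for illustration only.
temperature_c = [15, 18, 22, 26, 30, 33]
ice_cream_sales = [110, 135, 160, 210, 240, 275]
swimming_injuries = [2, 3, 5, 6, 9, 10]

# Sales and injuries rise together, so the coefficient is close to 1.
r_sales_injuries = pearson(ice_cream_sales, swimming_injuries)

# Both also track temperature, the hidden third factor.
r_temp_sales = pearson(temperature_c, ice_cream_sales)
r_temp_injuries = pearson(temperature_c, swimming_injuries)

print(round(r_sales_injuries, 2), round(r_temp_sales, 2), round(r_temp_injuries, 2))
```

The coefficient only measures how closely two lists move together. Nothing in the calculation knows about temperature unless you include it, which is why a high value cannot by itself show that ice cream causes injuries.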

Try this

Q1. Explain the difference between data and information. [2 points]

  • Cue. Data is raw, unprocessed values; information is the meaning or insight extracted from data by processing it (filtering, summarizing, finding patterns).

Q2. Give one example of metadata for a digital photo and say how it could be useful. [2 points]

  • Cue. The GPS location (or date taken) is metadata; it lets you organize or search photos by where (or when) they were taken.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2022 (style), 1 mark. Multiple choice. A data set shows that as ice cream sales rise, the number of swimming-related injuries also rises. Which conclusion is best supported?
(A) Eating ice cream causes swimming injuries.
(B) Swimming injuries cause people to buy ice cream.
(C) The two are correlated, but a third factor such as hot weather may explain both.
(D) The data set contains an error because the two are unrelated.

The answer is (C).

The data show a correlation (the two rise together), but correlation does not imply causation. A likely third factor, hot weather, increases both ice cream sales and swimming. (A) and (B) wrongly assume one causes the other. (D) is wrong: a correlation can be real without one causing the other.

Markers reward recognizing that correlation does not imply causation and that a confounding factor may explain both.

AP 2021 (style), 2 marks. Free response (short). A city collects a large data set of bus arrival times. Describe two distinct processing steps the city could perform to extract useful information that could improve the bus service.

A 2-point question on processing data to extract information.

Point 1 (filtering or grouping): The city could filter the data to a single route or group arrivals by time of day, isolating the records relevant to a particular question (for example, only morning peak times on route 5).

Point 2 (transforming or summarizing): The city could transform the times into a summary statistic, such as the average delay per route, or detect a trend (delays growing on certain days). This turns raw timestamps into information that identifies where the service is worst. Any two distinct, valid processing steps that yield useful information earn the marks.
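
The two steps in this answer could look like the following in Python. This is a sketch under assumptions, not the city's actual system: the arrival records are invented, the morning peak is assumed to run from 07:00 to 09:00, and `delay_minutes` is a helper defined only for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Made-up arrival records: (route, scheduled time, actual time).
arrivals = [
    ("5", "2024-03-04 07:30", "2024-03-04 07:36"),
    ("5", "2024-03-04 08:15", "2024-03-04 08:19"),
    ("5", "2024-03-04 14:00", "2024-03-04 14:01"),
    ("9", "2024-03-04 07:45", "2024-03-04 07:47"),
]

def delay_minutes(scheduled, actual):
    """Transform a pair of timestamps into a delay in minutes."""
    delta = datetime.strptime(actual, FMT) - datetime.strptime(scheduled, FMT)
    return delta.total_seconds() / 60

# Point 1 (filtering): keep only arrivals scheduled in the assumed
# morning peak, 07:00 up to but not including 09:00.
morning = [
    rec for rec in arrivals
    if 7 <= datetime.strptime(rec[1], FMT).hour < 9
]

# Point 2 (transforming and summarizing): compute each delay, group the
# delays by route, then take the average per route.
delays_by_route = defaultdict(list)
for route, scheduled, actual in morning:
    delays_by_route[route].append(delay_minutes(scheduled, actual))

average_delay = {route: mean(d) for route, d in delays_by_route.items()}
print(average_delay)
```

The raw timestamps say little on their own; the per-route averages are the information that tells the city where the morning service is worst.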
