
Published on 8/13/2026

Create a Stunning Scatter Plot in R From Start to Finish


When it comes to visualizing the relationship between two variables, the humble scatter plot is your best friend. In R, you can create one with a single line of code using the base plot() function or build stunning, layered graphics with the powerful ggplot2 library.

These tools map two continuous variables onto an x-y grid, giving you a raw, unfiltered look at how they interact. It’s the fastest way to spot trends, outliers, and clusters hiding in your data.

Why Scatter Plots Are Foundational to Data Science

Before you even think about writing code, it’s worth remembering why data scientists lean so heavily on scatter plots. They are the cornerstone of any good exploratory data analysis (EDA), offering an immediate visual story that raw numbers in a table just can’t match.

Imagine you’re analyzing server performance. A scatter plot could instantly show you the relationship between server response times and the number of concurrent users. Do response times creep up slowly as more users log on, or do they hit a wall and spike dramatically? That’s the kind of insight a scatter plot delivers in seconds.

Comparing R Plotting Methods at a Glance

This guide will walk you through three main ways to create a scatter plot in R. The right choice really depends on what you’re trying to do—a quick gut check, a polished graphic for a presentation, or an interactive dashboard for others to explore.

Each method has its own strengths, from the speed of Base R to the aesthetic control of ggplot2 and the interactivity of plotly.

| Feature | Base R plot() | ggplot2 | plotly |
|---|---|---|---|
| Primary use case | Quick, exploratory analysis | Publication-quality, layered graphics | Interactive, web-based dashboards |
| Ease of use | Very easy for basic plots | Steeper learning curve, very powerful | Moderate; builds on ggplot2 |
| Customization | Good, but can be cumbersome | Excellent, highly flexible | Excellent, focused on interactivity |
| Interactivity | None | None (but can be converted) | Native (hover, zoom, pan) |
| Dependencies | None (built-in) | Requires the ggplot2 package | Requires the plotly package |

Ultimately, knowing when to use each tool is key. Base R is for your eyes only, ggplot2 is for telling a story to others, and plotly is for letting your audience explore the story themselves.

This flowchart can help you decide which tool to reach for.

A decision tree flowchart guiding users on selecting the appropriate R plot type for quick look, storytelling, or interactive tasks.

As the visual shows, ggplot2 is the clear winner for “Storytelling” because it excels at creating polished, narrative-driven charts. This power is a big reason why many developers are taking a second look at R. In fact, some argue that R is seeing a major comeback, driven largely by its incredible data visualization ecosystem. You can read more about R’s growing influence on plainenglish.io.

A good visualization isn’t just a picture of data; it’s a story about the data. A scatter plot is often the first chapter, revealing the main characters and their relationships before the real plot unfolds.

By getting comfortable with these three approaches, you’ll be ready to tackle just about any data visualization task that comes your way.

Your First Scatter Plot Using Base R

The quickest way to get a scatter plot on your screen in R is with the built-in plot() function. It’s a real workhorse that doesn’t require loading any extra packages, making it my go-to for fast, exploratory analysis.

Let’s jump right in with the famous iris dataset, which comes included with every R installation. If it’s not already in your environment, just run data(iris). This dataset gives us measurements for 150 iris flowers, and we’ll focus on the relationship between Sepal.Length and Sepal.Width.

Creating the plot takes just one line of code.

# A simple scatter plot of Sepal Length vs. Sepal Width
plot(x = iris$Sepal.Length, y = iris$Sepal.Width)

That single command produces an instant visualization. Each point on the plot is a single flower, mapping its sepal length (x-axis) to its sepal width (y-axis). It’s a fast and effective way to get an initial feel for how your variables relate to each other.
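plot() also accepts a formula of the form y ~ x together with a data argument. That keeps the call short and labels the axes with the bare column names. You can pair the plot with cor() to get a quick numeric check on what your eye sees. Here is a minimal sketch that uses only base R and the built-in iris data:

```r
# Formula interface: y ~ x, with both columns looked up in `data`
plot(Sepal.Width ~ Sepal.Length, data = iris)

# Pearson correlation puts a number on the linear relationship in the plot
r <- cor(iris$Sepal.Length, iris$Sepal.Width)
print(round(r, 3))
```

For iris, the overall correlation between these two columns is weak and slightly negative, which matches the loose cloud of points. Keep in mind that cor() only measures linear association.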

Customizing Your Base R Plot

The default plot gets the job done, but it’s not exactly presentation-ready. The axis labels are pulled straight from the code (iris$Sepal.Length), which looks sloppy. Thankfully, we can clean this up by adding a few arguments right inside the plot() function.

Let’s give our plot a proper title using main and much clearer axis labels with xlab and ylab. We can also tweak the look of the data points themselves. The pch argument controls the point shape—we’ll use 19 for a solid circle—and col sets the color.

Here’s how you can create a more polished version of the same scatter plot in R:

# A customized scatter plot with title, labels, and color
plot(x = iris$Sepal.Length, y = iris$Sepal.Width,
     main = "Iris Flower Sepal Dimensions",
     xlab = "Sepal Length (cm)",
     ylab = "Sepal Width (cm)",
     pch = 19,          # solid circle
     col = "steelblue")

See? Much better. The plot now explains itself, with a clear title and properly labeled axes. Small tweaks like these are what make your visualizations truly effective.

Pro Tip: R’s base plotting system has 26 numbered point shapes (pch = 0 through 25). It’s worth experimenting to see what works for your data. For instance, pch = 1 gives you an open circle, and pch = 3 creates a plus sign.
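A quick way to pick a shape is to draw them all. This sketch plots every numbered symbol with its code underneath. The gold bg value is just an illustrative choice, and it only affects the fillable shapes 21 to 25:

```r
# Draw every numbered point symbol (pch 0 to 25) with its code underneath
shapes <- 0:25
plot(shapes, rep(1, length(shapes)),
     pch = shapes, cex = 2, bg = "gold",  # bg fills shapes 21 to 25
     ylim = c(0.5, 1.5), axes = FALSE,
     xlab = "", ylab = "", main = "Base R point shapes (pch)")
text(shapes, rep(0.8, length(shapes)), labels = shapes, cex = 0.8)
```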

These simple changes are just the start, but they show how much direct control you have, even with base R functions. Mastering these fundamentals is a huge advantage before you move on to more complex systems like ggplot2.
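A natural next step in base R is coloring points by group. Base R doesn’t build a legend for you, unlike ggplot2, so you add one with legend(). This is a minimal sketch that assumes you want one color per iris species. We chose the three hex colors ourselves, taking them from an accessible palette:

```r
# One color per species; indexing a vector with a factor uses its integer codes
species_cols <- c("#E69F00", "#56B4E9", "#009E73")
point_cols <- species_cols[iris$Species]

plot(Sepal.Width ~ Sepal.Length, data = iris,
     main = "Iris Flower Sepal Dimensions",
     xlab = "Sepal Length (cm)", ylab = "Sepal Width (cm)",
     pch = 19, col = point_cols)

# Base R does not add legends automatically, so build one by hand
legend("topright", legend = levels(iris$Species),
       col = species_cols, pch = 19)
```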

R’s visualization tools have long been one of its core strengths, and they’re often cited as a reason for its renewed popularity in data science. With just a few arguments, we’ve already turned raw output into a clear, informative graphic.

Building Richer Plots with ggplot2


While base R is great for a quick look, ggplot2 is what you’ll reach for when you need to tell a compelling story with your data. As a core part of the tidyverse, it’s built around the “grammar of graphics” philosophy. This just means you build plots layer by layer, starting with your data and progressively adding visual elements.

It’s a lot like building with LEGO bricks. You start with a baseplate (ggplot()), decide which variables control the x-axis, y-axis, and color (aes()), and then snap on the visual parts, like points (geom_point()). This layered approach is precisely what gives ggplot2 its power and makes creating a detailed scatter plot in R so intuitive.

Let’s jump in with the mpg dataset that comes with ggplot2. It contains fuel economy data for 234 cars, covering 38 popular models from the 1999 and 2008 model years, and is perfect for this kind of exploration.

Your First ggplot2 Scatter Plot

First things first, make sure the tidyverse package is installed (install.packages("tidyverse")) and loaded with library(tidyverse), which also loads ggplot2. Now we can whip up a basic plot to see how a car’s engine displacement (displ) relates to its highway fuel efficiency (hwy).

# Create a basic ggplot2 scatter plot
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

This bit of code sets up the plot’s foundation, mapping displ to the x-axis and hwy to the y-axis. The + operator then adds the geom_point() layer, which tells ggplot2 to draw the data as points. What you get is a clean scatter plot that immediately shows a negative relationship: bigger engines tend to have worse gas mileage.
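A ggplot2 plot is an ordinary R object, so you can store it in a variable and keep adding layers with +. This sketch adds titles and a theme to the plot above. The axis label wording and theme_minimal() are our own choices, not requirements:

```r
library(ggplot2)  # also loaded by library(tidyverse)

# Store the plot in a variable, then keep adding layers with +
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

p <- p +
  labs(title = "Engine Size vs. Highway Fuel Efficiency",
       x = "Engine displacement (litres)",
       y = "Highway fuel economy (mpg)") +
  theme_minimal()

print(p)  # an explicit print is needed inside scripts and functions
```

Inside an R script or function, a ggplot object is only drawn when it is printed. That’s why print(p) is explicit here.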

Adding Visual Depth with Aesthetics

The real magic of ggplot2 happens when you start mapping other variables to aesthetics. Aesthetics are the visual properties of your plot—think color, size, or shape. This is how you begin to uncover patterns hidden in your data.

For example, does the relationship between engine size and efficiency hold true for all types of cars? We can find out by mapping the class variable to the color aesthetic right inside the aes() function.

# Map car class to the color aesthetic
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

Just like that, the plot becomes far more informative, and ggplot2 builds the legend for you. You can instantly see that SUVs and pickups cluster on the right (large engines, low efficiency), while compact cars are grouped on the left.

By mapping categorical variables to aesthetics like color, shape, or size, you can encode a third dimension of information onto your 2D scatter plot. This is a fundamental technique for multivariate analysis and data storytelling.
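You can stack more than one aesthetic. This sketch keeps color for class and adds shape for drv (front, rear, or four-wheel drive). We picked drv ourselves because it has only three levels:

```r
library(ggplot2)

# Color encodes vehicle class; shape encodes drive train (4, f, r)
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy, color = class, shape = drv)) +
  geom_point(size = 2)

print(p)
```

ggplot2’s default shape palette handles at most six levels. Shape therefore works best for variables with only a handful of categories.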

A scatter plot is great for spotting a trend, but sometimes you need to draw the line—literally. ggplot2 makes this trivial with another layer: geom_smooth(). By adding + geom_smooth() to your plot, you can overlay a trend line to make the relationship explicit.
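Where you map color changes what geom_smooth() does. If color = class sits in the top-level aes(), ggplot2 fits one line per class. Moving it into geom_point() gives a single overall trend line. Here is a minimal sketch:

```r
library(ggplot2)

# color is mapped only inside geom_point(), so geom_smooth() sees a single
# group and fits one overall line instead of one line per class
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```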

But what if you want to compare these relationships side-by-side for each car class instead of just coloring them? This is a job for faceting. The facet_wrap() function creates a grid of subplots, one for each category in a variable.

Let’s pull all these ideas together:

  1. Map displ to x and hwy to y.
  2. Color the points by class.
  3. Add a linear regression line with geom_smooth(method = "lm").
  4. Create separate plots for each class with facet_wrap().

# A complex plot with smooth lines and facets
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # Add linear model line, no confidence interval
  facet_wrap(~ class)

This single block of code produces a professional, multi-panel visualization that would be a huge headache to build in base R. This is the power of the grammar of graphics—each piece is a simple, logical addition that builds toward a complex and insightful result.

Adding Interactivity with Plotly


While ggplot2 is fantastic for creating beautiful, publication-ready graphics, those plots are static. They’re just images. But what if you need more? For web dashboards or exploratory reports, giving your audience the power to interact with the data directly creates a much richer experience.

This is where the plotly package comes in. It’s designed to turn your static charts into dynamic, interactive visualizations that feel alive.

The real magic of plotly is how perfectly it works with ggplot2. You don’t have to learn a whole new grammar of graphics. A single function, ggplotly(), breathes life into the plots you already know how to build. It’s a game-changer for quickly creating a powerful, interactive scatter plot in R.

Let’s see just how easy it is. First, you’ll need the plotly package, which you can get with install.packages("plotly"). Now, we can take the ggplot2 plot we built earlier and make it interactive in one step.

From Static to Dynamic with One Command

The trick is to first create your ggplot2 plot and store it in a variable. Then, you just pass that variable to plotly.

library(tidyverse)
library(plotly)

# Create our ggplot2 object first
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Size vs. Highway Fuel Efficiency")

# Now, convert it to an interactive plot
ggplotly(p)

Just by wrapping your ggplot2 object p in the ggplotly() function, your plot is instantly upgraded. You can now hover over any point to see a tooltip with its exact values, zoom into a specific cluster, and pan around the chart. It’s a remarkably simple way to add a professional touch and deeper analytical capability.
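ggplotly() also takes a tooltip argument that controls which aesthetics appear on hover. Here is a minimal sketch that shows only the x and y values:

```r
library(ggplot2)
library(plotly)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Only show the x and y values in the hover tooltip
w <- ggplotly(p, tooltip = c("x", "y"))
# Print `w` in an interactive session (e.g. RStudio) to view the chart
```

The tooltip lists values in the same order as the vector you pass.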

This ability to let users explore data is invaluable. Imagine you’re building a real-time analytics dashboard to monitor system performance. Interactive plots let stakeholders drill down into anomalies on their own, no hand-holding required.

Building Native Plotly Charts

For even more fine-grained control, you can build a scatter plot in R using plotly’s native syntax. The main function here is plot_ly(), which is plotly’s equivalent of ggplot(). While the syntax looks a bit different, the logic is similar: you specify your data, map variables, and define the plot type.

# A native plotly scatter plot
plot_ly(data = mpg, x = ~displ, y = ~hwy, color = ~class,
        type = "scatter", mode = "markers")

The tilde ~ before each variable name is just plotly’s way of saying “use this column from the data frame.” Working directly with plot_ly() gives you plotly’s full set of trace and layout options, such as custom hover templates and animations, without going through the ggplot2-to-plotly translation that ggplotly() performs.
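Here is one example of that extra control. This sketch adds the car model to the hover label through plotly’s hovertemplate attribute and sets axis titles with layout(). The template wording and labels are our own choices:

```r
library(plotly)
library(ggplot2)  # only needed for the mpg dataset

fig <- plot_ly(
  data = mpg, x = ~displ, y = ~hwy, color = ~class,
  type = "scatter", mode = "markers",
  text = ~model,  # extra column made available to the hover template
  hovertemplate = "%{text}<br>Displacement: %{x} L<br>Highway: %{y} mpg<extra></extra>"
)

fig <- layout(
  fig,
  title = "Engine Size vs. Highway Fuel Efficiency",
  xaxis = list(title = "Engine displacement (L)"),
  yaxis = list(title = "Highway fuel economy (mpg)")
)
```

The empty <extra></extra> tag hides the secondary box that plotly normally shows with the trace name.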

Creating interactive visualizations is a critical skill for modern data analysts. It bridges the gap between static analysis and user-driven exploration, empowering non-technical users to find their own insights without needing to write any code.

Expert Tips for More Effective Scatter Plots

Anyone can generate a basic chart, but making one that’s truly insightful takes a bit more finesse. Moving beyond the default settings is what separates a decent plot from a great one.

These are the techniques I’ve learned over the years to make every scatter plot in R not just good, but publication-ready. Let’s dig in.

One of the biggest headaches with scatter plots, especially as your datasets get larger, is overplotting. This is what happens when so many data points are stacked on top of each other that you can’t see what’s really going on. It just looks like a solid blob, hiding the actual distribution.

Tackling Overplotting with Transparency and Jitter

A simple but incredibly effective way to fix overplotting is to add transparency, sometimes called alpha blending. By making your points semi-transparent, you can immediately see where they cluster most densely.

In ggplot2, you just need to add the alpha argument to geom_point().

geom_point(alpha = 0.3)

An alpha value of 1 is totally solid, while 0 is invisible. I’ve found that a value around 0.2 or 0.3 is a perfect starting point for most crowded plots.
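The effect of alpha is easiest to see on a genuinely large dataset. This sketch uses ggplot2’s diamonds data, which has 53,940 rows. With that many points a value lower than 0.2 is often needed, so 0.05 here is only a starting guess:

```r
library(ggplot2)

# diamonds (also from ggplot2) is large enough to overplot badly at alpha = 1
p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point(alpha = 0.05)

print(p)
```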

Another great trick is jittering. This technique adds a tiny bit of random noise to each point’s position. It’s a lifesaver when you’re working with discrete or rounded data that causes points to fall into perfect lines or grids. Instead of geom_point(), you can switch to geom_jitter().

geom_jitter(width = 0.1, height = 0.1)

The width and height arguments let you control just how much you want to spread the points out. This small tweak can instantly reveal the true density of your data without messing up the bigger picture.
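City and highway mileage in mpg are whole numbers, so many cars land on exactly the same spot, which makes them a good test case for jitter. Using position_jitter() with a seed makes the random noise reproducible between runs. The seed value of 42 is arbitrary:

```r
library(ggplot2)

# cty and hwy are whole numbers, so many cars share identical coordinates
n_overlapping <- sum(duplicated(mpg[, c("cty", "hwy")]))
print(n_overlapping)

# A fixed seed makes the random jitter reproducible between runs
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 42),
             alpha = 0.6)

print(p)
```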

Choosing Meaningful Colors and Exporting Your Work

Color isn’t just for making things pretty; it’s a tool for telling a story. When you pick a color palette, always think about accessibility. The viridis palettes are designed to be colorblind-friendly and perceptually uniform. The RColorBrewer package marks which of its palettes are colorblind-safe, and display.brewer.all(colorblindFriendly = TRUE) shows just those.

Thoughtful color selection makes your plot more inclusive and ensures your visual story is interpreted correctly by the widest possible audience. Avoid default rainbow palettes, which can be misleading and difficult to read.
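ggplot2 ships with viridis scales built in, so an accessible palette is one extra layer away. This is a minimal sketch on the iris data. The commented-out line shows a ColorBrewer alternative. Dark2 is one of the palettes RColorBrewer marks as colorblind-friendly:

```r
library(ggplot2)

# scale_color_viridis_d() applies the viridis palette to a discrete variable
p <- ggplot(data = iris,
            mapping = aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  scale_color_viridis_d()

print(p)

# Alternative: a ColorBrewer palette
# p + scale_color_brewer(palette = "Dark2")
```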

Once your masterpiece is ready, you need a good way to save it. For ggplot2 plots, the ggsave() function is your best friend. It’s smart enough to figure out the file type from the extension you use (.png, .pdf, .svg) and gives you complete control over the output.

For a high-resolution image that will look sharp in any presentation or report, you can set the dimensions and DPI (dots per inch).

ggsave("fuel_economy_plot.png", width = 10, height = 6, dpi = 300)

This command saves your latest ggplot2 plot as a PNG file that’s 10 inches wide, 6 inches tall, and rendered at a crisp 300 DPI. Mastering ggsave() is the final step to ensuring your hard work looks professional wherever it’s displayed.
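In scripts, it’s safer to pass the plot to ggsave() explicitly than to rely on whatever was displayed last. This sketch also saves to PDF, a vector format where dpi doesn’t matter. It writes to tempdir() only so the example doesn’t clutter your working directory, so swap in your own path:

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Pass the plot explicitly instead of relying on the last plot displayed
out_file <- file.path(tempdir(), "fuel_economy_plot.pdf")
ggsave(out_file, plot = p, width = 10, height = 6, units = "in")
```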

Strong visualizations are also a critical piece of any good monitoring system. For more on that, check out our guide on building a great performance dashboard.

Common Questions When Making a Scatter Plot in R


As you start creating more complex plots, you’ll inevitably hit a few common snags. Think of this section as your quick-fix guide for the most frequent issues I see pop up when people are building a scatter plot in R.

How Do I Add a Trendline?

A trendline is essential for showing the underlying relationship in your data. How you add one depends on whether you’re working in base R or with the far more flexible ggplot2.

With ggplot2, it’s incredibly straightforward. You just add geom_smooth() as another layer. By default, geom_smooth() draws a LOESS curve when the data has fewer than 1,000 points (and a GAM fit for larger data), but you can easily switch to a linear model by setting method = "lm".

# ggplot2 trendline example
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red")

If you’re sticking with base R, it’s a two-step process. First, build your linear model with the lm() function. Then, you can overlay the resulting line onto your existing plot using abline().

# Base R trendline example (mpg comes from ggplot2, so load it first)
library(ggplot2)
plot(mpg$displ, mpg$hwy)
model <- lm(hwy ~ displ, data = mpg)
abline(model, col = "red")
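For a curved trend in base R, lowess() from the built-in stats package draws a locally weighted smooth. It’s closely related to the loess() fit that geom_smooth() uses by default for small datasets, but it is not the same function. Here is a minimal sketch that combines both lines:

```r
library(ggplot2)  # provides the mpg dataset

plot(hwy ~ displ, data = mpg, pch = 19, col = "grey40",
     xlab = "Engine displacement (L)", ylab = "Highway fuel economy (mpg)")

# Straight line from a linear model
model <- lm(hwy ~ displ, data = mpg)
abline(model, col = "red", lwd = 2)

# lowess() returns x-sorted points, so lines() can draw the curve directly
with(mpg, lines(lowess(displ, hwy), col = "blue", lwd = 2))
```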

How Can I Fix Overplotting?

Ever create a plot that just looks like a giant, solid blob? That’s overplotting, and it happens when your data points are so dense they completely hide each other. Luckily, there are a couple of great fixes.

  • Transparency: My go-to solution is to make the points semi-transparent. This immediately reveals where the data is most dense. In ggplot2, just add an alpha setting to geom_point(). A value between 0.1 and 0.5 is usually a good starting point.

  • Jittering: Sometimes, all you need is a tiny bit of random noise to separate overlapping points. Instead of geom_point(), try using geom_jitter(). It’s a simple swap that can make a world of difference.

How Do I Customize Colors in ggplot2?

Manually setting your colors gives you complete creative control. For this, scale_color_manual() is the function you need.

First, you map a categorical variable to the color aesthetic inside aes(). Then, you tell ggplot2 exactly which color to use for each category with the scale_color_manual() layer.

# Manually setting colors in ggplot2
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = c("setosa" = "#E69F00",
                                "versicolor" = "#56B4E9",
                                "virginica" = "#009E73"))

Pro Tip: Always try to use colorblind-friendly palettes. The hex codes in the example above come from the Okabe-Ito palette, a widely used accessible palette, which helps ensure your visualizations are clear and readable for everyone.

How Do I Save My Plot as a High-Quality File?

When your ggplot2 masterpiece is ready for the world, ggsave() is the only function you need. It’s smart enough to figure out the file type (like .png or .pdf) right from the filename you provide.

To get a crisp, publication-quality image, you can also specify the dimensions and resolution in DPI (dots per inch).

# Saving a ggplot as a high-resolution PNG
ggsave("my_iris_plot.png", width = 8, height = 5, dpi = 300)

This one line of code saves the last plot you displayed as a high-resolution PNG, perfect for dropping into a report, presentation, or publication.


Tired of finding issues in production? GoReplay helps you catch bugs before they impact users by replaying real production traffic in your testing environment. Find problems early, deploy with confidence, and keep your applications stable. Explore the open-source tool at https://goreplay.org.
