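- 1. Exercise: Price histograms with facets and color
- 2. Exercise: Price vs. table colored by cut
- 3. Exercise: Typical table values
- 4. Exercise: Price vs. volume and diamond clarity
- 5. Exercise: Proportion of friendships initiated (using ifelse)
- 6. Exercise: prop_initiated vs. tenure
- 7. Smoothing prop_initiated vs. tenure
- 10. Price/carat binned, faceted, and colored
- 11. Gapminder multivariate analysis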
1. Exercise: Price histograms with facets and color
- Requirements:
# Create a histogram of diamond prices.
# Facet the histogram by diamond color
# and use cut to color the histogram bars.
# The plot should look something like this.
# http://i.imgur.com/b5xyrOu.jpg
# Note: In the link, a color palette of type
# 'qual' was used to color the histogram using
# scale_fill_brewer(type = 'qual')
- Code and plot: facet by color, so that each diamond color gets its own panel, and use cut to color the bars within each histogram:
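library(ggplot2)  # provides ggplot() and the diamonds data set
ggplot(aes(x = price), data = diamonds) +
  geom_histogram(aes(fill = cut)) +  # fill colors the bars; color would only outline them
  scale_fill_brewer(type = 'qual') +
  facet_wrap(~color, ncol = 2)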
2. Exercise: Price vs. table colored by cut
# Create a scatterplot of diamond price vs.
# table and color the points by the cut of
# the diamond.
# The plot should look something like this.
# http://i.imgur.com/rQF9jQr.jpg
# Note: In the link, a color palette of type
# 'qual' was used to color the scatterplot using
# scale_color_brewer(type = 'qual')
- Scatterplot. scale_color_brewer(type = 'qual') selects the type of color palette used for the points; run ?scale_color_brewer to see the help page.
# "price vs. table": price goes on the y-axis, table on the x-axis
ggplot(aes(x = table, y = price), data = diamonds) +
  geom_point(aes(color = cut)) +
  scale_color_brewer(type = 'qual')
The meaning of table is: width of top of diamond relative to widest point (43–95).
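3. Exercise: Typical table values
What is the typical table range for the majority of diamonds of ideal cut? What is the typical table range for the majority of diamonds of premium cut? Read the answer off the plot created in the previous exercise. There is no need to run summaries.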
4. Exercise: Price vs. volume and diamond clarity
# Create a scatterplot of diamond price vs.
# volume (x * y * z) and color the points by
# the clarity of diamonds. Use scale on the y-axis
# to take the log10 of price. You should also
# omit the top 1% of diamond volumes from the plot.
# Note: Volume is a very rough approximation of
# a diamond's actual volume.
# The plot should look something like this.
# http://i.imgur.com/excUpea.jpg
# Note: In the link, a color palette of type
# 'div' was used to color the scatterplot using
# scale_color_brewer(type = 'div')
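# Rough volume approximation: x * y * z
diamonds$volume <- diamonds$x * diamonds$y * diamonds$z
ggplot(aes(x = volume, y = price), data = diamonds) +
  geom_point(aes(color = clarity)) +
  scale_y_log10() +                         # log10 scale on the y-axis, as the exercise asks
  scale_color_brewer(type = 'div') +
  xlim(0, quantile(diamonds$volume, 0.99))  # omit the top 1% of volumes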
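To see what the xlim() call above does, here is a minimal base-R sketch on a hypothetical vector of volumes (the values 1 to 100 are made up for illustration and are not taken from diamonds). It computes the 99th percentile and counts how many observations fall at or below it.

```r
# Hypothetical volumes, used only for illustration (not the diamonds data).
volumes <- 1:100

# 99th percentile: the cutoff passed as the upper x limit.
cutoff <- quantile(volumes, 0.99)

# Observations at or below the cutoff stay in the plot; the rest are dropped.
kept <- volumes[volumes <= cutoff]
length(kept)
```

Note that xlim() removes points outside the range rather than zooming (ggplot2 warns about the removed rows), so here it acts as a filter on the data.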
5. Exercise: Proportion of friendships initiated (using ifelse)
# Your task is to create a new variable called 'prop_initiated'
# in the Pseudo-Facebook data set. The variable should contain
# the proportion of friendships that the user initiated.
pf$prop_initiated <- ifelse(pf$friend_count>0,pf$friendships_initiated / pf$friend_count,0)
summary(pf$prop_initiated)
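The ifelse() guard above can be checked on a tiny example. This is a minimal sketch on a hypothetical three-row data frame named toy (made-up values standing in for pf); it shows that rows with no friends get 0 instead of the NaN that 0/0 would produce.

```r
# Hypothetical mini data set standing in for pf (values are made up).
toy <- data.frame(
  friend_count          = c(0, 4, 10),
  friendships_initiated = c(0, 1, 10)
)

# ifelse() is vectorised: rows with friend_count == 0 get 0 instead of 0/0 = NaN.
toy$prop_initiated <- ifelse(toy$friend_count > 0,
                             toy$friendships_initiated / toy$friend_count,
                             0)
toy$prop_initiated
```

Using NA instead of 0 as the fallback would let you exclude users without friends from later summaries (with na.rm = TRUE) rather than count them as 0; which choice is appropriate is a judgment call.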
6. Exercise: prop_initiated vs. tenure
# Create a line graph of the median proportion of
# friendships initiated ('prop_initiated') vs.
# tenure and color the line segment by
# year_joined.bucket.
# Recall, we created year_joined.bucket in Lesson 5
# by first creating year_joined from the variable tenure.
# Then, we used the cut function on year_joined to create
# four bins or cohorts of users.
# (2004, 2009]
# (2009, 2011]
# (2011, 2012]
# (2012, 2014]
# The plot should look something like this.
# http://i.imgur.com/vNjPtDh.jpg
# OR this
# http://i.imgur.com/IBN1ufQ.jpg
library("dplyr")
# ① Group the data by tenure
tenure_groups <- group_by(subset(pf, !is.na(tenure)), tenure)
# ② Summarise tenure_groups. Do not use `pf$prop_initiated` here:
#    it refers to the whole column and would ignore the grouping.
pf.fc_by_tenure <- summarise(tenure_groups,
                             median_prop = median(prop_initiated),
                             n = n())
# Compute the year joined from tenure (in days)
pf.fc_by_tenure$year_joined <- 2014 - ceiling(pf.fc_by_tenure$tenure / 365)
# ③ Cut the data into buckets
pf.fc_by_tenure$year_joined.bucket <- cut(pf.fc_by_tenure$year_joined, breaks = c(2004, 2009, 2011, 2012, 2014))
ggplot(aes(x = tenure, y = median_prop), data = pf.fc_by_tenure) +
  geom_line(aes(color = year_joined.bucket)) +  # refer to the column by name inside aes()
  scale_x_continuous(breaks = seq(0, 3500, 500)) +
  theme(legend.text = element_text(size = 10), legend.title = element_text(size = 10)) + theme(legend.position = "top")
- ① Group the data by tenure. Comparing the grouped data with the original pf data, the grouped data adds some attributes on top of the original pf data:
tenure_groups <- group_by(subset(pf,!is.na(tenure)), tenure)
- ② Summarise the tenure_groups data set. For prop_initiated, see the previous exercise:
pf$prop_initiated <- ifelse(pf$friend_count>0,pf$friendships_initiated / pf$friend_count,0)
pf.fc_by_tenure <- summarise(tenure_groups,
median_prop = median(prop_initiated),
n=n())
head(pf.fc_by_tenure,1000)
- ③ Cut the data into buckets. After running the code below, the data structure becomes:
pf.fc_by_tenure$year_joined <- 2014 - ceiling(pf.fc_by_tenure$tenure / 365)
pf.fc_by_tenure$year_joined.bucket <- cut(pf.fc_by_tenure$year_joined,breaks = c(2004,2009,2011,2012,2014))
head(pf.fc_by_tenure,1000)
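The year_joined and cut() steps can be traced by hand on a few values. The following is a minimal base-R sketch; the tenure values are hypothetical and chosen only so that every bucket appears.

```r
# Hypothetical tenure values in days (made up for illustration).
tenure <- c(100, 400, 1000, 2000, 3000)

# Same formula as above: whole years of tenure, rounded up, subtracted from 2014.
year_joined <- 2014 - ceiling(tenure / 365)

# cut() builds right-closed intervals: (2004,2009], (2009,2011], ...
bucket <- cut(year_joined, breaks = c(2004, 2009, 2011, 2012, 2014))
data.frame(tenure, year_joined, bucket)
```

Because the intervals are open on the left, a year_joined of 2004 or earlier would fall outside every bucket and become NA.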
7. Smoothing prop_initiated vs. tenure
# Smooth the last plot you created
# of prop_initiated vs tenure colored by
# year_joined.bucket. You can bin together ranges
# of tenure or add a smoother to the plot.
Based on the data produced in the previous part, the following code yields a smoothed line:
ggplot(aes(x=tenure,y=median_prop),data=pf.fc_by_tenure) +
scale_x_continuous(breaks = seq(0, 3500, 500)) +
theme(legend.text=element_text(size=10),legend.title=element_text(size=10)) + theme(legend.position="top") +
geom_smooth(aes(color = year_joined.bucket))
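The exercise prompt also mentions binning ranges of tenure as an alternative to a smoother. A minimal base-R sketch of that idea follows, on a hypothetical data frame named toy standing in for pf.fc_by_tenure; the 30-day bin width is an assumption, and averaging per-tenure medians within a bin is a simplification.

```r
# Hypothetical per-tenure medians standing in for pf.fc_by_tenure (values made up).
toy <- data.frame(
  tenure      = c(1, 10, 20, 40, 50, 70),
  median_prop = c(0.9, 0.7, 0.8, 0.6, 0.4, 0.5)
)

# Round each tenure to the nearest multiple of 30 days (assumed bin width).
toy$tenure_bin <- 30 * round(toy$tenure / 30)

# Average median_prop within each bin; wider bins give a smoother line.
binned <- aggregate(median_prop ~ tenure_bin, data = toy, FUN = mean)
binned
```

The resulting binned data frame could then be plotted with geom_line() in place of the raw per-day medians.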
10. Price/carat binned, faceted, and colored
# Create a scatter plot of the price/carat ratio
# of diamonds. The variable x should be
# assigned to cut. The points should be colored
# by diamond color, and the plot should be
# faceted by clarity.
# Note: In the link, a color palette of type
# 'div' was used to color the histogram using
# scale_color_brewer(type = 'div')
ggplot(aes(x=cut,y=price/carat),data=diamonds) +
geom_point(aes(color=color)) +
scale_color_brewer(type = 'div') +
facet_wrap(~clarity) +
theme(legend.position="right")
ggsave("mtcars.png")
11. Gapminder multivariate analysis
Omitted.