大厂面试SQL题(一) ——连续登陆(字节)
在日常工作进程中,分析师会经常剖析用户对于公司产品的粘性及留存状况,对应至数据指标层面,常见的有“连续 xx 天登陆用户数”、“7 天/30 天留存用户数”等。正因如此,这类题目在数据面试中属于较为常见的用户类 SQL 题目。接下来,让我们一同来看下这类题目的常见 SQL 解法:
(1)存在一张用户视频投稿行为表,如下所示,旨在找出连续 7 天发布视频、连续 30 天发布视频的用户,以及最大连续发布天数等。
具体的数据实例如下:
-- ---------------- start 生成临时测试表,如已有可用表忽略此步骤------------------------------
WITH author_publish_table AS (
select 802264760 AS author_id,'2024-01-01' AS publish_date
union all
select 802264760 AS author_id,'2024-01-02' AS publish_date
union all
select 802264760 AS author_id,'2024-01-03' AS publish_date
union all
select 802264760 AS author_id,'2024-01-04' AS publish_date
union all
select 802264760 AS author_id,'2024-01-05' AS publish_date
union all
select 802264760 AS author_id,'2024-01-06' AS publish_date
union all
select 802264760 AS author_id,'2024-01-07' AS publish_date
union all
select 802264760 AS author_id,'2024-01-08' AS publish_date
union all
select 802264760 AS author_id,'2024-01-09' AS publish_date
union all
select 970014504 AS author_id,'2024-01-01' AS publish_date
union all
select 970014504 AS author_id,'2024-01-02' AS publish_date
union all
select 970014504 AS author_id,'2024-01-05' AS publish_date
union all
select 96620207 AS author_id,'2024-01-01' AS publish_date
union all
select 96620207 AS author_id,'2024-01-02' AS publish_date
union all
select 96620207 AS author_id,'2024-01-03' AS publish_date
union all
select 96620207 AS author_id,'2024-01-04' AS publish_date
union all
select 96620207 AS author_id,'2024-01-05' AS publish_date
union all
select 96620207 AS author_id,'2024-01-06' AS publish_date
)
-- ----------------------------- 测试表生成 end -----------------------
select * from author_publish_table
生成的数据表查询结果实例图:
就诸如连续投稿、连续登陆等等在一段时间内连续进行某种行为的此类问题而言,一种比较理想的理解在于,一位连续投稿的用户,其每次连续行为的基准日期是一致的。所以,倘若能够统计出每次连续行为的基准日期个数,便能够统计出连续投稿的天数。
类比参考 等差数列 的特性,倘若一个等差数列的首项记为 a1 ,公差记为 d ,那么该等差数列第 n 项 的一般项为,那么,基准日期即为,公差d=1,在SQL中生成公差d=1,的数据序列,最简单的函数即排序函数,因而能够凭借日期差与序号来处理这类问题。
参考SQL如下:
SELECT author_id,
publish_date,
`(n-1)d`,
date_sub(publish_date, `(n-1)d`) AS `a1` -- 基准日期
from
(SELECT author_id,
publish_date,
row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
FROM author_publish_table)
基于以上中间结果,其实可以解决出以下场景的各类问题:
- 找出连续发布的最大视频日期、连续发布的最小视频日期、连续发布视频天数。
select author_id,
`连续投稿基准日期` ,
MAX(publish_date) as `本次连续的最大投稿日期`,
min(publish_date) as `本次连续的最小投稿日期`,
count(`连续投稿基准日期`) as `本次连续投稿天数`
from (SELECT author_id,
publish_date,
date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期`
from
(SELECT author_id,
publish_date,
row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
FROM author_publish_table) a
)
GROUP BY author_id,`连续投稿基准日期`
查询结果:
author_id | 连续投稿基准日期 | 本次连续的最大投稿日期 | 本次连续的最小投稿日期 | 本次连续投稿天数 |
970014504 | 2023-12-31 | 2024-01-02 | 2024-01-01 | 2 |
970014504 | 2024-01-02 | 2024-01-05 | 2024-01-05 | 1 |
96620207 | 2023-12-31 | 2024-01-06 | 2024-01-01 | 6 |
802264760 | 2023-12-31 | 2024-01-09 | 2024-01-01 | 9 |
- 找出作者最大连续投稿天数
select author_id,
max(`本次连续投稿天数`) as `作者最大连续投稿天数`
from (select author_id,
`连续投稿基准日期` ,
count(`连续投稿基准日期`) as `本次连续投稿天数`
from (SELECT author_id,
publish_date,
date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期`
from
(SELECT author_id,
publish_date,
row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
FROM author_publish_table) a
)
GROUP BY author_id,`连续投稿基准日期`
)
group by author_id
查询结果:
author_id | 作者最大连续投稿天数 |
802264760 | 9 |
970014504 | 2 |
96620207 | 6 |
- 找出连续7天发布视频的作者
select author_id,
`连续投稿基准日期` ,
count(`连续投稿基准日期`) as `本次连续投稿天数`
from (SELECT author_id,
publish_date,
date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期`
from
(SELECT author_id,
publish_date,
row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
FROM author_publish_table
) a
)
GROUP BY author_id,`连续投稿基准日期`
HAVING `本次连续投稿天数` >=7
查询结果:
author_id | 本次连续投稿天数 |
802264760 | 9 |
如有问题随时私信沟通~