大厂面试SQL题(一) ——连续登陆(字节)

yumo66610个月前 (03-30)技术文章116

在日常工作进程中,分析师会经常剖析用户对于公司产品的粘性及留存状况,对应至数据指标层面,常见的有“连续 xx 天登陆用户数”、“7 天/30 天留存用户数”等。正因如此,这类题目在数据面试中属于较为常见的用户类 SQL 题目。接下来,让我们一同来看下这类题目的常见 SQL 解法:

(1)存在一张用户视频投稿行为表,如下所示,旨在找出连续 7 天发布视频、连续 30 天发布视频的用户,以及最大连续发布天数等。

具体的数据实例如下:

-- ---------------- start 生成临时测试表,如已有可用表忽略此步骤------------------------------
WITH author_publish_table AS (
    select  802264760 AS author_id,'2024-01-01' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-02' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-03' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-04' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-05' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-06' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-07' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-08' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-09' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-01' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-02' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-05' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-01' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-02' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-03' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-04' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-05' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-06' AS publish_date
)
-- ----------------------------- 测试表生成 end -----------------------
select * from author_publish_table

生成的数据表查询结果实例图:

就诸如连续投稿、连续登陆等等在一段时间内连续进行某种行为的此类问题而言,一种比较理想的理解在于,一位连续投稿的用户,其每次连续行为的基准日期是一致的。所以,倘若能够统计出每次连续行为的基准日期个数,便能够统计出连续投稿的天数。

类比参考 等差数列 的特性,倘若一个等差数列的首项记为 a1 ,公差记为 d ,那么该等差数列第 n 项 的一般项为,那么,基准日期即为,公差d=1,在SQL中生成公差d=1,的数据序列,最简单的函数即排序函数,因而能够凭借日期差与序号来处理这类问题。

参考SQL如下:

SELECT  author_id,
        publish_date,
        `(n-1)d`,
       date_sub(publish_date, `(n-1)d`) AS `a1` -- 基准日期
from 
    (SELECT  author_id,
            publish_date,
            row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
    FROM    author_publish_table)

基于以上中间结果,其实可以解决出以下场景的各类问题:

  1. 找出连续发布的最大视频日期、连续发布的最小视频日期、连续发布视频天数。
select author_id,
       `连续投稿基准日期` ,
       MAX(publish_date)      as `本次连续的最大投稿日期`,
       min(publish_date)      as `本次连续的最小投稿日期`,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
            publish_date,
            date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
    from 
        (SELECT  author_id,
                publish_date,
                row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
        FROM    author_publish_table) a
    )
GROUP BY author_id,`连续投稿基准日期`

查询结果:

author_id

连续投稿基准日期

本次连续的最大投稿日期

本次连续的最小投稿日期

本次连续投稿天数

970014504

2023-12-31

2024-01-02

2024-01-01

2

970014504

2024-01-02

2024-01-05

2024-01-05

1

96620207

2023-12-31

2024-01-06

2024-01-01

6

802264760

2023-12-31

2024-01-09

2024-01-01

9

  1. 找出作者最大连续投稿天数
select author_id,
       max(`本次连续投稿天数`) as `作者最大连续投稿天数`
from (select author_id,
       `连续投稿基准日期` ,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
            publish_date,
            date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
    from 
        (SELECT  author_id,
                publish_date,
                row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
        FROM    author_publish_table) a
    )
GROUP BY author_id,`连续投稿基准日期`
)
group by author_id

查询结果:

author_id

作者最大连续投稿天数

802264760

9

970014504

2

96620207

6

  1. 找出连续7天发布视频的作者
select author_id,
       `连续投稿基准日期` ,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
                  publish_date,
                  date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
          from 
                (SELECT  author_id,
                         publish_date,
                         row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
                FROM    author_publish_table
                ) a
    )
GROUP BY author_id,`连续投稿基准日期`
HAVING `本次连续投稿天数` >=7

查询结果:

author_id

本次连续投稿天数

802264760

9


如有问题随时私信沟通~

相关文章

软件测试面试被问这些SQL语句,不会怎么办?于是推出这套SQL教程

作为一名软件测试人员,学习SQL语句是必备。为什么呢?Kitty小编告诉你:首先,从软件的三层架构谈起,大家都知道现在的软件基本要么是C/S架构,要么是B/S架构,但无论是C/S架构还是B/S架构最终...

每天一篇Java面试之MySQL热门面试答案汇总

面试官:MySQL中,如何定位慢查询?候选人:嗯~,我们当时做压测的时候有的接口非常的慢,接口的响应时间超过了2秒以上,因为我们当时的系统部署了运维的监控系统Skywalking ,在展示的报表中可以...