大厂面试SQL题(三) ——连续登陆(含间隔)(字节)

yumo6667个月前 (03-30)技术文章50

承继上篇文章 大厂面试SQL题(一) ——连续登陆,阐述了最为简易的连续登录题目。然而,倘若在面试题目中增添难度,将 隔天统计问题 纳入考量,该作何处理呢?

沿用第一篇题目的场景实例:

(1)存在一张用户视频投稿行为表,如下所示,旨在找出连续 7 天发布视频、连续 30 天发布视频的用户,间隔一天也算连续(解释:如果一个用户在 2024年1月1、2、4、6 号发布视频,则视为连续 6 天投稿)

-- 测试表生成SQL实例如下:
WITH author_publish_table AS (
    select  802264760 AS author_id,'2024-01-01' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-02' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-04' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-06' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-07' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-01' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-02' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-05' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-10' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-12' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-13' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-14' AS publish_date
)
-- ----------------------------- 测试表生成 end -----------------------
select * from author_publish_table


针对时间连续类题目,必然无法回避日期差的计算。我们从这方面入手:

首先需要解决是,获取每次行为记录与前一条行为记录的日期差值,查看其是否符合“连续行为”的标准(例如两次行为的日期差<xx 天)。

其次要解决的是,能够清晰明了每一次的连续行为,即能够区分出不同的连续片段。

而针对第一点:前后两条行为的日期差值,就有必要深入研讨几类行处理类窗口函数的应用场景,下面用一个简单小实例,辅助理解这几个函数,以便更顺畅向下阅读理解最终SQL方案。

lag(): 返回分区中当前行之前行(可以指定第几行)的值。 如果没有行,则返回null。

lead():返回分区中当前行后面行(可以指定第几行)的值。 如果没有行,则返回null。

first_value():返回相对于窗口中第一行的指定列的值。
last_value():返回相对于窗口中最后一行的指定列的值。

-- 简单小实例辅助更顺畅理解最终SQL:
WITH author_publish_table AS (
    select  802264760 AS author_id,'2024-01-01' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-02' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-04' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-06' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-01' AS publish_date
)
-- ----------------------------- 测试表生成 end -----------------------
SELECT  author_id,
        publish_date, -- 本次投稿日期
        lag(publish_date, 1, NULL) OVER (PARTITION BY author_id ORDER BY publish_date ASC) AS pre_publish_date,  -- 上次投稿日期
        lead(publish_date,1, NULL) OVER (PARTITION BY author_id ORDER BY publish_date ASC) AS next_publish_date, -- 后次投稿日期
        first_value(publish_date) OVER (PARTITION BY author_id ORDER BY publish_date ASC) AS first_publish_date, -- 首次投稿日期
        last_value(publish_date) OVER (PARTITION BY author_id ORDER BY publish_date ASC rows between current row and unbounded following)  AS last_publish_date   -- 末次投稿日期
FROM    author_publish_table
ORDER BY author_id ASC, publish_date ASC

以上小实例查询的呈现结果:

回归正文,针对此题目的解法实现:

  • 首先,可以通过 lag()函数获取前后两次投稿的日期差,倘若日期差值>3,则代表“非连续投稿”,也意味着新一轮连续行为的开端。
  • 其次,可以依据差值>3 的断层进行分组标识,以区分每次连续投稿的片段。
  • 而后,能够通过获取每个连续投稿片段内的最大与最小投稿日期,从而计算得出本次连续投稿的天数。

最终SQL如下:

SELECT author_id,
        publish_date,
        last_publish_date,
        publish_datediff,
        refer_group,
        max_publish_date,
        min_publish_date,
        datediff(max_publish_date,min_publish_date)+1 as continue_publish_days -- 连续投稿的天数
from (SELECT author_id,
        publish_date,
        last_publish_date,
        publish_datediff,
        refer_group,  -- 连续投稿片段分组
        max(publish_date) OVER (PARTITION by author_id,refer_group) max_publish_date, -- 连续投稿的结束日期
        min(publish_date) OVER (PARTITION by author_id,refer_group) min_publish_date  -- 连续投稿的开始日期
from (SELECT  author_id,
        publish_date,
        last_publish_date,
        publish_datediff,
        sum(if(datediff(publish_date, last_publish_date)<=2,0,1) ) OVER (PARTITION BY author_id ORDER BY publish_date ASC) refer_group -- 连续投稿片段分组
FROM  
   (SELECT author_id,
            publish_date,
            last_publish_date,
            datediff(publish_date, last_publish_date) publish_datediff -- 两次投稿日期差
    FROM  (
            SELECT  author_id,
                    publish_date, -- 本次投稿日期
                    lag(publish_date, 1, NULL) OVER (PARTITION BY author_id ORDER BY publish_date ASC) AS last_publish_date -- 上次投稿日期
            FROM    author_publish_table
            ORDER BY author_id ASC, publish_date ASC
        )
    )
  )
)

以上SQL查询数据结果呈现如下:

欢迎探讨,其他类似衍生题目补充中...

相关文章

MYSQL经典面试题汇总

1.非聚簇索引一定会回表查询吗?不一定,这涉及到查询语句所要求的字段是否全部命中了索引,如果全部命中了索引,那么就不必再进行回表查询。举个简单的例子:假设我们在员工表的年龄上建立了索引,那么当进行的查...

这些MySQL面试题集锦,据说知名互联网公司都用

1、一张表,里面有ID自增主键,当insert了17条记录之后,删除了第15,16,17条记录,再把Mysql重启,再insert一条记录,这条记录的ID是18还是15 ?  2、MySQL的技术特点...

每天一篇Java面试之MySQL热门面试答案汇总

面试官:MySQL中,如何定位慢查询?候选人:嗯~,我们当时做压测的时候有的接口非常的慢,接口的响应时间超过了2秒以上,因为我们当时的系统部署了运维的监控系统Skywalking ,在展示的报表中可以...