大厂面试SQL题(一) ——连续登陆(字节)

yumo6664个月前 (03-30)技术文章50

在日常工作进程中,分析师会经常剖析用户对于公司产品的粘性及留存状况,对应至数据指标层面,常见的有“连续 xx 天登陆用户数”、“7 天/30 天留存用户数”等。正因如此,这类题目在数据面试中属于较为常见的用户类 SQL 题目。接下来,让我们一同来看下这类题目的常见 SQL 解法:

(1)存在一张用户视频投稿行为表,如下所示,旨在找出连续 7 天发布视频、连续 30 天发布视频的用户,以及最大连续发布天数等。

具体的数据实例如下:

-- ---------------- start 生成临时测试表,如已有可用表忽略此步骤------------------------------
WITH author_publish_table AS (
    select  802264760 AS author_id,'2024-01-01' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-02' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-03' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-04' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-05' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-06' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-07' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-08' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-09' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-01' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-02' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-05' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-01' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-02' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-03' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-04' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-05' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-06' AS publish_date
)
-- ----------------------------- 测试表生成 end -----------------------
select * from author_publish_table

生成的数据表查询结果实例图:

就诸如连续投稿、连续登陆等等在一段时间内连续进行某种行为的此类问题而言,一种比较理想的理解在于,一位连续投稿的用户,其每次连续行为的基准日期是一致的。所以,倘若能够统计出每次连续行为的基准日期个数,便能够统计出连续投稿的天数。

类比参考 等差数列 的特性,倘若一个等差数列的首项记为 a1 ,公差记为 d ,那么该等差数列第 n 项 的一般项为,那么,基准日期即为,公差d=1,在SQL中生成公差d=1,的数据序列,最简单的函数即排序函数,因而能够凭借日期差与序号来处理这类问题。

参考SQL如下:

SELECT  author_id,
        publish_date,
        `(n-1)d`,
       date_sub(publish_date, `(n-1)d`) AS `a1` -- 基准日期
from 
    (SELECT  author_id,
            publish_date,
            row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
    FROM    author_publish_table)

基于以上中间结果,其实可以解决出以下场景的各类问题:

  1. 找出连续发布的最大视频日期、连续发布的最小视频日期、连续发布视频天数。
select author_id,
       `连续投稿基准日期` ,
       MAX(publish_date)      as `本次连续的最大投稿日期`,
       min(publish_date)      as `本次连续的最小投稿日期`,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
            publish_date,
            date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
    from 
        (SELECT  author_id,
                publish_date,
                row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
        FROM    author_publish_table) a
    )
GROUP BY author_id,`连续投稿基准日期`

查询结果:

author_id

连续投稿基准日期

本次连续的最大投稿日期

本次连续的最小投稿日期

本次连续投稿天数

970014504

2023-12-31

2024-01-02

2024-01-01

2

970014504

2024-01-02

2024-01-05

2024-01-05

1

96620207

2023-12-31

2024-01-06

2024-01-01

6

802264760

2023-12-31

2024-01-09

2024-01-01

9

  1. 找出作者最大连续投稿天数
select author_id,
       max(`本次连续投稿天数`) as `作者最大连续投稿天数`
from (select author_id,
       `连续投稿基准日期` ,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
            publish_date,
            date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
    from 
        (SELECT  author_id,
                publish_date,
                row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
        FROM    author_publish_table) a
    )
GROUP BY author_id,`连续投稿基准日期`
)
group by author_id

查询结果:

author_id

作者最大连续投稿天数

802264760

9

970014504

2

96620207

6

  1. 找出连续7天发布视频的作者
select author_id,
       `连续投稿基准日期` ,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
                  publish_date,
                  date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
          from 
                (SELECT  author_id,
                         publish_date,
                         row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
                FROM    author_publish_table
                ) a
    )
GROUP BY author_id,`连续投稿基准日期`
HAVING `本次连续投稿天数` >=7

查询结果:

author_id

本次连续投稿天数

802264760

9


如有问题随时私信沟通~

相关文章

初学者最热门和棘手的10个T-SQL问题

T-SQL有助于在物理或虚拟实例中创建和管理对象,以及插入、检索、修改和删除数据表。国际标准化组织(ISO)和美国国家标准协会(ANSI)发布的SQL标准定义了一种软件语言,Transact-SQL就...

网络安全面试题-SQL注入问题总结

一、知识储备类1.SQL与NoSQL的区别?SQL:关系型数据库NoSQL:非关系型数据库存储方式:SQL具有特定的结构表,NoSQL存储方式灵活性能:NoSQL较优于SQL数据类型:SQL适用结构化...

MySQL经典面试题 14 道

面试中经常被问到的 MySQL 题目,本文总结了面试中的经典问题。1.主键(primary key)和候选键(candidate key)有什么区别?候选键 - 候选键可以是任何列或可以作为数据库中唯...