大厂面试SQL题(一) ——连续登陆(字节)

yumo6663个月前 (03-30)技术文章28

在日常工作进程中,分析师会经常剖析用户对于公司产品的粘性及留存状况,对应至数据指标层面,常见的有“连续 xx 天登陆用户数”、“7 天/30 天留存用户数”等。正因如此,这类题目在数据面试中属于较为常见的用户类 SQL 题目。接下来,让我们一同来看下这类题目的常见 SQL 解法:

(1)存在一张用户视频投稿行为表,如下所示,旨在找出连续 7 天发布视频、连续 30 天发布视频的用户,以及最大连续发布天数等。

具体的数据实例如下:

-- ---------------- start 生成临时测试表,如已有可用表忽略此步骤------------------------------
WITH author_publish_table AS (
    select  802264760 AS author_id,'2024-01-01' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-02' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-03' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-04' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-05' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-06' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-07' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-08' AS publish_date
    union all
    select  802264760 AS author_id,'2024-01-09' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-01' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-02' AS publish_date
    union all
    select  970014504 AS author_id,'2024-01-05' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-01' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-02' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-03' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-04' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-05' AS publish_date
    union all
    select  96620207 AS author_id,'2024-01-06' AS publish_date
)
-- ----------------------------- 测试表生成 end -----------------------
select * from author_publish_table

生成的数据表查询结果实例图:

就诸如连续投稿、连续登陆等等在一段时间内连续进行某种行为的此类问题而言,一种比较理想的理解在于,一位连续投稿的用户,其每次连续行为的基准日期是一致的。所以,倘若能够统计出每次连续行为的基准日期个数,便能够统计出连续投稿的天数。

类比参考 等差数列 的特性,倘若一个等差数列的首项记为 a1 ,公差记为 d ,那么该等差数列第 n 项 的一般项为,那么,基准日期即为,公差d=1,在SQL中生成公差d=1,的数据序列,最简单的函数即排序函数,因而能够凭借日期差与序号来处理这类问题。

参考SQL如下:

SELECT  author_id,
        publish_date,
        `(n-1)d`,
       date_sub(publish_date, `(n-1)d`) AS `a1` -- 基准日期
from 
    (SELECT  author_id,
            publish_date,
            row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
    FROM    author_publish_table)

基于以上中间结果,其实可以解决出以下场景的各类问题:

  1. 找出连续发布的最大视频日期、连续发布的最小视频日期、连续发布视频天数。
select author_id,
       `连续投稿基准日期` ,
       MAX(publish_date)      as `本次连续的最大投稿日期`,
       min(publish_date)      as `本次连续的最小投稿日期`,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
            publish_date,
            date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
    from 
        (SELECT  author_id,
                publish_date,
                row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
        FROM    author_publish_table) a
    )
GROUP BY author_id,`连续投稿基准日期`

查询结果:

author_id

连续投稿基准日期

本次连续的最大投稿日期

本次连续的最小投稿日期

本次连续投稿天数

970014504

2023-12-31

2024-01-02

2024-01-01

2

970014504

2024-01-02

2024-01-05

2024-01-05

1

96620207

2023-12-31

2024-01-06

2024-01-01

6

802264760

2023-12-31

2024-01-09

2024-01-01

9

  1. 找出作者最大连续投稿天数
select author_id,
       max(`本次连续投稿天数`) as `作者最大连续投稿天数`
from (select author_id,
       `连续投稿基准日期` ,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
            publish_date,
            date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
    from 
        (SELECT  author_id,
                publish_date,
                row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
        FROM    author_publish_table) a
    )
GROUP BY author_id,`连续投稿基准日期`
)
group by author_id

查询结果:

author_id

作者最大连续投稿天数

802264760

9

970014504

2

96620207

6

  1. 找出连续7天发布视频的作者
select author_id,
       `连续投稿基准日期` ,
       count(`连续投稿基准日期`) as `本次连续投稿天数`
 from    (SELECT  author_id,
                  publish_date,
                  date_sub(publish_date, `(n-1)d`) AS `连续投稿基准日期` 
          from 
                (SELECT  author_id,
                         publish_date,
                         row_number() OVER (PARTITION BY author_id ORDER BY publish_date asc) AS `(n-1)d`
                FROM    author_publish_table
                ) a
    )
GROUP BY author_id,`连续投稿基准日期`
HAVING `本次连续投稿天数` >=7

查询结果:

author_id

本次连续投稿天数

802264760

9


如有问题随时私信沟通~

相关文章

70道SQL经典面试题大全,看这一篇就够了

吐血整理了一周SQL面试题,包括了选择题、问答题、实战题。每道题都给大家加了详细的独家解析。几乎每个知识点都涉及到了。刷完这套题,关于SQL的面试妥妥的。题目来源:各大厂面试题、牛客网选择题(1)基础...

总结面试:SQL常见面试题-1

刚刚走过了金三银四,相信许多小伙伴已经得到了自己心仪的offer,在这里先恭喜大家啦!但可能还有一些小伙伴刚刚开始找工作,作为一名数据分析师,还是想总结一些面试的经验及SQL的笔试面试题给各位小伙伴,...

大厂面试SQL题(三) ——连续登陆(含间隔)(字节)

承继上篇文章 大厂面试SQL题(一) ——连续登陆,阐述了最为简易的连续登录题目。然而,倘若在面试题目中增添难度,将 隔天统计问题 纳入考量,该作何处理呢?沿用第一篇题目的场景实例:(1)存在一张用户...

Mysql 常见的10道面试题

1、介绍一下事务的4个特性ACID事务特性含义解释Atomicity原子性同一个事务中的多条语句不可分割,要么全部成功,要么全部失败,不存在部分成功、部分失败Consistency一致性事务从一个一致...

十个你必须会的mysql面试题

什么是SQL?结构化的查询语言(Structured Query Language)简称SQL,是一种数据库查询语言。作用是用于存储数据、查询、更新和管理关系型数据库SQL中的约束有哪几种?1、非空约...

软件测试面试被问这些SQL语句,不会怎么办?于是推出这套SQL教程

作为一名软件测试人员,学习SQL语句是必备。为什么呢?Kitty小编告诉你:首先,从软件的三层架构谈起,大家都知道现在的软件基本要么是C/S架构,要么是B/S架构,但无论是C/S架构还是B/S架构最终...