How can we measure the effect of the function quantitatively?
How long does it take to create the function
Does this function make the product too complicated?
How dangerous is the function?
How innovative is the function?
Is it aligned with what users want?
Hypothesis that we need to check
Do users use the search function a lot? Among the numerous functions used by users, how commonly is the search function?
How often do users use the search function in one session?
(It might be a problem if they use it too often or too rarely)
How much do results clicks occur after searching?
where is the results that is clicked after a search?
The more users search, the more click they do?
What is the retention rate of the search function users?
Summary of problem
Explore whether the search function is necessary
Which part should be improved and how
Data Analysis
1. Percentage of search function usage
SELECT DATE_TRUNC('week', z.session_start) AS week,
COUNT(*) AS sessions,
COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END) AS with_autocompletes,
COUNT(CASE WHEN z.runs > 0 THEN z.session ELSE NULL END) AS with_runs
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
next_event >= INTERVAL '10 MINUTE' OR
last_event IS NULL OR
next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
e.occurred_at >= session.session_start AND
e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY week
2. The number of sessions with autocompletes
SELECT autocompletes, COUNT(*) AS sessions
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
next_event >= INTERVAL '10 MINUTE' OR
last_event IS NULL OR
next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
e.occurred_at >= session.session_start AND
e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY autocompletes
3. The number of sessions with full search
SELECT runs, COUNT(*) AS sessions
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
next_event >= INTERVAL '10 MINUTE' OR
last_event IS NULL OR
next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
e.occurred_at >= session.session_start AND
e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
GROUP BY runs
4. Distribution of the clickthrough after full search
SELECT clicks, COUNT(*)
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
next_event >= INTERVAL '10 MINUTE' OR
last_event IS NULL OR
next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
e.occurred_at >= session.session_start AND
e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE runs > 0
GROUP BY clicks
After full search, zero clickthrough is more than half.
SELECT runs, AVG(clicks) ::FLOAT AS average_clicks
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
next_event >= INTERVAL '10 MINUTE' OR
last_event IS NULL OR
next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
e.occurred_at >= session.session_start AND
e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE runs > 0
GROUP BY runs
The more users search, the more they click.
Above 13 runs, the amount of sessions decreased -> not accurate
5. The order of clicked contents
SELECT event_name, COUNT(user_id)
FROM tutorial.yammer_events
WHERE event_name LIKE 'search_click_result_%'
GROUP BY event_name
ORDER BY event_name
The order of contents that users click is distributed.
In chart 4, users who click only one result is 2.45%. It implies that it's not the matter of ranking, but the matter of results itself.
6. Retention users
SELECT searches, COUNT(*) AS users
FROM (
SELECT user_id, COUNT(*) AS searches
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id, session, MIN(occurred_at) AS session_start, MAX(occurred_at) AS session_end
FROM(
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id, event_type, event_name, occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR
next_event >= INTERVAL '10 MINUTE' OR
last_event IS NULL OR
next_event IS NULL) final
GROUP BY user_id, session) session
ON e.user_id = session.user_id AND
e.occurred_at >= session.session_start AND
e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement') x
GROUP BY x.session_start, x.session, x.user_id) z
WHERE z.runs > 0
GROUP BY user_id) z
GROUP BY searches
ORDER BY searches
Within a month after the first full search, 1,177 users used full search once more, and 428 users used it twice.
Within a month after the first autocomplete, 1,239 users used autocomplete once more, and 1,003 users used it twice.
Summary of analysis
Both autocomplete (222%) and full search(8%) are used in many sessions, so it's worthy to put more resources.
88% of users search more than twice in full search.
More than half of users don't click any result.
After full search, people used to click several results not only one or two.
The order of contents which users click is distributed.
It implies that users can't get the results they want easily.
Review
If there are not enough data, it can distort the result (especially in ratio)
Top comments (0)
Subscribe
For further actions, you may consider blocking this person and/or reporting abuse
Top comments (0)