Backend-Kisum-MusicData — Analyst
Related: Promoters API · Data ownership
Backend-Kisum-MusicData — Analyst (artist_data)
Purpose
Centralize on-demand refresh of promoter-facing artist audience metrics into Mongo data_analyst.artist_data. There is no batch crontab: rows are created or updated only when a consumer requests stats for that artist_id.
Data ownership
| Store | Key | Writer | Readers |
|---|---|---|---|
| data_analyst.artist_data | artist_id (int32, kisum_data.artists_v2.id) | MusicData (GET /analyst/artists/:id/stats) when Promoters hybrid is enabled; else Promoters local refresh | Promoters GET /artists/:id/stats (fast path), agencies roster metrics, event-intelligence |
| data_analyst.artist_popularity | artist_id + platform + date + source | MusicData (GET /analyst/artists/:id/popularity) on-demand backfill/incremental refresh | Same route (read path); Promoters GET /api/artists/:id/popularity/:platform (summarized BFF) |
Endpoints (machine auth)
MUSICDATA_INTERNAL_API_KEY via X-Internal-API-Key, x-api-key, or Bearer.
| Method | Path | Behavior |
|---|---|---|
| GET | /analyst/artists/{artistId}/stats | Load artist_data; if updatedAt is within 7 days and ?refresh=true is not set, return mapped stats. Otherwise run the 4-tier refresh (Viberate → Songstats → Soundcharts → native MusicData platforms), upsert Mongo, and return flat stats in data. |
| GET | /analyst/artists/{artistId}/stats/health | { exists, fresh, updatedAt, last_refresh_source, last_refresh_error } |
| GET | /analyst/artists/{artistId}/chart-presence | Scan Mongo charts Kworb collections for placements of the Kisum artist. Resolves kisum_data.artists_v2 → artist_id, name, platforms.spotify.id; matches per collection (ID, slug, or normalized name in nested data[]). Returns data.charts[] grouped by chart family. |
| GET | /analyst/artists/{artistId}/popularity | Mongo SoT: data_analyst.artist_popularity. Always returns stored items[]. On read: backfill from Soundcharts when no rows exist, when max(date) is older than 60 days, or when ?refresh=true. Full backfill 2020-01-01→today; incremental from max(date)→today. Query: platform (default spotify), optional read filters startDate/endDate. 404 when the artist or external_data.soundcharts_uuid is missing. |
Query: ?refresh=true on stats forces external refresh (same as Promoters bypass_cache).
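The freshness rule for the stats route can be sketched as a small predicate. This is a minimal illustration, not the MusicData implementation: `shouldRefreshStats` and `STATS_FRESH_MS` are hypothetical names, and treating a row that is exactly 7 days old as still fresh is an assumption.

```javascript
// Hypothetical helper: names are illustrative, not taken from the MusicData source.
const STATS_FRESH_MS = 7 * 24 * 60 * 60 * 1000; // 7-day freshness window

function shouldRefreshStats(doc, { refresh = false, now = new Date() } = {}) {
  if (refresh) return true; // ?refresh=true always forces the external refresh
  if (!doc || !doc.updatedAt) return true; // missing row or timestamp counts as stale
  const ageMs = now.getTime() - new Date(doc.updatedAt).getTime();
  // Written as a negation so an unparseable date (NaN age) is treated as stale.
  return !(ageMs <= STATS_FRESH_MS);
}
```

A caller would load the artist_data row, run this check, and only enter the refresh pipeline when it returns true.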
Popularity vs Promoters: MusicData returns stored items[] plus meta (Soundcharts refresh cost/debug). Promoters GET /artists/{id}/popularity/{platform} proxies this route (no direct Soundcharts), then summarizes into yearly / peak / current for the Rankings UI. See MusicData package doc.
Promoters integration
Env on Backend-Kisum-Promoters:
- MUSICDATA_BASE_URL: e.g. https://music.kisum.dev
- MUSICDATA_INTERNAL_API_KEY: matches the MusicData gate
Client: services/musicdata.client.js → ensureArtistStatsFresh(artistId, { refresh }).
Flow: Promoters serves fresh Mongo rows directly; delegates refresh to MusicData when stale/missing/bypass.
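As a rough sketch of the delegation call, the snippet below builds a stats request from the two env vars and sends the key in the X-Internal-API-Key header. `buildStatsRequest` and `fetchArtistStats` are hypothetical names and do not describe the real services/musicdata.client.js; the sketch also assumes MUSICDATA_BASE_URL has no path prefix and that the global `fetch` of Node 18+ is available.

```javascript
// Sketch only: these helpers are hypothetical, not the real musicdata.client.js API.
function buildStatsRequest(artistId, { refresh = false, env = process.env } = {}) {
  const base = env.MUSICDATA_BASE_URL;
  const key = env.MUSICDATA_INTERNAL_API_KEY;
  if (!base || !key) {
    throw new Error('MUSICDATA_BASE_URL and MUSICDATA_INTERNAL_API_KEY are required');
  }
  const url = new URL(`/analyst/artists/${encodeURIComponent(artistId)}/stats`, base);
  if (refresh) url.searchParams.set('refresh', 'true'); // same effect as Promoters bypass_cache
  return { url: url.toString(), headers: { 'X-Internal-API-Key': key } };
}

async function fetchArtistStats(artistId, options = {}) {
  const { url, headers } = buildStatsRequest(artistId, options);
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`MusicData stats request failed: ${res.status}`);
  return res.json(); // response body; the flat stats are expected under `data`
}
```

Splitting request construction from the network call keeps the URL and header logic testable without a running MusicData instance.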
Implementation (MusicData)
- src/services/analyst/mongo.js: data_analyst, kisum_data, data_spotify, charts.kworb_spotify_listeners
- src/services/analyst/artistDataMapper.js: DB ↔ API mapping (keep in sync with Promoters utils/artist-data-stats.utils.js)
- src/services/analyst/artistStatsRefresh.js: external provider pipeline (tiers 1–3 + native tier 4)
- src/services/analyst/platformNativeStats.js: per-field native gap-fill (in-process MusicData modules only)
- src/services/analyst/artistStatsService.js: freshness + upsert orchestration
- src/services/analyst/chartPresenceService.js: Kworb chart presence across the charts DB (registry of per-collection lookups)
- src/services/analyst/chartPresenceMatch.js: normalized artist name / slug matching for Kworb rows
- src/services/analyst/artistPopularityService.js: Mongo SoT read + on-demand Soundcharts refresh
- src/services/analyst/soundchartsPopularity.js: Soundcharts popularity paging (fetchAllSoundchartsPopularityPages)
- src/services/analyst/soundchartsStats.js: Soundcharts current stats + UUID resolve helper
Artist popularity (GET /analyst/artists/{artistId}/popularity)
Purpose: Machine-facing popularity time series. Mongo data_analyst.artist_popularity is the source of truth; the route always returns stored points. Soundcharts is used only to backfill or refresh on read.
Refresh rules (on-demand, no crontab):
| Condition | Soundcharts fetch window |
|---|---|
| No Mongo rows | 2020-01-01 → today (paginated, upsert per date) |
| max(date) older than 60 days | max(date) → today |
| Fresh within 60 days | Skip Soundcharts |
| ?refresh=true | Same as stale (force refresh) |
Mongo document: { artist_id, platform, date, value, source: 'soundchart', soundcharts_uuid, updatedAt } — unique on { artist_id, platform, date, source }.
Response: data.items[] from Mongo; meta.fresh, meta.lastDate, meta.refreshed.
UUID resolution: kisum_data → external_data.soundcharts_uuid (not data_platforms).
Env: SOUNDCHARTS_APP_ID, SOUNDCHARTS_API_KEY, DB_MONGO_PROD_KISUM.
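The refresh rules can be read as a function from the latest stored date to a Soundcharts fetch window. The sketch below is illustrative only: `popularityFetchWindow` and the `mode` label are hypothetical, dates are assumed to be UTC `YYYY-MM-DD` strings, and "older than 60 days" is interpreted as strictly more than 60 days.

```javascript
// Hypothetical sketch of the refresh rules; not the artistPopularityService.js code.
const BACKFILL_START = '2020-01-01';
const STALE_AFTER_DAYS = 60;
const DAY_MS = 24 * 60 * 60 * 1000;

const toIsoDate = (d) => d.toISOString().slice(0, 10); // UTC calendar date

// Returns null when Soundcharts can be skipped, else the window to fetch.
function popularityFetchWindow(maxDate, { refresh = false, now = new Date() } = {}) {
  const today = toIsoDate(now);
  if (!maxDate) {
    // No Mongo rows: full backfill, regardless of the refresh flag.
    return { mode: 'backfill', startDate: BACKFILL_START, endDate: today };
  }
  const ageDays = (Date.parse(today) - Date.parse(maxDate)) / DAY_MS;
  if (refresh || ageDays > STALE_AFTER_DAYS) {
    // Stale or forced: incremental fetch from the last stored date.
    return { mode: 'incremental', startDate: maxDate, endDate: today };
  }
  return null; // fresh within 60 days
}
```

Because rows are unique on { artist_id, platform, date, source }, re-fetching the max(date) point itself is harmless as long as writes are upserts.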
Chart presence (GET /analyst/artists/{artistId}/chart-presence)
Purpose: Answer “which Kworb chart families does this artist appear in?” without the browser querying each charts.kworb_* collection.
Match strategy (per collection):
| Collection | Match |
|---|---|
| kworb_spotify_top_artists | artist_id or spotify_id |
| kworb_spotify_listeners | artist_id or spotifyId |
| kworb_global_artists | artist_id, kworbSlug (from name), or name |
| kworb_current_charts | artist_id or flat artist (#1 rows) |
| kworb_spotify_country_charts | data[].artist_id, data[].spotifyArtistId, or normalized data[].artist |
| billboard | data[].artist_id, data[].artist[].id (Kisum int32), or data[].artist[].name |
| iTunes / radio / Spotify period / YouTube nested charts | data[].artist_id or data[].artist (primary artist before feat.) |
Crontab enrichment: All Kworb and Billboard chart scrapers (/crontab/kworb, /crontab/billboard) set artist_id on each row (kisum_data.artists_v2.id, or null). Lookup: Spotify platform id when present, else case-insensitive name / sortName. Shared helper: Backend-Kisum-MusicData/src/services/crontab/kworb/artistLookup.js.
Response (data): artistId, name, spotifyId, chartCount, families[], scrapedAtMax, charts[] with source, label, family, rank / entryCount / placements[].
Performance note: Name-based collections use regex pre-filter + row filter; for high traffic consider crontab-built charts.artist_chart_presence index (future).
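To make the name-based matching concrete, here is one possible shape for it. The actual rules in chartPresenceMatch.js are not shown in this document, so the normalization steps (accent stripping, lowercasing, collapsing punctuation) and the `feat.`/`ft.`/`featuring` split are assumptions, and all function names are hypothetical.

```javascript
// Illustrative matcher; the real normalization in chartPresenceMatch.js may differ.
function normalizeArtistName(name) {
  return String(name ?? '')
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accents left by NFKD
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, ' ') // collapse punctuation and whitespace runs
    .trim();
}

// Keep only the primary artist, i.e. the part before "feat." / "ft." / "featuring".
function primaryArtist(credit) {
  return String(credit ?? '').split(/\s+(?:feat\.?|ft\.?|featuring)\s+/i)[0].trim();
}

// Nested chart row match: artist_id first, then normalized primary artist name.
function rowMatchesArtist(row, artist) {
  if (row.artist_id != null && row.artist_id === artist.artistId) return true;
  const wanted = normalizeArtistName(artist.name);
  return wanted !== '' && normalizeArtistName(primaryArtist(row.artist)) === wanted;
}
```

For example, a row credited to `BEYONCÉ ft. Jay-Z` with a null artist_id would match an artist named `Beyoncé` under these assumptions.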
Refresh pipeline (4 tiers)
On external refresh, metrics are merged per field. A positive value from an earlier tier is never overwritten by a later tier.
| Tier | Source | When |
|---|---|---|
| 1 | Viberate fanbase-distribution | external_data.viberate_uuid present |
| 2 | Songstats /enterprise/v1/artists/stats | gaps after tier 1 |
| 3 | Soundcharts current/stats + YouTube audience | gaps after tiers 1–2 |
| 4 | Native MusicData platform modules | gaps after tiers 1–3 |
last_refresh_source examples: viberate+songstats+soundcharts+native(genius,lastfm,qq).
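The per-field merge rule can be illustrated with a short function. This is a sketch under assumptions: `mergeTiers` is a hypothetical name, tier results are assumed to arrive in tier order as `{ source, metrics }` objects, and listing only contributing tiers in the joined source string is a guess at how last_refresh_source is built (the `native(...)` platform suffix is not reproduced here).

```javascript
// Hypothetical merge: earlier tiers win; later tiers only fill gaps.
function mergeTiers(tierResults) {
  const merged = {};
  const sources = [];
  for (const { source, metrics } of tierResults) {
    let contributed = false;
    for (const [field, value] of Object.entries(metrics ?? {})) {
      const isPositive = typeof value === 'number' && value > 0;
      // Fill the field only if no earlier tier already supplied a positive value.
      if (isPositive && !(merged[field] > 0)) {
        merged[field] = value;
        contributed = true;
      }
    }
    if (contributed) sources.push(source);
  }
  return { metrics: merged, last_refresh_source: sources.join('+') };
}
```

With a Viberate result of `{ spotify: 100, instagram: 0 }` followed by a Songstats result of `{ spotify: 999, instagram: 50 }`, the merged metrics keep `spotify: 100` and take `instagram: 50` from the later tier.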
Tier 4 — native platform gap-fill
ID source: kisum_data.artists_v2.platforms.{platform}.id (fallback streamingPlatforms.*). Last.fm: platforms.musicbrainz.id → platforms.lastfm.id → artist.name. Bandsintown: artist.name. QQ: platforms.qq.id = singermid.
Tier A (high confidence): Spotify followers (spotifyApi), Spotify monthly listeners (charts.kworb_spotify_listeners.listeners), YouTube subscribers/views, Deezer nb_fan (public API), Genius followers_count, Last.fm listeners/plays.
Tier B (fixture-verified extract paths): NetEase data.fansCnt, QQ response.num, Instagram follower_count (Zylalabs), TikTok (RapidAPI by id or username), Bandsintown tracker_count.
Skipped (no audience metric in MusicData): Shazam, Apple Music catalog, Facebook, X, SoundCloud, Amazon, Tidal, Beatport, Traxsource, Songkick, Kuwo.
Removed: Legacy HTTP self-call to music.kisum.dev/instagram/... during refresh — Instagram gap-fill uses in-process Zylalabs fetch in tier 4.
Concurrency: max 4 parallel native fetches; 3s timeout per platform; failures are silent per platform.
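The concurrency rule (at most 4 fetches in flight, 3 s per platform, failures ignored) could be implemented with a small worker pool like the one below. It is a sketch, not the platformNativeStats.js code: `runNativeFetchers` and `withTimeout` are hypothetical, and the timeout only stops waiting for a slow fetch rather than cancelling the underlying request.

```javascript
// Hypothetical worker pool for tier 4 gap-fill.
const MAX_PARALLEL = 4;
const TIMEOUT_MS = 3000;

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// fetchers: { platformName: () => Promise<number> }
async function runNativeFetchers(fetchers, { limit = MAX_PARALLEL, timeoutMs = TIMEOUT_MS } = {}) {
  const entries = Object.entries(fetchers);
  const results = {};
  let next = 0; // shared cursor; safe because JS callbacks never run in parallel

  async function worker() {
    while (next < entries.length) {
      const [platform, fetcher] = entries[next++];
      try {
        // Promise.resolve().then(...) also converts a synchronous throw into a rejection.
        const value = await withTimeout(Promise.resolve().then(fetcher), timeoutMs);
        if (typeof value === 'number' && value > 0) results[platform] = value;
      } catch {
        // Per-platform failures and timeouts are silent: the field simply stays a gap.
      }
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, entries.length) }, worker));
  return results;
}
```

Each worker pulls the next platform as soon as it finishes the previous one, so one slow platform does not hold up the other three slots.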
New API ↔ DB fields (tier 4)
| API key | artist_data field |
|---|---|
| genius | genius_followers |
| lastfm | lastfm_listeners |
| lastfm_plays | lastfm_plays |
| netease | netease_fans |
| qq | qq_followers |
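The table translates directly into a lookup object that both mapping directions can share. The helper names below are hypothetical and the real artistDataMapper.js may be structured differently; the sketch only covers the five tier 4 fields listed above.

```javascript
// Tier 4 API key -> artist_data field, copied from the table above.
const TIER4_FIELD_MAP = Object.freeze({
  genius: 'genius_followers',
  lastfm: 'lastfm_listeners',
  lastfm_plays: 'lastfm_plays',
  netease: 'netease_fans',
  qq: 'qq_followers',
});

// Hypothetical helpers: copy only known, non-null tier 4 fields in each direction.
function apiToDb(apiStats) {
  const doc = {};
  for (const [apiKey, dbField] of Object.entries(TIER4_FIELD_MAP)) {
    if (apiStats[apiKey] != null) doc[dbField] = apiStats[apiKey];
  }
  return doc;
}

function dbToApi(doc) {
  const stats = {};
  for (const [apiKey, dbField] of Object.entries(TIER4_FIELD_MAP)) {
    if (doc[dbField] != null) stats[apiKey] = doc[dbField];
  }
  return stats;
}
```

Driving both directions from one map is one way to keep the MusicData mapper and the Promoters utility from drifting apart.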
Env (refresh)
VIBERATE_TOKEN, SONGSTATS_API_KEY, SOUNDCHARTS_APP_ID, SOUNDCHARTS_API_KEY, DB_MONGO_PROD_KISUM, ZYLALABS_KEY (Instagram tier 4), plus Spotify/YouTube/Genius/Last.fm/NetEase/QQ credentials as used by existing MusicData routes.
Instagram OAuth (/instagram/me)
MusicData stores the Kisum Instagram Graph long-lived token in Mongo auth.instagram (not caller-supplied).
| Method | Path | Gate | Purpose |
|---|---|---|---|
| GET | /instagram/login | Public | Browser OAuth start |
| GET | /instagram/callback | Public | Code exchange + long-lived token persist |
| GET | /instagram/refresh | Public | Refresh long-lived token (~60 days) |
| GET | /instagram/me | MusicData gate | Read authorized account profile; auto-refresh once on 401 |
Env: INSTAGRAM_CLIENT_ID, INSTAGRAM_CLIENT_SECRET, INSTAGRAM_REDIRECT_URI (must match Meta app redirect URI). Optional INSTAGRAM_SCOPES (default instagram_business_basic).
Ops: expired token → open /instagram/login in browser; before expiry → GET /instagram/refresh or rely on /instagram/me one-shot refresh.
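The "auto-refresh once on 401" behavior of /instagram/me follows a common retry pattern, sketched below. `callWithOneShotRefresh` is a hypothetical name, and `callApi` and `refreshToken` stand in for whatever the route actually uses to call the Graph API and to refresh the stored token.

```javascript
// Hypothetical one-shot retry: refresh the token at most once, then give up.
async function callWithOneShotRefresh(callApi, refreshToken) {
  const first = await callApi();
  if (first.status !== 401) return first; // success or an unrelated error: no retry
  await refreshToken(); // e.g. the same logic that backs GET /instagram/refresh
  return callApi(); // a second 401 is returned as-is, so there is no retry loop
}
```

If the retry also returns 401, the token is most likely expired beyond refresh, which is the case where the ops note above calls for a fresh browser login.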