Backend-Kisum-MusicData — Analyst

Related: Promoters API · Data ownership

Backend-Kisum-MusicData — Analyst (artist_data)

Centralizes on-demand refresh of promoter-facing artist audience metrics into Mongo data_analyst.artist_data. There is no batch crontab: rows are created or updated only when a consumer requests stats for that artist_id.

| Store | Key | Writer | Readers |
| --- | --- | --- | --- |
| data_analyst.artist_data | artist_id (int32, kisum_data.artists_v2.id) | MusicData (GET /analyst/artists/:id/stats) when Promoters hybrid is enabled; else Promoters local refresh | Promoters GET /artists/:id/stats (fast path), agencies roster metrics, event-intelligence |
| data_analyst.artist_popularity | artist_id + platform + date + source | MusicData (GET /analyst/artists/:id/popularity) on-demand backfill/incremental refresh | Same route (read path); Promoters GET /api/artists/:id/popularity/:platform (summarized BFF) |

Auth (MusicData gate): MUSICDATA_INTERNAL_API_KEY, sent via the X-Internal-API-Key header, the x-api-key header, or as a Bearer token.
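
A minimal sketch of how a gate might read the key from any of the three accepted locations. The helper name `extractInternalKey`, the lookup order, and the lower-cased `headers` object are assumptions for illustration, not the actual MusicData middleware.

```javascript
// Hypothetical helper: return the presented key, or null when none is sent.
// `headers` is assumed to be a plain object with lower-cased header names,
// which is how Node's http module and Express expose them.
function extractInternalKey(headers) {
  if (headers['x-internal-api-key']) return headers['x-internal-api-key'];
  if (headers['x-api-key']) return headers['x-api-key'];
  const auth = headers['authorization'] || '';
  const match = /^Bearer\s+(.+)$/i.exec(auth);
  return match ? match[1].trim() : null;
}
```

When comparing the extracted value against MUSICDATA_INTERNAL_API_KEY, a constant-time comparison such as Node's `crypto.timingSafeEqual` is generally preferable to `===`.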

| Method | Path | Behavior |
| --- | --- | --- |
| GET | /analyst/artists/{artistId}/stats | Load artist_data; if updatedAt within 7 days and no ?refresh=true, return mapped stats. Else 4-tier refresh (Viberate → Songstats → Soundcharts → native MusicData platforms), upsert Mongo, return flat stats in data. |
| GET | /analyst/artists/{artistId}/stats/health | { exists, fresh, updatedAt, last_refresh_source, last_refresh_error } |
| GET | /analyst/artists/{artistId}/chart-presence | Scan Mongo charts Kworb collections for placements of the Kisum artist. Resolves kisum_data.artists_v2 → artist_id, name, platforms.spotify.id; matches per collection (ID, slug, or normalized name in nested data[]). Returns data.charts[] grouped by chart family. |
| GET | /analyst/artists/{artistId}/popularity | Mongo SoT: data_analyst.artist_popularity. Always returns stored items[]. On read: backfill from Soundcharts when no rows, or max(date) older than 60 days, or ?refresh=true. Full backfill 2020-01-01 → today; incremental from max(date) → today. Query: platform (default spotify), optional read filters startDate/endDate. 404 when artist or external_data.soundcharts_uuid missing. |

Query: ?refresh=true on stats forces external refresh (same as Promoters bypass_cache).
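
The stats freshness rule (7-day window, bypassed by ?refresh=true) can be sketched as a small decision helper. The function name `shouldRefreshStats` and the `now` parameter are hypothetical; only the 7-day window, `updatedAt`, and the refresh flag come from the route description.

```javascript
const STATS_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day freshness window

// Hypothetical helper: `doc` is the artist_data row, or null when missing.
function shouldRefreshStats(doc, { refresh = false, now = new Date() } = {}) {
  if (refresh) return true;                  // ?refresh=true forces a refresh
  if (!doc || !doc.updatedAt) return true;   // no row, or never refreshed
  const ageMs = now.getTime() - new Date(doc.updatedAt).getTime();
  // Negated comparison so an unparseable updatedAt (NaN age) counts as stale.
  return !(ageMs <= STATS_TTL_MS);
}
```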

Popularity vs Promoters: MusicData returns stored items[] plus meta (Soundcharts refresh cost/debug). Promoters GET /artists/{id}/popularity/{platform} proxies this route (no direct Soundcharts), then summarizes into yearly / peak / current for the Rankings UI. See MusicData package doc.

Env on Backend-Kisum-Promoters:

  • MUSICDATA_BASE_URL — e.g. https://music.kisum.dev
  • MUSICDATA_INTERNAL_API_KEY — matches MusicData gate

Client: services/musicdata.client.js → ensureArtistStatsFresh(artistId, { refresh }).

Flow: Promoters serves fresh Mongo rows directly; delegates refresh to MusicData when stale/missing/bypass.
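
A rough sketch of what the delegation call could look like from Promoters. This is not the contents of services/musicdata.client.js: `buildStatsRequest` and `fetchFreshStats` are hypothetical names, and the choice of the X-Internal-API-Key header (one of the three accepted forms) is an assumption. It relies on the global `fetch` available in Node 18+.

```javascript
// Hypothetical request builder for GET /analyst/artists/{artistId}/stats.
// Note: the leading "/" resolves against the origin of baseUrl, so any path
// prefix in MUSICDATA_BASE_URL would be dropped.
function buildStatsRequest(baseUrl, apiKey, artistId, { refresh = false } = {}) {
  const url = new URL(`/analyst/artists/${encodeURIComponent(artistId)}/stats`, baseUrl);
  if (refresh) url.searchParams.set('refresh', 'true');
  return { url: url.toString(), headers: { 'X-Internal-API-Key': apiKey } };
}

// Hypothetical caller: `fetchImpl` is injectable so the call can be stubbed.
async function fetchFreshStats(artistId, opts = {}, fetchImpl = fetch) {
  const { url, headers } = buildStatsRequest(
    process.env.MUSICDATA_BASE_URL,
    process.env.MUSICDATA_INTERNAL_API_KEY,
    artistId,
    opts,
  );
  const res = await fetchImpl(url, { headers });
  if (!res.ok) throw new Error(`MusicData stats request failed: ${res.status}`);
  const body = await res.json();
  return body.data; // flat stats are returned under `data`
}
```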

  • src/services/analyst/mongo.js → data_analyst, kisum_data, data_spotify, charts.kworb_spotify_listeners
  • src/services/analyst/artistDataMapper.js — DB ↔ API mapping (sync with Promoters utils/artist-data-stats.utils.js)
  • src/services/analyst/artistStatsRefresh.js — external provider pipeline (tiers 1–3 + native tier 4)
  • src/services/analyst/platformNativeStats.js — per-field native gap-fill (in-process MusicData modules only)
  • src/services/analyst/artistStatsService.js — freshness + upsert orchestration
  • src/services/analyst/chartPresenceService.js — Kworb chart presence across charts DB (registry of per-collection lookups)
  • src/services/analyst/chartPresenceMatch.js — normalized artist name / slug matching for Kworb rows
  • src/services/analyst/artistPopularityService.js — Mongo SoT read + on-demand Soundcharts refresh
  • src/services/analyst/soundchartsPopularity.js — Soundcharts popularity paging (fetchAllSoundchartsPopularityPages)
  • src/services/analyst/soundchartsStats.js — Soundcharts current stats + UUID resolve helper

Artist popularity (GET /analyst/artists/{artistId}/popularity)

Purpose: Machine-facing popularity time series. Mongo data_analyst.artist_popularity is the source of truth — the route always returns stored points. Soundcharts is used only to backfill or refresh on read.

Refresh rules (on-demand, no crontab):

| Condition | Soundcharts fetch window |
| --- | --- |
| No Mongo rows | 2020-01-01 → today (paginated, upsert per date) |
| max(date) older than 60 days | max(date) → today |
| Fresh within 60 days | Skip Soundcharts |
| ?refresh=true | Same as stale (force refresh) |
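
The refresh rules above reduce to one decision over max(date). A minimal sketch, assuming dates are 'YYYY-MM-DD' strings interpreted as UTC; `planPopularityRefresh` and its return shape are hypothetical.

```javascript
const FULL_BACKFILL_START = '2020-01-01';
const STALE_AFTER_DAYS = 60;
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical planner: `maxDate` is the newest stored date, or null when
// the artist/platform pair has no rows yet.
function planPopularityRefresh(maxDate, today, refresh = false) {
  if (!maxDate) {
    return { fetch: true, from: FULL_BACKFILL_START, to: today }; // full backfill
  }
  const ageDays = (Date.parse(today) - Date.parse(maxDate)) / DAY_MS;
  if (refresh || ageDays > STALE_AFTER_DAYS) {
    return { fetch: true, from: maxDate, to: today }; // incremental
  }
  return { fetch: false }; // fresh: serve Mongo rows only
}
```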

Mongo document: { artist_id, platform, date, value, source: 'soundchart', soundcharts_uuid, updatedAt } — unique on { artist_id, platform, date, source }.
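
One way to upsert per date against that unique key is to build `updateOne` operations for the MongoDB driver's `bulkWrite`. This is a sketch only: `toUpsertOps` and the `{ date, value }` point shape are assumptions, not the service's actual code.

```javascript
// Hypothetical builder: one upsert per fetched point, keyed on the unique
// tuple { artist_id, platform, date, source }.
function toUpsertOps(artistId, platform, soundchartsUuid, points, now = new Date()) {
  return points.map(({ date, value }) => ({
    updateOne: {
      filter: { artist_id: artistId, platform, date, source: 'soundchart' },
      update: { $set: { value, soundcharts_uuid: soundchartsUuid, updatedAt: now } },
      upsert: true,
    },
  }));
}
```

With a driver collection handle, the operations would be applied with something like `collection.bulkWrite(ops, { ordered: false })`, and the uniqueness constraint created with `collection.createIndex({ artist_id: 1, platform: 1, date: 1, source: 1 }, { unique: true })`.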

Response: data.items[] from Mongo; meta.fresh, meta.lastDate, meta.refreshed.

UUID resolution: kisum_data → external_data.soundcharts_uuid (not data_platforms).

Env: SOUNDCHARTS_APP_ID, SOUNDCHARTS_API_KEY, DB_MONGO_PROD_KISUM.

Chart presence (GET /analyst/artists/{artistId}/chart-presence)

Purpose: Answer “which Kworb chart families does this artist appear in?” without the browser querying each charts.kworb_* collection.

Match strategy (per collection):

| Collection | Match |
| --- | --- |
| kworb_spotify_top_artists | artist_id or spotify_id |
| kworb_spotify_listeners | artist_id or spotifyId |
| kworb_global_artists | artist_id, kworbSlug (from name), or name |
| kworb_current_charts | artist_id or flat artist (#1 rows) |
| kworb_spotify_country_charts | data[].artist_id, data[].spotifyArtistId, or normalized data[].artist |
| billboard | data[].artist_id, data[].artist[].id (Kisum int32), or data[].artist[].name |
| iTunes / radio / Spotify period / YouTube nested charts | data[].artist_id or data[].artist (primary artist before feat.) |
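
For the name-based rows, matching depends on normalizing the artist string and, for nested charts, taking the primary artist before any "feat." credit. The sketch below illustrates the idea only; it is not the contents of chartPresenceMatch.js. The function names, the exact normalization steps, and the choice to fall back to the name only when a row has no artist_id are all assumptions.

```javascript
// Hypothetical normalizer: strip accents, lower-case, collapse punctuation.
// Unicode-aware classes keep non-Latin names (e.g. CJK) intact.
function normalizeArtistName(name) {
  return String(name)
    .normalize('NFKD')
    .replace(/\p{M}/gu, '')           // drop combining marks (accents)
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, ' ') // punctuation and spaces to one space
    .trim();
}

// Hypothetical: keep only the credit before "feat.", "ft." or "featuring".
function primaryArtist(credit) {
  return String(credit).split(/\s+(?:feat\.?|ft\.?|featuring)\s+/i)[0].trim();
}

// Hypothetical row matcher: an enriched artist_id decides the match; the
// normalized name is compared only when the row has no artist_id.
function rowMatchesArtist(row, artist) {
  if (row.artist_id != null) return row.artist_id === artist.id;
  if (!row.artist) return false;
  const rowName = normalizeArtistName(primaryArtist(row.artist));
  return rowName !== '' && rowName === normalizeArtistName(artist.name);
}
```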

Crontab enrichment: All Kworb and Billboard chart scrapers (/crontab/kworb, /crontab/billboard) set artist_id on each row (kisum_data.artists_v2.id, or null). Lookup: Spotify platform id when present, else case-insensitive name / sortName. Shared helper: Backend-Kisum-MusicData/src/services/crontab/kworb/artistLookup.js.

Response (data): artistId, name, spotifyId, chartCount, families[], scrapedAtMax, charts[] with source, label, family, rank / entryCount / placements[].

Performance note: Name-based collections use a regex pre-filter followed by a row filter; for high traffic, consider a crontab-built charts.artist_chart_presence index (future).

On external refresh of artist stats (GET /analyst/artists/{artistId}/stats), metrics are merged per field. A positive value from an earlier tier is never overwritten by a later tier.

| Tier | Source | When |
| --- | --- | --- |
| 1 | Viberate fanbase-distribution | external_data.viberate_uuid present |
| 2 | Songstats /enterprise/v1/artists/stats | gaps after tier 1 |
| 3 | Soundcharts current/stats + YouTube audience | gaps after tiers 1–2 |
| 4 | Native MusicData platform modules | gaps after tiers 1–3 |

last_refresh_source example: viberate+songstats+soundcharts+native(genius,lastfm,qq).
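
The per-field merge rule can be sketched as follows. `mergeTiers` is a hypothetical name, and the field names other than genius_followers are placeholders. Listing a source in last_refresh_source only when it actually filled a field is an assumption about how that string is built.

```javascript
// Hypothetical merge: `tiers` is ordered tier 1 → tier 4, each entry being
// { source, metrics }. A positive value already merged is never replaced.
function mergeTiers(tiers) {
  const merged = {};
  const sources = [];
  for (const { source, metrics } of tiers) {
    let contributed = false;
    for (const [field, value] of Object.entries(metrics || {})) {
      const isPositive = typeof value === 'number' && value > 0;
      if (isPositive && !(merged[field] > 0)) {
        merged[field] = value; // fills a gap left by earlier tiers
        contributed = true;
      }
    }
    if (contributed) sources.push(source);
  }
  return { metrics: merged, last_refresh_source: sources.join('+') };
}
```

For example, a tier 2 value for a field that tier 1 already filled with a positive number is ignored, while a field tier 1 reported as 0 remains a gap for later tiers.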

ID source: kisum_data.artists_v2.platforms.{platform}.id (fallback streamingPlatforms.*). Last.fm: platforms.musicbrainz.id → platforms.lastfm.id → artist.name. Bandsintown: artist.name. QQ: platforms.qq.id = singermid.
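
A sketch of the ID lookup order described above. Both helper names are hypothetical, and the shape of `streamingPlatforms` (assumed here to mirror `platforms` with an `id` per platform) is not confirmed by this page.

```javascript
// Hypothetical: platforms.{platform}.id first, then streamingPlatforms.
function platformId(artist, platform) {
  return artist.platforms?.[platform]?.id
    ?? artist.streamingPlatforms?.[platform]?.id // assumed shape
    ?? null;
}

// Hypothetical Last.fm lookup chain: MusicBrainz id, then Last.fm id,
// then the artist name as a last resort.
function resolveLastfmLookup(artist) {
  const p = artist.platforms || {};
  if (p.musicbrainz?.id) return { by: 'musicbrainz', value: p.musicbrainz.id };
  if (p.lastfm?.id) return { by: 'lastfm', value: p.lastfm.id };
  if (artist.name) return { by: 'name', value: artist.name };
  return null;
}
```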

Tier A (high confidence): Spotify followers (spotifyApi), Spotify monthly listeners (charts.kworb_spotify_listeners.listeners), YouTube subscribers/views, Deezer nb_fan (public API), Genius followers_count, Last.fm listeners/plays.

Tier B (fixture-verified extract paths): NetEase data.fansCnt, QQ response.num, Instagram follower_count (Zylalabs), TikTok (RapidAPI by id or username), Bandsintown tracker_count.

Skipped (no audience metric in MusicData): Shazam, Apple Music catalog, Facebook, X, SoundCloud, Amazon, Tidal, Beatport, Traxsource, Songkick, Kuwo.

Removed: Legacy HTTP self-call to music.kisum.dev/instagram/... during refresh — Instagram gap-fill uses in-process Zylalabs fetch in tier 4.

Concurrency: max 4 parallel native fetches; 3s timeout per platform; failures are silent per platform.
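
The concurrency rule (at most 4 in flight, 3 s per platform, silent failures) could be implemented with a small worker pool. The sketch below is an illustration under those stated limits, not the code in platformNativeStats.js; `runNativeFetches` and the `tasks` shape are hypothetical.

```javascript
// Hypothetical pool: `tasks` maps platform name → () => Promise<metrics>.
// Failed or timed-out platforms are simply absent from the result.
async function runNativeFetches(tasks, { concurrency = 4, timeoutMs = 3000 } = {}) {
  const results = {};
  const queue = Object.entries(tasks);

  async function worker() {
    while (queue.length > 0) {
      const [platform, task] = queue.shift();
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
      });
      try {
        results[platform] = await Promise.race([task(), timeout]);
      } catch {
        // Silent per-platform failure: the field stays a gap.
      } finally {
        clearTimeout(timer);
      }
    }
  }

  const workerCount = Math.min(concurrency, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying request. Passing an `AbortController` signal into each task would be one way to cancel it as well.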

| API response key | artist_data field |
| --- | --- |
| genius | genius_followers |
| lastfm | lastfm_listeners |
| lastfm_plays | lastfm_plays |
| netease | netease_fans |
| qq | qq_followers |

Env: VIBERATE_TOKEN, SONGSTATS_API_KEY, SOUNDCHARTS_APP_ID, SOUNDCHARTS_API_KEY, DB_MONGO_PROD_KISUM, ZYLALABS_KEY (Instagram tier 4), plus the Spotify/YouTube/Genius/Last.fm/NetEase/QQ credentials already used by existing MusicData routes.

MusicData stores the Kisum Instagram Graph long-lived token in Mongo auth.instagram (not caller-supplied).

| Method | Path | Gate | Purpose |
| --- | --- | --- | --- |
| GET | /instagram/login | Public | Browser OAuth start |
| GET | /instagram/callback | Public | Code exchange + long-lived token persist |
| GET | /instagram/refresh | Public | Refresh long-lived token (~60 days) |
| GET | /instagram/me | MusicData gate | Read authorized account profile; auto-refresh once on 401 |

Env: INSTAGRAM_CLIENT_ID, INSTAGRAM_CLIENT_SECRET, INSTAGRAM_REDIRECT_URI (must match Meta app redirect URI). Optional INSTAGRAM_SCOPES (default instagram_business_basic).

Ops: expired token → open /instagram/login in browser; before expiry → GET /instagram/refresh or rely on /instagram/me one-shot refresh.
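
The /instagram/me one-shot refresh amounts to a single retry after a 401. A minimal sketch, assuming `callProfile` resolves to an object with a numeric `status` and `refreshToken` persists a new long-lived token; both callbacks and the function name are hypothetical.

```javascript
// Hypothetical wrapper: retry exactly once after refreshing the token.
async function callWithOneShotRefresh(callProfile, refreshToken) {
  const first = await callProfile();
  if (first.status !== 401) return first;
  await refreshToken();   // e.g. the logic behind GET /instagram/refresh
  return callProfile();   // a second 401 is returned to the caller as-is
}
```

If the second attempt still returns 401, the token has most likely expired and the browser flow at /instagram/login is needed.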