GoodTurn

Python random.choice() on emoji string fails with ZWJ, partial flags, and lone codepoints


Picking a random emoji from an LLM-produced comma-separated string like '🏠, 👨‍💻, 🚩' with random.choice(s.replace(', ', '')) fails silently downstream: sometimes the picked 'emoji' is a lone ZWJ character (U+200D), sometimes a bare 👨 codepoint that has lost its profession component, sometimes one regional-indicator half of a flag emoji. For this example, len(s.replace(', ', '')) is 5 for what looks like 3 emoji, because len() and random.choice both work on Unicode codepoints, not graphemes.

1 solution
✓ ACCEPTED

random.choice(<str>) selects one codepoint, not one grapheme cluster. ZWJ sequences (👨‍💻 = 'man' + ZWJ + 'laptop' = 3 codepoints), skin-tone modifiers, and regional-indicator flags (each flag = two regional-indicator codepoints) all fragment under codepoint-level random choice.
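
To see the fragmentation directly, the sketch below uses only the standard library to list the codepoints inside two such sequences. The variable names are illustrative, and the strings are written as escapes so the codepoints are unambiguous.

```python
import unicodedata

# man technologist = MAN + ZERO WIDTH JOINER + PERSONAL COMPUTER
man_technologist = "\U0001F468\u200D\U0001F4BB"
# A country flag is a pair of REGIONAL INDICATOR SYMBOL codepoints (here U + S)
us_flag = "\U0001F1FA\U0001F1F8"

for label, seq in [("man technologist", man_technologist), ("US flag", us_flag)]:
    # Iterating a str yields codepoints, which is also what random.choice sees
    names = [unicodedata.name(cp) for cp in seq]
    print(label, len(seq), names)
```

Every name printed is a value random.choice could return on its own when given the raw string.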

Use the emoji library's emoji.emoji_list(s), which returns each complete emoji sequence as [{'emoji': '👨\u200d💻', 'match_start': ..., 'match_end': ...}, ...]:

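import random

import emoji  # third-party package: pip install emoji

# input_str and FALLBACK are supplied by the caller.
# emoji_list() yields one dict per complete emoji sequence found in the string.
candidates = [e['emoji'] for e in emoji.emoji_list(input_str)]
picked = random.choice(candidates) if candidates else FALLBACK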
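
If the comma separator can be trusted (an assumption, since LLM output is not guaranteed to be well-formed), a dependency-free alternative is to split on the commas instead of deleting them. This is a different technique from emoji_list: it keeps each token intact but does not check that a token is actually an emoji. The helper name pick_token and its fallback parameter are hypothetical.

```python
import random


def pick_token(s, fallback=""):
    """Pick one comma-separated token without ever splitting inside a token."""
    tokens = [t.strip() for t in s.split(",")]
    tokens = [t for t in tokens if t]  # drop empties left by stray commas
    return random.choice(tokens) if tokens else fallback


# man technologist (ZWJ sequence), US flag (two regional indicators), triangular flag
sample = "\U0001F468\u200D\U0001F4BB, \U0001F1FA\U0001F1F8, \U0001F6A9"
picked = pick_token(sample)
```

Because str.strip() only removes whitespace, the ZWJ inside a sequence is left alone and each pick is a whole token.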

Note also: emoji.is_emoji(c) returns True for flags and for post-Unicode-15 emoji, either of which may fail downstream constraints. If you need to filter, combine it with emoji.version(c) <= MAX_VERSION and an explicit flag allowlist built from emoji.demojize(c, language='alias').