API Reference
The following modules and submodules are available in SilverSpeak:
SilverSpeak: A professional library for text normalization and homoglyph detection/replacement.
This library provides tools for detecting and normalizing homoglyphs (characters that look similar but have different Unicode code points), which can be used for text normalization, security applications, and adversarial text generation.
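As a small illustration of the definition above, the following sketch uses only Python's standard `unicodedata` module (it does not call SilverSpeak) to show that Latin "e" and Cyrillic "е" look alike yet are distinct code points. The helper name `describe` is hypothetical.

```python
import unicodedata

latin_e = "e"          # U+0065
cyrillic_e = "\u0435"  # U+0435, rendered like Latin "e" in most fonts


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


print(describe(latin_e))      # U+0065 LATIN SMALL LETTER E (Ll)
print(describe(cyrillic_e))   # U+0435 CYRILLIC SMALL LETTER IE (Ll)
print(latin_e == cyrillic_e)  # False
```

Both characters share the general category `Ll`, which is one of the default categories listed in `unicode_categories_to_replace`.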
Main components:
- random_attack: Generate text with random homoglyph replacements
- greedy_attack: Generate text with strategically chosen homoglyph replacements
- normalize_text: Normalize text by replacing homoglyphs with standard characters
- HomoglyphReplacer: Core class for homoglyph replacement operations
Author: Aldan Creo (ACMC) <os@acmc.fyi>
License: See LICENSE file in the project root
- class silverspeak.HomoglyphReplacer(unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, random_seed: int = 42)
Bases:
object
A class to replace characters with their homoglyphs and normalize homoglyph text.
This class is the core component of SilverSpeak, providing functionality to:
1. Replace characters with their visually similar homoglyphs
2. Normalize text by replacing homoglyphs with their standard characters
3. Support various normalization strategies based on Unicode script, block, and context
- unicode_categories_to_replace
Unicode categories of characters to replace.
- Type:
Set[str]
- types_of_homoglyphs_to_use
Types of homoglyphs to use.
- Type:
List[TypesOfHomoglyphs]
- replace_with_priority
Whether to replace with priority.
- Type:
bool
- random_state
Random state for reproducibility.
- Type:
random.Random
- chars_map
Mapping of characters to their homoglyphs.
- Type:
Dict[str, List[str]]
- reverse_chars_map
Reverse mapping of homoglyphs to original characters.
- Type:
Dict[str, List[str]]
- base_normalization_map
Base table for normalizing text.
- Type:
Dict[str, List[str]]
- normalization_translation_maps
Cache of normalization maps by script.
- Type:
Dict[str, Dict[str, str]]
- get_homoglyph_for_char(char: str, same_script: bool = False, same_block: bool = False, dominant_script: str | None = None, dominant_block: str | None = None, context: str | None = None, context_window_size: int = 10) -> str | None
Get a homoglyph replacement for a character, matching its properties.
This method selects an optimal homoglyph replacement for a character by comparing the Unicode properties of the potential homoglyphs with those of the original character.
- Parameters:
char (str) – The character to replace with a homoglyph.
same_script (bool) – Whether to use only homoglyphs from the same Unicode script. Defaults to False.
same_block (bool) – Whether to use only homoglyphs from the same Unicode block. Defaults to False.
dominant_script (Optional[str]) – The dominant script to use for filtering. If None and same_script is True, will be automatically detected.
dominant_block (Optional[str]) – The dominant block to use for filtering. If None and same_block is True, will be automatically detected.
context (Optional[str], optional) – Kept for API compatibility, not used.
context_window_size (int, optional) – Kept for API compatibility, not used.
- Returns:
A homoglyph replacement, or None if no suitable replacement is found.
- Return type:
Optional[str]
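The selection idea described for `get_homoglyph_for_char` can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: `TOY_CHARS_MAP`, `script_of`, and `pick_homoglyph` are hypothetical names, the script is approximated by the first word of the Unicode character name, and the only property compared is the general category.

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical toy table for illustration only; not SilverSpeak's real chars_map.
TOY_CHARS_MAP: Dict[str, List[str]] = {
    "a": ["\u1d43", "\u0430", "\u0251"],  # modifier small a, Cyrillic a, Latin alpha
    "A": ["\u0391", "\u0410"],            # Greek Alpha, Cyrillic A
}


def script_of(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def pick_homoglyph(char: str, same_script: bool = False) -> Optional[str]:
    """Return the candidate whose general category matches `char`, or None."""
    candidates = TOY_CHARS_MAP.get(char, [])
    if same_script:
        candidates = [c for c in candidates if script_of(c) == script_of(char)]
    if not candidates:
        return None
    # A candidate scores True when its general category (e.g. 'Ll') matches;
    # max() keeps the first candidate among ties, so list order breaks ties.
    return max(
        candidates,
        key=lambda c: unicodedata.category(c) == unicodedata.category(char),
    )


print(pick_homoglyph("a") == "\u0430")                    # True: Cyrillic a matches 'Ll'
print(pick_homoglyph("a", same_script=True) == "\u0251")  # True: only the Latin alpha remains
print(pick_homoglyph("A", same_script=True))              # None: no Latin candidate
```

Returning `None` when nothing passes the filters mirrors the documented return contract.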
- get_normalization_map_for_script_block_and_category(script: str, block: str | None = None, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, only_replace_non_normalized=False, **kwargs) -> Dict[str, str]
Generate a normalization map for a specific script and block.
This method creates a mapping for normalizing homoglyphs based on a specific Unicode script and optional block, considering character categories.
- Parameters:
script (str) – The target Unicode script (e.g., ‘Latin’, ‘Cyrillic’).
block (Optional[str], optional) – The target Unicode block. Defaults to None.
unicode_categories_to_replace (Set[str]) – Unicode categories of characters to consider for replacement. Defaults to _DEFAULT_UNICODE_CATEGORIES_TO_REPLACE.
only_replace_non_normalized (bool) – If True, only replace characters that aren’t already in normalized form. Defaults to False.
**kwargs – Additional keyword arguments.
- Returns:
A normalization map for the specified script and block.
- Return type:
Dict[str, str]
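To make the shape of the returned `Dict[str, str]` concrete, here is a rough standard-library sketch of building a script-specific normalization map. Everything in it is an assumption for illustration: `TOY_REVERSE_MAP`, `script_of`, and `build_normalization_map` are hypothetical, and scripts are approximated by uppercase name prefixes such as `"LATIN"` rather than the library's own script names.

```python
import unicodedata
from typing import Dict, List, Set

# Hypothetical reverse table: homoglyph -> possible standard characters.
TOY_REVERSE_MAP: Dict[str, List[str]] = {
    "\u0430": ["a"],  # Cyrillic a -> Latin a
    "\u0435": ["e"],  # Cyrillic ie -> Latin e
    "\u0391": ["A"],  # Greek Alpha -> Latin A
    "a": ["\u0430"],  # Latin a -> Cyrillic a
}


def script_of(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def build_normalization_map(script: str, categories: Set[str]) -> Dict[str, str]:
    """Map each out-of-script homoglyph to one in-script character of an allowed category."""
    result: Dict[str, str] = {}
    for glyph, targets in TOY_REVERSE_MAP.items():
        if script_of(glyph) == script:
            continue  # already written in the target script
        for target in targets:
            if script_of(target) == script and unicodedata.category(target) in categories:
                result[glyph] = target
                break
    return result


latin_map = build_normalization_map("LATIN", {"Ll"})
print(latin_map == {"\u0430": "a", "\u0435": "e"})  # True: Greek Alpha -> 'A' is 'Lu', so it is skipped
```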
- normalize(text: str, strategy: NormalizationStrategies, **kwargs) -> str
Normalize text by replacing homoglyphs with their standard characters.
This method applies the specified normalization strategy to convert text containing homoglyphs back to standard characters. Different strategies consider different aspects like dominant script, block, local context, tokenization, or language model.
- Parameters:
text (str) – Text to normalize.
strategy (NormalizationStrategies) – The normalization strategy to apply.
**kwargs – Additional arguments passed to the specific strategy implementation.
- Returns:
Normalized text with homoglyphs replaced by standard characters.
- Return type:
str
- Raises:
NotImplementedError – If the specified strategy is unknown.
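The dominant-script idea mentioned above can be sketched as follows. This is a simplified stand-in, not the library's algorithm: `TOY_TO_LATIN`, `script_of`, `dominant_script`, and `normalize_dominant_script` are hypothetical names, and the script of a character is approximated from its Unicode name.

```python
import unicodedata
from collections import Counter

# Hypothetical toy table mapping a few Cyrillic homoglyphs to Latin letters.
TOY_TO_LATIN = {"\u0435": "e", "\u043e": "o", "\u0430": "a"}


def script_of(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def dominant_script(text: str) -> str:
    """Return the most frequent script among the alphabetic characters."""
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def normalize_dominant_script(text: str) -> str:
    """Map known homoglyphs to Latin only when Latin dominates the text."""
    if dominant_script(text) != "LATIN":
        return text
    return text.translate(str.maketrans(TOY_TO_LATIN))


mixed = "H\u0435llo w\u043erld"  # Cyrillic 'е' and 'о' inside a Latin sentence
print(normalize_dominant_script(mixed))  # Hello world
```

Text whose dominant script is not Latin is returned unchanged in this sketch, so genuinely Cyrillic input is not rewritten.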
- class silverspeak.NormalizationStrategies(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Enumeration of text normalization strategies for homoglyph replacement.
- Values:
- DOMINANT_SCRIPT: Normalize based on the dominant Unicode script in the text.
- DOMINANT_SCRIPT_AND_BLOCK: Normalize based on both dominant script and Unicode block.
- LOCAL_CONTEXT: Normalize based on surrounding character context.
- TOKENIZATION: Normalize based on tokenization of the text.
- LANGUAGE_MODEL: Normalize using a masked language model to determine the most likely characters.
- LLM_PROMPT: Normalize using a generative language model prompted to fix homoglyphs.
- SPELL_CHECK: Normalize using spelling correction algorithms with multilingual support.
- NGRAM: Normalize using character n-gram frequency analysis.
- OCR_CONFIDENCE: Normalize using OCR confidence scores or confusion matrices.
- GRAPH_BASED: Normalize using a graph-based character similarity network.
- DOMINANT_SCRIPT = 'dominant_script'
- DOMINANT_SCRIPT_AND_BLOCK = 'dominant_script_and_block'
- GRAPH_BASED = 'graph_based'
- LANGUAGE_MODEL = 'language_model'
- LLM_PROMPT = 'llm_prompt'
- LOCAL_CONTEXT = 'local_context'
- NGRAM = 'ngram'
- OCR_CONFIDENCE = 'ocr_confidence'
- SPELL_CHECK = 'spell_check'
- TOKENIZATION = 'tokenization'
- class silverspeak.TypesOfHomoglyphs(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Enumeration of the different types of homoglyphs supported by SilverSpeak.
- Values:
- IDENTICAL: Characters that are visually identical in most fonts but have different Unicode code points.
- CONFUSABLES: Characters identified as confusables by Unicode.
- OCR: Characters that might be confused in OCR systems.
- OCR_REFINED: A refined subset of OCR confusables with high visual similarity.
- CONFUSABLES = 'confusables'
- IDENTICAL = 'identical'
- OCR = 'ocr'
- OCR_REFINED = 'ocr_refined'
- silverspeak.get_version() -> str
Get the current version of the SilverSpeak package.
- Returns:
The current version string
- Return type:
str
- silverspeak.greedy_attack(text: str, percentage: float = 0.1, random_seed: int = 42, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, same_script: bool = False, same_block: bool = False) -> str
Replace characters with homoglyphs using a greedy approach.
This function replaces characters in the input text with homoglyphs (visually similar characters with different Unicode code points) using a greedy approach. The function will attempt to replace every eligible character up to the specified percentage limit.
- Parameters:
text (str) – The input text to transform.
percentage (float, optional) – The fraction of characters to replace, from 0.0 to 1.0. Defaults to 0.1 (10%).
random_seed (int, optional) – Seed for the random number generator to ensure reproducible results. Defaults to 42.
unicode_categories_to_replace (Set[str], optional) – Unicode categories of characters to consider for replacement. Defaults to predefined categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs], optional) – Types of homoglyphs to use for replacements. Defaults to predefined types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. Defaults to False.
same_script (bool, optional) – Whether to only use homoglyphs from the same Unicode script as the dominant script in the text. Defaults to False.
same_block (bool, optional) – Whether to only use homoglyphs from the same Unicode block as the dominant block in the text. Defaults to False.
- Returns:
The transformed text with homoglyph replacements.
- Return type:
str
- Raises:
ValueError – If the text is None or empty, or if percentage is out of range.
Example
```python
from silverspeak import greedy_attack

# Replace 5% of characters with homoglyphs
modified_text = greedy_attack("Hello world", percentage=0.05, random_seed=42)

# Replace 10% of characters with homoglyphs, prioritizing replacements
modified_text = greedy_attack("Hello world", percentage=0.1, replace_with_priority=True)
```
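The "replace every eligible character up to the percentage limit" behaviour can be pictured with a toy, standard-library sketch. It is an assumption-laden illustration rather than SilverSpeak's code: `TOY_MAP` and `toy_greedy_attack` are hypothetical, the scan order is simply left to right, and the budget is rounded down, which may differ from the library.

```python
from typing import Dict, List

# Hypothetical toy table: Latin letter -> Cyrillic look-alike.
TOY_MAP: Dict[str, str] = {"e": "\u0435", "o": "\u043e", "a": "\u0430"}


def toy_greedy_attack(text: str, percentage: float = 0.1) -> str:
    """Replace eligible characters left to right until the budget is spent."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    budget = int(len(text) * percentage)  # assumed rounding: floor
    out: List[str] = []
    for ch in text:
        if budget > 0 and ch in TOY_MAP:
            out.append(TOY_MAP[ch])
            budget -= 1
        else:
            out.append(ch)
    return "".join(out)


attacked = toy_greedy_attack("Hello world", percentage=0.2)
print(attacked == "H\u0435ll\u043e world")  # True: 'e' and the first 'o' were replaced
```

With an 11-character input and `percentage=0.2`, the budget in this sketch is two characters, so the second "o" is left untouched.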
- silverspeak.normalize_text(text: str, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, strategy: NormalizationStrategies = NormalizationStrategies.LOCAL_CONTEXT, **kwargs) -> str
Normalize text by replacing homoglyphs with their standard equivalents.
This function provides a convenient interface to the HomoglyphReplacer’s normalize method, creating a temporary HomoglyphReplacer instance with the specified parameters.
- Parameters:
text (str) – The text to normalize.
unicode_categories_to_replace (Set[str]) – Unicode categories to replace. Defaults to predefined common categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs]) – Types of homoglyphs to consider. Defaults to predefined common homoglyph types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. When True, replacements are chosen by order in the homoglyph lists. When False, replacements are chosen based on context or other strategies. Defaults to False.
strategy (NormalizationStrategies) – The normalization strategy to use. Defaults to LOCAL_CONTEXT, which selects replacements based on surrounding characters.
- Returns:
The normalized text with homoglyphs replaced.
- Return type:
str
- Raises:
ValueError – If the text is None or invalid parameters are provided.
NotImplementedError – If an unsupported normalization strategy is specified.
Example
```python
from silverspeak import normalize_text
from silverspeak.homoglyphs.utils import NormalizationStrategies

# Normalize text using the default local context strategy
normalized_text = normalize_text("Hеllo wоrld")  # Contains Cyrillic 'е' and 'о'

# Normalize text using dominant script strategy
normalized_text = normalize_text(
    "Hеllo wоrld", strategy=NormalizationStrategies.DOMINANT_SCRIPT
)
```
- silverspeak.random_attack(text: str, percentage: float = 0.1, random_seed: int | None = None, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, same_script: bool = False, same_block: bool = False) -> str
Replace random characters in text with visually similar homoglyphs.
This function replaces a specified percentage of characters in the input text with homoglyphs (visually similar characters with different Unicode code points). The replacements are chosen randomly, and the function can be configured to maintain script and block consistency.
- Parameters:
text (str) – The input text to transform.
percentage (float, optional) – The fraction of characters to replace, from 0.0 to 1.0. Defaults to 0.1 (10%).
random_seed (Optional[int], optional) – Seed for the random number generator to ensure reproducible results. Defaults to None (non-reproducible).
unicode_categories_to_replace (Set[str]) – Unicode categories to consider for replacement. Defaults to predefined categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs], optional) – Types of homoglyphs to use for replacements. Defaults to predefined types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. Defaults to False.
same_script (bool, optional) – Whether to only use homoglyphs from the same Unicode script as the dominant script in the text. Defaults to False.
same_block (bool, optional) – Whether to only use homoglyphs from the same Unicode block as the dominant block in the text. Defaults to False.
- Returns:
The transformed text with homoglyph replacements.
- Return type:
str
- Raises:
ValueError – If the text is None or empty, or if percentage is out of range.
Example
```python
from silverspeak import random_attack

# Replace 5% of characters with homoglyphs
modified_text = random_attack("Hello world", percentage=0.05, random_seed=42)

# Replace 10% of characters with homoglyphs from the same script
modified_text = random_attack("Hello world", percentage=0.1, same_script=True)
```
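The role of `random_seed` can be illustrated with a small standard-library sketch that samples positions from a seeded `random.Random`. The names `TOY_MAP` and `toy_random_attack` are hypothetical, and the sampling scheme is an assumption, not a description of the library's internals.

```python
import random
from typing import Dict, Optional

# Hypothetical toy table: Latin letter -> Cyrillic look-alike.
TOY_MAP: Dict[str, str] = {"e": "\u0435", "o": "\u043e", "a": "\u0430"}


def toy_random_attack(
    text: str, percentage: float = 0.1, random_seed: Optional[int] = None
) -> str:
    """Replace a random sample of eligible positions with toy homoglyphs."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    rng = random.Random(random_seed)  # None seeds from system entropy
    eligible = [i for i, ch in enumerate(text) if ch in TOY_MAP]
    count = min(len(eligible), int(len(text) * percentage))
    chosen = set(rng.sample(eligible, count))
    return "".join(TOY_MAP[ch] if i in chosen else ch for i, ch in enumerate(text))


first = toy_random_attack("Hello world", percentage=0.2, random_seed=42)
second = toy_random_attack("Hello world", percentage=0.2, random_seed=42)
print(first == second)  # True: the same seed yields the same replacements
```

Passing `random_seed=None` makes the chosen positions vary between runs, which matches the documented "non-reproducible" default.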
- silverspeak.targeted_attack(text: str, percentage: float = 0.1, random_seed: int | None = None, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False) -> str
Replace a percentage of characters in text with property-matched homoglyphs.
This function replaces a specified percentage of characters in the input text with homoglyphs (visually similar characters with different Unicode code points). The function selects homoglyphs based on matching the Unicode properties of the original character being replaced, without considering surrounding context. The implementation selects the highest scoring homoglyph replacements based on property matching.
- Parameters:
text (str) – The input text to transform.
percentage (float, optional) – The fraction of characters to replace, from 0.0 to 1.0. Defaults to 0.1 (10%).
random_seed (Optional[int], optional) – Seed for the random number generator to ensure reproducible results. Defaults to None (non-reproducible).
unicode_categories_to_replace (Set[str]) – Unicode categories to consider for replacement. Defaults to predefined categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs], optional) – Types of homoglyphs to use for replacements. Defaults to predefined types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. Defaults to False.
- Returns:
The transformed text with property-matched homoglyph replacements.
- Return type:
str
- Raises:
ValueError – If the text is None or empty, or if percentage is out of range.
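One way to picture "selecting the highest scoring replacements based on property matching" is the following standard-library sketch. It is only an assumption about how such scoring could work: `TOY_CHARS_MAP`, `property_score`, and `toy_targeted_attack` are hypothetical, and the score compares just two properties (general category and bidirectional class).

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical toy table for illustration only.
TOY_CHARS_MAP: Dict[str, List[str]] = {
    "a": ["\u1d43", "\u0430"],  # modifier small a (Lm), Cyrillic a (Ll)
    "o": ["\u043e"],            # Cyrillic o (Ll)
    "l": ["\u0406"],            # Cyrillic capital I (Lu)
}


def property_score(original: str, candidate: str) -> int:
    """Count matching properties: general category and bidirectional class."""
    return int(
        unicodedata.category(original) == unicodedata.category(candidate)
    ) + int(
        unicodedata.bidirectional(original) == unicodedata.bidirectional(candidate)
    )


def toy_targeted_attack(text: str, percentage: float = 0.1) -> str:
    """Replace the positions whose best candidate has the highest property score."""
    if not text:
        raise ValueError("text must be a non-empty string")
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    scored: List[Tuple[int, int, str]] = []
    for index, ch in enumerate(text):
        candidates = TOY_CHARS_MAP.get(ch)
        if candidates:
            best = max(candidates, key=lambda c: property_score(ch, c))
            scored.append((property_score(ch, best), index, best))
    # Highest score first; earlier positions win ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    chars = list(text)
    for _, index, replacement in scored[: int(len(text) * percentage)]:
        chars[index] = replacement
    return "".join(chars)


print(toy_targeted_attack("hallo", percentage=0.4) == "h\u0430ll\u043e")  # True
```

In this sketch "a" and "o" are replaced before "l" because their candidates match the original's category, while the capital look-alike for "l" does not.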
Example
```python
from silverspeak import targeted_attack

# Replace 5% of characters with property-matched homoglyphs
modified_text = targeted_attack(
    "Hello world", percentage=0.05, random_seed=42
)

# Replace 10% of characters with property-matched homoglyphs
modified_text = targeted_attack(
    "Hello world", percentage=0.1
)
```
SilverSpeak Homoglyphs Module
This module provides functionality for detecting, replacing, and normalizing homoglyphs (characters that look visually similar but have different Unicode code points).
The module implements both attack mechanisms (to replace standard characters with homoglyphs) and normalization strategies (to convert homoglyphs back to standard characters).
Main components:
- random_attack: Generate text with random homoglyph replacements
- greedy_attack: Generate text with strategically chosen homoglyph replacements
- targeted_attack: Generate text with targeted homoglyph replacements
- normalize_text: Normalize text by replacing homoglyphs with standard characters
- HomoglyphReplacer: Core class for homoglyph replacement operations
Author: Aldan Creo (ACMC) <os@acmc.fyi>
License: See LICENSE file in the project root
- class silverspeak.homoglyphs.HomoglyphReplacer(unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, random_seed: int = 42)
Bases:
object
A class to replace characters with their homoglyphs and normalize homoglyph text.
This class is the core component of SilverSpeak, providing functionality to:
1. Replace characters with their visually similar homoglyphs
2. Normalize text by replacing homoglyphs with their standard characters
3. Support various normalization strategies based on Unicode script, block, and context
- unicode_categories_to_replace
Unicode categories of characters to replace.
- Type:
Set[str]
- types_of_homoglyphs_to_use
Types of homoglyphs to use.
- Type:
List[TypesOfHomoglyphs]
- replace_with_priority
Whether to replace with priority.
- Type:
bool
- random_state
Random state for reproducibility.
- Type:
random.Random
- chars_map
Mapping of characters to their homoglyphs.
- Type:
Dict[str, List[str]]
- reverse_chars_map
Reverse mapping of homoglyphs to original characters.
- Type:
Dict[str, List[str]]
- base_normalization_map
Base table for normalizing text.
- Type:
Dict[str, List[str]]
- normalization_translation_maps
Cache of normalization maps by script.
- Type:
Dict[str, Dict[str, str]]
- get_homoglyph_for_char(char: str, same_script: bool = False, same_block: bool = False, dominant_script: str | None = None, dominant_block: str | None = None, context: str | None = None, context_window_size: int = 10) -> str | None
Get a homoglyph replacement for a character, matching its properties.
This method selects an optimal homoglyph replacement for a character by comparing the Unicode properties of the potential homoglyphs with those of the original character.
- Parameters:
char (str) – The character to replace with a homoglyph.
same_script (bool) – Whether to use only homoglyphs from the same Unicode script. Defaults to False.
same_block (bool) – Whether to use only homoglyphs from the same Unicode block. Defaults to False.
dominant_script (Optional[str]) – The dominant script to use for filtering. If None and same_script is True, will be automatically detected.
dominant_block (Optional[str]) – The dominant block to use for filtering. If None and same_block is True, will be automatically detected.
context (Optional[str], optional) – Kept for API compatibility, not used.
context_window_size (int, optional) – Kept for API compatibility, not used.
- Returns:
A homoglyph replacement, or None if no suitable replacement is found.
- Return type:
Optional[str]
- get_normalization_map_for_script_block_and_category(script: str, block: str | None = None, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, only_replace_non_normalized=False, **kwargs) -> Dict[str, str]
Generate a normalization map for a specific script and block.
This method creates a mapping for normalizing homoglyphs based on a specific Unicode script and optional block, considering character categories.
- Parameters:
script (str) – The target Unicode script (e.g., ‘Latin’, ‘Cyrillic’).
block (Optional[str], optional) – The target Unicode block. Defaults to None.
unicode_categories_to_replace (Set[str]) – Unicode categories of characters to consider for replacement. Defaults to _DEFAULT_UNICODE_CATEGORIES_TO_REPLACE.
only_replace_non_normalized (bool) – If True, only replace characters that aren’t already in normalized form. Defaults to False.
**kwargs – Additional keyword arguments.
- Returns:
A normalization map for the specified script and block.
- Return type:
Dict[str, str]
- normalize(text: str, strategy: NormalizationStrategies, **kwargs) -> str
Normalize text by replacing homoglyphs with their standard characters.
This method applies the specified normalization strategy to convert text containing homoglyphs back to standard characters. Different strategies consider different aspects like dominant script, block, local context, tokenization, or language model.
- Parameters:
text (str) – Text to normalize.
strategy (NormalizationStrategies) – The normalization strategy to apply.
**kwargs – Additional arguments passed to the specific strategy implementation.
- Returns:
Normalized text with homoglyphs replaced by standard characters.
- Return type:
str
- Raises:
NotImplementedError – If the specified strategy is unknown.
- class silverspeak.homoglyphs.NormalizationStrategies(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Enumeration of text normalization strategies for homoglyph replacement.
- Values:
- DOMINANT_SCRIPT: Normalize based on the dominant Unicode script in the text.
- DOMINANT_SCRIPT_AND_BLOCK: Normalize based on both dominant script and Unicode block.
- LOCAL_CONTEXT: Normalize based on surrounding character context.
- TOKENIZATION: Normalize based on tokenization of the text.
- LANGUAGE_MODEL: Normalize using a masked language model to determine the most likely characters.
- LLM_PROMPT: Normalize using a generative language model prompted to fix homoglyphs.
- SPELL_CHECK: Normalize using spelling correction algorithms with multilingual support.
- NGRAM: Normalize using character n-gram frequency analysis.
- OCR_CONFIDENCE: Normalize using OCR confidence scores or confusion matrices.
- GRAPH_BASED: Normalize using a graph-based character similarity network.
- DOMINANT_SCRIPT = 'dominant_script'
- DOMINANT_SCRIPT_AND_BLOCK = 'dominant_script_and_block'
- GRAPH_BASED = 'graph_based'
- LANGUAGE_MODEL = 'language_model'
- LLM_PROMPT = 'llm_prompt'
- LOCAL_CONTEXT = 'local_context'
- NGRAM = 'ngram'
- OCR_CONFIDENCE = 'ocr_confidence'
- SPELL_CHECK = 'spell_check'
- TOKENIZATION = 'tokenization'
- class silverspeak.homoglyphs.TypesOfHomoglyphs(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Enumeration of the different types of homoglyphs supported by SilverSpeak.
- Values:
- IDENTICAL: Characters that are visually identical in most fonts but have different Unicode code points.
- CONFUSABLES: Characters identified as confusables by Unicode.
- OCR: Characters that might be confused in OCR systems.
- OCR_REFINED: A refined subset of OCR confusables with high visual similarity.
- CONFUSABLES = 'confusables'
- IDENTICAL = 'identical'
- OCR = 'ocr'
- OCR_REFINED = 'ocr_refined'
- silverspeak.homoglyphs.greedy_attack(text: str, percentage: float = 0.1, random_seed: int = 42, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, same_script: bool = False, same_block: bool = False) -> str
Replace characters with homoglyphs using a greedy approach.
This function replaces characters in the input text with homoglyphs (visually similar characters with different Unicode code points) using a greedy approach. The function will attempt to replace every eligible character up to the specified percentage limit.
- Parameters:
text (str) – The input text to transform.
percentage (float, optional) – The fraction of characters to replace, from 0.0 to 1.0. Defaults to 0.1 (10%).
random_seed (int, optional) – Seed for the random number generator to ensure reproducible results. Defaults to 42.
unicode_categories_to_replace (Set[str], optional) – Unicode categories of characters to consider for replacement. Defaults to predefined categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs], optional) – Types of homoglyphs to use for replacements. Defaults to predefined types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. Defaults to False.
same_script (bool, optional) – Whether to only use homoglyphs from the same Unicode script as the dominant script in the text. Defaults to False.
same_block (bool, optional) – Whether to only use homoglyphs from the same Unicode block as the dominant block in the text. Defaults to False.
- Returns:
The transformed text with homoglyph replacements.
- Return type:
str
- Raises:
ValueError – If the text is None or empty, or if percentage is out of range.
Example
```python
from silverspeak.homoglyphs import greedy_attack

# Replace 5% of characters with homoglyphs
modified_text = greedy_attack("Hello world", percentage=0.05, random_seed=42)

# Replace 10% of characters with homoglyphs, prioritizing replacements
modified_text = greedy_attack("Hello world", percentage=0.1, replace_with_priority=True)
```
- silverspeak.homoglyphs.normalize_text(text: str, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, strategy: NormalizationStrategies = NormalizationStrategies.LOCAL_CONTEXT, **kwargs) -> str
Normalize text by replacing homoglyphs with their standard equivalents.
This function provides a convenient interface to the HomoglyphReplacer’s normalize method, creating a temporary HomoglyphReplacer instance with the specified parameters.
- Parameters:
text (str) – The text to normalize.
unicode_categories_to_replace (Set[str]) – Unicode categories to replace. Defaults to predefined common categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs]) – Types of homoglyphs to consider. Defaults to predefined common homoglyph types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. When True, replacements are chosen by order in the homoglyph lists. When False, replacements are chosen based on context or other strategies. Defaults to False.
strategy (NormalizationStrategies) – The normalization strategy to use. Defaults to LOCAL_CONTEXT, which selects replacements based on surrounding characters.
- Returns:
The normalized text with homoglyphs replaced.
- Return type:
str
- Raises:
ValueError – If the text is None or invalid parameters are provided.
NotImplementedError – If an unsupported normalization strategy is specified.
Example
```python
from silverspeak.homoglyphs import normalize_text
from silverspeak.homoglyphs.utils import NormalizationStrategies

# Normalize text using the default local context strategy
normalized_text = normalize_text("Hеllo wоrld")  # Contains Cyrillic 'е' and 'о'

# Normalize text using dominant script strategy
normalized_text = normalize_text(
    "Hеllo wоrld", strategy=NormalizationStrategies.DOMINANT_SCRIPT
)
```
- silverspeak.homoglyphs.random_attack(text: str, percentage: float = 0.1, random_seed: int | None = None, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False, same_script: bool = False, same_block: bool = False) -> str
Replace random characters in text with visually similar homoglyphs.
This function replaces a specified percentage of characters in the input text with homoglyphs (visually similar characters with different Unicode code points). The replacements are chosen randomly, and the function can be configured to maintain script and block consistency.
- Parameters:
text (str) – The input text to transform.
percentage (float, optional) – The fraction of characters to replace, from 0.0 to 1.0. Defaults to 0.1 (10%).
random_seed (Optional[int], optional) – Seed for the random number generator to ensure reproducible results. Defaults to None (non-reproducible).
unicode_categories_to_replace (Set[str]) – Unicode categories to consider for replacement. Defaults to predefined categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs], optional) – Types of homoglyphs to use for replacements. Defaults to predefined types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. Defaults to False.
same_script (bool, optional) – Whether to only use homoglyphs from the same Unicode script as the dominant script in the text. Defaults to False.
same_block (bool, optional) – Whether to only use homoglyphs from the same Unicode block as the dominant block in the text. Defaults to False.
- Returns:
The transformed text with homoglyph replacements.
- Return type:
str
- Raises:
ValueError – If the text is None or empty, or if percentage is out of range.
Example
```python
from silverspeak.homoglyphs import random_attack

# Replace 5% of characters with homoglyphs
modified_text = random_attack("Hello world", percentage=0.05, random_seed=42)

# Replace 10% of characters with homoglyphs from the same script
modified_text = random_attack("Hello world", percentage=0.1, same_script=True)
```
- silverspeak.homoglyphs.targeted_attack(text: str, percentage: float = 0.1, random_seed: int | None = None, unicode_categories_to_replace: Set[str] = {'Ll', 'Lm', 'Lo', 'Lt', 'Lu'}, types_of_homoglyphs_to_use: List[TypesOfHomoglyphs] = [TypesOfHomoglyphs.IDENTICAL, TypesOfHomoglyphs.CONFUSABLES, TypesOfHomoglyphs.OCR_REFINED], replace_with_priority: bool = False) -> str
Replace a percentage of characters in text with property-matched homoglyphs.
This function replaces a specified percentage of characters in the input text with homoglyphs (visually similar characters with different Unicode code points). The function selects homoglyphs based on matching the Unicode properties of the original character being replaced, without considering surrounding context. The implementation selects the highest scoring homoglyph replacements based on property matching.
- Parameters:
text (str) – The input text to transform.
percentage (float, optional) – The fraction of characters to replace, from 0.0 to 1.0. Defaults to 0.1 (10%).
random_seed (Optional[int], optional) – Seed for the random number generator to ensure reproducible results. Defaults to None (non-reproducible).
unicode_categories_to_replace (Set[str]) – Unicode categories to consider for replacement. Defaults to predefined categories.
types_of_homoglyphs_to_use (List[TypesOfHomoglyphs], optional) – Types of homoglyphs to use for replacements. Defaults to predefined types.
replace_with_priority (bool, optional) – Whether to replace characters based on priority. Defaults to False.
- Returns:
The transformed text with property-matched homoglyph replacements.
- Return type:
str
- Raises:
ValueError – If the text is None or empty, or if percentage is out of range.
Example
```python
from silverspeak.homoglyphs import targeted_attack

# Replace 5% of characters with property-matched homoglyphs
modified_text = targeted_attack(
    "Hello world", percentage=0.05, random_seed=42
)

# Replace 10% of characters with property-matched homoglyphs
modified_text = targeted_attack(
    "Hello world", percentage=0.1
)
```