🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "regex"

Found 50 matching components

  • function clean_text

    Cleans and normalizes text content by removing HTML tags, normalizing whitespace, and stripping markdown formatting elements.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    text-processing text-cleaning normalization html-removal markdown-removal
  • function extract_warranty_data_improved

    Parses markdown-formatted warranty documentation to extract structured warranty data including IDs, titles, sections, disclosure text, and reference citations.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    markdown-parsing text-extraction warranty-processing document-parsing regex
  • function parse_references_section

    Parses a formatted references section string and extracts structured data including reference numbers, sources, and content previews using regular expressions.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    parsing text-processing references citations regex
  • function test_fixes

    A comprehensive test function that validates email template rendering and CDocs application link presence in a document management system's email notification templates.

    File: /tf/active/vicechatdev/test_comprehensive_fixes.py

    testing email-templates template-rendering validation document-management
  • function test_markdown_link_parsing

    A test function that validates markdown link parsing capabilities, specifically testing extraction and URL encoding of complex URLs containing special characters from Quill editor format.

    File: /tf/active/vicechatdev/test_complex_hyperlink.py

    testing markdown url-parsing regex url-encoding
  • function extract_warranty_data

    Parses markdown-formatted warranty documentation to extract structured warranty information including IDs, titles, sections, source document counts, warranty text, and disclosure content.

    File: /tf/active/vicechatdev/convert_disclosures_to_table.py

    markdown-parsing data-extraction warranty-processing text-processing regex
  • function format_inline_references

    Formats inline citation references (e.g., [1], [2]) in a Word document paragraph by applying italic styling to them while preserving the rest of the text.

    File: /tf/active/vicechatdev/enhanced_word_converter_fixed.py

    document-formatting word-processing python-docx text-formatting citations
  • class ReferenceManager_v2

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class referencemanager
  • class FixedProjectVictoriaGenerator

    Fixed Project Victoria Disclosure Generator that properly handles all warranty sections.

    File: /tf/active/vicechatdev/fixed_project_victoria_generator.py

    class fixedprojectvictoriagenerator
  • class ReferenceManager_v3

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class referencemanager
  • class ImprovedProjectVictoriaGenerator

    Improved Project Victoria Disclosure Generator with proper reference management.

    File: /tf/active/vicechatdev/improved_project_victoria_generator.py

    class improvedprojectvictoriagenerator
  • class ReferenceManager_v4

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class referencemanager
  • class ProjectVictoriaDisclosureGenerator

    Main class for generating Project Victoria disclosures from warranty claims.

    File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py

    class projectvictoriadisclosuregenerator
  • function test_attendee_extraction_comprehensive

    A comprehensive test function that validates the attendee extraction logic from meeting transcripts, comparing actual speakers versus mentioned names, and demonstrating integration with meeting minutes generation.

    File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py

    testing attendee-extraction meeting-minutes transcript-parsing speaker-identification
  • function parse_log_line

    Parses a structured log line string and extracts timestamp, logger name, log level, and message components into a dictionary.

    File: /tf/active/vicechatdev/SPFCsync/monitor.py

    logging parsing regex text-processing log-analysis
  • function validate_azure_client_id

    Validates that an Azure client ID string conforms to the standard GUID format (8-4-4-4-12 hexadecimal pattern) and is not a placeholder value.

    File: /tf/active/vicechatdev/SPFCsync/validate_config.py

    validation azure authentication guid uuid
  • function process_markdown_content

    Parses markdown-formatted text content and converts it into a structured list of content elements with type annotations and formatting metadata suitable for document export.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    markdown parser document-processing text-processing content-conversion
  • function add_inline_formatting_to_paragraph

    Parses markdown-formatted text and applies inline formatting (bold, italic, code) to a Microsoft Word paragraph object using the python-docx library.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    markdown word-document text-formatting docx inline-formatting
  • function clean_html_tags

    Removes HTML tags and entities from text strings, returning clean plain text suitable for PDF display or other formatted output.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    html text-processing sanitization string-manipulation pdf-generation
  • function format_inline_markdown

    Converts inline Markdown syntax (bold, italic, code, links) to HTML tags while escaping HTML entities for safe rendering.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    markdown html text-formatting conversion inline-formatting
  • function html_to_markdown

    Converts HTML text back to Markdown format using regex-based pattern matching and replacement, handling headers, code blocks, formatting, links, lists, and HTML entities.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    html markdown conversion text-processing regex
  • function convert_markdown_to_html

    Converts basic markdown formatting (bold, italic, code) to HTML markup suitable for PDF generation using ReportLab.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    markdown html conversion pdf-generation text-formatting
  • function convert_european_decimals

    Detects and converts numeric data with European decimal format (comma as decimal separator) to standard format (dot as decimal separator) in a pandas DataFrame, handling mixed formats and missing data patterns.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    data-processing data-cleaning decimal-conversion european-format locale-handling
  • function validate_sheet_format

    Analyzes Excel sheet structure using multiple heuristics to classify it as tabular data, information sheet, or mixed format, returning quality metrics and extraction recommendations.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    data-validation excel-processing sheet-classification data-quality heuristic-analysis
  • function html_to_markdown_v1

    Converts HTML markup to Markdown syntax, handling headers, code blocks, text formatting, links, lists, and paragraphs with proper spacing.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    html markdown conversion text-processing formatting
  • function clean_html_tags_v1

    Removes all HTML tags from a given text string using regular expression pattern matching, returning clean text without markup.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    html text-processing sanitization regex string-manipulation
  • function add_inline_formatting_to_paragraph_v1

    Parses markdown-formatted text and adds it to a Word document paragraph, converting markdown links [text](url) into clickable hyperlinks while delegating other markdown formatting to a helper function.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    markdown word-document docx hyperlink text-formatting
  • function add_markdown_formatting_to_paragraph

    Parses markdown-formatted text and applies corresponding formatting (bold, italic, code) to runs within a python-docx paragraph object.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    markdown formatting docx word-document text-processing
  • function convert_markdown_to_html_v1

    Converts basic Markdown syntax to HTML markup compatible with ReportLab PDF generation, including support for clickable links, bold, italic, and inline code formatting.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    markdown html conversion text-formatting reportlab
  • function initialize_document_counters

    Initializes document counters in Neo4j by analyzing existing ControlledDocument nodes and creating DocumentCounter nodes with values higher than the maximum existing document numbers for each department/type combination.

    File: /tf/active/vicechatdev/CDocs/db/schema_manager.py

    neo4j database-initialization document-management counter-initialization graph-database
  • function validate_document_number

    Validates a custom document number by checking its format, length constraints, and uniqueness in the database, returning a dictionary with validation results.

    File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

    validation document-management database-query uniqueness-check format-validation
  • function html_to_text

    Converts HTML content to plain text by removing HTML tags, decoding common HTML entities, and normalizing whitespace.

    File: /tf/active/vicechatdev/CDocs/utils/notifications.py

    html text-conversion html-parsing text-extraction html-entities
  • class TwoPassSqlWorkflow

    Two-pass SQL generation workflow with iteration and error correction.

    File: /tf/active/vicechatdev/full_smartstat/two_pass_sql_workflow.py

    class twopasssqlworkflow
  • class DataProcessor

    Handles data loading, validation, and preprocessing.

    File: /tf/active/vicechatdev/full_smartstat/data_processor.py

    class dataprocessor
  • class VendorEnricher

    A class that enriches vendor information by finding official email addresses and VAT numbers using RAG (Retrieval-Augmented Generation) with ChromaDB document search and web search capabilities.

    File: /tf/active/vicechatdev/find_email/vendor_enrichment.py

    vendor-enrichment data-enrichment RAG web-search ChromaDB
  • class VendorEmailExtractor

    Extracts vendor email addresses from all organizational mailboxes.

    File: /tf/active/vicechatdev/find_email/vendor_email_extractor.py

    class vendoremailextractor
  • class DataProcessor_v1

    Handles data loading, validation, and preprocessing.

    File: /tf/active/vicechatdev/smartstat/data_processor.py

    class dataprocessor
  • class DocumentProcessor_v3

    A comprehensive PDF document processor that handles text extraction, OCR (Optical Character Recognition), layout analysis, table detection, and metadata extraction from PDF files.

    File: /tf/active/vicechatdev/invoice_extraction/core/document_processor.py

    pdf-processing ocr text-extraction document-processing invoice-processing
  • class EntityClassifier

    Classifies which ViceBio entity (UK, Belgium, or Australia) an invoice is addressed to using rule-based pattern matching and LLM fallback.

    File: /tf/active/vicechatdev/invoice_extraction/core/entity_classifier.py

    classification entity-recognition invoice-processing pattern-matching regex
  • class AUValidator

    Australia-specific invoice data validator that extends BaseValidator to implement validation rules for Australian invoices including ABN validation, GST calculations, and Australian tax invoice requirements.

    File: /tf/active/vicechatdev/invoice_extraction/validators/au_validator.py

    validation invoice australia abn gst
  • class BEValidator

    Belgium-specific invoice data validator that extends BaseValidator to implement Belgian invoice validation rules including VAT number format, address verification, IBAN validation, and legal requirements.

    File: /tf/active/vicechatdev/invoice_extraction/validators/be_validator.py

    validation invoice belgium vat iban
  • class UKValidator

    UK-specific invoice data validator that extends BaseValidator to implement validation rules specific to UK invoices including VAT number format, UK addresses, VAT rates, and banking details.

    File: /tf/active/vicechatdev/invoice_extraction/validators/uk_validator.py

    validation invoice UK VAT tax
  • class BEExtractor

    Belgium-specific invoice data extractor that uses LLM (Large Language Model) to extract structured invoice data from Belgian invoices in multiple languages (English, French, Dutch).

    File: /tf/active/vicechatdev/invoice_extraction/extractors/be_extractor.py

    invoice-extraction belgium llm ocr document-processing
  • class ReferenceManager_v5

    Manages extraction and formatting of references for LLM chat responses. Handles both file references and BibTeX citations, formatting them according to various academic citation styles.

    File: /tf/active/vicechatdev/datacapture_backup_16072025/OneCo_hybrid_RAG.py

    class referencemanager
  • function extract_document_code_v1

    Extracts a structured document code (e.g., 2.13.4.3.3.2) from a filename using regex pattern matching.

    File: /tf/active/vicechatdev/mailsearch/enhanced_document_comparison.py

    regex pattern-matching document-management filename-parsing code-extraction
  • function has_wuxi_coding_v1

    Validates whether a filename starts with a Wuxi coding pattern, which consists of numbers separated by dots (e.g., '2.13.4.1.2').

    File: /tf/active/vicechatdev/mailsearch/upload_non_wuxi_coded.py

    validation filename pattern-matching regex string-processing
  • function has_wuxi_coding

    Validates whether a filename starts with a Wuxi coding pattern consisting of dot-separated numeric segments (e.g., '2.13.4.1.2').

    File: /tf/active/vicechatdev/mailsearch/copy_signed_documents.py

    validation filename pattern-matching regex wuxi-coding
  • function extract_document_code

    Extracts a structured document code (e.g., '4.5.38.2') from a filename using regex pattern matching.

    File: /tf/active/vicechatdev/mailsearch/compare_documents.py

    document-management filename-parsing regex pattern-matching code-extraction
  • class Zebra_Label

    Builds a ZPL2 label and sends a print command line to a specified printer on a specified server. All dimensions are given in millimeters and automatically converted to printer dot units.

    Parameters (each is also stored as an attribute of the same name):
      height : int    height of the label in mm.
      width : int     width of the label in mm.
      dpmm : int      dots per millimeter, 8.0 for 203 dpi, 12 for 300 dpi.
      server : str    the server on which to print.
      queue : str     the printer to print on.

    Additional attributes:
      code : str              ZPL code string; each function appends to it, and it is what is eventually sent to the printer.
      graph : py2neo.Graph    the connection to the database.

    File: /tf/active/vicechatdev/resources/printers.py

    class zebra_label
  • class SessionDetector

    Detects session information (conversation ID and exchange number) from PDF files using multiple detection methods including metadata, filename, footer, and content analysis.

    File: /tf/active/vicechatdev/e-ink-llm/session_detector.py

    pdf-processing session-detection conversation-tracking metadata-extraction pattern-matching
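
The validate_azure_client_id result above describes a check for the 8-4-4-4-12 hexadecimal GUID shape plus a rejection of placeholder values. A minimal sketch of that kind of check follows. The name looks_like_client_id and the PLACEHOLDERS set are hypothetical; the indexed function's actual placeholder list is not shown in the results.

```python
import re

# 8-4-4-4-12 hexadecimal groups, as stated in the result description.
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

# Hypothetical placeholder values; the real function's list is not shown.
PLACEHOLDERS = {
    "00000000-0000-0000-0000-000000000000",
    "your-client-id-here",
}


def looks_like_client_id(value: str) -> bool:
    """Return True if value is GUID-shaped and not a known placeholder."""
    candidate = value.strip()
    if candidate.lower() in PLACEHOLDERS:
        return False
    # fullmatch rejects strings with extra characters before or after the GUID.
    return GUID_PATTERN.fullmatch(candidate) is not None
```

Using fullmatch rather than search means a GUID embedded in a longer string is rejected, which is usually what a configuration validator wants.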
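
The has_wuxi_coding and extract_document_code results above both rely on a pattern of dot-separated numeric segments such as '2.13.4.1.2'. The sketch below shows one way to express that pattern. The function names are hypothetical, and two points are assumptions: that at least two segments are required, and that the code sits at the start of the filename (the extract_document_code entries do not say where in the filename the code appears).

```python
import re
from typing import Optional

# Two or more numeric segments separated by dots, anchored at the start.
# Assumption: a single number such as "2024" does not count as a code.
CODE_AT_START = re.compile(r"^\d+(?:\.\d+)+")


def leading_document_code(filename: str) -> Optional[str]:
    """Return the dot-separated numeric code at the start of filename, if any."""
    match = CODE_AT_START.match(filename)
    return match.group(0) if match else None


def starts_with_code(filename: str) -> bool:
    """Return True if filename begins with a dot-separated numeric code."""
    return leading_document_code(filename) is not None
```

Because each segment after the first must be a dot followed by digits, a file extension such as ".pdf" is never absorbed into the code.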
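
The parse_log_line result above describes splitting a structured log line into timestamp, logger name, log level, and message. The results do not show the log format, so the sketch below assumes the common "timestamp - name - LEVEL - message" layout produced by a Python logging format string like "%(asctime)s - %(name)s - %(levelname)s - %(message)s". The name split_log_line is hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed layout: "2025-01-01 12:00:00,123 - logger.name - INFO - message text"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?)"
    r" - (?P<logger>\S+)"
    r" - (?P<level>[A-Z]+)"
    r" - (?P<message>.*)$"
)


def split_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the four named fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None
```

Named groups keep the regex and the returned dictionary keys in one place, and returning None for non-matching lines lets a caller skip continuation lines such as tracebacks.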
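
Several results above (clean_html_tags, clean_html_tags_v1, html_to_text) describe removing HTML tags with a regular expression, decoding entities, and normalizing whitespace. A rough sketch of that approach, using only the standard library, is shown below. The name strip_html is hypothetical, and the indexed functions may order or implement these steps differently.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")   # anything that looks like <...>
WHITESPACE = re.compile(r"\s+")


def strip_html(text: str) -> str:
    """Remove tags, decode entities, and collapse runs of whitespace."""
    # Replace tags with a space so adjacent block elements do not run together.
    without_tags = TAG.sub(" ", text)
    # Decode entities such as &amp; after tag removal, so a decoded "<" is kept as text.
    decoded = html.unescape(without_tags)
    return WHITESPACE.sub(" ", decoded).strip()
```

Replacing tags with a space is a trade-off: it keeps "<p>a</p><p>b</p>" readable as "a b", but it also splits a word that has inline markup in the middle. A regex of this kind is adequate for simple notification or PDF text; it is not a substitute for an HTML parser on untrusted or malformed input.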

Search Examples