Code Extractor

Browse Components

  • function test_similarity_threshold_effect

    A pytest test function that checks how SimilarityCleaner responds to different similarity threshold values: higher thresholds retain more texts, while lower thresholds remove similar content more aggressively.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 41-59

    testing pytest text-deduplication similarity-detection data-cleaning
  • function test_single_text_input

    A pytest test function that verifies the SimilarityCleaner correctly handles a single text document by returning it unchanged.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 36-39

    testing unit-test pytest text-processing similarity
  • function test_empty_input

    A pytest test function that verifies the SimilarityCleaner correctly handles empty input by returning an empty list.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 31-34

    testing unit-test pytest edge-case empty-input
  • function test_nearly_similar_text_handling

    A pytest test function that verifies the SimilarityCleaner's ability to identify and remove near-duplicate text entries while preserving distinct ones.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 20-29

    testing pytest text-processing similarity-detection deduplication
  • function test_identical_text_removal

    A pytest test function that verifies the SimilarityCleaner's ability to remove identical duplicate text entries from a list while preserving unique documents.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 9-18

    testing pytest unit-test deduplication text-processing
  • function setup_similarity_cleaner

    A pytest fixture that creates and returns a configured SimilarityCleaner instance with a threshold of 0.8 for use in test cases.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py | Lines: 5-7

    pytest fixture testing similarity data-cleaning
  • function download_model

    Downloads a model file from a specified URL and saves it to a local file path using an HTTP GET request.

    File: /tf/active/vicechatdev/chromadb-cleanup/scripts/download_model.py | Lines: 4-11

    download http file-io model-management network
  • function save_data_to_chromadb

    Saves a list of document dictionaries to a ChromaDB vector database collection, optionally including embeddings and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py | Lines: 109-167

    chromadb vector-database document-storage embeddings persistence
  • function save_data_to_chromadb_v1

    Saves a list of document dictionaries to a ChromaDB collection, with support for batch processing, embeddings, and metadata storage.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py | Lines: 168-239

    chromadb vector-database document-storage embeddings batch-processing
  • function load_data_from_chromadb_v1

    Retrieves all documents from a specified ChromaDB collection, including their IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py | Lines: 69-107

    chromadb database document-retrieval vector-database embeddings
  • function load_data_from_chromadb

    Connects to a ChromaDB instance and retrieves all documents from a specified collection, returning them as a list of dictionaries with document IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py | Lines: 123-165

    chromadb vector-database data-loading document-retrieval embeddings
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py | Lines: 71-120

    data-cleaning deduplication chromadb vector-database similarity-detection
  • class DocumentProtector

    A class that protects PDF documents from editing by applying encryption and permission restrictions using the pikepdf and PyMuPDF libraries.

    File: /tf/active/vicechatdev/document_auditor/src/security/document_protection.py | Lines: 9-118

    pdf security encryption document-protection permissions
  • class SignatureManager

    A class that manages digital signature images for documents, providing functionality to store, retrieve, and list signature files in a designated directory.

    File: /tf/active/vicechatdev/document_auditor/src/security/signature_manager.py | Lines: 6-141

    signature-management document-processing file-management image-processing digital-signatures
  • class Watermarker

    A class that adds watermark images to PDF documents with configurable opacity, scale, and positioning options.

    File: /tf/active/vicechatdev/document_auditor/src/security/watermark.py | Lines: 8-178

    pdf watermark document-processing image-processing pdf-manipulation
  • class HashGenerator

    A class that provides cryptographic hashing functionality for PDF documents, including hash generation, embedding, and verification for document integrity checking.

    File: /tf/active/vicechatdev/document_auditor/src/security/hash_generator.py | Lines: 11-215

    cryptography hashing SHA-256 PDF document-integrity
  • class SignatureGenerator

    A class that generates signature-like images from text names using italic fonts and decorative flourishes.

    File: /tf/active/vicechatdev/document_auditor/src/utils/signature_generator.py | Lines: 10-134

    image-generation signature PIL graphics text-rendering
  • class PDFAConverter

    A class that converts PDF files to PDF/A format for long-term archiving and compliance, supporting multiple compliance levels (1b, 2b, 3b) with fallback conversion methods.

    File: /tf/active/vicechatdev/document_auditor/src/utils/pdf_utils.py | Lines: 8-145

    pdf pdf-a document-conversion archiving compliance
  • class AuditPageGenerator

    A class that generates comprehensive PDF audit trail pages for documents, including document information, reviews, approvals, revision history, and event history with electronic signatures.

    File: /tf/active/vicechatdev/document_auditor/src/audit_page_generator.py | Lines: 55-434

    pdf-generation audit-trail document-management compliance electronic-signature
  • class DocumentMerger

    A class that merges PDF documents with audit trail pages, combining an original PDF with an audit page and updating metadata to reflect the audit process.

    File: /tf/active/vicechatdev/document_auditor/src/document_merger.py | Lines: 5-72

    pdf document-processing merge audit-trail file-operations
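
The clean_collection and SimilarityCleaner entries above describe a two-stage cleanup: hash-based removal of exact duplicates, then threshold-based removal of similar texts. The following is a minimal sketch of that idea using only the Python standard library. It is not the code from the listed files: the function names remove_exact_duplicates, remove_similar, and clean_texts are hypothetical, and difflib's character-level ratio stands in for whatever similarity measure SimilarityCleaner actually uses (possibly embedding-based).

```python
import hashlib
from difflib import SequenceMatcher


def remove_exact_duplicates(texts):
    """Keep the first occurrence of each text, compared by SHA-256 digest."""
    seen = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def remove_similar(texts, threshold=0.8):
    """Drop a text when its similarity to an already kept text reaches the threshold.

    Assumption: similarity is difflib's character-level ratio in [0, 1].
    """
    kept = []
    for text in texts:
        if all(SequenceMatcher(None, text, other).ratio() < threshold for other in kept):
            kept.append(text)
    return kept


def clean_texts(texts, threshold=0.8):
    """Hypothetical pipeline: exact deduplication first, then similarity filtering."""
    return remove_similar(remove_exact_duplicates(texts), threshold)
```

Under this sketch, a higher threshold keeps more texts because fewer pairs count as similar, which matches the behavior that test_similarity_threshold_effect is described as checking. Empty and single-item inputs pass through unchanged, as in the test_empty_input and test_single_text_input descriptions.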