
function smartstat_run_analysis

Maturity: 54

Flask API endpoint that initiates a SmartStat statistical analysis in a background thread, tracking progress and persisting results to a data section.

File: /tf/active/vicechatdev/vice_ai/new_app.py
Lines: 5272 - 5380
Complexity: complex

Purpose

This endpoint handles POST requests to start statistical analysis on uploaded datasets. It validates user permissions, recovers sessions if needed, spawns a background thread to run the analysis asynchronously, and returns a job ID for progress tracking. The analysis uses LLM models to generate and execute statistical code based on natural language queries, with optional context from previous analyses and interpretation templates.
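
Since the extracted function body below omits its route decorator, the following sketch shows how such an endpoint would plausibly be registered; the URL pattern is an assumption that mirrors the client-side usage example further down this page, not something confirmed by the extracted source.

# Hypothetical route registration; the path below is assumed, not taken from the source file.
from flask import Flask

app = Flask(__name__)

@app.route('/api/smartstat/<session_id>/analyze', methods=['POST'])
def smartstat_run_analysis(session_id):
    # Full body shown in the Source Code section below.
    ...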

Source Code

def smartstat_run_analysis(session_id):
    """Start SmartStat analysis in background thread"""
    user_email = get_current_user()
    data = request.get_json()
    
    # Verify session exists - recreate if needed
    session = smartstat_service.get_session(session_id)
    if not session:
        logger.warning(f"Session {session_id} not found - attempting to recover")
        all_sections = data_section_service.get_user_data_sections(user_email)
        data_section = next((ds for ds in all_sections if ds.analysis_session_id == session_id), None)
        
        if data_section:
            session = SmartStatSession(session_id, data_section.id, data_section.title)
            smartstat_service.sessions[session_id] = session
            # Reload CSV data if it exists
            if data_section.csv_data:
                import pandas as pd
                import json
                df = pd.read_json(json.dumps(json.loads(data_section.csv_data)))
                session.dataframe = df
        else:
            return jsonify({'error': 'Session not found'}), 404
    
    # Verify data section ownership
    data_section = data_section_service.get_data_section(session.data_section_id)
    if not data_section or data_section.owner != user_email:
        return jsonify({'error': 'Access denied'}), 403
    
    user_query = data.get('query', '')
    model = data.get('model', 'gpt-4o')
    include_previous_context = data.get('include_previous_context', False)
    interpretation_template_id = data.get('interpretation_template_id')  # New parameter
    
    if not user_query:
        return jsonify({'error': 'Query is required'}), 400
    
    # Generate unique job ID for this analysis
    import uuid
    job_id = str(uuid.uuid4())
    
    # Initialize progress tracking
    smartstat_progress[job_id] = {
        'status': 'running',
        'progress': 0,
        'message': 'Starting analysis...',
        'session_id': session_id
    }
    
    # Run analysis in background thread
    def run_analysis_background():
        try:
            smartstat_progress[job_id]['progress'] = 10
            smartstat_progress[job_id]['message'] = 'Generating analysis script...'
            
            result = smartstat_service.run_analysis(
                session_id, 
                user_query, 
                model,
                include_previous_context=include_previous_context,
                interpretation_template_id=interpretation_template_id  # Pass template ID
            )
            
            if result.get('success'):
                smartstat_progress[job_id]['status'] = 'completed'
                smartstat_progress[job_id]['progress'] = 100
                smartstat_progress[job_id]['message'] = 'Analysis completed'
                smartstat_progress[job_id]['result'] = result
                
                # Save analysis history to data section metadata for persistence
                try:
                    session = smartstat_service.get_session(smartstat_progress[job_id]['session_id'])
                    if session:
                        data_section = data_section_service.get_data_section(session.data_section_id)
                        if data_section:
                            if not data_section.metadata:
                                data_section.metadata = {}
                            data_section.metadata['analysis_history'] = session.analysis_history
                            data_section_service.update_data_section(data_section)
                except Exception as e:
                    logger.error(f"Error saving history to metadata: {e}")
            else:
                smartstat_progress[job_id]['status'] = 'failed'
                smartstat_progress[job_id]['progress'] = 0
                smartstat_progress[job_id]['message'] = 'Analysis failed'
                smartstat_progress[job_id]['error'] = result.get('error', 'Unknown error')
                smartstat_progress[job_id]['result'] = result
                
        except Exception as e:
            logger.error(f"Background analysis error: {e}")
            import traceback
            traceback.print_exc()
            smartstat_progress[job_id]['status'] = 'failed'
            smartstat_progress[job_id]['progress'] = 0
            smartstat_progress[job_id]['message'] = 'Analysis failed'
            smartstat_progress[job_id]['error'] = str(e)
    
    # Start background thread
    import threading
    thread = threading.Thread(target=run_analysis_background)
    thread.daemon = True
    thread.start()
    
    # Return job ID immediately
    return jsonify({
        'success': True,
        'job_id': job_id,
        'message': 'Analysis started in background'
    })

Parameters

Name         Type    Default    Kind
session_id   -       -          positional_or_keyword

Parameter Details

session_id: String identifier for the SmartStat session, passed as a URL path parameter. Links to a specific data section and its associated dataframe for analysis.

request.json.query: Natural language query describing the statistical analysis to perform (required). Example: 'Calculate correlation between age and income'

request.json.model: LLM model to use for generating analysis code. Defaults to 'gpt-4o'. Other options may include 'gpt-3.5-turbo', 'claude-3', etc.

request.json.include_previous_context: Boolean flag indicating whether to include previous analysis history as context for the current query. Defaults to False.

request.json.interpretation_template_id: Optional identifier for a predefined interpretation template used to structure the analysis output.
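
Put together, a minimal request body only needs query; the remaining fields fall back to the defaults described above. The values below are illustrative.

# Minimal vs. fully specified request bodies; only 'query' is required.
minimal_body = {'query': 'Calculate correlation between age and income'}

full_body = {
    'query': 'Calculate correlation between age and income',
    'model': 'gpt-4o',                    # default model
    'include_previous_context': False,    # default
    'interpretation_template_id': None,   # optional; omit or None to skip templating
}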

Return Value

Returns a JSON response with Flask jsonify. On success: {'success': True, 'job_id': '<uuid>', 'message': 'Analysis started in background'}. On error: {'error': '<error_message>'} with HTTP status codes 400 (missing query), 403 (access denied), or 404 (session not found). The job_id can be used to poll for analysis progress and results.
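
For illustration, the payloads described above take the following shapes; the job_id value is a placeholder.

# Illustrative response bodies based on the description above.
success_response = {
    'success': True,
    'job_id': 'c0ffee00-1234-4abc-9def-000000000000',  # placeholder UUID
    'message': 'Analysis started in background'
}

# Error responses share one shape; the HTTP status code distinguishes the cause
# (400 missing query, 403 access denied, 404 session not found).
error_response = {'error': 'Query is required'}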

Dependencies

  • flask
  • pandas
  • uuid
  • threading
  • json
  • logging

Required Imports

from flask import request, jsonify
import uuid
import threading
import pandas as pd
import json
import logging

Conditional/Optional Imports

These imports are only needed under specific conditions:

import pandas as pd
    Condition: only when recovering a session and reloading CSV data from data_section.csv_data
    Required (conditional)

import json
    Condition: only when recovering a session and parsing stored CSV data
    Required (conditional)
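
A minimal sketch of the recovery path that exercises these conditional imports; the helper name is hypothetical, and wrapping the JSON string in StringIO is a small variation on the extracted code that avoids the literal-string deprecation warning in recent pandas releases.

# Hypothetical helper illustrating the conditional reload; only reached when the
# in-memory session is missing and data_section.csv_data still holds serialized data.
import io
import json
import pandas as pd

def reload_dataframe(csv_data: str) -> pd.DataFrame:
    # Round-trip through json to normalize the stored string before pandas parses it,
    # matching what the endpoint does during session recovery.
    normalized = json.dumps(json.loads(csv_data))
    return pd.read_json(io.StringIO(normalized))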

Usage Example

# Client-side usage example
import requests

# Assuming user is authenticated and has a session_id
session_id = 'abc-123-def-456'
api_url = f'https://your-app.com/api/smartstat/{session_id}/analyze'

# Prepare analysis request
payload = {
    'query': 'Calculate mean and standard deviation for all numeric columns',
    'model': 'gpt-4o',
    'include_previous_context': False,
    'interpretation_template_id': 'template-001'
}

headers = {
    'Authorization': 'Bearer YOUR_AUTH_TOKEN',
    'Content-Type': 'application/json'
}

# Start analysis
response = requests.post(api_url, json=payload, headers=headers)
result = response.json()

if result.get('success'):
    job_id = result['job_id']
    print(f'Analysis started with job ID: {job_id}')
    
    # Poll for progress (separate endpoint needed)
    # GET /api/smartstat/progress/{job_id}
else:
    print(f'Error: {result.get("error")}')
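
The polling step hinted at in the comment above could look like the sketch below. The progress endpoint path is an assumption based on the smartstat_get_progress component listed under Similar Components; job_id and headers come from the example above.

import time
import requests

# Assumed progress endpoint; adjust to the actual route exposed by the application.
progress_url = f'https://your-app.com/api/smartstat/progress/{job_id}'

while True:
    status = requests.get(progress_url, headers=headers).json()
    print(f"{status.get('progress', 0)}% - {status.get('message', '')}")
    if status.get('status') in ('completed', 'failed'):
        break
    time.sleep(2)  # back off between polls

if status.get('status') == 'completed':
    analysis_result = status.get('result')
else:
    print(f"Analysis failed: {status.get('error')}")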

Best Practices

  • Always check the returned job_id and implement a polling mechanism to track analysis progress
  • Handle all three error status codes (400, 403, 404) appropriately in client code
  • The background thread is daemon=True, so it will terminate if the main process exits
  • Session recovery logic attempts to recreate a session from the matching data section (and its stored CSV data) if it is not found in memory
  • Analysis history is persisted to data_section.metadata for durability across server restarts
  • The function uses a global smartstat_progress dictionary for tracking; ensure thread-safe access in production (see the sketch after this list)
  • User ownership is verified before allowing analysis to prevent unauthorized access
  • The query parameter is required and should be validated on client side before submission
  • Consider implementing timeout mechanisms for long-running analyses
  • Monitor the smartstat_progress dictionary size to prevent memory leaks from abandoned jobs
  • The background thread catches and logs exceptions; implement proper error monitoring on top of this logging
  • Template IDs should be validated against available templates before submission
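
Regarding the thread-safety and memory-leak items above, here is a minimal sketch of a lock-protected progress store with periodic pruning, assuming the same global dictionary; the helper names and the one-hour retention window are illustrative, not part of the source.

# Hypothetical lock-protected access to the shared progress dictionary, plus a
# pruning helper for finished jobs.
import threading
import time

_progress_lock = threading.Lock()
smartstat_progress = {}

def update_progress(job_id, **fields):
    # All writers funnel through this helper so updates are atomic.
    with _progress_lock:
        entry = smartstat_progress.setdefault(job_id, {})
        entry.update(fields, updated_at=time.time())

def prune_progress(max_age_seconds=3600):
    # Drop completed/failed jobs older than the retention window.
    cutoff = time.time() - max_age_seconds
    with _progress_lock:
        stale = [job for job, entry in smartstat_progress.items()
                 if entry.get('status') in ('completed', 'failed')
                 and entry.get('updated_at', 0) < cutoff]
        for job in stale:
            del smartstat_progress[job]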

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function smartstat_get_progress 78.0% similar

    Flask API endpoint that retrieves the progress status of a SmartStat analysis job by job_id, returning progress data and completed results if available.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function smartstat_get_history 74.2% similar

    Flask API endpoint that retrieves analysis history for a SmartStat session, with automatic session recovery from saved data if the session is not found in memory.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function analyze_data 72.8% similar

    Flask route handler that initiates an asynchronous data analysis process based on user query, creating a background thread to perform the analysis and returning an analysis ID for progress tracking.

    From: /tf/active/vicechatdev/full_smartstat/app.py
  • function smartstat_save_to_document 71.4% similar

    Flask route handler that saves SmartStat statistical analysis results back to a data section document, generating a final report with queries, results, and plots.

    From: /tf/active/vicechatdev/vice_ai/new_app.py
  • function smartstat_download_log 71.3% similar

    Flask API endpoint that generates and downloads an execution log file containing the analysis history and debug information for a SmartStat session.

    From: /tf/active/vicechatdev/vice_ai/new_app.py