Data Connectors & Privacy Activity Logging¶
Automated privacy-preserving data processing activity logging for Google Drive and other cloud storage platforms with built-in compliance monitoring and data subject rights management.
Key Capabilities
- Privacy-preserving activity logging across cloud storage platforms
- Compliance audit trails with GDPR Article 30 processing activity records
- Real-time monitoring with privacy risk assessment and alert notifications
- Data subject rights fulfillment with automated activity discovery and logging
Privacy-First Data Discovery Ecosystem¶
Enterprise-grade data processor platform providing comprehensive privacy compliance logging and audit trail management:
Google Drive Privacy Logging
Privacy-preserving Google Drive activity logging with automated compliance audit trails for data processing transparency
Key Capabilities:
- Automated file discovery activity logging (no file content stored)
- Privacy-preserving metadata collection for compliance auditing
- Change detection and monitoring for processing activity transparency
- Integration with Dxtra compliance system for audit trails
- Support for GDPR Article 30 processing activity requirements
Privacy-Preserving PII Detection
Privacy-first personal data detection using Microsoft Presidio analyzer with metadata-only logging for compliance audit trails
Detection Capabilities:
- High accuracy across multiple personal data categories
- Privacy-preserving activity logging (no actual PII content stored)
- Contextual analysis with confidence scoring for audit purposes
- Extensible pattern recognition for custom data types
- Built-in support for GDPR and privacy regulation audit requirements
E-commerce Platform Integration
Deep e-commerce integration with specialized connectors for Shopify, WooCommerce, Magento, and other leading platforms
E-commerce Capabilities:
- Complete customer journey and behavioral data processing activity logging
- Automated order history and payment data privacy compliance audit trails
- Product review and rating data management with GDPR processing activity records
- Marketing automation and email campaign privacy activity integration
- Multi-store and multi-currency privacy compliance processing activity management
Custom Integration Framework
Flexible privacy activity logging platform enabling compliance audit trail integration with any data source
Framework Benefits:
- Pre-built processing activity templates for rapid compliance integration
- SDK support for Python, Node.js, Java, and .NET
- Real-time activity logging and batch processing capabilities
- Built-in privacy compliance validation and audit trail generation
- Enterprise-grade security and access control for processing activities
Privacy Activity Architecture¶
flowchart TD
A[Data Processors] --> B[Dxtra Activity Loggers]
B --> C[Privacy Activity Detection]
C --> D[Processing Activity Logging]
D --> E[Compliance Audit Trails]
D --> F[Data Subject Rights Support]
subgraph "Data Processors"
G[Google Drive]
H[File Systems]
I[Shopify Store]
J[Third-party APIs]
end
subgraph "Privacy Logging"
K[Activity Detection]
L[Compliance Assessment]
M[Article 30 Records]
end
Privacy-Preserving PII Detection¶
Dxtra's privacy-first PII detection system creates processing activity logs across various file types without storing actual content:
Privacy-Preserving Approach
- No content storage: Only processing activity metadata is logged
- No PII content: Actual personal data is never stored or transmitted
- Audit trails only: Creates compliance records for data processing activities
- GDPR Article 30: Automatic processing activity record generation
Privacy-First Detection Engine¶
Compliance-focused processing activity logging with privacy-by-design principles:
Privacy-preserving document processing activity logging without storing actual document content or personal data.
Processing Activity Logging:
# Privacy-preserving activity logging (no content stored)
from typing import Optional

# PiiDetectionMetadata and DataProcessingActivity are project types not shown here
class PrivacyActivityClient:
    async def log_pii_scan_activity(
        self,
        data_controller_id: str,
        file_identifier: str,  # Hashed identifier, not filename
        pii_metadata: PiiDetectionMetadata  # Aggregate metadata only
    ) -> Optional[DataProcessingActivity]:
        """
        Log PII scanning activity in privacy-preserving way.
        Creates data_processing_activities record with:
        - Data controller ID (tenant)
        - Source ID (Google Drive = 2)
        - Type ID (PII Scan = 2)
        - Field IDs (mapped from entity types found)
        - No actual PII content or file content
        """
What Gets Logged (Privacy-Preserving):
Compliance audit trail generation for personal data processing activities without exposing sensitive information:
| Processing Activity Type | Field ID Mapping | Privacy Protection |
|---|---|---|
| Contact Information Processing | Field IDs 1, 2, 3 | No actual contact info stored |
| Financial Data Processing | Field IDs 4, 5, 6 | No financial data stored |
| Health Information Processing | Field IDs 7, 8, 9 | No health data stored |
| Government ID Processing | Field IDs 10, 11, 12 | No ID numbers stored |
Privacy-Preserving Implementation:
def _map_pii_types_to_field_ids(self, entity_types: List[str]) -> List[int]:
    """Map PII entity types to field IDs (privacy-preserving)."""
    field_mapping = {
        "EMAIL": 1,        # Email processing activity
        "PHONE": 2,        # Phone processing activity
        "SSN": 3,          # SSN processing activity
        "CREDIT_CARD": 4   # Financial processing activity
    }
    # Returns field IDs for processing activity logging only;
    # entity types without a mapping fall back to field ID 99
    return [field_mapping.get(entity_type, 99) for entity_type in entity_types]
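As a standalone illustration of the mapping above, the sketch below applies the same table outside the class. The constant name `UNMAPPED_FIELD_ID` and the de-duplication and sorting step are assumptions added for illustration; the source only shows the four entries and the fallback value of 99.

```python
from typing import Dict, List

# Entity-type to field-ID table copied from the snippet above.
FIELD_MAPPING: Dict[str, int] = {
    "EMAIL": 1,
    "PHONE": 2,
    "SSN": 3,
    "CREDIT_CARD": 4,
}
UNMAPPED_FIELD_ID = 99  # hypothetical name for the fallback value used above


def map_pii_types_to_field_ids(entity_types: List[str]) -> List[int]:
    """Return sorted, de-duplicated field IDs for the given entity types."""
    ids = {FIELD_MAPPING.get(t, UNMAPPED_FIELD_ID) for t in entity_types}
    return sorted(ids)


if __name__ == "__main__":
    # Two EMAIL hits collapse to one field ID; the unknown type maps to 99.
    print(map_pii_types_to_field_ids(["EMAIL", "EMAIL", "IBAN"]))
```

De-duplicating keeps the activity record at the level of "which kinds of data were processed" rather than hinting at how many matches a single file contained.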
Google Drive Privacy Logging¶
Privacy-first Google Drive integration that creates compliance audit trails without storing file content or personal data.
Privacy-Preserving File Discovery¶
Privacy-by-Design Setup
- AWS Secrets Manager for secure OAuth credential storage
- SSM Parameter Store for change detection tokens
- No file content or personal data storage
- Aggregate activity logging only
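The change detection token kept in SSM Parameter Store could be read and written roughly as follows. This is a sketch under assumptions: the parameter name is hypothetical, and the client is passed in so the example runs without AWS access. In a Lambda it would be `boto3.client("ssm")`, whose `get_parameter` and `put_parameter` calls are mirrored by the fake below.

```python
from typing import Any, Dict, Optional


def read_change_token(ssm_client: Any, name: str) -> Optional[str]:
    """Return the stored token, or None if the parameter does not exist yet."""
    try:
        response = ssm_client.get_parameter(Name=name)
    except ssm_client.exceptions.ParameterNotFound:
        return None
    return response["Parameter"]["Value"]


def write_change_token(ssm_client: Any, name: str, token: str) -> None:
    """Store the latest token, overwriting any previous value."""
    ssm_client.put_parameter(Name=name, Value=token, Type="String", Overwrite=True)


class FakeSsmClient:
    """In-memory stand-in for the boto3 SSM client (illustration only)."""

    class exceptions:
        class ParameterNotFound(Exception):
            pass

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_parameter(self, Name: str) -> Dict[str, Any]:
        if Name not in self._store:
            raise self.exceptions.ParameterNotFound(Name)
        return {"Parameter": {"Name": Name, "Value": self._store[Name]}}

    def put_parameter(self, Name: str, Value: str, Type: str, Overwrite: bool) -> Dict[str, Any]:
        self._store[Name] = Value
        return {}


if __name__ == "__main__":
    client = FakeSsmClient()
    name = "/dxtra/example-tenant/drive-change-token"  # hypothetical parameter name
    print(read_change_token(client, name))
    write_change_token(client, name, "12345")
    print(read_change_token(client, name))
```

Returning `None` for a missing parameter gives the caller a natural signal to run a full scan on first use.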
Privacy-Compliant Discovery Process:
Processing Activity Logging
Aggregate activity logging without storing file content or personal information
What Gets Logged:
- Number of files discovered (aggregate count)
- MIME types found (categories, not actual files)
- Processing activity timestamp
- Data controller association
What's NOT logged: File names, content, personal data
Privacy Protection:
- File discovery creates single processing activity record
- No individual file records stored
- No access to actual file content
- Compliance audit trail only
Secure Credential Management
Enterprise-grade security with OAuth 2.0 and AWS-native credential management
Security Features:
- AWS Secrets Manager for OAuth tokens
- SSM Parameter Store for change detection
- No credential exposure in logs or databases
- Automatic token refresh and rotation
- Comprehensive audit trails for access patterns
Compliance Integration:
- Automated GDPR Article 30 activity record generation
- Data processor registration in compliance system
- Processing activity audit trails
- No personal data collection or storage
Privacy-Compliant Monitoring
Audit trail monitoring with privacy risk assessment for compliance reporting
Monitoring Capabilities:
- Processing activity frequency tracking
- Data controller activity oversight
- Compliance audit trail generation
- Privacy risk assessment based on activity patterns
Analytics & Reporting:
- Processing activity compliance dashboards
- Automated regulatory compliance reports for activities
- Data processing inventory management
- Audit trail verification and validation
Privacy-First Implementation¶
Enterprise Google Drive setup focused on privacy compliance and audit trail generation.
Prerequisites:
# AWS infrastructure for secure credential management
# Dxtra enterprise subscription with compliance features
# Google Cloud Platform OAuth application
Step 1: Secure Credential Configuration
{
  "aws_secrets_manager": {
    "google_drive_credentials": {
      "client_id": "your-oauth-client-id",
      "client_secret": "your-oauth-client-secret",
      "refresh_token": "oauth-refresh-token"
    }
  },
  "privacy_settings": {
    "content_access": false,
    "metadata_only": true,
    "audit_trail_generation": true,
    "compliance_logging": "gdpr_article_30"
  }
}
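To show how the credential block above might be consumed at runtime, here is a minimal sketch that fetches the secret and checks for the three keys shown in the configuration. The secret ID and the helper names are assumptions. The client is injected so the example runs without AWS access; in a Lambda it would be `boto3.client("secretsmanager")`, whose `get_secret_value` call returns the payload under `SecretString`.

```python
import json
from typing import Any, Dict

# Key names taken from the google_drive_credentials block above.
REQUIRED_KEYS = ("client_id", "client_secret", "refresh_token")


def load_drive_credentials(secrets_client: Any, secret_id: str) -> Dict[str, str]:
    """Fetch the OAuth secret and return only the expected keys."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    missing = [key for key in REQUIRED_KEYS if key not in secret]
    if missing:
        # Report key names only; secret values never appear in the message.
        raise KeyError(f"secret {secret_id!r} is missing keys: {missing}")
    return {key: secret[key] for key in REQUIRED_KEYS}


class FakeSecretsClient:
    """In-memory stand-in for the boto3 Secrets Manager client (illustration only)."""

    def __init__(self, payload: Dict[str, str]) -> None:
        self._payload = payload

    def get_secret_value(self, SecretId: str) -> Dict[str, str]:
        return {"SecretString": json.dumps(self._payload)}


if __name__ == "__main__":
    fake = FakeSecretsClient(
        {"client_id": "id", "client_secret": "secret", "refresh_token": "token"}
    )
    creds = load_drive_credentials(fake, "example/google-drive")  # hypothetical secret ID
    print(sorted(creds))
```

Raising on missing keys at load time keeps a misconfigured tenant from failing later, in the middle of a discovery run.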
Step 2: Processing Activity Configuration
google_drive_processing_activity:
  data_processor:
    name: "Google Drive Discovery"
    source_id: 2
    type_id: 3  # File discovery activity
  privacy_protection:
    no_content_access: true
    no_personal_data_storage: true
    aggregate_logging_only: true
  compliance_settings:
    article_30_compliance: true
    audit_trail_generation: true
    data_controller_association: true
File discovery runs on AWS Lambda and creates audit trails without accessing or storing sensitive data.
Lambda Architecture:
# Google Drive Discovery Lambda (privacy-preserving)
import json

async def lambda_handler_async(event, context):
    """
    Privacy-preserving Google Drive file discovery.
    Creates processing activity audit trails without storing:
    - File content
    - File names
    - Personal data
    - Individual file records
    """
    data_controller_id = event.get("tenantId")
    # Assumption: the full-scan flag arrives on the event; the key name is illustrative
    full_scan_requested = bool(event.get("fullScan", False))
    # Discover files (metadata only for counting)
    files = discover_files_from_drive()  # Count only
    # Log privacy-preserving activity; stays False when no files were discovered
    activity_logged = False
    if files:
        activity_logged = await _log_file_discovery_activity(
            data_controller_id,
            files,  # Used for aggregate counting only
            full_scan_requested
        )
    # Return aggregate data for Step Functions
    return {
        "statusCode": 200,
        "body": json.dumps({
            "tenantId": data_controller_id,
            "files": files,  # For Step Functions processing only
            "privacy_activity_logged": activity_logged
        })
    }
Privacy-Preserving Activity Logging:
async def _log_file_discovery_activity(
    data_controller_id: str,
    files: List[DriveFile],
    full_scan: bool = False
) -> bool:
    """
    Log file discovery activity (privacy-preserving).
    Creates single data_processing_activities record with:
    - Aggregate file count (not individual files)
    - MIME type categories (not specific files)
    - Processing timestamp
    - Data controller association
    """
    discovery_metadata = FileDiscoveryMetadata(
        files_discovered_count=len(files),
        file_types_discovered=[...],  # Categories only
        full_scan=full_scan
    )
    # Single processing activity record (privacy-preserving)
    result = await client.log_file_discovery_activity(
        data_controller_id,
        discovery_metadata
    )
    # Assumes the client returns None when the record could not be created
    return result is not None
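The aggregation step above can be sketched as follows. The dataclass is a hypothetical stand-in for `FileDiscoveryMetadata`, whose definition is not shown in this document; only its three field names come from the snippet. Files are represented as plain dicts with a `mimeType` key, as in Google Drive API file resources, which is an assumption about how `DriveFile` is shaped.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FileDiscoveryMetadataSketch:
    """Hypothetical stand-in for FileDiscoveryMetadata (aggregate fields only)."""
    files_discovered_count: int
    file_types_discovered: List[str]
    full_scan: bool


def build_discovery_metadata(
    files: Iterable[Dict[str, str]], full_scan: bool = False
) -> FileDiscoveryMetadataSketch:
    """Reduce a file listing to a count and a sorted set of MIME types."""
    mime_types = set()
    count = 0
    for file in files:
        count += 1
        # Only mimeType is read; names and IDs never reach the summary.
        mime_types.add(file.get("mimeType", "unknown"))
    return FileDiscoveryMetadataSketch(count, sorted(mime_types), full_scan)


if __name__ == "__main__":
    listing = [
        {"name": "payroll.pdf", "mimeType": "application/pdf"},
        {"name": "notes.txt", "mimeType": "text/plain"},
        {"name": "contract.pdf", "mimeType": "application/pdf"},
    ]
    print(build_discovery_metadata(listing, full_scan=True))
```

Because the summary is built from a single pass that reads only `mimeType`, there is no code path by which a file name could end up in the logged record.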
Privacy-Compliant Workflow¶
Lambda-Based Processing¶
- Google Drive Discovery Lambda: Creates aggregate processing activity records
- PII Scanner Lambda: Logs PII detection activities (no content storage)
- Step Functions Orchestration: Coordinates privacy-compliant processing
- AWS-Native Security: Secrets Manager + SSM Parameter Store
Processing Activity Creation¶
When files are discovered, the system automatically:
1. Creates aggregate data processing activity record in Dxtra
2. Associates activity with appropriate data controller
3. Logs processing type and source (no file content)
4. Generates compliance audit trail for Article 30 requirements
PII Scanner Privacy Logging¶
Privacy-First Detection Methods¶
The PII scanner creates processing activity audit trails without storing actual PII content:
- Pattern detection for compliance activity logging
- Aggregate metadata collection (counts and types only)
- Privacy-preserving processing with no content storage
- Compliance audit trails for GDPR Article 30 requirements
Processing Activity Categories¶
| Activity Category | Field ID | Privacy Protection |
|---|---|---|
| Contact Data Processing | Field IDs 1-3 | No contact info stored |
| Financial Data Processing | Field IDs 4-6 | No financial data stored |
| Health Data Processing | Field IDs 7-9 | No health data stored |
| Identity Data Processing | Field IDs 10-12 | No identity data stored |
Privacy-Preserving Implementation¶
# PII Scanner Lambda (privacy-preserving)
import hashlib

async def log_pii_scan_activity(
    data_controller_id: str,
    file: DriveFile,
    entities: List[Entity],
    scan_duration_ms: Optional[int] = None,
) -> bool:
    """
    Log PII scanning activity in privacy-preserving way.
    Creates data_processing_activities record with only:
    - Entity types found (not locations or content)
    - Entity counts (not actual entities)
    - Confidence scores (aggregate)
    - Processing timestamp
    """
    # Privacy-preserving metadata (no actual PII)
    entity_types = sorted({entity.entity_type for entity in entities})
    pii_metadata = PiiDetectionMetadata(
        entity_types_found=entity_types,  # Types only
        entity_count=len(entities),  # Count only
        confidence_scores=[...],  # Aggregate scores
        scan_duration_ms=scan_duration_ms  # Performance only
    )
    # Stable SHA-256 digest; the built-in hash() is salted per process and returns an int
    file_identifier = hashlib.sha256(
        f"{file.id}:{file.name}".encode("utf-8")
    ).hexdigest()
    # Create processing activity record (privacy-preserving)
    result = await client.log_pii_scan_activity(
        data_controller_id=data_controller_id,
        file_identifier=file_identifier,  # Hash only
        pii_metadata=pii_metadata  # Aggregate metadata only
    )
    # The client method returns Optional[DataProcessingActivity]; None is treated as failure
    return result is not None
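The aggregate confidence scores are elided in the snippet above. One plausible way to summarize detections without keeping any matched text is sketched below. The `ScanSummary` class and the choice of minimum, mean, and maximum are assumptions, not the actual `PiiDetectionMetadata` schema.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ScanSummary:
    """Hypothetical aggregate view of one scan; holds no matched text or offsets."""
    entity_types_found: List[str]
    entity_count: int
    counts_by_type: Dict[str, int]
    confidence_min: float
    confidence_mean: float
    confidence_max: float


def summarize_entities(entities: Sequence[Tuple[str, float]]) -> ScanSummary:
    """Summarize (entity_type, score) pairs; the matched text is never passed in."""
    counts = Counter(entity_type for entity_type, _ in entities)
    scores = [score for _, score in entities]
    return ScanSummary(
        entity_types_found=sorted(counts),
        entity_count=len(entities),
        counts_by_type=dict(counts),
        confidence_min=min(scores, default=0.0),
        confidence_mean=mean(scores) if scores else 0.0,
        confidence_max=max(scores, default=0.0),
    )


if __name__ == "__main__":
    print(summarize_entities([("EMAIL", 0.9), ("EMAIL", 0.7), ("PHONE", 0.5)]))
```

Taking only the type and score as input means the summary function cannot leak detected values even by accident.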
Lambda Architecture¶
Event-Driven Processing:
flowchart LR
A[Step Functions] --> B[PII Scanner Lambda]
B --> C[Presidio Analysis]
C --> D[Privacy Activity Client]
D --> E[Hasura GraphQL]
E --> F[data_processing_activities]
Privacy-Preserving Flow:
1. File Processing: Lambda downloads and scans file temporarily
2. PII Detection: Presidio analyzes content (in memory only)
3. Activity Logging: Creates processing activity record (no content stored)
4. Cleanup: All file content and PII data discarded
5. Audit Trail: Only processing activity metadata retained
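The scan-in-memory, keep-only-metadata pattern can be illustrated with a small sketch. It deliberately uses two simple regular expressions as a stand-in detector and is not Presidio's API. The pattern set and function name are assumptions for illustration only.

```python
import re
from typing import Dict

# Deliberately simple stand-in patterns; a real analyzer is far more thorough.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scan_in_memory(content: bytes) -> Dict[str, int]:
    """Scan content held in memory and return only per-type match counts.

    Matched strings live only in local variables, so nothing but the
    counts survives once the function returns.
    """
    text = content.decode("utf-8", errors="replace")
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    # Keep only the entity types that were actually found.
    return {name: n for name, n in counts.items() if n}


if __name__ == "__main__":
    sample = b"Contact alice@example.com or call +1 415-555-0100."
    print(scan_in_memory(sample))
```

The returned dictionary is the only thing a caller would pass on to activity logging; the decoded text goes out of scope with the function.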
Custom Integration Framework¶
Privacy-Compliant Connector Architecture¶
Base Privacy Activity Connector¶
from privacy_activity_client import PrivacyActivityClient

class CustomPrivacyConnector:
    def __init__(self, config):
        self.client = PrivacyActivityClient(config.hasura_config)
        self.data_controller_id = config.data_controller_id

    async def log_processing_activity(self, activity_type, metadata):
        """Log processing activity without storing personal data"""
        return await self.client.log_processing_activity(
            data_controller_id=self.data_controller_id,
            activity_type=activity_type,
            privacy_metadata=metadata  # Aggregate metadata only
        )
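To try the connector pattern without a Hasura backend, the client can be swapped for an in-memory double. The sketch below is self-contained: `InMemoryActivityClient` and `SketchConnector` are hypothetical names, and only the keyword arguments of `log_processing_activity` mirror the snippet above.

```python
import asyncio
from typing import Any, Dict, List


class InMemoryActivityClient:
    """Hypothetical test double; records calls instead of contacting a backend."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    async def log_processing_activity(
        self, data_controller_id: str, activity_type: int, privacy_metadata: Dict[str, Any]
    ) -> Dict[str, Any]:
        record = {
            "data_controller_id": data_controller_id,
            "activity_type": activity_type,
            "privacy_metadata": privacy_metadata,
        }
        self.records.append(record)
        return record


class SketchConnector:
    """Same shape as the connector above, with the client injected."""

    def __init__(self, client: Any, data_controller_id: str) -> None:
        self.client = client
        self.data_controller_id = data_controller_id

    async def log_processing_activity(self, activity_type: int, metadata: Dict[str, Any]):
        return await self.client.log_processing_activity(
            data_controller_id=self.data_controller_id,
            activity_type=activity_type,
            privacy_metadata=metadata,  # aggregate metadata only
        )


async def main() -> List[Dict[str, Any]]:
    client = InMemoryActivityClient()
    connector = SketchConnector(client, "tenant-123")
    await connector.log_processing_activity(20, {"records_seen": 42})
    return client.records


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Injecting the client rather than constructing it inside `__init__` is a design choice of this sketch; it makes the connector testable without network access.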
Privacy-First Configuration¶
connector_config:
  name: "Custom Privacy Connector"
  version: "1.0.0"
  privacy_settings:
    content_access: false
    metadata_only: true
    processing_activity_logging: true
    compliance_mode: "gdpr_article_30"
  data_processor:
    source_id: 10  # Custom connector source ID
    type_ids: [20, 21, 22]  # Custom activity types
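A connector could refuse to start when its privacy settings are weaker than the ones shown above. The following sketch checks an already-parsed configuration dictionary. The function name and the exact rules are assumptions, and YAML parsing is left out so the example needs only the standard library.

```python
from typing import Any, Dict, List


def validate_privacy_settings(connector_config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the settings pass."""
    settings = connector_config.get("privacy_settings", {})
    problems: List[str] = []
    # Identity checks so that values like 0 or "false" are not accepted by accident.
    if settings.get("content_access") is not False:
        problems.append("content_access must be false")
    if settings.get("metadata_only") is not True:
        problems.append("metadata_only must be true")
    if settings.get("processing_activity_logging") is not True:
        problems.append("processing_activity_logging must be true")
    if not settings.get("compliance_mode"):
        problems.append("compliance_mode must be set")
    return problems


if __name__ == "__main__":
    config = {
        "name": "Custom Privacy Connector",
        "privacy_settings": {
            "content_access": False,
            "metadata_only": True,
            "processing_activity_logging": True,
            "compliance_mode": "gdpr_article_30",
        },
    }
    print(validate_privacy_settings(config))
```

Returning all problems at once, rather than raising on the first, makes a misconfigured file quicker to fix.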
Monitoring and Compliance¶
Processing Activity Dashboard¶
Real-Time Compliance Monitoring¶
{
  "processing_activities": {
    "google_drive_discovery": {
      "status": "active",
      "last_activity": "2025-08-07T10:30:00Z",
      "activities_logged": 15420,
      "data_controllers": 50,
      "compliance_status": "gdpr_compliant"
    },
    "pii_detection": {
      "status": "active",
      "activities_logged": 2847,
      "privacy_compliance": "full",
      "content_storage": "none"
    }
  }
}
Compliance Metrics¶
- Processing Activities Logged: Total audit trail entries
- Privacy Compliance: Full privacy-by-design implementation
- Content Storage: Zero content or personal data storage
- Audit Trail Coverage: Complete GDPR Article 30 compliance
API Reference¶
Processing Activity API¶
Lambda Functions (Internal)¶
# These are internal AWS Lambda functions, not public APIs
lambda_functions:
  google_drive_discovery:
    handler: "fetch_lambda.lambda_handler"
    privacy_mode: "audit_trail_only"
  pii_scanner:
    handler: "scanner_lambda.lambda_handler_sync"
    privacy_mode: "metadata_only"
GraphQL Integration¶
# Processing activities are logged via existing Hasura schema
mutation LogProcessingActivity($object: dataProcessingActivities_insert_input!) {
  insertDataProcessingActivity(object: $object) {
    id
    createdAt
    dataSubjectId
    sourceId
    typeId
    fieldIds
  }
}
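A request carrying this mutation could be assembled with the standard library as sketched below. Several details are assumptions: the endpoint URL, the use of the `x-hasura-admin-secret` header (a deployment might use JWT authentication instead), and the input field names, which are inferred from the fields the mutation returns. Whether `fieldIds` is accepted as a JSON list depends on the column type and Hasura version.

```python
import json
import urllib.request
from typing import Any, Dict

# Mutation text from above, trimmed to two return fields for brevity.
MUTATION = """
mutation LogProcessingActivity($object: dataProcessingActivities_insert_input!) {
  insertDataProcessingActivity(object: $object) { id createdAt }
}
"""


def build_request(
    endpoint: str, admin_secret: str, activity: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the mutation."""
    body = json.dumps(
        {"query": MUTATION, "variables": {"object": activity}}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": admin_secret,  # assumption: admin-secret auth
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(
        "https://hasura.example.com/v1/graphql",  # hypothetical endpoint
        "example-secret",
        {"sourceId": 2, "typeId": 2, "fieldIds": [1, 2]},  # assumed input fields
    )
    print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON response. It is left out here so the sketch runs without network access.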
Best Practices¶
Privacy Implementation Guidelines¶
- Privacy by Design:
  - Never store actual file content or personal data
  - Log only aggregate processing activities
- Compliance First:
  - Focus on GDPR Article 30 audit trail requirements
  - Generate processing activity records for transparency
- Security:
  - Use AWS-native credential management
  - Implement proper access controls and audit logging
Data Protection Principles¶
- Data Minimization:
  - Collect only processing activity metadata necessary for compliance
- Purpose Limitation:
  - Use data only for compliance audit trail generation
- Storage Limitation:
  - No personal data or content storage
- Privacy by Default:
  - All connectors implement privacy-preserving design
Support and Resources¶
Getting Help¶
- Dxtra Support: connectors-support@dxtra.io
- Privacy Documentation: Focus on compliance and audit trail generation
- Developer Integration: Privacy-first connector development guides
Last updated: 2025-08-12
Privacy-preserving implementation with GDPR Article 30 compliance
All connectors implement privacy-by-design with zero content storage