SPECIFICATION
SYSTEM
- A system for distributed parallel processing-based data collection, storage, processing, and management
- A system for distributed parallel processing-based advanced analysis using R functions
- A system that supports distributed parallel processing for data acquisition, storage, processing, and analysis, and supports Hadoop for data storage
- Supports all Linux-based x86 servers, with no hardware dependencies
DATA COLLECTION
- Remote log file collection function
- Structured data collection (Sqoop, etc.)
- Unstructured data streaming collection function (Flume, etc.)
- Supports various types of unstructured data
- Irregular multi-line data collection function
- Parallel processing collection function for bulk loading
- File data collection function over SSL / TCP
- Import to HDFS, import to Hive, import to HBase
- Preview function that lets users check the data before it is collected
- GUI-based import / export of structured data
- Agent and agentless data collection function
- Data collection function with encryption of data in transit (SSL)
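
The irregular multi-line collection item above can be illustrated with a small sketch. It is not the product's collector, only a minimal example of grouping continuation lines (such as stack traces) into one logical record. The assumption that every new record starts with a date like "2024-01-15", and the names RECORD_START and group_multiline, are hypothetical.

```python
import re

# Assumption: a new record starts with an ISO-style date such as "2024-01-15".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def group_multiline(lines):
    """Yield one string per logical record, joining continuation lines."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        # A line matching the start pattern closes the record collected so far.
        if RECORD_START.match(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    # Flush the last record once the input is exhausted.
    if buffer:
        yield "\n".join(buffer)


sample = [
    "2024-01-15 10:00:01 ERROR job failed",
    "Traceback (most recent call last):",
    '  File "job.py", line 3, in <module>',
    "2024-01-15 10:00:02 INFO retrying",
]
records = list(group_multiline(sample))
```

Here the three lines of the error and its traceback become one record and the INFO line becomes a second one. A real collector would also need a timeout to flush a record that is never followed by a new start line.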
DATA STORAGE & PROCESSING
- Parallel distributed processing function for the big data store
- Storage features that integrate NoSQL and HDFS
- Ability to process NoSQL and HDFS data through Hive (using SQL)
- Database-style processing function for repository data
- Storage directory navigation
- Supports storage-level data encryption / decryption (on storage and query)
- GUI-based access control (ACL) processing and storage
- Process design features for ETL processing
- R linkage processing function for ETL processing
- Data consistency check function in ETL processing
- Mapping function for collection target data (ETL)
- Collection data processing / filtering function (ETL)
- RDBMS interworking function for Oracle, MS-SQL, MySQL, etc.
- Provides a separate GUI tool (Workbench) for SQL-based processing
- Unified SQL engine for SQL-based data processing
- Multi-SQL processing in GUI
- Provides extended UDFs for various SQL processing
- SQL-based access control for databases and tables (GRANT processing)
- Provides a GUI environment for SQL-based DB / table / view creation
- Ability to store big data processing results in HDFS, NoSQL, File, RDBMS, etc.
- Scheduling function for distributed parallel processing jobs
- GUI-based workflow for data management and batch processing
- Branch and dependency functions in workflows
- Provides an extension interface to external applications / servers in workflows
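
The ETL items above (target data mapping, filtering, and consistency checking) can be sketched as follows. This is only an illustration under stated assumptions, not the product's ETL engine: the column names, COLUMN_MAP, and the rule that "amount" must be a non-negative number are all hypothetical.

```python
# Hypothetical mapping from source column names to target column names.
COLUMN_MAP = {"usr": "user_id", "amt": "amount"}


def map_columns(row):
    """Rename mapped columns and drop (filter out) everything else."""
    return {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}


def is_consistent(row):
    """Check that all target columns exist and amount is a non-negative number."""
    if set(row) != set(COLUMN_MAP.values()):
        return False
    try:
        return float(row["amount"]) >= 0
    except (TypeError, ValueError):
        return False


def run_etl(rows):
    """Map every row, then split the results into accepted and rejected lists."""
    accepted, rejected = [], []
    for raw in rows:
        mapped = map_columns(raw)
        (accepted if is_consistent(mapped) else rejected).append(mapped)
    return accepted, rejected


source = [
    {"usr": "u1", "amt": "12.5", "debug": "x"},  # extra column is filtered out
    {"usr": "u2", "amt": "oops"},                # amount is not numeric
    {"usr": "u3"},                               # amount is missing
]
accepted, rejected = run_etl(source)
```

Keeping rejected rows in their own list, rather than silently dropping them, makes it possible to report or reprocess records that fail the consistency check.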
ADVANCED DATA ANALYSIS
- Structured and unstructured data analysis
- Provides distributed parallel processing for analyzing hundreds of millions of data records
- SQL-based parallel distributed processing function by linking R to Hive
- Provides separate advanced analysis functions when executing parallel distributed processing through the R-to-Hive link
- Provides a separate working area for analysts
- Provides compatibility with various analysis / BI tools, including R, so that analysis results can be upgraded and extended
BIG DATA PLATFORM H/W SPECIFICATION
- Management node & collection node (2 EA)
- Data node (3 EA)
- Analysis node (1 EA)
- Switch (2 EA)
- Rack (1 EA)
- System installation and optimization support