  • a

  • Algorithm
    An algorithm, in the context of data, is a mathematical formula or statistical process used to perform an analysis of data. Data science is replete with algorithms in machine learning, artificial intelligence, data mining, and all data-related problems. Some algorithms are generic and in(...)
  • b

  • Batch Processing
    Batch data processing is an efficient way of processing high volumes of data where a group of transactions is collected over a period of time.
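    For illustration, a minimal batch job sketch in Python (the transaction data is made up): records accumulate over a period and are then processed together in one pass.

    ```python
    # The day's accumulated transactions, processed as one batch.
    transactions = [("t1", 10.00), ("t2", -4.50), ("t3", 7.25)]

    def run_batch(batch):
        # a single pass over the whole group of transactions
        return sum(amount for _, amount in batch)

    print(run_batch(transactions))  # 12.75 net for the entire batch
    ```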
  • c

  • Cassandra
    Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple(...)
  • Cloud Computing
    Storing, accessing, and processing data and/or programs on remote servers that are accessible from anywhere on the internet, as opposed to using local computers (whether desktops or servers located on-premise).
  • Cluster Computing
    When computing is done by two or more loosely or tightly coupled computers or systems (called nodes) that work together to perform tasks so that, in many respects, they can be viewed as a single system.
  • d

  • Dark Data
    Data that is gathered and processed by enterprises and organizations but not used for any meaningful purpose; hence it is 'dark' and may never be analyzed.
  • Data Analytics
    Data analytics often involves studying historical data to research potential trends, to analyze the effects of certain decisions or events, or to evaluate the performance of a given tool or scenario. This can involve predicting and prescribing future actions. The goal of data analytics is(...)
  • Data Lake
    A data lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed.
  • Data Manager
    A data manager is someone who helps collect, analyze, and apply data toward a business goal such as increasing revenue or reducing costs. They have a deep understanding of the data with respect to its sources, various attributes, applicability to business functions, and how it can be analyzed.(...)
  • Data Mining
    Data mining is the process of sifting through large data sets to identify and describe patterns and to discover and establish relationships, with the intent to predict future trends based on those patterns and relationships.
  • Data Scientist
    A data scientist is a person who can work with massive amounts of data (structured and unstructured) and use skills in math, statistics, and programming to clean, massage, and organize the data, and then tell its story with visualizations.
  • Data Warehouse
    A large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
  • Descriptive Analytics
    Descriptive analytics 'describes' historical data by identifying patterns and trends to yield useful information and possibly prepare the data for further analysis. Descriptive analytics is the most fundamental form of data analytics.
  • Diagnostic Analytics
    Diagnostic analytics is a form of data analytics which examines data or content to answer the question “Why did it happen?”. It involves techniques such as drill-down, data discovery, data mining and correlations.
  • Distributed File System
    A distributed file system is a data storage system meant to store large volumes of data across multiple storage devices, helping decrease the cost and complexity of storing large amounts of data.
  • e

  • ETL
    ETL, also known as 'Extract, Transform, Load', is the process of 'extracting' raw data, 'transforming' it by cleaning and enriching it to make it fit for use, and 'loading' it into the appropriate repository for the system's use.
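    A minimal ETL sketch in Python (the file name raw_sales.csv and the id/amount fields are hypothetical, chosen for illustration):

    ```python
    import csv
    import sqlite3

    def extract(path):
        # Extract: read raw rows from a CSV source
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transform: clean the data to make it 'fit for use' —
        # drop rows missing an id, normalize amounts to floats
        cleaned = []
        for row in rows:
            if not row.get("id"):
                continue
            cleaned.append((row["id"].strip(), float(row.get("amount", 0) or 0)))
        return cleaned

    def load(records, db_path="warehouse.db"):
        # Load: write the cleaned records into the target repository
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS sales (id TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
        conn.commit()
        conn.close()

    load(transform(extract("raw_sales.csv")))
    ```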
  • f

  • Fuzzy Logic
    Fuzzy logic is an approach to computing based on "degrees of truth", where the truth values of variables vary between 0 and 1 rather than the usual "true or false" (1 or 0) of Boolean logic. It originated in work on natural language processing and is meant to address the concept of partial truth.
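    A minimal sketch in Python, using the common min/max operators for fuzzy conjunction and disjunction (the truth values are made up):

    ```python
    def fuzzy_and(a, b):
        return min(a, b)   # a common t-norm for fuzzy AND

    def fuzzy_or(a, b):
        return max(a, b)   # the matching t-conorm for fuzzy OR

    def fuzzy_not(a):
        return 1.0 - a     # standard fuzzy negation

    # "The room is warm" might be 0.7 true; "the room is humid" 0.4 true.
    warm, humid = 0.7, 0.4
    print(fuzzy_and(warm, humid))  # 0.4 -> "warm AND humid" is partially true
    print(fuzzy_or(warm, humid))   # 0.7
    print(fuzzy_not(warm))         # ~0.3
    ```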
  • g

  • Gamification
    Gamification in big data is the use of game concepts (scoring points, competing with others, etc.) to collect or analyze data, or generally to motivate users to participate and engage. Gamification takes the data-driven techniques that game designers use to engage players, and(...)
  • Graph databases
    Graph databases use concepts such as nodes and edges, representing people or businesses and their interrelationships, to mine data from social media. A key concept of the system is the graph (or edge or relationship), which directly relates data items in the store.
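    An illustrative sketch of the node/edge model in plain Python (not a real graph database API; the names and relationships are made up):

    ```python
    # People are nodes; labeled, directed edges are the relationships.
    edges = [
        ("alice", "follows", "bob"),
        ("bob", "follows", "carol"),
        ("alice", "works_with", "carol"),
    ]

    def neighbors(node, label):
        # traverse outgoing edges of one relationship type
        return [dst for src, rel, dst in edges if src == node and rel == label]

    print(neighbors("alice", "follows"))  # ['bob']
    ```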
  • h

  • Hadoop
    Hadoop (with its cute elephant logo) is an open-source, Java-based software framework that consists of what is called the Hadoop Distributed File System (HDFS) and allows for storage, retrieval, and analysis of very large data sets using a distributed computing environment. It is part of the(...)
  • i

  • In-Memory Computing
    In-memory computing is a technique for moving working datasets entirely into a cluster's collective memory, avoiding writing intermediate calculations to disk. Apache Spark is one example, and this method is considered to be faster.
  • IoT
    IoT, also known as the Internet of Things, is the interconnection of computing devices embedded in everyday objects (sensors, wearables, cars, fridges, even people or animals, etc.) via the internet, enabling them to send and receive data.
  • j

  • Java
    Java is a widely used programming language expressly designed for use in the distributed environment of the internet. Many open source systems and programming environments are built on Java.
  • k

  • Kafka
    Kafka, or Apache Kafka, is used for building real-time data pipelines and streaming apps. Kafka enables storing, managing, and processing streams of data in a fault-tolerant way and is supposedly 'wicked fast'. Given that social network environments deal with streams of data, Kafka is currently(...)
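    A hedged sketch using the third-party kafka-python client (assumes kafka-python is installed, a broker is reachable at localhost:9092, and the 'clicks' topic name is made up):

    ```python
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: append an event to the stream
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("clicks", b"user42 clicked buy")
    producer.flush()

    # Consumer: read the stream from the beginning
    consumer = KafkaConsumer("clicks", bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest")
    for message in consumer:
        print(message.value)  # b'user42 clicked buy'
        break
    ```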
  • l

  • Load Balancing
    Load balancing refers to distributing a workload across multiple computers or servers in order to achieve optimal results and utilization of the system.
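    An illustrative round-robin sketch in Python (the server names are made up; real balancers also track health and load):

    ```python
    from itertools import cycle

    # Hand requests to servers in strict rotation.
    servers = cycle(["server-a", "server-b", "server-c"])

    def route(request_id):
        return next(servers)  # next server in the rotation

    for i in range(5):
        print(i, route(i))  # 0 server-a, 1 server-b, 2 server-c, 3 server-a, ...
    ```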
  • m

  • Machine Learning
    Machine learning is a method of designing systems that can learn, adjust, and improve based on the data fed to them. Using predictive and statistical algorithms, these systems learn from data and continually zero in on "correct" behavior and insights.
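    A minimal sketch, assuming scikit-learn is installed: a linear model "zeroes in" on the relationship y = 2x from a handful of examples.

    ```python
    from sklearn.linear_model import LinearRegression

    X = [[1], [2], [3], [4]]   # training inputs fed to the system
    y = [2, 4, 6, 8]           # observed outputs

    model = LinearRegression().fit(X, y)   # learn from the data
    print(model.predict([[5]]))            # approximately [10.]
    ```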
  • MapReduce
    MapReduce is a programming model with Map and Reduce being two separate steps. The model first breaks the big dataset up into pieces called tuples so it can be distributed across different computers in different locations, which is essentially the Map part. Then the(...)
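    An illustrative word-count sketch of the Map and Reduce steps in plain Python (a real framework would distribute this work across machines):

    ```python
    from collections import defaultdict

    documents = ["big data is big", "data moves fast"]

    # Map: break the dataset into (key, value) tuples
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle: group the tuples by key
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # Reduce: aggregate each group into a single result
    reduced = {word: sum(counts) for word, counts in groups.items()}
    print(reduced)  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
    ```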
  • n

  • NoSQL
    NoSQL, which stands for 'Not Only SQL' (Structured Query Language), refers to database management systems that are designed to handle large volumes of data that do not have a fixed structure, or what's technically called a 'schema' (as relational databases have). NoSQL databases are often well-suited(...)
  • o

  • Object database
    An object database (also object-oriented database management system, OODBMS) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented.
  • p

  • Predictive Analytics
    Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. It is not so much about 'predicting the future' as 'forecasting with probabilities' what might happen.
  • Prescriptive Analytics
    Prescriptive analytics is about 'prescribing' a number of possible actions for a given situation and guiding users toward a solution. Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on possible outcomes before the decisions are actually made.
  • q

  • Query
    A query is a formal request for information from a database. Written in Structured Query Language (SQL), queries can also add, update, or delete data.
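    A small runnable example using Python's built-in sqlite3 module (the table and rows are made up):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.execute("INSERT INTO users VALUES ('Ada', 36)")           # add data
    conn.execute("UPDATE users SET age = 37 WHERE name = 'Ada'")   # update data
    print(conn.execute("SELECT name, age FROM users").fetchall())  # [('Ada', 37)]
    conn.close()
    ```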
  • r

  • R
    R is a programming language for statistical computing and acts as an alternative to traditional statistical packages such as SPSS, SAS, and Stata. It is an extensible, open-source language and computing environment for Windows, Macintosh, UNIX, and Linux platforms. Such software allows for the(...)
  • s

  • Spark
    Apache Spark is a fast, in-memory data processing engine to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
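    A minimal sketch, assuming pyspark is installed and a local Spark runtime is available; the working set stays in memory across the stages of a word count:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()

    lines = spark.sparkContext.parallelize(["big data", "big ideas"])
    counts = (lines.flatMap(lambda line: line.split())   # split lines into words
                   .map(lambda word: (word, 1))          # tuple per occurrence
                   .reduceByKey(lambda a, b: a + b))     # sum counts per word

    print(counts.collect())  # e.g. [('big', 2), ('data', 1), ('ideas', 1)]
    spark.stop()
    ```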
  • Stream Processing
    Stream processing is designed to act on real-time and streaming data with “continuous” queries. Combined with streaming analytics i.e. the ability to continuously calculate mathematical or statistical analytics on the fly within the stream, stream processing solutions are designed to handle(...)
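    A toy sketch of a "continuous" query in Python: a running average is recomputed on the fly as each event arrives, rather than after the data is stored (the readings are made up):

    ```python
    def running_average(stream):
        total, count = 0.0, 0
        for value in stream:
            total += value
            count += 1
            yield total / count   # emit an updated result per event

    sensor_readings = iter([20.0, 22.0, 21.0])  # stand-in for a live stream
    for avg in running_average(sensor_readings):
        print(avg)  # 20.0, then 21.0, then 21.0
    ```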
  • Structured Data
    Structured data is basically anything that can be put into relational databases and organized in such a way that it relates to other data via tables.
  • t

  • Terabyte
    A relatively large unit of digital data, one Terabyte (TB) equals 1,000 Gigabytes. It has been estimated that 10 Terabytes could hold the entire printed collection of the U.S. Library of Congress, while a single TB could hold 1,000 copies of the Encyclopaedia Britannica.
  • u

  • Unstructured Data
    Unstructured data is data that is not contained in a database or some other type of data structure: email messages, social media posts, recorded human speech, etc.
  • v

  • Visualization
    Visualization is any technique for creating images, diagrams, or animations to communicate a message. Data visualization has become very important for telling the story of a data analysis, and it has become an important skill for data scientists.
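    A minimal sketch, assuming matplotlib is installed (the revenue figures are made up):

    ```python
    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr"]
    revenue = [10, 14, 13, 18]

    plt.plot(months, revenue, marker="o")
    plt.title("Monthly revenue")   # a chart often tells the story faster than a table
    plt.ylabel("Revenue ($M)")
    plt.savefig("revenue.png")
    ```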
  • w

  • Weather Data
    Weather data is an open public data source that can provide information about weather around the world, and it can be combined with other sources and analyzed to obtain a lot of insights.
  • x

  • XML Database
    XML databases allow data to be stored in XML (eXtensible Markup Language) format. XML databases are often linked to document-oriented databases. The data stored in an XML database can be queried, exported, and serialized into any format needed.
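    A small example of querying XML with Python's standard library (the document itself is made up):

    ```python
    import xml.etree.ElementTree as ET

    doc = ET.fromstring("""
    <books>
      <book year="2012"><title>Big Data</title></book>
      <book year="2015"><title>Data Science</title></book>
    </books>
    """)

    # Query: titles of books published after 2013
    for book in doc.findall("book"):
        if int(book.get("year")) > 2013:
            print(book.find("title").text)  # Data Science
    ```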
  • y

  • Yottabyte
    A Yottabyte is a measure of storage capacity equal to 2 to the 80th power bytes or, in decimal, approximately 1,000 zettabytes, a trillion terabytes (TB) or a million trillion megabytes. Approximately 1,024 yottabytes make up a brontobyte.
  • z

  • Zettabyte
    A zettabyte is a measure of storage capacity and is 2 to the 70th power bytes, also expressed as 10²¹ or 1 sextillion bytes. One zettabyte is approximately equal to a thousand exabytes or a billion terabytes.
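    A quick arithmetic check of the two conventions in Python:

    ```python
    binary_zb = 2 ** 70    # the binary definition
    decimal_zb = 10 ** 21  # the decimal definition (1 sextillion)

    print(binary_zb)               # 1180591620717411303424
    print(decimal_zb)              # 1000000000000000000000
    print(binary_zb / decimal_zb)  # ~1.18, so the two differ by about 18%
    ```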
  • Zookeeper
    ZooKeeper is a software project of the Apache Software Foundation: a service that provides centralized configuration management, synchronization, and a naming registry for large distributed systems. ZooKeeper is a subproject of Hadoop.