Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm.

Slides:



Advertisements
Similar presentations
Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.
Advertisements

CloudMoV: Cloud-based Mobile Social TV
Optimizing Cloud Resources for Delivering IPTV Services Through Virtualization.
Energy-Optimum Throughput and Carrier Sensing Rate in CSMA-Based Wireless Networks.
Annotating Search Results from Web Databases. Abstract An increasing number of databases have become web accessible through HTML form-based search interfaces.
Abstract Load balancing in the cloud computing environment has an important impact on the performance. Good load balancing makes cloud computing more.
Back-Pressure-Based Packet-by-Packet Adaptive Routing in Communication Networks.
Personalized QoS-Aware Web Service Recommendation and Visualization.
Discovering Emerging Topics in Social Streams via Link Anomaly Detection.
Crowdsourcing Predictors of Behavioral Outcomes. Abstract Generating models from large data sets—and deter¬mining which subsets of data to mine—is becoming.
Secure Encounter-based Mobile Social Networks: Requirements, Designs, and Tradeoffs.
Minimum Cost Blocking Problem in Multi-path Wireless Routing Protocols.
Cross-Domain Privacy-Preserving Cooperative Firewall Optimization.
A Survey of Mobile Cloud Computing Application Models
NICE :Network Intrusion Detection and Countermeasure Selection in Virtual Network Systems.
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.
Fast Nearest Neighbor Search with Keywords. Abstract Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions.
Understanding the External Links of Video Sharing Sites: Measurement and Analysis.
Security Evaluation of Pattern Classifiers under Attack.
A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data.
BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform.
Privacy Preserving Data Sharing With Anonymous ID Assignment
m-Privacy for Collaborative Data Publishing
A Fast Clustering-Based Feature Subset Selection Algorithm for High- Dimensional Data.
Optimal Client-Server Assignment for Internet Distributed Systems.
Protecting Sensitive Labels in Social Network Data Anonymization.
Identity-Based Secure Distributed Data Storage Schemes.
Incentive Compatible Privacy-Preserving Data Analysis.
Enabling Dynamic Data and Indirect Mutual Trust for Cloud Computing Storage Systems.
LARS*: An Efficient and Scalable Location-Aware Recommender System.
Cooperative Caching for Efficient Data Access in Disruption Tolerant Networks.
Anonymization of Centralized and Distributed Social Networks by Sequential Clustering.
Accuracy-Constrained Privacy-Preserving Access Control Mechanism for Relational Data.
Content Sharing over Smartphone-Based Delay- Tolerant Networks.
A System for Denial-of- Service Attack Detection Based on Multivariate Correlation Analysis.
Securing Class Initialization in Java-like Languages.
Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds.
Anomaly Detection via Online Over-Sampling Principal Component Analysis.
A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs.
A Generalized Flow-Based Method for Analysis of Implicit Relationships on Wikipedia.
Keyword Query Routing.
Document Clustering for Forensic Analysis: An Approach for Improving Computer Inspection.
Abstract With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial.
Facilitating Document Annotation using Content and Querying Value.
Traffic Pattern-Based Content Leakage Detection for Trusted Content Delivery Networks.
Two tales of privacy in online social networks. Abstract Privacy is one of the friction points that emerges when communications get mediated in Online.
Participatory Privacy: Enabling Privacy in Participatory Sensing
Preventing Private Information Inference Attacks on Social Networks.
Scalable Keyword Search on Large RDF Data. Abstract Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely.
Abstract We propose two novel energy-aware routing algorithms for wireless ad hoc networks, called reliable minimum energy cost routing (RMECR) and reliable.
Supporting Privacy Protection in Personalized Web Search.
Twitsper: Tweeting Privately. Abstract Although online social networks provide some form of privacy controls to protect a user's shared content from other.
A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud.
Multiparty Access Control for Online Social Networks : Model and Mechanisms.
A New Algorithm for Inferring User Search Goals with Feedback Sessions.
Data Mining with Big Data. Abstract Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development.
Harnessing the Cloud for Securely Outsourcing Large- Scale Systems of Linear Equations.
Securing Broker-Less Publish/Subscribe Systems Using Identity-Based Encryption.
Security Analysis of a Privacy-Preserving Decentralized Key-Policy Attribute-Based Encryption Scheme.
Dealing With Concept Drifts in Process Mining. Abstract Although most business processes change over time, contemporary process mining techniques tend.
Privacy-Enhanced Web Service Composition. Abstract Data as a Service (DaaS) builds on service-oriented technologies to enable fast access to data resources.
Privacy-Preserving and Content-Protecting Location Based Queries.
Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud.
Whole Test Suite Generation. Abstract Not all bugs lead to program crashes, and not always is there a formal specification to check the correctness of.
Distributed Processing of Probabilistic Top-k Queries in Wireless Sensor Networks.
Facilitating Document Annotation Using Content and Querying Value.
Fast Transmission to Remote Cooperative Groups: A New Key Management Paradigm.
Dynamic Query Forms for Database Queries. Abstract Modern scientific databases and web databases maintain large and heterogeneous data. These real-world.
Spatial Approximate String Search. Abstract This work deals with the approximate string search in large spatial databases. Specifically, we investigate.
Presentation transcript:

Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm

Abstract In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm that operates on relational input data; i.e., data in the form of a square matrix of pairwise similarities between data objects.

Abstract con… The algorithm uses a graph representation of the data, and operates in an Expectation- Maximization framework in which the graph centrality of an object in the graph is interpreted as a likelihood. Results of applying the algorithm to sentence clustering tasks demonstrate that the algorithm is capable of identifying overlapping clusters of semantically related sentences, and that it is therefore of potential use in a variety of text mining tasks. We also include results of applying the algorithm to benchmark data sets in several other domains.

Existing system Sentence clustering plays an important role in many text¬processing activities. For example, various authors have argued that incorporating sentence clustering into extrac¬tive multidocument summarization helps avoid problems of content overlap, leading to better coverage [1], [2], [3], [4]. However, sentence clustering can also be used within more general text mining tasks. For example, consider web mining [5], where the specific objective might be to discover some novel information from a set of documents initially retrieved in response to some query. By clustering the sentences of those documents we would intuitively expect at least one of the clusters to be closely related to the concepts described by the query terms; however, other clusters may contain information pertaining to the query in some way hitherto unknown to us, and in such a case we would have successfully mined new information.

Architecture Diagram

System specification HARDWARE REQUIREMENTS Processor : intel Pentium IV Ram : 512 MB Hard Disk : 80 GB HDD SOFTWARE REQUIREMENTS Operating System : windows XP / Windows 7 FrontEnd : Java BackEnd : MySQL 5

CONCLUSION The FRECCA algorithm was motivated by our interest in fuzzy clustering of sentence-level text, and the need for an algorithm which can accomplish this task based on relational input data. The results we have presented show that the algorithm is able to achieve superior performance to bench¬mark Spectral Clustering and k-Medoids algorithms when externally evaluated in hard clustering mode on a challen¬ging data set of famous quotations, and applying the algorithm to a recent news article has demonstrated that the algorithm is capable of identifying overlapping clusters of semantically related sentences. Comparisons with the ARCA algorithm on each of these data sets suggest that FRECCA is capable of identifying softer clusters than ARCA, without sacrificing performance as evaluated by external measures. Although motivated by our interest in text clustering, FRECCA is a generic fuzzy clustering algorithm that can in principle be applied to any relational clustering problem, and application to several nonsentence data sets has shown its performance to be comparable to Spectral Clustering and k-Medoid benchmarks.

THANK YOU