Solr的功能介绍 收藏 评论
2013年08月22日



SolrTM Features

Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.

    Advanced Full-Text Search Capabilities
    Optimized for High Volume Web Traffic
    Standards Based Open Interfaces - XML, JSON and HTTP
    Comprehensive HTML Administration Interfaces
    Server statistics exposed over JMX for monitoring
    Linearly scalable, auto index replication, auto failover and recovery
    Near Real-time indexing
    Flexible and Adaptable with XML configuration
    Extensible Plugin Architecture

Solr Uses the LuceneTM Search Library and Extends it!

    A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
    Powerful Extensions to the Lucene Query Language
    Faceted Search and Filtering
    Geospatial Search with support for multiple points per document and geo polygons
    Advanced, Configurable Text Analysis
    Highly Configurable and User Extensible Caching
    Performance Optimizations
    External Configuration via XML
    An AJAX based administration interface
    Monitorable Logging
    Fast near real-time incremental indexing and index replication
    Highly Scalable Distributed search with sharded index across multiple hosts
    JSON, XML, CSV/delimited-text, and binary update formats
    Easy ways to pull in data from databases and XML files from local disk and HTTP sources
    Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika
    Apache UIMA integration for configurable metadata extraction
    Multiple search indices

Detailed Features
Schema

    Defines the field types and fields of documents
    Can drive more intelligent processing
    Declarative Lucene Analyzer specification
    Dynamic Fields enables on-the-fly addition of new fields
    CopyField functionality allows indexing a single field multiple ways, or combining multiple fields into a single searchable field
    Explicit types eliminates the need for guessing types of fields
    External file-based configuration of stopword lists, synonym lists, and protected word lists
    Many additional text analysis components including word splitting, regex and sounds-like filters
    Pluggable similarity model per field

Query

    HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby, PHP, Velocity, CSV, binary)
    Sort by any number of fields, and by complex functions of numeric fields
    Advanced DisMax query parser for high relevancy results from user-entered queries
    Highlighted context snippets
    Faceted Searching based on unique field values, explicit queries, date ranges, numeric ranges or pivot
    Multi-Select Faceting by tagging and selectively excluding filters
    Spelling suggestions for user queries
    More Like This suggestions for given document
    Function Query - influence the score by user specified complex functions of numeric fields or query relevancy scores.
    Range filter over Function Query results
    Date Math - specify dates relative to "NOW" in queries and updates
    Dynamic search results clustering using Carrot2
    Numeric field statistics such as min, max, average, standard deviation
    Combine queries derived from different syntaxes
    Auto-suggest functionality for completing user queries
    Allow configuration of top results for a query, overriding normal scoring and sorting
    Simple join capability between two document types
    Performance Optimizations

Core

    Dynamically create and delete document collections without restarting
    Pluggable query handlers and extensible XML data format
    Pluggable user functions for Function Query
    Customizable component based request handler with distributed search support
    Document uniqueness enforcement based on unique key field
    Duplicate document detection, including fuzzy near duplicates
    Custom index processing chains, allowing document manipulation before indexing
    User configurable commands triggered on index changes
    Ability to control where docs with the sort field missing will be placed
    "Luke" request handler for corpus information

Caching

    Configurable Query Result, Filter, and Document cache instances
    Pluggable Cache implementations, including a lock free, high concurrency implementation
    Cache warming in background
    When a new searcher is opened, configurable searches are run against it in order to warm it up to avoid slow first hits. During warming, the current searcher handles live requests.
    Autowarming in background
    The most recently accessed items in the caches of the current searcher are re-populated in the new searcher, enabling high cache hit rates across index/searcher changes.
    Fast/small filter implementation
    User level caching with autowarming support

SolrCloud

    Centralized Apache ZooKeeper based configuration
    Automated distributed indexing/sharding - send documents to any node and it will be forwarded to correct shard
    Near Real-Time indexing with immediate push-based replication (also support for slower pull-based replication)
    Transaction log ensures no updates are lost even if the documents are not yet indexed to disk
    Automated query failover, index leader election and recovery in case of failure
    No single point of failure

Admin Interface

    Comprehensive statistics on cache utilization, updates, and queries
    Interactive schema browser that includes index statistics
    Replication monitoring
    SolrCloud dashboard with graphical cluster node status
    Full logging control
    Text analysis debugger, showing result of every stage in an analyzer
    Web Query Interface w/ debugging output
    Parsed query output
    Lucene explain() document score detailing
    Explain score for documents outside of the requested range to debug why a given document wasn't ranked higher.

The Apache Software Foundation

The Apache Software Foundation provides support for the Apache community of open-source software projects. The Apache projects are defined by collaborative consensus based processes, an open, pragmatic software license and a desire to create high quality software that leads the way in its field. Apache Lucene, Apache Solr, Apache PyLucene, Apache Open Relevance Project and their respective logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Download
Click to begin
of Apache Solr 4.4
Apache Solr 4.4
Resources




http://blog.webinno.cn/article/view/72

本文地址:http://blog.webinno.cn/article/view/72

发表于 @ 2013年08月22日 | 浏览5014次| 编辑 |评论(loading... ) | 分享到:QQ空间新浪微博腾讯微博微信

评论列表

发表评论