Sending Ansible logs to Fluentd
I read an article called "Chef のログを Fluentd に流す" (sending Chef logs to Fluentd). Ansible has a mechanism for doing the same thing, but I could not find much information about it, so I tried it myself.
Callback plugin
Ansible has a mechanism called plugins, which is separate from modules. For example, the lookup plugin is what handles expressions such as "{{ lookup('file', '/etc/foo.txt') }}". There are also connection plugins, vars plugins, and so on; one of these types is the callback plugin.
As the name implies, a callback plugin registers callbacks that are called automatically when various events occur, for example:
- At the start of a playbook
- At the start of a task
- When a task succeeds
- When a task fails
- At the end of a playbook
and so on.
Example: a plugin that sends results to Fluentd
Modules can be implemented in various languages in Ansible, but unfortunately plugins can currently only be implemented in Python.
So here is the result of implementing a callback plugin. You can see that it simply implements callback functions.
However, there are many more handlers that can be registered; only some of them are listed here. I have posted the full version as a gist, so please look at that as well. Please guess from the function names roughly when each one is called. (There is no official documentation at this time, so you will need to read the code.)
import json
import urllib
import urllib2

# Endpoint of fluentd's in_http plugin
url = 'http://localhost:8888/ansible'

def post(category, data):
    data['category'] = category
    invocation = data.pop('invocation', None)
    if invocation:
        data['module_name'] = invocation['module_name']
        data['module_args'] = invocation['module_args']
    values = {'json': json.dumps(data)}
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    urllib2.urlopen(req)

class CallbackModule(object):
    def on_any(self, *args, **kwargs):
        pass

    def runner_on_failed(self, host, res, ignore_errors=False):
        res["host"] = host
        post('FAILED', res)

    def runner_on_ok(self, host, res):
        res["host"] = host
        post('OK', res)

    def runner_on_error(self, host, msg):
        post('ERROR', {"host": host, 'error': msg})

    def runner_on_skipped(self, host, item=""):
        post('SKIPPED', {"host": host, "item": item})
This is sent to Fluentd via in_http, and the results written by out_file look like this:
2014-02-16T00:18:04+09:00 ansible {"category":"Play_on_start"}
2014-02-16T00:18:05+09:00 ansible {"category":"Play_start"}
2014-02-16T00:18:10+09:00 ansible {"category":"OK","changed":true,"end":"2014-02-1515:18:10.810734","stdout":"Sat Feb 15 15:18:10 UTC 2014","cmd":["date"],"rc":0,"start":"2014-02-1515:18:10.808307","host":"docker","stderr":"","delta":"0:00:00.002427","module_name":"command","module_args":"date"}
2014-02-16T00:18:11+09:00 ansible {"verbose_always":true,"category":"OK","host":"docker","msg":"Sat Feb 15 15:18:10 UTC 2014","module_name":"debug","module_args":"msg=\"Sat Feb 15 15:18:10 UTC 2014\""}
It is a little hard to read, but the result of the task on the third line ("OK") includes start (start time), end (end time), and the execution time, delta. Having delta in the log is pretty useful, isn't it? Of course the standard output (stdout) is captured as well.
Incidentally, the tasks executed are as follows.
tasks:
  - name: get date
    command: date
    register: date
  - name: debug
    debug: msg="{{ date.stdout }}"
Installation method
There are three ways to use the callback plugin.
1. Create a directory named callback_plugins at the same level as the inventory file and put the plugin file in it.
2. Put the plugin under the default directory, /usr/share/ansible_plugins/callback_plugins.
3. Specify the directory for callback plugins in ansible.cfg:
callback_plugins = /usr/share/ansible_plugins/callback_plugins
In every case you only need to place a Python file there. Conversely, be aware that any file placed there carelessly will simply be executed.
If multiple callback plugins are in the directory, all of them will be executed.
Summary
We introduced the callback plugin, which lets you capture Ansible execution results, and implemented a plugin that sends them to Fluentd as an example.
I think this is useful beyond Fluentd: for example, you could send results to Nagios, or notify HipChat when a task fails.
Watching MQTT packets with Wireshark
Tip
Another Lua plugin turned out to exist, so I have added a note about it. (2014/02/26)
Download generic.so built for your architecture from Wireshark Generic Dissector.
Then download the zip from MQTT dissector / decoder for Wireshark (http://false.ekta.is/2011/06/mqtt-dissector-decoder-for-wireshark/). When you extract it,
- mqtt3.1.fdesc
- mqtt3.1.wsgd
two files come out.
Put these three files, including generic.so, into the Wireshark plugin directory.
On Ubuntu 13.04 with Wireshark 1.10.2, this was /usr/lib/x86_64-linux-gnu/wireshark/libwireshark3/plugins/.
After that, just start Wireshark as usual.
Since I could not manage to take a screenshot, I borrowed the image from the page linked above.
It seems that broker development is making progress.
Lua Plugin
Wireshark can run Lua scripts as plugins, and there is an MQTT dissector implemented in Lua, so you can use that instead. Having tried it briefly, I find this one easier to read, for things like flags.
How to use
Download mqtt.lua from the GitHub repository mentioned above, and then run:
% wireshark -X lua_script:mqtt.lua
If you want to use it permanently, put it in the Wireshark plugins directory.
Permission issues
Note that Wireshark has been built with Lua enabled by default since 1.8, but at the same time Lua is disabled when Wireshark is launched as the root user.
To capture packets as a non-root user on Linux, set capabilities as follows (reference: Platform-Specific information about capture privileges):
% sudo setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' /usr/bin/dumpcap
- CAP_NET_RAW
- Permission to use RAW sockets and PACKET sockets
- CAP_NET_ADMIN
- Permission to perform network-related operations
Remove these capabilities when you no longer need them.
Summary of MQTT
Note
MQTT as a Service: sango has been released
In August 2014, Shiguredo (時雨堂) released sango, an MQTT service that you can start using simply by registering with your GitHub account.
There is also a free plan, so if you want to try MQTT, I recommend using sango.
Recently, thanks to voluntas's activities, MQTT has suddenly become a hot topic around me. (That is probably just because my observation range is narrow.)
Still, with buzzwords like M2M (Machine to Machine) and IoT (Internet of Things) around, and more and more things getting connected to the Internet, I feel that the value of MQTT is increasing. It may also be attracting attention as a protocol suited to the mobile age.
Here I will summarize MQTT.
What is MQTT
MQTT (MQ Telemetry Transport) is a lightweight messaging protocol based on the publish/subscribe model. Its characteristics are that it can operate where the network is unstable and that it is light enough to run on low-powered devices.
Although "MQ" is in the name, it is not a so-called job queue for distributing load. Please use AMQP or the like for that purpose.
MQTT specializes in delivering messages, especially for applications that have many publishers and subscribers. Taking advantage of its lightness and bidirectional communication, it is also used for applications that need real-time communication.
Concrete examples include:
- Data collection from sensors (IBM's example in Kitakyushu city)
- Messaging services (Facebook Messenger)
- Information synchronization between cars and smartphones (Connected Car from CES 2014)
And so on.
History
MQTT was originally formulated by IBM and Eurotech, starting in 1999. IBM has a product called IBM MessageSight.
Then, in 2011, the MQTT code was donated to the Eclipse Foundation, and in 2013 standardization started at OASIS, an international standards body.
The specification is published royalty-free. Version 3.1 is currently the latest, and 3.1.1 will appear soon.
Characteristics
I will briefly summarize the features of MQTT.
- Lightweight: a fixed header of only 2 bytes and little processing overhead
- One-to-many and many-to-many message delivery, made flexible by topic wildcards
- Three levels of QoS that can be chosen according to the application's characteristics
- "At most once", "at least once", and "exactly once"
- "Durable subscribe", which lets a client receive, after reconnecting, the messages it missed while disconnected
- "Last Will and Testament", which sends a message decided in advance when a client disconnects unexpectedly
- "Retain", which saves the last message so it can be delivered to clients that subscribe later
QoS, Will etc are characteristic features of MQTT.
Note that MQTT runs over TCP/IP, and the payload format is not defined in the specification. Strings seem to be used most often, and there are also reports of people using MessagePack.
Let's explain these.
Lightweight
MQTT has a fixed header of only 2 bytes, low overhead, and a simple protocol. It therefore beats HTTP in terms of both network bandwidth and processing cost. The small amount of processing also reduces power consumption, which makes it suitable for mobile devices.
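To make the fixed header concrete, here is a rough Python sketch that builds a minimal MQTT 3.1 PUBLISH packet by hand (QoS 0, no DUP or RETAIN flags). The topic and payload are made-up values; in practice you would use a client library rather than assembling packets yourself.

import struct

def publish_packet(topic, payload):
    # Variable header: 2-byte big-endian topic length + topic
    # (no packet identifier, because this is QoS 0)
    topic_bytes = topic.encode("utf-8")
    variable = struct.pack("!H", len(topic_bytes)) + topic_bytes
    remaining = variable + payload.encode("utf-8")
    # Fixed header: 1 byte of message type/flags (PUBLISH = 3, QoS 0)
    # plus the "remaining length" (1 byte as long as it stays below 128)
    fixed = struct.pack("!BB", 0x30, len(remaining))
    return fixed + remaining

pkt = publish_packet("sensors/room1/temp", "21.5")
print(len(pkt), repr(pkt[:2]))  # the first 2 bytes are the whole fixed header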
According to IBM's documentation, compared with HTTP:
- 1/10 to 1/100 of the traffic, in other words 10 to 100 times the throughput
- Less than 1/10 of the battery consumption
In addition, unlike connectionless HTTP, it is a connection-oriented protocol that sends a heartbeat at fixed intervals and communicates bidirectionally, so communication with high immediacy is possible.
As for performance, according to another IBM document, on the following hardware:
- Linux, 4 x 4-core 2.93 GHz Intel Xeon with 32 GB of RAM and 10 Gbit LAN
it handled
- 100,000 clients
- 13,000 messages / sec
at
- 25% CPU
That is the lightest case (QoS 0); even in the heaviest case (QoS 2), it reportedly handled 6,000 messages / sec at a CPU load of around 30%.
Topic
Each message has a "topic". A topic is a hierarchy with "/" as the separator, for example:
/r_rudi/second_house/room10/light/watt
On the subscriber side, you specify the topic you want to subscribe to. Here you can use wildcards instead of an exact match.
For example, it can be specified as follows.
- /r_rudi/second_house/room10/light/watt
- Exact match
- /r_rudi/second_house/#
- All topics below /r_rudi/second_house/
- /r_rudi/second_house/+/light/watt
- Single-level wildcard match (in this case you get the light watt of every room)
Of course, one subscriber can subscribe to more than one topic.
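As an illustration, here is a minimal subscriber sketch using the Paho Python client (introduced in the Implementations section below); the broker address and topic filter are just example values.

import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Called for every message whose topic matches the subscription
    print(msg.topic, msg.payload)

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883, keepalive=60)  # assumed local broker
# '+' matches exactly one level, so this receives light/watt for every room
client.subscribe("/r_rudi/second_house/+/light/watt")
client.loop_forever()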
QoS
In MQTT, QoS can be set for message delivery.
- QoS 0 : At Most Once. Delivered at most once; arrival is not guaranteed.
- QoS 1 : At Least Once. Delivered at least once; duplicates are possible.
- QoS 2 : Exactly Once. Delivered exactly once.
Since the QoS can be set per message, you can, for example, send ordinary messages with QoS 0 where occasional loss is acceptable, and send control messages with QoS 1 or 2 so that they always arrive.
For QoS 1 and 2, ACK and retransmission etc are also defined in the protocol specification.
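Choosing the QoS per message looks roughly like this with the Paho Python client (a sketch only; the topics and payloads are made up):

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883, keepalive=60)
client.loop_start()
# A sensor reading: occasional loss is fine, so QoS 0
client.publish("/r_rudi/second_house/room10/light/watt", "42", qos=0)
# A control message that must arrive exactly once: QoS 2
client.publish("/r_rudi/second_house/room10/light/switch", "off", qos=2)
client.loop_stop()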
Durable subscribe
MQTT assumes a situation where the network is unstable. Therefore, in the case of QoS 1 and QoS 2, the server (Broker) keeps the sent message for a certain period.
If a subscriber disconnects suddenly, without an explicit DISCONNECT or UNSUBSCRIBE, the messages sent in the meantime are retransmitted when it reconnects.
This makes it possible to operate on unstable networks and servers.
Will
A client can attach information called a Will when it first sends CONNECT to the server. When the server can no longer communicate with that client, it publishes the topic and payload specified by the Will.
This lets the subscriber side detect that the publishing side has died.
Whether a client is dead is judged by whether it answers pings sent at regular intervals. Since this interval is also decided per connection, you can, for example, lengthen it on devices where battery life is a concern.
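With the Paho Python client, a Will is registered before connecting; a small sketch (the topic, payload, and keepalive value are just examples):

import paho.mqtt.client as mqtt

client = mqtt.Client()
# If this client disappears without a clean DISCONNECT, the broker
# publishes this message on its behalf.
client.will_set("/r_rudi/clients/sensor01/status", "dead", qos=1, retain=True)
client.connect("localhost", 1883, keepalive=30)  # keepalive is the ping interval in seconds
client.loop_forever()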
Retain
The Retain function makes the MQTT server hold the message that was last published and hand it to new subscribers.
MQTT is a publish/subscribe model, so messages are delivered only to clients that are subscribed at the moment of publishing. For example, if a topic is updated once an hour, a newly connected subscriber could get nothing for up to an hour.
However, even in this case you can get the latest information using the Retain function.
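On the publishing side this is just a flag; a minimal sketch with the Paho client's publish helper (made-up topic and value):

from paho.mqtt import publish

# retain=True tells the broker to keep this message and hand it to
# anyone who subscribes to the topic later.
publish.single("/r_rudi/second_house/room10/light/watt", "42",
               qos=1, retain=True, hostname="localhost")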
Security
MQTT lets you specify a user name and password when first connecting to the server (broker). However, since the password itself is sent in clear text, you should secure the connection with SSL.
Also, there is no authorization of subscribers in the specification, so you cannot express, within the protocol itself, that a given topic is delivered only to a given subscriber. However, Mosquitto, described later, implements read/write access control per user name.
As for the user name and password above, how they are handled is implementation dependent; besides plain authentication, it is also possible to implement authentication with LDAP or OAuth.
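With the Paho Python client, the credentials and SSL usage from this section look roughly like this (the user name, password, CA certificate path, broker host, and port are all assumptions):

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.username_pw_set("someuser", "somepassword")  # sent in the CONNECT packet
client.tls_set("/path/to/ca.crt")                   # wrap the connection in SSL/TLS
client.connect("broker.example.com", 8883)          # 8883 is the usual MQTT-over-TLS port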
Implementations
Since the protocol is simple, there are many implementations. However, some of them only implement QoS 0, and not that many implement the full specification properly.
Here are some of the notable ones.
Mosquitto is an old implementation written in C and implements almost all functions and is often treated as a reference implementation.
RabbitMQ is a well-established message queue server written in Erlang. Its main protocol is AMQP, but it supports MQTT as well.
Paho is the implementation donated by IBM mentioned above, and it supports various languages such as Java, Python, and JavaScript.
ActiveMQ Apollo supports MQTT alongside STOMP, AMQP, and others.
Also, it is closed source, but there is also one called HiveMQ.
Note
Shiguredo (時雨堂) has built an MQTT broker. It is implemented in Erlang and appears to have considerable speed and fault tolerance. It is unfortunately closed source, but its specification and development log can be consulted.
WebSocket
WebSocket is a protocol for bidirectional communication over HTTP, and recent browsers support it.
Since it provides bidirectional communication just like MQTT, there are also libraries that connect to an MQTT server from the browser over WebSocket.
Summary
I have summarized MQTT, which seems to be getting slightly hot recently (or maybe not).
It is a lightweight, connection-oriented protocol with disconnection detection, retransmission, and other functions, so it can be used in many ways. It may fit more use cases than you expect.
"Do everything with HTTP" is fine, but trying another protocol may also be worth considering.
By the way, I have written the following three pieces of MQTT-related software. They are all at the proof-of-concept level, but I hope they are useful as a reference.
- gostat : https://bitbucket.org/r_rudi/gostat
- mqtt-kibana : https://bitbucket.org/r_rudi/mqtt-kibana
- storm-mqtt : https://github.com/shirou/storm-mqtt
Ansible ad hoc commands
Ansible can run a command on remote nodes quite easily.
ansible command
Use the ansible command, not the ansible-playbook command that you usually use.
% ansible webservers -a "free -m"
Web01 | success | rc=0 >>
total used free shared buffers cached
Mem: 3831 444 3387 0 87 157
-/+ buffers/cache: 199 3632
Swap: 0 0 0
Web02 | success | rc=0 >>
total used free shared buffers cached
Mem: 3831 2656 1174 0 401 306
-/+ buffers/cache: 1949 1881
Swap: 0 0 0
If you want to use pipe or redirect, use -m shell.
% ansible webservers -m shell -a "free -m | grep Mem"
Web01 | success | rc=0 >>
Mem: 3831 444 3386 0 87 157
Web02 | success | rc=0 >>
Mem: 3831 2657 1174 0 401 306
Hence, you can get server information from all hosts.
-m specifies the module. Here are some more examples.
ex 1: copy files to remote nodes
% ansible webservers -m copy -a "src=/tmp/spam dest=/tmp/ham"
ex 2: make a directory on the remote nodes
% ansible webservers -m file -a "dest=/path/to/c mode=644 owner=mdehaan group=mdehaan state=directory"
ex 3: package install
% ansible webservers -m yum -a "name=dstat state=installed"
ex 4: deploy via git
% ansible webservers -m git -a "repo=git://foo.example.org/repo.git dest=/srv/myapp version=HEAD"
ex5: managing services
% ansible webservers -m service -a "name=httpd state=started"
Tips: store command results to files
The ansible command has a -t option. When you specify it, the command results are written to the specified directory in JSON format, one file per host.
% ansible webservers -a "free -m" -t tmp
% ls tmp/
Web01 Web02
This is an example of the json file.
{
"changed": true,
"cmd": [
"free",
"-m"
],
"delta": "0:00:00.003748",
"end": "2014-01-21 22:32:18.565043",
"rc": 0,
"start": "2014-01-21 22:32:18.561295",
"stderr": "",
"stdout": " total used free shared
buffers cached\nMem: 3831 444 3386
0 87 157\n-/+ buffers/cache: 198
3632\nSwap: 0 0 0"
}
You can use the ansible command together with a monitoring system, to record long-running commands, or for something else. Try it and find your own use case.
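For example, here is a small Python sketch (directory name taken from the example above) that reads the per-host JSON files written by -t and prints each host's return code and duration:

import json
import os

result_dir = "tmp"  # the directory passed to the -t option
for host in os.listdir(result_dir):
    with open(os.path.join(result_dir, host)) as f:
        result = json.load(f)
    # rc, delta and stdout are the fields shown in the JSON example above
    print("%s: rc=%s delta=%s" % (host, result.get("rc"), result.get("delta")))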
Ansible's architecture: beyond configuration management
About two months have already passed, but on 2013/11/29 Michael DeHaan, CTO of AnsibleWorks, wrote an article called Ansible's Architecture: Beyond Configuration Management.
I thought this article explains Ansible's architecture very well, so I am publishing a translation with DeHaan's permission.
However, the original is rather poetic and its sentences are long, so there are many places where the translation does not come out well. Please point out any mistakes.
Ansible's architecture: beyond configuration management
Recently there has been some good discussion about what kind of tool Ansible is, which I found very interesting, so let me try to classify what Ansible is here.
Ansible is often introduced as a "configuration management" system. But is this really a correct explanation?
A genuine configuration management system defines a remote machine with a strict model. These systems are descended from CFEngine 2, which started about 15 years ago. The model runs in pull mode or push mode and decides what the server should look like, for example by consulting variables of the managed machine (what we call "facts"). End users are happy to automate, but they are not really interested in configuration management or its architecture; mostly they just want to push a file and check that a service is running. A compiled-model architecture is good for configuration management, but it lacks flexibility for multi-tier applications made up of databases, web services, applications and so on. In typical deployments of modern distributed applications, many people therefore bring in a separate application deployment tool and yet another automation tool on top. Classic configuration management uses a declarative resource model that hides how a service is implemented: instead of "/sbin/service foo start" you declare that "the service is running and starts at boot". Of course this works well, and Ansible has these resources too. A lot of them: more than 170 modules ship with core, "batteries included". Modules are like the tools in your toolbox, and Ansible is like a 6-foot-tall NASCAR-garage-worth toolbox (translator's note: I do not quite get this metaphor).
In contrast, application deployment systems (Capistrano and Fabric are the famous ones among classic configuration management users) are given a script describing a series of steps and execute it in order. The scripts run against remote machines in parallel through an abstracted connection layer (for example ssh) and rich features. Rather than compiling a rigid model of the remote system, they have the ability to push an application across the gap between systems and services, from box to box (translator's note: I read this as crossing the gap between servers). However, these tools typically lack the declarative abstraction that configuration management systems have. Ansible, on the other hand, uses a declarative resource model, so configuration management and application deployment can be done in the same way, equally easily. Also, because it uses a data-oriented automation language, there is no need to encode automation rules in program code, and no danger of it turning into a complicated software project. To developers, Ansible probably feels like an application deployment system written by someone with a configuration management background, because all resources (git repos, services, configuration files, load balancers, and so on) can be handled from the configuration that automates your application. The problem with scripts is that they are software code, not a description of what should happen, however "wonderful" the script you wrote may be. Software code takes effort to write and maintain, can grow until it becomes hard to read, and does not give you a resource model that protects you from fragile states. So rather than writing the automation in Ruby or Python, Ansible chose to take the reliable declarative model and abstraction layer from the configuration management world and run them the way an application deployment system would.
Now that I have talked about both, let me bring the two together. We believe that separating configuration management and application deployment is somewhat arbitrary; in the real world, the ultimate goal is to deploy a business application. So we deliberately blur this boundary. Everyone is moving in this direction: we no longer really have "configuration management" user groups, we have DevOps and automation user groups. It is all about deploying applications and being happy with the result. Ansible is both a configuration management tool and an application deployment tool; it takes the good parts of both worlds and realizes a hybrid solution that removes the legacy aspects of each.
In addition, there is the concept of orchestration. This word is used for many different things; I think we need a softer word, but there is no better one available. Depending on the context it means one of the following completely different things.
- Basic function of sending commands to the remote system
- Basic function to send instructions to resources to remote system
- Trigger to execute automated things in configuration management
- Telling other systems how the system should be configured
- Building a workflow system usable for many purposes, for example a pipeline that follows the steps of an IT process
Ansible can do all of these. More than that, it can not only do all of the above, it can also build the last kind, a workflow usable for many purposes, in a text-based, automatable way. You could probably use a graphical IT orchestration program (for example Cisco Process Orchestrator, old but powerful), but such tools are too heavyweight and do not fit recent DevOps practice, which prefers things that can be handled easily with a text editor and kept under source code management.
Many people think of orchestration as a glue layer that instructs things that have already been built. For Ansible, orchestration is the heart of everything: your computing resources and services are treated like an orchestra. The brass section comes in together with the violins, the strings play, and then it comes back to a solo from the first clarinet. Describing this kind of flow is very natural.
For this reason, Ansible excels at things like immediate rolling updates and continuous deployment. We baked a lot of features into the language to make this very easy. Concepts such as "host loops", "delegation", and rolling updates have been included from the beginning, and all of them are first-class parts of the language. The very first use case, when we decided to start the Ansible project, was to let everyone in the conference room do rolling updates. With Ansible, less than half a page of text can perform a rolling update at the click of a button.
Do not forget provisioning either. This phase comes up when deploying software, but the more common scenario is when there are no compute resources yet, including hardware, or when you need to request servers that have to change. That means the system itself has to be stood up before the infrastructure is configured and the business application is deployed. Ansible has lots of cloud provisioning modules. This works alongside everything else, which is hard to achieve with a compile-the-model configuration management architecture; Ansible, being flexible, can do provisioning easily. As a result, we have about 40 modules for managing many kinds of cloud services: starting with S3 and Elastic Load Balancer on Amazon Web Services, you can even create isolated networks in the Rackspace Cloud. You may not need this feature yourself, but many people do, and many of our users use Ansible to simplify how they deploy services on cloud providers.
How did Ansible come to cover everything described so far? Ansible was not created to solve just one use case in order to become a great configuration management system, nor is it the result of some revolutionary process. It was designed from the start as a hybrid architecture that combines three things: a configuration management system, an application deployment system, and a workflow-based orchestration system. And it is based on extensive experience with such systems, including some of the automation systems at Red Hat, my previous employer (translator's note: DeHaan used to be at Red Hat), because IT automation is in fact a problem that has not been solved yet. The hybrid approach Ansible takes means that any script can be run with Ansible, without being forced into the model-and-compile, one-system-at-a-time architecture of a CMS. You can start with any bash script and gradually move toward a nice model of your system topology. Going the other way, from a configuration management system to an orchestration system, is not easy. What we built is an orchestrator, with everything else attached to it; automating any arbitrary IT workflow is part of Ansible's soul.
Of course this is not all. We adopted an agentless approach for everything. It is implemented with SSH as a secure transport, plus a unique "accelerated mode" (which uses SSH for key exchange). An agentless approach gives a secure, push-by-default architecture for customers who care about security: it reduces the demands on remote nodes and gives privileges only to the people who need them. This is one reason Ansible is becoming popular in the Big Data area. Not having to think about maintaining, securing, and managing an agent architecture is a big bonus. We are not interested in inventing and managing our own encryption or home-made security layers; using SSH, which so many people already trust in production, is the safest and most reliable choice.
Finally, we want to reduce the amount of computer science in automation. We feel that many people are bothered by concepts that are more complex than the actual applications. We want to build automation tools for busy people. I am a developer myself, but I am obsessed with the idea of making complex deployment systems easier to handle. Ansible's language is strongly influenced by workflow orchestrators: many basic building blocks (currently more than 170 modules in core) combined with advanced but very easy features (loops, conditionals, roles, role dependencies and so on) that turn those basic functions into something much bigger.
Since everything is text based, you have a history from which you can see what was done. But to operate a verifiable system, an especially clear language is needed; this is why Ansible uses a 100% pure data format whose diffs can be read line by line. Even if you did not write the automation yourself, you can read how that infrastructure behaves and see from its history what happened. Ansible's architecture is what makes this possible. We believe the language matters most, because it is the interface you touch every day.
Ansible is an automation pipeline usable for many purposes. It is source-controlled automation built on a living, data-oriented view, in a 100% machine-readable world rather than a world of mouse clicks. On top of the hybrid architecture we created, it makes Infrastructure-as-code easier to handle for busy IT shops. I resist calling this "hybrid DevOps", but this is my idea of infrastructure automation.
My idea is this: learn from all the tools that people and teams use in their daily cycles, and dig into the gap between the tools and what actually happens. Not as "configuration management" or as "application deployment", which push the application out the door, but, if I dare say it... automate everything.
Having translated it
Although it is a poor translation,
- Configuration management
- Provisioning
- Orchestration
It is nice to know that it is designed to be able to do all three.
How to use pip (2014/1 version)
I previously wrote an article called "pipの使い方" (how to use pip), but that was in January 2011, three years ago. Things have changed a lot since then, so I would like to summarize the topic here again.
The pip version covered here is 1.5. If your pip is old, please update it with pip install -U pip.
Warning
Major change: starting with pip 1.5, packages whose version contains "pre", "b" and the like (pre-releases) are excluded by default from search and installation. Packages that could be installed with pip versions before 1.5 may therefore not be installable with 1.5.
If you pass --pre, these versions can be installed.
What is pip?
Pip is a package management system in Python.
Install pip
Tip
pip supports CPython 2.6, 2.7, 3.1, 3.2, 3.3, and 3.4. Python 2.5 is no longer supported as of pip 1.3.1.
Tip
Since pip 1.5.1, you no longer need to install setuptools yourself; get-pip.py installs it automatically.
Download get-pip.py. (Please be mindful of security when doing so.)
Run it with python. Sometimes sudo is necessary.
% python get-pip.py
Or use the package manager of the distribution.
% sudo apt-get install python-pip # debian/ubuntu
% sudo yum install python-pip # fedora
Help
% pip help # show overall help
% pip help install # show help for install
% pip help freeze # show help for freeze
Search pypi
pip can search packages in the Python Package Index (PyPI, https://pypi.python.org/pypi) with the search command.
% pip search pycrypto
pycryptopp - Python wrappers for a few algorithms from theCrypto++ library
pycrypto - Cryptographic modules for Python.
INSTALLED: 2.6.1 (latest)
pycryptopan - A python implementation of Crypto-PAna ip anonymization algorithm
As indicated by INSTALLED, the already installed package will be displayed as such.
- --index <url>
- When searching other than pypi, specify the base URL.
install
Use install.
% pip install pycrypto
Install packages listed by freeze in bulk
You can install, all at once, the package list written out by freeze (described later). The exported package list is just a text file, so you can edit it easily. Dependencies are installed automatically, so it is fine to keep only the packages you actually need.
% pip freeze > packages_requirements.txt
(edit the file if necessary)
% pip install -r packages_requirements.txt
(multiple files can be specified)
% pip install -r basic_requirements.txt -r packages_requirements.txt
By the way, if you want whatever the latest version is at the moment you run pip install, you can simply omit the version numbers:
MarkupSafe
pycrypto
Note that if you pass --no-deps, exactly the packages in the exported list, and nothing else, are installed. This prevents dependencies from pulling in unexpected packages.
Set proxy
% pip install pycrypto --proxy=http://user@proxy.example.jp:8080
Tip
Addendum 2014/07: the "http://" part is now required.
Install in user directory
In environments where you do not have root, you can only install into your user directory. To do so, add --user.
% pip install pycrypto --user
If you install with --user like this, packages are placed under ~/.local/. Note that ~/.local/bin and the like are probably not in your PATH, so be careful.
Install a specific version
You can also specify the version and install it.
% pip install Flask==0.10.1
Note that the file exported with freeze is in this pinned format, so exactly the exported versions are installed, true to the name "freeze".
Version specifiers are more flexible than that; you can also do this:
% pip install 'Markdown<2.0'
In this case, if a version 2.0 or higher is already installed, it is uninstalled and a version satisfying <2.0 is installed.
You can also combine specifiers:
% pip install 'Markdown>2.0,<2.0.3'
Install from your local repository
You can install it directly from the repository at hand.
% pip install -e .
Install directly from Subversion / git / mercurial / bazaar
It can also be installed directly from a remote repository.
% pip install -e git+https://git.repo/some_pkg.git#egg=SomePackage # git
% pip install -e hg+https://hg.repo/some_pkg#egg=SomePackage # mercurial
% pip install -e svn+svn://svn.repo/some_pkg/trunk/#egg=SomePackage # svn
% pip install -e git+https://git.repo/some_pkg.git@feature#egg=SomePackage # from the 'feature' branch
# install from a subdirectory of the repository
% pip install -e "git+https://git.repo/some_repo.git#egg=subdir&subdirectory=subdir_path"
Install directly from the archive file
It can be installed directly from tar.gz or zip file.
% pip install ./downloads/SomePackage-1.0.4.tar.gz
% pip install http://my.package.repo/SomePackage-1.0.4.zip
Try it without installing it
You may want to just see what would be downloaded. In that case, use the -d option.
% pip install pycrypto -d /tmp/
This downloads the tar.gz of pycrypto under /tmp.
I want to install it again
Use -I (--ignore-installed).
% pip install pycrypto -I
I want to upgrade
To upgrade a package that is already installed, use -U (--upgrade).
% pip install pycrypto -U
Install from other than pypi
You may want to install from a location other than pypi such as the local mirror environment.
Search and install from the specified location without searching from pypi.
% pip install --index-url http://my.package.repo/simple/ SomePackage
In addition to pypi, specify the location to search and install.
% pip install --extra-index-url http://my.package.repo/simple
To install locally from a flat directory containing the files: if you pass --no-index, the index is not searched at all.
% pip install --no-index --find-links=file:///local/dir/ SomePackage
% pip install --no-index --find-links=/local/dir/ SomePackage
% pip install --no-index --find-links=relative/dir/ SomePackage
Install from wheel
You can install the wheel created with the wheel command described later.
% pip install --use-wheel --no-index --find-links=/tmp/wheelhouse pycrypto
I want to use Mirror
The mirror option no longer exists. See PEP 449 for details.
Show currently installed packages
Use list.
% pip list
ansible (1.4.2)
argparse (1.2.1)
ecdsa (0.10)
Flask (0.10.1)
httplib2 (0.8)
The frequently used options are as follows.
- -o
- Show updateable packages
- -u
- Show package with latest version
- -e
- Show package with install -e
- -l
- In the case of the virtualenv environment, do not display packages installed on the system
- --pre
- Also show beta version
Write out the currently installed packages for later installation
freeze writes out the currently installed packages in a format that pip install -r can install in bulk.
% pip freeze > requirements.txt
To uninstall
Use uninstall.
% pip uninstall pycrypto
By the way, adding -y answers yes to all questions. It is sometimes useful, but use it with care.
I want to specify a requirements file
With -r you can uninstall, in bulk, exactly the packages listed in a file written out by freeze.
% pip uninstall -r requirements.txt
I want to know the details of the installed package
Use show.
% pip show pycrypto
- -f
- Displays all the files contained in the package.
Create wheel
Create a wheel. A wheel is a replacement for an egg: a format for saving pre-built packages, so using wheels saves you the time and effort of compiling every time. For details, see the Wheel documentation.
The wheel package is required to run the wheel command, so be sure to install it first with pip install wheel.
% pip wheel pycrypto # by package name
% pip wheel -r requirements.txt # from a requirements file
% pip wheel hg+https://hg.repo/some_pkg # from a VCS
% pip wheel . # from a local directory
% pip wheel hoge.tar.gz # from an archive file
By default, a directory called wheelhouse is created and the wheel files are placed in it. Because dependencies are followed, multiple wheel files may be created.
Configuration
Pip uses the following file as the setting file.
- On Unix and Mac OS X: $HOME/.pip/pip.conf
- On Windows: %HOME%\pip\pip.ini
It is a description in ini format.
[global]
timeout = 60
index-url = http://download.zope.org/ppix
Or, for settings specific to the install command:
[install]
ignore-installed = true
no-dependencies = yes
find-links =
http://mirror1.example.com
http://mirror2.example.com
You can write. Of course you can write both global and install at the same time.
Environment variable
You can also use environment variables.
% export PIP_FIND_LINKS="http://mirror1.example.com http://mirror2.example.com"
Option names are uppercased and prefixed with PIP_, as in PIP_FIND_LINKS or PIP_DEFAULT_TIMEOUT.
Shell completion
pip can write out a script for shell completion.
% pip completion --zsh >> ~/.zprofile
If you are still using bash, you can also use the following options
% pip completion --bash >> ~/.profile
Or you can directly eval without writing it out.
% eval "`pip completion --zsh`"
Reading the Riak 2.0 Plumtree code
In last year's Riak Advent Calendar, Shinohara-san wrote an article called "Riak 2.0: クラスタ全体のデータ共有を効率" (on making cluster-wide data sharing efficient).
According to that article, Riak 2.0 adds, in addition to the gossip protocol, a protocol that communicates along tree-shaped paths, implemented based on a paper called Plumtree.
I read the slides introduced there, got interested, and decided to read riak_core and take a look.
By the way, looking at the log, it seems these changes first went in on August 1, 2013.
Warning
There is a good chance that I am wrong somewhere. Comments are welcome.
tl;dr
From Riak 2.0 the gossip protocol will be more efficient
It is enabled when there are 10 or more nodes. (01/23 postscript: this was a mistake! Plumtree is always used; gossip is used only to build the initial tree.)
Term Definition
- Eager
- Nodes to which messages are sent immediately
- Lazy
- Nodes to send to later
- Outstanding
- A message sent lazily whose ack has not yet returned
- Plumtree
- The tree-route communication protocol introduced this time. The original paper calls it Epidemic Broadcast Trees, and the word "plumtree" does not appear anywhere in the code. (1/14 postscript: actually it is written plainly as Plumtree in the original paper... oops)
riak_core_broadcast.erl
If you grep for "eager", this file comes up. It is where the function that broadcasts across the whole cluster is implemented.
In it there is a function called init_peers(Members). This function decides which communication structure to use.
- When there is only one node
- It is just itself, so nothing special happens
- When there are two nodes
- They just send to each other
- When there are 2 to 4 nodes
- All nodes connect to each other (mesh)
- When there are 5 to 9 nodes
- Use the tree structure of the gossip protocol defined in riak_core_gossip.erl as the initial tree
- When there are 10 or more nodes
- Use plumtree
In other words, the tree-based paths added in 2.0 are only used with 10 or more nodes; with fewer nodes, the tree built from gossip is enough. (I do wonder where the number 10 comes from.)
Note that init_peers/1 is also called from handle_cast({ring_update, Ring}, ...), so the communication structure is re-determined every time the ring is updated.
Inside init_peers/1, riak_core_util:build_tree/3 is called, and that is what actually builds the routes.
riak_core_util:build_tree/3
build_tree/3 builds an N-ary tree from the list passed in. N is 1 for mesh, 2 for gossip, and round(math:log(N) + 1) for plumtree.
By the way, when build_tree/3 is called from init_peers/1, the third argument (options) contains [cycles]. When cycles is included, leaves also get links back to the upper nodes, i.e. the tree is doubly linked. (maybe...)
Flat = [1,
11, 12,
111, 112, 121, 122,
1111, 1112, 1121, 1122, 1211, 1212, 1221, 1222],
Given this list of nodes, the resulting tree is:
CTree = [{1, [ 11, 12]},
{11, [ 111, 112]},
{12, [ 121, 122]},
{111, [1111, 1112]},
{112, [1121, 1122]},
{121, [1211, 1212]},
{122, [1221, 1222]},
{1111, [ 1, 11]},
{1112, [ 12, 111]},
{1121, [ 112, 121]},
{1122, [ 122, 1111]},
{1211, [1112, 1121]},
{1212, [1122, 1211]},
{1221, [1212, 1221]},
{1222, [1222, 1]}],
That's it.
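As a rough illustration of the indexing (this is only a Python sketch that reproduces the example above, not the actual riak_core_util:build_tree/3 implementation):

def build_tree(n, nodes, cycles=True):
    # Children of the node at index i are the entries at n*i+1 .. n*i+n.
    # With cycles=True the indices wrap around, so the leaves point back
    # to the top of the list, matching the CTree example above.
    size = len(nodes)
    tree = {}
    for i, node in enumerate(nodes):
        children = []
        for j in range(n * i + 1, n * i + n + 1):
            if cycles:
                children.append(nodes[j % size])
            elif j < size:
                children.append(nodes[j])
        tree[node] = children
    return tree

flat = [1, 11, 12, 111, 112, 121, 122,
        1111, 1112, 1121, 1122, 1211, 1212, 1221, 1222]
print(build_tree(2, flat))  # reproduces the CTree shown above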
Now that we have the initial tree, let's get back to broadcasting.
When a node actually broadcasts, it calls broadcast/2.
broadcast(Broadcast, Mod) ->
    {MessageId, Payload} = Mod:broadcast_data(Broadcast),
    gen_server:cast(?SERVER, {broadcast, MessageId, Payload, Mod}).
Broadcast is the data, and Mod is a module(). This Mod is executed on all nodes in the ring; the callbacks it must provide are defined in riak_core_broadcast_handler.
Let's see a little bit here.
riak_core_broadcast_handler.erl
here
- broadcast_data
- Returns a tuple of message id and payload
- merge
- Takes a message in locally. Returns false if it has already been received
- is_stale
- Returns true if the message has already been received
- graft
- Returns the message for the given message id. The message may already have been delivered; in that case stale is returned
- exchange
- Triggers an exchange between the local node and the given node. The exchange delivers to each side the messages that the other side is missing
There are five callbacks defined. If you keep this in mind, you will understand well after this.
Note that exchange only deals with messages that have already been received; a message that is still in flight, sent but not yet received by anyone, is the responsibility of the next exchange.
Back to riak_core_broadcast.erl
In broadcast, this is called first.
handle_cast({broadcast, MessageId, Message, Mod}, State) ->
    State1 = eager_push(MessageId, Message, Mod, State),
    State2 = schedule_lazy_push(MessageId, Mod, State1),
    {noreply, State2};
eager_push sends the message to the nodes in the eager list. schedule_lazy_push schedules the message to be sent to the lazy list later.
eager_push looks at the eager list and sends the message to the target nodes. That happens around here:
%% the very first call starts from the node itself
eager_push(MessageId, Message, Mod, State) ->
    eager_push(MessageId, Message, Mod, 0, node(), node(), State).

%% after that, send along the eager list
eager_push(MessageId, Message, Mod, Round, Root, From, State) ->
    Peers = eager_peers(Root, From, State),
    send({broadcast, MessageId, Message, Mod, Round, Root, node()}, Peers),
    State.
If you receive broadcast
If you have not received it
handle_cast({broadcast, MessageId, Message, Mod, Round, Root, From}, State) ->
    Valid = Mod:merge(MessageId, Message),
    State1 = handle_broadcast(Valid, MessageId, Message, Mod, Round, Root, From, State),
    {noreply, State1};
First, the message is taken in locally with the merge/2 described above. If it has not been received yet, the sender is added to the eager list with add_eager/3, the round is incremented, and the message is pushed on to the eager peers.
handle_broadcast(true, MessageId, Message, Mod, Round, Root, From, State) -> %% valid msg
    State1 = add_eager(From, Root, State),
    State2 = eager_push(MessageId, Message, Mod, Round+1, Root, From, State1),
    schedule_lazy_push(MessageId, Mod, Round+1, Root, From, State2).
If you have already received it
If, however, the message has already been received, the sender is put in the lazy list and a prune message is sent back.
handle_broadcast(false, _MessageId, _Message, _Mod, _Round, Root, From, State) -> %% stale msg
    State1 = add_lazy(From, Root, State),
    send({prune, Root, node()}, From),
    State1;
When you receive prune, put it in the lazy list.
handle_cast({prune, Root, From}, State) ->
    State1 = add_lazy(From, Root, State),
    {noreply, State1};
This is the illustration on page 35 of the slide.
Eager list and lazy list
Nodes are added to the eager and lazy lists with add_eager/3 and add_lazy/3. Adding to eager removes the node from lazy, and adding to lazy removes it from eager.
add_eager(From, Root, State) ->
    update_peers(From, Root, fun ordsets:add_element/2, fun ordsets:del_element/2, State).

add_lazy(From, Root, State) ->
    update_peers(From, Root, fun ordsets:del_element/2, fun ordsets:add_element/2, State).
In case of trouble?
If a failure occurs on the way, use the lazy list.
Lazy list
Lazy nodes are kept in reserve for failures. schedule_lazy_tick/0 schedules lazy processing for 1000 ms after the message is first sent.
schedule_lazy_tick() ->
    schedule_tick(lazy_tick, broadcast_lazy_timer, 1000).
After going through several steps, send_lazy/4 sends i_have messages to the nodes in the lazy list.
send_lazy(MessageId, Mod, Round, Root, Peer) ->
    send({i_have, MessageId, Mod, Round, Root, node()}, Peer).
i_have messages
When an i_have is received, the node checks whether it has already received that message.
If it has already been received (stale), nothing in particular is done; so in the normal case nothing happens.
Otherwise, if the i_have refers to a message the node has not seen, a graft message is sent back to the sender.
handle_ihave(false, MessageId, Mod, Round, Root, From, State) -> %% valid i_have
    %% TODO: don't graft immediately
    send({graft, MessageId, Mod, Round, Root, node()}, From),
    add_eager(From, Root, State).
Graft
When a graft message is received, a graft is attempted.
handle_cast({graft, MessageId, Mod, Round, Root, From}, State) ->
    Result = Mod:graft(MessageId),
    State1 = handle_graft(Result, MessageId, Mod, Round, Root, From, State),
    {noreply, State1};
If the message has already been received, ack_outstanding is called.
handle_graft(stale, MessageId, Mod, Round, Root, From, State) ->
    ack_outstanding(MessageId, Mod, Round, Root, From, State);
When you receive ack, delete the message from outstanding.
If graft asks for a message that has not actually been received, something has gone wrong, and the message is broadcast again from the start.
handle_graft({ok, Message}, MessageId, Mod, Round, Root, From, State) ->
    %% we don't ack outstanding here because the broadcast may fail to be delivered
    %% instead we will allow the i_have to be sent once more and let the subsequent
    %% ignore serve as the ack.
    State1 = add_eager(From, Root, State),
    send({broadcast, MessageId, Message, Mod, Round, Root, node()}, From),
    State1;
Is outstanding overflowing?
Since outstanding holds messages whose acks have not yet returned, you might think it would eventually overflow.
However, when a ring_update arrives, neighbors_down/2 is called; it removes the downed nodes from eager and lazy and deletes their outstanding entries. So there is no need to worry about overflow.
Return from failure
When a failed node comes back, ring_update runs and everything starts over from init_peers/1.
Summary
I read the riak_core code for the protocol, new in Riak 2.0, that broadcasts along tree-like routes.
With, say, 1000 nodes, plumtree should work much better than plain gossip. (The original paper discusses failure ratios and so on in detail.)
Erlang really is easy to read, though.
Making erchef
Opscode published a new Chef server called erchef. It is built with Erlang.
This post is a log of building erchef on Ubuntu 12.04.1.
1. erlang install
The default Ubuntu Erlang apt package is a little old, so I compile it from source.
% sudo apt-get install make gcc libncurses5-dev libssl-dev \
libssl1.0.0 openssl libstdc++6 libstdc++6-4.6-dev
% curl -O http://www.erlang.org/download/otp_src_R15B02.tar.gz
% gzip -dc otp_src_R15B02.tar.gz | tar xvf -
% cd otp_src_R15B02
% ./configure --prefix=/usr/local/ && make
% sudo make install
Next, compile rebar (the dependency management tool).
% git clone git://github.com/basho/rebar.git
% cd rebar
% ./bootstrap
% sudo cp -p rebar /usr/local/bin/
2. make
make.
% git clone git://github.com/opscode/erchef.git
% cd erchef
% make rel
That's it. easy.
Or so I thought, until I tried launching erchef....
3. creating app.config
A configuration file called app.config is required to launch erchef. However, the repository does not contain an app.config.
After several googling, I found the omnibus-chef repository.
% sudo apt-get install ruby-bundler rake
% git clone git://github.com/opscode/omnibus-chef.git
% cd omnibus-chef
% bundle install
% mv omnibus.rb.example omnibus.rb
% sudo CHEF_GIT_REV=10.14.4 rake projects:chef-server
But I could not launch this omnibus chef-server (sorry, I forgot why). So I used only its attributes and templates to build app.config.
Here is the app.config I made after a lot of, a lot of, trial and error.
There are some notes to create app.config.
- Use a relative path for the log directory, not an absolute one.
- I use '/' as the RabbitMQ vhost. The guest user can read/write there, but this should be changed.
- The Postgres dbname and password should also be changed.
4. Running erchef
OK. now I got the shiny app.config. Let's launch erchef!
Install dependency.
(maybe solr is not required?)
% sudo apt-get install postgresql rabbitmq-server openjdk-7-jre solr-common solr-jetty
Create log directory.
% mkdir -p log/chef-server/erchef/
Load the Postgres schema (I cut a corner and used the postgres user).
% sudo -u postgres createdb opscode_chef
% sudo -u postgres psql opscode_chef -f deps/chef_db/priv/pgsql_schema.sql
Create chef certificate.
% sudo escript bin/bootstrap-chef-server
client <<"admin">> created. Key written to
<<"/etc/chef-server/admin.pem">>
client <<"chef-validator">> created. Key written to
<<"/etc/chef-server/chef-validator.pem">>
client <<"chef-webui">> created. Key written to
<<"/etc/chef-server/chef-webui.pem">>
environment '_default' created
Place app.config under the erchef/etc.
% mv ~/app.config etc
Launch erchef.
% sudo bin/erchef start
Confirm.
% sudo bin/erchef ping
pong
Yah. erchef started. finally.
next, configure chef-client.
% knife configure -i
Overwrite /home/ubuntu/.chef/knife.rb? (Y/N) Y
Please enter the chef server URL: [http://blah.example:4000] http://localhost:4000
Please enter a clientname for the new client: [ubuntu] user1 <-- !Another user!
Please enter the existing admin clientname: [chef-webui]
Please enter the location of the existing admin client's private key:
[/etc/chef/webui.pem] /etc/chef-server/chef-webui.pem
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key:
[/etc/chef/validation.pem] /etc/chef-server/chef-validator.pem
Please enter the path to a chef repository (or leave blank):
Creating initial API user...
Created client[user1]
Configuration file written to /home/ubuntu/.chef/knife.rb
That's it.
% knife client list
admin
chef-validator
chef-webui
user1
5. However...
I can get the list of clients, roles, and nodes. However, I still could not upload cookbooks or rebuild the index. If you have any suggestions, please let me know.
Presentation on Sphinx Conference 2012
We, Sphinx-users-jp, held a Sphinx conference on 2012-09-16 in Japan. It might be the first Sphinx conference in the world!
The Sphinx conference was held jointly with the Python Conference 2012. To be honest, I expected that not many people would come, but there were even people standing. Wow!
Explore Extensions
I gave a presentation called "Explore Extensions", which introduces the many, many extensions on PyPI.
The handout is available here, mostly translated into my poor English. If you cannot understand something, please contact me.
fluentd PostgreSQL hstore plugin
Maybe you already know fluentd, a very good log collector daemon.
fluentd handles logs in JSON format and has output plugins for destinations such as MongoDB or Amazon S3.
I have also written a plugin that can output to PostgreSQL hstore.
hstore is a PostgreSQL extension that can store key-value data. For more about hstore, see the PostgreSQL documentation.
Here is the repository.
Install plugin
Just type gem install. easy.
% gem install fluent-plugin-pghstore
apache log
For example, if you want to send the Apache access log collected with the in_tail plugin to PostgreSQL, write this config.
<source>
type tail
path /var/log/apache/access_log_sym
tag apache.access
format apache
</source>
<match apache.*>
type pghstore
database test
</match>
It produces rows like this:
tag | time | record
----------------+------------------------+---------------------------------------
{apache,access} | 2012-04-01 22:55:15+09 | "code"=>"200",
"host"=>"XXX.XXX.XXX.XXX", "path"=>"/", "size"=>"2608",
"user"=>"-", "agent"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X
10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11", "method"=>"GET",
"referer"=>"-"
The fluentd tag is split on "." and stored as an array. Yes, PostgreSQL can handle arrays.
The record column is of hstore type. The example above is a little hard to read, but every key and value is stored.
http
Next, add this config,
<source>
type http
port 9880
</source>
and type curl,
curl -F 'json={"log":"hoge"}' "http://localhost:9880/apache.curl"
then
tag | time | record
----------------+------------------------+---------------------------------------
{apache,access} | 2012-04-01 22:55:15+09 | "code"=>"200",
"host"=>"XXX.XXX.XXX.XXX",
{apache,curl} | 2012-04-01 23:28:44+09 | "log"=>"hoge"
Data with a different schema is inserted into the same record column. Since hstore can add keys dynamically, fluent-plugin-pghstore can handle input from any kind of input plugin.
What do you want to do?
Once the data is in PostgreSQL, you can do anything with SQL.
Getting UserAgent
SELECT
COUNT(*) AS c,
record->'agent'
FROM apache_log
GROUP BY record->'agent'
ORDER BY c;
Access count in the last 10 minutes
SELECT count(*) FROM apache_log WHERE time > (CURRENT_TIMESTAMP -
interval '10 min')
Status codes in the last 10 minutes
SELECT
  count(CASE WHEN record->'code' = '200' THEN 1 ELSE NULL END) AS OK_200,
  count(CASE WHEN record->'code' = '301' THEN 1 ELSE NULL END) AS MOVED_301,
  count(CASE WHEN record->'code' = '302' THEN 1 ELSE NULL END) AS FOUND_302,
  count(CASE WHEN record->'code' = '304' THEN 1 ELSE NULL END) AS NOTMODIFIED_304,
  count(CASE WHEN record->'code' = '401' THEN 1 ELSE NULL END) AS UNAUTHORIZED_401
FROM apache_log
WHERE time > (CURRENT_TIMESTAMP - interval '10 min')
Limitation
However, nested JSON is not allowed such as...
'json={"log":"hoge", "nest":{"a":"hoge", "b":"hige"}}'
This is difficult because hstore itself does not allow nesting. If you really need that kind of structure, you may want to wait for the JSON type in PostgreSQL 9.2.
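One possible workaround is to flatten nested JSON before sending it to fluentd's in_http endpoint. This is only a sketch in Python; the flattening scheme is my own assumption and is not part of the plugin, while the port and tag are the ones from the curl example above.

import json
import urllib.parse
import urllib.request

def flatten(d, parent=""):
    # Flatten nested dicts: {"nest": {"a": 1}} -> {"nest_a": 1}
    out = {}
    for key, value in d.items():
        name = "%s_%s" % (parent, key) if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name))
        else:
            out[name] = value
    return out

record = {"log": "hoge", "nest": {"a": "hoge", "b": "hige"}}
body = urllib.parse.urlencode({"json": json.dumps(flatten(record))}).encode("utf-8")
urllib.request.urlopen("http://localhost:9880/apache.curl", body)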
Since this plugin uses only one connection, logs could be dropped under heavy load. fluentd itself has a retry function, so I think it will not actually happen, but I am not sure.
Finally
mongodb is good but PostgreSQL is also good.